Related
I need to do some parsing of large (5-10 Gb)logfiles in Javascript/Node.js (I'm using Cube).
The logline looks something like:
10:00:43.343423 I'm a friendly log message. There are 5 cats, and 7 dogs. We are in state "SUCCESS".
We need to read each line, do some parsing (e.g. strip out 5, 7 and SUCCESS), then pump this data into Cube (https://github.com/square/cube) using their JS client.
Firstly, what is the canonical way in Node to read in a file, line by line?
It seems to be fairly common question online:
http://www.quora.com/What-is-the-best-way-to-read-a-file-line-by-line-in-node-js
Read a file one line at a time in node.js?
A lot of the answers seem to point to a bunch of third-party modules:
https://github.com/nickewing/line-reader
https://github.com/jahewson/node-byline
https://github.com/pkrumins/node-lazy
https://github.com/Gagle/Node-BufferedReader
However, this seems like a fairly basic task - surely, there's a simple way within the stdlib to read in a textfile, line-by-line?
Secondly, I then need to process each line (e.g. convert the timestamp into a Date object, and extract useful fields).
What's the best way to do this, maximising throughput? Is there some way that won't block on either reading in each line, or on sending it to Cube?
Thirdly - I'm guessing using string splits, and the JS equivalent of contains (IndexOf != -1?) will be a lot faster than regexes? Has anybody had much experience in parsing massive amounts of text data in Node.js?
I searched for a solution to parse very large files (gbs) line by line using a stream. All the third-party libraries and examples did not suit my needs since they processed the files not line by line (like 1 , 2 , 3 , 4 ..) or read the entire file to memory
The following solution can parse very large files, line by line using stream & pipe. For testing I used a 2.1 gb file with 17.000.000 records. Ram usage did not exceed 60 mb.
First, install the event-stream package:
npm install event-stream
Then:
var fs = require('fs')
, es = require('event-stream');
var lineNr = 0;
var s = fs.createReadStream('very-large-file.csv')
.pipe(es.split())
.pipe(es.mapSync(function(line){
// pause the readstream
s.pause();
lineNr += 1;
// process line here and call s.resume() when rdy
// function below was for logging memory usage
logMemoryUsage(lineNr);
// resume the readstream, possibly from a callback
s.resume();
})
.on('error', function(err){
console.log('Error while reading file.', err);
})
.on('end', function(){
console.log('Read entire file.')
})
);
Please let me know how it goes!
You can use the inbuilt readline package, see docs here. I use stream to create a new output stream.
var fs = require('fs'),
readline = require('readline'),
stream = require('stream');
var instream = fs.createReadStream('/path/to/file');
var outstream = new stream;
outstream.readable = true;
outstream.writable = true;
var rl = readline.createInterface({
input: instream,
output: outstream,
terminal: false
});
rl.on('line', function(line) {
console.log(line);
//Do your stuff ...
//Then write to output stream
rl.write(line);
});
Large files will take some time to process. Do tell if it works.
I really liked #gerard answer which is actually deserves to be the correct answer here. I made some improvements:
Code is in a class (modular)
Parsing is included
Ability to resume is given to the outside in case there is an asynchronous job is chained to reading the CSV like inserting to DB, or a HTTP request
Reading in chunks/batche sizes that
user can declare. I took care of encoding in the stream too, in case
you have files in different encoding.
Here's the code:
'use strict'
const fs = require('fs'),
util = require('util'),
stream = require('stream'),
es = require('event-stream'),
parse = require("csv-parse"),
iconv = require('iconv-lite');
class CSVReader {
constructor(filename, batchSize, columns) {
this.reader = fs.createReadStream(filename).pipe(iconv.decodeStream('utf8'))
this.batchSize = batchSize || 1000
this.lineNumber = 0
this.data = []
this.parseOptions = {delimiter: '\t', columns: true, escape: '/', relax: true}
}
read(callback) {
this.reader
.pipe(es.split())
.pipe(es.mapSync(line => {
++this.lineNumber
parse(line, this.parseOptions, (err, d) => {
this.data.push(d[0])
})
if (this.lineNumber % this.batchSize === 0) {
callback(this.data)
}
})
.on('error', function(){
console.log('Error while reading file.')
})
.on('end', function(){
console.log('Read entirefile.')
}))
}
continue () {
this.data = []
this.reader.resume()
}
}
module.exports = CSVReader
So basically, here is how you will use it:
let reader = CSVReader('path_to_file.csv')
reader.read(() => reader.continue())
I tested this with a 35GB CSV file and it worked for me and that's why I chose to build it on #gerard's answer, feedbacks are welcomed.
I used https://www.npmjs.com/package/line-by-line for reading more than 1 000 000 lines from a text file. In this case, an occupied capacity of RAM was about 50-60 megabyte.
const LineByLineReader = require('line-by-line'),
lr = new LineByLineReader('big_file.txt');
lr.on('error', function (err) {
// 'err' contains error object
});
lr.on('line', function (line) {
// pause emitting of lines...
lr.pause();
// ...do your asynchronous line processing..
setTimeout(function () {
// ...and continue emitting lines.
lr.resume();
}, 100);
});
lr.on('end', function () {
// All lines are read, file is closed now.
});
The Node.js Documentation offers a very elegant example using the Readline module.
Example: Read File Stream Line-by-Line
const { once } = require('node:events');
const fs = require('fs');
const readline = require('readline');
const rl = readline.createInterface({
input: fs.createReadStream('sample.txt'),
crlfDelay: Infinity
});
rl.on('line', (line) => {
console.log(`Line from file: ${line}`);
});
await once(rl, 'close');
Note: we use the crlfDelay option to recognize all instances of CR LF ('\r\n') as a single line break.
Apart from read the big file line by line, you also can read it chunk by chunk. For more refer to this article
var offset = 0;
var chunkSize = 2048;
var chunkBuffer = new Buffer(chunkSize);
var fp = fs.openSync('filepath', 'r');
var bytesRead = 0;
while(bytesRead = fs.readSync(fp, chunkBuffer, 0, chunkSize, offset)) {
offset += bytesRead;
var str = chunkBuffer.slice(0, bytesRead).toString();
var arr = str.split('\n');
if(bytesRead = chunkSize) {
// the last item of the arr may be not a full line, leave it to the next chunk
offset -= arr.pop().length;
}
lines.push(arr);
}
console.log(lines);
I had the same problem yet. After comparing several modules that seem to have this feature, I decided to do it myself, it's simpler than I thought.
gist: https://gist.github.com/deemstone/8279565
var fetchBlock = lineByline(filepath, onEnd);
fetchBlock(function(lines, start){ ... }); //lines{array} start{int} lines[0] No.
It cover the file opened in a closure, that fetchBlock() returned will fetch a block from the file, end split to array (will deal the segment from last fetch).
I've set the block size to 1024 for each read operation. This may have bugs, but code logic is obvious, try it yourself.
Reading / Writing files using stream with the native nodejs modules (fs, readline):
const fs = require('fs');
const readline = require('readline');
const rl = readline.createInterface({
input: fs.createReadStream('input.json'),
output: fs.createWriteStream('output.json')
});
rl.on('line', function(line) {
console.log(line);
// Do any 'line' processing if you want and then write to the output file
this.output.write(`${line}\n`);
});
rl.on('close', function() {
console.log(`Created "${this.output.path}"`);
});
Based on this questions answer I implemented a class you can use to read a file synchronously line-by-line with fs.readSync(). You can make this "pause" and "resume" by using a Q promise (jQuery seems to require a DOM so cant run it with nodejs):
var fs = require('fs');
var Q = require('q');
var lr = new LineReader(filenameToLoad);
lr.open();
var promise;
workOnLine = function () {
var line = lr.readNextLine();
promise = complexLineTransformation(line).then(
function() {console.log('ok');workOnLine();},
function() {console.log('error');}
);
}
workOnLine();
complexLineTransformation = function (line) {
var deferred = Q.defer();
// ... async call goes here, in callback: deferred.resolve('done ok'); or deferred.reject(new Error(error));
return deferred.promise;
}
function LineReader (filename) {
this.moreLinesAvailable = true;
this.fd = undefined;
this.bufferSize = 1024*1024;
this.buffer = new Buffer(this.bufferSize);
this.leftOver = '';
this.read = undefined;
this.idxStart = undefined;
this.idx = undefined;
this.lineNumber = 0;
this._bundleOfLines = [];
this.open = function() {
this.fd = fs.openSync(filename, 'r');
};
this.readNextLine = function () {
if (this._bundleOfLines.length === 0) {
this._readNextBundleOfLines();
}
this.lineNumber++;
var lineToReturn = this._bundleOfLines[0];
this._bundleOfLines.splice(0, 1); // remove first element (pos, howmany)
return lineToReturn;
};
this.getLineNumber = function() {
return this.lineNumber;
};
this._readNextBundleOfLines = function() {
var line = "";
while ((this.read = fs.readSync(this.fd, this.buffer, 0, this.bufferSize, null)) !== 0) { // read next bytes until end of file
this.leftOver += this.buffer.toString('utf8', 0, this.read); // append to leftOver
this.idxStart = 0
while ((this.idx = this.leftOver.indexOf("\n", this.idxStart)) !== -1) { // as long as there is a newline-char in leftOver
line = this.leftOver.substring(this.idxStart, this.idx);
this._bundleOfLines.push(line);
this.idxStart = this.idx + 1;
}
this.leftOver = this.leftOver.substring(this.idxStart);
if (line !== "") {
break;
}
}
};
}
node-byline uses streams, so i would prefer that one for your huge files.
for your date-conversions i would use moment.js.
for maximising your throughput you could think about using a software-cluster. there are some nice-modules which wrap the node-native cluster-module quite well. i like cluster-master from isaacs. e.g. you could create a cluster of x workers which all compute a file.
for benchmarking splits vs regexes use benchmark.js. i havent tested it until now. benchmark.js is available as a node-module
import * as csv from 'fast-csv';
import * as fs from 'fs';
interface Row {
[s: string]: string;
}
type RowCallBack = (data: Row, index: number) => object;
export class CSVReader {
protected file: string;
protected csvOptions = {
delimiter: ',',
headers: true,
ignoreEmpty: true,
trim: true
};
constructor(file: string, csvOptions = {}) {
if (!fs.existsSync(file)) {
throw new Error(`File ${file} not found.`);
}
this.file = file;
this.csvOptions = Object.assign({}, this.csvOptions, csvOptions);
}
public read(callback: RowCallBack): Promise < Array < object >> {
return new Promise < Array < object >> (resolve => {
const readStream = fs.createReadStream(this.file);
const results: Array < any > = [];
let index = 0;
const csvStream = csv.parse(this.csvOptions).on('data', async (data: Row) => {
index++;
results.push(await callback(data, index));
}).on('error', (err: Error) => {
console.error(err.message);
throw err;
}).on('end', () => {
resolve(results);
});
readStream.pipe(csvStream);
});
}
}
import { CSVReader } from '../src/helpers/CSVReader';
(async () => {
const reader = new CSVReader('./database/migrations/csv/users.csv');
const users = await reader.read(async data => {
return {
username: data.username,
name: data.name,
email: data.email,
cellPhone: data.cell_phone,
homePhone: data.home_phone,
roleId: data.role_id,
description: data.description,
state: data.state,
};
});
console.log(users);
})();
I have made a node module to read large file asynchronously text or JSON.
Tested on large files.
var fs = require('fs')
, util = require('util')
, stream = require('stream')
, es = require('event-stream');
module.exports = FileReader;
function FileReader(){
}
FileReader.prototype.read = function(pathToFile, callback){
var returnTxt = '';
var s = fs.createReadStream(pathToFile)
.pipe(es.split())
.pipe(es.mapSync(function(line){
// pause the readstream
s.pause();
//console.log('reading line: '+line);
returnTxt += line;
// resume the readstream, possibly from a callback
s.resume();
})
.on('error', function(){
console.log('Error while reading file.');
})
.on('end', function(){
console.log('Read entire file.');
callback(returnTxt);
})
);
};
FileReader.prototype.readJSON = function(pathToFile, callback){
try{
this.read(pathToFile, function(txt){callback(JSON.parse(txt));});
}
catch(err){
throw new Error('json file is not valid! '+err.stack);
}
};
Just save the file as file-reader.js, and use it like this:
var FileReader = require('./file-reader');
var fileReader = new FileReader();
fileReader.readJSON(__dirname + '/largeFile.json', function(jsonObj){/*callback logic here*/});
I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject but I'm missing some connections to make the whole thing fit together.
var Lazy=require("lazy");
new Lazy(process.stdin)
.lines
.forEach(
function(line) {
console.log(line.toString());
}
);
process.stdin.resume();
The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN as in this sample.
I tried:
fs.open('./VeryBigFile.csv', 'r', '0666', Process);
function Process(err, fd) {
if (err) throw err;
// DO lazy read
}
but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.
I don't think the other answer would work as the file is much larger than the server I'm running it on has memory for.
Since Node.js v0.12 and as of Node.js v4.0.0, there is a stable readline core module. Here's the easiest way to read lines from a file, without any external modules:
const fs = require('fs');
const readline = require('readline');
async function processLineByLine() {
const fileStream = fs.createReadStream('input.txt');
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
// Note: we use the crlfDelay option to recognize all instances of CR LF
// ('\r\n') in input.txt as a single line break.
for await (const line of rl) {
// Each line in input.txt will be successively available here as `line`.
console.log(`Line from file: ${line}`);
}
}
processLineByLine();
Or alternatively:
var lineReader = require('readline').createInterface({
input: require('fs').createReadStream('file.in')
});
lineReader.on('line', function (line) {
console.log('Line from file:', line);
});
The last line is read correctly (as of Node v0.12 or later), even if there is no final \n.
UPDATE: this example has been added to Node's API official documentation.
For such a simple operation there shouldn't be any dependency on third-party modules. Go easy.
var fs = require('fs'),
readline = require('readline');
var rd = readline.createInterface({
input: fs.createReadStream('/path/to/file'),
output: process.stdout,
console: false
});
rd.on('line', function(line) {
console.log(line);
});
Update in 2019
An awesome example is already posted on official Nodejs documentation. here
This requires the latest Nodejs is installed on your machine. >11.4
const fs = require('fs');
const readline = require('readline');
async function processLineByLine() {
const fileStream = fs.createReadStream('input.txt');
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
// Note: we use the crlfDelay option to recognize all instances of CR LF
// ('\r\n') in input.txt as a single line break.
for await (const line of rl) {
// Each line in input.txt will be successively available here as `line`.
console.log(`Line from file: ${line}`);
}
}
processLineByLine();
You don't have to open the file, but instead, you have to create a ReadStream.
fs.createReadStream
Then pass that stream to Lazy
require('fs').readFileSync('file.txt', 'utf-8').split(/\r?\n/).forEach(function(line){
console.log(line);
})
there is a very nice module for reading a file line by line, it's called line-reader
with it you simply just write:
var lineReader = require('line-reader');
lineReader.eachLine('file.txt', function(line, last) {
console.log(line);
// do whatever you want with line...
if(last){
// or check if it's the last one
}
});
you can even iterate the file with a "java-style" interface, if you need more control:
lineReader.open('file.txt', function(reader) {
if (reader.hasNextLine()) {
reader.nextLine(function(line) {
console.log(line);
});
}
});
Old topic, but this works:
var rl = readline.createInterface({
input : fs.createReadStream('/path/file.txt'),
output: process.stdout,
terminal: false
})
rl.on('line',function(line){
console.log(line) //or parse line
})
Simple. No need for an external module.
You can always roll your own line reader. I have'nt benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'
var last = "";
process.stdin.on('data', function(chunk) {
var lines, i;
lines = (last+chunk).split("\n");
for(i = 0; i < lines.length - 1; i++) {
console.log("line: " + lines[i]);
}
last = lines[i];
});
process.stdin.on('end', function() {
console.log("line: " + last);
});
process.stdin.resume();
I did come up with this when working on a quick log parsing script that needed to accumulate data during the log parsing and I felt that it would nice to try doing this using js and node instead of using perl or bash.
Anyway, I do feel that small nodejs scripts should be self contained and not rely on third party modules so after reading all the answers to this question, each using various modules to handle line parsing, a 13 SLOC native nodejs solution might be of interest .
With the carrier module:
var carrier = require('carrier');
process.stdin.resume();
carrier.carry(process.stdin, function(line) {
console.log('got one line: ' + line);
});
I ended up with a massive, massive memory leak using Lazy to read line by line when trying to then process those lines and write them to another stream due to the way drain/pause/resume in node works (see: http://elegantcode.com/2011/04/06/taking-baby-steps-with-node-js-pumping-data-between-streams/ (i love this guy btw)). I haven't looked closely enough at Lazy to understand exactly why, but I couldn't pause my read stream to allow for a drain without Lazy exiting.
I wrote the code to process massive csv files into xml docs, you can see the code here: https://github.com/j03m/node-csv2xml
If you run the previous revisions with Lazy line it leaks. The latest revision doesn't leak at all and you can probably use it as the basis for a reader/processor. Though I have some custom stuff in there.
Edit: I guess I should also note that my code with Lazy worked fine until I found myself writing large enough xml fragments that drain/pause/resume because a necessity. For smaller chunks it was fine.
In most cases this should be enough:
const fs = require("fs")
fs.readFile('./file', 'utf-8', (err, file) => {
const lines = file.split('\n')
for (let line of lines)
console.log(line)
});
Edit:
Use a transform stream.
With a BufferedReader you can read lines.
new BufferedReader ("lorem ipsum", { encoding: "utf8" })
.on ("error", function (error){
console.log ("error: " + error);
})
.on ("line", function (line){
console.log ("line: " + line);
})
.on ("end", function (){
console.log ("EOF");
})
.read ();
I was frustrated by the lack of a comprehensive solution for this, so I put together my own attempt (git / npm). Copy-pasted list of features:
Interactive line processing (callback-based, no loading the entire file into RAM)
Optionally, return all lines in an array (detailed or raw mode)
Interactively interrupt streaming, or perform map/filter like processing
Detect any newline convention (PC/Mac/Linux)
Correct eof / last line treatment
Correct handling of multi-byte UTF-8 characters
Retrieve byte offset and byte length information on per-line basis
Random access, using line-based or byte-based offsets
Automatically map line-offset information, to speed up random access
Zero dependencies
Tests
NIH? You decide :-)
Since posting my original answer, I found that split is a very easy to use node module for line reading in a file; Which also accepts optional parameters.
var split = require('split');
fs.createReadStream(file)
.pipe(split())
.on('data', function (line) {
//each chunk now is a seperate line!
});
Haven't tested on very large files. Let us know if you do.
function createLineReader(fileName){
var EM = require("events").EventEmitter
var ev = new EM()
var stream = require("fs").createReadStream(fileName)
var remainder = null;
stream.on("data",function(data){
if(remainder != null){//append newly received data chunk
var tmp = new Buffer(remainder.length+data.length)
remainder.copy(tmp)
data.copy(tmp,remainder.length)
data = tmp;
}
var start = 0;
for(var i=0; i<data.length; i++){
if(data[i] == 10){ //\n new line
var line = data.slice(start,i)
ev.emit("line", line)
start = i+1;
}
}
if(start<data.length){
remainder = data.slice(start);
}else{
remainder = null;
}
})
stream.on("end",function(){
if(null!=remainder) ev.emit("line",remainder)
})
return ev
}
//---------main---------------
fileName = process.argv[2]
lineReader = createLineReader(fileName)
lineReader.on("line",function(line){
console.log(line.toString())
//console.log("++++++++++++++++++++")
})
I wanted to tackle this same problem, basically what in Perl would be:
while (<>) {
process_line($_);
}
My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:
The minimal synchronous code that could reuse in many projects.
No limits on file size or number of lines.
No limits on length of lines.
Able to handle full Unicode in UTF-8, including characters beyond the BMP.
Able to handle *nix and Windows line endings (old-style Mac not needed for me).
Line endings character(s) to be included in lines.
Able to handle last line with or without end-of-line characters.
Not use any external libraries not included in the node.js distribution.
This is a project for me to get a feel for low-level scripting type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.
After a surprising amount of effort and a couple of false starts this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)
var fs = require('fs'),
StringDecoder = require('string_decoder').StringDecoder,
util = require('util');
function lineByLine(fd) {
var blob = '';
var blobStart = 0;
var blobEnd = 0;
var decoder = new StringDecoder('utf8');
var CHUNK_SIZE = 16384;
var chunk = new Buffer(CHUNK_SIZE);
var eolPos = -1;
var lastChunk = false;
var moreLines = true;
var readMore = true;
// each line
while (moreLines) {
readMore = true;
// append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
while (readMore) {
// do we have a whole line? (with LF)
eolPos = blob.indexOf('\n', blobStart);
if (eolPos !== -1) {
blobEnd = eolPos;
readMore = false;
// do we have the last line? (no LF)
} else if (lastChunk) {
blobEnd = blob.length;
readMore = false;
// otherwise read more
} else {
var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);
lastChunk = bytesRead !== CHUNK_SIZE;
blob += decoder.write(chunk.slice(0, bytesRead));
}
}
if (blobStart < blob.length) {
processLine(blob.substring(blobStart, blobEnd + 1));
blobStart = blobEnd + 1;
if (blobStart >= CHUNK_SIZE) {
// blobStart is in characters, CHUNK_SIZE is in octets
var freeable = blobStart / CHUNK_SIZE;
// keep blob from growing indefinitely, not as deterministic as I'd like
blob = blob.substring(CHUNK_SIZE);
blobStart -= CHUNK_SIZE;
blobEnd -= CHUNK_SIZE;
}
} else {
moreLines = false;
}
}
}
It could probably be cleaned up further, it was the result of trial and error.
Generator based line reader: https://github.com/neurosnap/gen-readlines
var fs = require('fs');
var readlines = require('gen-readlines');
fs.open('./file.txt', 'r', function(err, fd) {
if (err) throw err;
fs.fstat(fd, function(err, stats) {
if (err) throw err;
for (var line of readlines(fd, stats.size)) {
console.log(line.toString());
}
});
});
A new function was added in Node.js v18.11.0 to read files line by line
filehandle.readLines([options])
This is how you use this with a text file you want to read
import { open } from 'node:fs/promises';
myFileReader();
async function myFileReader() {
const file = await open('./TextFileName.txt');
for await (const line of file.readLines()) {
console.log(line)
}
}
To understand more read Node.js documentation here is the link for file system readlines():
https://nodejs.org/api/fs.html#filehandlereadlinesoptions
If you want to read a file line by line and writing this in another:
var fs = require('fs');
var readline = require('readline');
var Stream = require('stream');
function readFileLineByLine(inputFile, outputFile) {
var instream = fs.createReadStream(inputFile);
var outstream = new Stream();
outstream.readable = true;
outstream.writable = true;
var rl = readline.createInterface({
input: instream,
output: outstream,
terminal: false
});
rl.on('line', function (line) {
fs.appendFileSync(outputFile, line + '\n');
});
};
var fs = require('fs');
function readfile(name,online,onend,encoding) {
var bufsize = 1024;
var buffer = new Buffer(bufsize);
var bufread = 0;
var fd = fs.openSync(name,'r');
var position = 0;
var eof = false;
var data = "";
var lines = 0;
encoding = encoding || "utf8";
function readbuf() {
bufread = fs.readSync(fd,buffer,0,bufsize,position);
position += bufread;
eof = bufread ? false : true;
data += buffer.toString(encoding,0,bufread);
}
function getLine() {
var nl = data.indexOf("\r"), hasnl = nl !== -1;
if (!hasnl && eof) return fs.closeSync(fd), online(data,++lines), onend(lines);
if (!hasnl && !eof) readbuf(), nl = data.indexOf("\r"), hasnl = nl !== -1;
if (!hasnl) return process.nextTick(getLine);
var line = data.substr(0,nl);
data = data.substr(nl+1);
if (data[0] === "\n") data = data.substr(1);
online(line,++lines);
process.nextTick(getLine);
}
getLine();
}
I had the same problem and came up with above solution
looks simular to others but is aSync and can read large files very quickly
Hopes this helps
Two questions we must ask ourselves while doing such operations are:
What's the amount of memory used to perform it?
Is the memory consumption increasing drastically with the file size?
Solutions like require('fs').readFileSync() loads the whole file into memory. That means that the amount of memory required to perform operations will be almost equivalent to the file size. We should avoid these for anything larger than 50mbs
We can easily track the amount of memory used by a function by placing these lines of code after the function invocation :
const used = process.memoryUsage().heapUsed / 1024 / 1024;
console.log(
`The script uses approximately ${Math.round(used * 100) / 100} MB`
);
Right now the best way to read particular lines from a large file is using node's readline. The documentation has amazing examples.
This is my favorite way of going through a file, a simple native solution for a progressive (as in not a "slurp" or all-in-memory way) file read with modern async/await. It's a solution that I find "natural" when processing large text files without having to resort to the readline package or any non-core dependency.
let buf = '';
for await ( const chunk of fs.createReadStream('myfile') ) {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop() ?? '';
for( const line of lines ) {
console.log(line);
}
}
if(buf.length) console.log(buf); // last line, if file does not end with newline
You can adjust encoding in the fs.createReadStream or use chunk.toString(<arg>). Also this let's you better fine-tune the line splitting to your taste, ie. use .split(/\n+/) to skip empty lines and control the chunk size with fs.createReadStream('myfile', { highWaterMark: <chunkSize> }).
Don't forget to create a function like processLine(line) to avoid repeating the line processing code twice due to the ending buf leftover. Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect within the loop that we're in the last iteration without some more verbose tricks like comparing the file size from a fs.Stats() with .bytesRead. Hence the final buf processing solution, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop should suffice.
Performance Considerations
Chunk sizes are important for performance, the default is 64k for text files and, for multi MB files, larger chunks can improve speed by an order of magnitude.
The above snippet runs at least the same speed (or even 5% faster sometimes) as code based on NodeJS v18's fs.readLine() or based on the readline module (the accepted answer), once you tune highWaterMark to something that your machine can handle, ie. setting it to the same size as the file, if your available memory allows it, is the fastest.
In any case, any of NodeJS line-reading answers here are an order of magnitude slower than the Perl or native *Nix solutions.
Similar alternatives
★ If you prefer the evented asynchronous version, this would be it:
let buf = '';
fs.createReadStream('myfile')
.on('data', chunk => {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop();
for( const line of lines ) {
console.log(line);
}
})
.on('end', () => buf.length && console.log(buf) );
★ Now if you don't mind importing the stream core package, then this is the equivalent piped stream version, which allows for chaining transforms like gzip decompression:
const { Writable } = require('stream');
let buf = '';
fs.createReadStream('myfile').pipe(
new Writable({
write: (chunk, enc, next) => {
const lines = buf.concat(chunk).split(/\r?\n/);
buf = lines.pop();
for (const line of lines) {
console.log(line);
}
next();
}
})
).on('finish', () => buf.length && console.log(buf) );
I have a little module which does this well and is used by quite a few other projects npm readline Note thay in node v10 there is a native readline module so I republished my module as linebyline https://www.npmjs.com/package/linebyline
if you dont want to use the module the function is very simple:
var fs = require('fs'),
EventEmitter = require('events').EventEmitter,
util = require('util'),
newlines = [
13, // \r
10 // \n
];
var readLine = module.exports = function(file, opts) {
if (!(this instanceof readLine)) return new readLine(file);
EventEmitter.call(this);
opts = opts || {};
var self = this,
line = [],
lineCount = 0,
emit = function(line, count) {
self.emit('line', new Buffer(line).toString(), count);
};
this.input = fs.createReadStream(file);
this.input.on('open', function(fd) {
self.emit('open', fd);
})
.on('data', function(data) {
for (var i = 0; i < data.length; i++) {
if (0 <= newlines.indexOf(data[i])) { // Newline char was found.
lineCount++;
if (line.length) emit(line, lineCount);
line = []; // Empty buffer.
} else {
line.push(data[i]); // Buffer new line data.
}
}
}).on('error', function(err) {
self.emit('error', err);
}).on('end', function() {
// Emit last line if anything left over since EOF won't trigger it.
if (line.length){
lineCount++;
emit(line, lineCount);
}
self.emit('end');
}).on('close', function() {
self.emit('close');
});
};
util.inherits(readLine, EventEmitter);
Another solution is to run logic via sequential executor nsynjs. It reads file line-by-line using node readline module, and it doesn't use promises or recursion, therefore not going to fail on large files. Here is how the code will looks like:
var nsynjs = require('nsynjs');
var textFile = require('./wrappers/nodeReadline').textFile; // this file is part of nsynjs
function process(textFile) {
var fh = new textFile();
fh.open('path/to/file');
var s;
while (typeof(s = fh.readLine(nsynjsCtx).data) != 'undefined')
console.log(s);
fh.close();
}
var ctx = nsynjs.run(process,{},textFile,function () {
console.log('done');
});
Code above is based on this exampe: https://github.com/amaksr/nsynjs/blob/master/examples/node-readline/index.js
i use this:
function emitLines(stream, re){
re = re && /\n/;
var buffer = '';
stream.on('data', stream_data);
stream.on('end', stream_end);
function stream_data(data){
buffer += data;
flush();
}//stream_data
function stream_end(){
if(buffer) stream.emmit('line', buffer);
}//stream_end
function flush(){
var re = /\n/;
var match;
while(match = re.exec(buffer)){
var index = match.index + match[0].length;
stream.emit('line', buffer.substring(0, index));
buffer = buffer.substring(index);
re.lastIndex = 0;
}
}//flush
}//emitLines
use this function on a stream and listen to the line events that is will emit.
gr-
While you should probably use the readline module as the top answer suggests, readline appears to be oriented toward command line interfaces rather than line reading. It's also a little bit more opaque regarding buffering. (Anyone who needs a streaming line oriented reader probably will want to tweak buffer sizes). The readline module is ~1000 lines while this, with stats and tests, is 34.
const EventEmitter = require('events').EventEmitter;
class LineReader extends EventEmitter{
constructor(f, delim='\n'){
super();
this.totalChars = 0;
this.totalLines = 0;
this.leftover = '';
f.on('data', (chunk)=>{
this.totalChars += chunk.length;
let lines = chunk.split(delim);
if (lines.length === 1){
this.leftover += chunk;
return;
}
lines[0] = this.leftover + lines[0];
this.leftover = lines[lines.length-1];
if (this.leftover) lines.pop();
this.totalLines += lines.length;
for (let l of lines) this.onLine(l);
});
// f.on('error', ()=>{});
f.on('end', ()=>{console.log('chars', this.totalChars, 'lines', this.totalLines)});
}
onLine(l){
this.emit('line', l);
}
}
//Command line test
const f = require('fs').createReadStream(process.argv[2], 'utf8');
const delim = process.argv[3];
const lineReader = new LineReader(f, delim);
lineReader.on('line', (line)=> console.log(line));
Here's an even shorter version, without the stats, at 19 lines:
class LineReader extends require('events').EventEmitter{
constructor(f, delim='\n'){
super();
this.leftover = '';
f.on('data', (chunk)=>{
let lines = chunk.split(delim);
if (lines.length === 1){
this.leftover += chunk;
return;
}
lines[0] = this.leftover + lines[0];
this.leftover = lines[lines.length-1];
if (this.leftover)
lines.pop();
for (let l of lines)
this.emit('line', l);
});
}
}
const fs = require("fs")
fs.readFile('./file', 'utf-8', (err, data) => {
var innerContent;
console.log("Asynchronous read: " + data.toString());
const lines = data.toString().split('\n')
for (let line of lines)
innerContent += line + '<br>';
});
I wrap the whole logic of daily line processing as a npm module: line-kit
https://www.npmjs.com/package/line-kit
// example
var count = 0
require('line-kit')(require('fs').createReadStream('/etc/issue'),
(line) => { count++; },
() => {console.log(`seen ${count} lines`)})
I use below code the read lines after verify that its not a directory and its not included in the list of files need not to be check.
(function () {
var fs = require('fs');
var glob = require('glob-fs')();
var path = require('path');
var result = 0;
var exclude = ['LICENSE',
path.join('e2e', 'util', 'db-ca', 'someother-file'),
path.join('src', 'favicon.ico')];
var files = [];
files = glob.readdirSync('**');
var allFiles = [];
var patternString = [
'trade',
'order',
'market',
'securities'
];
files.map((file) => {
try {
if (!fs.lstatSync(file).isDirectory() && exclude.indexOf(file) === -1) {
fs.readFileSync(file).toString().split(/\r?\n/).forEach(function(line){
patternString.map((pattern) => {
if (line.indexOf(pattern) !== -1) {
console.log(file + ' contain `' + pattern + '` in in line "' + line +'";');
result = 1;
}
});
});
}
} catch (e) {
console.log('Error:', e.stack);
}
});
process.exit(result);
})();
I have looked through all above answers, all of them use third-party library to solve it. It's have a simple solution in Node's API. e.g
const fs= require('fs')
let stream = fs.createReadStream('<filename>', { autoClose: true })
stream.on('data', chunk => {
let row = chunk.toString('ascii')
}))
Can I configure console.log so that the logs are written on a file instead of being printed in the console?
You could also just overload the default console.log function:
var fs = require('fs');
var util = require('util');
var log_file = fs.createWriteStream(__dirname + '/debug.log', {flags : 'w'});
var log_stdout = process.stdout;
console.log = function(d) { //
log_file.write(util.format(d) + '\n');
log_stdout.write(util.format(d) + '\n');
};
Above example will log to debug.log and stdout.
Edit: See multiparameter version by Clément also on this page.
Update 2013 - This was written around Node v0.2 and v0.4; There are much better utilites now around logging. I highly recommend Winston
Update Late 2013 - We still use winston, but now with a logger library to wrap the functionality around logging of custom objects and formatting. Here is a sample of our logger.js https://gist.github.com/rtgibbons/7354879
Should be as simple as this.
var access = fs.createWriteStream(dir + '/node.access.log', { flags: 'a' })
, error = fs.createWriteStream(dir + '/node.error.log', { flags: 'a' });
// redirect stdout / stderr
proc.stdout.pipe(access);
proc.stderr.pipe(error);
If you are looking for something in production winston is probably the best choice.
If you just want to do dev stuff quickly, output directly to a file (I think this works only for *nix systems):
nohup node simple-server.js > output.log &
I often use many arguments to console.log() and console.error(), so my solution would be:
var fs = require('fs');
var util = require('util');
var logFile = fs.createWriteStream('log.txt', { flags: 'a' });
// Or 'w' to truncate the file every time the process starts.
var logStdout = process.stdout;
console.log = function () {
logFile.write(util.format.apply(null, arguments) + '\n');
logStdout.write(util.format.apply(null, arguments) + '\n');
}
console.error = console.log;
Winston is a very-popular npm-module used for logging.
Here is a how-to.
Install winston in your project as:
npm install winston --save
Here's a configuration ready to use out-of-box that I use frequently in my projects as logger.js under utils.
/**
* Configurations of logger.
*/
const winston = require('winston');
const winstonRotator = require('winston-daily-rotate-file');
const consoleConfig = [
new winston.transports.Console({
'colorize': true
})
];
const createLogger = new winston.Logger({
'transports': consoleConfig
});
const successLogger = createLogger;
successLogger.add(winstonRotator, {
'name': 'access-file',
'level': 'info',
'filename': './logs/access.log',
'json': false,
'datePattern': 'yyyy-MM-dd-',
'prepend': true
});
const errorLogger = createLogger;
errorLogger.add(winstonRotator, {
'name': 'error-file',
'level': 'error',
'filename': './logs/error.log',
'json': false,
'datePattern': 'yyyy-MM-dd-',
'prepend': true
});
module.exports = {
'successlog': successLogger,
'errorlog': errorLogger
};
And then simply import wherever required as this:
const errorLog = require('../util/logger').errorlog;
const successlog = require('../util/logger').successlog;
Then you can log the success as:
successlog.info(`Success Message and variables: ${variable}`);
and Errors as:
errorlog.error(`Error Message : ${error}`);
It also logs all the success-logs and error-logs in a file under logs directory date-wise as you can see here.
const fs = require("fs");
const {keys} = Object;
const {Console} = console;
/**
* Redirect console to a file. Call without path or with false-y
* value to restore original behavior.
* #param {string} [path]
*/
function file(path) {
const con = path ? new Console(fs.createWriteStream(path)) : null;
keys(Console.prototype).forEach(key => {
if (path) {
this[key] = (...args) => con[key](...args);
} else {
delete this[key];
}
});
};
// patch global console object and export
module.exports = console.file = file;
To use it, do something like:
require("./console-file");
console.file("/path/to.log");
console.log("write to file!");
console.error("also write to file!");
console.file(); // go back to writing to stdout
For simple cases, we could redirect the Standard Out (STDOUT) and Standard Error (STDERR) streams directly to a file(say, test.log) using '>' and '2>&1'
Example:
// test.js
(function() {
// Below outputs are sent to Standard Out (STDOUT) stream
console.log("Hello Log");
console.info("Hello Info");
// Below outputs are sent to Standard Error (STDERR) stream
console.error("Hello Error");
console.warn("Hello Warning");
})();
node test.js > test.log 2>&1
As per the POSIX standard, 'input', 'output' and 'error' streams are identified by the positive integer file descriptors (0, 1, 2). i.e., stdin is 0, stdout is 1, and stderr is 2.
Step 1: '2>&1' will redirect from 2 (stderr) to 1 (stdout)
Step 2: '>' will redirect from 1 (stdout) to file (test.log)
If this is for an application, you're probably better off using a logging module. It'll give you more flexibility. Some suggestions.
winston https://github.com/winstonjs/winston
log4js https://github.com/nomiddlename/log4js-node
Straight from nodejs's API docs on Console
const output = fs.createWriteStream('./stdout.log');
const errorOutput = fs.createWriteStream('./stderr.log');
// custom simple logger
const logger = new Console(output, errorOutput);
// use it like console
const count = 5;
logger.log('count: %d', count);
// in stdout.log: count 5
Another solution not mentioned yet is by hooking the Writable streams in process.stdout and process.stderr. This way you don't need to override all the console functions that output to stdout and stderr. This implementation redirects both stdout and stderr to a log file:
var log_file = require('fs').createWriteStream(__dirname + '/log.txt', {flags : 'w'})
function hook_stream(stream, callback) {
var old_write = stream.write
stream.write = (function(write) {
return function(string, encoding, fd) {
write.apply(stream, arguments) // comments this line if you don't want output in the console
callback(string, encoding, fd)
}
})(stream.write)
return function() {
stream.write = old_write
}
}
console.log('a')
console.error('b')
var unhook_stdout = hook_stream(process.stdout, function(string, encoding, fd) {
log_file.write(string, encoding)
})
var unhook_stderr = hook_stream(process.stderr, function(string, encoding, fd) {
log_file.write(string, encoding)
})
console.log('c')
console.error('d')
unhook_stdout()
unhook_stderr()
console.log('e')
console.error('f')
It should print in the console
a
b
c
d
e
f
and in the log file:
c
d
For more info, check this gist.
Overwriting console.log is the way to go. But for it to work in required modules, you also need to export it.
module.exports = console;
To save yourself the trouble of writing log files, rotating and stuff, you might consider using a simple logger module like winston:
// Include the logger module
var winston = require('winston');
// Set up log file. (you can also define size, rotation etc.)
winston.add(winston.transports.File, { filename: 'somefile.log' });
// Overwrite some of the build-in console functions
console.error = winston.error;
console.log = winston.info;
console.info = winston.info;
console.debug = winston.debug;
console.warn = winston.warn;
module.exports = console;
If you're using linux, you can also use output redirection. Not sure about Windows.
node server.js >> file.log 2>> file.log
>> file.log to redirect stdout to the file
2>> file.log to redirect stderr to the file
others use the shorthand &>> for both stdout and stderr but it's not accepted by both my mac and ubuntu :(
extra: > overwrites, while >> appends.
By the way, regarding NodeJS loggers, I use pino + pino-pretty logger
METHOD STDOUT AND STDERR
This approach can help you (I use something similar in my projects) and works for all methods including console.log, console.warn, console.error, console.info
This method write the bytes written in stdout and stderr to file. Is better than changing console.log, console.warn, console.error, console.info methods, because output will be exact the same as this methods output
var fs= require("fs")
var os= require("os")
var HOME= os.homedir()
var stdout_r = fs.createWriteStream(HOME + '/node.stdout.log', { flags: 'a' })
var stderr_r = fs.createWriteStream(HOME + '/node.stderr.log', { flags: 'a' })
var attachToLog= function(std, std_new){
var originalwrite= std.write
std.write= function(data,enc){
try{
var d= data
if(!Buffer.isBuffer(d))
d= Buffer.from(data, (typeof enc === 'string') ? enc : "utf8")
std_new.write.apply(std_new, d)
}catch(e){}
return originalwrite.apply(std, arguments)
}
}
attachToLog(process.stdout, stdout_r)
attachToLog(process.stderr, stderr_r)
// recommended catch error on stdout_r and stderr_r
// stdout_r.on("error", yourfunction)
// stderr_r.on("error", yourfunction)
Adding to the answer above, a lit bit of an expansion to the short and efficient code overriding console.log. Minor additions: set filename with date, wrapper function, also do the original console.logging to keep the console active with the info.
Usage: in the beginning of your code, run setConsoleLogToFile([FILENAME]).
const fs = require("fs"),
util = require('util');
const getPrettyDate = ()=> new Date().toString().replace(":","-").replace(/00\s\(.*\)/, "").replace(` ${new Date().getFullYear()}`, ",").replace(/:\d\d\s/, " ");
module.exports.getPrettyDate = getPrettyDate;
module.exports.setConsoleLogToFile = (filename) => {
const log_file = fs.createWriteStream(`${__dirname}/${filename} - ${getPrettyDate()}.log`, { flags: 'w' }),
log_stdout = process.stdout;
const origConsole = console.log;
console.log = (d) => {
origConsole(d);
log_file.write(util.format(d) + '\n');
log_stdout.write(util.format(d) + '\n');
};
}
Most logger is overkill and does not support the build in console.log correctly. Hence I create console-log-to-file:
import { consoleLogToFile } from "console-log-to-file";
// or `const { consoleLogToFile } = require("console-log-to-file/dist/index.cjs.js")`
consoleLogToFile({
logFilePath: "/log/default.log",
});
// all of your console.log/warn/error/info will work as it does and save to file now.
If you are looking for a solution without modifying any code, here is a simple solution.
It requires pm2, just add it to your node modules and start you app with
pm2 start server.js
And you are done, console.logs are now automatically registered under home/.pm2/logs/server-out.log.
Improve on Andres Riofrio , to handle any number of arguments
var fs = require('fs');
var util = require('util');
var log_file = fs.createWriteStream(__dirname + '/debug.log', {flags : 'w'});
var log_stdout = process.stdout;
console.log = function(...args) {
var output = args.join(' ');
log_file.write(util.format(output) + '\r\n');
log_stdout.write(util.format(output) + '\r\n');
};
You can now use Caterpillar which is a streams based logging system, allowing you to log to it, then pipe the output off to different transforms and locations.
Outputting to a file is as easy as:
var logger = new (require('./').Logger)();
logger.pipe(require('fs').createWriteStream('./debug.log'));
logger.log('your log message');
Complete example on the Caterpillar Website
For future users. #keshavDulal answer doesn't work for latest version. And I couldn't find a proper fix for the issues that are reporting in the latest version 3.3.3.
Anyway I finally fixed it after researching a bit. Here is the solution for winston version 3.3.3
Install winston and winston-daily-rotate-file
npm install winston
npm install winston-daily-rotate-file
Create a new file utils/logger.js
const winston = require('winston');
const winstonRotator = require('winston-daily-rotate-file');
var logger = new winston.createLogger({
transports: [
new (winston.transports.DailyRotateFile)({
name: 'access-file',
level: 'info',
filename: './logs/access.log',
json: false,
datePattern: 'yyyy-MM-DD',
prepend: true,
maxFiles: 10
}),
new (winston.transports.DailyRotateFile)({
name: 'error-file',
level: 'error',
filename: './logs/error.log',
json: false,
datePattern: 'yyyy-MM-DD',
prepend: true,
maxFiles: 10
})
]
});
module.exports = {
logger
};
Then in any file where you want to use logging import the module like
const logger = require('./utils/logger').logger;
Use logger like the following:
logger.info('Info service started');
logger.error('Service crashed');
if you are using forever to keep your node app running, then typing forever list will show you the path to the log file that console.log is writing too
You can use the nodejs Console constructor
const mylog = new console.Console(
fs.createWriteStream("log/logger.log"),
fs.createWriteStream("log/error.log")
);
And then you can use it just like the normal console class, eg:
mylog.log("Ok!"); // Will be written into 'log/logger.log'
mylog.error("Bad!"); // Will be written into 'log/error.log'
You can also have a look at this npm module:
https://www.npmjs.com/package/noogger
noogger
simple and straight forward...
I took on the idea of swapping the output stream to a my stream.
const LogLater = require ('./loglater.js');
var logfile=new LogLater( 'log'+( new Date().toISOString().replace(/[^a-zA-Z0-9]/g,'-') )+'.txt' );
var PassThrough = require('stream').PassThrough;
var myout= new PassThrough();
var wasout=console._stdout;
myout.on('data',(data)=>{logfile.dateline("\r\n"+data);wasout.write(data);});
console._stdout=myout;
var myerr= new PassThrough();
var waserr=console._stderr;
myerr.on('data',(data)=>{logfile.dateline("\r\n"+data);waserr.write(data);});
console._stderr=myerr;
loglater.js:
const fs = require('fs');
function LogLater(filename, noduplicates, interval) {
this.filename = filename || "loglater.txt";
this.arr = [];
this.timeout = false;
this.interval = interval || 1000;
this.noduplicates = noduplicates || true;
this.onsavetimeout_bind = this.onsavetimeout.bind(this);
this.lasttext = "";
process.on('exit',()=>{ if(this.timeout)clearTimeout(this.timeout);this.timeout=false; this.save(); })
}
LogLater.prototype = {
_log: function _log(text) {
this.arr.push(text);
if (!this.timeout) this.timeout = setTimeout(this.onsavetimeout_bind, this.interval);
},
text: function log(text, loglastline) {
if (this.noduplicates) {
if (this.lasttext === text) return;
this.lastline = text;
}
this._log(text);
},
line: function log(text, loglastline) {
if (this.noduplicates) {
if (this.lasttext === text) return;
this.lastline = text;
}
this._log(text + '\r\n');
},
dateline: function dateline(text) {
if (this.noduplicates) {
if (this.lasttext === text) return;
this.lastline = text;
}
this._log(((new Date()).toISOString()) + '\t' + text + '\r\n');
},
onsavetimeout: function onsavetimeout() {
this.timeout = false;
this.save();
},
save: function save() { fs.appendFile(this.filename, this.arr.splice(0, this.arr.length).join(''), function(err) { if (err) console.log(err.stack) }); }
}
module.exports = LogLater;
I just build a pack to do this, hope you like it ;)
https://www.npmjs.com/package/writelog
I for myself simply took the example from winston and added the log(...) method (because winston names it info(..):
Console.js:
"use strict"
// Include the logger module
const winston = require('winston');
const logger = winston.createLogger({
level: 'info',
format: winston.format.json(),
transports: [
//
// - Write to all logs with level `info` and below to `combined.log`
// - Write all logs error (and below) to `error.log`.
//
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' })
]
});
//
// If we're not in production then log to the `console` with the format:
// `${info.level}: ${info.message} JSON.stringify({ ...rest }) `
//
if (process.env.NODE_ENV !== 'production') {
logger.add(new winston.transports.Console({
format: winston.format.simple()
}));
}
// Add log command
logger.log=logger.info;
module.exports = logger;
Then simply use in your code:
const console = require('Console')
Now you can simply use the normal log functions in your file and it will create a file AND log it to your console (while debugging/developing). Because of if (process.env.NODE_ENV !== 'production') { (in case you want it also in production)...
Create a utils/logger.js file with:
var fs = require('fs');
var util = require('util');
var log_file = fs.createWriteStream(__dirname + '/../logs/server.log', { flags: 'w' });
var log_stdout = process.stdout;
console.log = function () { //
[...arguments].forEach(element => {
log_file.write(util.format(element) + '\n');
log_stdout.write(util.format(element) + '\n');
});
};
module.exports = {
console
}
Include the logger.js file from any file where you want to console.log like:
const console = require('./utils/logger').console;
Create a logs folder and create an empty server.log file in it and run your app :)
Rudy Huynh's solution worked really well for me. I added a little bit to have it spit out files with today's date and time.
var dateNow = new Date();
var timeNow = dateNow.getHours() + '-' + dateNow.getMinutes();
var logPath = "log/" + dateNow.toDateString() + ' -' + ' Start Time - ' + timeNow + ".log"
consoleLogToFile({
logFilePath: logPath
});
It's not very elegant but this way it'll save different, easy to read, log files instead of just updating the same "default.log" file.
Based on multiparameter version by Clément, just without color codes for the text file
var fs = require('fs');
var util = require('util');
var logFile = fs.createWriteStream('log.txt', { flags: 'a' });
// Or 'w' to truncate the file every time the process starts.
var logStdout = process.stdout;
console.log = function () {
// Storing without color codes
logFile.write(util.format.apply(null,arguments).replace(/\033\[[0-9;]*m/g,"") + '\n');
// Display normally, with colors to Stdout
logStdout.write(util.format.apply(null, arguments) + '\n');
}
Note: Answering since I couldn't comment
Problem
I'm trying to scan a drive directory (recursively walk all the paths) and write all the paths to a file (as it's finding them) using fs.createWriteStream in order to keep the memory usage low, but it doesn't work, the memory usage reaches 2GB during the scan.
Expected
I was expecting fs.createWriteStream to automatically handle memory/disk usage at all times, keeping memory usage at a minimum with back-pressure.
Code
const fs = require('fs')
const walkdir = require('walkdir')
let dir = 'C:/'
let options = {
"max_depth": 0,
"track_inodes": true,
"return_object": false,
"no_return": true,
}
const wstream = fs.createWriteStream("C:/Users/USERNAME/Desktop/paths.txt")
let walker = walkdir(dir, options)
walker.on('path', (path) => {
wstream.write(path + '\n')
})
walker.on('end', (path) => {
wstream.end()
})
Is it because I'm not using .pipe()? I tried creating a new Stream.Readable({read{}}) and then inside the .on('path' emitter pushing paths into it with readable.push(path) but that didn't really work.
UPDATE:
Method 2:
I tried the proposed in the answers drain method but it doesn't help much, it does reduce memory usage to 500mb (which is still too much for a stream) but it slows down the code significantly (from seconds to minutes)
Method 3:
I also tried using readdirp, it uses even less memory (~400mb) and is faster but I don't know how to pause it and use the drain method there to reduce the memory usage further:
const readdirp = require('readdirp')
let dir = 'C:/'
const wstream = fs.createWriteStream("C:/Users/USERNAME/Desktop/paths.txt")
readdirp(dir, {alwaysStat: false, type: 'files_directories'})
.on('data', (entry) => {
wstream.write(`${entry.fullPath}\n`)
})
Method 4:
I also tried doing this operation with a custom recursive walker, and even though it uses only 30mb of memory, which is what I wanted, but it is like 10 times slower than the readdirp method and it is synchronous which is undesirable:
const fs = require('fs')
const path = require('path')
let dir = 'C:/'
function customRecursiveWalker(dir) {
fs.readdirSync(dir).forEach(file => {
let fullPath = path.join(dir, file)
// Folders
if (fs.lstatSync(fullPath).isDirectory()) {
fs.appendFileSync("C:/Users/USERNAME/Desktop/paths.txt", `${fullPath}\n`)
customRecursiveWalker(fullPath)
}
// Files
else {
fs.appendFileSync("C:/Users/USERNAME/Desktop/paths.txt", `${fullPath}\n`)
}
})
}
customRecursiveWalker(dir)
Preliminary observation: you've attempted to get the results you want using multiple approaches. One complication when comparing the approaches you used is that they do not all do the same work. If you run tests on file tree that contains only regular files, that tree does not contain mount points, you can probably compare the approaches fairly, but when you start adding mount points, symbolic links, etc, you may get different memory and time statistics merely due to the fact that one approach excludes files that another approach includes.
I've initially attempted a solution using readdirp, but unfortunately, but that library appears buggy to me. Running it on my system here, I got inconsistent results. One run would output 10Mb of data, another run with the same input parameters would output 22Mb, then I'd get another number, etc. I looked at the code and found that it does not respect the return value of push:
_push(entry) {
if (this.readable) {
this.push(entry);
}
}
As per the documentation the push method may return a false value, in which case the Readable stream should stop producing data and wait until _read is called again. readdirp entirely ignores that part of the specification. It is crucial to pay attention to the return value of push to get proper handling of back-pressure. There are also other things that seemed questionable in that code.
So I abandoned that and worked on a proof of concept showing how it could be done. The crucial parts are:
When the push method returns false it is imperative to stop adding data to the stream. Instead, we record where we were, and stop.
We start again only when _read is called.
If you uncomment the console.log statements that print START and STOP. You'll see them printed out in succession on the console. We start, produce data until Node tells us to stop, and then we stop, until Node tells us to start again, and so on.
const stream = require("stream");
const fs = require("fs");
const { readdir, lstat } = fs.promises;
const path = require("path");
class Walk extends stream.Readable {
constructor(root, maxDepth = Infinity) {
super();
this._maxDepth = maxDepth;
// These fields allow us to remember where we were when we have to pause our
// work.
// The path of the directory to process when we resume processing, and the
// depth of this directory.
this._curdir = [root, 1];
// The directories still to process.
this._dirs = [this._curdir];
// The list of files to process when we resume processing.
this._files = [];
// The location in `this._files` were to continue processing when we resume.
this._ix = 0;
// A flag recording whether or not the fetching of files is currently going
// on.
this._started = false;
}
async _fetch() {
// Recall where we were by loading the state in local variables.
let files = this._files;
let dirs = this._dirs;
let [dir, depth] = this._curdir;
let ix = this._ix;
while (true) {
// If we've gone past the end of the files we were processing, then
// just forget about them. This simplifies the code that follows a bit.
if (ix >= files.length) {
ix = 0;
files = [];
}
// Read directories until we have files to process.
while (!files.length) {
// We've read everything, end the stream.
if (dirs.length === 0) {
// This is how the stream API requires us to indicate the stream has
// ended.
this.push(null);
// We're no longer running.
this._started = false;
return;
}
// Here, we get the next directory to process and get the list of
// files in it.
[dir, depth] = dirs.pop();
try {
files = await readdir(dir, { withFileTypes: true });
}
catch (ex) {
// This is a proof-of-concept. In a real application, you should
// determine what exceptions you want to ignore (e.g. EPERM).
}
}
// Process each file.
for (; ix < files.length; ++ix) {
const dirent = files[ix];
// Don't include in the results those files that are not directories,
// files or symbolic links.
if (!(dirent.isFile() || dirent.isDirectory() || dirent.isSymbolicLink())) {
continue;
}
const fullPath = path.join(dir, dirent.name);
if (dirent.isDirectory() & depth < this._maxDepth) {
// Keep track that we need to walk this directory.
dirs.push([fullPath, depth + 1]);
}
// Finally, we can put the data into the stream!
if (!this.push(`${fullPath}\n`)) {
// If the push returned false, we have to stop pushing results to the
// stream until _read is called again, so we have to stop.
// Uncomment this if you want to see when the stream stops.
// console.log("STOP");
// Record where we were in our processing.
this._files = files;
// The element at ix *has* been processed, so ix + 1.
this._ix = ix + 1;
this._curdir = [dir, depth];
// We're stopping, so indicate that!
this._started = false;
return;
}
}
}
}
async _read() {
// Do not start the process that puts data on the stream over and over
// again.
if (this._started) {
return;
}
this._started = true; // Yep, we've started.
// Uncomment this if you want to see when the stream starts.
// console.log("START");
await this._fetch();
}
}
// Change the paths to something that makes sense for you.
stream.pipeline(new Walk("/home/", 5),
fs.createWriteStream("/tmp/paths3.txt"),
(err) => console.log("ended with", err));
When I run the first attempt you made with walkdir here, I get the following statistics:
Elapsed time (wall clock): 59 sec
Maximum resident set size: 2.90 GB
When I use the code I've shown above:
Elapsed time (wall clock): 35 sec
Maximum resident set size: 0.1 GB
The file tree I use for the tests produces a file listing of 792 MB
You could exploit the returned value from WritableStream.write(): it essentially states if you should continue to read or not. a WritableStream has an internal property that stores the threshold after which the buffer should be processed by the OS. The drain event will be emitted when the buffer has been flushed, i.e. you can call safely call WritableStream.write() without risking to excessively fill the buffer (which means the RAM). Luckily for you, walkdir let you control the process: you can emit pause(pause the walk. no more events will be emitted until resume) and resume(resume the walk) event from the walkdir object, pausing and resuming the writing process on you stream accordingly. Try with this:
let is_emitter_paused = false;
wstream.on('drain', (evt) => {
if (is_emitter_paused) {
walkdir.resume();
}
});
walkdir.on('path', function(path, stat) {
is_emitter_paused = !wstream.write(path + '\n');
if (is_emitter_paused) {
walkdir.pause();
}
});
Here's an implementation inspired by #Louis's answer. I think it's a bit easier to follow and in my minimal testing it performs about the same.
const fs = require('fs');
const path = require('path');
const stream = require('stream');
class Walker extends stream.Readable {
constructor(root = process.cwd(), maxDepth = Infinity) {
super();
// Dirs to process
this._dirs = [{ path: root, depth: 0 }];
// Max traversal depth
this._maxDepth = maxDepth;
// Files to flush
this._files = [];
}
_drain() {
while (this._files.length > 0) {
const file = this._files.pop();
if (file.isFile() || file.isDirectory() || file.isSymbolicLink()) {
const filePath = path.join(this._dir.path, file.name);
if (file.isDirectory() && this._maxDepth > this._dir.depth) {
// Add directory to be walked at a later time
this._dirs.push({ path: filePath, depth: this._dir.depth + 1 });
}
if (!this.push(`${filePath}\n`)) {
// Hault walking
return false;
}
}
}
if (this._dirs.length === 0) {
// Walking complete
this.push(null);
return false;
}
// Continue walking
return true;
}
async _step() {
try {
this._dir = this._dirs.pop();
this._files = await fs.promises.readdir(this._dir.path, { withFileTypes: true });
} catch (e) {
this.emit('error', e); // Uh oh...
}
}
async _walk() {
this.walking = true;
while (this._drain()) {
await this._step();
}
this.walking = false;
}
_read() {
if (!this.walking) {
this._walk();
}
}
}
stream.pipeline(new Walker('some/dir/path', 5),
fs.createWriteStream('output.txt'),
(err) => console.log('ended with', err));
I was looking for this feature in node.js and I haven't found it.
Can I implement it myself? As far as I know, node.js doesn't load any file at it's startup (like Bash does with .bashrc) and I haven't noticed any way to somehow override shell prompt.
Is there a way to implement it without writing custom shell?
You could monkey-patch the REPL. Note that you must use the callback version of the completer, otherwise it won't work correctly:
var repl = require('repl').start()
var _completer = repl.completer.bind(repl)
repl.completer = function(line, cb) {
// ...
_completer(line, cb)
}
Just as a reference.
readline module has readline.createInterface(options) method that accepts an optional completer function that makes a tab completion.
function completer(line) {
var completions = '.help .error .exit .quit .q'.split(' ')
var hits = completions.filter(function(c) { return c.indexOf(line) == 0 })
// show all completions if none found
return [hits.length ? hits : completions, line]
}
and
function completer(linePartial, callback) {
callback(null, [['123'], linePartial]);
}
link to the api docs: http://nodejs.org/api/readline.html#readline_readline_createinterface_options
You can implement tab functionality using completer function like below.
const readline = require('readline');
/*
* This function returns an array of matched strings that starts with given
* line, if there is not matched string then it return all the options
*/
var autoComplete = function completer(line) {
const completions = 'var const readline console globalObject'.split(' ');
const hits = completions.filter((c) => c.startsWith(line));
// show all completions if none found
return [hits.length ? hits : completions, line];
}
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
completer: autoComplete
});
rl.setPrompt("Type some character and press Tab key for auto completion....\n");
rl.prompt();
rl.on('line', (data) => {
console.log(`Received: ${data}`);
});
Reference :
https://self-learning-java-tutorial.blogspot.com/2018/10/nodejs-readlinecreateinterfaceoptions_2.html