read file stream line by line synchronously - javascript

I'm looking at the Node.js readline module documentation for a task where I have to read a very large file line by line, and it looks good. But for my particular task, I need the lines to be handled strictly in order, i.e. no matter what, line 5 must not be processed before line 4. Given the asynchronous nature of Node, I just want to confirm: is this code safe for that usage?
const readline = require('readline');
const fs = require('fs');

const rl = readline.createInterface({
  input: fs.createReadStream('sample.txt')
});

rl.on('line', (line) => {
  console.log(`Line from file: ${line}`);
});
If not, what should I use/do? It currently works for me, but I don't know whether it will keep working with large lines, where the next line could be parsed faster than the previous one, etc.

I doubt very much that a callback fired later could be executed earlier than another one.
Basically, this comes down to the event loop and the call stack of the process.
Still, to guarantee ordering, I can suggest implementing something similar to async's queue, but with the ability to dynamically push callbacks.
Assuming you will have something like this:
const Queue = require('./my-queue')
const queue = new Queue()

function addLineToQueue(line) {
  queue.push(function() {
    // do some job with line
    console.log(`Line: "${line}" was successfully processed!`)
  })
}
You will modify your code:
rl.on('line', (line) => {
  addLineToQueue(line)
  console.log(`Added line to queue: ${line}`)
})
And of course your queue implementation should start executing as soon as it has any tasks to run. This way the order of callbacks is guaranteed. But to me it looks like a bit of overhead.
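For completeness, a minimal sketch of what the hypothetical `./my-queue` module could look like. Chaining every task onto a single promise guarantees tasks run strictly in push order, even when some of them are asynchronous:

```javascript
// Minimal serial queue sketch (hypothetical my-queue implementation):
// every pushed task is chained onto one promise, so a task starts only
// after all previously pushed tasks have settled.
class Queue {
  constructor() {
    this.chain = Promise.resolve();
  }
  push(task) {
    // task may return a value or a promise; .then() handles both
    this.chain = this.chain.then(task);
  }
}

module.exports = Queue;
```

Note that this variant never drops ordering even if a task is slow, at the cost of serializing all work onto one chain.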

Related

Guarantee for streams delaying flow of data

Node streams, from what I could find [1], delay all flowing of data until the end of the tick (typically process.nextTick, though which queue exactly isn't important here), and don't start pumping data synchronously on method calls.
Sadly, I could not find a guarantee for this behavior in the docs (hopefully I just missed it), and can therefore hardly depend on it. Is this behavior part of the public API? Does this extend to all streams using the API for stream implementers?
To elaborate: except for explicitly reading synchronously with stream.Readable.prototype.read, all mechanisms appear not to start pumping data synchronously:
multiple event handlers in a row:
import * as stream from 'node:stream';
const r = stream.Readable.from(['my\ndata', 'more\nstuff']);
r.on('data', data => { /* do something */ });
// oops, attaching the first event handler could have already
// sent out all data synchronously
r.on('data', data => { /* do more */ });
starting a pipe, before the event handlers exist:
import * as stream from 'node:stream';
import * as readline from 'node:readline';
const r = stream.Readable.from(['my\ndata', 'more\nstuff']);
const p = new stream.PassThrough();
const rl = readline.createInterface({
  input: p,
  output: null,
});
r.pipe(p);
// oops, starting the pipe could have already sent all the data,
// giving "line" events without any listener being attached yet
rl.on('line', line => { /* do something */ });
My hope is I missed a line in the docs, and that's that. Otherwise, it will be hard to tell how much I can depend on this behavior. E.g. in the examples above, one could explicitly pause the stream first and resume it in nextTick (or similar), but it would be considerably cleaner if this behavior could be depended upon.
[1]: e.g. the entire added complexity and construct of keeping track of sync, or old posts asking for the behavior at the time

How do I regularly update the UI while performing an asynchronous task?

Let's say I'm reading a very large file like so (or perform another task that takes a long time and continuously updates an object):
let myParsedObject = { };
const loadFile = async (filepath, filename) => {
  const rl = readline.createInterface({
    input: fs.createReadStream(path.join(filepath, filename)),
    crlfDelay: Infinity
  });
  rl.on('line', (line) => { parseLineToObject(line) });
}
Is there a way to update the UI (Electron in my case, but the same should apply to HTML) at regular intervals (e.g. every 500 ms), so I can track progress? I know there is setInterval, but I can't figure out how to make it work.
In C++ I would have to worry about the object being updated while I'm writing to it; how should I deal with that in JavaScript?
All I can think of is calling the update function every x operations, but that seems very arbitrary and could vary quite a lot.
Thank you in advance!

Why does this code produce an empty dictionary and how can I fix it?

I am trying to read a file and split on multiple characters. If I put the log statement inside the rl.on block I get some output, but for some reason the dictionary is empty again once the block is done.
let rl = readline.createInterface({
  input: fs.createReadStream('Spanish.txt'),
})

let dict = {};
let arr1 = [];
let arr2 = [];

rl.on('line', (line) => {
  if (!line.startsWith('#')) {
    arr1 = line.split('\t');
    if (arr1[1] != undefined) {
      arr2 = arr1[1].split('[').join(',').split('/')
        .join(',').split('(').join(',').split(',');
      dict[arr1[0]] = arr2[0];
    }
  }
});
console.log(dict);
The lines are read asynchronously, so by the time you log your dict, processing has not yet completed.
You can listen for the close event to determine whether the file has been fully read:
rl.on('close', () => console.log(dict));
Better yet, use an async/await-based approach as detailed in the Node.js documentation.

How to figure out what's holding Node.js from exiting?

I have this problem very often with various Node.js scripts that I write: after everything is done, they do not exit. Usually it's an unclosed socket or readline interface.
When the script gets bigger, this becomes really hard to track down. Is there a tool of some sort that would tell me what Node.js is waiting on? I'm asking for a generic solution that would help debug all cases of Node.js not exiting when it's supposed to.
Samples:
Exhibit I. - Process stdin blocks node even after listener is removed
const readline = require('readline');

readline.emitKeypressEvents(process.stdin);
if (typeof process.stdin.setRawMode == "function")
  process.stdin.setRawMode(true);

const keypressListener = (stream, key) => {
  console.log(key);
  process.stdin.removeListener("keypress", keypressListener);
}

process.stdout.write("Press any key...");
process.stdin.on("keypress", keypressListener);
Exhibit II. - readline blocks Node if you forget to close the interface
const readline = require('readline');
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});
Exhibit III. - Forgotten setInterval will also block node, good luck finding it
setInterval(() => { }, 2000);
Would why-is-node-running work? It seems to do exactly what you need.
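If adding a dependency is undesirable, a hedged alternative: newer Node versions (17.3 and later) ship the experimental built-in process.getActiveResourcesInfo(), which names the resources currently keeping the event loop alive:

```javascript
// Ask the process what is holding it open (experimental API,
// Node >= 17.3). Useful for a quick hint without extra packages.
const timer = setInterval(() => {}, 2000);

// Returns an array of strings naming active resources,
// e.g. a live interval shows up as a 'Timeout' entry.
console.log(process.getActiveResourcesInfo());

clearInterval(timer); // once cleared, this no longer blocks exit
```

Unlike why-is-node-running, this gives only resource names, not the stack traces showing where each one was created, so it is a coarser first step.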

Javascript Array.sort() not finishing before next action starts [duplicate]

I am confused by some simple behavior I see from readline on() method.
I have a file called small.csv which looks like this:
Date,Close
2015-11-12,2045.97
2015-11-11,2075.00
2015-11-10,2081.72
2015-11-09,2078.58
I wrote this script:
my.js
var rl = require('readline').createInterface({
  input: require('fs').createReadStream('small.csv')
});

global.myarray = [];

rl.on('line', function (line) {
  console.log('Line from file:', line);
  global.myarray.push(line);
});
console.log(global.myarray);
// done
Script Output:
dan@nia111:~/d1d2p $ node -v
v5.0.0
dan@nia111:~/d1d2p $ node my.js
[]
Line from file: Date,Close
Line from file: 2015-11-12,2045.97
Line from file: 2015-11-11,2075.00
Line from file: 2015-11-10,2081.72
Line from file: 2015-11-09,2078.58
dan@nia111:~/d1d2p $
I want to enhance the script so it fills global.myarray rather than leaving it empty.
When I step through the script with node-debug,
it appears that global.myarray is filling up, but I think that is an illusion.
Also when I run
node my.js
it appears that
console.log(global.myarray);
runs before small.csv is read.
So I probably need to understand some asynchronous mechanism at work here.
The following question might be easy for those who understand readline well.
But, I'd be happy to get an answer to this question:
How to enhance my.js so it fills global.myarray rather than leaving it empty?
Because of its asynchronous nature, Node.js can be a bit tricky with this kind of thing. When your code executes, it registers the rl.on('line') handler and moves on to the next call, which in our case is the console.log. The problem in your code is not that the array is not filling up; it's that you are expecting it to be populated too early. Here is an example of how you could fix your problem:
var rl = require('readline').createInterface({
  input: require('fs').createReadStream('small.csv')
});

global.myarray = [];

rl.on('line', function (line) {
  console.log('Line from file:', line);
  global.myarray.push(line);
});

rl.on('close', function () {
  console.log(global.myarray);
});
In this piece of code, once there are no more lines to read, the console.log is called and the filled array is displayed.
