Buffer and stream - how are they related? - JavaScript

I am putting some code here:
const { createReadStream, ReadStream } = require('fs');
var readStream = createReadStream('./data.txt');

readStream.on('data', chunk => {
  console.log('---------------------------------');
  console.log(chunk);
  console.log('---------------------------------');
});

readStream.on('open', () => {
  console.log('Stream opened...');
});

readStream.on('end', () => {
  console.log('Stream Closed...');
});
So, a stream is the movement of data from one place to another. In this case, from the data.txt file to my eyes, since I have to read it.
I've read something like this on Google:
Typically, the movement of data is usually with the intention to process it, or read it, and make decisions based on it. But there is a minimum and a maximum amount of data a process could take over time. So if the rate the data arrives is faster than the rate the process consumes the data, the excess data need to wait somewhere for its turn to be processed.
On the other hand, if the process is consuming the data faster than it arrives, the few data that arrive earlier need to wait for a certain amount of data to arrive before being sent out for processing.
My question is: which line of code is "consuming the data, processing the data"? Is it console.log(chunk)? If I had a hugely time-consuming line of code instead of console.log(chunk), how would my code avoid grabbing more data from the buffer and wait until my processing is done? In the above code, it seems like it would still come into readStream.on('data')'s callback.

My question is: which line of code is "consuming the data, processing the data"?
The readStream.on('data', ...) event handler is the code that "consumes" or "processes" the data.
If I had a hugely time-consuming line of code instead of console.log(chunk), how would my code avoid grabbing more data from the buffer and wait until my processing is done?
If the time consuming code is synchronous (e.g. blocking), then no more data events can happen until after your synchronous code is done, because only your event handler is running (in the single-threaded, event-loop-driven architecture of Node.js). No more data events will be generated until you return control back from your event handler callback function.
If the time consuming code is asynchronous (e.g. non-blocking and thus has returned control back to the event loop), then more data events certainly can happen even though a prior data event handler has not entirely finished its asynchronous work yet. It is sometimes appropriate to call readStream.pause() while doing asynchronous work to tell the readStream not to generate any more data events until you are ready for them, and then call readStream.resume() when you are.
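For illustration, here is a minimal sketch of that pause/resume pattern applied to the question's stream; doSomethingSlowAsync is a hypothetical placeholder for your asynchronous processing, assumed to take a completion callback:
const { createReadStream } = require('fs');

const readStream = createReadStream('./data.txt');

readStream.on('data', (chunk) => {
  readStream.pause();                   // no more 'data' events while we work on this chunk
  doSomethingSlowAsync(chunk, () => {   // hypothetical async processing with a completion callback
    readStream.resume();                // ready for the next chunk
  });
});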

Related

Call stack size exceeded on re-starting Node function

I'm trying to overcome a "Maximum call stack size exceeded" error, but with no luck.
The goal is to re-run the GET request as long as I get "music" in the type field.
// tech: node.js + mongoose
// import components
const https = require('https');
const options = new URL('https://www.boredapi.com/api/activity');

// obtain data using GET
https.get(options, (response) => {
  // console.log('statusCode:', response.statusCode);
  // console.log('headers:', response.headers);
  response.on('data', (data) => {
    // process.stdout.write(data);
    apiResult = JSON.parse(data);
    apiResultType = apiResult.type;
    returnDataOutside(data);
  });
})
.on('error', (error) => {
  console.error(error);
});

function returnDataOutside(data) {
  apiResultType;
  if (apiResultType == 'music') {
    console.log(apiResult);
  } else {
    returnDataOutside(data);
    console.log(apiResult); // Maximum call stack size exceeded
  };
};
Your function returnDataOutside() is calling itself recursively. If it doesn't get an apiResultType of 'music' the first time, it just keeps calling itself deeper and deeper until the stack overflows, with no chance of ever getting the music type because you're calling it with the same data over and over.
It appears that you want to rerun the GET request when you don't have the music type, but your code is not doing that - it's just calling your response-handling function over and over. Instead, put the code that makes the GET request into a function, and call that function again to make a fresh GET request when apiResultType isn't what you want.
In addition, you shouldn't code something like this to keep going forever, hammering some server. You should have either a maximum number of attempts, a back-off timer, or both.
And, you can't just assume that a single response.on('data', ...) event contains a perfectly formed piece of JSON. Unless the payload is very small, the data may arrive in arbitrarily sized chunks, and it may take multiple data events to receive the entire body. Code that assumes one chunk may work on fast networks but fail on slow ones, or work through some proxies but not others. Instead, you have to accumulate the data from the entire response (all the data events, concatenated together) and then process that final result on the end event.
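For illustration, a minimal sketch of that accumulation pattern with plain https, reusing the URL from the question (fetchActivity and its callback signature are illustrative):
const https = require('https');

function fetchActivity(callback) {
  https.get('https://www.boredapi.com/api/activity', (response) => {
    let raw = '';
    response.on('data', (chunk) => { raw += chunk; });  // accumulate every chunk
    response.on('end', () => {                          // the body is only complete here
      try {
        callback(null, JSON.parse(raw));
      } catch (err) {
        callback(err);
      }
    });
  }).on('error', callback);
}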
While you can code plain https.get() to collect all the results for you (there's an example of that right in the doc here), it's a lot easier to use a higher-level library that brings support for a bunch of useful things.
My favorite library in this regard is got(), but there's a list of alternatives here and you can pick the one you like. Not only do these libraries accumulate the entire response for you without you writing any extra code, but they are promise-based, which makes the asynchronous coding easier, and they also automatically check status codes, follow redirects, etc. - many things you would want an HTTP request library to "just handle" for you.
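As a hedged sketch of the overall approach - a fresh request per attempt, a retry cap, and a small back-off - using got (assuming got v11, the last major version loadable with require(); fetchUntilMusic and the limits are illustrative):
const got = require('got');

async function fetchUntilMusic(maxAttempts = 10) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const activity = await got('https://www.boredapi.com/api/activity').json();
    if (activity.type === 'music') return activity;        // found what we wanted
    await new Promise(r => setTimeout(r, 500 * attempt));  // simple back-off before retrying
  }
  throw new Error('no music activity within the attempt limit');
}

fetchUntilMusic()
  .then(result => console.log(result))
  .catch(err => console.error(err));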

JavaScript: Why is some code getting executed before the rest?

I've mostly learned coding with OOP languages like Java.
I have a personal project where I want to import a bunch of plain text into MongoDB. I thought I'd try to expand my horizons and do this using Node.js-powered JavaScript.
I got the code working fine but I'm trying to figure out why it is executing the way it is.
The output from the console is:
1. done reading file
2. closing db
3. record inserted (n times)
var fs = require('fs'),
    readline = require('readline'),
    instream = fs.createReadStream(config.file),
    outstream = new (require('stream'))(),
    rl = readline.createInterface(instream, outstream);

rl.on('line', function (line) {
  var split = line.split(" ");
  _user = "#" + split[0];
  _text = "'" + split[1] + "'";
  _addedBy = config._addedBy;
  _dateAdded = new Date().toISOString();
  quoteObj = { user: _user, text: _text, addedby: _addedBy, dateadded: _dateAdded };
  db.collection("quotes").insertOne(quoteObj, function(err, res) {
    if (err) throw err;
    console.log("record inserted.");
  });
});

rl.on('close', function (line) {
  console.log('done reading file.');
  console.log('closing db.');
  db.close();
});
(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)
When I run it I get the messages 'done reading file' and 'closing db', and then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record into the db? The fact that I see 'closing db' first makes me think the db would be getting closed - so how are the records still being inserted?
Just curious to know why the program is executing in this order for my own peace of mind. Thanks for any insight!
In short, it's because of the asynchronous nature of the I/O operations in the functions used - which is quite common in Node.js.
Here's what happens. First, the script reads all the lines of the file, and for each line initiates db.insertOne() operation, supplying a callback for each of them. Note that the callback will be called when the corresponding operation is finished, not in the middle of this process.
Eventually the script reaches the end of the input file, logs two messages, then invokes db.close() line. Note that even though 'insert' callbacks (that log 'inserted' message) are not called yet, the database interface has already received all the 'insert' commands.
Now the tricky part: whether or not the DB interface manages to store all the records (in other words, whether or not it waits until all the insert operations are completed before closing the connection) depends on the DB interface and its speed. If a write op is fast enough (faster than reading a file line), you'll probably end up with all the records inserted; if not, you may miss some of them. That's why the safest bet is to close the database connection not on file close (when the reading is complete), but in the insert callbacks (when the writing is complete):
let linesCount = 0;
let eofReached = false;

rl.on('line', function (line) {
  ++linesCount;
  // parsing skipped for brevity
  db.collection("quotes").insertOne(quoteObj, function(err, res) {
    --linesCount;
    if (linesCount === 0 && eofReached) {
      db.close();
      console.log('database close');
    }
    // the rest skipped
  });
});

rl.on('close', function() {
  console.log('reading complete');
  eofReached = true;
});
This question describes a similar problem - and several different approaches to solving it.
Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously. This means that the rest of your (synchronous) code will execute completely before this task is complete. Consider the simplest asynchronous JS function, setTimeout. It takes two arguments: a function and a time (in ms) after which to execute the function. In the example below, "hello!" will log before "set timeout executed" is logged, even though the time is set to 0. Crazy, right? That's because setTimeout is asynchronous.
This is one of the fundamental concepts of JS and it's going to come up all the time, so watch out!
setTimeout(() => {
  console.log("set timeout executed")
}, 0)

console.log("hello!")
When you call db.collection("quotes").insertOne you're actually creating an asynchronous request to the database. A good way to determine whether a piece of code will be asynchronous is to check whether one (or more) of its parameters is a callback.
So the order you're seeing is actually expected:
1. You instantiate rl
2. You bind your event handlers to rl
3. Your stream starts processing & calling your 'line' handler
4. Your 'line' handler opens asynchronous requests
5. Your stream ends and rl closes
...
4.5. Your asynchronous requests return and execute their callbacks
I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.
I hope this is a useful explanation. Most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!
You're on the right track. The key is that the database calls are asynchronous. As the file is being read, it starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.logs execute.
Your code reads lines and immediately after that makes a call to the db - both asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for this request to be processed and for the insertOne callback to execute. Meanwhile, rl has done its job and triggers the close event.

Non-blocking JavaScript and concurrency

I have code in a web worker, and because I can't post an object with methods (functions) to it, I don't know how to stop blocking the UI with this code:
if (data != 'null') {
  obj['backupData'] = obj.tbl.data().toArray();
  obj['backupAllData'] = data[0];
}

obj.tbl.clear();
obj.tbl.rows.add(obj['backupAllData']);

var ext = config.extension.substring(1);
$.fn.dataTable.ext.buttons[ext + 'Html5'].action(e, dt, button, config);

obj.tbl.clear();
obj.tbl.rows.add(obj['backupData'])
This code exports records from an HTML table. data is an array returned from a web worker and can sometimes have 50k or more objects.
Since obj and all the methods it contains are not transferable to the web worker, the UI blocks when the data length is 30k, 40k, 50k or even more.
Which is the best way to do this?
Thanks in advance.
You could try wrapping the heavy work in an asynchronous wrapper such as a timeout, to allow the engine to queue the whole logic and work through it as soon as it has time:
setTimeout(function() {
  if (data != 'null') {
    obj['backupData'] = obj.tbl.data().toArray();
    obj['backupAllData'] = data[0];
  }
  // heavy stuff
}, 0)
Or, if the code is extremely heavy, you can work out a strategy to split it into chunks of operations and execute each chunk in a separate async function (timeout) - see the sketch below.
See also: Best way to iterate over an array without blocking the UI
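A rough sketch of that chunking strategy (CHUNK_SIZE, processRow and done are illustrative names):
const CHUNK_SIZE = 1000;

function processInChunks(rows, processRow, done) {
  let index = 0;
  function nextChunk() {
    const end = Math.min(index + CHUNK_SIZE, rows.length);
    for (; index < end; index++) {
      processRow(rows[index]);      // synchronous work for this slice only
    }
    if (index < rows.length) {
      setTimeout(nextChunk, 0);     // yield to the event loop before the next slice
    } else if (done) {
      done();                       // all rows handled
    }
  }
  nextChunk();
}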
Update:
Sadly, ImmutableJS doesn't currently work across web workers. You should be able to transfer the ArrayBuffer so you don't need to parse it back into an array. Also read this article. If your workload is that heavy, it would be best to actually send back one item at a time from the worker.
Previously:
The code is converting all the data into an array, which is immediately costly. Try returning an immutable data structure from the web worker if possible. This will guarantee that it doesn't change when the references change, and you can continue iterating over it slowly in batches.
The next thing you can do is use requestIdleCallback to schedule small batches of items to be processed.
This way the UI should be able to breathe a bit.
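A rough sketch of that requestIdleCallback batching idea (items and handleItem are placeholder names; requestIdleCallback is a browser API, so this runs on the main page rather than inside the worker):
function processWhenIdle(items, handleItem) {
  let i = 0;
  function work(deadline) {
    while (i < items.length && deadline.timeRemaining() > 0) {
      handleItem(items[i++]);       // do as much as fits in this idle period
    }
    if (i < items.length) {
      requestIdleCallback(work);    // schedule the rest for the next idle period
    }
  }
  requestIdleCallback(work);
}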

Node.js: wait for an event that cannot be promisified

I'm trying to read an STDIN pipe from my Node.js file and make a POST request to a URL with every line given from STDIN, then wait for the response, read the next line, send, wait, and so on.
'use strict';
const http = require('http');
const rl = require('readline').createInterface(process.stdin, null);

rl.on('line', function (line) {
  makeRequest(line); // I need to wait to call the next callback until the previous one finishes
}).on('close', function () {
  process.exit(0);
});
The problem is that rl.on('line') will instantly read thousands of lines from my pipe and launch thousands of requests at once, which leads to an EMFILE exception. I know this is the expected behavior of non-blocking IO, but in this case one cannot use promises/futures, because .on('line') is a callback itself and I cannot manipulate it to not trigger without losing data from my input.
So, if callbacks cannot be used and timeout hacks aren't elegant enough, how can one break out of the curse of non-blocking IO?
Keep a counter of active requests (increment on send, decrement on response). Once the counter exceeds a constant (say, 200) - check on every 'line' event - call rl.pause(). On every response, check if the counter is smaller than your constant, and if it is, call rl.resume(). This should limit the rate of requests and the number of lines held in memory, and fix your problem.
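A minimal sketch of that counter approach, assuming makeRequest accepts a completion callback (MAX_IN_FLIGHT is an illustrative constant):
const MAX_IN_FLIGHT = 200;
let inFlight = 0;

rl.on('line', (line) => {
  inFlight++;
  if (inFlight >= MAX_IN_FLIGHT) rl.pause();    // stop reading while too many requests are pending
  makeRequest(line, () => {
    inFlight--;
    if (inFlight < MAX_IN_FLIGHT) rl.resume();  // safe to read more lines again
  });
});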
Node's readline class has pause and resume functions that defer to the underlying stream equivalents. These functions are specifically made for throttling parts of a pipeline to assist with bottlenecks. See the following example from the stream.Readable.pause documentation:
var readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
  console.log('got %d bytes of data', chunk.length);
  readable.pause();
  console.log('there will be no more data for 1 second');
  setTimeout(() => {
    console.log('now data will start flowing again');
    readable.resume();
  }, 1000);
});
That gives you fine-grained control over how much data flows into your URL-fetching code.

Read exactly n bytes asynchronously

I am working on a node.js project. Is it possible to read exactly n bytes asynchronously from a stream?
Usually, if I want to read a stream asynchronously, I use events. The problem is that I need to process the rest of the stream asynchronously, too.
If I listen for the data event, I can use the rest of the stream later, but I cannot control how many bytes I want to read at once. I tried to use unshift to put the unused bytes back into the buffer but this does not seem to fire the data event when another listener is added later.
This question is similar, but the only answer is a synchronous solution.
Is there an option to limit the number of bytes being passed to the data event listeners? Is it possible to somehow push the bytes back into the stream and still make them accessible through events?
As long as you're listening for the readable event and not doing a blocking loop calling stream.read(n), that solution is asynchronous. Something like the following (untested!) should get you what you want.
function streamChunk(stream, size, dataCallback, doneCallback) {
  function getChunk() {
    // read(size) returns null until at least `size` bytes are buffered
    var data = stream.read(size);
    if (data != null) {
      dataCallback(data);
      setImmediate(getChunk); // keep pulling chunks without blocking the event loop
    }
  }
  stream.on('readable', getChunk);
  stream.on('end', doneCallback);
}
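For example, a hypothetical usage of the sketch above, reading 16-byte chunks from stdin:
streamChunk(process.stdin, 16, (chunk) => {
  console.log('got %d bytes', chunk.length);
}, () => {
  console.log('stream ended');
});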
