I'm trying to read an STDIN pipe from my Node.js file and make a POST request to a URL with every line given from STDIN, then wait for the response, read the next line, send, wait, etc.
'use strict';
const http = require('http');
const rl = require('readline').createInterface(process.stdin,null);
rl.on('line', function (line) {
makeRequest(line); // I need to wait to call the next callback until the previous one finishes
}).on('close',function(){
process.exit(0);
});
The problem is, rl.on('line') will instantly read thousands of lines from my pipe and launch thousands of requests at once, which leads to an EMFILE exception. I know this is the expected behavior of non-blocking IO, but in this case one cannot simply use promises/futures, because .on('line') is itself a callback and I cannot stop it from triggering without losing data from my input.
So, if callbacks cannot be used and timeout hacks aren't elegant enough, how can one break out of the curse of non-blocking IO?
Keep a counter of active requests (increment on send, decrement on response). Once the counter reaches a constant (say, 200), call rl.pause(); check this on every 'line' event. On every response, check whether the counter has dropped below your constant, and if it has, call rl.resume(). This limits the number of in-flight requests and the number of lines held in memory, and should fix your problem.
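Here is a minimal sketch of that counter idea, assuming a hypothetical makeRequest(line, done) helper that calls done once its HTTP response has finished:
'use strict';
const readline = require('readline');
const rl = readline.createInterface({ input: process.stdin });

const MAX_ACTIVE = 200;   // cap on in-flight requests; tune to stay below your fd limit
let active = 0;

rl.on('line', function (line) {
  active++;
  if (active >= MAX_ACTIVE) rl.pause();          // stop pulling new lines for now
  makeRequest(line, function onResponseDone() {  // hypothetical helper; calls back when the response has finished
    active--;
    if (active < MAX_ACTIVE) rl.resume();        // room again, keep reading
  });
}).on('close', function () {
  // the input is done, but requests may still be in flight here
});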
Node's readline class has pause and resume functions that defer to the underlying stream equivalents. These functions are specifically made for throttling parts of a pipeline to assist with bottlenecks. See the following example from the stream.Readable.pause documentation:
var readable = getReadableStreamSomehow();
readable.on('data', (chunk) => {
console.log('got %d bytes of data', chunk.length);
readable.pause();
console.log('there will be no more data for 1 second');
setTimeout(() => {
console.log('now data will start flowing again');
readable.resume();
}, 1000);
});
That gives you fine-grained control over how much data flows into your URL-fetching code.
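If you truly want one request at a time (read a line, send, wait for the response, then read the next line), you can pause on every 'line' event and resume only when the response has ended. A rough sketch, assuming each line becomes the body of a POST to a placeholder host and path:
'use strict';
const http = require('http');
const readline = require('readline');
const rl = readline.createInterface({ input: process.stdin });

rl.on('line', function (line) {
  rl.pause();                                 // don't read the next line until this request is done
  const req = http.request({
    hostname: 'example.com',                  // placeholder target
    path: '/endpoint',
    method: 'POST',
    headers: { 'Content-Type': 'text/plain' }
  }, function (res) {
    res.resume();                             // drain the response body
    res.on('end', function () {
      rl.resume();                            // now read the next line
    });
  });
  req.on('error', function (err) {
    console.error(err);
    rl.resume();
  });
  req.end(line);
});
// no explicit exit needed: the process ends once stdin closes and the last response has been handled
One caveat: per the readline docs, rl.pause() does not immediately stop 'line' events for data already sitting in readline's internal buffer, so a few extra lines can still be emitted after you pause. If strict one-at-a-time ordering matters, combine this with the counter approach above or a small queue.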
Related
I'm trying to overcome a "Maximum call stack size exceeded" error, but with no luck.
The goal is to re-run the GET request until I get "music" in the type field.
//tech: node.js + mongoose
//import components
const https = require('https');
const options = new URL('https://www.boredapi.com/api/activity');
//obtain data using GET
https.get(options, (response) => {
//console.log('statusCode:', response.statusCode);
//console.log('headers:', response.headers);
response.on('data', (data) => {
//process.stdout.write(data);
apiResult = JSON.parse(data);
apiResultType = apiResult.type;
returnDataOutside(data);
});
})
.on('error', (error) => {
console.error(error);
});
function returnDataOutside(data){
apiResultType;
if (apiResultType == 'music') {
console.log(apiResult);
} else {
returnDataOutside(data);
console.log(apiResult); //Maximum call stack size exceeded
};
};
Your function returnDataOutside() calls itself recursively. If it doesn't get an apiResultType of 'music' the first time, it just keeps calling itself deeper and deeper until the stack overflows, with no chance of ever getting the music type, because you're calling it with the same data over and over.
It appears that you want to rerun the GET request when you don't get the music type, but your code is not doing that; it's just calling your response handler over and over with the same data. Instead, put the code that makes the GET request into a function, and call that function again (making a fresh GET request) when the apiResultType isn't what you want.
In addition, you shouldn't write something like this that keeps going forever, hammering some server. You should have a maximum number of attempts, a back-off timer between retries, or both.
And you can't just assume that a single response.on('data', ...) event contains a perfectly formed piece of JSON. Unless the payload is very small, the data may arrive in arbitrarily sized chunks, and it may take multiple data events to receive the entire payload. Code that assumes one chunk may work on fast networks or through some proxies but fail on slow networks or other proxies. Instead, you have to accumulate the data from the entire response (all the data events, concatenated together) and then process that final result on the end event.
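Here is a rough sketch of that structure using plain https: put the request in a function, accumulate the chunks, parse on 'end', and retry with a cap and a delay. The maxAttempts and delay values are arbitrary placeholders:
const https = require('https');

function fetchActivity(attempt = 1, maxAttempts = 10) {
  https.get('https://www.boredapi.com/api/activity', (response) => {
    let body = '';
    response.on('data', (chunk) => { body += chunk; });   // accumulate all chunks
    response.on('end', () => {
      const apiResult = JSON.parse(body);                 // parse the complete payload
      if (apiResult.type === 'music') {
        console.log(apiResult);
      } else if (attempt < maxAttempts) {
        setTimeout(() => fetchActivity(attempt + 1, maxAttempts), 1000);  // back off, then try again
      } else {
        console.log('Gave up after', maxAttempts, 'attempts');
      }
    });
  }).on('error', (error) => {
    console.error(error);
  });
}

fetchActivity();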
While you can code plain https.get() to collect all the results for you (as sketched above; there's also an example of that right in the doc), it's a lot easier to use a higher-level library that brings support for a bunch of useful things.
My favorite library in this regard is got(), but there's a list of alternatives here and you can pick the one you like. Not only do these libraries accumulate the entire response for you without you writing any extra code, they are also promise-based (which makes the asynchronous coding easier), they automatically check status code results, follow redirects, and so on: many things you would want an http request library to "just handle" for you.
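For example, a sketch of the same retry loop with got (assuming the v11-style CommonJS API, where the returned promise has a .json() helper):
const got = require('got');   // assuming got v11, which supports require()

async function findMusicActivity(maxAttempts = 10) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // got accumulates the whole body, checks the status code, and parses JSON for us
    const apiResult = await got('https://www.boredapi.com/api/activity').json();
    if (apiResult.type === 'music') return apiResult;
    await new Promise((resolve) => setTimeout(resolve, 1000));  // back off before retrying
  }
  throw new Error('No music activity found after ' + maxAttempts + ' attempts');
}

findMusicActivity().then(console.log).catch(console.error);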
I am putting some code here:
const { createReadStream, ReadStream } = require('fs');
var readStream = createReadStream('./data.txt');
readStream.on('data', chunk => {
console.log('---------------------------------');
console.log(chunk);
console.log('---------------------------------');
});
readStream.on('open', () => {
console.log('Stream opened...');
});
readStream.on('end', () => {
console.log('Stream Closed...');
});
So, a stream is the movement of data from one place to another. In this case, from the data.txt file to my eyes, since I have to read it.
I've read on Google something like this:
Typically, the movement of data is usually with the intention to
process it, or read it, and make decisions based on it. But there is a
minimum and a maximum amount of data a process could take over time.
So if the rate the data arrives is faster than the rate the process
consumes the data, the excess data need to wait somewhere for its turn
to be processed.
On the other hand, if the process is consuming the data faster than it
arrives, the few data that arrive earlier need to wait for a certain
amount of data to arrive before being sent out for processing.
My question is: which line of code is "consuming the data, processing the data"? Is it console.log(chunk)? If I had a hugely time-consuming line of code instead of console.log(chunk), how would my code avoid grabbing more data from the buffer and wait until my processing is done? In the above code, it seems like it would still come into readStream.on('data')'s callback.
My question is: which line of code is "consuming the data, processing the data"?
The readStream.on('data', ...) event handler is the code that "consumes" or "processes" the data.
If I had a hugely time-consuming line of code instead of console.log(chunk), how would my code avoid grabbing more data from the buffer and wait until my processing is done?
If the time consuming code is synchronous (e.g. blocking), then no more data events can happen until after your synchronous code is done because only your event handler is running (in the single-threaded event loop driven architecture of node.js). No more data events will be generated until you return control back from your event handler callback function.
If the time consuming code is asynchronous (e.g. non-blocking, and thus has returned control back to the event loop), then more data events certainly can happen even though a prior data event handler has not entirely finished its asynchronous work yet. It is sometimes appropriate to call readStream.pause() while doing asynchronous work to tell the readStream not to generate any more data events until you are ready for them, and then call readStream.resume() when you are.
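A minimal sketch of that pattern, assuming a hypothetical processChunk() function that returns a promise:
const { createReadStream } = require('fs');
const readStream = createReadStream('./data.txt');

readStream.on('data', (chunk) => {
  readStream.pause();                  // no further 'data' events while we work on this chunk
  processChunk(chunk)                  // hypothetical async processing that returns a promise
    .then(() => readStream.resume())   // ready for the next chunk
    .catch((err) => readStream.destroy(err));
});

readStream.on('end', () => {
  console.log('All chunks processed.');
});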
I've mostly learned coding with OOP languages like Java.
I have a personal project where I want to import a bunch of plaintext into MongoDB. I thought I'd try to expand my horizons and do this using Node.js-powered JavaScript.
I got the code working fine but I'm trying to figure out why it is executing the way it is.
The output from the console is:
1. done reading file
2. closing db
3. record inserted (n times)
var fs = require('fs'),
readline = require('readline'),
instream = fs.createReadStream(config.file),
outstream = new (require('stream'))(),
rl = readline.createInterface(instream, outstream);
rl.on('line', function (line) {
var split = line.split(" ");
_user = "#" + split[0];
_text = "'" + split[1] + "'";
_addedBy = config._addedBy;
_dateAdded = new Date().toISOString();
quoteObj = { user : _user , text : _text , addedby : _addedBy, dateadded : _dateAdded};
db.collection("quotes").insertOne(quoteObj, function(err, res) {
if (err) throw err;
console.log("record inserted.");
});
});
rl.on('close', function (line) {
console.log('done reading file.');
console.log('closing db.')
db.close();
});
(full code is here: https://github.com/HansHovanitz/Import-Stuff/blob/master/importStuff.js)
When I run it I get the messages 'done reading file' and 'closing db' and then all of the 'record inserted' messages. Why is that happening? Is it because of the delay in inserting a record into the db? The fact that I see 'closing db' first makes me think the db is getting closed, so how are the records still being inserted?
Just curious to know why the program is executing in this order for my own peace of mind. Thanks for any insight!
In short, it's because of the asynchronous nature of the I/O operations in the functions used, which is quite common for Node.js.
Here's what happens. First, the script reads all the lines of the file, and for each line initiates db.insertOne() operation, supplying a callback for each of them. Note that the callback will be called when the corresponding operation is finished, not in the middle of this process.
Eventually the script reaches the end of the input file, logs the two messages, and then invokes db.close(). Note that even though the 'insert' callbacks (that log the 'inserted' message) have not been called yet, the database interface has already received all the 'insert' commands.
Now the tricky part: whether or not the DB interface manages to store all the records (in other words, whether it waits until all the insert operations are completed before closing the connection) is up to the DB interface and its speed. If the writes are fast enough (faster than reading the file lines), you'll probably end up with all the records inserted; if not, you may miss some of them. That's why the safest bet is to close the database connection not in the readline close handler (when the reading is complete), but in the insert callbacks (when the writing is complete):
let linesCount = 0;
let eofReached = false;
rl.on('line', function (line) {
++linesCount;
// parsing skipped for brevity
db.collection("quotes").insertOne(quoteObj, function(err, res) {
--linesCount;
if (linesCount === 0 && eofReached) {
db.close();
console.log('database close');
}
// the rest skipped
});
});
rl.on('close', function() {
console.log('reading complete');
eofReached = true;
if (linesCount === 0) { // all inserts already finished before EOF was reached
db.close();
console.log('database close');
}
});
This question describes a similar problem, and several different approaches to solving it.
Welcome to the world of asynchronicity. Inserting into the DB happens asynchronously. This means that the rest of your (synchronous) code will execute completely before this task is complete. Consider the simplest asynchronous JS function setTimeout. It takes two arguments, a function and a time (in ms) after which to execute the function. In the example below "hello!" will log before "set timeout executed" is logged, even though the time is set to 0. Crazy right? That's because setTimeout is asynchronous.
This is one of the fundamental concepts of JS and it's going to come up all the time, so watch out!
setTimeout(() => {
console.log("set timeout executed")
}, 0)
console.log("hello!")
When you call db.collection("quotes").insertOne, you're actually creating an asynchronous request to the database. A good way to determine whether code will be asynchronous or not is to check whether one (or more) of its parameters is a callback.
So the order it's executing in is actually expected:
1. You instantiate rl
2. You bind your event handlers to rl
3. Your stream starts processing & calling your 'line' handler
4. Your 'line' handler opens asynchronous requests
5. Your stream ends and rl closes
...
4.5. Your asynchronous requests return and execute their callbacks
I labelled the callback execution as 4.5 because technically your requests can return at any time after step 4.
I hope this is a useful explanation; most modern JavaScript relies heavily on asynchronous events, and it can be a little tricky to figure out how to work with them!
You're on the right track. The key is that the database calls are asynchronous. As the file is being read, it starts a bunch of async calls to the database. Since they are asynchronous, the program doesn't wait for them to complete at the time they are called. The file then closes. As the async calls complete, your callbacks run and the console.logs execute.
Your code reads lines and immediately makes a call to the db for each one; both are asynchronous processes. When the last line is read, the last request to the db is made, and it takes some time for this request to be processed and for the insertOne callback to be executed. Meanwhile the rl has done its job and triggers the close event.
I'm using the spawn-child npm package to spawn a shell where I run a binary that was originally built in C++. I write stdin to the binary, and the binary then emits output on stdout constantly, once every second. On the Node side, once I start receiving stdout from the binary, I have an on listener that looks something like stdout.on('data', function (data) {}), where I send this data to the SSE channel.
Everything is working fine, but the major concern is the constant memory growth of the Node process that I see every time I hit the binary with new stdin input. I have outlined how my code looks; is there an elegant way to control this memory growth? If so, please share.
var sseChannel = require('sse-channel'),
spawnCommand = require('spawn-command'),
cmd = 'path to the binary file',
globalArray = [],
uuid = require('uuid');
module.exports = function(app) {
var child = spawnCommand(cmd),
privateChannel = new sseChannel({
historySize: 0,
cors: {
origins: ['*']
},
pingInterval: 15 * 1000,
jsonEncode: false
});
srvc = {
get: function(req, res) {
globalArray[uuid.v4()] = res;
child.stdin.write('a json object in a format that is expected by binary' + '\n'); // req.query.<queryVal>
child.stdout.on('data', function(data) {
privateChannel.send(JSON.stringify(data));
});
},
delete: function(sessionID) {
var response = globalArray[sessionID];
privateChannel.removeClient(response);
response.end();
delete globalArray[sessionID];
}
}
}
This code is just to illustrate how it looks in the app; it is not runnable as-is.
I collected heap dumps at two different intervals, and the statistics show a tremendous increase in the Typed Array value. What can be done to contain or suppress the growth of Typed Array?
The problem is that you're spawning a process once and then adding a new data event handler for every request to your http server, and those handlers never get removed. That would explain why the memory usage never drops, even after gc.
Another (unrelated) problem: if you are using your single child process to serve multiple incoming requests, you can run into the problem of mixing responses for different requests (you cannot assume that one data event contains only the data for a particular request). If the child process is node.js-based, you could set up an IPC channel with it and pass regular JavaScript values back and forth instead of setting up stdout handling/parsing. If the child isn't node.js-based, or you want an alternative (non-IPC) solution, you could set up a queue that all requests get pushed onto, and have a function that processes the queue and responds to each request serially, only moving on to the next request once you have somehow determined that you have received all of the child process's output for the current request.
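A very rough sketch of that queue idea, reusing the child variable from your module and assuming a hypothetical end-of-response marker (how you detect "output for this request is complete" depends entirely on what the binary prints):
const pending = [];     // requests waiting for the child process
let current = null;     // the request currently being serviced
let buffered = '';

// attach a single stdout handler once, not one per incoming request
child.stdout.on('data', (data) => {
  buffered += data;
  if (buffered.includes('--END--')) {   // hypothetical marker for "this request's output is complete"
    current.respond(buffered);
    buffered = '';
    current = null;
    processQueue();                     // service the next queued request, if any
  }
});

function processQueue() {
  if (current || pending.length === 0) return;
  current = pending.shift();
  child.stdin.write(current.payload + '\n');
}

function enqueue(payload, respond) {
  pending.push({ payload, respond });
  processQueue();
}

// inside get(): enqueue(jsonForBinary, (out) => privateChannel.send(JSON.stringify(out)));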
If you instead meant for the child process to only be used for a single request, you will need to tweak your code to spawn once per request instead (moving spawn() inside get()).
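If one request should map to one run of the binary, a rough sketch of that restructured get handler (reusing the spawnCommand, cmd and privateChannel names from the question) might look like this:
// a sketch of get() that spawns a fresh child per request
function get(req, res) {
  const child = spawnCommand(cmd);       // fresh child process per request
  let output = '';
  child.stdout.on('data', (data) => {
    output += data;                      // this handler goes away with the child, so nothing leaks across requests
  });
  child.on('close', () => {
    privateChannel.send(JSON.stringify(output));
  });
  child.stdin.end('a json object in the format expected by the binary' + '\n');
}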
I'm working on an online, turn-based game in order to teach myself Node.js and Socket.IO. Some aspects of the game are resolved server-side. At one point during one of these functions, the server may require input from the clients. Is there a way I can "pause" the resolution of the server's function in order to wait for the clients to respond (via a var x = window.prompt)?
Here's an idea of the code I'm working with:
Server:
for (some loop){
if (some condition){
request input via io.sockets.socket(userSocket[i]).emit('requestInput', data)
}
}
Client:
socket.on('requestInput', function (data) {
var input = window.prompt('What is your input regarding ' + data + '?');
//send input back to the server
socket.emit('refresh', input)
});
Any thoughts?
I don't think that is possible.
for (some loop){
if (some condition){
request input via io.sockets.socket(userSocket[i]).emit('requestInput', data)
/* Even if you were able to pause the execution here, there is no way to resume it when client emits the 'refresh' event with user input */
}
}
What you can do instead is emit all the 'requestInput' events without pausing, save all the responses you get in the socket.on('refresh', function(){}) handler in an array, and then process that array later, as in the sketch below. I don't know what your exact requirement is, but let me know if that works.
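A sketch of that idea, assuming the userSocket array from your server code and a hypothetical processInputs() function that resumes the game logic:
const responses = [];

io.sockets.on('connection', function (socket) {
  socket.on('refresh', function (input) {
    responses.push({ id: socket.id, input: input });   // remember each client's answer
    if (responses.length === userSocket.length) {      // every client has answered
      processInputs(responses);                        // hypothetical: continue the turn here
      responses.length = 0;                            // reset for the next round
    }
  });
});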
Since you are emitting socket.emit('refresh', input) on the client side, you just need to set up a socket event listener on the server side as well. For example:
io.sockets.on('connection', function (socket) {
socket.on('refresh', function (data) {
console.log(data) //input
});
})
I will also point out, so that you don't run into trouble down the line, that indefinite loops are a big no-no in Node. Node.js runs on a single thread, so you are actually blocking ALL clients for as long as your loop is running.