Will a child process block the parent process in node.js? - javascript

I'm sorry if this sounds like a question I could just google, but I can't quite find the answer to it, or I couldn't understand the explanation.
My assumption is it would, or else how is it possible to pipe a child process' output to the parent process.
But here's what I don't understand:
let { spawn } = require('child_process');

if (process.argv[2] === "child") {
    console.log("In if!!");
} else {
    const child = spawn(process.execPath, [__filename, "child"]);
    child.stdout.on("data", (data) => {
        console.log("In else!! ", data.toString());
    });
}
Why is it outputting
In else!! In if!!
I thought that spawning a child process executes it immediately, so it goes into the if statement; after logging In if!!, control returns to the parent process, which then reaches the event listener and logs In else!!. Am I misunderstanding something?
My guess is that the console.log doesn't actually log, but returns the In if!! string, which is then passed to the parent process and becomes the data in the callback. But if that's the case, why doesn't it actually log?
Thank you for responding in advance.

Yours is a perfectly valid question.
Remember that even if you are spawning multiple processes (each of which will then be individually managed by the system), code execution inside each Node.js process remains single threaded.
The first thing to note about your code is that you are using the async version of the spawn command. Child Process is a Node.js API, so its execution is governed by Node.js rules (single thread): it runs like any other async function, and the new "independent" process will not start working until the spawn call has been processed.
With that being said, your parent process will add the spawn to its pending work and run it once the current work is finished (when your script's synchronous code ends).
If you want your parent process to wait for the child process, use the spawnSync command instead.
See Asynchronous Process Creation and Synchronous Process Creation in the NodeJS Child Process API Documentation for more info.

Related

How to hook process exit event on Express?

process.on('exit', async () => {
    console.log('updating')
    await campaignHelper.setIsStartedAsFalse()
    console.log('exit')
    process.exit(1)
})
I'm going to hook the process exit event and update a database field before exit.
updating is shown at exit, but the further actions are not executed.
The DB is Mongo.
Also, this code is in dev mode, so I'm using Ctrl+C to terminate the process.
For Ctrl-C (which you have now added to your question), you can't do what you want to do. node.js does not have that feature.
Since the Ctrl-C is under your control, you could send a command to your server to do a normal shut-down and then it could do your asynchronous work and then call process.exit() rather than you just typing Ctrl-C in the console. This is what many real servers do in production. They have a control port (that is not accessible from the outside world) that you can issue commands to, one of which would be to do a controlled shut-down.
Original answer
(before there was any mention of Ctrl-C being the shut-down initiation)
You can't run asynchronous operations on the exit event (it's too late in the shutdown sequence).
You can run asynchronous operations on the beforeExit event.
But, the beforeExit event is only called if nodejs naturally exits on its own because its queue of remaining scheduled work is empty (no open sockets, files, timers, etc...). It will not be called if the process exits abnormally (such as an unhandled exception or Ctrl-C) or if process.exit() is called manually.
You can handle the case of manually calling process.exit() by replacing the call to process.exit() in your app with a call to a custom shutdown function that does your housekeeping work and, when that has successfully completed, calls process.exit().
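A sketch of that shape, with a stub standing in for the asker's campaignHelper:

```javascript
// Stub standing in for campaignHelper.setIsStartedAsFalse() from the question
const campaignHelper = {
    setIsStartedAsFalse: () => Promise.resolve()
};

// Call shutdown() everywhere you would otherwise call process.exit()
async function shutdown(code) {
    console.log('updating');
    await campaignHelper.setIsStartedAsFalse(); // async work completes here
    console.log('exit');
    process.exit(code); // only exit once the housekeeping is done
}
```

Because shutdown() itself awaits the async work before calling process.exit(), the 'exit' event handler no longer needs to do anything asynchronous.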

NodeJS: client.emit('message') needs delay after connection in order to be seen

I'm using socket.io in a Node application. Here is a snippet from my code:
io.sockets.on('connection', socket => {
    setTimeout(function () {
        console.log('a client connected!')
        clients.forEach(s => s.emit('to_client', 'a client connected'))
    }, 0)
})
If I remove the setTimeout wrapper, 'a client connected' is not seen in the console of the client (Chrome browser), however, even with a timeout of zero, it does show up. What could be the issue? I would prefer going without the setTimeout since it does not sound like something that should be required here.
Node is an asynchronous single-threaded run-time, so it uses callbacks to avoid blocking on I/O.
Using setTimeout is one way (along with Node's built-in process.nextTick() method) of handling asynchronous code. Your example code is trying to access clients; I suspect whatever is handling this has not been initialised before your connection callback executes.
The setTimeout method basically pushes the code (the callback function) onto the event queue, so anything currently on the call stack will be processed before the setTimeout callback can run.
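If the underlying issue is that the clients list isn't populated yet, a more robust fix than setTimeout is to register the socket inside the same connection handler before broadcasting. A sketch, assuming clients is a plain array you maintain yourself:

```javascript
const clients = [];

// Register the socket first, then broadcast: by the time we emit,
// the list is guaranteed to include the new connection.
function handleConnection(socket) {
    clients.push(socket);
    console.log('a client connected!');
    clients.forEach(s => s.emit('to_client', 'a client connected'));
    socket.on('disconnect', () => clients.splice(clients.indexOf(socket), 1));
}

// io.sockets.on('connection', handleConnection);
```

This removes the timing dependency entirely, rather than papering over it with a zero-delay timer.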

Node.js API to spawn off a call to another API

I created a Node.js API.
When this API gets called I return to the caller fairly quickly. Which is good.
But now I also want the API to call or launch a different API or function or something that will go off and run on its own. Kind of like calling a child process with child.unref(). In fact, I would use child.spawn(), but I don't see how to have spawn() call another API. Maybe that alone would be my answer?
Of this other process, I don't care if it crashes or finishes without error.
So it doesn't need to be attached to anything. But if it does remain attached to the Node.js console, that's icing on the cake.
I'm still thinking about how to identify & what to do if the spawn somehow gets caught up in running a really long time, but I'm not ready to cross that bridge yet.
Your thoughts on what I might be able to do?
I guess I could child.spawn('node', [somescript])
What do you think?
I would have to explore if my cloud host will permit this too.
You need to specify exactly what the other spawned thing is supposed to do. If it is calling an HTTP API, you should not launch a new process for that in Node.js; Node is built to run HTTP requests asynchronously.
The normal pattern, if you really need some stuff to happen in a different process, is to use something like a message queue, the cluster module, or other messaging/queue between processes that the worker will monitor, and the worker is usually set up to handle some particular task or set of tasks this way. It is pretty unusual to be spawning another process after receiving an HTTP request since launching new processes is pretty heavy-weight and can use up all of your server resources if you aren't careful, and due to Node's async capabilities usually isn't necessary especially for things mainly involving IO.
This is from a test API I built some time ago. Note I'm even passing a value into the script as a parameter.
router.put('/test', function (req, res, next) {
    var u = req.body.u;
    var cp = require('child_process');
    var c = cp.spawn('node', ['yourtest.js', '"' + u + '"'], { detached: true });
    c.unref();
    res.sendStatus(200);
});
The yourtest.js script can be just about anything you want it to be. But I found it helps to first treat the script as a Node.js console app: FIRST get your yourtest.js script to run without error by manually running/testing it from your console's command line (node yourtest.js yourparametervalue), THEN integrate it into the child.spawn().
var u = process.argv[2];
console.log('f2u', u);

function f1() {
    console.log('f1-hello');
}

function f2() {
    console.log('f2-hello');
}

setTimeout(f2, 3000); // wait 3 seconds before executing f2(). I do this just for troubleshooting: you can watch node.exe open and then close in Task Manager if it runs long enough.
f1();

Where is the node.js event queue?

I have seen similar questions on stack overflow, but none of them fully dives down into the question that I have. I am familiar with event queues, how they work, as well as implementing them. I am new to node.js and I am trying to wrap my head around how Node.js does it.
In a c++ application you would do something along the lines of:
int main() {
    std::vector<Handler*> handlers;
    BlockingQueue* queue = new BlockingQueue();
    // Add all the handlers, call constructors and other such initialization
    // Then run the event loop
    while (true) {
        Event e = queue->pop();
        for (std::vector<Handler*>::iterator it = handlers.begin(); it != handlers.end(); ++it) {
            (*it)->handle(e);
        }
    }
}
Now in the case of node.js I might have a main file called main.js that looks like.
var http = require("http");

function main() {
    // Console will print the message
    console.log('Server running at http://127.0.0.1:8080/');
    var server = http.createServer(function (request, response) {
        // Send the HTTP header
        // HTTP Status: 200 : OK
        // Content Type: text/plain
        response.writeHead(200, {'Content-Type': 'text/plain'});
        // Send the response body as "Hello World"
        response.end('Hello World\n');
    });
    server.listen(8080);
    console.log('Main completed');
}

main();
I understand that server.listen is attaching a handler to the event queue and that we are adding the callback, similar to the c++ example.
My question is. Where is the event queue? Is it in the javascript somewhere or is it built into the interpreter? Also how does the main function get called relative to the main event loop?
Where is the event queue? Is it in the javascript somewhere or is it built into the interpreter?
The event queue is built into the operating environment that hosts the Javascript interpreter. It isn't fundamental to Javascript itself so it's not part of the actual JS runtime. One interesting indicator of this is that setTimeout() is not actually part of ECMAScript, but rather something made available to the Javascript environment by the host.
The system surrounding the Javascript implementation in node.js keeps track of externally triggered events (timers, networking results, etc...) and when Javascript is not busy executing something and an external event occurs, it then triggers an associated Javascript callback. If Javascript is busy executing something, then it queues that event so that as soon as Javascript is no longer busy, it can then trigger the next event in the queue.
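You can see this queuing directly: a timer scheduled for 10ms cannot fire while synchronous code keeps the single Javascript thread busy.

```javascript
const start = Date.now();
let firedAfter = null;

setTimeout(() => {
    // Scheduled for 10ms, but it has to wait for the busy loop to finish
    firedAfter = Date.now() - start;
    console.log('timer fired after ~' + firedAfter + 'ms'); // ~200ms, not 10
}, 10);

// Keep the single Javascript thread busy for ~200ms
while (Date.now() - start < 200) {}
```

The timer event was queued after 10ms, but its callback could only be triggered once the currently executing code finished.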
node.js itself uses libuv for the event loop. You can read more about that here. It provides a multi-platform way of doing evented, async I/O that was developed for node.js, but is also being used by some other projects.
Here's a related answer that might also help:
Run Arbitrary Code While Waiting For Callback in Node?
Also how does the main function get called relative to the main event loop?
When node.js starts up, it is given an initial script file to execute. It loads that script file into memory, parses the Javascript in it and executes it. In your particular example, that will cause the function main to get parsed and then the call to main() at the bottom to execute, running that function.
Loading, parsing and executing the script file passed to node when it starts up is the task given to node.js. It isn't really related to the event queue at all. In some node.js applications, it runs that initial script and then exits (done with its work). In other node.js applications, the initial script starts timers or servers or something like that which will receive events in the future. When that is the case, node.js runs the initial script to completion, but because there are now lasting objects that were created and are listening for events (in your case, a server), nodejs does not shut down the app. It leaves it running so that it can receive these future events when they occur.
One missing piece here is that things like the server object you created allow you to register a callback that will be called one or more times in the future when some particular events occur. This behavior is not built into Javascript. Instead, the code that implements these objects or the TCP functions that they use must maintain a list of registered callbacks, and when those events occur, it must execute code so that the appropriate callbacks are called and passed the appropriate data. In the case of http.createServer(), it is a mix of Javascript and native code in the nodejs http library that makes that work.
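The bookkeeping itself is simple. A toy version of what emitter-style APIs do under the hood (not node's actual implementation) might look like:

```javascript
// Toy callback registry: store listeners per event name and call each one
// when that event occurs. Node's real EventEmitter works along these lines,
// and http.Server inherits from it.
class TinyEmitter {
    constructor() {
        this.listeners = {};
    }
    on(event, cb) {
        (this.listeners[event] = this.listeners[event] || []).push(cb);
        return this;
    }
    emit(event, ...args) {
        (this.listeners[event] || []).forEach(cb => cb(...args));
    }
}
```

When the native layer sees a network event, it effectively calls emit() on the corresponding object, which runs the callbacks you registered with on().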

Using worker/background processes in node.js vs async call

I want to know if there is any benefit in passing off db or other async calls to a worker process or processes. Specifically I'm using heroku and postgres. I've read up a good bit on node.js and how to structure your server so that the event loop isn't blocked and that smart architecture doesn't leave incoming requests hanging longer than 300ms or so.
Say I have the following:
app.get('/getsomeresults/:query', function (request, response) {
    var foo = request.params.query;
    pg.connect(process.env.DATABASE_URL, function (err, client, done) {
        client.query("SELECT * FROM users WHERE cat=$1", [foo],
            function (err, result) {
                //do some stuff with result.rows that may take 1000ms
                response.json({some: data})
            });
    });
});
Being that postgresql is async by nature, is there any real benefit to creating a worker process to handle the processing of the result set from the initial db call?
You don't gain any benefit for running async functions in another process because the real work (running the SQL query) is already running in another process (postgres). Basically, the async/event-oriented design pattern is a lightweight process manager for things that run outside your process.
However, I noticed in your comment that the processing in the callback function does indeed take up a lot of CPU time (if that's really the case). That portion of code does benefit from being run in another process - it frees the main process to accept incoming requests.
There are two ways to structure such code. Either run the async function in a separate process (so that the callback doesn't block) or just run the relevant portion of the callback as a function in a separate process.
Calling client.query from a separate process won't give you a real benefit here, as sending queries to the server is already an asynchronous operation in node-pg. However, the real problem is the long execution time of your callback function. The callback runs synchronously in the main event loop and blocks other operations, so it would be a good idea to make it non-blocking.
Option 1: Fork a child process
Creating a new process every time the callback is executed is not a good idea, since each Node.js process needs its own environment, which is time-consuming to set up. Instead, it would be better to create multiple server processes when the server is started and let them handle requests concurrently.
Option 2: Use Node.js clusters
Luckily Node.js offers the cluster interface to achieve exactly this. Clusters give you the ability to handle multiple worker processes from one master process. It even supports connection pooling, so you can simply create an HTTP server in each child process and the incoming requests will be distributed among them automatically (node-pg supports pooling as well).
The cluster solution is also nice, because you don't have to change a lot in your code for that. Just write the master process code and start your existing code as workers.
The official documentation on Node.js clusters explains all aspects of clusters very well, so I won't go into details here. Just a short example of possible master code:
var cluster = require("cluster");
var os = require("os");
var http = require("http");

if (cluster.isMaster)
    master();
else
    worker();

function master() {
    console.info("MASTER " + process.pid + " starting workers");
    //Create a worker for each CPU core
    var numWorkers = os.cpus().length;
    for (var i = 0; i < numWorkers; i++)
        cluster.fork();
}

function worker() {
    //Put your existing code here
    console.info("WORKER " + process.pid + " starting http server");
    var httpd = http.createServer();
    //...
}
Option 3: Split the result processing
I assume that the reason for the long execution time of the callback function is that you have to process a lot of result rows and that there is no chance to process the results in a faster way.
In that case it might also be a good idea to split the processing into several chunks using process.nextTick(). The chunks will run synchronously in several event-loop frames, but other operations (like event handlers) can be executed between these chunks. Here's a rough (and untested) sketch of how the code could look:
function (err, result) {
    var s = 0;
    var i;
    processChunk();

    // process 100 rows in one frame
    function processChunk() {
        i = s;
        s += 100;
        while (i < result.rows.length && i < s) {
            //do some stuff with result.rows[i]
            i++;
        }
        if (i < result.rows.length) {
            process.nextTick(processChunk);
        } else {
            //go on (send the response)
        }
    }
}
I'm not 100% sure, but I think node-pg offers some way to receive a query result not as a whole, but split into several chunks. That would simplify the code a lot, so it might be worth searching in that direction...
Final conclusion
I would use option 2 in the first place and option 3 additionally, if new requests still have to wait too long.
