Using worker/background processes in node.js vs async call - javascript

I want to know if there is any benefit in passing off db or other async calls to a worker process or processes. Specifically I'm using heroku and postgres. I've read up a good bit on node.js and how to structure your server so that the event loop isn't blocked and that smart architecture doesn't leave incoming requests hanging longer than 300ms or so.
Say I have the following:
app.get('/getsomeresults/:query', function(request, response){
  var foo = request.params.query;
  pg.connect(process.env.DATABASE_URL, function(err, client, done) {
    client.query("SELECT * FROM users WHERE cat=$1", [foo],
      function(err, result){
        //do some stuff with result.rows that may take 1000ms
        response.json({some:data})
      });
  });
});
Being that postgresql is async by nature, is there any real benefit to creating a worker process to handle the processing of the result set from the initial db call?

You don't gain any benefit from running async functions in another process, because the real work (running the SQL query) is already happening in another process (postgres). Basically, the async/event-oriented design pattern is a lightweight process manager for things that run outside your process.
However, I noticed in your comment that the processing in the callback function does indeed take up a lot of CPU time (if that's really the case). That portion of code does benefit from being run in another process - it frees the main process to accept incoming requests.
There are two ways to structure such code. Either run the async function in a separate process (so that the callback doesn't block) or just run the relevant portion of the callback as a function in a separate process.
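For illustration, here is a rough sketch of the second approach, using Node's built-in child_process.fork to move the heavy result processing out of the main process. The worker script process-rows.js and the doHeavyProcessing function are hypothetical placeholders, not part of the original code:

// main process -- a sketch, not a drop-in replacement
var fork = require("child_process").fork;

app.get('/getsomeresults/:query', function (request, response) {
  pg.connect(process.env.DATABASE_URL, function (err, client, done) {
    client.query("SELECT * FROM users WHERE cat=$1", [request.params.query],
      function (err, result) {
        done();
        // hand the rows to a worker so the event loop stays free
        var worker = fork("./process-rows.js");
        worker.send(result.rows);
        worker.once("message", function (processed) {
          response.json(processed);
        });
      });
  });
});

// process-rows.js -- receives rows over IPC, does the ~1000ms work, replies
process.on("message", function (rows) {
  var processed = doHeavyProcessing(rows); // your CPU-bound logic goes here
  process.send(processed);
  process.exit(0);
});

Keep in mind that the rows are serialized over IPC, which has its own cost, and (as the next answer points out) forking a fresh process per request is expensive.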

Calling client.query from a separate process won't give you a real benefit here, as sending queries to the server is already an asynchronous operation in node-pg. However, the real problem is the long execution time of your callback function. The callback runs synchronously in the main event loop and blocks other operations, so it would be a good idea to make it non-blocking.
Option 1: Fork a child process
Creating a new process every time the callback is executed is not a good idea, since each Node.js process needs its own environment, which is time-consuming to set up. Instead it would be better to create multiple server processes when the server is started and let them handle requests concurrently.
Option 2: Use Node.js clusters
Luckily, Node.js offers the cluster module to achieve exactly this. Clusters give you the ability to manage multiple worker processes from one master process. Workers can even share listening sockets, so you can simply create an HTTP server in each child process and the incoming requests will be distributed among them automatically (node-pg supports connection pooling as well).
The cluster solution is also nice, because you don't have to change a lot in your code for that. Just write the master process code and start your existing code as workers.
The official documentation on Node.js clusters explains all aspects of clusters very well, so I won't go into details here. Here is a short example of possible master code:
var cluster = require("cluster");
var os = require("os");
var http = require("http");

if (cluster.isMaster)
  master();
else
  worker();

function master() {
  console.info("MASTER " + process.pid + " starting workers");

  //Create a worker for each CPU core
  var numWorkers = os.cpus().length;
  for (var i = 0; i < numWorkers; i++)
    cluster.fork();
}

function worker() {
  //Put your existing code here
  console.info("WORKER " + process.pid + " starting http server");

  var httpd = http.createServer();
  //...
}
Option 3: Split the result processing
I assume that the reason for the long execution time of the callback function is that you have to process a lot of result rows and that there is no way to process the results faster.
In that case it might also be a good idea to split the processing into several chunks using process.nextTick(). The chunks run synchronously in separate event-loop frames, but other operations (like event handlers) can be executed between them. (On current Node.js versions, setImmediate() is the better choice here, because process.nextTick() callbacks run before pending I/O events and can starve them.) Here's a rough (and untested) sketch of how the code could look:
function (err, result) {
  var s, i;
  s = 0;
  processChunk();

  // process 100 rows per event-loop frame
  function processChunk() {
    i = s;
    s += 100;
    while (i < result.rows.length && i < s) {
      //do some stuff with result.rows[i]
      i++;
    }
    if (i < result.rows.length) {
      process.nextTick(processChunk); // consider setImmediate() on newer Node
    } else {
      //go on (send the response)
    }
  }
}
I'm not 100% sure, but I think node-pg offers some way to receive a query result not as a whole, but split into several chunks. This would simplify the code a lot, so it might be worth searching in that direction...
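For what it's worth, older versions of node-pg emit a 'row' event for each row when you omit the query callback, which spreads the processing across the incoming network chunks instead of doing it in one big synchronous block. A rough sketch of that style:

var query = client.query("SELECT * FROM users WHERE cat=$1", [foo]);
query.on("row", function (row) {
  //do some stuff with a single row
});
query.on("end", function () {
  done();
  response.json({some: data});
});
query.on("error", function (err) {
  done();
  //handle the error
});

Newer versions of node-pg dropped this event-emitter interface in favor of separate packages (pg-cursor / pg-query-stream) for incremental reading.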
Final conclusion
I would use option 2 first, and add option 3 on top if new requests still have to wait too long.

Related

NodeJS Returning data to client browser

I think we need some help here. Thanks in advance.
I have been doing programming in .Net for desktop applications and have used Timer objects to wait for a task to complete before the task results are shown in a data grid. Recently, we switched over to NodeJs and find it pretty interesting. We could design a small application that executes some tasks using PowerShell scripts and returns the data to the client browser. However, I would have to run a Timer on the client browser (when someone clicks on a button) to check whether the file it receives from the server contains "ENDOFDATA" or not. Once the Timer sees ENDOFDATA, it triggers another function to populate a DIV with the data that was received from the server.
Is this the right way to get the data from a server? We really don't want to block the EventLoop. We run PowerShell scripts on NodeJS to collect users from Active Directory and then send the data back to the client browser. The PowerShell scripts are executed as a Job so the EventLoop is not blocked.
Here is an example of the code on the NodeJs side:
In the code below, can we insert something that won't block the EventLoop but will still respond once the task is completed? As you can see, we would like to send the ADUsers.CSV file to the client browser once GetUsers.PS1 has finished executing. Since GetUsers.PS1 takes about five minutes to complete, the EventLoop is blocked and the server can no longer accept any other requests.
app.post("/LoadDomUsers", (request, response) => {
//we check if the request is an AJAX one and if accepts JSON
if (request.xhr || request.accepts("json, html") === "json") {
var ThisAD = request.body.ThisAD
console.log(ThisAD);
ps.addCommand("./public/ps/GetUsers.PS1", [{
name: 'AllParaNow',
value: ScriptPara
}])
ps.addCommand(`$rc = gc ` + __dirname + "/public/TestData/AD/ADUsers.CSV");
ps.addCommand(`$rc`);
ps.invoke().then((output) => {
response.send({ message: output });
console.log(output);
});
}
});
Thank you.
The way you describe your problem isn't that clear. I had to read some of the comments in your initial question just to be sure I understood the issue. Honestly, you could just utilize various CSV NPM packages to read and write from your active directory with NodeJS.
I/O is non-blocking in NodeJS, so you're not actually blocking the EventLoop. Node can handle multiple I/O requests at once: each one is handed off (to the OS or a worker thread pool) while the main thread keeps executing, and when an operation completes its callback is queued on the event loop and run with the resulting data. After you get the I/O data, you just send it back to the client through the response object. There should be no timers needed.
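As a minimal sketch of that pattern (assuming the PowerShell job has already written the CSV file from the question), reading the file asynchronously and replying in the same request keeps the EventLoop free, with no timers anywhere:

var fs = require("fs");

app.post("/LoadDomUsers", (request, response) => {
  var csvPath = __dirname + "/public/TestData/AD/ADUsers.CSV";
  // async read: the EventLoop keeps serving other requests meanwhile
  fs.readFile(csvPath, "utf8", (err, data) => {
    if (err) return response.status(500).send({ error: err.message });
    response.send({ message: data });
  });
});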
So is the issue once the powershell script runs, you have to wait for that initial script to complete before being able to handle pending requests? I'm still a bit unclear...

Requests to Node.js server timing out due to multiple requests

So I'm not super experienced with node so bear with me.
I have 2 routes on a node.js server. During a typical day, both routes will be hit with requests at the same time. Route 1 will run smoothly, but route 2 is a long-running process that returns several promises, so route 1 will take up resources, causing route 2 to pause (I have determined this is happening via data logs).
Route 1 looks like this:
app.post('/route1', function (req, res) {
  doStuff().then(function (data) {
    res.end();
  });
});
Route 2 handles an array of data that needs to be parsed through, so one record in the array is processed at a time:
app.post('/route2', function (req, res) {
  async function processArray(array) {
    for (const item of array) {
      await file.test1()(item, res);
      await file.test2()(item, res);
      //await test3, test4, test5, test6
    }
  }
  processArray(data).then(function () {
    res.end();
  });
});
So I'm guessing the problem is that the async/await is waiting for resources to become available before it continues to process records.
Is there a way for me to write this to where route1 will not interfere with route2?
In Node, almost everything you can await (or call then on) is asynchronous. It does not block the execution thread but rather offloads the task to another layer you don't control, and then waits for it to be finished while being free to work on something else. That includes working with the filesystem and network requests. There are still ways to block the thread, for example, using synchronous versions of filesystem methods (like readFileSync instead of readFile) or doing heavy computations in JavaScript (like calculating the factorial of 4569485960485096).
Given that your route1 doesn't do any of this, it does not take any resources from route2. They are running in parallel. It's hard to tell without seeing the actual code, but you are most likely getting a connection timeout because your route2 is written in a way that takes a long time to resolve (or doesn't resolve at all) for reasons unrelated to Node performance or blocking. Node is just idling while waiting for your filesystem to run those tests six times on every array item (or whatever is going on there), and while this happens, the browser gives up waiting for a response and shows a connection timeout. Most likely you don't need to await every test on every array item one at a time; you could run them all in parallel instead.
Read more here https://nodejs.org/en/docs/guides/blocking-vs-non-blocking/
and here https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all
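A rough sketch of the parallel version, assuming the tests are independent of each other (data and the file.testN calling convention come from the question's code):

app.post('/route2', function (req, res) {
  // start all tests for all items at once and wait for the whole batch
  var jobs = data.map(function (item) {
    return Promise.all([
      file.test1()(item, res),
      file.test2()(item, res)
      //test3, test4, test5, test6 go here the same way
    ]);
  });
  Promise.all(jobs).then(function () {
    res.end();
  });
});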
NodeJs is single threaded. This is why you break CPU/resource-intensive services into micro-services.
If route1 and route2 need to be on the same server, see if you can change the algorithm or the way you handle the computation to optimize performance, or split the work so that it is handled by different cores on a multi-core architecture.
Again, if you are talking about a production situation with a huge user base, it's definitely not a good idea to put them together and run them on the same server.

How to make express Node.JS reply a request during heavy workload?

I'm creating a nodejs web processor. Its processing takes ~1 minute. I POST to my server and then poll the status by using GET.
This is my simplified code:
// Configure Express
const app = express();
app.listen(8080);

app.post('/clean', async function (req, res, next) {
  // start the process
  let result = await worker.process(data);
  // send the result when finished
  res.send(result);
});

// reply with the status when asked
app.get('/clean', async function (req, res, next) {
  res.send(worker.status);
});
The problem is: the server is working so hard on the POST /clean process that GET /clean requests are not replied to in time.
All GET /clean requests are answered only after the worker finishes its task and frees the processor.
In other words, the application is unable to respond during the workload.
How can I get around this situation?
Because node.js runs your Javascript as single threaded (only one piece of Javascript ever running at once) and does not time slice, as long as your worker.process() is running its synchronous code, no other requests can be processed by your server. This is why worker.process() has to finish before any of the http requests that arrived while it was running get serviced. The node.js event loop is busy until worker.process() is done, so it can't service any other events (like incoming http requests).
These are some of the ways to work around that:
Cluster your app with the built-in cluster module so that you have a bunch of processes that can either work on worker.process() code or handle incoming http requests.
When it's time to call worker.process(), fire up a new node.js process, run the processing there and communicate back the result with standard interprocess communication. Then, your main node.js process stays ready to handle incoming http requests near instantly as they arrive.
Create a work queue of a group of additional node.js processes that run jobs that are put in the queue and configure these processes to be able to run your worker.process() code from the queue. This is a variation of #2 that bounds the number of processes and serializes the work into a queue (better controlled than #2).
Rework the way worker.process() does its work so that it can do a few ms of work at a time, then return back to the message loop so other events can run (like incoming http requests) and then resume its work afterwards for a few more ms at a time. This usually requires building some sort of stateful object that can do a little bit of work at a time each time it is called, but is often a pain to program effectively.
Note that #1, #2 and #3 all require that the work be done in other processes. That means that worker.status will need to get its value from those other processes. So, you will either need some sort of interprocess way of communicating with the other processes or you will need to store the status as you go in some storage that is accessible from all processes (such as redis) so it can just be retrieved from there.
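As a rough sketch of option #2, with a hypothetical job.js wrapping the existing worker.process() code and a simple in-memory status (enough for a single-server, single-job setup):

const { fork } = require('child_process');
let status = 'idle';

app.post('/clean', function (req, res) {
  status = 'running';
  const job = fork('./job.js'); // separate node.js process
  job.send(req.body);           // hand the work over via IPC
  job.on('message', function (msg) {
    status = 'done';
    res.send(msg.result);
  });
});

app.get('/clean', function (req, res) {
  res.send(status); // answered immediately; the event loop is free
});

// job.js (hypothetical) -- runs in its own process
process.on('message', async function (data) {
  const result = await worker.process(data); // your existing code
  process.send({ result });
  process.exit(0);
});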
There's no working around the single-threaded nature of JS short of converting your service to a cluster of processes or to use something experimental like Worker Threads.
If neither of these options work for you, you'll need to yield up the processing thread periodically to give other tasks the ability to work on things:
function workPart1() {
  // Do a bunch of stuff
  setTimeout(workPart2, 10);
}

function workPart2() {
  // More stuff
  setTimeout(workPart3, 10); // etc.
}
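On newer Node versions the same yielding idea can be written with async/await and setImmediate, which keeps the chunk loop in one place. A sketch, with a hypothetical doChunk doing the actual synchronous work:

// give the event loop a turn between chunks
const tick = () => new Promise(resolve => setImmediate(resolve));

async function processAll(items) {
  for (let i = 0; i < items.length; i += 100) {
    doChunk(items.slice(i, i + 100)); // synchronous work, 100 items at a time
    await tick();                     // pending http requests run here
  }
}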

Node.js API to spawn off a call to another API

I created a Node.js API.
When this API gets called I return to the caller fairly quickly. Which is good.
But now I also want this API to call or launch a different API or function or something that will go off and run on its own. Kind of like calling a child process with child.unref(). In fact, I would use child.spawn() but I don't see how to have spawn() call another API. Maybe that alone would be my answer?
Of this other process, I don't care if it crashes or finishes without error.
So it doesn't need to be attached to anything. But if it does remain attached to the Node.js console, that's icing on the cake.
I'm still thinking about how to identify and what to do if the spawn somehow gets caught up running for a really long time, but I'm not ready to cross that bridge yet.
Your thoughts on what I might be able to do?
I guess I could child.spawn('node', [somescript])
What do you think?
I would have to explore if my cloud host will permit this too.
You need to specify exactly what the other spawned thing is supposed to do. If it is calling an HTTP API, with Node.js you should not launch a new process to do that. Node is built to run HTTP requests asynchronously.
The normal pattern, if you really need some stuff to happen in a different process, is to use something like a message queue, the cluster module, or other messaging/queueing between processes that a worker monitors, with the worker set up to handle some particular task or set of tasks. Spawning another process after receiving an HTTP request is pretty unusual: launching new processes is heavy-weight and can use up all of your server resources if you aren't careful, and thanks to Node's async capabilities it usually isn't necessary, especially for things mainly involving IO.
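For example, if the "other API" is just an HTTP endpoint, a fire-and-forget request needs no extra process at all. A minimal sketch (the URL is a placeholder, and req.body.u is borrowed from the question's code):

var https = require('https');

router.put('/test', function (req, res) {
  // kick off the call and respond immediately; no child process needed
  https.get('https://example.com/other-api?u=' + encodeURIComponent(req.body.u),
    function (resp) {
      resp.resume(); // drain the response; we don't care about the body
    }).on('error', function (err) {
      console.error('background call failed:', err.message);
    });
  res.sendStatus(200);
});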
This is from a test API I built some time ago. Note I'm even passing a value into the script as a parameter.
router.put('/test', function (req, res, next) {
  var u = req.body.u;
  var cp = require('child_process');
  // 'detached' (note the spelling) lets the child run independently of the parent
  var c = cp.spawn('node', ['yourtest.js', '"' + u + '"'], { detached: true });
  c.unref();
  res.sendStatus(200);
});
The yourtest.js script can be just about anything you want it to be. But I found I learned more by first treating the script as a node.js console app: FIRST get your yourtest.js script to run without error by manually running it from your console's command line (node yourtest.js yourparametervalue), THEN integrate it into the child.spawn().
var u = process.argv[2];
console.log('f2u', u);

function f1() {
  console.log('f1-hello');
}

function f2() {
  console.log('f2-hello');
}

// Wait 3 seconds before executing f2(). I do this just for troubleshooting:
// you can watch node.exe open and then close in Task Manager if it runs
// long enough.
setTimeout(f2, 3000);
f1();

understanding of node js performance

I recently discovered Node js and I read in various articles that Node js is fast and can handle more requests than a Java server even though Node js uses a single thread.
I understood that Node is based on an event loop: each call to a remote api or a database is done with an async call, so the main thread is never blocked and the server can continue to handle other client requests.
If I understood correctly, each portion of code that can take time should be processed with an async call, otherwise the server will be blocked and it won't be able to handle other requests?
var server = http.createServer(function (request, response) {
  //CALL A METHOD WHICH CAN TAKE A LONG TIME TO EXECUTE
  slowSyncMethod();

  //WILL THE SERVER STILL BE ABLE TO HANDLE OTHER REQUESTS ??
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end("");
});
So if my understanding is correct, the above code is bad because the synchronous call to the slow method blocks the Node js main thread? Is Node js only fast on the condition that all code that can take time is executed in an async manner?
NodeJs is as fast as your hardware (or VM) and the V8 engine running it. That being said, any heavy-duty task, like any type of media processing (music, images, video, etc.), will definitely lock your application, and so will computations on large collections. That's why the async model is leveraged through events and deferred invocations. That said, nothing stops you from spawning child processes to delegate heavy-duty work to and asynchronously getting back the result. But if you find yourself needing to do this for many tasks, maybe you should revisit your architecture.
I hope this helps.
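If you want to see the blocking for yourself, here is a small self-contained experiment: the heartbeat stalls while the synchronous loop runs, then resumes afterwards:

// prints roughly every 100ms -- until the event loop is blocked
setInterval(function () {
  console.log('tick', Date.now());
}, 100);

setTimeout(function () {
  console.log('blocking for ~2s...');
  var end = Date.now() + 2000;
  while (Date.now() < end) { /* busy wait: nothing else can run */ }
  console.log('unblocked');
}, 500);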
