Understanding of Node.js performance


I recently discovered Node.js, and I have read in various articles that it is fast and can handle more requests than a Java server, even though Node.js uses a single thread.
I understand that Node is based on an event loop: each call to a remote API or a database is made asynchronously, so the main thread is never blocked and the server can continue to handle other client requests.
If I understand correctly, every portion of code that can take a long time should be run asynchronously; otherwise the server will be blocked and unable to handle other requests?
var http = require("http");

var server = http.createServer(function (request, response) {
  // Call a method that can take a long time to execute
  slowSyncMethod();
  // Will the server still be able to handle other requests?
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end("");
});
So if my understanding is correct, the above code is bad because the synchronous call to the slow method will block the Node.js main thread? Is Node.js fast only on the condition that all code that can take a long time is executed asynchronously?

Node.js is as fast as your hardware (or VM) and the V8 engine running it. That said, any heavy-duty task, such as processing media files (music, images, video, etc.), will definitely block your application, and so will computation over large collections. That is why the async model is leveraged through events and deferred invocations. Nothing stops you from spawning child processes to offload heavy work and get the result back asynchronously. But if you find yourself needing to do this for many tasks, maybe you should revisit your architecture.
I hope this helps.
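To make the blocking effect concrete, here is a minimal sketch (not part of the original answer) that schedules a zero-delay timer and then busy-waits synchronously. The helper names busyWait and measureTimerDelay are hypothetical; the busy loop stands in for something like slowSyncMethod() from the question.

```javascript
// Hypothetical stand-in for a slow synchronous method:
// spin on the main thread until `ms` milliseconds have passed since `start`.
function busyWait(ms, start) {
  while (Date.now() - start < ms) {
    // burn CPU; nothing else can run on this thread meanwhile
  }
}

// Schedule a 0 ms timer, then block the thread.
// Resolves with how many milliseconds late the timer actually fired.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 0);
    busyWait(blockMs, start); // the timer callback cannot run until this returns
  });
}

const delay = measureTimerDelay(200);
delay.then((ms) => console.log(`0 ms timer fired after ${ms} ms`));
```

The timer was due immediately, yet it only fires once the synchronous loop gives the thread back. An incoming HTTP request would be held up in the same way.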

Related

Handle Multiple Concurrent Requests for Express Server on Same Endpoint API

This question might be a duplicate, but I am still not getting the answer. I am fairly new to node.js, so I might need some help. Many have said that node.js is perfectly free to run incoming requests asynchronously, but in the code below, if multiple requests hit the same endpoint, say /test3, the callback function will:
1. Print "test3"
2. Call setTimeout() to avoid blocking the event loop
3. Wait for 5 seconds and send a response of "test3" to the client
My question is this: if client 1 and client 2 call the /test3 endpoint at the same time, and we assume client 1 hits the endpoint first, it looks as if client 2 has to wait for client 1 to finish before entering the event loop.
Can anybody tell me whether it is possible for multiple clients to call a single endpoint and run concurrently rather than sequentially, something like a one-thread-per-connection model?
Of course, if I call another endpoint such as /test1 or /test2 while the code is still executing on /test3, I still get a response from /test2, which is "test2", immediately.
app.get("/test1", (req, res) => {
  console.log("test1");
  setTimeout(() => res.send("test1"), 5000);
});

app.get("/test2", async (req, res, next) => {
  console.log("test2");
  res.send("test2");
});

app.get("/test3", (req, res) => {
  console.log("test3");
  setTimeout(() => res.send("test3"), 5000);
});

For those who land here: this has nothing to do with blocking the event loop.
I found something interesting, and the answer to the question can be found in the link below.
When I was using Chrome, the requests kept getting blocked after the first request. With Safari, however, I was able to hit the endpoint concurrently. For more details, see the following link.
GET requests from Chrome browser are blocking the API to receive further requests in NODEJS
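As a rough illustration that the delay comes from the browser and not from Node, the sketch below simulates two clients hitting a timer-based handler at the same moment. The function handleTest3 is a hypothetical stand-in for the /test3 route (no Express involved), and the 300 ms delay is shortened from the 5 seconds in the question.

```javascript
// Hypothetical stand-in for the /test3 handler: "responds" after `ms` milliseconds.
function handleTest3(clientId, ms) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`test3 for client ${clientId}`), ms);
  });
}

async function twoClients(ms) {
  const start = Date.now();
  // Both "requests" are started before either one finishes.
  const replies = await Promise.all([handleTest3(1, ms), handleTest3(2, ms)]);
  return { replies, elapsed: Date.now() - start };
}

const run = twoClients(300);
run.then(({ elapsed }) => console.log(`both replies after ${elapsed} ms`));
```

Both timers wait at the same time, so the total is roughly one delay rather than two. Node can overlap the two waits on its single thread.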

Run your application in cluster mode. Look up PM2.

This question needs more detail to be answered and is clearly opinion-based. Because it rests on a straw-man argument, I will answer it anyway.
First of all, we need to define "run concurrently", which is ambiguous. Taken literally, in strict theory nothing runs concurrently on a single core:
A single CPU core carries out one stream of instructions at a time.
The speed at which the CPU carries out instructions is governed by its clock and is called the clock speed. In the simplest model, the CPU fetches and executes one instruction per clock tick. Clock speed is measured in cycles per second, and one cycle per second is 1 hertz. A CPU with a clock speed of 2 gigahertz (GHz) therefore runs two billion (2,000 million) cycles per second.
CPU running multiple tasks "concurrently"
Yes, you are right that nowadays computers and even cell phones come with multiple cores, so the number of tasks truly running at the same time depends on the number of cores. In my experience as an Associate Staff Engineer, though, you will rarely find a server with more than one core: why spend 500 USD on a multi-core server when you can spawn a whole bunch of small instances (nano, or whatever option is available in the free trial) with Kubernetes?
Another thing: why configure Node to be in charge of the routing? Let Apache and/or nginx worry about that.
As you mentioned, there is a thing called the event loop, which is essentially a fancy name for a FIFO queue of callbacks.
So, in other words: no. Node.js, like any other programming language out there, will not run your requests truly in parallel on a single core,
but it definitely depends on your infrastructure.

NodeJS Returning data to client browser

I think we need some help here. Thanks in advance.
I have been programming desktop applications in .NET and have used Timer objects to wait for a task to complete before the task results are shown in a data grid. Recently, we switched over to Node.js and find it pretty interesting. We were able to design a small application that executes some tasks using PowerShell scripts and returns the data to the client browser. However, I have to run a Timer in the client browser (when someone clicks a button) to check whether the file the Timer receives from the server contains "ENDOFDATA". Once the Timer sees ENDOFDATA, it triggers another function that populates a DIV with the data received from the server.
Is this the right way to get the data from a server? We really don't want to block the event loop. We run PowerShell scripts from Node.js to collect users from Active Directory and then send the data back to the client browser. The PowerShell scripts are executed as a job, so the event loop is not blocked.
Here is an example of the code on the Node.js side:
In the code below, can we insert something that won't block the event loop but still responds to the client once the task is completed? As you can see, we would like to send the ADUsers.CSV file to the client browser once GetUsers.PS1 has finished executing. Since GetUsers.PS1 takes about five minutes to complete, the event loop is blocked and the server can no longer accept any other requests.
app.post("/LoadDomUsers", (request, response) => {
  //we check if the request is an AJAX one and if accepts JSON
  if (request.xhr || request.accepts("json, html") === "json") {
    var ThisAD = request.body.ThisAD
    console.log(ThisAD);
    ps.addCommand("./public/ps/GetUsers.PS1", [{
      name: 'AllParaNow',
      value: ScriptPara
    }])
    ps.addCommand(`$rc = gc ` + __dirname + "/public/TestData/AD/ADUsers.CSV");
    ps.addCommand(`$rc`);
    ps.invoke().then((output) => {
      response.send({ message: output });
      console.log(output);
    });
  }
});
Thank you.

The way you describe your problem isn't that clear; I had to read some of the comments on your question just to be sure I understood the issue. Honestly, you could just use one of the various CSV npm packages to read and write your Active Directory data with Node.js.
I/O is non-blocking in Node.js, so you're not actually blocking the event loop. You can handle multiple I/O requests, since Node.js hands each one off to the operating system or to libuv's thread pool and continues execution on the main thread. When an I/O operation completes, its callback is queued and runs on the main thread with the resulting data. After you get the I/O data, you just send it back to the client through the response object. There should be no timers needed.
So is the issue that once the PowerShell script runs, you have to wait for it to complete before pending requests can be handled? I'm still a bit unclear...
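As a minimal sketch of waiting for a slow external script without blocking or polling, the example below uses Node's built-in child_process.execFile wrapped with util.promisify. The helper runScript is hypothetical, and a short-lived child Node process stands in for the PowerShell job (an assumption made so the sketch is self-contained). In a real handler the command would be the PowerShell executable, and the response would be sent once the promise resolves.

```javascript
const { execFile } = require("child_process");
const { promisify } = require("util");

const execFileAsync = promisify(execFile);

// Hypothetical helper: run an external program and resolve with its trimmed stdout.
async function runScript(command, args) {
  const { stdout } = await execFileAsync(command, args);
  return stdout.trim();
}

// Stand-in for the slow PowerShell job: a child Node process that prints after 300 ms.
const slowScript = "setTimeout(() => console.log('ENDOFDATA'), 300)";

async function demo() {
  let ticks = 0;
  // This interval keeps firing while the child runs, showing the loop stays free.
  const timer = setInterval(() => { ticks += 1; }, 20);
  const output = await runScript(process.execPath, ["-e", slowScript]);
  clearInterval(timer);
  return { output, ticks };
}

const result = demo();
result.then(({ output, ticks }) => console.log(output, `(${ticks} timer ticks meanwhile)`));
```

Because the promise resolves only when the child process exits, the server can send the data in that same request and the browser-side Timer becomes unnecessary.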

Requests to Node.js server timing out due to multiple requests

So I'm not super experienced with node so bear with me.
I have 2 routes on a node.js server. During a typical day, both routes will be hit with requests at the same time. Route 1 runs smoothly, but route 2 is a long-running process that returns several promises, so route 1 takes up the resources and causes route 2 to pause (I have determined this is happening from data logs).
Route 1 looks like this:
app.post('/route1', function (req, res) {
  doStuff().then(function (data) {
    res.end();
  });
});
Route 2 handles an array of data that needs to be parsed, so one record in the array is processed at a time:
app.post('/route2', function (req, res) {
  async function processArray(array) {
    for (const item of array) {
      await file.test1()(item, res);
      await file.test2()(item, res);
      //await test3,test4,test5,test6
    }
  }
  processArray(data).then(function () {
    res.end();
  });
});
So I'm guessing the problem is that the async/await is waiting for resources to become available before it continues to process records.
Is there a way for me to write this to where route1 will not interfere with route2?

In Node, almost everything you can await (or call then on) is asynchronous. It does not block the execution thread; instead it offloads the task to another layer you don't control and then waits for it to finish while remaining free to work on something else. That includes working with the filesystem and network requests. There are still ways to block the thread, for example by using the synchronous versions of filesystem methods (like readFileSync instead of readFile) or by doing heavy computations in JavaScript (like calculating the factorial of 4569485960485096).
Given that your route1 doesn't do any of this, it does not take any resources from route2. They run concurrently. It's hard to tell without seeing the actual code, but I'm pretty sure you are getting a connection timeout because your route2 is poorly written and takes a long time to resolve (or doesn't resolve at all) for reasons unrelated to Node performance or blocking. Node is just chilling while waiting for your filesystem to run those endless tests 6 times for every array item (or whatever is going on there), and while this happens the browser stops waiting for the response and shows you a connection timeout. Most likely you don't need to await each test one after another for every item; you can run them all in parallel instead.
Read more here https://nodejs.org/en/docs/guides/blocking-vs-non-blocking/
and here https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all
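A minimal sketch of the difference, assuming the tests are independent of one another (nothing in the question confirms that). The function fakeTest is a hypothetical timer-based stand-in for file.test1(), file.test2() and so on.

```javascript
// Hypothetical async test: resolves after `ms` milliseconds.
function fakeTest(name, item, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(`${name}:${item}`), ms));
}

const testNames = ["test1", "test2", "test3"];

// One test at a time, as in the question's for...of loop.
async function runSequential(item, ms) {
  const results = [];
  for (const name of testNames) {
    results.push(await fakeTest(name, item, ms));
  }
  return results;
}

// All tests for the item are started together; Promise.all keeps input order.
function runParallel(item, ms) {
  return Promise.all(testNames.map((name) => fakeTest(name, item, ms)));
}

async function compare() {
  let t = Date.now();
  const sequential = await runSequential("a", 100);
  const sequentialMs = Date.now() - t;
  t = Date.now();
  const parallel = await runParallel("a", 100);
  const parallelMs = Date.now() - t;
  return { sequential, parallel, sequentialMs, parallelMs };
}

const comparison = compare();
comparison.then((c) => console.log(`sequential ${c.sequentialMs} ms, parallel ${c.parallelMs} ms`));
```

The results are identical, but the sequential version takes about the sum of the waits while the parallel one takes about the longest single wait. If a later test depends on an earlier one, those two still have to be awaited in order.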

Node.js is single-threaded. This is why you break CPU- or resource-intensive services into microservices.
If route1 and route2 need to be on the same server, see whether you can change the algorithm or the way you handle the computation to optimize performance, or split them so that they are handled by different cores on a multi-core machine.
Again, if you are talking about a production situation with a huge user base, then it is definitely not a good idea to put them together and run them on the same server.
See this for more information

How to make express Node.JS reply a request during heavy workload?

I'm creating a Node.js web processor. Its processing time is ~1 minute. I POST to my server and get the status using GET.
This is my simplified code:
// Configure Express
const express = require("express");
const app = express();
app.listen(8080);

// Console
app.post('/clean', async function(req, res, next) {
  // start process
  let result = await worker.process(data);
  // Send result when finished
  res.send(result);
});

// reply with status when asked
app.get('/clean', async function(req, res, next) {
  res.send(worker.status);
});
The problem is that the server works so hard on the POST /clean process that GET /clean requests are not answered in time.
All GET /clean requests are answered only after the worker finishes its task and frees the processor.
In other words, the application is unable to respond during the workload.
How can I get around this situation?

Because node.js runs your Javascript as single threaded (only one piece of Javascript ever running at once) and does not time slice, as long as your worker.process() is running its synchronous code, no other requests can be processed by your server. This is why worker.process() has to finish before any of the http requests that arrived while it was running get serviced. The node.js event loop is busy until worker.process() is done, so it can't service any other events (like incoming http requests).
These are some of the ways to work around that:
1. Cluster your app with the built-in cluster module so that you have a bunch of processes that can either work on worker.process() code or handle incoming http requests.
2. When it's time to call worker.process(), fire up a new node.js process, run the processing there and communicate the result back with standard interprocess communication. Then your main node.js process stays ready to handle incoming http requests almost instantly as they arrive.
3. Create a work queue served by a group of additional node.js processes that run jobs put in the queue, and configure these processes to be able to run your worker.process() code from the queue. This is a variation of #2 that bounds the number of processes and serializes the work into a queue (better controlled than #2).
4. Rework the way worker.process() does its work so that it can do a few ms of work at a time, then return to the event loop so other events can run (like incoming http requests), and then resume its work afterwards for a few more ms at a time. This usually requires building some sort of stateful object that can do a little bit of work each time it is called, and it is often a pain to program effectively.
Note that #1, #2 and #3 all require that the work be done in other processes. That means that the GET handler that reports worker.status will need to get the status from those other processes. So you will either need some interprocess way of communicating with the other processes, or you will need to store the status as you go in some storage that is accessible from all processes (such as redis) so it can simply be retrieved from there.
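The "stateful object" from the fourth option can be sketched with a generator, which keeps its own position between calls. This is only an illustration under assumptions: sumJob is a hypothetical CPU-bound job, runInSlices is a hypothetical driver, and the status object stands in for worker.status. The driver uses setImmediate so pending I/O callbacks get a turn between slices.

```javascript
// Hypothetical CPU-bound job: sum the integers 1..n,
// pausing (yielding a progress fraction) after every `chunk` steps.
function* sumJob(n, chunk) {
  let sum = 0;
  for (let i = 1; i <= n; i++) {
    sum += i;
    if (i % chunk === 0) yield i / n;
  }
  return sum;
}

// Hypothetical driver: run one slice per event-loop iteration.
function runInSlices(job, status) {
  return new Promise((resolve) => {
    function step() {
      const { value, done } = job.next();
      if (done) {
        status.progress = 1;
        resolve(value);
      } else {
        status.progress = value;   // readable by a status handler at any time
        setImmediate(step);        // let pending I/O callbacks run first
      }
    }
    step();
  });
}

const status = { progress: 0 };
const finished = runInSlices(sumJob(1e6, 50000), status);
finished.then((sum) => console.log(`sum = ${sum}, progress = ${status.progress}`));
```

Because everything stays in one process, a GET handler could read status.progress directly between slices, with no interprocess communication. The work still competes with request handling for the same thread, so the slices need to be short.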

There's no working around the single-threaded nature of JS short of converting your service to a cluster of processes or using Worker Threads (experimental when this was written, stable in current Node.js releases).
If neither of these options works for you, you'll need to yield the processing thread periodically to give other tasks a chance to run:
function workPart1() {
  // Do a bunch of stuff
  setTimeout(workPart2, 10);
}

function workPart2() {
  // More stuff
  setTimeout(workPart3, 10); // etc.
}
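For the Worker Threads route mentioned above, here is a minimal sketch using the built-in worker_threads module. The worker source is inlined with the eval option only to keep the example in one file; the job (summing integers), the message shapes and the runJob helper are all hypothetical.

```javascript
const { Worker } = require("worker_threads");

// Worker code, inlined as a string so the sketch is a single file.
// Real code would normally live in its own module.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) {
    sum += i;
    if (i % workerData.reportEvery === 0) {
      parentPort.postMessage({ type: "status", done: i });
    }
  }
  parentPort.postMessage({ type: "result", sum });
`;

// Hypothetical helper: start the job on another thread and expose its status.
function runJob(n) {
  const status = { done: 0, finished: false };
  const result = new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { n, reportEvery: Math.max(1, Math.floor(n / 10)) },
    });
    worker.on("message", (msg) => {
      if (msg.type === "status") {
        status.done = msg.done;
      } else {
        status.finished = true;
        resolve(msg.sum);
      }
    });
    worker.on("error", reject);
  });
  return { status, result };
}

const job = runJob(1e6);
job.result.then((sum) => console.log(`sum = ${sum}`));
```

The heavy loop runs on a separate thread, so the main thread stays free to answer status requests from the status object, which is updated by messages from the worker.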

Using worker/background processes in node.js vs async call

I want to know if there is any benefit in passing off db or other async calls to a worker process or processes. Specifically I'm using heroku and postgres. I've read up a good bit on node.js and how to structure your server so that the event loop isn't blocked and that smart architecture doesn't leave incoming requests hanging longer than 300ms or so.
Say I have the following:
app.get('/getsomeresults/:query', function(request, response){
  var foo = request.params.query;
  pg.connect(process.env.DATABASE_URL, function(err, client, done) {
    client.query("SELECT * FROM users WHERE cat=$1", [foo],
      function(err, result){
        //do some stuff with result.rows that may take 1000ms
        response.json({some:data})
      });
  });
});
Given that the PostgreSQL call is already asynchronous, is there any real benefit to creating a worker process to handle the processing of the result set from the initial db call?

You don't gain any benefit from running async functions in another process, because the real work (running the SQL query) is already running in another process (postgres). Basically, the async/event-oriented design pattern is a lightweight process manager for things that run outside your process.
However, I noticed in your comment that the processing in the callback function does indeed take up a lot of CPU time (if that's really the case). That portion of code does benefit from being run in another process - it frees the main process to accept incoming requests.
There are two ways to structure such code. Either run the async function in a separate process (so that the callback doesn't block) or just run the relevant portion of the callback as a function in a separate process.

Calling client.query from a separate process won't give you a real benefit here, as sending queries to the server is already an asynchronous operation in node-pg. However, the real problem is the long execution time of your callback function. The callback runs synchronously in the main event loop and blocks other operations, so it would be a good idea to make this non-blocking.
Option 1: Fork a child process
Creating a new process every time the callback is executed is not a good idea, since each Node.js process needs its own environment, which is time-consuming to set up. Instead, it would be better to create multiple server processes when the server is started and let them handle requests concurrently.
Option 2: Use Node.js clusters
Luckily, Node.js offers the cluster interface to achieve exactly this. Clusters give you the ability to manage multiple worker processes from one master process. The workers share the listening port, so you can simply create an HTTP server in each child process and the incoming requests will be distributed among them automatically (node-pg, for its part, supports connection pooling).
The cluster solution is also nice because you don't have to change a lot in your code. Just write the master process code and start your existing code as workers.
The official documentation on Node.js clusters explains all aspects of clusters very well, so I won't go into details here. Just a short example of possible master code:
var cluster = require("cluster");
var os = require("os");
var http = require("http");

// cluster.isMaster is named cluster.isPrimary in Node.js 16 and later
if (cluster.isMaster)
  master();
else
  worker();

function master() {
  console.info("MASTER "+process.pid+" starting workers");
  //Create a worker for each CPU core
  var numWorkers = os.cpus().length;
  for (var i = 0; i < numWorkers; i++)
    cluster.fork();
}

function worker() {
  //Put your existing code here
  console.info("WORKER "+process.pid+" starting http server");
  var httpd = http.createServer();
  //...
}
Option 3: Split the result processing
I assume that the reason for the long execution time of the callback function is that you have to process a lot of result rows and that there is no chance to process the results in a faster way.
In that case it might also be a good idea to split the processing into several chunks using setImmediate(). (process.nextTick() is not suitable here: its callbacks run before the event loop continues, so chaining them would still starve I/O.) The chunks will run synchronously in several event-loop iterations, but other operations (like event handlers) can be executed between these chunks. Here's a rough (and untested) sketch of how the code could look:
function(err, result) {
  var s = 0;
  processChunk();
  // process 100 rows per event-loop iteration
  function processChunk() {
    var i = s;
    s += 100;
    while (i < result.rows.length && i < s) {
      //do some stuff with result.rows[i]
      i++;
    }
    if (i < result.rows.length) {
      setImmediate(processChunk);
    } else {
      //go on (send the response)
    }
  }
}
I'm not 100% sure, but I think node-pg offers some way to receive a query result not as a whole but split into several chunks. This would simplify the code a lot, so it might be worth looking in that direction...
Final conclusion
I would use option 2 in the first place and option 3 additionally, if new requests still have to wait too long.
