I have been learning Node.js, but there is one question I cannot find an answer to anywhere. Here is my understanding of Node.js:
It is single-threaded in its architecture and uses the CPU efficiently thanks to its asynchronous, non-blocking event loop.
It executes asynchronous requests with the help of the built-in libuv library, which uses threads (4 by default) in its internal thread pool. All of this is kept away from the "main" thread that Node.js uses, so we do not have to worry about it.
Here is my question: suppose 100 asynchronous requests (let's say file reads) arrive at once. Since the number of threads libuv uses is limited, how exactly can Node.js handle these 100 asynchronous requests at a time? Ideally it would need 100 threads to serve them all and get the data back onto the event queue quickly. How exactly is this faster than a multi-threaded process?
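To make the scenario concrete, here is roughly what I mean (just a sketch; the file paths are made up). As far as I understand, each fs.readFile below becomes a work item for libuv's thread pool, and the pool size can be raised with the UV_THREADPOOL_SIZE environment variable:

```js
// Sketch: issue 100 file reads at once. Node queues each read as a work item
// for libuv's thread pool (4 threads by default); the pool drains the queue,
// so all 100 reads complete without the process ever holding 100 threads.
const fs = require('fs');

const files = Array.from({ length: 100 }, (_, i) => `./data/file-${i}.txt`); // made-up paths
let pending = files.length;

console.time('all reads');
for (const path of files) {
  fs.readFile(path, (err, buf) => {
    if (err) console.error(err.message);
    if (--pending === 0) console.timeEnd('all reads');
  });
}
```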
How exactly is this faster than a multi-threaded process?
The simple answer is, sometimes it isn't. No platform/language/compiler is best for every conceivable scenario.
However, sometimes it is faster. Dealing with many threads has its own problems (e.g. threads sharing CPU cores, thread deadlocks, race conditions, etc.). In some cases Node's approach is faster because it doesn't carry the overhead of dealing with those issues. In other cases it might not be faster.
That being said, there are things you can do (e.g. worker threads) to tailor Node.js to your circumstances if you find you are CPU limited. On web servers it is fairly common to run as many worker threads as there are CPU cores (or cores minus one, to leave a core free for the OS), as sketched below.
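For illustration, here is a minimal sketch of that pattern using Node's built-in worker_threads module (Node 12+); the inline worker script and the busy-work payload are just placeholders for real CPU-bound tasks:

```js
// Sketch of "one worker per core": spawn workers equal to the core count
// (minus one, leaving a core for the OS) and hand each a CPU-bound task.
const { Worker } = require('worker_threads');
const os = require('os');

const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => {
    let sum = 0;                      // deliberately CPU-bound busy work
    for (let i = 0; i < n; i++) sum += i;
    parentPort.postMessage(sum);
  });
`;

const poolSize = Math.max(1, os.cpus().length - 1);
const workers = Array.from({ length: poolSize },
  () => new Worker(workerSource, { eval: true }));

workers.forEach((w, i) => {
  w.once('message', (result) => {
    console.log(`worker ${i} finished:`, result);
    w.terminate();
  });
  w.postMessage(1e8); // placeholder amount of work
});
```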
Related
Since Node.js 10.5, worker threads have been available, which makes Node.js a multi-threaded environment.
Previously, with only one thread in Node.js, there was no CPU time slicing happening, because of the event-driven nature (if I understand correctly).
So now, with multiple threads in Node but only one physical CPU core, how do they share the CPU? Does the OS scheduler allot time to each thread to run for varying amounts of time, or what?
Worker threads are advertised as
The Worker class represents an independent JavaScript execution thread.
So it is something like starting another NodeJS instance, but within the same process and with a bare-minimum communication channel.
Worker threads in NodeJS mimic the Worker API of modern browsers (not a coincidence: NodeJS is basically a browser without a UI and with a few extra JS APIs), and in that context worker threads really are native threads, scheduled by the OS.
The description quoted above seems to imply that in NodeJS too, worker threads are implemented with native threads rather than with scheduling managed by NodeJS itself.
The latter would be useless, as that is exactly what the JS event loop coupled with async methods already does.
So basically a worker thread is just another "instance" (context) of NodeJS run by another native thread in the same process.
Being a native thread, it is managed and scheduled by the OS. And just as you can run more than one program on a single CPU, you can do the same with threads (fun fact: in many OSes, threads are the only schedulable entities; programs are just a group of threads with a common address space and other attributes).
As NodeJS is open source, it is easy to confirm this; see the Worker::StartThread and Worker::Run functions.
The new thread will execute JS code just like the main one but it has been limited in the way it can interact with the environment (particularly the process itself).
This is in line with the JS approach to multithreading, where it is more "two or more message loops" than real multithreading (where threads are free to interact with each other, with all the implications at the architectural level).
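To illustrate the "another instance of NodeJS run by another native thread in the same process" point, here is a minimal sketch (the same file spawns itself as a worker); note that the module-level variable is not shared between the two threads, and the only link between them is the message channel:

```js
// Sketch: each thread gets its own JS context, so `counter` is a separate
// copy per thread; the threads only communicate through messages.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

let counter = 0; // not shared memory -- one copy per thread

if (isMainThread) {
  counter = 42;
  const worker = new Worker(__filename, { workerData: 'hello from main' });
  worker.on('message', (msg) => console.log('main received:', msg));
} else {
  // This branch runs on a separate native thread, scheduled by the OS.
  parentPort.postMessage(`worker sees counter=${counter}, workerData=${workerData}`);
}
```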
I've just started to use the npm threads package to move parts of my node server code into separate threads.
What I'm curious about is if I can specify that a certain thread gets a minimum share of the CPU that other threads aren't allowed to steal from.
My worry is that if my main loop spawns separate threads running intensive algorithms (to keep the intensive code from blocking other server operations), those threads may end up hogging so much CPU that the main loop still slows down significantly anyway.
I'd want to set a minimum CPU share for the main loop so that it can still do its work lag-free regardless of the other threads I've spawned.
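For context, here is roughly the shape of what I'm doing, sketched with Node's built-in worker_threads rather than the threads package (the heavy loop stands in for my real algorithm):

```js
// Sketch: keep the intensive algorithm off the main event loop. How much CPU
// each thread actually gets is decided by the OS scheduler, not by Node.
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const tick = setInterval(() => console.log('main loop still responsive'), 200);
  const worker = new Worker(__filename);
  worker.on('message', (result) => {
    console.log('heavy result:', result);
    clearInterval(tick);
  });
} else {
  // Placeholder for the intensive algorithm.
  let acc = 0;
  for (let i = 0; i < 2e9; i++) acc = (acc + i) % 1e9;
  parentPort.postMessage(acc);
}
```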
I have a problem using PhantomJS with web server module in a multi-threaded way, with concurrent requests.
I am using PhantomJS 2.0 to create highstock graphs on the server-side with Java, as explained here (and the code here).
It works well, and when testing graphs of several sizes, I got results that are pretty consistent, about 0.4 seconds to create a graph.
The code that I linked to was originally published by the highcharts team, and it is also used in their export server at http://export.highcharts.com/. In order to support concurrent requests, it keeps a pool of spawned PhantomJS processes, and basically its model is one phantomjs instance per concurrent request.
I saw that the webserver module supports up to 10 concurrent requests (explained here), so I thought I could tap into that to keep a smaller number of PhantomJS processes in my pool. However, when I tried to use more threads, I experienced a linear slow-down, as if PhantomJS were using only one CPU. The slow-down is as follows (for a single PhantomJS instance):
1 client thread, average request time 0.44 seconds.
2 client threads, average request time 0.76 seconds.
4 client threads, average request time 1.5 seconds.
Is this a known limitation of PhantomJS? Is there a way around it?
(question also posted here)
Is this a known limitation of PhantomJS?
Yes, it is an expected limitation, because PhantomJS uses the same WebKit engine for everything, and since JavaScript is single-threaded, every request is effectively handled one after the other (possibly interleaved), but never at the same time. The average overall time therefore increases linearly with each additional client.
The documentation says:
There is currently a limit of 10 concurrent requests; any other requests will be queued up.
There is a difference between the notions of concurrent and parallel requests. Concurrent simply means that the tasks overlap in time and finish in a non-deterministic order; it doesn't mean that the instructions the tasks are made of are executed in parallel on different (virtual) cores.
Is there a way around it?
Other than running your server tasks through child_process, no. The way JavaScript supports multi-threading is by using Web Workers, but a worker is sandboxed and has no access to require and therefore cannot create pages to do stuff.
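As a rough sketch of the child_process route (the phantomjs binary and the render.js script are placeholders for whatever your setup uses), parallelism comes from separate OS processes rather than from threads inside one PhantomJS instance:

```js
// Sketch: one PhantomJS process per render, run from Node via child_process.
const { execFile } = require('child_process');

function renderChart(configPath) {
  return new Promise((resolve, reject) => {
    execFile('phantomjs', ['render.js', configPath], (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout);
    });
  });
}

// Render four charts in parallel, each in its own PhantomJS process.
Promise.all(['a.json', 'b.json', 'c.json', 'd.json'].map(renderChart))
  .then((results) => console.log('rendered', results.length, 'charts'))
  .catch(console.error);
```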
I am building a calculator that makes lots of XHR calls, and I was wondering whether placing these in a web worker, synchronously, would still lock the browser. My understanding is that they are handled on a different thread and shouldn't.
(I've built the algorithm asynchronously before; it's just very hard code to maintain, and I'm only looking at this option to keep the code more maintainable. I understand why it shouldn't be synchronous outside of a web worker.)
Without another processor available, it won't be as bad as it would be without Web Workers, because the OS can round-robin schedule the two threads so they run interleaved even on one processor.
And with another processor available, the OS would ideally schedule the worker to run on it, and both threads would run at full speed.
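As a minimal sketch of the idea (the worker file name and the endpoints are made up), the synchronous XHR blocks only the worker's thread, never the page:

```js
// main.js -- the page stays responsive; blocking happens inside the worker.
const worker = new Worker('calc-worker.js');
worker.onmessage = (e) => console.log('results:', e.data);
worker.postMessage(['/api/step1', '/api/step2']);

// calc-worker.js -- synchronous XHR is still allowed here.
self.onmessage = (e) => {
  const results = e.data.map((url) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url, false); // third argument false = synchronous
    xhr.send();
    return xhr.responseText;
  });
  self.postMessage(results);
};
```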
I'm writing an application that makes heavy use of the http.request method.
In particular, I've found that sending 16+ simultaneous requests of ~30 KB each really bogs down a Node.js instance on a machine with 512 MB of RAM.
I'm wondering if this is to be expected, or if Node.js is just the wrong platform for outbound requests.
Yes, this behavior seems perfectly reasonable.
I would be more concerned if it were doing the work you described without any noticeable load on the system (in which case it would take a very long time). Remember that Node is just an evented I/O runtime, so you can have faith that it is scheduling your I/O requests (about) as quickly as the underlying system can; hence it is using the system to (nearly) its maximum potential, hence the system being "really bogged down".
One thing you should be aware of is that http.request does not create a new socket for each call. Each request goes through an object called an "agent", which contains a pool of up to 5 sockets. If you are using the v0.6 branch, you can raise this limit with:
http.globalAgent.maxSockets = Infinity
Try that and see if it helps.
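For what it's worth, on current Node versions you can also give your outbound requests a dedicated agent with its own socket pool rather than touching the global one; a rough sketch (the host and path are placeholders):

```js
// Sketch: a dedicated agent with an explicit socket pool, so many requests
// to the same host can be in flight at once.
const http = require('http');

const agent = new http.Agent({ keepAlive: true, maxSockets: 20 });

for (let i = 0; i < 16; i++) {
  http.get({ host: 'example.com', path: '/big-payload', agent }, (res) => {
    res.resume(); // drain the body so the socket is returned to the pool
    res.on('end', () => console.log(`request ${i} done: ${res.statusCode}`));
  }).on('error', console.error);
}
```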