I don't understand several things about Node.js. Every source of information says that Node.js is more scalable than standard threaded web servers because it avoids thread locking and context switching, but I wonder: if Node.js doesn't use threads, how does it handle concurrent requests in parallel? What does the evented I/O model mean?
Your help is much appreciated.
Thanks
Node is completely event-driven. Basically the server consists of one thread processing one event after another.
A new request coming in is one kind of event. The server starts processing it and when there is a blocking IO operation, it does not wait until it completes and instead registers a callback function. The server then immediately starts to process another event (maybe another request). When the IO operation is finished, that is another kind of event, and the server will process it (i.e. continue working on the request) by executing the callback as soon as it has time.
So the server never needs to create additional threads or switch between threads, which means it has very little overhead. If you want to make full use of multiple hardware cores, you just start multiple instances of Node.js.
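A minimal sketch of the idea, assuming `setTimeout` as a stand-in for a real I/O call such as a database query (`fakeIo` and `handleRequest` are hypothetical names). Both "requests" are started before either finishes, and each one is completed later by a callback running on the same single thread:

```javascript
// Hypothetical simulation: setTimeout stands in for a slow I/O call.
// fakeIo and handleRequest are illustrative names, not Node APIs.
const log = [];

function fakeIo(ms, callback) {
  // Registers the callback and returns immediately; nothing waits here.
  setTimeout(callback, ms);
}

function handleRequest(id, ioMs) {
  log.push(`start ${id}`);
  fakeIo(ioMs, () => {
    // Runs later, on the same thread, when the "I/O finished" event is processed.
    log.push(`finish ${id}`);
  });
}

handleRequest('A', 50); // slow I/O
handleRequest('B', 10); // fast I/O
log.push('main script done');

process.on('exit', () => console.log(log.join('\n')));
```

Request B finishes before request A even though A started first: the single thread simply processes whichever completion event arrives next.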
Update
At the lowest level (C++ code, not Javascript), there actually are multiple threads in Node.js: there is a pool of worker threads that carry out blocking operations (such as file system calls) and put the corresponding completion events into the queue to be processed by the main thread. This keeps the main thread from being blocked.
Although this question was answered a long time ago, I'd like to share my thoughts on it.
Node.js is a single-threaded JavaScript runtime environment. Its creator, Ryan Dahl, was concerned that parallel processing with multiple threads is either not the right approach or too complicated.
if Node.js doesn't use threads how does it handle concurrent requests in parallel
Ans: It's not correct to say that it doesn't use threads. Node.js does use threads, but in a smart way: it uses a single thread to serve all the HTTP requests and multiple threads in a thread pool (in libuv) to handle blocking operations.
Libuv: A library to handle asynchronous I/O.
What does the evented I/O model mean?
Ans: The right term is non-blocking I/O. As the official Node.js site says, it almost never blocks. When a request reaches the Node server, it does not wait in line behind other requests: the server picks it up and starts executing it. If the code reaches a blocking operation, that operation is sent to the worker threads and a callback is registered for it. As soon as the operation finishes, the callback is placed in the event queue, where the event loop processes it; the server then creates the response and sends it to the respective client.
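A small sketch of that flow, assuming `crypto.pbkdf2` as the "blocking operation" (it is one of the built-in functions Node runs on the libuv thread pool). The hashing happens off the main thread, while the completion callback still runs on the main thread, and only after the current synchronous code has finished:

```javascript
const crypto = require('node:crypto');

const events = [];

// crypto.pbkdf2 is executed on the libuv thread pool, not on the main thread.
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  // This callback is queued as an event and run by the event loop on the main thread.
  events.push(`hash ready (${key.length} bytes)`);
});

// The main thread is not blocked: this line runs before the hash is ready.
events.push('main thread keeps going');

process.on('exit', () => console.log(events));
```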
Node.js is a JavaScript runtime environment. Google Chrome and Node.js both run on the V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, and Node.js applications use a single-threaded event loop architecture to handle concurrent clients. Strictly speaking, only the main event loop is single-threaded; some of the I/O work (such as file system access) happens on separate threads, because the I/O APIs in Node.js are asynchronous/non-blocking by design in order to accommodate the main event loop.
Consider a scenario where we request the details of user1 and user2 from a backend database and then print them to the console. Each response takes time, but the two requests can be carried out independently and at the same time.
When 100 people connect at once, rather than using different threads, Node loops over those connections and fires off any events your code should know about. If a connection is new, it tells you. If a connection has sent you data, it tells you. If a connection isn't doing anything, Node skips over it rather than spending precious CPU time on it. Everything in Node is based on responding to these events. As a result, the CPU stays focused on that one process and doesn't have a bunch of threads competing for its attention. Node.js applications also don't have to buffer a whole response before sending it; they can output data in chunks.
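The user1/user2 scenario can be sketched like this. `fetchUser` is a hypothetical database call, and the 100 ms latency is simulated with `setTimeout`; the point is that both requests are in flight at the same time, so the total wait is roughly one request's latency rather than two:

```javascript
// Hypothetical database lookup: fetchUser and its data are illustrative only;
// setTimeout simulates the latency of a real query.
function fetchUser(id) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ id, name: `User ${id}` }), 100);
  });
}

async function main() {
  const start = Date.now();
  // Both requests are started before either one is awaited, so they wait concurrently.
  const [user1, user2] = await Promise.all([fetchUser(1), fetchUser(2)]);
  const elapsed = Date.now() - start;
  console.log(user1, user2);
  console.log(`took about ${elapsed} ms, not about 200 ms`);
  return { user1, user2, elapsed };
}

const result = main();
```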
Though it's been answered, I would like to share my understanding in simple terms.
Node.js uses a library called libuv, which is written in C and uses threads. These threads are called workers, and they take care of the blocking work (such as file system operations) involved in serving many client requests.
Parallel processing in Node.js is achieved with the help of two concepts:
Asynchronous
Non blocking IO
Motive
For Javascript and node.js, I am trying to understand the difference between the thread pool and Web API.
What I Currently Understand
Thread pool: a multi-thread platform, where each thread executes its own operation.
Web API: an API built into the browser. It is part of the event loop, along with the call stack and callback queue, to enable asynchronous operations in Javascript.
What I Am Confused about
It seems like both the thread pool, and Web API, enable Javascript to handle asynchronous behavior.
When Javascript is executing code off the singly-threaded call stack, is it sent to the Web API, which uses a thread pool to create a single thread for each asynchronous operation?
If not, how does the Web API and thread pool work together to give Javascript asynchronous capabilities?
A Few Things First
It's not entirely clear what you mean by "Web API". There is no such thing called that built-into nodejs. There are many, many different libraries in nodejs each with their own API (timers, networking, http requests, crypto, disk I/O, etc...).
If, what you mean is TCP networking or the http networking, then I can explain a few things.
First off, nodejs uses a cross-platform library called libuv, which is responsible for most of the interface between the nodejs environment and native code calls to the operating system. This includes things like file I/O, timers, networking and so on.
The event loop that drives much of nodejs is in libuv.
Secondly, the thread pool is a pool of native threads that are used for certain blocking operations for which nodejs wants to present an asynchronous interface to Javascript. If the interface went directly from Javascript to the blocking native code implementation, then the operation would be blocking and would block the event loop. So, to make the interface asynchronous, the operation is moved to a thread. Examples of native code operations that run in the thread pool are disk I/O and some time-consuming crypto operations.
Then, networking is accomplished by using non-blocking operating system APIs so it does not need to use native threads or the thread pool.
Your Specific Questions
It seems like both the thread pool, and Web API, enable Javascript to handle asynchronous behavior.
The design of the interfaces to native operating system things (like networking, file I/O, etc...) combined with the event loop is what enables asynchronous behavior.
When Javascript is executing code off the singly-threaded call stack, is it sent to the Web API, which uses a thread pool to create a single thread for each asynchronous operation?
Again, it's not clear what you mean by the term "Web API". There is no such thing that nodejs sends things to in order to execute things asynchronously.
If not, how does the Web API and thread pool work together to give Javascript asynchronous capabilities?
Converting your "Web API" term to "nodejs API", let's examine how one such operation works.
Let's look at fs.open(), a function to open a file. When you call fs.open(), it goes to a Javascript library built into nodejs. It checks the arguments, handles any immediate errors if the arguments are wrong and eventually calls a binding function that packages the arguments from Javascript into something that native code can handle and then calls a native code function. These arguments include a reference to the callback that was passed into fs.open().
That native code function gets the arguments into a form that C/C++ can use, gets a thread from the thread pool and then starts running in that thread an OS call to open the desired file. The native C/C++ code then immediately returns back to the Javascript portion of fs.open(), which then returns back to your Javascript, and your Javascript continues to run any successive lines of Javascript code until it eventually returns control back to the event loop.
Meanwhile the file open operation is running in another OS thread. When it completes, it inserts an event into the appropriate event queue (a part of the event loop) and returns the thread it was using back to the thread pool. This event that is inserted into the event queue includes a reference to the callback function that was originally passed into fs.open() and any data associated with a result from fs.open() (which would include either an error or the file handle).
Sometime later when control goes back to the event loop, it will see that the event queue for file I/O has a pending event and it will grab that event out of the event queue and call the completion callback associated with the event and pass it the data associated with the event. At that point, your Javascript callback will run and you will get the result of the fs.open() call in that callback.
That's an asynchronous operation in nodejs. Now, keep in mind that not every asynchronous operation uses the thread pool. Timers, for example, do not use threads. In fact, they don't even use OS-level timers, as the event loop has its own timer implementation. Networking does not use the thread pool either.
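The fs.open() walk-through can be observed from Javascript with a short sketch. `process.execPath` (the Node binary itself) is opened only because it is a file guaranteed to exist; any readable path would do. The line after the call runs first, and the callback only runs once the event loop picks up the completion event:

```javascript
const fs = require('node:fs');

const order = [];

// fs.open hands the actual open() system call to the libuv thread pool
// and returns immediately. process.execPath is just a file that surely exists.
fs.open(process.execPath, 'r', (err, fd) => {
  // Runs later, from the event loop, once the completion event is dequeued.
  if (err) throw err;
  order.push('callback: got file descriptor');
  fs.close(fd, (closeErr) => {
    if (closeErr) throw closeErr;
  });
});

// Your Javascript keeps running before the file is opened.
order.push('line after fs.open()');
```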
So I'm trying to create a system in which users can be matched with each other by specific information.
The flow I have in mind is as follows:
user 1 fills in the information and clicks "find"
at the same time, user 2 does the same as user 1
the client sends a request to the server on route /X so the server can push the client onto a (threadsafe) queue
a worker thread pulls from the queue each time and does the matching
meanwhile, the user polls route /Y on the server to get their match
the worker thread finds 2 users that match and pushes them to some (threadsafe) data structure
the next time the user polls the server (on /Y), the user gets the match and is redirected to the conversation
So, first of all, is this a good approach?
And also, is using a worker thread and a threadsafe data structure logical in Javascript (specifically Node.js and Express)? Is there an alternative or a better way to do this kind of thing?
Thanks.
This is a bad approach.
You do not need (and should not use) worker threads for your use case.
On Worker Threads
Worker Threads are isolated instances of Javascript which run as a separate thread. They are intended strictly for performing CPU-intensive work.
vs vanilla Node
But you don't need them, because Node libraries are asynchronous, which means that unless your code really is CPU-intensive, you won't see any benefit from using Worker Threads (in fact there is overhead to using them, so if they aren't needed, your code will run slower).
From the docs: "Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be."
More on Threadedness
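Javascript runs your code on a single thread, and works very well that way. Each callback runs to completion before the next one starts, so ordinary Javascript code has no need for locks or "threadsafe" data structures: no other thread can modify a shared object in the middle of your function.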
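A minimal sketch of the matching logic in plain Node, with no worker threads and no locks. `findMatch`, the criteria object and the user ids are hypothetical; in an Express app, the handler for route /X could call something like `findMatch`, and the handler for /Y could look the user up in `matches`. Turning criteria into a key with `JSON.stringify` assumes the properties are always given in the same order:

```javascript
// Hypothetical in-memory matcher: findMatch, the criteria shape and the
// user ids are illustrative, not from any library.
const waiting = new Map(); // criteria key -> id of the user waiting for a match
const matches = new Map(); // user id -> matched partner id

function findMatch(userId, criteria) {
  // Assumes criteria properties always appear in the same order.
  const key = JSON.stringify(criteria);
  const partner = waiting.get(key);
  if (partner !== undefined && partner !== userId) {
    // No lock needed: this whole function runs to completion before any
    // other request handler can touch `waiting` or `matches`.
    waiting.delete(key);
    matches.set(userId, partner);
    matches.set(partner, userId);
    return partner;
  }
  waiting.set(key, userId);
  return null;
}

console.log(findMatch('user1', { lang: 'en', topic: 'chess' })); // null (user1 now waits)
console.log(findMatch('user2', { lang: 'en', topic: 'chess' })); // user1
```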
If you do have CPU-intensive code
If you're doing expensive regex matching, then you are right to want to run this code in parallel. Worker Threads might not be the best way to do this, though.
Splitting CPU-intensive code into separate programs is often the most flexible solution. It gives you several options:
spawn a new instance of Node to run your CPU-intensive code (on the same server)
run your CPU-intensive code in the cloud on "serverless" services, such as AWS Lambda
turn your CPU-intensive code into a "microservice", essentially a tiny webserver which does any specialized processing and returns the result
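The first option can be sketched with `child_process.spawn`, which starts a separate Node process using `process.execPath`. The CPU-heavy `job` below is a hypothetical placeholder passed with `-e` so the example fits in one file; in a real project it would usually live in its own script. While the child computes, the parent's event loop stays free to handle other requests:

```javascript
const { spawn } = require('node:child_process');

// Hypothetical CPU-heavy job, run in a separate Node process via -e.
// In a real project this would usually be its own script file.
const job = `
  let total = 0;
  for (let i = 0; i < 5e7; i++) total += i % 7;
  process.stdout.write(String(total));
`;

const child = spawn(process.execPath, ['-e', job]);

let output = '';
child.stdout.on('data', (chunk) => { output += chunk; });
child.on('close', (code) => {
  // The parent's event loop stayed free while the child was computing.
  console.log(`child exited with ${code}, result = ${output}`);
});
```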
Further Reading
How Node's asynchronicity works (big picture)
https://blog.insiderattack.net/event-loop-and-the-big-picture-nodejs-event-loop-part-1-1cb67a182810
What kinds of operations block the event loop and how to avoid it
https://nodejs.org/uk/docs/guides/dont-block-the-event-loop/
I've primarily programmed in other programming languages, but I have been making a web app using NodeJS and have come across a few things that I can't quite get my head around.
I referred to https://nodejs.org/api/cluster.html#cluster_how_it_works
I found that this explained, well, how NodeJS can cope with large numbers of requests despite Node only being single threaded. However, what confuses me is when it says a port is shared among 'many workers'.
Now, if Node is not multithreaded, then what exactly are these workers? In Java, for example, you can write multithreaded applications using CompletableFutures, which cause different threads to take responsibility.
But what is a worker in node if not a thread?
Node can easily handle 10,000 concurrent connections in a single thread (see this answer for details). For some things that are blocking it uses a thread pool but this is transparent to you. Your JavaScript uses a single-threaded event loop in every process.
Keep in mind that nginx, a web server that is known for speed, is also single-threaded. Redis, a database that is known for speed, is also single-threaded. Multi-threading is good for CPU-bound tasks (when you use one thread per CPU), but for the I/O-bound tasks that Node is usually used for, single-threaded event loops work better.
Now, to answer your question - in the context of clusters that the website that you linked to is talking about, a worker is a single process. Every one of those processes still has one single-threaded event loop but there can be many of those processes executing at the same time.
See those answers for more details:
Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?
what is mean by event loop in node.js ? javascript event loop or libuv event loop?
How many clients can an http-server can handle?
I am learning Node.js and I have read that Node.js is single threaded and non-blocking.
I have a good background in JavaScript and I do understand the callbacks, but what I don't really understand is how Node.js can be single threaded and run code in the background. Isn't that contradictory?
Because if Node.js is single threaded, it can still only perform one task at a time. So if it runs something in the background, it has to stop the current task to process something in the background, right?
How does that work practically?
What "in the background" really means in terms of NodeJS is that things get put on a todo list for later. Whenever Node is done with what it's doing it picks from the top of the todo list. This is why doing anything that actually IS blocking can wreck your day. Everything that's happening "in the background" (actually just waiting on the todo list) gets stopped until the blocking task is complete.
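A small sketch of how blocking "wrecks your day": a timer is scheduled for 0 ms, but a synchronous busy loop (a stand-in for any blocking task) keeps the todo list from being processed, so the callback runs only after the loop ends, about 200 ms later:

```javascript
const start = Date.now();
let timerDelay = null;

// Ask for a callback "in the background" as soon as possible (0 ms).
setTimeout(() => {
  timerDelay = Date.now() - start;
  console.log(`timer ran after ${timerDelay} ms`);
}, 0);

// A synchronous busy loop: nothing on the todo list can run until it ends.
while (Date.now() - start < 200) {
  // spin
}
console.log('blocking work finished');
```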
Lucas explained it well, but I would like to add that it is possible to add "nodes" via some cluster libraries if you want to take advantage of multiple processors.
https://www.npmjs.com/package/cluster
https://www.npmjs.com/package/pm2
A tutorial to do a cluster: http://blog.carbonfive.com/2014/02/28/taking-advantage-of-multi-processor-environments-in-node-js/
Some hosting providers, like Heroku, will give you 'scalability' options.
Anyway, when you use MongoDB with NodeJS (via Mongoose, for example), it creates multiple connections.
NOTE: The advantage of being monothreaded is the ability to handle millions of users. With a legacy multithreaded server (Apache), you create a thread for EACH user, so you need really BIG servers to handle thousands of people.
While the JavaScript engine is monothreaded, there are multiple threads "in the background" that deal with all the non-blocking I/O work.
Specifically, libuv has a pool of worker threads waiting on OS events, I/O signals, running C++ code, etc. The size of this pool (4 by default) is determined by the UV_THREADPOOL_SIZE environment variable.
No JavaScript code ever runs "in the background". JavaScript functions (i.e. callbacks) are scheduled to run later on the main event loop, either by other JS functions or directly by the libuv workers. If the loop is blocked, then everything scheduled has to wait for it.
In fact, Node.js is not exactly monothreaded. Node.js uses one "main thread", which is the thread where your script is executed. This main thread must never be blocked, so long-running operations are executed in separate threads. For example, Node.js uses the libuv library, which maintains a pool of threads used to perform I/O.
Node.js servers are very efficient at handling I/O and large numbers of client connections. But why is Node.js not suitable for CPU-heavy apps compared to a traditional multithreaded server?
I read it here Felix Baumgarten
Node is, despite its asynchronous event model, by nature single threaded. When you launch a Node process, you are running a single process with a single thread on a single core. So your code will not be executed in parallel; only I/O operations are parallel, because they are executed asynchronously. As such, long-running CPU tasks will block the whole server and are usually a bad idea.
Since you just start a Node process like that, it is possible to have multiple Node processes running in parallel, though. That way you can still benefit from your multithreaded architecture, even though a single Node process does not. You would just need a load balancer in front that distributes requests across all your Node processes.
Another option would be to have the CPU work in separate processes and make Node interact with those instead of doing the work itself.
Related things to read:
Node.js and CPU intensive requests
Understanding the node.js event loop
A simple Node.js server is single-threaded, meaning that any operation that takes a long time to execute will block the rest of your program from running. Node.js apps manage to maintain a high level of concurrency by working as a series of events. When an event handler is waiting for something to happen (such as reading from the database), it tells Node to go ahead and process another event in the meantime. But since a single thread can only execute one instruction at a time, this approach can't save you from a function that needs to keep actively executing for a long time. In a multithreaded architecture, even if one function takes a long time to compute the result, other threads can still process other requests — and as long as you have a core that is not fully used at the time, there's a good chance they can do it about as quickly as if no other requests were running at all.
In order to deal with this, production Node.js apps that expect to hog a lot of CPU will usually be run in clusters. This means that instead of having several threads in one program's memory space, you run several instances of the same program under the control of one "master" instance. Each process is single-threaded, but since you have several of them, you end up gaining the benefits of multiple threads.
Node is great for asynchronous tasks, because Node runs their blocking parts (such as file I/O) on its worker pool. But CPU-intensive tasks (where you use the CPU heavily) are a different story. For example, if you have a billion users and want to sort them by name, that is quite an intensive task, and it is synchronous, so it blocks other code from running.
So it's not a good idea to use Node for these kinds of applications. Technically, you can find alternatives for those kinds of tasks: the example above is better handled in the database, with the sorted result then passed back to Node.
In the same way, avoid intensive tasks and keep your CPU cool for better performance.
You can have a look at this package, the-computer, which may help you do some CPU-intensive work in a single instance of a Node.js app in a simple way.
It is definitely not as efficient as raw C++ libraries, but it can cover most general computing cases, keeping you in the Node.js garden while allowing you to leverage the cores of the CPU.
Node.js runs JavaScript code in a single thread, which means that your code can only do one task at a time. However, Node.js itself is multithreaded and provides hidden threads through the libuv library, which handles I/O operations like reading files from a disk or network requests. Through the use of hidden threads, Node.js provides asynchronous methods that allow your code to make I/O requests without blocking the main thread.
Although Node.js has hidden threads, you cannot use them to offload CPU-intensive tasks, such as complex calculations, image resizing, or video compression. Since JavaScript is single-threaded, when a CPU-intensive task runs it blocks the main thread, and no other code executes until the task completes. Without using other threads, the only way to speed up a CPU-bound task is to increase the processor speed.
💡 Node.js introduced the worker_threads module, which allows you to create threads and execute multiple JavaScript tasks in parallel. Once a thread finishes a task, it sends a message to the main thread that contains the result of the operation so that it can be used by other parts of the code. The advantage of using worker threads is that CPU-bound tasks don't block the main thread, and you can divide a task and distribute it across multiple workers to speed it up.
ref: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
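A minimal sketch of dividing a task across workers with the worker_threads module. The summing job is only a stand-in for real CPU-bound work, and the worker code is passed as a string with `{ eval: true }` so the example fits in one file (`runSlice` is a hypothetical helper name). Each worker sums half of the range on its own thread and posts its result back to the main thread:

```javascript
const { Worker } = require('node:worker_threads');

// Worker code passed as a string with { eval: true } so the example is one file;
// summing a range is just a stand-in for CPU-bound work.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = workerData.from; i < workerData.to; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs one slice of the work on a new thread.
function runSlice(from, to) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { from, to } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function main() {
  const N = 10_000_000;
  // Split 0..N-1 into two halves and sum each half on its own thread.
  const parts = await Promise.all([runSlice(0, N / 2), runSlice(N / 2, N)]);
  const total = parts[0] + parts[1];
  console.log(`sum of 0..${N - 1} = ${total}`);
  return total;
}

const totalPromise = main();
```

Creating a worker has its own startup cost, so splitting only pays off when each slice does substantially more work than this toy sum.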