I've primarily programmed in other languages, but I have been making a webapp using NodeJS and have come across a few things that I can't quite get my head around.
I referred to https://nodejs.org/api/cluster.html#cluster_how_it_works
I found that this explained, well, how NodeJS can cope with large numbers of requests despite Node only being single threaded. However, what confuses me is when it says a port is shared among 'many workers'.
Now, if Node is not multithreaded, then what exactly are these workers? In Java, for example, you can have multithreaded applications using CompletableFuture, which causes different threads to take responsibility for different tasks.
But what is a worker in node if not a thread?
Node can easily handle 10,000 concurrent connections in a single thread (see this answer for details). For some things that are blocking it uses a thread pool but this is transparent to you. Your JavaScript uses a single-threaded event loop in every process.
Keep in mind that nginx, a web server that is known for speed, is also single-threaded. Redis, a database that is known for speed, is also single-threaded. Multi-threading is good for CPU-bound tasks (when you use one thread per CPU), but for the I/O-bound tasks that Node is usually used for, single-threaded event loops work better.
Now, to answer your question - in the context of clusters that the website that you linked to is talking about, a worker is a single process. Every one of those processes still has one single-threaded event loop but there can be many of those processes executing at the same time.
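To make that concrete, here is a minimal sketch of such a cluster. The port number and the one-worker-per-core choice are my own assumptions for illustration, not something taken from the linked page:

```js
// cluster-sketch.js - each "worker" is a separate Node *process*.
// (Port 8000 and forking one worker per CPU core are arbitrary choices.)
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // The master process forks one worker process per CPU core.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker has its own single-threaded event loop, yet they all
  // share port 8000: the master distributes incoming connections to them.
  http.createServer((req, res) => {
    res.end(`Handled by worker pid ${process.pid}\n`);
  }).listen(8000);
}
```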
See those answers for more details:
Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?
what is mean by event loop in node.js ? javascript event loop or libuv event loop?
How many clients can an http-server can handle?
Related
I always believed that JS was a single-threaded language, which makes it inefficient for CPU-intensive tasks. I recently came across worker threads and how they solve this inefficiency by creating "multiple worker threads under one process". What's the difference between a process and a thread? Why is JS all of a sudden capable of spawning multiple worker threads that help and interact with the main JS thread to enable concurrency? Could you help me understand this topic in layman's terms? Thank you
Starting in Node v10, they introduced WorkerThreads. A WorkerThread is an entirely new instance of the V8 Javascript interpreter. It has its own set of variables, its own globals and its own thread running Javascript. You cannot directly share regular Javascript variables between the main thread and a workerThread or between workerThreads.
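A rough sketch of spawning a worker and exchanging messages is shown below (this assumes Node 12+; in Node 10 the worker_threads module needed the --experimental-worker flag, and the summing loop is just a stand-in for real work):

```js
// worker-sketch.js - run with `node worker-sketch.js`.
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Spawn a new V8 instance with its own event loop and its own globals.
  const worker = new Worker(__filename);
  worker.on('message', (msg) => console.log('from worker:', msg));
  worker.postMessage({ n: 40 });
} else {
  // This branch runs inside the worker; `parentPort` talks to the main thread.
  parentPort.once('message', ({ n }) => {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += i;   // stand-in for CPU-bound work
    parentPort.postMessage(sum);
  });
}
```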
You can directly share memory if it is specifically allocated as shared memory, such as a SharedArrayBuffer, but when doing so, you open yourself up to race conditions between the two threads both accessing the shared memory. So, you have to either use Atomics or your own concurrency management scheme to prevent race conditions when modifying the shared memory.
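A hedged sketch of what that could look like, using a single Int32 counter as the shared memory (the iteration counts are arbitrary):

```js
// shared-memory-sketch.js - both threads increment the same counter.
const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  // One Int32 slot of shared memory, visible to both threads without copying.
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);

  const worker = new Worker(__filename, { workerData: shared });
  worker.on('exit', () => {
    // With Atomics there are no lost updates, so this prints 200000.
    console.log('final value:', Atomics.load(counter, 0));
  });

  for (let i = 0; i < 100000; i++) Atomics.add(counter, 0, 1);
} else {
  // The worker sees the *same* memory; Atomics.add avoids race conditions.
  const counter = new Int32Array(workerData);
  for (let i = 0; i < 100000; i++) Atomics.add(counter, 0, 1);
}
```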
The main thread and workerThreads can send messages to each other and those messages can contain some types of data structures that will be copied via a structured cloning mechanism and sent to the other V8 instance.
The idea behind workerThreads is that they are useful for getting CPU-intensive code out of your main event loop (particularly useful for servers), so you can fire up one or more workerThreads to handle CPU-intensive work and keep the main thread's event loop free and responsive to incoming events/networking/etc...
You can also do something similar by creating multiple nodejs processes. But, a process is a heavier-weight thing than a workerThread and workerThreads allow you to share memory with SharedMemory whereas separate processes do not.
From the Mozilla documentation:
Web Workers is a simple means for web content to run scripts in background threads.
Considering Javascript is single-threaded, are web workers separate threads or processes? Is there shared memory that classifies them as threads?
They run in background threads, but the API completely abstracts from the implementation, so you may come across a browser that just schedules them to run on the same thread as other events like Node does. Processes are too heavyweight to run background tasks.
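For illustration, a minimal dedicated Web Worker might look like the sketch below; the file names and the prime-counting workload are made up for this example, not taken from MDN:

```js
// main.js (runs on the page's main thread) - file names are illustrative.
const worker = new Worker('prime-worker.js');
worker.onmessage = (e) => console.log('worker says:', e.data);
worker.postMessage(500000);

// prime-worker.js (runs in the background, off the main thread)
onmessage = (e) => {
  let count = 0;
  for (let i = 2; i <= e.data; i++) {          // deliberately CPU-heavy
    let prime = true;
    for (let j = 2; j * j <= i; j++) {
      if (i % j === 0) { prime = false; break; }
    }
    if (prime) count++;
  }
  postMessage(count);                          // the page stays responsive meanwhile
};
```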
Considering Javascript is single-threaded
JavaScript is not single-threaded.
The main part of a JavaScript program runs on an event loop.
Long-running operations (XMLHttpRequest being the classic example) are almost always farmed out to stuff that runs outside the event loop (often on different threads).
Web Workers are just a means to write JavaScript that runs outside the main event loop.
are web workers separate threads or processes? Is there shared memory that classifies them as threads?
That's an implementation detail of the particular JS engine.
As per MDN:
The Worker interface spawns real OS-level threads, and mindful programmers may be concerned that concurrency can cause “interesting” effects in your code if you aren't careful.
Reference: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers#about_thread_safety
The documentation does not define whether the web worker runs in a separate thread or process (or another similar construct). So, depending on the hardware architecture of the processor on which the program is executed, the Operating System and the implementation of the JavaScript engine used, it may be different.
However, I guess that the essence of this question is: Can the Operating System use multiple CPU cores by using web workers? If so, the answer is: YES!!! Even regardless of the implementation of the JavaScript engine!
As long as the processor has multiple cores and the operating system can make use of them, these threads will be able to run on different cores even if the Web Worker's script is executed within another thread of the same process, because a "process" is an operating-system construct that can itself run on several processor cores, just as several processes can run on a single core.
P.S. If you want the code to be executed 100% in another process, delegate it to another service (e.g. running on a different server).
I am learning Node.js and I have read that Node.js is single threaded and non-blocking.
I have a good background in JavaScript and I do understand the callbacks, but what I don't really understand is how Node.js can be single threaded and run code in the background. Isn't that contradictory?
Because if Node.js is single threaded it can still only perform one task at the time. So if it runs something in the background it has to stop the current task to process something in the background, right?
How does that work practically?
What "in the background" really means in terms of NodeJS is that things get put on a todo list for later. Whenever Node is done with what it's doing it picks from the top of the todo list. This is why doing anything that actually IS blocking can wreck your day. Everything that's happening "in the background" (actually just waiting on the todo list) gets stopped until the blocking task is complete.
Lucas explained it well, but I would like to add that it is possible to add worker "nodes" via some cluster libraries if you want to take advantage of your processors.
https://www.npmjs.com/package/cluster
https://www.npmjs.com/package/pm2
A tutorial to do a cluster: http://blog.carbonfive.com/2014/02/28/taking-advantage-of-multi-processor-environments-in-node-js/
Some hosts will give you 'scalability' options, like Heroku.
Anyway, when you use MongoDB with NodeJS (via Mongoose for example), it creates multiple connections.
NOTE: The advantage of being single-threaded is being able to handle millions of users. With a legacy multithreaded server (Apache), you create a thread for EACH user, so you need really BIG servers to handle thousands of people.
While the JavaScript engine is monothreaded, there are multiple threads "in the background" that deal with all the non-blocking I/O work.
Specifically, libuv has a pool of worker threads waiting on OS events, I/O signals, running C++ code, etc. The size of this pool is determined by the UV_THREADPOOL_SIZE environment variable.
No JavaScript code ever runs "in the background". JavaScript functions (i.e. callbacks) are scheduled to run later on the main event loop, either by other JS functions or directly by the libuv workers. If the loop is blocked, then everything scheduled has to wait for it.
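For example, crypto.pbkdf2 is one of the calls that libuv hands to that thread pool. In this sketch (the iteration count is arbitrary), all four hashes finish at roughly the same time on a machine with at least four cores, because the default pool size is 4 and the hashing never runs on the JS thread:

```js
// threadpool-sketch.js
const crypto = require('crypto');

const start = Date.now();
for (let i = 1; i <= 4; i++) {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
    // Each callback is scheduled back onto the main event loop once a
    // libuv pool thread has finished the hashing work.
    console.log(`hash ${i} done after ${Date.now() - start} ms`);
  });
}
```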
In fact, Node.js is not exactly monothreaded. Node.js uses one "main thread", which is the thread where your script is executed. This main thread must never be blocked, so long-running operations are executed in separate threads. For example, Node.js uses the libuv library, which maintains a pool of threads used to perform I/O.
Node.js servers are very efficient concerning I/O and large number of client connection. But why is node.js not suitable for heavy CPU apps in comparison to a traditional multithreading server?
I read it here Felix Baumgarten
Node is, despite its asynchronous event model, by nature single threaded. When you launch a Node process, you are running a single process with a single thread on a single core. So your code will not be executed in parallel, only I/O operations are parallel because they are executed asynchronous. As such, long running CPU tasks will block the whole server and are usually a bad idea.
That said, since a Node process is just an ordinary process, it is possible to have multiple Node processes running in parallel. That way you can still benefit from a multi-core architecture, even though a single Node process does not. You would just need some load balancer in front that distributes requests among all your Node processes.
Another option would be to have the CPU work in separate processes and make Node interact with those instead of doing the work itself.
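A rough sketch of that approach using child_process.fork; the file names and the fib(42) workload are invented for illustration:

```js
// parent.js - keep the server's event loop free by forking a helper process.
const { fork } = require('child_process');

const helper = fork('./cpu-helper.js');          // hypothetical helper script
helper.on('message', (result) => console.log('got result:', result));
helper.send({ n: 42 });

// cpu-helper.js - a separate Node process that does the heavy work.
process.on('message', ({ n }) => {
  const fib = (x) => (x < 2 ? x : fib(x - 1) + fib(x - 2));
  process.send(fib(n));    // sent back over the built-in IPC channel
});
```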
Related things to read:
Node.js and CPU intensive requests
Understanding the node.js event loop
A simple Node.js server is single-threaded, meaning that any operation that takes a long time to execute will block the rest of your program from running. Node.js apps manage to maintain a high level of concurrency by working as a series of events. When an event handler is waiting for something to happen (such as reading from the database), it tells Node to go ahead and process another event in the meantime. But since a single thread can only execute one instruction at a time, this approach can't save you from a function that needs to keep actively executing for a long time. In a multithreaded architecture, even if one function takes a long time to compute the result, other threads can still process other requests — and as long as you have a core that is not fully used at the time, there's a good chance they can do it about as quickly as if no other requests were running at all.
In order to deal with this, production Node.js apps that expect to hog a lot of CPU will usually be run in clusters. This means that instead of having several threads in one program's memory space, you run several instances of the same program under the control of one "master" instance. Each process is single-threaded, but since you have several of them, you end up gaining the benefits of multiple threads.
Node is great if you have asynchronous tasks, because JavaScript runs those via the worker pool. But if you run CPU-intensive tasks (where you heavily use the CPU), for example sorting a billion users by name, that is quite an intensive task, and it is synchronous, which will block other code from running.
So it's not a good idea to use Node for these kinds of applications. Technically you can find alternatives to address those kinds of tasks; the example above is better handled in a database, and then passing back that result works well.
In the same way, avoid CPU-intensive tasks in Node and keep your CPU free for better performance.
You can have a look at this package, the-computer, which may help you do some CPU-intensive work in a single instance of a Node.js app in a simple way.
It is definitely not as effective as raw C++ libs, but it can cover most general computing cases, keeping you in the Node.js garden while allowing you to leverage the cores of the CPU.
Node.js runs JavaScript code in a single thread, which means that your code can only do one task at a time. However, Node.js itself is multithreaded and provides hidden threads through the libuv library, which handles I/O operations like reading files from a disk or network requests. Through the use of hidden threads, Node.js provides asynchronous methods that allow your code to make I/O requests without blocking the main thread.
Although Node.js has hidden threads, you cannot use them to offload CPU-intensive tasks, such as complex calculations, image resizing, or video compression. Since JavaScript is single-threaded when a CPU-intensive task runs, it blocks the main thread and no other code executes until the task completes. Without using other threads, the only way to speed up a CPU-bound task is to increase the processor speed.
💡 Node.js introduced the worker_threads module, which allows you to create threads and execute multiple JavaScript tasks in parallel. Once a thread finishes a task, it sends a message to the main thread that contains the result of the operation so that it can be used with other parts of the code. The advantage of using worker threads is that CPU-bound tasks don't block the main thread, and you can divide and distribute a task to multiple workers to optimize it.
ref: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
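To illustrate the "divide and distribute" point, here is a hedged sketch that splits one array sum across four worker threads; the array size and worker count are arbitrary:

```js
// split-work-sketch.js
const { Worker, isMainThread, workerData, parentPort } = require('worker_threads');

if (isMainThread) {
  const data = Array.from({ length: 4000000 }, (_, i) => i);
  const chunks = 4;
  const size = data.length / chunks;
  let total = 0, done = 0;

  // Give each worker its own slice of the array (copied via structured clone).
  for (let c = 0; c < chunks; c++) {
    const worker = new Worker(__filename, {
      workerData: data.slice(c * size, (c + 1) * size),
    });
    worker.on('message', (partial) => {
      total += partial;
      if (++done === chunks) console.log('sum =', total);
    });
  }
} else {
  // Each worker sums its own chunk and reports the partial result back.
  parentPort.postMessage(workerData.reduce((a, b) => a + b, 0));
}
```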
How to improve game-server performance with a multi-core CPU
My point:
One busy process keeps only one core busy; if there is only one busy process, that is very bad.
Multiple processes, each listening on a different port, improve concurrent connections.
Multiple processes can't share memory directly; they need to communicate.
They can communicate via a socket, an fp socket, or Redis.
Separate the game server into different functions, with each function a standalone process; some functions can be parallel processes.
If my points are correct, my question is:
What's the best way to communicate between processes and keep the data synchronized? "Best" means fast and simple.
I am using Node.js, but I think it's the same with C for this topic.
Edit:
Sharding or separating by function, which is better?
There are many discussions about this available on the internet, so I am just adding the links instead of repeating what they say:
Use Threads Correctly = Isolation + Asynchronous Messages
Use Thread Pools Correctly: Keep Tasks Short and Nonblocking
Break Up and Interleave Work to Keep Threads Responsive
Sharing Is the Root of All Contention
Design for Manycore Systems
These links show the best practices to create a safe and well designed parallel solution.
I would advise creating a pool of threads to handle the connections, one thread per connection. That's the common "pattern" for servers.
One method for node.js is to use ØMQ, which allows you to communicate between threads, processes, and servers with the same simple BSD sockets compatible API.
Here's the node.js binding to try:
https://github.com/JustinTulloss/zeromq.node
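A minimal push/pull sketch using that legacy zmq binding might look like this; the TCP address and message contents are arbitrary, and in practice the two sides would live in different processes or on different machines:

```js
// push-pull-sketch.js - simple one-way work distribution over ZeroMQ.
const zmq = require('zmq');

// Producer side: bind and push work items.
const push = zmq.socket('push');
push.bindSync('tcp://127.0.0.1:5555');
setInterval(() => push.send('work item'), 1000);

// Consumer side: connect and receive items as they arrive.
const pull = zmq.socket('pull');
pull.connect('tcp://127.0.0.1:5555');
pull.on('message', (msg) => console.log('received:', msg.toString()));
```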