Matching users flow in Node.js - JavaScript

So I'm trying to create a system in which users can match each other by specific information.
The flow I have in mind is as follows:
user 1 fills in the information and clicks "find"
at the same time, user 2 does the same as user 1
the client sends a request to the server on route /X so the server can push the client into a (threadsafe) queue
a worker thread pulls from the queue each time and does the matching
meanwhile, the user polls route /Y on the server to get his match
the worker thread finds that 2 users match and pushes the pair into some (threadsafe) data structure
the next time the user polls the server (on /Y), he gets the match and is redirected to the conversation
So first of all, is this a good approach?
And also, is using a worker thread and a threadsafe data structure logical in JavaScript (specifically Node.js and Express)? Is there an alternative or a better way to do this kind of thing?
Thanks.

This is a bad approach.
You do not need (and should not use) worker threads for your use case.
On Worker Threads
Worker Threads are isolated instances of JavaScript which each run as a separate thread. They are intended strictly for performing CPU-intensive work.
vs vanilla Node
But you don't need them, because Node libraries are asynchronous, which means that unless your code really is CPU-intensive, you won't see any benefit from using Worker Threads (in fact there is overhead to using them, so if they aren't needed, your code will run slower).
From the docs: "Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be."
More on Threadedness
JavaScript is single-threaded, and works very well that way. There is no concept of "threadsafe" in JavaScript, because it isn't needed; all code is threadsafe.
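For your matchmaking flow, that means a plain array and a Map are enough, and the matching can happen right inside the /X handler. Below is a minimal sketch, assuming Express; the "topic" field and the findMatch rule are made up for the example, so substitute your real criteria:

const express = require('express');
const app = express();
app.use(express.json());

const waiting = [];        // a plain array is fine: no other thread can touch it
const matches = new Map(); // userId -> matched partner's id

// hypothetical rule: match the first waiting user with the same "topic"
function findMatch(user) {
  const i = waiting.findIndex((w) => w.topic === user.topic);
  return i === -1 ? null : waiting.splice(i, 1)[0];
}

// /X: the user submits their info and enters the pool
app.post('/X', (req, res) => {
  const user = { id: req.body.id, topic: req.body.topic };
  const partner = findMatch(user);
  if (partner) {
    matches.set(user.id, partner.id);
    matches.set(partner.id, user.id);
  } else {
    waiting.push(user);
  }
  res.sendStatus(202);
});

// /Y: the user polls for a result
app.get('/Y', (req, res) => {
  const partnerId = matches.get(req.query.id);
  res.json(partnerId ? { matched: true, partnerId } : { matched: false });
});

app.listen(3000);

Because only one callback ever runs at a time, there is no window in which two requests can corrupt the queue; unless findMatch is genuinely CPU-heavy, there is nothing to offload.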
If you do have CPU-intensive code
If you're doing expensive regex matching, then you are right to want to run this code in parallel. Worker Threads might not be the best way to do this, though.
Splitting CPU-intensive code into separate programs is often the most flexible solution. It gives you several options:
spawn a new instance of Node to run your CPU-intensive code (on the same server), as sketched after this list
run your CPU-intensive code in the cloud on "serverless" services, such as AWS Lambda
turn your CPU-intensive code into a "microservice", essentially a tiny webserver which does any specialized processing and returns the result
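As a rough sketch of the first option, here is how two separate Node processes can cooperate using the built-in child_process module (the file name match-worker.js and the message shapes are made up for the example):

// main.js
const { fork } = require('child_process');

// start a separate Node process for the heavy work
const worker = fork('./match-worker.js');

worker.on('message', (result) => {
  console.log('match result:', result);
});

// hand the CPU-intensive input to the child over IPC
worker.send({ users: [{ id: 1 }, { id: 2 }] });

// match-worker.js
process.on('message', ({ users }) => {
  // ...the expensive matching/regex work happens here, off the main server...
  const result = { pairs: [[users[0].id, users[1].id]] };
  process.send(result);
});

The main process stays free to serve requests while the child burns CPU on its own core.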
Further Reading
How Node's asynchronicity works (big picture)
https://blog.insiderattack.net/event-loop-and-the-big-picture-nodejs-event-loop-part-1-1cb67a182810
What kinds of operations block the event loop and how to avoid it
https://nodejs.org/uk/docs/guides/dont-block-the-event-loop/

Related

How does JavaScript handle multiple requests while being single-threaded? [duplicate]

I don't understand several things about Node.js. Every information source says that Node.js is more scalable than standard threaded web servers due to the lack of thread locking and context switching, but I wonder: if Node.js doesn't use threads, how does it handle concurrent requests in parallel? What does the event I/O model mean?
Your help is much appreciated.
Thanks
Node is completely event-driven. Basically the server consists of one thread processing one event after another.
A new request coming in is one kind of event. The server starts processing it and when there is a blocking IO operation, it does not wait until it completes and instead registers a callback function. The server then immediately starts to process another event (maybe another request). When the IO operation is finished, that is another kind of event, and the server will process it (i.e. continue working on the request) by executing the callback as soon as it has time.
So the server never needs to create additional threads or switch between threads, which means it has very little overhead. If you want to make full use of multiple hardware cores, you just start multiple instances of Node.js.
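Here is a minimal sketch of that callback registration, using the built-in fs module (the file name data.txt is made up):

const fs = require('fs');

console.log('request received'); // one event: start processing

fs.readFile('./data.txt', 'utf8', (err, contents) => {
  // another event: runs later, once the I/O has completed
  if (err) throw err;
  console.log('file contents:', contents);
});

console.log('free to process other events'); // printed before the file contents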
Update
At the lowest level (C++ code, not Javascript), there actually are multiple threads in node.js: there is a pool of IO workers whose job it is to receive the IO interrupts and put the corresponding events into the queue to be processed by the main thread. This prevents the main thread from being interrupted.
Although this question was answered a long time ago, I'm adding my thoughts on the same topic.
Node.js is a single-threaded JavaScript runtime environment. Basically, its creator Ryan Dahl's concern was that parallel processing using multiple threads is not the right way, or is too complicated.
If Node.js doesn't use threads, how does it handle concurrent requests in parallel?
Ans: It's wrong to say it doesn't use threads. Node.js does use threads, but in a smart way: it uses a single thread to serve all the HTTP requests and multiple threads in a thread pool (in libuv) for handling any blocking operations.
Libuv: A library to handle asynchronous I/O.
What does the event I/O model mean?
Ans: The right term is non-blocking I/O. It almost never blocks, as the official Node.js site says. When a request reaches the Node server, the request as a whole is never queued. Node takes the request and starts executing it; if it hits a blocking operation, that operation is sent to the worker-thread area and a callback is registered for it. As soon as the operation finishes, the callback is triggered, placed in the event queue, and processed by the event loop; after that the response is created and sent to the respective client.
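A minimal sketch of that hand-off, using the built-in crypto module (crypto.pbkdf2 is one of the operations libuv dispatches to its thread pool, while pbkdf2Sync does the same work on the main thread):

const crypto = require('crypto');

// async version: hashing runs on a libuv thread-pool thread,
// a callback is registered, and the main thread stays free
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  console.log('async hash done:', key.toString('hex').slice(0, 16));
});
console.log('main thread was not blocked');

// sync version: the same work done on the main thread blocks everything
const key = crypto.pbkdf2Sync('secret', 'salt', 100000, 64, 'sha512');
console.log('sync hash done:', key.toString('hex').slice(0, 16));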
Node JS is a JavaScript runtime environment. Both the browser and Node JS run on the V8 JavaScript engine. Node JS uses an event-driven, non-blocking I/O model that makes it lightweight and efficient. Node JS applications use a single-threaded event loop architecture to handle concurrent clients. Its main event loop is single-threaded, but most of the I/O work runs on separate threads, because the I/O APIs in Node JS are asynchronous/non-blocking by design, in order to accommodate the main event loop.
Consider a scenario where we ask a backend database for the details of user1 and user2 and then print them on the screen/console. The response to this request takes time, but both of the user data requests can be carried out independently and at the same time.
When 100 people connect at once, rather than having different threads, Node will loop over those connections and fire off any events your code should know about. If a connection is new, it will tell you. If a connection has sent you data, it will tell you. If the connection isn't doing anything, it will skip over it rather than spending precious CPU time on it. Everything in Node is based on responding to these events. So we can see the result: the CPU stays focused on that one process and doesn't have a bunch of threads competing for attention. There is no buffering in a Node.JS application; it simply outputs the data in chunks.
Though it's been answered, I would like to share my understanding in simple terms.
Node.js uses a library called libuv, which is written in the C language and uses the concept of threads. These threads are called workers, and these workers take care of the multiple requests from clients.
Parallel processing in Node.js is achieved with the help of 2 concepts:
Asynchronous
Non-blocking I/O

Does lodash or underscore each method run in parallel in Node.js?

I know that in node everything runs in parallel, except your code. Read here and here.
I'm looking at a possible scenario where I have a very large array in memory and I want to perform a small computation on each of its elements. The order in which this computation executes is not important.
In Node all I/O is executed very efficiently because of the event loop, but when you iterate through a collection there is no I/O involved, and if this iteration takes too long you can block all incoming requests for that period of time.
This gist contains a nonBlockingForEach that Neilk wrote in his article Why you should use Node.js for CPU-bound tasks which makes me wonder if I write something like this
var my_very_large_array = [...];
my_very_large_array.forEach(function() { ... })
//or
_.each(my_very_large_array, function() { ... })
will I hit a performance bottleneck on my server? (These libraries fall back to the native forEach if present.)
From what I learned, there are a lot of libraries like async.js for doing that, but I always use lodash or underscore for those tasks in the browser.
I also tried bluebird.js but promisifying those methods didn't work as expected.
So my question is this. Is lodash or underscore a performance killer in a node.js environment when you iterate through a large collection using a forEach method?
There is a newer standard called "Web Workers" which does allow background work to happen in a separate thread in the same process. This requires later versions of node.js and a separate package (named below).
From the wiki page:
The W3C and WHATWG envision web workers as long-running scripts that are not interrupted by user-interface scripts (scripts that respond to clicks or other user interactions). Keeping such workers from being interrupted by user activities should allow Web pages to remain responsive at the same time as they are running long tasks in the background.
The simplest use of workers is for performing a computationally expensive task without interrupting the user interface.
You can enable Web workers in node.js by installing the webworker-threads package
So my question is this. Is lodash or underscore a performance killer in a node.js environment when you iterate through a large collection using a forEach method?
Yes, lodash and underscore will both kill performance in a typical node.js setup when working with a large amount of data, as they will block the only thread available, making other tasks queued up in the event-loop suffer. However if you were to run these in a web-worker thread, then your main thread would be free to continue processing work as normal.
"Performance Killer" is a relative, and fairly loaded term.
To answer your direct question, does forEach in lodash or underscore run in parallel: no. It uses a standard, synchronous iteration.
NodeJS is a single-threaded, single-process application. It does not matter whether you process the entire array in a single forEach or break it apart into multiple loops processed by the event queue; it's still going to take the same amount of work for that CPU core.
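To make that concrete, here is a minimal sketch of the "break it apart" approach using setImmediate; the total CPU work is unchanged, but other events get a chance to run between chunks (the chunk size of 10000 is an arbitrary choice):

// process a large array in chunks so the event loop stays responsive
function chunkedForEach(array, fn, chunkSize, done) {
  let i = 0;
  function next() {
    const end = Math.min(i + chunkSize, array.length);
    for (; i < end; i++) fn(array[i]);        // synchronous work on one chunk
    if (i < array.length) setImmediate(next); // yield to the event loop
    else done();
  }
  next();
}

const bigArray = Array.from({ length: 1e6 }, (_, i) => i);
chunkedForEach(bigArray, (x) => Math.sqrt(x), 10000, () => console.log('done'));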
If you want to take advantage of multiple cores, you need to use the cluster module and create multiple processes that work on different parts of the array.
There is no shared memory or thread locking in NodeJS, so you will have to break the array apart into pieces for each process to work on.
https://nodejs.org/api/cluster.html
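Here is a minimal sketch of that idea, using the built-in cluster and os modules and plain IPC messages to hand each worker its slice (summing a slice stands in for your per-element computation):

const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  const data = Array.from({ length: 1e6 }, (_, i) => i);
  const cores = os.cpus().length;
  const sliceSize = Math.ceil(data.length / cores);

  for (let c = 0; c < cores; c++) {
    const worker = cluster.fork();
    // each worker receives a copy of its slice: there is no shared memory
    worker.send(data.slice(c * sliceSize, (c + 1) * sliceSize));
    worker.on('message', (sum) => {
      console.log(`worker ${worker.id} partial sum: ${sum}`);
      worker.kill();
    });
  }
} else {
  process.on('message', (slice) => {
    // the CPU-bound work runs here, in parallel with the other workers
    const sum = slice.reduce((a, b) => a + b, 0);
    process.send(sum);
  });
}

Note that sending a large slice over IPC has a serialization cost of its own, so this only pays off when the per-element work dominates.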

Node.js - single thread, non-blocking?

I am learning Node.js and I have read that Node.js is single threaded and non-blocking.
I have a good background in JavaScript and I do understand the callbacks, but what I don't really understand is how Node.js can be single threaded and run code in the background. Isn't that contradictory?
Because if Node.js is single-threaded, it can still only perform one task at a time. So if it runs something in the background, it has to stop the current task to process that background work, right?
How does that work practically?
What "in the background" really means in terms of NodeJS is that things get put on a todo list for later. Whenever Node is done with what it's doing it picks from the top of the todo list. This is why doing anything that actually IS blocking can wreck your day. Everything that's happening "in the background" (actually just waiting on the todo list) gets stopped until the blocking task is complete.
Lucas explained it well, but I would like to add that it is possible to add "nodes" via some cluster libraries if you want to take advantage of your processors.
https://www.npmjs.com/package/cluster
https://www.npmjs.com/package/pm2
A tutorial on setting up a cluster: http://blog.carbonfive.com/2014/02/28/taking-advantage-of-multi-processor-environments-in-node-js/
Some hosts will give you 'scalability' options, like Heroku.
Anyway, when you use MongoDB with NodeJS (via Mongoose for example), it creates multiple connections.
NOTE: The advantage of being monothreaded is handling millions of users. With a legacy multithreaded server (Apache), you create a thread for EACH user, so you need really BIG servers to handle thousands of people.
While the JavaScript engine is monothreaded, there are multiple threads "in the background" that deal with all the non-blocking I/O work.
Specifically, libuv has a pool of worker threads waiting on OS events, I/O signals, running C++ code, etc. Size of this pool is determined by the UV_THREADPOOL_SIZE environment variable.
No JavaScript code ever runs "in the background". JavaScript functions (i.e. callbacks) are scheduled to run later on the main event loop, either by other JS functions or directly by the libuv workers. If the loop is blocked, then everything scheduled has to wait for it.
In fact, Node.js is not exactly monothreaded. Node.js uses one "main thread", which is the thread where your script is executed. This main thread must never be blocked, so long-running operations are executed in separate threads. For example, Node.js uses the libuv library, which maintains a pool of threads used to perform I/O.
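A minimal sketch that makes the pool size visible, assuming the default UV_THREADPOOL_SIZE of 4 (crypto.pbkdf2 is one of the calls libuv dispatches to the pool):

const crypto = require('crypto');

const start = Date.now();
for (let i = 1; i <= 5; i++) {
  crypto.pbkdf2('secret', 'salt', 500000, 64, 'sha512', () => {
    // with 4 pool threads, calls 1-4 finish together and call 5 trails behind
    console.log(`call ${i} done after ${Date.now() - start} ms`);
  });
}

On a machine with at least four cores, calls 1-4 complete at roughly the same time; running the script with UV_THREADPOOL_SIZE=8 lets all five finish together.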

Why is org/arangodb/request synchronous?

Why is the new JavaScript module request synchronous? Is it supposed to be only used in a job queue?
Is there any way to make asynchronous http(s) requests in ArangoDB?
Full disclosure: I'm part of ArangoDB's development team and primarily work on Foxx and everything JavaScript. I'm also the guy who wrote the org/arangodb/request module.
ArangoDB is a different environment than Node.js, despite sharing many similarities (such as using the V8 JavaScript engine). Unlike Node.js (or the browser), ArangoDB uses a thread-based concurrency model and doesn't feature an Event Loop. However the threads are not exposed in JavaScript (and in fact in V8 every thread is fully isolated) so you normally don't even have to think of them.
In the browser and in Node.js functions like setTimeout work by delaying code execution via the Event Loop (until a certain amount of time has passed or until an external event has occurred).
In ArangoDB the code is always executed linearly. For example, incoming HTTP requests are passed to Foxx controllers in JavaScript and the response is sent as soon as the controller returns. Even if you could use setTimeout, the external resources you were working with (or even "internal" ones like the document collections and transactions) would likely be already gone by the time the delayed code could execute.
Because of this, the request function provided by the org/arangodb/request module is also entirely synchronous. Instead of returning a promise or taking a callback it directly returns the incoming response data. It is also decidedly not the same module as request on npm but rather a synchronous implementation based on that module's API to the extent that implementing its API is possible outside Node.js (e.g. not including streams and returning the remote response instead of taking callbacks).
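As a rough illustration of what that synchronous style looks like inside a Foxx controller, based on the description above (the URL is a placeholder and the exact response fields may vary between ArangoDB versions):

var request = require('org/arangodb/request');

// the call blocks until the remote server answers: no callback, no promise
var response = request({
  method: 'GET',
  url: 'http://example.com/api/things' // placeholder URL
});

// the response data is available immediately after the call returns
console.log(response.statusCode);
console.log(response.body);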
If you come from a Node.js/io.js background, this may feel wrong because non-blocking IO can achieve higher throughput, but keep in mind that the design goals of ArangoDB and Node.js are very different. Node.js is built around streams and network connections. ArangoDB is built as a persistent data storage and has to deal with transactions and locks instead.
It is probably not the best idea to access external APIs directly from your Foxx controllers if you have a high likelihood of serious network latency or if the external API's response is not essential to the client response. This is what the Foxx queues are for. Transactional e-mails are a prime example for this.
While Foxx is very versatile, its primary focus is to allow you to move most of your application (especially logic that benefits from running closer to the data) directly into the database. For small to medium scale projects, you can probably get away with doing external API calls in-bounds. But if your application is primarily concerned with talking to other services over the network, running that code in a database is probably not the optimal solution.
Luckily ArangoDB plays well with others, so it's easy to move your network-intensive code out of Foxx if you find that it becomes a performance bottleneck at higher loads. Foxx doesn't eliminate the need for application servers, but it can considerably reduce their complexity.
As a correction to Brian's answer: sadly promises won't let you write async code in a synchronous environment either. The Promises/A+ spec defines promises as having to be executed asynchronously. Where they aren't natively supported they still have to be built on top of existing functions like setTimeout or process.nextTick, neither of which ArangoDB implements.

Why is node.js not suitable for heavy CPU apps?

Node.js servers are very efficient concerning I/O and large numbers of client connections. But why is Node.js not suitable for heavy CPU apps in comparison to a traditional multithreaded server?
I read it here Felix Baumgarten
Node is, despite its asynchronous event model, by nature single-threaded. When you launch a Node process, you are running a single process with a single thread on a single core. So your code will not be executed in parallel; only I/O operations are parallel, because they are executed asynchronously. As such, long-running CPU tasks will block the whole server and are usually a bad idea.
Given that you just start a Node process like that, it is possible to have multiple Node processes running in parallel though. That way you could still benefit from your multithreading architecture, although a single Node process does not. You would just need to have some load balancer in front that distributes requests along all your Node processes.
Another option would be to have the CPU work in separate processes and make Node interact with those instead of doing the work itself.
Related things to read:
Node.js and CPU intensive requests
Understanding the node.js event loop
A simple Node.js server is single-threaded, meaning that any operation that takes a long time to execute will block the rest of your program from running. Node.js apps manage to maintain a high level of concurrency by working as a series of events. When an event handler is waiting for something to happen (such as reading from the database), it tells Node to go ahead and process another event in the meantime. But since a single thread can only execute one instruction at a time, this approach can't save you from a function that needs to keep actively executing for a long time. In a multithreaded architecture, even if one function takes a long time to compute the result, other threads can still process other requests — and as long as you have a core that is not fully used at the time, there's a good chance they can do it about as quickly as if no other requests were running at all.
In order to deal with this, production Node.js apps that expect to hog a lot of CPU will usually be run in clusters. This means that instead of having several threads in one program's memory space, you run several instances of the same program under the control of one "master" instance. Each process is single-threaded, but since you have several of them, you end up gaining the benefits of multiple threads.
Node is flawless if you have asynchronous tasks, because JavaScript runs those through the worker pool. But if you run CPU-intensive tasks (where you heavily use the CPU), e.g. you have a billion users and you want to sort those people by name, that's quite an intensive task, and it is synchronous, so it will block other code from running.
So it's not a good idea to use Node for that kind of application. Technically you can find alternatives to address those kinds of tasks. The above example is better addressed in a DB, and then passing along that result works well.
In the same way, avoid intensive tasks and keep your CPU cool for better performance.
You can have a look at this package, the-computer, which may help you do some CPU-intensive work in a single instance of a Node.js app in a simple way.
It is definitely not as effective as raw C++ libs, but it can cover most general computing cases, keeping you in the Node.js garden while allowing you to leverage the cores of the CPU.
Node.js runs JavaScript code in a single thread, which means that your code can only do one task at a time. However, Node.js itself is multithreaded and provides hidden threads through the libuv library, which handles I/O operations like reading files from a disk or network requests. Through the use of hidden threads, Node.js provides asynchronous methods that allow your code to make I/O requests without blocking the main thread.
Although Node.js has hidden threads, you cannot use them to offload CPU-intensive tasks, such as complex calculations, image resizing, or video compression. Since JavaScript is single-threaded when a CPU-intensive task runs, it blocks the main thread and no other code executes until the task completes. Without using other threads, the only way to speed up a CPU-bound task is to increase the processor speed.
💡 Node.js introduced the worker_threads module, which allows you to create threads and execute multiple JavaScript tasks in parallel. Once a thread finishes a task, it sends a message to the main thread containing the result of the operation, so that the result can be used with other parts of the code. The advantage of using worker threads is that CPU-bound tasks don't block the main thread, and you can divide and distribute a task across multiple workers to optimize it.
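A minimal single-file sketch of that pattern, using the built-in worker_threads module (the naive Fibonacci function is just a stand-in for any CPU-bound task):

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// intentionally slow stand-in for CPU-bound work
function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

if (isMainThread) {
  // main thread: spawn a worker running this same file
  const worker = new Worker(__filename, { workerData: 40 });
  worker.on('message', (result) => console.log('fib(40) =', result));
  console.log('main thread stays responsive while the worker computes');
} else {
  // worker thread: do the heavy computation and send the result back
  parentPort.postMessage(fib(workerData));
}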
ref: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
