Node JS multiple concurrent requests to backend API - javascript

This is my node js backend API method.
apiRouter.route('/makeComment')
.post((req, res) => {
consoleLogger.info("Inside makeComment API..");
logger.info("Inside makeComment");
let timestamp = new Date().getTime();
let latestCommentID = timestamp.toString();
console.log("comment ID generated is "+latestCommentID);
res.json({
'makeComment': 'success',
'commentid':latestCommentID
});
});
Now, if multiple concurrent requests come to this API, what will happen?
As per my understanding, NodeJS will maintain a Event Queue for the requests and process requests one by one.
So, it is impossible to get the same timestamp for two different requests.
Please let me know if my understanding is correct?
Edit 1:
After googling for some time, got this link, which clearly explains some of the concepts.

You can absolutely end up with the same timestamp on 2 concurrent calls. But not because NodeJS or your server served the request in parallel, but rather because a millisecond is a long enough time that your server could process many requests in the same millisecond.
Multi-threaded vs. Concurrent
You have correctly identified a subtle but important distinction between concurrency and multi-threading. In a multi-threaded environment, two different operations can truly take place at the same time on 2 different cores of a CPU. In a concurrent but single-threaded environment, things do happen sequentially, hopefully in a non-blocking manner, and really fast!
Javascript is a single threaded environment. And yes, it has an event loop where it queues tasks and processes them one at a time. NodeJS (and Web APIs in case of browser Javascript apps) offer certain async functions, such as fs.writeFile or fetch, which the runtime in collaboration with the OS may perform in the same or different thread/core, and then orchestrate returning results back to your application via callbacks and Promises.
In case of your example, your handler (the part starting from (req, res) => { ... }) consists of sync calls only. So, yes, various calls to this handler will run sequentially, one after another. And, while no two runs of this handler will ever truly happen at the same time, but they will happen really fast and you will likely have cases where you'll get the same millisecond value from the Date object. If you had a higher resolution timestamp available (maybe nano seconds) or if your handler took longer to run (for example, you ran a for loop a billion iterations), then you'll be able to observe the sequential behavior more clearly.
Avoid blocking (sync) calls in single-threaded applications
This is the exact reason why you are advised against performing any synchronous IO operation (e.g. fs.writeFileSync, etc.) in a NodeJS web server, because while one request is performing a blocking IO operation, all other requests are waiting, queued on the event loop.
Really good video about Javascript event loop; this should illuminate some topics: https://www.youtube.com/watch?v=8aGhZQkoFbQ

There is no possibility of concurrency here. As you have correctly studied, Node.js runs in a single-threaded environment (unless you use clusters, the worker API, or other related Node.js modules that facilitate IPC). As such, the line
let timestamp = new Date().getTime();
cannot be concurrently invoked by two simultaneously running threads, excluding the exceptions noted above.
However, Date.prototype.getTime() only has millisecond resolution, so it's remotely possible that the callback to apiRouter.route('/makeComment').post() might be sequentially invoked by the event loop from two pending asynchronous requests within the same millisecond.

Related

Why does Javascript have much fewer blocking functions than Python

Moving from Javascript to Python, and looking at asyncio has me a little confused.
As someone who is new to the fundamental concepts of concurrency, I just assumed a superficial understanding of Javascript concurrency.
A basic understanding from using async / await in Javascript:
If we run any processes inside an async function, and await the response of the function, we are essentially waiting for the function to set a value on the Promise.
Makes total sense - when the Promise is given a value, we can also use callbacks such as .then() to handle the response. Alternatively, just await.
Whatever the underlying implementation of asynchronicity here is (for example all processes running on a single thread with an event loop), should it matter how we interface with it?
Now, I move to Python and start playing with asyncio. We have Futures, just like Promises. All of a sudden, I can't use my standard libraries, such as request.get(...), but I need to use non blocking network requests in libraries such as aiohttp.
What does blocking / non-blocking mean here? I assume it means the single thread that the event loop is on is blocked, so we cant process other functions in parallel.
So my 2 questions then are:
What causes the single thread to be blocked? For example in requests.get(...)
Why are most functions non-blocking in Javascript, but not in Python (i.e we don't need specific libraries such as aiohttp).
And what about languages like Go with their goroutines? Is it just a case because its a new language with concurrency built in from the beginning, that the concept of blocking functions don't exist. Or in Go it's not a single thread, so everything can inherently be parallelised?
Thanks :)
Event loop
Javascript, and python's async io make use of a concurrency model based on event loops.
(Note the plural because you could have multiple event loops which handle different kinds of tasks e.g. disk io, network io, ipc, parallel computations etc)
The general concept of an event loop is that, you have a number of things to do, so you put those things in a queue, and once in a while (like every nanosecond), the event loop picks an event from the queue, and runs it for a short while (maybe a millisecond or so), and either shoves it back in the queue if it hasn't completed, or waits until it yields control back to the event loop.
Now to answer some of your questions:
What does blocking / non-blocking mean here? I assume it means the
single thread that the event loop is on is blocked, so we cant process
other functions in parallel.
Blocking event loop
Blocking the event loop occurs when the event loop is running a task, and the task has either not finished or given back control to the event-loop, for a period of time longer than the event loop has scheduled it to run.
In the case of python's requests library, they make use of a synchronous http library, which doesn't respect the event loop; Therefore, running such a task in the loop will starve other tasks which are patiently waiting their turn to run, until the request is finished.
Why are most functions non-blocking in Javascript, but not in Python
(i.e we don't need specific libraries such as aiohttp).
JS
Everything in Javascript can block the event loop. The only way not to block the event loop is to make heavy use of callbacks via setTimeout. However, if care is not taken, even those callbacks can block the event loop if they run too long without yielding control back to the event loop via another setTimeout call.
(If you've never had to use setTimeout, but have used promises and async network requests in JS, then you are probably making use of a library that does. Most of the popular networking libraries used in browsers (ajax, axios, fetch, etc), are based on the popular XMLHttpRequest API, which provides async network IO.)
Python
In python, the story is slightly different: Before asyncio, there was no such thing as as "event loop". Everything must run to completion before python interpreter moves on to the next thing. This is part of what makes python very easy to learn (and dare I say, to create...). The reason for this, comes in the form of the python GIL, which in simple terms enforces a single order of execution for any python program. I encourage you to click that link, and read why the GIL exists.
And what about languages like Go with their goroutines?
Note: I am not a go programmer, but I read something
How is Go different?
The only difference between the way go handles goroutines and how python asyncio/js do their event loops, is that go makes more use of os threads to ensure that threads are scheduled fairly and make full use of the machine they run in.
While js callbacks/asyncio tasks will often run in the same thread as the event loop, goroutines are able to run in seperate OS threads and over multiple cores, thus giving them higher availability and higher parallelism. (In that case, we could almost consider goroutines to be closer to OS threads in terms of how much time they actually get to run, as compared to green threads which are bound by the amount of time the event loop's thread runs.)

Do Timers run on their Own threads in Node.js?

I am a bit confused here I know Javascript is a single-threaded language but while reading about the event loop. I got to know that in case of setTimeout or setInterval javascript calls web API provided by the browser which spawns a new thread to execute timer on that thread. but what happens in the case of node.js environment with timers how do they execute/work?
No threads are used for timers in node.js.
Timers in node.js work in conjunction with the event loop and don't use a thread. Timers in node.js are stored in a sorted linked list with the next timer to fire at the start of the linked list. Each time through the event loop, it checks to see if the first timer in the linked list has reached its time. If so, it fires that timer. If not, it runs any other events that are waiting in the event loop.
On each subsequent cycle through the event loop, it keeps checking to see if its time for the next timer or not. When a new timer is added, it is inserted into the linked list in its proper sorted order. When it fires or is cancelled, it is removed from the linked list.
If the event loop has nothing to do, it may sleep for a brief period of time, but it won't sleep past the timer for the next timer.
Other references on the topic:
How does nodejs manage timers internally
Libuv timer code in nodejs
How many concurrent setTimeouts before performance issues?
Multiple Client Requests in NodeJs
Looking for a solution between setting lots of timers or using a scheduled task queue
Node runs on a single thread but asynchronous work happens elsewhere. For example, libuv provides a pool of 4 threads that it may use, but wont if there's a better option.
The node documentation says
Node.js runs JavaScript code in the Event Loop (initialization and callbacks), and offers a Worker Pool to handle expensive tasks like file I/O. Node.js scales well, sometimes better than more heavyweight approaches like Apache. The secret to the scalability of Node.js is that it uses a small number of threads to handle many clients. If Node.js can make do with fewer threads, then it can spend more of your system's time and memory working on clients rather than on paying space and time overheads for threads (memory, context-switching). But because Node.js has only a few threads, you must structure your application to use them wisely.
A more detailed look at the event loop
No. Timers are just scheduled on the same thread and will call their callbacks when the time expires.
Depending on what OS your are on and what javascript interpreters you use they will use various APIs form poll to epoll to kqueue to overlapped I/O on Windows but in general asynchronous APIs have similar features. So let's ignore platform differences and look at a cross-platform API that exists on all OSes: the POSIX select() system call.
The select function in C looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
Where nfds is total number of file descriptors (including network sockets) you are waiting/listening on, readfds is the list/set of read file descriptors you are waiting on, writefds is the list/set of write file descriptors, exceptfds is the list/set of error file descriptors (think stderr) and timeval is the timeout for the function.
This system call blocks - yes, in non-blocking, asynchronous code there is a piece of blocking system call. The main difference between non-blocking code and blocking threaded code is that the entire program blocks in only one place, the select() function (or whatever equivalent you use).
This function only returns if any of the file descriptors have activity on them or if the timeout expires.
By managing the timeout and calculating the next value of timeval you can implement a function like setTimeout
I've written much deeper explanations of how this works in answers to the following related questions:
I know that callback function runs asynchronously, but why?
Event Queuing in NodeJS
how node.js server is better than thread based server
Node js architecture and performance
Performance of NodeJS with large amount of callbacks
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
I recommend you at least browse each of the answers I wrote above because they are almost all non-duplicates. They sometimes overlap but explain different aspects of asynchronous code execution.
The gist of it is that javascript does not execute code in parallel to implement timers. It doesn't need to. Instead it waits in parallel. Once you understand the difference between running code in parallel and waiting (doing nothing) in parallel you will understand how things like node.js achieve high performance and how events work better.

How is JavaScript never blocking? [duplicate]

I'm not a Node programmer, but I'm interested in how the single-threaded non-blocking IO model works.
After I read the article understanding-the-node-js-event-loop, I'm really confused about it.
It gave an example for the model:
c.query(
'SELECT SLEEP(20);',
function (err, results, fields) {
if (err) {
throw err;
}
res.writeHead(200, {'Content-Type': 'text/html'});
res.end('<html><head><title>Hello</title></head><body><h1>Return from async DB query</h1></body></html>');
c.end();
}
);
Que: When there are two requests A(comes first) and B since there is only a single thread, the server-side program will handle the request A firstly: doing SQL querying is asleep statement standing for I/O wait. And The program is stuck at the I/O waiting, and cannot execute the code which renders the web page behind. Will the program switch to request B during the waiting? In my opinion, because of the single thread model, there is no way to switch one request from another. But the title of the example code says that everything runs in parallel except your code.
(P.S I'm not sure if I misunderstand the code or not since I have
never used Node.)How Node switch A to B during the waiting? And can
you explain the single-threaded non-blocking IO model of Node in a
simple way? I would appreciate it if you could help me. :)
Node.js is built upon libuv, a cross-platform library that abstracts apis/syscalls for asynchronous (non-blocking) input/output provided by the supported OSes (Unix, OS X and Windows at least).
Asynchronous IO
In this programming model open/read/write operation on devices and resources (sockets, filesystem, etc.) managed by the file-system don't block the calling thread (as in the typical synchronous c-like model) and just mark the process (in kernel/OS level data structure) to be notified when new data or events are available. In case of a web-server-like app, the process is then responsible to figure out which request/context the notified event belongs to and proceed processing the request from there. Note that this will necessarily mean you'll be on a different stack frame from the one that originated the request to the OS as the latter had to yield to a process' dispatcher in order for a single threaded process to handle new events.
The problem with the model I described is that it's not familiar and hard to reason about for the programmer as it's non-sequential in nature. "You need to make request in function A and handle the result in a different function where your locals from A are usually not available."
Node's model (Continuation Passing Style and Event Loop)
Node tackles the problem leveraging javascript's language features to make this model a little more synchronous-looking by inducing the programmer to employ a certain programming style. Every function that requests IO has a signature like function (... parameters ..., callback) and needs to be given a callback that will be invoked when the requested operation is completed (keep in mind that most of the time is spent waiting for the OS to signal the completion - time that can be spent doing other work). Javascript's support for closures allows you to use variables you've defined in the outer (calling) function inside the body of the callback - this allows to keep state between different functions that will be invoked by the node runtime independently. See also Continuation Passing Style.
Moreover, after invoking a function spawning an IO operation the calling function will usually return control to node's event loop. This loop will invoke the next callback or function that was scheduled for execution (most likely because the corresponding event was notified by the OS) - this allows the concurrent processing of multiple requests.
You can think of node's event loop as somewhat similar to the kernel's dispatcher: the kernel would schedule for execution a blocked thread once its pending IO is completed while node will schedule a callback when the corresponding event has occured.
Highly concurrent, no parallelism
As a final remark, the phrase "everything runs in parallel except your code" does a decent job of capturing the point that node allows your code to handle requests from hundreds of thousands open socket with a single thread concurrently by multiplexing and sequencing all your js logic in a single stream of execution (even though saying "everything runs in parallel" is probably not correct here - see Concurrency vs Parallelism - What is the difference?). This works pretty well for webapp servers as most of the time is actually spent on waiting for network or disk (database / sockets) and the logic is not really CPU intensive - that is to say: this works well for IO-bound workloads.
Well, to give some perspective, let me compare node.js with apache.
Apache is a multi-threaded HTTP server, for each and every request that the server receives, it creates a separate thread which handles that request.
Node.js on the other hand is event driven, handling all requests asynchronously from single thread.
When A and B are received on apache, two threads are created which handle requests. Each handling the query separately, each waiting for the query results before serving the page. The page is only served until the query is finished. The query fetch is blocking because the server cannot execute the rest of thread until it receives the result.
In node, c.query is handled asynchronously, which means while c.query fetches the results for A, it jumps to handle c.query for B, and when the results arrive for A arrive it sends back the results to callback which sends the response. Node.js knows to execute callback when fetch finishes.
In my opinion, because it's a single thread model, there is no way to
switch from one request to another.
Actually the node server does exactly that for you all the time. To make switches, (the asynchronous behavior) most functions that you would use will have callbacks.
Edit
The SQL query is taken from mysql library. It implements callback style as well as event emitter to queue SQL requests. It does not execute them asynchronously, that is done by the internal libuv threads that provide the abstraction of non-blocking I/O. The following steps happen for making a query :
Open a connection to db, connection itself can be made asynchronously.
Once db is connected, query is passed on to the server. Queries can be queued.
The main event loop gets notified of the completion with callback or event.
Main loop executes your callback/eventhandler.
The incoming requests to http server are handled in the similar fashion. The internal thread architecture is something like this:
The C++ threads are the libuv ones which do the asynchronous I/O (disk or network). The main event loop continues to execute after the dispatching the request to thread pool. It can accept more requests as it does not wait or sleep. SQL queries/HTTP requests/file system reads all happen this way.
Node.js uses libuv behind the scenes. libuv has a thread pool (of size 4 by default). Therefore Node.js does use threads to achieve concurrency.
However, your code runs on a single thread (i.e., all of the callbacks of Node.js functions will be called on the same thread, the so called loop-thread or event-loop). When people say "Node.js runs on a single thread" they are really saying "the callbacks of Node.js run on a single thread".
Node.js is based on the event loop programming model. The event loop runs in single thread and repeatedly waits for events and then runs any event handlers subscribed to those events. Events can be for example
timer wait is complete
next chunk of data is ready to be written to this file
theres a fresh new HTTP request coming our way
All of this runs in single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and wait for yet more events themselves everything works out nicely. This allows multiple request to be handled concurrently by a single Node.js process.
(There's a little bit magic under the hood as where the events originate. Some of it involve low level worker threads running in parallel.)
In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
According to: "Event loop from 10,000ft - core concept behind Node.js".
The function c.query() has two argument
c.query("Fetch Data", "Post-Processing of Data")
The operation "Fetch Data" in this case is a DB-Query, now this may be handled by Node.js by spawning off a worker thread and giving it this task of performing the DB-Query. (Remember Node.js can create thread internally). This enables the function to return instantaneously without any delay
The second argument "Post-Processing of Data" is a callback function, the node framework registers this callback and is called by the event loop.
Thus the statement c.query (paramenter1, parameter2) will return instantaneously, enabling node to cater for another request.
P.S: I have just started to understand node, actually I wanted to write this as comment to #Philip but since didn't have enough reputation points so wrote it as an answer.
if you read a bit further - "Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code."
about - "everything runs in parallel except your code" - your code is executed synchronously, whenever you invoke an asynchronous operation such as waiting for IO, the event loop handles everything and invokes the callback. it just not something you have to think about.
in your example: there are two requests A (comes first) and B. you execute request A, your code continue to run synchronously and execute request B. the event loop handles request A, when it finishes it invokes the callback of request A with the result, same goes to request B.
Okay, most things should be clear so far... the tricky part is the SQL: if it is not in reality running in another thread or process in it’s entirety, the SQL-execution has to be broken down into individual steps (by an SQL processor made for asynchronous execution!), where the non-blocking ones are executed, and the blocking ones (e.g. the sleep) actually can be transferred to the kernel (as an alarm interrupt/event) and put on the event list for the main loop.
That means, e.g. the interpretation of the SQL, etc. is done immediately, but during the wait (stored as an event to come in the future by the kernel in some kqueue, epoll, ... structure; together with the other IO operations) the main loop can do other things and eventually check if something happened of those IOs and waits.
So, to rephrase it again: the program is never (allowed to get) stuck, sleeping calls are never executed. Their duty is done by the kernel (write something, wait for something to come over the network, waiting for time to elapse) or another thread or process. – The Node process checks if at least one of those duties is finished by the kernel in the only blocking call to the OS once in each event-loop-cycle. That point is reached, when everything non-blocking is done.
Clear? :-)
I don’t know Node. But where does the c.query come from?
The event loop is what allows Node.js to perform non-blocking I/O operations — despite the fact that JavaScript is single-threaded — by offloading operations to the system kernel whenever possible. Think of event loop as the manager.
New requests are sent into a queue and watched by the synchronous event demultiplexer. As you see each operations handler is also registered.
Then those requests are sent to the thread pool (Worker Pool) synchronously to be executed. JavaScript cannot perform asynchronous I/O operations. In browser environment, browser handles the async operations. In node environment, async operations are handled by the libuv by using C++. Thread's pool default size is 4, but it can be changed at startup time by setting the UV_THREADPOOL_SIZE environment variable to any value (maximum is 128). thread pool size 4 means 4 requests can get executed at a time, if event demultiplexer has 5 requsts, 4 would be passed to thread pool and 5th would be waiting. Once each request gets executed, result is returned to the `event demultiplexer.
When a set of I/O operations completes, the Event Demultiplexer pushes a set of corresponding events into the Event Queue.
handler is the callback. Now event loop keeps an eye on the event queue, if there is something ready, it is pushed to stack to execute the callback. Remember eventually callbacks get executed on stack. Note that some callbacks has priorities on other, the event loop does pick the callbacks based on their priorities.
For those who seek short answer and don't want to go to the deepest levels of Node.js internals.
Node.js is not single threaded, it runs on 5 threads by default.
Yes, the only single thread is for actual JavaScript processing, but it always switches from function to function.
It sends SQL query to a database and lets it wait in other thread, while single threaded Node.js continues to compute some other code ready to be computed.
If you wish more explanations, there are good articles about Event Loop, Worker Pool and the whole libuv documentation.

Javascript/Nodejs how readFile is implemented [duplicate]

So I have an understanding of how Node.js works: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener once it completes the work, and the listener then returns the response to the caller.
My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (such as "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code is happening on the worker pool.
Now, by contrast, when I use Mongoose to talk to MongoDB, DB reads are an expensive I/O operation. Node seems to be able to delegate the work to a thread and receive the callback when it completes; the time taken to load from the DB does not seem to block the system.
How does Node.js decide to use a thread pool thread vs the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?
Your understanding of how node works isn't correct... but it's a common misconception, because the reality of the situation is actually fairly complex, and typically boiled down to pithy little phrases like "node is single threaded" that over-simplify things.
For the moment, we'll ignore explicit multi-processing/multi-threading through cluster and webworker-threads, and just talk about typical non-threaded node.
Node runs in a single event loop. It's single threaded, and you only ever get that one thread. All of the javascript you write executes in this loop, and if a blocking operation happens in that code, then it will block the entire loop and nothing else will happen until it finishes. This is the typically single threaded nature of node that you hear so much about. But, it's not the whole picture.
Certain functions and modules, usually written in C/C++, support asynchronous I/O. When you call these functions and methods, they internally manage passing the call on to a worker thread. For instance, when you use the fs module to request a file, the fs module passes that call on to a worker thread, and that worker waits for its response, which it then presents back to the event loop that has been churning on without it in the meantime. All of this is abstracted away from you, the node developer, and some of it is abstracted away from the module developers through the use of libuv.
As pointed out by Denis Dollfus in the comments (from this answer to a similar question), the strategy used by libuv to achieve asynchronous I/O is not always a thread pool, specifically in the case of the http module a different strategy appears to be used at this time. For our purposes here it's mainly important to note how the asynchronous context is achieved (by using libuv) and that the thread pool maintained by libuv is one of multiple strategies offered by that library to achieve asynchronicity.
On a mostly related tangent, there is a much deeper analysis of how node achieves asynchronicity, and some related potential problems and how to deal with them, in this excellent article. Most of it expands on what I've written above, but additionally it points out:
Any external module that you include in your project that makes use of native C++ and libuv is likely to use the thread pool (think: database access)
libuv has a default thread pool size of 4, and uses a queue to manage access to the thread pool - the upshot is that if you have 5 long-running DB queries all going at the same time, one of them (and any other asynchronous action that relies on the thread pool) will be waiting for those queries to finish before they even get started
You can mitigate this by increasing the size of the thread pool through the UV_THREADPOOL_SIZE environment variable, so long as you do it before the thread pool is required and created: process.env.UV_THREADPOOL_SIZE = 10;
If you want traditional multi-processing or multi-threading in node, you can get it through the built in cluster module or various other modules such as the aforementioned webworker-threads, or you can fake it by implementing some way of chunking up your work and manually using setTimeout or setImmediate or process.nextTick to pause your work and continue it in a later loop to let other processes complete (but that's not recommended).
Please note, if you're writing long running/blocking code in javascript, you're probably making a mistake. Other languages will perform much more efficiently.
So I have an understanding of how Node.js works: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener once it completes the work, and the listener then returns the response to the caller.
This is not really accurate. Node.js has only a single "worker" thread that does javascript execution. There are threads within node that handle IO processing, but to think of them as "workers" is a misconception. There are really just IO handling and a few other details of node's internal implementation, but as a programmer you cannot influence their behavior other than a few misc parameters such as MAX_LISTENERS.
My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (such as "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code is happening on the worker pool.
There is no sleep mechanism in JavaScript. We could discuss this more concretely if you posted a code snippet of what you think "sleep" means. There's no such function to call to simulate something like time.sleep(30) in python, for example. There's setTimeout but that is fundamentally NOT sleep. setTimeout and setInterval explicitly release, not block, the event loop so other bits of code can execute on the main execution thread. The only thing you can do is busy loop the CPU with in-memory computation, which will indeed starve the main execution thread and render your program unresponsive.
How does Node.js decide to use a thread pool thread vs the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?
Network IO is always asynchronous. End of story. Disk IO has both synchronous and asynchronous APIs, so there is no "decision". node.js will behave according to the API core functions you call sync vs normal async. For example: fs.readFile vs fs.readFileSync. For child processes, there are also separate child_process.exec and child_process.execSync APIs.
Rule of thumb is always use the asynchronous APIs. The valid reasons to use the sync APIs are for initialization code in a network service before it is listening for connections or in simple scripts that do not accept network requests for build tools and that kind of thing.
Thread pool how when and who used:
First off when we use/install Node on a computer, it starts a process among other processes which is called node process in the computer, and it keeps running until you kill it. And this running process is our so-called single thread.
So the mechanism of single thread it makes easy to block a node application but this is one of the unique features that Node.js brings to the table. So, again if you run your node application, it will run in just a single thread. No matter if you have 1 or million users accessing your application at the same time.
So let's understand exactly what happens in the single thread of nodejs when you start your node application. At first the program is initialized, then all the top-level code is executed, which means all the codes that are not inside any callback function (remember all codes inside all callback functions will be executed under event loop).
After that, all the modules code executed then register all the callback, finally, event loop started for your application.
So as we discuss before all the callback functions and codes inside those functions will execute under event loop. In the event loop, loads are distributed in different phases. Anyway, I'm not going to discuss about event loop here.
Well for the sack of better understanding of Thread pool I a requesting you to imagine that in the event loop, codes inside of one callback function execute after completing execution of codes inside another callback function, now if there are some tasks are actually too heavy. They would then block our nodejs single thread. And so, that's where the thread pool comes in, which is just like the event loop, is provided to Node.js by the libuv library.
So the thread pool is not a part of nodejs itself, it's provided by libuv to offload heavy duties to libuv, and libuv will execute those codes in its own threads and after execution libuv will return the results to the event in the event loop.
Thread pool gives us four additional threads, those are completely separate from the main single thread. And we can actually configure it up to 128 threads.
So all these threads together formed a thread pool. and the event loop can then automatically offload heavy tasks to the thread pool.
The fun part is all this happens automatically behind the scenes. It's not us developers who decide what goes to the thread pool and what doesn't.
There are many tasks goes to the thread pool, such as
-> All operations dealing with files
->Everyting is related to cryptography, like caching passwords.
->All compression stuff
->DNS lookups
This misunderstanding is merely the difference between pre-emptive multi-tasking and cooperative multitasking...
The sleep turns off the entire carnival because there is really one line to all the rides, and you closed the gate. Think of it as "a JS interpreter and some other things" and ignore the threads...for you, there is only one thread, ...
...so don't block it.

Why is node.js asynchronous?

Nobody has actually asked this (from all the 'suggestions' I'm getting and also from searching before I asked here).
So why is node.js asynchronous?
From what I have deduced after some research:
Languages like PHP and Python are scripting languages (I could be wrong about the actual languages that are scripting languages) whilst JavaScript isn't. (I suppose this derives from the fact that JS doesn't compile?)
Node.js runs on a single thread whilst scripting languages use multiple threads.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Maybe the answer is found somewhere stated above, but I'm still not sure.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
PS. I know some of you will ask "why would you want to make JS synchronous?" in your answers, but the truth is that I don't. I'm just asking these types of questions because I'm sure there are more people out there than just myself that have thought about such questions.
Node.js runs on a single thread whilst scripting languages use multiple threads.
Not technically. Node.js uses several threads, but only one execution thread. The background threads are for dealing with IO to make all of the asynchronous goodness work. Dealing with threads efficiently is a royal pain, so the next best option is to run in an event loop so code can run while background threads are blocked on IO.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Not necessarily. You can preserve state in an asynchronous system pretty easily. For example, in Javascript, you can use bind() to bind a this to a function, thereby preserving state explicitly when the function returns:
function State() {
// make sure that whenever doStuff is called it maintains its state
this.doStuff = this.doStuff.bind(this);
}
State.prototype.doStuff = function () {
};
Asynchronous means not waiting for an operation to finish, but registering a listener instead. This happens all the time in other languages, notably anything that needs to accept input from the user. For example, in a Java GUI, you don't block waiting for the user to press a button, but you register a listener with the GUI.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
Technically, all languages are synchronous, even Javascript. However, Javascript works a lot better in an asynchronous design because it was designed to be single threaded.
Basically there are two types of programs:
CPU bound- the only way to make it go faster is to get more CPU time
IO bound- spends a lot of time waiting for data, so a faster processor won't matter
Video games, number crunchers and compilers are CPU bound, whereas web servers and GUIs are generally IO bound. Javascript is relatively slow (because of how complex it is), so it wouldn't be able to compete in a CPU bound scenario (trust me, I've written my fair share of CPU-bound Javascript).
Instead of coding in terms of classes and objects, Javascript lends itself to coding in terms of simple functions that can be strung together. This works very well in asynchronous design, because algorithms can be written to process data incrementally as it comes in. IO (especially network IO) is very slow, so there's quite a bit of time between packets of data.
Example
Let's suppose you have 1000 live connections, each delivering a packet every millisecond, and processing each packet takes 1 microsecond (very reasonable). Let's also assume each connection sends 5 packets.
In a single-threaded, synchronous application, each connection will be handled in series. The total time taken is (5*1 + 5*.001) * 1000 milliseconds, or ~5005 milliseconds.
In a single-threaded, asynchronous application, each connection will be handled in parallel. Since every packet takes 1 millisecond, and processing each packet takes .001 milliseconds, we can process every connection's packet between packets, so our formula becomes: 1000*.001 + 5*1 milliseconds, or ~6 milliseconds.
The traditional solution to this problem was to create more threads. This solved the IO problem, but then when the number of connections rose, so did the memory usage (threads cost lots of memory) and CPU usage (multiplexing 100 threads onto 1 core is harder than 1 thread on 1 core).
However, there are downsides. If your web application happens to also need to do some heavy number crunching, you're SOL because while you're crunching numbers, connections need to wait. Threading solves this because the OS can swap out your CPU-intensive task when data is ready for a thread waiting on IO. Also, node.js is bound to a single core, so you can't take advantage of your multi-core processor unless you spin up multiple instances and proxy requests.
Javascript does not compile into anything. It's "evaluated" at runtime, just like PHP & Ruby. Therefore it is a scripting language just like PHP/Ruby. (it's official name is actually ECMAScript).
The 'model' that Node adheres to is a bit different than PHP/Ruby. Node.js uses an 'event loop' (the single thread) that has the one goal of taking network requests and handling them very quickly, and if for any reason it encounters an operation that takes a while (API request, database query -- basically anything involving I.O. (input/output)) it passes that off to a background 'worker' thread and goes off to do something else while the worker thread waits for the long task to complete. When that happens the main 'event loop' will take the results and continue deal with them.
PHP/Ruby following a threading model. Essentially, for every incoming network request, the application server spins up an isloated thread or process to handle the request. This does not scale tremendously well and Node's approach is cited as one of its core strengths compared to this model.
Asynchronous means stateless and that the connection is persistent
whilst synchronous is the (almost) opposite.
No. Synchronous instructions are completed in a natural order, from first to last. Asynchronous instructions mean that if a step in the flow of a program takes a relatively long time, the program will continue executing operations and simply return to this operation when complete.
Could JavaScript be made into a synchronous language?
Certain operations in JavaScript are synchronous. Others are asynchronous.
For example:
Blocking operations:
for(var k = 0; k < 1; k = k - 1;){
alert('this will quickly get annoying and the loop will block execution')
alert('this is blocked and will never happen because the above loop is infinite');
Asynchronous:
jQuery.get('/foo', function (result) { alert('This will occur 2nd, asynchronously'); });
alert('This will occur 1st. The above operation was skipped over and execution continued until the above operation completes.');
Could JavaScript be made into a synchronous language?
Javascript is not an "asynchronous language"; rather, node.js has a lot of asynchronous APIs. Asynchronous-ness is a property of the API and not the language. The ease with which functions can be created and passed around in javascript makes it convenient to pass callback functions, which is one way to handle control flow in an asynchronous API, but there's nothing inherently asynchronous about javascript. Javascript can easily support synchronous APIs.
Why is node.js asynchronous?
Node.js favors asynchronous APIs because it is single-threaded. This allows it to efficiently manage its own resources, but requires that long-running operations be non-blocking, and asynchronous APIs are a way to allow for control of flow with lots of non-blocking operations.

Categories

Resources