How does Node.js process incoming requests? - javascript

In order to dive into more complex concepts about Node.js, I am doing some research to make sure I understand the principles about the language and the basic building blocks it´s build upon.
As far as I´m concerned, Node.js relies in the reactor pattern to process each incoming request. This algorithm is based in the Event Demultiplexer algorithm. The second one is in charge of processing the request operation and it´s resources and then, once the operation is finished, it adds the 'result' to the Event Queue. After this process, the Event Loop is now in charge of executing the Handlers for each event.
As a single threaded process, I´m struggling to understand how does the Event demultiplexer handle all the incoming operations if the event loop is managing all the event queue tasks in parallel...
I´ve found this in the Node.js documentation:
Since most modern kernels are multi-threaded, they can handle multiple
operations executing in the background. When one of these operations
completes, the kernel tells Node.js so that the appropriate callback
may be added to the poll queue to eventually be executed. We'll
explain this in further detail later in this topic.
Does this mean that the Event Demultiplexer tasks are not handled by Node and that Node just gets the result in order to add it to the Event Queue? Doesn´t that mean that Node is not single threaded?
I've found some useful graphics on the web that clearly explain the path that each request follows, but It´s really hard to find some explaining the actual timing in which the thread processes each one.

In node.js, it runs your Javascript in a single thread (assuming we're not talking about worker threads or clustering). But, many asynchronous operations in the node.js built-in library such as file I/O or some crypto operations use their own threads to accomplish their task. So, when you call an asynchronous operations such as fs.open() to open a file, it transitions to native code, grabs a thread from an internal thread pool and that thread goes about opening the file. The fs.open() function then returns back to your Javascript (while the thread continues in the background). Sometime later when it finishes its task, the internal thread inserts an event into the nodejs event queue and when nodejs has cycles to check the event queue, it will find that event and run the Javascript callback associated with it to provide the asynchronous result back to your Javascript.
So, even though threads may be involved, your Javascript is all still single threaded through the event loop.
So, nodejs does use native code threads for some internal native code operations. Other things like networking and timers are implemented without threads.
Then if I send a request to fetch data from a database in Node, how does the Event Demultiplexer process It? Does the main thread stop the event loop to handle that operation and then resumes after the event handler is queued?
These terms like "Event Demultiplexor" are things you are applying to node.js. They are not something that node.js uses at all so I'm not entirely sure what you're asking.
Node.js runs one event at a time. It has no interrupt-driven abilities. So, it runs one event until that event returns control back to the event loop (by issuing a return from the callback that started everything). Now, the original event may not be complete - it may be doing something asynchronous that will trigger another event to announce completion, but it has finished running Javascript for now and has returned control back to the event loop.
When an incoming Fetch request (which is just an incoming http request) arrives at the server, it is first queued by the OS. Then, when the event loop has a chance to see it, it is added to the node.js event queue and is served in FIFO order when other events that were before it are done processing. Now, the nodejs event loop is not as simple as a single event queue. It actually has several different queues for different types of work and has priorities for what gets to run first, but at the simplest level, you can start out thinking of it as a single FIFO queue. And, nothing is ever interrupted.
Pull an event out of the event queue, run the Javascript callback associated with it. When that callback returns back to the event loop, get the next event and do the same.
Events may be added to the event queue either by native code threads (like might be done with file I/O or some crypto operations) or via some polling mechanisms built into the event loop as part of the event loop cycle it checks for certain things that are ready to run (as with networking and timers).

Related

What is the difference between the event loop in JavaScript and async non-blocking I/O in Node.js?

In this answer to the question -
What is non-blocking or asynchronous I/O in Node.js?
the description sounds no different from the event loop in vanilla js. Is there a difference between the two? If not, is the Event loop simply re-branded as "Asynchronous non-blocking I/O" to sell Node.js over other options more easily?
The event loop is the mechanism. Asynchronous I/O is the goal.
Asynchronous I/O is a style of programming in which I/O calls do not wait for the operation to complete before returning, but merely arrange for the caller to be notified when that happens, and for the result to be returned somewhere. In JavaScript, the notification is usually performed by invoking a callback or resolving a promise. As far as the programmer is concerned, it doesn’t matter how this happens: it just does. I request the operation, and when it’s done, I get notified about it.
An event loop is how this is usually achieved. The thing is, in most JavaScript implementations, there is literally a loop somewhere that ultimately boils down to:
while (poll_event(&ev)) {
dispatch_event(&ev);
}
Performing an asynchronous operation is then done by arranging for the completion of the operation to be received as an event by that loop, and having it dispatch to a callback of the caller’s choice.
There are ways to achieve asynchronous programming not based on an event loop, for example using threads and condition variables. But historical reasons make this programming style quite difficult to realise in JavaScript. So in practice, the predominant implementation of asynchrony in JavaScript is based on dispatching callbacks from a global event loop.
Put another way, ‘the event loop’ describes what the host does, while ‘asynchronous I/O’ describes what the programmer does.
From a non-programmer’s bird’s eye view this may seem like splitting hairs, but the distinction can be occasionally important.
There are 2 different Event Loops:
Browser Event Loop
NodeJS Event Loop
Browser Event Loop
The Event Loop is a process that runs continually, executing any task queued. It has multiple task sources which guarantees execution order within that source, but the Browser gets to pick which source to take a task from on each turn of the loop. This allows Browser to give preference to performance sensitive tasks such as user-input.
There are a few different steps that Browser Event Loop checks continuously:
Task Queue - There can be multiple task queues. Browser can execute queues in any order they like. Tasks in the same queue must be executed in the order they arrived, first in - first out. Tasks execute in order, and the Browser may render between tasks. Task from the same source must go in the same queue. The important thing is that task is going to run from start to finish. After each task, Event Loop will go to Microtask Queue and do all tasks from there.
Microtasks Queue - The microtask queue is processed at the end of each task. Any additional microtasks queued during during microtasks are added to the end of the queue and are also processed.
Animation Callback Queue - The animation callback queue is processed before pixels repaint. All animation tasks from the queue will be processed, but any additional animation tasks queued during animation tasks will be scheduled for the next frame.
Rendering Pipeline - In this step, rendering will happen. The Browser gets to decide when to do this and it tried to be as efficient as possible. The rendering steps only happen if there is something actually worth updating. The majority of screens update at a set frequency, in most cases 60 times a second (60Hz). So, if we would change page style 1000 times a second, rendering steps would not get processed 1000 times a second, but instead it would synchronize itself with the display and only render up to a frequency display is capable of.
Important thing to mention are Web APIs, that are effectively threads. So, for example setTimeout() is an API provided to us by Browser. When you call setTimeout() Web API would take over and process it, and it will return the result to the main thread as a new task in a task queue.
The best video I found that describes how Event Loops works is this one. It helped me a lot when I was investigating how Event Loop works. Another great videos are this one and this one. You should definitely check all of them.
NodeJS Event Loop
NodeJS Event Loop allows NodeJS to perform non-blocking operations by offloading operation to the system kernel whenever possible. Most modern kernels are multi-threaded and they can perform multiple operations in the background. When one of these operations completes, the kernel tells NodeJS.
Library that provides the Event Loop to NodeJS is called Libuv. It will by default create something called Thread Pool with 4 threads to offload asynchronous work to. If you want, you can also change the number of threads in the Thread Pool.
NodeJS Event Loop goes through different phases:
timers - this phase executes callbacks scheduled by setTimeout() and setInterval().
pending callbacks - executes I/O callbacks deferred to the next loop iteration.
idle, prepare - only used internally.
poll - retrieve new I/O events; execute I/O related callbacks (almost all with the exception of close callbacks, the ones scheduled by timers, and setImmediate()) Node will block here when appropriate.
check - setImmediate() callbacks are invoked here.
close callbacks - some close callbacks, e.g. socket.on('close', ...).
Between each run of the event loop, Node.js checks if it is waiting for any asynchronous I/O or timers and shuts down cleanly if there are not any.
In Browser, we had Web APIs. In NodeJS, we have C++ APIs with the same rule.
I found this video to be useful if you want to check for more information.
For years, JavaScript has been limited to client-side applications
such as interactive web applications that run on the browser. Using
NodeJS, JavaScript can be used to develop server-side applications as
well. Though it’s the same programming language which is used in both
use cases, client-side and server-side have different requirements.
“Event Loop” is a generic programming pattern and JavaScript/NodeJS
event loops are no different. The event loop continuously watches for
any queued event handlers and will process them accordingly.
The “Events”, in a browser’s context, are user interactions on web
pages (e.g, clicks, mouse movements, keyboard events etc.), but in
Node’s context, events are asynchronous server-side operations (e.g,
File I/O access, Network I/O etc.)

Listening to events in JS

I have heard a lot about how JS is single threaded and asynchronous, and I know about the event loop, and the callback queue.
What I don't understand is how can a single threaded language be listening to events and adding event handlers to the queue while it executes other code?
For example clicking a button while a long loop is running will add its callback to the queue even though the thread is occupied.
Thank You
The answer is that the OS (or Javascript, for code-generated events) sends events to the JS in the browser, and these events are put on a queue for for processing. Javascript's one thread takes events off the queue and processes them until there are none left, at which time it waits for the next event to come in. For more, read about message queues: https://en.wikipedia.org/wiki/Message_queue
Simply put, browsers are not written in javascript, and javascript is not what handles events.
UI events are first messaged to the browser by the OS, then the browser will handle these messages and queue a task to build up the DOM Event and dispatch it, where the related JavaScript callbacks may finally fire.
Only the last part involves JavaScript. You can very well have an User Agent (UA) that doesn't implement JavaScript but still does handle some UI events (e.g CSS pointer-events, HTML inputs, media controls etc.)
How this is handled will depend on the UA.
For very long, in the exact example you gave, browsers were unable to handle the clicks that were fired while the event loop was busy (IIRC latest IE still wasn't able to do so). It's only in modern browsers that they started to use multi-process architectures which are able to communicate (Inter-Process Communication) by sharing different message queues.
So even though the "main" renderer thread that does execute the various tasks of the event loop is busy, e.g executing js, other processes can still run in parallel and queue new tasks on the appropriate task queue.
When the renderer thread is done with its long task, a new task has been queued and it can process it.

How is JavaScript never blocking? [duplicate]

I'm not a Node programmer, but I'm interested in how the single-threaded non-blocking IO model works.
After I read the article understanding-the-node-js-event-loop, I'm really confused about it.
It gave an example for the model:
c.query(
'SELECT SLEEP(20);',
function (err, results, fields) {
if (err) {
throw err;
}
res.writeHead(200, {'Content-Type': 'text/html'});
res.end('<html><head><title>Hello</title></head><body><h1>Return from async DB query</h1></body></html>');
c.end();
}
);
Que: When there are two requests A(comes first) and B since there is only a single thread, the server-side program will handle the request A firstly: doing SQL querying is asleep statement standing for I/O wait. And The program is stuck at the I/O waiting, and cannot execute the code which renders the web page behind. Will the program switch to request B during the waiting? In my opinion, because of the single thread model, there is no way to switch one request from another. But the title of the example code says that everything runs in parallel except your code.
(P.S I'm not sure if I misunderstand the code or not since I have
never used Node.)How Node switch A to B during the waiting? And can
you explain the single-threaded non-blocking IO model of Node in a
simple way? I would appreciate it if you could help me. :)
Node.js is built upon libuv, a cross-platform library that abstracts apis/syscalls for asynchronous (non-blocking) input/output provided by the supported OSes (Unix, OS X and Windows at least).
Asynchronous IO
In this programming model open/read/write operation on devices and resources (sockets, filesystem, etc.) managed by the file-system don't block the calling thread (as in the typical synchronous c-like model) and just mark the process (in kernel/OS level data structure) to be notified when new data or events are available. In case of a web-server-like app, the process is then responsible to figure out which request/context the notified event belongs to and proceed processing the request from there. Note that this will necessarily mean you'll be on a different stack frame from the one that originated the request to the OS as the latter had to yield to a process' dispatcher in order for a single threaded process to handle new events.
The problem with the model I described is that it's not familiar and hard to reason about for the programmer as it's non-sequential in nature. "You need to make request in function A and handle the result in a different function where your locals from A are usually not available."
Node's model (Continuation Passing Style and Event Loop)
Node tackles the problem leveraging javascript's language features to make this model a little more synchronous-looking by inducing the programmer to employ a certain programming style. Every function that requests IO has a signature like function (... parameters ..., callback) and needs to be given a callback that will be invoked when the requested operation is completed (keep in mind that most of the time is spent waiting for the OS to signal the completion - time that can be spent doing other work). Javascript's support for closures allows you to use variables you've defined in the outer (calling) function inside the body of the callback - this allows to keep state between different functions that will be invoked by the node runtime independently. See also Continuation Passing Style.
Moreover, after invoking a function spawning an IO operation the calling function will usually return control to node's event loop. This loop will invoke the next callback or function that was scheduled for execution (most likely because the corresponding event was notified by the OS) - this allows the concurrent processing of multiple requests.
You can think of node's event loop as somewhat similar to the kernel's dispatcher: the kernel would schedule for execution a blocked thread once its pending IO is completed while node will schedule a callback when the corresponding event has occured.
Highly concurrent, no parallelism
As a final remark, the phrase "everything runs in parallel except your code" does a decent job of capturing the point that node allows your code to handle requests from hundreds of thousands open socket with a single thread concurrently by multiplexing and sequencing all your js logic in a single stream of execution (even though saying "everything runs in parallel" is probably not correct here - see Concurrency vs Parallelism - What is the difference?). This works pretty well for webapp servers as most of the time is actually spent on waiting for network or disk (database / sockets) and the logic is not really CPU intensive - that is to say: this works well for IO-bound workloads.
Well, to give some perspective, let me compare node.js with apache.
Apache is a multi-threaded HTTP server, for each and every request that the server receives, it creates a separate thread which handles that request.
Node.js on the other hand is event driven, handling all requests asynchronously from single thread.
When A and B are received on apache, two threads are created which handle requests. Each handling the query separately, each waiting for the query results before serving the page. The page is only served until the query is finished. The query fetch is blocking because the server cannot execute the rest of thread until it receives the result.
In node, c.query is handled asynchronously, which means while c.query fetches the results for A, it jumps to handle c.query for B, and when the results arrive for A arrive it sends back the results to callback which sends the response. Node.js knows to execute callback when fetch finishes.
In my opinion, because it's a single thread model, there is no way to
switch from one request to another.
Actually the node server does exactly that for you all the time. To make switches, (the asynchronous behavior) most functions that you would use will have callbacks.
Edit
The SQL query is taken from mysql library. It implements callback style as well as event emitter to queue SQL requests. It does not execute them asynchronously, that is done by the internal libuv threads that provide the abstraction of non-blocking I/O. The following steps happen for making a query :
Open a connection to db, connection itself can be made asynchronously.
Once db is connected, query is passed on to the server. Queries can be queued.
The main event loop gets notified of the completion with callback or event.
Main loop executes your callback/eventhandler.
The incoming requests to http server are handled in the similar fashion. The internal thread architecture is something like this:
The C++ threads are the libuv ones which do the asynchronous I/O (disk or network). The main event loop continues to execute after the dispatching the request to thread pool. It can accept more requests as it does not wait or sleep. SQL queries/HTTP requests/file system reads all happen this way.
Node.js uses libuv behind the scenes. libuv has a thread pool (of size 4 by default). Therefore Node.js does use threads to achieve concurrency.
However, your code runs on a single thread (i.e., all of the callbacks of Node.js functions will be called on the same thread, the so called loop-thread or event-loop). When people say "Node.js runs on a single thread" they are really saying "the callbacks of Node.js run on a single thread".
Node.js is based on the event loop programming model. The event loop runs in single thread and repeatedly waits for events and then runs any event handlers subscribed to those events. Events can be for example
timer wait is complete
next chunk of data is ready to be written to this file
theres a fresh new HTTP request coming our way
All of this runs in single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and wait for yet more events themselves everything works out nicely. This allows multiple request to be handled concurrently by a single Node.js process.
(There's a little bit magic under the hood as where the events originate. Some of it involve low level worker threads running in parallel.)
In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
According to: "Event loop from 10,000ft - core concept behind Node.js".
The function c.query() has two argument
c.query("Fetch Data", "Post-Processing of Data")
The operation "Fetch Data" in this case is a DB-Query, now this may be handled by Node.js by spawning off a worker thread and giving it this task of performing the DB-Query. (Remember Node.js can create thread internally). This enables the function to return instantaneously without any delay
The second argument "Post-Processing of Data" is a callback function, the node framework registers this callback and is called by the event loop.
Thus the statement c.query (paramenter1, parameter2) will return instantaneously, enabling node to cater for another request.
P.S: I have just started to understand node, actually I wanted to write this as comment to #Philip but since didn't have enough reputation points so wrote it as an answer.
if you read a bit further - "Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code."
about - "everything runs in parallel except your code" - your code is executed synchronously, whenever you invoke an asynchronous operation such as waiting for IO, the event loop handles everything and invokes the callback. it just not something you have to think about.
in your example: there are two requests A (comes first) and B. you execute request A, your code continue to run synchronously and execute request B. the event loop handles request A, when it finishes it invokes the callback of request A with the result, same goes to request B.
Okay, most things should be clear so far... the tricky part is the SQL: if it is not in reality running in another thread or process in it’s entirety, the SQL-execution has to be broken down into individual steps (by an SQL processor made for asynchronous execution!), where the non-blocking ones are executed, and the blocking ones (e.g. the sleep) actually can be transferred to the kernel (as an alarm interrupt/event) and put on the event list for the main loop.
That means, e.g. the interpretation of the SQL, etc. is done immediately, but during the wait (stored as an event to come in the future by the kernel in some kqueue, epoll, ... structure; together with the other IO operations) the main loop can do other things and eventually check if something happened of those IOs and waits.
So, to rephrase it again: the program is never (allowed to get) stuck, sleeping calls are never executed. Their duty is done by the kernel (write something, wait for something to come over the network, waiting for time to elapse) or another thread or process. – The Node process checks if at least one of those duties is finished by the kernel in the only blocking call to the OS once in each event-loop-cycle. That point is reached, when everything non-blocking is done.
Clear? :-)
I don’t know Node. But where does the c.query come from?
The event loop is what allows Node.js to perform non-blocking I/O operations — despite the fact that JavaScript is single-threaded — by offloading operations to the system kernel whenever possible. Think of event loop as the manager.
New requests are sent into a queue and watched by the synchronous event demultiplexer. As you see each operations handler is also registered.
Then those requests are sent to the thread pool (Worker Pool) synchronously to be executed. JavaScript cannot perform asynchronous I/O operations. In browser environment, browser handles the async operations. In node environment, async operations are handled by the libuv by using C++. Thread's pool default size is 4, but it can be changed at startup time by setting the UV_THREADPOOL_SIZE environment variable to any value (maximum is 128). thread pool size 4 means 4 requests can get executed at a time, if event demultiplexer has 5 requsts, 4 would be passed to thread pool and 5th would be waiting. Once each request gets executed, result is returned to the `event demultiplexer.
When a set of I/O operations completes, the Event Demultiplexer pushes a set of corresponding events into the Event Queue.
handler is the callback. Now event loop keeps an eye on the event queue, if there is something ready, it is pushed to stack to execute the callback. Remember eventually callbacks get executed on stack. Note that some callbacks has priorities on other, the event loop does pick the callbacks based on their priorities.
For those who seek short answer and don't want to go to the deepest levels of Node.js internals.
Node.js is not single threaded, it runs on 5 threads by default.
Yes, the only single thread is for actual JavaScript processing, but it always switches from function to function.
It sends SQL query to a database and lets it wait in other thread, while single threaded Node.js continues to compute some other code ready to be computed.
If you wish more explanations, there are good articles about Event Loop, Worker Pool and the whole libuv documentation.

What happend in libuv when you make parallel requests

As Javascript is a single threaded, how libuv handles when i manage to make two requests parallely? Eg: Making array of promises and resolving latter
I presume what you're really asking is how does libv8 handle two asynchronous requests that are "in flight" at the same time. Since Javascript is single thread, you can't start them at the same moment. One will be started, then your JS will be able to run some more and start the second one. They will both be "in process" at the same time.
First off, the library used in nodejs is generally called libuv, not libv8. Here's the doc for libuv.
The answer for how libuv does this is that it depends upon the type of asynchronous operation. Here's a diagram from the libuv site:
Disk I/O in libuv uses native threads via a thread pool. A native thread runs each disk I/O operation and then it completes, it then puts an event into the nodejs event queue so that when nodejs is available, it can pull that event from the event queue and call the callback registered for the async I/O operation. This functionality originally came from libeio, but is apparently its own implementation now.
Networking operations in libuv use native OS async capabilities such as epoll, kqueue and IOCP.
technically you don't make the request parallel, one does come first. that one starts first, but it listens or checks for one if it is finished, and then the other, and back and forth until one finishes first.

When does node.js event loop terminate?

I am wondering what are the conditions under which node.js terminates the event loop. How does node.js figure out that no further events are going to be triggered ? For e.g. in the case of an http client or a file reading application.
i'd recommend you watch this great videoPhilip Roberts: What the heck is the event loop anyway? | JSConf EU 2014
To explain briefly, Javascript runtime can do one thing at a time, thus it is called "single threaded". However, you have the reason that we can do things concurrently like HTTP requests .. etc. is due what he calls on the video as "Browser APIs" for the browser and for a backend environment like Node.js it is the C++ APIs.
Now, any of these "external" API calls like the HTTP client call, is pushed to a task queue. The event loop has one very simple job. It will always take one job from the task queue given that the main callstack is empty and push that to the execution/running stack. So the event loop will terminate when there are no jobs in the task queue (external API calls)
I recommend watching the full video for better understanding :)

Categories

Resources