Nodejs Event Loop - interaction with top-level code - javascript

hoping for a little confirmation on understanding of node.js execution model. I understand that when node.js process starts, this is the sequence of executions:
(from Jonas Schmedtmann's Udemy node.js course)
With the main takeaway being that top-level code is always executed first before any callbacks.
Then, in the event-loop, this is the sequence of the 'phases':
After some digging, I also confirmed why a setTimeout and a setImmediate called in the main module has 'arbitrary' execution order, but when called from the I/O phase, the setImmediate will always execute first, based on this post: https://github.com/nodejs/help/issues/392#issuecomment-274032320.
(Reason: assuming the timer threshold has already passed, since we are currently in the I/O phase, and the next phase after that is the check-handles phase where setImmediate callbacks are executed, immediate always executes before timer.)
Now, when timer and immediate callbacks are called from a phase such that the next phase is the due-timers phase (such as from main module), if the top-level code took long enough that the timer is due, the timer callback will always execute first, correct? I've tested this with the following code, and it seems to be true (everytime I've run it, timer executes first, even though it has a full second delay compared to the immediate callback)
setTimeout(() => {
console.log('timer completed');
}, 1000);
setImmediate(() => {
console.log('immediate completed');
});
for (let i = 0; i < 5000; i++) {
console.log(`top-level code: ${i}`);
}
So here is my question: shouldn't an I/O operation callback also be executed before the immediate's callback due to the event-loop, assuming that the top-level code takes long enough that the I/O operation completes by the time we start the event-loop?
However, this code below suggests otherwise, as the execution order is always: top-levels->timer->immediate->io
Even though based on the model above I should be expecting: top-levels->timer->io->immediate (?)
setTimeout(() => {
console.log('timer completed');
}, 1000);
fs.readFile('test-file.txt', 'utf-8', () => {
console.log('io completed');
});
setImmediate(() => {
console.log('immediate completed');
});
for (let i = 0; i < 5000; i++) {
console.log(`top-level code: ${i}`);
}
Thank you!

I might be a little late to answering this question and it is possible you've already figured this one out #M.Lee. But here goes the answer to your question:
During the top-level code execution the code you're running in your example is not running in the event loop. Like the first image from your question shows, event loop will start ticking after the top level code is already executed. So, when it comes to the top-level code, Node does not follow the same order that it follows during an event loop tick.
In this particular case, the I/O callback is getting executed the last is plainly because the contents of this particular file (BTW, I had to go ahead and do the research by looking at Jonas' Node course and understand what this file contained. It just contains the line "Node.js is the best!" 1 million times).
Also a side note here is that you're using the asynchronous readFile function instead of the readFileSync.

Related

nodejs: setTimeout and IO(POLL) is showing inconsistency

I have this piece of code:
// * Run this snippet of code multiple times
const fs = require('fs');
setTimeout(() => {
console.log('timer');
});
fs.readFile('', 'utf-8',(err, data) => {
console.log('io');
});
setImmediate(() => {
console.log('check');
});
On running the above mentioned code for multiple times. I'm getting different outputs.
Result 1
Somethimes I'm getting
timer
io
check
Result 2
and other times. I'm getting
io
check
timer
Can anyone please clarify what is going on here? I was expecting Result 1.
That has to do with how the event loop picks up things to do when more than one thing is available. Here you can find a good explanation.
If you really want to set that order, try async/await or calling fs.readFile() inside the setTimeout() callback.
Based on node.js event loop documentation, a setTimeout is called which sets a minimum amount of time to wait until the function is executed. Next you start an asynchronous file read. Then the polling phase of the event loop will begin. The polling phase has a queue of callback functions to complete. If the file read is not complete, it's callback function (the console.log) will not be put in the polling queue. The polling phase will instead wait for the timer to meet the minimum threshold time and then loop back to the timer callback execution phase. Since your setTimeout is set to zero ms, the polling phase will usually exit and complete the timer callback (console.log) first and then go back to polling for the file read to finish. This is why sometimes your setTimeout completes first if your underlying operating system delays the file read, or the file read completes first if the operating system delays the setTimeout. setImmediate will always happen immediately after the polling (io) phase.

Difference between returning new Promise and Promise.resolve

For the below code snippet, i would like to understand how NodeJS runtime handles things :
const billion = 1000000000;
function longRunningTask(){
let i = 0;
while (i <= billion) i++;
console.log(`Billion loops done.`);
}
function longRunningTaskProm(){
return new Promise((resolve, reject) => {
let i = 0;
while (i <= billion) i++;
resolve(`Billion loops done : with promise.`);
});
}
function longRunningTaskPromResolve(){
return Promise.resolve().then(v => {
let i = 0;
while (i <= billion) i++;
return `Billion loops done : with promise.resolve`;
})
}
console.log(`*** STARTING ***`);
console.log(`1> Long Running Task`);
longRunningTask();
console.log(`2> Long Running Task that returns promise`);
longRunningTaskProm().then(console.log);
console.log(`3> Long Running Task that returns promise.resolve`);
longRunningTaskPromResolve().then(console.log);
console.log(`*** COMPLETED ***`);
1st approach :
longRunningTask() function will block the main thread, as expected.
2nd approach :
In longRunningTaskProm() wrapping the same code in a Promise, was expecting execution will move away from main thread and run as a micro-task. Doesn't seem so, would like to understand what's happening behind the scenes.
3rd approach :
Third approach longRunningTaskPromResolve() works.
Here's my understanding :
Creation and execution of a Promise is still hooked to the main thread. Only Promise resolved execution is moved as a micro-task.
Am kinda not convinced with whatever resources i found & with my understanding.
All three of these options run the code in the main thread and block the event loop. There is a slight difference in timing for WHEN they start running the while loop code and when they block the event loop which will lead to a difference in when they run versus some of your console messages.
The first and second options block the event loop immediately.
The third option blocks the event loop starting on the next tick - that's when Promise.resolve().then() calls the callback you pass to .then() (on the next tick).
The first option is just pure synchronous code. No surprise that it immediately blocks the event loop until the while loop is done.
In the second option the new Promise executor callback function is also called synchronously so again it blocks the event loop immediately until the while loop is done.
In the third option, it calls:
Promise.resolve().then(yourCallback);
The Promise.resolve() creates an already resolved promise and then calls .then(yourCallback) on that new promise. This schedules yourCallback to run on the next tick of the event loop. Per the promise specification, .then() handlers are always run on a future tick of the event loop, even if the promise is already resolved.
Meanwhile, any other Javascript right after this continues to run and only when that Javascript is done does the interpreter get to the next tick of the event loop and run yourCallback. But, when it does run that callback, it's run in the main thread and therefore blocks until it's done.
Creation and execution of a Promise is still hooked to the main thread. Only Promise resolved execution is moved as a micro-task.
All your code in your example is run in the main thread. A .then() handler is scheduled to run in a future tick of the event loop (still in the main thread). This scheduling uses a micro task queue which allows it to get in front of some other things in the event queue, but it still runs in the main thread and it still runs on a future tick of the event loop.
Also, the phrase "execution of a promise" is a bit of a misnomer. Promises are a notification system and you schedule to run callbacks with them at some point in the future using .then() or .catch() or .finally() on a promise. So, in general, you don't want to think of "executing a promise". Your code executes causing a promise to get created and then you register callbacks on that promise to run in the future based on what happens with that promise. Promises are a specialized event notification system.
Promises help notify you when things complete or help you schedule when things run. They don't move tasks to another thread.
As an illustration, you can insert a setTimeout(fn, 1) right after the third option and see that the timeout is blocked from running until the third option finishes. Here's an example of that. And, I've made the blocking loops all be 1000ms long so you can more easily see. Run this in the browser here or copy into a node.js file and run it there to see how the setTimeout() is blocked from executing on time by the execution time of longRunningTaskPromResolve(). So, longRunningTaskPromResolve() is still blocking. Putting it inside a .then() handler changes when it gets to run, but it is still blocking.
const loopTime = 1000;
let startTime;
function log(...args) {
if (!startTime) {
startTime = Date.now();
}
let delta = (Date.now() - startTime) / 1000;
args.unshift(delta.toFixed(3) + ":");
console.log(...args);
}
function longRunningTask(){
log('longRunningTask() starting');
let start = Date.now();
while (Date.now() - start < loopTime) {}
log('** longRunningTask() done **');
}
function longRunningTaskProm(){
log('longRunningTaskProm() starting');
return new Promise((resolve, reject) => {
let start = Date.now();
while (Date.now() - start < loopTime) {}
log('About to call resolve() in longRunningTaskProm()');
resolve('** longRunningTaskProm().then(handler) called **');
});
}
function longRunningTaskPromResolve(){
log('longRunningTaskPromResolve() starting');
return Promise.resolve().then(v => {
log('Start running .then() handler in longRunningTaskPromResolve()');
let start = Date.now();
while (Date.now() - start < loopTime) {}
log('About to return from .then() in longRunningTaskPromResolve()');
return '** longRunningTaskPromResolve().then(handler) called **';
})
}
log('*** STARTING ***');
longRunningTask();
longRunningTaskProm().then(log);
longRunningTaskPromResolve().then(log);
log('Scheduling 1ms setTimeout')
setTimeout(() => {
log('1ms setTimeout Got to Run');
}, 1);
log('*** First sequence of code completed, returning to event loop ***');
If you run this snippet and look at exactly when each message is output and the timing associated with each message, you can see the exact sequence of when things get to run.
Here's the output when I run it in node.js (line numbers added to help with the explanation below):
1 0.000: *** STARTING ***
2 0.005: longRunningTask() starting
3 1.006: ** longRunningTask() done **
4 1.006: longRunningTaskProm() starting
5 2.007: About to call resolve() in longRunningTaskProm()
6 2.007: longRunningTaskPromResolve() starting
7 2.008: Scheduling 1ms setTimeout
8 2.009: *** First sequence of code completed, returning to event loop ***
9 2.010: ** longRunningTaskProm().then(handler) called **
10 2.010: Start running .then() handler in longRunningTaskPromResolve()
11 3.010: About to return from .then() in longRunningTaskPromResolve()
12 3.010: ** longRunningTaskPromResolve().then(handler) called **
13 3.012: 1ms setTimeout Got to Run
Here's a step-by-step annotation:
Things start.
longRunningTask() initiated.
longRunningTask() completes. It is entirely synchronous.
longRunningTaskProm() initiated.
longRunningTaskProm() calls resolve(). You can see from this that the promise executor function (the callback passed to new Promise(fn)` is entirely synchronous too.
longRunningTaskPromResolve() initiated. You can see that the handler from longRunningTaskProm().then(handler) has not yet been called. That has been scheduled to run on the next tick of the event loop, but since we haven't gotten back to the event loop yet, it hasn't yet been called.
We're now setting the 1ms timer. Note that this timer is being set only 1ms after we started longRunningTaskPromResolve(). That's because longRunningTaskPromResolve() didn't do much yet. It ran Promise.resolve().then(handler), but all that did was schedule the handler to run on a future tick of the event loop. So, that only took 1ms to schedule that. The long running part of that function hasn't started running yet.
We get to the end of this sequence of code and return back to the event loop.
The next thing scheduled to run in the event loop is the handler from longRunningTaskProm().then(handler) so that gets called. You can see that it was already waiting to run since it ran only 1ms after we returned to the event loop. That handler runs and we return back to the event loop.
The next thing scheduled to run in the event loop is the handler from Promise.resolve().then(handler) so we now see that that starts to run and since it was already queued, it runs immediately after the previous event finished.
It takes exactly 1000ms for the loop in longRunningTaskPromResolve() to run and then it returns from it's .then() handler which schedules then next .then() handler in that promise chain to run on the next tick of the eventl loop.
That .then() gets to run.
Then, finally when there are no .then() handlers scheduled to run, the setTimeout() callback gets to run. It was set to run in 1ms, but it got delayed by all the promise action running at a higher priority ahead of it so instead of running 1ms, it ran in 1004ms.

Does the nodejs (libuv) event loop execute all the callbacks in one phase (queue) before moving to the next or run in a round robin fashion?

I am studying about the event loop provided by libuv in Node. I came across the following blog by Deepal Jayasekara and also saw the explanations of Bert Belder and Daniel Khan on youtube.
There is one point that I am not clear with- As per my understanding, the event loop processes all the items of one phase before moving on to another. So if that is the case, I should be able to block the event loop if the setTimeout phase is constantly getting callbacks added to it.
However, when I tried to replicate that- it doesn't happen. The following is the code:
var http = require('http');
http.createServer(function (req, res) {
res.writeHead(200, {'Content-Type': 'text/plain'});
res.write('Hello World!');
console.log("Response sent");
res.end();
}).listen(8081);
setInterval(() => {
console.log("Entering for loop");
// Long running loop that allows more callbacks to get added to the setTimeout phase before this callback's processing completes
for (let i = 0; i < 7777777777; i++);
console.log("Exiting for loop");
}, 0);
The event loop seems to run in a round robin fashion. It first executes the callbacks that were added before i send a request to the server, then processes the request and then continues with the callbacks. It feels like a single queue is running.
From the little that I understood, there isn't a single queue and all the expired timer callbacks should get executed first before moving to the next phase. Hence the above snippet should not be able to return the Hello World response.
What could be the possible explanation for this?
Thanks.
If you look in libuv itself, you find that the operative part of running timers in the event loop is the function uv_run_timers().
void uv__run_timers(uv_loop_t* loop) {
struct heap_node* heap_node;
uv_timer_t* handle;
for (;;) {
heap_node = heap_min(timer_heap(loop));
if (heap_node == NULL)
break;
handle = container_of(heap_node, uv_timer_t, heap_node);
if (handle->timeout > loop->time)
break;
uv_timer_stop(handle);
uv_timer_again(handle);
handle->timer_cb(handle);
}
}
The way it works is the event loop sets a time mark at the current time and then it processes all timers that are due by that time one after another without updating the loop time. So, this will fire all the timers that are already past their time, but won't fire any new timers that come due while it's processing the ones that were already due.
This leads to a bit fairer scheduling as it runs all timers that are due, then goes and runs the rest of the types of events in the event loop, then comes back to do any more timers that are due again. This will NOT process any timers that are not due at the start of this event loop cycle, but come due while it's processing other timers. Thus, you see the behavior you asked about.
The above function is called from the main part of the event loop with this code:
int uv_run(uv_loop_t *loop, uv_run_mode mode) {
DWORD timeout;
int r;
int ran_pending;
r = uv__loop_alive(loop);
if (!r)
uv_update_time(loop);
while (r != 0 && loop->stop_flag == 0) {
uv_update_time(loop); <== establish loop time
uv__run_timers(loop); <== process only timers due by that loop time
ran_pending = uv_process_reqs(loop);
uv_idle_invoke(loop);
uv_prepare_invoke(loop);
.... more code here
}
Note the call to uv_update_time(loop) right before calling uv__run_timers(). That sets the timer that uv__run_timers() references. Here's the code for uv_update_time():
void uv_update_time(uv_loop_t* loop) {
uint64_t new_time = uv__hrtime(1000);
assert(new_time >= loop->time);
loop->time = new_time;
}
from the docs,
when the event loop enters a
given phase, it will perform any operations specific to that phase,
then execute callbacks in that phase's queue until the queue has been
exhausted or the maximum number of callbacks has executed. When the
queue has been exhausted or the callback limit is reached, the event
loop will move to the next phase, and so on.
Also from the docs,
When delay is larger than 2147483647 or less than 1, the delay will be set to 1
Now, when when you run your snippet following things happen,
script execution begins and callbacks are registered to specific phases. Also, as the docs suggests the setInterval delay is implicitly converted to 1 sec.
After 1 sec, your setInterval callback will be executed, it will block eventloop until all iterations and completed. Meanwhile, eventloop will not be notified of any incoming request atleast until loop terminates.
Once, all iterations are completed, and there is a timeout of 1 sec, the poll phase will execute your HTTP request callback, if any.
back to step 2.

Javascript Asynchronous with setTimeout(..., 0)

I wanted to have a better understanding of how the event loop and asynchronous code works in Javascript. There is a ton of resources online but I could not find an answer to my question
Everyday I mostly use callbacks, promises, async/awaits, but at the end I am simply relying on already asynchronous methods.
Therefore I wanted to know how it works, creating an async function from scratch, and deal with blocking code (or should I say slow code, that is not an HttpRequest or anything that is already provided to us).
For example taking while loop with a very high condition to stop it, should take a second to finish. And this is what I decided to implement for my tests.
After my research, I could read that one way of making my code asynchronous, would be to use setTimeout with a 0ms delay (placing a message in the event queue, that will be executed after the next tick)
function longOperation(cb) {
setTimeout(function() {
var i = 0;
while (i != 1000000000) { i++; }
cb();
}, 0);
}
longOperation(() => {
console.log('callback finished');
})
console.log('start');
My question is:
When my code is finally going to be executed, why isn't it blocking anymore ? What is the difference between executing it normally, and placing a message that the event loop will pick to push it to the call stack ?
The following video shows how the event loop handles a setTimeout with 0 delay.
JavaScript Event loop with setTimeout 0
However, the code executed is a simple console log. In my example, this is a "long" operation...
The outer code executes to completion.
The setTimeout 0 timer expires immediately, so its callback runs right away and executes to completion (the long-running while loop and its callback).
During both of these code execution phases, no other JavaScript user code will run.

Can you use setInterval to call functions asynchronously?

The following code is taken from Project Silk (a Microsoft sample application)
The publish method below loops thru an an array of event callBacks and executes each one. Instead of using a for loop setInterval is used instead.
The documentation says this allows each subscriber callback to be invoked before the previous callback finishes. Is this correct? I thought the browser would not allow the execution of the function inside the interval to run until all prior executions of it had finished.
Is this really any different than doing a for loop?
that.publish = function (eventName, data)
{
var context, intervalId, idx = 0;
if (queue[eventName])
{
intervalId = setInterval(function ()
{
if (queue[eventName][idx])
{
context = queue[eventName][idx].context || this;
queue[eventName][idx].callback.call(context, data);
idx += 1;
}
else { clearInterval(intervalId); }
}, 0);
}
Using setInterval here does make the execution sort of "asynchronous", because it schedules the execution of the callback for the next time the main execution thread is available.
This means that the callback executions should not block the browser because any other synchronous processing will take place before the callbacks (because the callbacks are scheduled to run only when the main execution thread has a spare millisecond) - and that's what makes this construct "better" than a regular for loop - the fact that the callbacks won't block the browser and cause the dreaded "This page has a script that's taking too long" error.
The side effect of this scheduling pattern is that the timeout is only a "suggestion" - which is why they use 0 here.
See: http://ejohn.org/blog/how-javascript-timers-work/
setInterval(..., 0) can be used to yield control to the browser UI, to prevent it from freezing if your code takes a long time to run.
In this case, that.publish will exit almost immediately before any callback has been executed. Each callback will then run "in the background" - they will be placed in the event loop, and each one will yield to the browser to do it's thing before the next callback is executed.
This seems like a good idea in event processing, because you don't want event processing to freeze the browser even if there are a lot of them or some take a long time to finish.
About the documentation - as stated, it is incorrect. Javascript is single-threaded. But if you were to call publish() a few times in a row, it is true that those calls would all finish before any callbacks are executed, because of the setTimeout. Maybe this is what the documentation means?

Categories

Resources