Why is node.js asynchronous? - javascript

Nobody has actually asked this (from all the 'suggestions' I'm getting and also from searching before I asked here).
So why is node.js asynchronous?
From what I have deduced after some research:
Languages like PHP and Python are scripting languages (I could be wrong about the actual languages that are scripting languages) whilst JavaScript isn't. (I suppose this derives from the fact that JS doesn't compile?)
Node.js runs on a single thread whilst scripting languages use multiple threads.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Maybe the answer is found somewhere stated above, but I'm still not sure.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
PS. I know some of you will ask "why would you want to make JS synchronous?" in your answers, but the truth is that I don't. I'm just asking these types of questions because I'm sure there are more people out there than just myself that have thought about such questions.

Node.js runs on a single thread whilst scripting languages use multiple threads.
Not technically. Node.js uses several threads, but only one execution thread. The background threads are for dealing with IO to make all of the asynchronous goodness work. Dealing with threads efficiently is a royal pain, so the next best option is to run in an event loop so code can run while background threads are blocked on IO.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Not necessarily. You can preserve state in an asynchronous system pretty easily. For example, in Javascript, you can use bind() to bind a this to a function, thereby preserving state explicitly when the function returns:
function State() {
// make sure that whenever doStuff is called it maintains its state
this.doStuff = this.doStuff.bind(this);
}
State.prototype.doStuff = function () {
};
Asynchronous means not waiting for an operation to finish, but registering a listener instead. This happens all the time in other languages, notably anything that needs to accept input from the user. For example, in a Java GUI, you don't block waiting for the user to press a button, but you register a listener with the GUI.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
Technically, all languages are synchronous, even Javascript. However, Javascript works a lot better in an asynchronous design because it was designed to be single threaded.
Basically there are two types of programs:
CPU bound- the only way to make it go faster is to get more CPU time
IO bound- spends a lot of time waiting for data, so a faster processor won't matter
Video games, number crunchers and compilers are CPU bound, whereas web servers and GUIs are generally IO bound. Javascript is relatively slow (because of how complex it is), so it wouldn't be able to compete in a CPU bound scenario (trust me, I've written my fair share of CPU-bound Javascript).
Instead of coding in terms of classes and objects, Javascript lends itself to coding in terms of simple functions that can be strung together. This works very well in asynchronous design, because algorithms can be written to process data incrementally as it comes in. IO (especially network IO) is very slow, so there's quite a bit of time between packets of data.
Example
Let's suppose you have 1000 live connections, each delivering a packet every millisecond, and processing each packet takes 1 microsecond (very reasonable). Let's also assume each connection sends 5 packets.
In a single-threaded, synchronous application, each connection will be handled in series. The total time taken is (5*1 + 5*.001) * 1000 milliseconds, or ~5005 milliseconds.
In a single-threaded, asynchronous application, each connection will be handled in parallel. Since every packet takes 1 millisecond, and processing each packet takes .001 milliseconds, we can process every connection's packet between packets, so our formula becomes: 1000*.001 + 5*1 milliseconds, or ~6 milliseconds.
The traditional solution to this problem was to create more threads. This solved the IO problem, but then when the number of connections rose, so did the memory usage (threads cost lots of memory) and CPU usage (multiplexing 100 threads onto 1 core is harder than 1 thread on 1 core).
However, there are downsides. If your web application happens to also need to do some heavy number crunching, you're SOL because while you're crunching numbers, connections need to wait. Threading solves this because the OS can swap out your CPU-intensive task when data is ready for a thread waiting on IO. Also, node.js is bound to a single core, so you can't take advantage of your multi-core processor unless you spin up multiple instances and proxy requests.

Javascript does not compile into anything. It's "evaluated" at runtime, just like PHP & Ruby. Therefore it is a scripting language just like PHP/Ruby. (it's official name is actually ECMAScript).
The 'model' that Node adheres to is a bit different than PHP/Ruby. Node.js uses an 'event loop' (the single thread) that has the one goal of taking network requests and handling them very quickly, and if for any reason it encounters an operation that takes a while (API request, database query -- basically anything involving I.O. (input/output)) it passes that off to a background 'worker' thread and goes off to do something else while the worker thread waits for the long task to complete. When that happens the main 'event loop' will take the results and continue deal with them.
PHP/Ruby following a threading model. Essentially, for every incoming network request, the application server spins up an isloated thread or process to handle the request. This does not scale tremendously well and Node's approach is cited as one of its core strengths compared to this model.
Asynchronous means stateless and that the connection is persistent
whilst synchronous is the (almost) opposite.
No. Synchronous instructions are completed in a natural order, from first to last. Asynchronous instructions mean that if a step in the flow of a program takes a relatively long time, the program will continue executing operations and simply return to this operation when complete.
Could JavaScript be made into a synchronous language?
Certain operations in JavaScript are synchronous. Others are asynchronous.
For example:
Blocking operations:
for(var k = 0; k < 1; k = k - 1;){
alert('this will quickly get annoying and the loop will block execution')
alert('this is blocked and will never happen because the above loop is infinite');
Asynchronous:
jQuery.get('/foo', function (result) { alert('This will occur 2nd, asynchronously'); });
alert('This will occur 1st. The above operation was skipped over and execution continued until the above operation completes.');

Could JavaScript be made into a synchronous language?
Javascript is not an "asynchronous language"; rather, node.js has a lot of asynchronous APIs. Asynchronous-ness is a property of the API and not the language. The ease with which functions can be created and passed around in javascript makes it convenient to pass callback functions, which is one way to handle control flow in an asynchronous API, but there's nothing inherently asynchronous about javascript. Javascript can easily support synchronous APIs.
Why is node.js asynchronous?
Node.js favors asynchronous APIs because it is single-threaded. This allows it to efficiently manage its own resources, but requires that long-running operations be non-blocking, and asynchronous APIs are a way to allow for control of flow with lots of non-blocking operations.

Related

Do Timers run on their Own threads in Node.js?

I am a bit confused here I know Javascript is a single-threaded language but while reading about the event loop. I got to know that in case of setTimeout or setInterval javascript calls web API provided by the browser which spawns a new thread to execute timer on that thread. but what happens in the case of node.js environment with timers how do they execute/work?
No threads are used for timers in node.js.
Timers in node.js work in conjunction with the event loop and don't use a thread. Timers in node.js are stored in a sorted linked list with the next timer to fire at the start of the linked list. Each time through the event loop, it checks to see if the first timer in the linked list has reached its time. If so, it fires that timer. If not, it runs any other events that are waiting in the event loop.
On each subsequent cycle through the event loop, it keeps checking to see if its time for the next timer or not. When a new timer is added, it is inserted into the linked list in its proper sorted order. When it fires or is cancelled, it is removed from the linked list.
If the event loop has nothing to do, it may sleep for a brief period of time, but it won't sleep past the timer for the next timer.
Other references on the topic:
How does nodejs manage timers internally
Libuv timer code in nodejs
How many concurrent setTimeouts before performance issues?
Multiple Client Requests in NodeJs
Looking for a solution between setting lots of timers or using a scheduled task queue
Node runs on a single thread but asynchronous work happens elsewhere. For example, libuv provides a pool of 4 threads that it may use, but wont if there's a better option.
The node documentation says
Node.js runs JavaScript code in the Event Loop (initialization and callbacks), and offers a Worker Pool to handle expensive tasks like file I/O. Node.js scales well, sometimes better than more heavyweight approaches like Apache. The secret to the scalability of Node.js is that it uses a small number of threads to handle many clients. If Node.js can make do with fewer threads, then it can spend more of your system's time and memory working on clients rather than on paying space and time overheads for threads (memory, context-switching). But because Node.js has only a few threads, you must structure your application to use them wisely.
A more detailed look at the event loop
No. Timers are just scheduled on the same thread and will call their callbacks when the time expires.
Depending on what OS your are on and what javascript interpreters you use they will use various APIs form poll to epoll to kqueue to overlapped I/O on Windows but in general asynchronous APIs have similar features. So let's ignore platform differences and look at a cross-platform API that exists on all OSes: the POSIX select() system call.
The select function in C looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
Where nfds is total number of file descriptors (including network sockets) you are waiting/listening on, readfds is the list/set of read file descriptors you are waiting on, writefds is the list/set of write file descriptors, exceptfds is the list/set of error file descriptors (think stderr) and timeval is the timeout for the function.
This system call blocks - yes, in non-blocking, asynchronous code there is a piece of blocking system call. The main difference between non-blocking code and blocking threaded code is that the entire program blocks in only one place, the select() function (or whatever equivalent you use).
This function only returns if any of the file descriptors have activity on them or if the timeout expires.
By managing the timeout and calculating the next value of timeval you can implement a function like setTimeout
I've written much deeper explanations of how this works in answers to the following related questions:
I know that callback function runs asynchronously, but why?
Event Queuing in NodeJS
how node.js server is better than thread based server
Node js architecture and performance
Performance of NodeJS with large amount of callbacks
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
I recommend you at least browse each of the answers I wrote above because they are almost all non-duplicates. They sometimes overlap but explain different aspects of asynchronous code execution.
The gist of it is that javascript does not execute code in parallel to implement timers. It doesn't need to. Instead it waits in parallel. Once you understand the difference between running code in parallel and waiting (doing nothing) in parallel you will understand how things like node.js achieve high performance and how events work better.

Node JS multiple concurrent requests to backend API

This is my node js backend API method.
apiRouter.route('/makeComment')
.post((req, res) => {
consoleLogger.info("Inside makeComment API..");
logger.info("Inside makeComment");
let timestamp = new Date().getTime();
let latestCommentID = timestamp.toString();
console.log("comment ID generated is "+latestCommentID);
res.json({
'makeComment': 'success',
'commentid':latestCommentID
});
});
Now, if multiple concurrent requests come to this API, what will happen?
As per my understanding, NodeJS will maintain a Event Queue for the requests and process requests one by one.
So, it is impossible to get the same timestamp for two different requests.
Please let me know if my understanding is correct?
Edit 1:
After googling for some time, got this link, which clearly explains some of the concepts.
You can absolutely end up with the same timestamp on 2 concurrent calls. But not because NodeJS or your server served the request in parallel, but rather because a millisecond is a long enough time that your server could process many requests in the same millisecond.
Multi-threaded vs. Concurrent
You have correctly identified a subtle but important distinction between concurrency and multi-threading. In a multi-threaded environment, two different operations can truly take place at the same time on 2 different cores of a CPU. In a concurrent but single-threaded environment, things do happen sequentially, hopefully in a non-blocking manner, and really fast!
Javascript is a single threaded environment. And yes, it has an event loop where it queues tasks and processes them one at a time. NodeJS (and Web APIs in case of browser Javascript apps) offer certain async functions, such as fs.writeFile or fetch, which the runtime in collaboration with the OS may perform in the same or different thread/core, and then orchestrate returning results back to your application via callbacks and Promises.
In case of your example, your handler (the part starting from (req, res) => { ... }) consists of sync calls only. So, yes, various calls to this handler will run sequentially, one after another. And, while no two runs of this handler will ever truly happen at the same time, but they will happen really fast and you will likely have cases where you'll get the same millisecond value from the Date object. If you had a higher resolution timestamp available (maybe nano seconds) or if your handler took longer to run (for example, you ran a for loop a billion iterations), then you'll be able to observe the sequential behavior more clearly.
Avoid blocking (sync) calls in single-threaded applications
This is the exact reason why you are advised against performing any synchronous IO operation (e.g. fs.writeFileSync, etc.) in a NodeJS web server, because while one request is performing a blocking IO operation, all other requests are waiting, queued on the event loop.
Really good video about Javascript event loop; this should illuminate some topics: https://www.youtube.com/watch?v=8aGhZQkoFbQ
There is no possibility of concurrency here. As you have correctly studied, Node.js runs in a single-threaded environment (unless you use clusters, the worker API, or other related Node.js modules that facilitate IPC). As such, the line
let timestamp = new Date().getTime();
cannot be concurrently invoked by two simultaneously running threads, excluding the exceptions noted above.
However, Date.prototype.getTime() only has millisecond resolution, so it's remotely possible that the callback to apiRouter.route('/makeComment').post() might be sequentially invoked by the event loop from two pending asynchronous requests within the same millisecond.

What do we call JS async "strands of execution"?

In Java we have "threads", in CPython we have threads (non-concurrent) and "processes".
In JS, when I kick off an async function or method, how do I officially refer to these "strands of executing code"?
I have heard that each such code block executes from start to finish*, meaning that there is never any concurrent** processing in JS. I'm not quite sure whether this is the same situation as with CPython threads. Personally I hesitate to use "thread" for what we have in JS as these "strands" are so different from Java concurrent threads.
* Just to clarify in light of Stephen Cleary's helpful response: I mean "each such synchronous code block". Obviously if an await is encountered control is released ...
** And obviously never any "true parallel" processing. I'm following the widely accepted distinction between "concurrent" (only one thread at any one time, but one "strand of execution" may give way to another) and "parallel" (multiple processes, implementing true parallel processing, often using multiple CPUs or cores or processes). My understanding is that these "strands" in JS are not even concurrent: once one AJAX method or Promise or async method/function starts executing nothing can happen until it's finished (or an await happens)...
In JS, when I kick off an async function or method, how do I officially refer to these "strands of executing code"?
For lack of a better term, I've referred to these as "asynchronous operations".
I have heard that each such code block executes from start to finish
This was true... until await. Now there's a couple of ways to think about it, both of which are correct:
Code blocks execute exclusively unless there's an await.
await splits its function into multiple code blocks that do execute exclusively (each await is a "split point").
meaning that there is never any concurrent processing in JS.
I'd disagree with this statement. JavaScript forces asynchrony (before async/await, it used callbacks or promises), so it does have asynchronous concurrency, though not parallel concurrency.
A good mental model is that JavaScript is inherently single-threaded, and that one thread has a queue of work to do. await does not block that thread; it returns, and when its thenable/Promise completes, it schedules the remainder of that method (the "continuation") to the queue.
So you can have multiple asynchronous methods executing concurrently. This is exactly how Node.js handles multiple simultaneous requests.
My understanding is that these "strands" in JS are not even concurrent: once one AJAX method or Promise or async method/function starts executing nothing can happen until it's finished...
No, this is incorrect. await will return control to the main thread, freeing it up to do other work.
the way things operate in JS means you "never have to worry about concurrency issues, etc."
Well... yes and no. Since JavaScript is inherently single-threaded, there's never any contention over data shared between multiple threads (obviously). I'd say that's what the original writer was thinking about. So I'd say easily north of 90% of thread-synchronization problems do just go away.
However, as you noted, you still need to be careful modifying shared state from multiple asynchronous operations. If they're running concurrently, then they can complete in any order, and your logic has to properly handle that.
Ideally, the best solution is to move to a more functional mindset - that is, get rid of the shared state as much as possible. Asynchronous operations should return their results instead of updating shared state. Then these concurrent asynchronous operations can be composed into a higher-level asynchronous operation using async, await, and Promise.all. If possible, this more functional approach of returning-instead-of-state and function composition will make the code easier to deal with.
But there's still some situations where this isn't easily achievable. It's possible to develop asynchronous equivalents of classical synchronous coordination primitives. I worked up a proof-of-concept AsyncLock, but can't seem to find the code anywhere. Well, it's possible to do this, anyway.
To date 24 people, no more no fewer, have chosen to look at this.
FWIW, and should anyone eccentric take an interest in this question one day, I propose "hogging strands" or just "strands" ... they hog the browser's JS engine, it would appear, and just don't let go, unless they encounter an await.
I've been developing this app where 3 "strands" operate more or less simultaneously, all involving AJAX calls and a database query... but in fact it is highly preferable if the 2nd of the 3 executes and returns before the 3rd.
But because the 3rd query is quite a bit simpler for MySQL than the 2nd the former tends to return from its AJAX "excursion" quite a bit earlier if everything is allowed to operate with unconstrained asynchronicity. This then causes a big delay before the 2nd strand is allowed to do its stuff. To force the code to perform and complete the 2nd "strand" therefore requires quite a bit of coercion, using async and await.
I read one comment on JS asynchronicity somewhere where someone opined that the way things operate in JS means you "never have to worry about concurrency issues, etc.". I don't agree: if you don't in fact know when awaits will unblock this does necessarily mean that concurrency is involved. Perhaps "hogging strands" are easier to deal with than true "concurrency", let alone true "parallelism", however... ?

How does JavaScript's Single Threaded Model handle time consuming tasks?

This question is regarding the sinlge threaded model of JavaScript. I understand that javascript is non-block in nature cause of its ability to add a callbacks to the async event queue. But if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded? How does nodejs handle such a problem? And is this an unavoidable problem for developers on the front end? I'm asking this question cause I have read that its generally good practice to keep function tasks as small as possible. Is it really because long tasks in javascript will actually block other tasks?
But if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded?
Yes.
How does nodejs handle such a problem?
Node.js handles nothing. How you handle concurrency is up to you and your application. Now, Node.js does have a few tools available to you. The first thing you have to understand is that Node.js is basically V8 (JavaScript engine) with a lightweight library split between JavaScript and native code bolted on. While your JavaScript code is single-threaded by nature, the native code can and does create threads to handle your work.
For example, when you ask Node.js to load a file from disk, your request is passed off to native code where a thread pool is used, and your data is loaded from disk. Once your request is made, your JavaScript code continues on. This is the meaning of "non-blocking" in the context of Node.js. Once that file on disk is loaded, the native code passes it off to the Node.js JavaScript library, which then executes your callback with the appropriate parameters. Your code continued to run while the background work was going on, but when your callback is dealing with that data, other JavaScript code is indeed blocked from running.
This architecture allows you to get much of the benefit of multithreaded code without having to actually write any multithreaded code, keeping your application straightforward.
I'm asking this question cause I have read that its generally good practice to keep function tasks as small as possible. Is it really because long tasks in javascript will actually block other tasks?
My philosophy is always to use what you need. It's true that if a request comes in to your application and you have a lot of JavaScript processing of data that is blocking, other requests will not be processed during this time. Remember though that if you are doing this sort of work, you are likely CPU bound anyway and doing double the work will cause both requests to take longer.
In practice, the majority of web applications are IO bound. They shuffle data from a database, reformat it, and send it out over the network. The part where they handle data is actually not all that time consuming when compared to the amount of time the application is simply waiting to hear back from the upstream data source. It is in these applications where Node.js really shines.
Finally, remember that you can always spawn child processes to better distribute the load. If your application is that rare application where you do 99% of your work load in CPU-bound JavaScript and you have a box with many CPUs and/or cores, split the load across several processes.
Your question is a very large one, so I am just going to focus on one part.
if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded? (...) Is it really because long tasks in javascript will actually block other tasks?
Non-blocking is a beautiful thing XD
The best practices include:
Braking every function down into its minimum functional form.
Keep CallBacks asynchronies, THIS is an excellent post on the use of CallBacks
Avoid stacking operations, (Like nested Loops)
Use setTimeout() to brake up potentially blocking code
And many other things, Node.JS is the gold standard of none blocking so its worth a look.
--
--
setTimeout() is one of the most important functions in no-blocking code
So lets say you make a clock function that looks like this:
function setTime() {
var date=new Date();
time = date.getTime()
document.getElementById('id').innerHTML = time;
}
while(true){setTime();}
Its quite problematic, because this code will happily loop its self until the end of time. No other function will ever be called. You want to brake up the operation so other things can run.
function startTime() {
var date=new Date();
time = date.getTime()
document.getElementById('id').innerHTML = time;
setTimeout(startTime(),1000);
}
'setTimeout();' brakes up the loop and executes it every 1-ish seconds. An infinite loop is a bit of an extreme example. The point is 'setTimeout();' is great at braking up large operation chains into smaller ones, making everything more manageable.

Will Node.js get blocked when processing large file uploads?

Will Node.js get blocked when processing large file uploads?
Since Node.js only has one thread, is that true that when doing large file uploads all other requests will get blocked?
If so, how should I handle file uploads in nodejs?
All the I/O operations is handled by Node.js is using multiple threads internally; it's the programming interface to that I/O functionality that's single threaded, event-based, and asynchronous.
So the big upload of your example is performed by a separate thread that's managed by Node.js, and when that thread completes its work, your callback is put onto the event loop queue.
When you do CPU intensive task it blocks. Let's say we have a task compute() which needs to run almost continuously, and does some CPU intensive calculations.
Answer to the main question "How should I handle file uploads in nodejs?" Check in your code (or the library) where you save file on the server, is it dependent on writefile() or writeFileSync()?If it is using writefile() then its asynchronous; But if it is writeFileSync() its is synchronous version.
Updates: In response to a comment:
"the answer "No, it won't block" is correct but explanation is
completely wrong. JS is in one thread AND I/O is in one (same)
thread. Event loop / asynchronous processing / callbacks make this possible. No multiple threads required. " - by andrey-sidorov
There is no async API for file operations so Node.js uses a thread pool for that. You can see it in the code of libuv. You can look at the source for fs.readFile in lib/fs.js, you’ll see binding.read. Whenever you see binding in Node’s core modules you’re looking at a portal into the land of C++. This binding is made available using NODE_SET_METHOD(target, "read", Read). If you know any C, you might think this is a macro – it was originally, but it’s now a function.
Going back to ASYNC_CALL in Read, one of the arguments is read: the syscall read. But wait, doesn't this function block?
Yes, but that’s not the end of the story. An Introduction to libuv denotes the following:
"The libuv filesystem operations are different from socket operations. Socket operations use the non-blocking operations provided by the operating system. Filesystem operations use blocking functions internally, but invoke these functions in a thread pool and notify watchers registered with the event loop when application interaction is required."
Summary:
Node API method writeFile() is asynchronous, but that doesn’t necessarily mean it’s non-blocking underneath. As the libuv book points out, socket (network) code is non-blocking, but filesystems are more complicated. Some things are event-based (kqueue), others use thread pool (as in this case).
Consider going through C code on which Node.js is developed, for more information:
Unix fs.c
Windows fs.c
That depends on the functions used for accomplishing that task. If you are using asynchronous functions, then Node.js will not block. But there are also synchronous functions, for example fs.readFileSync (FileSystem Doc), that will block execution.
Just take care and choose asynchronous functions. This way Node.js will keep running while slow tasks/waits are completed by external libraries. Once those tasks are completed, the Event Loop will take care of the result and execute your callbacks.
You can read more about the Event Loop here: Understanding the Node.js event loop
That is the exact reason node.js being asynchronous.
Most (all?) functions in node.js involving Input/Output operations (where bottleneck is some other device than CPU or RAM) the operation happens on a sepparate thread, letting your node.js server do some other code while waiting.

Categories

Resources