I read that the JavaScript language has characteristics that help in implementing non-blocking IO, which contributes to the success of projects like node.js. My question is: what are these characteristics, and why is non-blocking IO trickier to implement in other languages?
JavaScript itself does not provide non-blocking IO. The underlying system calls that node.js uses do the non-blocking IO. JavaScript's first-class functions make it easy to pass around callbacks that are invoked when the IO has completed.
Other languages can do non-blocking IO just fine. node.js just argues that callbacks make it super-easy to reason about and handle non-blocking operations.
Ruby has EventMachine, which passes blocks around instead of functions. C can do non-blocking IO with function pointers, but then you don't get closures, so it is a bit more of a pain.
The reason JavaScript is sometimes labeled as non-blocking is the concept of anonymous, event-based callback functions. Node.js specifically cites this as its reason why JavaScript is a good server-side language. This, however, is only a half-truth: JavaScript is not technically non-blocking, but it will continue to execute code while waiting for an anonymous callback (for example from an ajax call) to be invoked. I'm not sure if this is what you read, but one Node tutorial offers this explanation:
"The other method, the one taken by Node and some extremely fast modern servers such as Nginx and Thin, is to use a single non-blocking thread with an event loop. This is where the decision to use JavaScript really shines, since JavaScript was designed to be used in a single threaded event loop-based environment: the browser. JavaScript’s ability to pass around closures makes event-based programming dead simple. You basically just call a function to perform some type of I/O and pass it a callback function and JavaScript automatically creates a closure, making sure that the correct state is preserved even after the calling function has long since gone out of scope."
source: http://net.tutsplus.com/tutorials/javascript-ajax/this-time-youll-learn-node-js/
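To make the closure point concrete, here is a minimal sketch. `fakeRead` is a hypothetical stand-in for a real I/O call: it performs no I/O and just defers the callback with `setImmediate`. Each callback still sees the `requestId` and `startedAt` of the call that created it, even though `handleRequest` has long since returned.

```javascript
// Hypothetical stand-in for an async I/O call: it returns immediately and
// invokes the callback on a later turn of the event loop.
function fakeRead(resource, callback) {
  setImmediate(() => callback(`contents of ${resource}`));
}

const completed = [];

function handleRequest(requestId, resource) {
  const startedAt = Date.now(); // local state captured by the closure below
  fakeRead(resource, (data) => {
    // handleRequest returned long ago, but requestId and startedAt
    // are still reachable through the closure.
    completed.push({ requestId, data, waitedMs: Date.now() - startedAt });
  });
}

handleRequest(1, 'a.txt');
handleRequest(2, 'b.txt');
console.log(completed.length); // 0: neither callback has run yet

// Queued after the two fakeRead callbacks, so both have run by the time this does.
setImmediate(() => {
  console.log(completed.map((c) => `${c.requestId}: ${c.data}`));
  // [ '1: contents of a.txt', '2: contents of b.txt' ]
});
```

In C, the same pattern needs a function pointer plus an explicit context struct carrying `requestId` and `startedAt`; the closure does that bookkeeping for you.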
In reference to your multithreading tag: Node.js and JavaScript are NOT multithreaded; they use closures to preserve state while waiting for a callback. Therefore, they are NOT completely non-blocking. There are plenty of scenarios where blocking can occur (any long-running synchronous code blocks everything else), but for most small implementations, a developer will never encounter a blocking situation.
See here for one argument for why node.js is bad: http://teddziuba.com/2011/10/node-js-is-cancer.html (link broken)
and here for a rebuttal: http://rhyolight.posterous.com/nodejs-is-not-cancer (link broken)
Asynchronous functions in JavaScript are usually event-based, which means you register callback handlers. Your code keeps running after the registration and does not wait for the event; everything that must happen after the event has to be invoked from the handler. I hope that covers it.
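A minimal sketch of that pattern with Node's built-in `EventEmitter` (the emitter name `door` and the `'open'` event are made up for illustration). Note that `emit()` itself calls its listeners synchronously; the asynchrony here comes from the timer that triggers it later.

```javascript
const { EventEmitter } = require('events');

const order = [];
const door = new EventEmitter();

// Registering a handler returns immediately; nothing waits for the event.
door.on('open', (who) => {
  // Everything that must happen after the event belongs in here.
  order.push(`handler: opened by ${who}`);
});
order.push('registered handler, carrying on');

// Simulate the event arriving later, on a future turn of the event loop.
setTimeout(() => door.emit('open', 'alice'), 10);

order.push('end of synchronous code');
console.log(order);
// [ 'registered handler, carrying on', 'end of synchronous code' ]
```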
Of course there are exceptions, like window.alert / confirm / prompt in browsers, which block until the user responds.
https://youtu.be/dFnkZ15-_0o?t=2125 This excerpt from Andrew Mead's node.js course does a great job of visually explaining the differences between non-blocking and blocking I/O operations in JS. The clip is from 35:25 - 47:16.
Related
So I have an understanding of how Node.js works: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener once it completes the work, and the listener then returns the response to the caller.
My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (such as "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code is happening on the worker pool.
Now, by contrast, when I use Mongoose to talk to MongoDB, DB reads are an expensive I/O operation. Node seems to be able to delegate the work to a thread and receive the callback when it completes; the time taken to load from the DB does not seem to block the system.
How does Node.js decide to use a thread pool thread vs the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?
Your understanding of how node works isn't correct... but it's a common misconception, because the reality of the situation is actually fairly complex, and typically boiled down to pithy little phrases like "node is single threaded" that over-simplify things.
For the moment, we'll ignore explicit multi-processing/multi-threading through cluster and webworker-threads, and just talk about typical non-threaded node.
Node runs in a single event loop. It's single threaded, and you only ever get that one thread. All of the JavaScript you write executes in this loop, and if a blocking operation happens in that code, it will block the entire loop and nothing else will happen until it finishes. This is the typical single-threaded nature of node that you hear so much about. But it's not the whole picture.
Certain functions and modules, usually written in C/C++, support asynchronous I/O. When you call these functions and methods, they internally manage passing the call on to a worker thread. For instance, when you use the fs module to request a file, the fs module passes that call on to a worker thread, and that worker waits for its response, which it then presents back to the event loop that has been churning on without it in the meantime. All of this is abstracted away from you, the node developer, and some of it is abstracted away from the module developers through the use of libuv.
As pointed out by Denis Dollfus in the comments (from this answer to a similar question), the strategy libuv uses to achieve asynchronous I/O is not always a thread pool; in particular, network sockets (as used by the http module) are handled with the operating system's own non-blocking mechanisms (epoll, kqueue, IOCP) rather than the thread pool. For our purposes, the main point is that the asynchronous context is provided by libuv, and that the thread pool maintained by libuv is one of several strategies that library uses to achieve asynchronicity.
On a mostly related tangent, there is a much deeper analysis of how node achieves asynchronicity, and some related potential problems and how to deal with them, in this excellent article. Most of it expands on what I've written above, but additionally it points out:
Any external module that you include in your project that makes use of native C++ and libuv is likely to use the thread pool (think: database access)
libuv has a default thread pool size of 4 and uses a queue to manage access to it. The upshot is that if you have 5 long-running DB queries all going at the same time, one of them (and any other asynchronous action that relies on the thread pool) will wait for the others to finish before it even gets started
You can mitigate this by increasing the size of the thread pool through the UV_THREADPOOL_SIZE environment variable, so long as you do it before the thread pool is required and created: process.env.UV_THREADPOOL_SIZE = 10;
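A rough way to watch the queueing described above, assuming `crypto.pbkdf2` as the thread-pool-bound task (it is one of the built-in functions that runs there). With the default pool of 4, the fifth job should finish noticeably later than the first four on a machine with at least 5 cores; with the pool raised to 8 they should finish at roughly the same time. Timings vary by machine, and whether the runtime assignment takes effect can depend on the Node version and platform, so setting the variable in the shell before starting node is the safer option.

```javascript
// Must run before anything touches the libuv thread pool: libuv reads the
// variable once, when the pool is first created. Setting it in the shell
// (UV_THREADPOOL_SIZE=8 node app.js) is the more reliable option.
process.env.UV_THREADPOOL_SIZE = '8';

const crypto = require('crypto');

const start = Date.now();
const JOBS = 5;
let started = 0;

for (let i = 0; i < JOBS; i++) {
  started++;
  // The async crypto.pbkdf2 runs on the libuv thread pool, not the main thread.
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err) => {
    if (err) throw err;
    console.log(`job ${i} finished after ${Date.now() - start} ms`);
  });
}

console.log(`${started} jobs queued; the main thread is free`);
```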
If you want traditional multi-processing or multi-threading in node, you can get it through the built-in cluster module or various other modules such as the aforementioned webworker-threads, or you can fake it by chunking up your work and manually using setTimeout or setImmediate to pause it and continue in a later loop iteration so other callbacks can run (but that's not recommended). process.nextTick won't do for this, because its callbacks run before the event loop continues, so recursive nextTick calls starve I/O.
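A minimal sketch of the chunking approach, using a hypothetical helper `processInChunks`. The first chunk runs synchronously; each later chunk is scheduled with `setImmediate`, so timers and I/O callbacks get a turn between chunks.

```javascript
// Process a large array in chunks, yielding to the event loop between
// chunks with setImmediate so timers and I/O callbacks can run.
function processInChunks(items, chunkSize, work, done) {
  let index = 0;

  function nextChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      work(items[index]);
    }
    if (index < items.length) {
      setImmediate(nextChunk); // let the event loop breathe, then continue
    } else {
      done();
    }
  }

  nextChunk(); // the first chunk runs synchronously
}

const items = Array.from({ length: 10 }, (_, i) => i);
let sum = 0;
let finished = false;

processInChunks(items, 3, (n) => { sum += n; }, () => {
  finished = true;
  console.log('total:', sum); // total: 45
});

console.log('after first chunk, sum =', sum); // 0 + 1 + 2 = 3
```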
Please note, if you're writing long running/blocking code in javascript, you're probably making a mistake. Other languages will perform much more efficiently.
So I have an understanding of how Node.js works: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener once it completes the work, and the listener then returns the response to the caller.
This is not really accurate. Node.js has only a single "worker" thread that executes JavaScript. There are threads within node that handle IO processing, but thinking of them as "workers" is a misconception. They are really just IO handling and a few other details of node's internal implementation, and as a programmer you cannot influence their behavior other than through a few miscellaneous parameters such as MAX_LISTENERS.
My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (such as "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code is happening on the worker pool.
There is no sleep mechanism in JavaScript. We could discuss this more concretely if you posted a code snippet of what you think "sleep" means. There's no such function to call to simulate something like time.sleep(30) in python, for example. There's setTimeout but that is fundamentally NOT sleep. setTimeout and setInterval explicitly release, not block, the event loop so other bits of code can execute on the main execution thread. The only thing you can do is busy loop the CPU with in-memory computation, which will indeed starve the main execution thread and render your program unresponsive.
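A sketch of the distinction above (`blockingSleep` and `delay` are illustrative names, not built-ins): the busy-wait keeps even an already-due 0 ms timer from firing, while the Promise-wrapped `setTimeout` just schedules work and returns.

```javascript
// A "sleep" that busy-loops the CPU: it blocks the only JavaScript thread,
// so no timer, I/O callback, or incoming request can be handled meanwhile.
function blockingSleep(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// A non-blocking delay: setTimeout schedules the resolution and returns,
// leaving the event loop free to run other callbacks.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

const log = [];
setTimeout(() => log.push('timer fired'), 0);

const t0 = Date.now();
blockingSleep(50);
const blockedFor = Date.now() - t0;
log.push(`busy-waited ${blockedFor} ms`); // the 0 ms timer has NOT fired yet

delay(50).then(() => log.push('delay resolved'));
console.log(log); // only the busy-wait entry so far
```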
How does Node.js decide to use a thread pool thread vs the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?
Network IO is always asynchronous. End of story. Disk IO has both synchronous and asynchronous APIs, so there is no "decision": node.js behaves according to which core API you call, the synchronous one or the normal asynchronous one. For example: fs.readFile vs fs.readFileSync. For child processes, there are also separate child_process.exec and child_process.execSync APIs.
The rule of thumb is to always use the asynchronous APIs. The valid reasons to use the sync APIs are initialization code in a network service before it starts listening for connections, and simple scripts that do not accept network requests, such as build tools.
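A minimal sketch contrasting the two `fs` APIs. It writes a throwaway file to the OS temp directory (the file name is arbitrary) so it has something to read.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a small temp file so the example has something to read.
const file = path.join(os.tmpdir(), `sync-vs-async-${process.pid}.txt`);
fs.writeFileSync(file, 'line 1\nline 2\n');

const order = [];

// Synchronous API: the event loop waits here until the read completes.
const text = fs.readFileSync(file, 'utf8');
order.push(`readFileSync returned ${text.length} chars`);

// Asynchronous API: the call returns immediately; the callback runs later.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push(`readFile callback got ${data.length} chars`);
  fs.unlinkSync(file); // clean up
  console.log(order);
});
order.push('readFile called, execution continues');
```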
Thread pool: how, when, and by whom it is used
First off, when we run a Node application on a computer, it starts a process (the node process) alongside the other processes on the machine, and it keeps running until it finishes or you kill it. Our JavaScript runs in this process's so-called single thread.
This single-threaded mechanism makes it easy to block a Node application, but it is also one of the unique features that Node.js brings to the table. So, again, when you run your Node application, it runs in just a single thread, no matter whether you have 1 or a million users accessing it at the same time.
So let's understand exactly what happens in the Node.js single thread when you start your application. First the program is initialized, then all the top-level code is executed, meaning all the code that is not inside any callback function (remember that all code inside callback functions is executed by the event loop).
Along the way, the required modules are loaded and all the callbacks are registered; finally, the event loop starts for your application.
So, as discussed before, all the callback functions and the code inside them execute in the event loop, where the work is split across different phases. I'm not going to discuss the event loop in detail here, though.
Now, for the sake of a better understanding of the thread pool, imagine that in the event loop, the code inside one callback function runs only after the code inside the previous callback function has finished. If some of those tasks are too heavy, they would block our single Node.js thread. That's where the thread pool comes in; like the event loop, it is provided to Node.js by the libuv library.
So the thread pool is not part of Node.js itself; it's provided by libuv. Node offloads heavy tasks to libuv, which executes them in its own threads and, once they are done, hands the results back to the event loop.
By default, the thread pool gives us four additional threads, completely separate from the main single thread. We can configure it (via UV_THREADPOOL_SIZE) to use more: up to 1024 threads in current versions (older versions capped it at 128).
All these threads together form the thread pool, and the event loop can then automatically offload heavy tasks to it.
The fun part is all this happens automatically behind the scenes. It's not us developers who decide what goes to the thread pool and what doesn't.
Many tasks go to the thread pool, such as:
-> Asynchronous file system operations (the fs module)
-> Some cryptography functions, like hashing passwords (crypto.pbkdf2, crypto.scrypt)
-> Asynchronous compression (the zlib module)
-> DNS lookups via dns.lookup()
This misunderstanding is merely the difference between pre-emptive multi-tasking and cooperative multitasking...
The sleep turns off the entire carnival because there is really one line to all the rides, and you closed the gate. Think of it as "a JS interpreter and some other things" and ignore the threads...for you, there is only one thread, ...
...so don't block it.
Which part of syntax provides the information that this function should run in other thread and be non-blocking?
Let's consider simple asynchronous I/O in node.js
var fs = require('fs');
var path = process.argv[2];
fs.readFile(path, 'utf8', function (err, data) {
  if (err) throw err; // data is undefined when the read fails
  var lines = data.split('\n');
  console.log(lines.length - 1);
});
What exactly is the trick that makes it happen in the background? Could anyone explain it precisely or post a link to a good resource? Everywhere I looked there is plenty of info about what a callback is, but nobody explains why it actually works like that.
This is not a question specifically about node.js; it's about the general concept of callbacks in any programming language.
EDIT:
Probably the example I provided is not the best here, so let's not consider this node.js code snippet. I'm asking generally: what is the trick that lets the program keep executing when it encounters a callback function? What in the syntax makes the callback concept a non-blocking one?
Thanks in advance!
There is nothing in the syntax that tells you your callback is executed asynchronously. Callbacks can be asynchronous, such as:
setTimeout(function(){
console.log("this is async");
}, 100);
or it can be synchronous, such as:
an_array.forEach(function(x){
console.log("this is sync");
});
So, how can you know if a function will invoke the callback synchronously or asynchronously? The only reliable way is to read the documentation.
You can also write a test to find out if documentation is not available:
var t = "this is async";
some_function(function(){
t = "this is sync";
});
console.log(t);
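The test above can be wrapped in a small helper (the name `invokesSynchronously` is just for illustration). It only observes one call, so it is a probe rather than proof: a poorly behaved API might call back synchronously in some cases and asynchronously in others, which is why the documentation remains the reliable source.

```javascript
// Returns true if `start` invokes the callback it is given before returning
// (synchronously), false if the callback is deferred or never called.
function invokesSynchronously(start) {
  let called = false;
  start(() => { called = true; });
  return called;
}

console.log(invokesSynchronously((cb) => [1, 2, 3].forEach(() => cb()))); // true
console.log(invokesSynchronously((cb) => setTimeout(cb, 0)));             // false
```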
How asynchronous code works
Javascript, per se, doesn't have any feature to make functions asynchronous. If you want to write an asynchronous function you have two options:
Use another asynchronous function such as setTimeout or web workers to execute your logic.
Write it in C.
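A sketch of option 1: `addAsync` (a hypothetical example function) hands its result to the callback via `setImmediate`, so callers always get it on a later turn of the event loop. This makes the function asynchronous in the callback sense only; the addition itself still runs on the single JavaScript thread.

```javascript
// Synchronous version: the result is available as soon as it returns.
function addSync(a, b) {
  return a + b;
}

// Asynchronous version built on an existing async primitive (setImmediate):
// it returns immediately and delivers the result to the callback later.
function addAsync(a, b, callback) {
  setImmediate(() => callback(null, a + b));
}

let asyncResult; // undefined until the callback runs
addAsync(2, 3, (err, sum) => {
  asyncResult = sum;
  console.log('async result:', sum); // async result: 5
});
console.log('sync result:', addSync(2, 3));    // sync result: 5
console.log('async result yet?', asyncResult); // async result yet? undefined
```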
As for how C-coded functions such as setTimeout implement asynchronous execution: it all has to do with the event loop (or mostly).
The Event Loop
Inside the web browser there is a piece of code used for networking. Originally, the networking code could only download one thing: the HTML page itself. When Mosaic invented the <img> tag, the networking code evolved to download multiple resources. Then, when Netscape implemented progressive rendering of images, they had to make the networking code asynchronous so that they could draw the page before all images were loaded and update each image progressively and individually. This is the origin of the event loop.
In the heart of the browser there is an event loop that evolved from asynchronous networking code. So it's not surprising that it uses an I/O primitive as its core: select() (or something similar such as poll, epoll etc. depending on OS).
The select() function in C allows you to wait for multiple I/O operations in a single thread without needing to spawn additional threads. select() looks something like:
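int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
To have it wait for input on a socket or pipe, you add the file descriptor to the read set (readfds), and select() returns when data is available on any of your I/O channels. Once it returns you can continue processing the data. (Regular disk files always report as ready to select(), which is one reason node.js uses threads for file I/O.)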
The javascript interpreter saves your callback and then calls the select() function. When select() returns the interpreter figures out which callback is associated with which I/O channel and then calls it.
Conveniently, select() also allows you to specify a timeout value. By carefully managing the timeout passed to select() you can cause callbacks to be called at some time in the future. This is how setTimeout and setInterval are implemented. The interpreter keeps a list of all timeouts and calculates what it needs to pass as the timeout to select(). Then, when select() returns, in addition to finding out whether any callbacks need to be called due to an I/O operation, the interpreter also checks for any expired timeouts that need to be called.
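The timer bookkeeping described above can be modeled in a few lines. `ToyEventLoop` is a hypothetical simulation, not how any browser or node.js actually implements it: a virtual clock stands in for blocking in `select()`, and there is no real I/O, only timers.

```javascript
class ToyEventLoop {
  constructor() {
    this.now = 0;     // virtual time in ms
    this.timers = []; // { due, callback, seq }
    this.seq = 0;     // tie-breaker: equal deadlines fire in FIFO order
  }

  setTimeout(callback, delay) {
    this.timers.push({ due: this.now + delay, callback, seq: this.seq++ });
  }

  // The timeout that would be passed to select(): time until the earliest timer.
  nextTimeout() {
    // No timers left: a real loop would block on I/O with no timeout;
    // this toy has no I/O, so run() simply stops.
    if (this.timers.length === 0) return null;
    const earliest = Math.min(...this.timers.map((t) => t.due));
    return Math.max(0, earliest - this.now);
  }

  run() {
    let timeout;
    while ((timeout = this.nextTimeout()) !== null) {
      this.now += timeout; // pretend select() slept this long and returned
      // Collect every expired timer, then fire them in deadline order.
      const expired = this.timers
        .filter((t) => t.due <= this.now)
        .sort((a, b) => a.due - b.due || a.seq - b.seq);
      this.timers = this.timers.filter((t) => t.due > this.now);
      for (const t of expired) t.callback();
    }
  }
}

const loop = new ToyEventLoop();
const fired = [];
loop.setTimeout(() => fired.push(`B at ${loop.now}`), 30);
loop.setTimeout(() => {
  fired.push(`A at ${loop.now}`);
  loop.setTimeout(() => fired.push(`C at ${loop.now}`), 5); // scheduled from a callback
}, 10);
loop.run();
console.log(fired); // [ 'A at 10', 'C at 15', 'B at 30' ]
```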
So select() alone covers almost all the functionality necessary to implement asynchronous functions. But modern browsers also have web workers. In the case of web workers the browser spawns threads to execute javascript code asynchronously. To communicate back to the main thread the workers must still interact with the event loop (the select() function).
Node.js also spawns threads when dealing with file/disk I/O. When the I/O operation completes it communicates back with the main event loop to cause the appropriate callbacks to execute.
Hopefully this answers your question. I've always wanted to write this answer but was too busy to do so previously. If you want to know more about non-blocking I/O programming in C, I suggest you read this: http://www.gnu.org/software/libc/manual/html_node/Waiting-for-I_002fO.html
For more information see also:
Is nodejs representing Reactor or Proactor design pattern?
Performance of NodeJS with large amount of callbacks
First of all, if something is not Async, it means it's blocking. So the javascript runner stops on that line until that function is over (that's what a readFileSync would do).
As we all know, fs is an IO library, and that kind of thing takes time (telling the hardware to read some files is not something done right away). So it makes a lot of sense that anything that does not rely only on the CPU is async: it takes time, and there is no need to freeze the rest of the code while waiting for another piece of hardware (with the CPU sitting idle).
I hope this solves your doubts.
A callback is not necessarily asynchronous. Execution depends entirely on how fs.readFile decides to treat the function parameter.
In JavaScript, you can execute a function asynchronously using for example setTimeout.
Discussion and resources:
How does node.js implement non-blocking I/O?
Concurrency model and Event Loop
Wikipedia:
There are two types of callbacks, differing in how they control data flow at runtime: blocking callbacks (also known as synchronous callbacks or just callbacks) and deferred callbacks (also known as asynchronous callbacks).
I am trying to choose a platform to code my network application on, it will be a small realtime online game server. I am not very familiar with the async theory though I know how to write a little asynchronous code.
I know both JavaScript and Python about equally well.
So I was reading on twisted here and he says:
During a callback, the Twisted loop is effectively “blocked” on our
code. So we should make sure our callback code doesn’t waste any time.
In particular, we should avoid making blocking I/O calls in our
callbacks. Otherwise, we would be defeating the whole point of using
the reactor pattern in the first place. Twisted will not take any
special precautions to prevent our code from blocking, we just have to
make sure not to do it. As we will eventually see, for the common case
of network I/O we don’t have to worry about it as we let Twisted do
the asynchronous communication for us.
I wanted to see how this is different to how the event loop on node.js is done. I believe node.js implements the event loop and it never blocks, or am I missing something?
I write somewhat blocking code in my callbacks with node.js. Does this mean I'm making a mistake?
Why is twisted called async and event driven when it still blocks?
Cheers,
Maj
Twisted, node.js, and every other asynchronous framework behave the exact same way here: if you write blocking code in your callbacks, the entire event loop is blocked until your callback is done.
Asynchronous frameworks are really great for doing I/O-bound work; the event loop never gets blocked waiting for I/O, because it can all be done in a non-blocking way. When there is data ready for reading, the event loop fires off your callback, the callback handles the data, and then the event loop takes control again. When you hear these frameworks being called "async" and "event-driven", it's referring to this non-blocking I/O + event loop model.
However, when you actually need to do some kind of processing with the data being sent/received, you need to be careful. Event loops are single-threaded; only one CPU-based operation can happen at a time. That means if you do some expensive calculation that takes 10 seconds in a callback, your event loop is blocked for 10 seconds. There's no extra magic in node.js that avoids this.
If you want to be able to do CPU-based operations without blocking your event loop, node.js (and twisted) have mechanisms for sending the CPU-bound work to a sub-process, and then fetching the results when the sub-process is complete. The node.js About page actually mentions this:
But what about multiple-processor concurrency? Aren't threads
necessary to scale programs to multi-core computers? You can start new
processes via child_process.fork() these other processes will be
scheduled in parallel. For load balancing incoming connections across
multiple processes use the cluster module.
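A minimal sketch of offloading CPU-bound work to a sub-process. It uses `child_process.execFile` to run `node -e` with the script inline, so the example fits in one file; the `fib` workload is an arbitrary stand-in for expensive computation, and in a real project `child_process.fork()` with a separate module is the more common route.

```javascript
const { execFile } = require('child_process');

// The CPU-heavy part, run by a separate node process. fib is just an
// arbitrary expensive computation for illustration.
const childScript = `
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  process.stdout.write(String(fib(30)));
`;

let result = null;

execFile(process.execPath, ['-e', childScript], (err, stdout) => {
  if (err) {
    console.error('child process failed:', err.message);
    return;
  }
  result = Number(stdout);
  console.log('fib(30) from the child process:', result); // 832040
});

// The parent's event loop is free while the child computes.
console.log('parent keeps running; result so far:', result); // null
```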
I have read a lot about node.js, trying to understand the event loop and its patterns / anti-patterns. One thing many authors fail to mention is that node actually uses threads. The application programmer doesn't get access to them, of course, but it's nice to know that they exist and when they kick in.
As far as I understand, when Ryan Dahl explains it, threads will be used only for file system access and networking. Thereby: not for computing... And my concern here is: why not computing?
Even if I place a looong for loop in a callback function, it will block the entire loop when executed. According to this image found on http://www.slideshare.net/cacois/nodejs-patterns-for-discerning-developers, all registered callbacks are handled by node's advanced threading mechanism. But apparently not :(
Even if a lot of speed is gained from making io and file handling async, why not go the whole mile and make all the registered callbacks be handled by node's internal threads?
It just struck me though, that the shared concurrency wouldn't work with separate threads trying to access the global app namespace. (This might be a big reason)
What do you think?
Even if a lot of speed is gained from making io and file handling async, why not go the whole mile and make all the registered callbacks be handled by node's internal threads?
That would break one of the fundamental “nice things” about Node.js. If you have this:
if (a === 7) {
console.log(a);
}
a is guaranteed to be 7 when calling console.log, because it’s synchronous code. Parallel execution of synchronous code kind of breaks that. Sure, you can make an arbitrary break at callbacks and turn them into threads, but that’s no better than every other threading system.
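A small sketch of that run-to-completion guarantee: a timer scheduled to change `a` is already due while the synchronous code runs, yet it cannot fire until that code finishes.

```javascript
let a = 7;

// Ask for `a` to be changed as soon as possible.
setTimeout(() => {
  a = 0;
}, 0);

// Keep the thread busy for a while: the timer becomes due during this loop,
// but callbacks only run after the current synchronous code has finished.
const busyUntil = Date.now() + 20;
while (Date.now() < busyUntil) {
  // busy work
}

if (a === 7) {
  console.log(a); // 7: the timer callback has not had a chance to run yet
}
```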
There’s also the matter of threads being able to exhaust a system’s resources in a way a task queue can do only with great difficulty.
Nobody has actually asked this (from all the 'suggestions' I'm getting and also from searching before I asked here).
So why is node.js asynchronous?
From what I have deduced after some research:
Languages like PHP and Python are scripting languages (I could be wrong about the actual languages that are scripting languages) whilst JavaScript isn't. (I suppose this derives from the fact that JS doesn't compile?)
Node.js runs on a single thread whilst scripting languages use multiple threads.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Maybe the answer is found somewhere stated above, but I'm still not sure.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
PS. I know some of you will ask "why would you want to make JS synchronous?" in your answers, but the truth is that I don't. I'm just asking these types of questions because I'm sure there are more people out there than just myself that have thought about such questions.
Node.js runs on a single thread whilst scripting languages use multiple threads.
Not technically. Node.js uses several threads, but only one execution thread. The background threads are for dealing with IO to make all of the asynchronous goodness work. Dealing with threads efficiently is a royal pain, so the next best option is to run in an event loop so code can run while background threads are blocked on IO.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Not necessarily. You can preserve state in an asynchronous system pretty easily. For example, in JavaScript you can use bind() to bind this to a function, thereby preserving state explicitly when the function is later called as a callback:
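function State() {
  this.count = 0; // state owned by this instance
  // make sure that whenever doStuff is called it maintains its state
  this.doStuff = this.doStuff.bind(this);
}
State.prototype.doStuff = function () {
  this.count++; // `this` is always the State instance, even when used as a callback
};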
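To see what the bind() call buys you, here is a minimal sketch using a hypothetical `Counter` class. Class methods run in strict mode, so a method detached from its object and called as a plain function gets `this === undefined`; binding it first keeps `this` pointing at the instance. `invokeAsCallback` is an illustrative stand-in for any API that calls your function without its object.

```javascript
class Counter {
  constructor() {
    this.count = 0;
  }
  increment() {
    this.count++;
  }
}

// Illustrative stand-in: calls fn as a plain function. Real APIs may call it
// with some other `this` (Node's setTimeout uses its Timeout object), but
// never with your instance.
function invokeAsCallback(fn) {
  fn();
}

const unbound = new Counter();
let unboundError = null;
try {
  invokeAsCallback(unbound.increment); // `this` is undefined inside increment
} catch (err) {
  unboundError = err; // a TypeError
}

const bound = new Counter();
invokeAsCallback(bound.increment.bind(bound)); // `this` stays bound to `bound`

console.log(unboundError instanceof TypeError, unbound.count, bound.count); // true 0 1
```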
Asynchronous means not waiting for an operation to finish, but registering a listener instead. This happens all the time in other languages, notably anything that needs to accept input from the user. For example, in a Java GUI, you don't block waiting for the user to press a button, but you register a listener with the GUI.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
Technically, all languages are synchronous, even Javascript. However, Javascript works a lot better in an asynchronous design because it was designed to be single threaded.
Basically there are two types of programs:
CPU bound- the only way to make it go faster is to get more CPU time
IO bound- spends a lot of time waiting for data, so a faster processor won't matter
Video games, number crunchers and compilers are CPU bound, whereas web servers and GUIs are generally IO bound. Javascript is relatively slow (because of how complex it is), so it wouldn't be able to compete in a CPU bound scenario (trust me, I've written my fair share of CPU-bound Javascript).
Instead of coding in terms of classes and objects, Javascript lends itself to coding in terms of simple functions that can be strung together. This works very well in asynchronous design, because algorithms can be written to process data incrementally as it comes in. IO (especially network IO) is very slow, so there's quite a bit of time between packets of data.
Example
Let's suppose you have 1000 live connections, each delivering a packet every millisecond, and processing each packet takes 1 microsecond (very reasonable). Let's also assume each connection sends 5 packets.
In a single-threaded, synchronous application, each connection will be handled in series. The total time taken is (5*1 + 5*.001) * 1000 milliseconds, or ~5005 milliseconds.
In a single-threaded, asynchronous application, the connections are handled concurrently (interleaved, not in parallel). A packet arrives on every connection each millisecond, and processing one round of 1000 packets takes 1000*.001 = 1 millisecond, so each round is finished just as the next one arrives. The total is therefore the processing time of one round plus the 5 milliseconds over which the packets arrive: 1000*.001 + 5*1 milliseconds, or ~6 milliseconds.
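The arithmetic above can be checked with a small simulation under the same assumptions (1000 connections, 5 packets each, 1 ms between packets, 1 µs of processing per packet). Times are kept in integer microseconds to avoid floating-point rounding; the function names are only for illustration.

```javascript
const CONNECTIONS = 1000;
const PACKETS = 5;
const GAP_US = 1000;  // 1 ms between packets on a connection
const PROCESS_US = 1; // 1 µs to process one packet

// Synchronous: handle one connection at a time, blocking on each packet.
function synchronousTotal() {
  let clock = 0;
  for (let c = 0; c < CONNECTIONS; c++) {
    for (let p = 0; p < PACKETS; p++) {
      clock += GAP_US;     // block waiting for the next packet
      clock += PROCESS_US; // then process it
    }
  }
  return clock;
}

// Asynchronous: one thread, but it processes whichever packet has arrived.
// Packet p of every connection arrives at time p * GAP_US (p = 1..PACKETS).
function asynchronousTotal() {
  let clock = 0;
  for (let p = 1; p <= PACKETS; p++) {
    const arrival = p * GAP_US;
    for (let c = 0; c < CONNECTIONS; c++) {
      clock = Math.max(clock, arrival) + PROCESS_US; // idle until arrival, then process
    }
  }
  return clock;
}

console.log(synchronousTotal() / 1000, 'ms');  // 5005 ms
console.log(asynchronousTotal() / 1000, 'ms'); // 6 ms
```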
The traditional solution to this problem was to create more threads. This solved the IO problem, but then when the number of connections rose, so did the memory usage (threads cost lots of memory) and CPU usage (multiplexing 100 threads onto 1 core is harder than 1 thread on 1 core).
However, there are downsides. If your web application happens to also need to do some heavy number crunching, you're SOL, because while you're crunching numbers, connections have to wait. Threading solves this because the OS can swap out your CPU-intensive task when data is ready for a thread waiting on IO. Also, a node.js process runs your JavaScript on a single core, so you can't take advantage of your multi-core processor unless you spin up multiple instances and proxy requests.
Javascript is not compiled ahead of time into a standalone program; it's evaluated at runtime (modern engines such as V8 compile it just-in-time), just like PHP & Ruby. Therefore it is a scripting language just like PHP/Ruby (its official name is actually ECMAScript).
The 'model' that Node adheres to is a bit different from PHP/Ruby. Node.js uses an 'event loop' (the single thread) whose one goal is to take network requests and handle them very quickly, and if for any reason it encounters an operation that takes a while (API request, database query; basically anything involving I/O (input/output)), it passes that off to a background 'worker' thread and goes off to do something else while the worker thread waits for the long task to complete. When that happens, the main 'event loop' takes the results and continues dealing with them.
PHP/Ruby follow a threading model: for every incoming network request, the application server spins up an isolated thread or process to handle the request. This does not scale tremendously well, and Node's approach is cited as one of its core strengths compared to this model.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
No. Synchronous instructions are completed in a natural order, from first to last. Asynchronous instructions mean that if a step in the flow of a program takes a relatively long time, the program will continue executing operations and simply return to this operation when complete.
Could JavaScript be made into a synchronous language?
Certain operations in JavaScript are synchronous. Others are asynchronous.
For example:
Blocking operations:
for (var k = 0; k < 1; k = k - 1) {
  alert('this will quickly get annoying and the loop will block execution');
}
alert('this is blocked and will never happen because the above loop is infinite');
Asynchronous:
jQuery.get('/foo', function (result) { alert('This will occur 2nd, asynchronously'); });
alert('This will occur 1st. The above operation was skipped over and execution continued until the above operation completes.');
Could JavaScript be made into a synchronous language?
Javascript is not an "asynchronous language"; rather, node.js has a lot of asynchronous APIs. Asynchronous-ness is a property of the API and not the language. The ease with which functions can be created and passed around in javascript makes it convenient to pass callback functions, which is one way to handle control flow in an asynchronous API, but there's nothing inherently asynchronous about javascript. Javascript can easily support synchronous APIs.
Why is node.js asynchronous?
Node.js favors asynchronous APIs because it is single-threaded. This allows it to efficiently manage its own resources, but requires that long-running operations be non-blocking, and asynchronous APIs are a way to allow for control of flow with lots of non-blocking operations.