Say I have a sample code running in NodeJS
function (){
///OPERATION 1
///OPERATION 2
}
Considering none of the operations require any sort of time out, by default would javascript run both at the same time or finish operation 1 then operation 2?
No two lines of JavaScript ever run simultaneously within the same process. Ever.
See concurrency vs parallelism.
Asynchronous code in Node.js - assuming there are no worker processes involved - is always running with concurrency and never parallelism. Concurrency is easier to program and helps us build complex machinery quickly. To "scale up" you may need to bring in worker processes to do work in parallel.
In your example, if both operations are synchronous, they will run in the order they are written (1, 2). In fact, if operation 1 is synchronous it will always run first no matter what. If both are asynchronous, then how you experience it depends on how long they each take to complete! If operation 1 is asynchronous but operation 2 is synchronous, then they will seem to run in reverse order (2, 1). This has to do with the way functions get scheduled on the event loop, so understanding that will help this all make sense.
Take a breath. Time for a deep dive.
To be clear, in reality, lines of code always get executed in order. We don't have GOTO and the JavaScript engine does not mysteriously jump around to different places. The key thing to understand is that when someone says a function is asynchronous it is really also partly synchronous. Something is happening synchronously. Otherwise it would be an empty function. Instead, it just means that only a tiny bit of work is done synchronously, usually that simply consists of scheduling work for later, and then the rest of it happens later.
So above when I said that if they are both asynchronous then "it depends", it's merely the completion or result of that function (which you experience via a callback or Promise) whose order is undefined in relation to the completion of other asynchronous functions. This isn't the case for fully synchronous functions simply because the world stops for synchronous functions.
If you have two functions both trying to retrieve the same data from two different sources, one from your hard disk and one from the internet, which will finish first? If they are both asynchronous, then it's a trick question. Probably the hard disk is faster, but don't bet your life on it. Still, one of them technically gets kicked off first, synchronously.
This paradigm of scheduling things for later and not waiting for the result before continuing (non-blocking) is one of the ways Node.js manages to have such great performance even without worker processes / parallelism. For I/O in particular, such as reading a file from the disk, there are "quiet periods" of inactivity where there is nothing for the process to do. In this case, waiting before continuing is a huge waste of time. It is more appropriate to use that opportunity to interleave other functions in the meantime. This is what asynchronous functions do. They delay work to allow us to interleave other work. Usually they achieve this via process.nextTick() or setImmediate().
All that said, nothing in life is free. Delaying work has a cost and if you misuse timers you will slow down your program. The goal is to make everything that has unavoidable delays (like I/O) asynchronous and almost nothing else. However asynchronous behavior "pollutes the stack". Anything that uses an asynchronous function becomes asynchronous in nature. You could return a value synchronously and pretend like it's not the case (make it invisible to the outside world), but that is usually a bad idea because then you cannot propagate errors or the result at all.
If you are still confused about how to look at a program and figure out when everything runs, have a look at async and await. It is a wonderful way to write asynchronous code that looks like more traditional synchronous code. And it is arriving in Node 7.
Related
I'm confused about synchronous JavaScript vs asynchronous. If by default JavaScript is single threaded and can only complete one operation at a time, line by line, wouldn't this be 'asynchronous' i.e. things do not occur at the same time? How is it synchronous?
Also a piece of async code like a promise, the promise allows the rest of the code to run while it waits to resolve. Wouldn't this be synchronous i.e. letting multiple operations happen at once?
I'm confused as this seems the wrong way round in my mind.
Something can be both single-threaded and asynchronous.
Firstly, let's talk about the difference between a thread and a process.
The basic definition is that separate processes have seperate memory spaces - they can't access each other's memory.
Whereas separate threads share the same memory space.
If we think of a thread as a queue of instructions, a multithreaded application can have two of these queues of instructions operating at the same time, but each accessing the same memory (and potentially screwing up the the state of the memory for the other thread.).
JavaScript is single threaded
All this means is that there is this one queue of instructions.
Now, this does that mean that JavaScript might not be suitable for doing embarrassingly parallel processing like sorting a trillion numbers using the quick sort algorithm, because you can't make use of a computer's multiple processors.
So how does the asynchronous work?
It comes down to the JavaScript event loop, and the fact that JavaScript is non-blocking.
To give an example, if I write some code that looks like:
const response = await fetch("/api/someData");
or not using async/await:
fetch("/api/someData").then(response => {
//Use the response here.
});
And say it takes one second for this response to return, the JavaScript engine doesn't just sit there doing nothing until the reponse returns.
Instead, the event loop continues and it continues processing everything else that it can, until the promise resolves and that code can continue.
If you want more details on exactly how the event loop works, I recommend reading that Mozilla documentation, or this post.
I am aiming to understand what exactly is it about certain JavaScript functions that make them asynchronous. You could have a function like this:
function someRandomFunction () {
for (let i of everything) {
iterator++
something = true
hello = that + whatever
}
}
Nothing about this function is asynchronous. It does a lot of stuff but it does it very quickly.
But then take a Node.js function like this:
fs.readFile('path/to/file', (err, data) => {
// do something
})
This function is declared to be asynchronous. But why is it? What is the reason behind it?
Is it because it takes a certain amount of time for reading a file to complete, therefore it's asynchronous? Why is that asynchronous when looping through some variables and doing some calculations is not asynchronous?
The 'idea' of something being asynchronous means "we've relinquished control to some other operational code, and we have no idea when that other code will allow us to operate again".
So in your bottom sample, you give control over to node's filesystem operations to read a file. We don't know how large that file is, so there is no way to anticipate how much time it will take to complete. So JavaScript allows us to provide a "callback" that get's fired when the async operation is complete. When the callback is executed, control is returned to our code.
The top example is synchronous because your code maintains control over the operation and Javascript executes code synchronously by nature.
A function is either explicitly synchronous or asynchronous; there's no "certain amount of time" that automatically makes it asynchronous.
The important thing to remember with Node is that it's "async I/O", which means nothing is ever async without some input/output to/from the process. You normally see this with file reading/writing, database calls, and anything that requires a network call.
This is a basic but fundamental question. Passing a function to another function doesn't make the code asynchronous automatically. It's all about attemting to work with another piece of code, mostly of unknown origins, probably but not necessarily related with operating system's IO mechanism. In another words you ask something to happen out of the realm of the currently exectuing JS context by handing a task to known or unknown resources which are possibly running on a separate thread.
As i said this can be an IO operation or a database access of which you have no idea on
What code is running
At which thread it is running
How long it will take
Whether it will succeed or not
Behind the curtains these alien codes finally trigger the event queue to invoke the callback function through a driver. In some pure functional languages like Haskell such activities are strictly kept away from the context and must be handled by monadic type functions. In JS this is basically done by asynchronous workflow by callbacks, promises or async-awiat mechanism.
Once you hand out a task to the outer world then there shall be no way for it to interfere with your synchronous code.
In terms of JavaScript, the difference between synchronous and async functions is where and when the code is executed. Synchronous functions are executed immediately one after the other in the current call stack. Async functions are passed off to the event loop, and values returned once the current call stack has completed execution.
Caveat being, nothing in JavaScript is truly asynchronous due to the fact that JS only executes on a single process thread
In Java we have "threads", in CPython we have threads (non-concurrent) and "processes".
In JS, when I kick off an async function or method, how do I officially refer to these "strands of executing code"?
I have heard that each such code block executes from start to finish*, meaning that there is never any concurrent** processing in JS. I'm not quite sure whether this is the same situation as with CPython threads. Personally I hesitate to use "thread" for what we have in JS as these "strands" are so different from Java concurrent threads.
* Just to clarify in light of Stephen Cleary's helpful response: I mean "each such synchronous code block". Obviously if an await is encountered control is released ...
** And obviously never any "true parallel" processing. I'm following the widely accepted distinction between "concurrent" (only one thread at any one time, but one "strand of execution" may give way to another) and "parallel" (multiple processes, implementing true parallel processing, often using multiple CPUs or cores or processes). My understanding is that these "strands" in JS are not even concurrent: once one AJAX method or Promise or async method/function starts executing nothing can happen until it's finished (or an await happens)...
In JS, when I kick off an async function or method, how do I officially refer to these "strands of executing code"?
For lack of a better term, I've referred to these as "asynchronous operations".
I have heard that each such code block executes from start to finish
This was true... until await. Now there's a couple of ways to think about it, both of which are correct:
Code blocks execute exclusively unless there's an await.
await splits its function into multiple code blocks that do execute exclusively (each await is a "split point").
meaning that there is never any concurrent processing in JS.
I'd disagree with this statement. JavaScript forces asynchrony (before async/await, it used callbacks or promises), so it does have asynchronous concurrency, though not parallel concurrency.
A good mental model is that JavaScript is inherently single-threaded, and that one thread has a queue of work to do. await does not block that thread; it returns, and when its thenable/Promise completes, it schedules the remainder of that method (the "continuation") to the queue.
So you can have multiple asynchronous methods executing concurrently. This is exactly how Node.js handles multiple simultaneous requests.
My understanding is that these "strands" in JS are not even concurrent: once one AJAX method or Promise or async method/function starts executing nothing can happen until it's finished...
No, this is incorrect. await will return control to the main thread, freeing it up to do other work.
the way things operate in JS means you "never have to worry about concurrency issues, etc."
Well... yes and no. Since JavaScript is inherently single-threaded, there's never any contention over data shared between multiple threads (obviously). I'd say that's what the original writer was thinking about. So I'd say easily north of 90% of thread-synchronization problems do just go away.
However, as you noted, you still need to be careful modifying shared state from multiple asynchronous operations. If they're running concurrently, then they can complete in any order, and your logic has to properly handle that.
Ideally, the best solution is to move to a more functional mindset - that is, get rid of the shared state as much as possible. Asynchronous operations should return their results instead of updating shared state. Then these concurrent asynchronous operations can be composed into a higher-level asynchronous operation using async, await, and Promise.all. If possible, this more functional approach of returning-instead-of-state and function composition will make the code easier to deal with.
But there's still some situations where this isn't easily achievable. It's possible to develop asynchronous equivalents of classical synchronous coordination primitives. I worked up a proof-of-concept AsyncLock, but can't seem to find the code anywhere. Well, it's possible to do this, anyway.
To date 24 people, no more no fewer, have chosen to look at this.
FWIW, and should anyone eccentric take an interest in this question one day, I propose "hogging strands" or just "strands" ... they hog the browser's JS engine, it would appear, and just don't let go, unless they encounter an await.
I've been developing this app where 3 "strands" operate more or less simultaneously, all involving AJAX calls and a database query... but in fact it is highly preferable if the 2nd of the 3 executes and returns before the 3rd.
But because the 3rd query is quite a bit simpler for MySQL than the 2nd the former tends to return from its AJAX "excursion" quite a bit earlier if everything is allowed to operate with unconstrained asynchronicity. This then causes a big delay before the 2nd strand is allowed to do its stuff. To force the code to perform and complete the 2nd "strand" therefore requires quite a bit of coercion, using async and await.
I read one comment on JS asynchronicity somewhere where someone opined that the way things operate in JS means you "never have to worry about concurrency issues, etc.". I don't agree: if you don't in fact know when awaits will unblock this does necessarily mean that concurrency is involved. Perhaps "hogging strands" are easier to deal with than true "concurrency", let alone true "parallelism", however... ?
Which part of syntax provides the information that this function should run in other thread and be non-blocking?
Let's consider simple asynchronous I/O in node.js
var fs = require('fs');
var path = process.argv[2];
fs.readFile(path, 'utf8', function(err,data) {
var lines = data.split('\n');
console.log(lines.length-1);
});
What exactly makes the trick that it happens in background? Could anyone explain it precisely or paste a link to some good resource? Everywhere I looked there is plenty of info about what callback is, but nobody explains why it actually works like that.
This is not the specific question about node.js, it's about general concept of callback in each programming language.
EDIT:
Probably the example I provided is not best here. So let's do not consider this node.js code snippet. I'm asking generally - what makes the trick that program keeps executing when encounter callback function. What is in syntax
that makes callback concept a non-blocking one?
Thanks in advance!
There is nothing in the syntax that tells you your callback is executed asynchronously. Callbacks can be asynchronous, such as:
setTimeout(function(){
console.log("this is async");
}, 100);
or it can be synchronous, such as:
an_array.forEach(function(x){
console.log("this is sync");
});
So, how can you know if a function will invoke the callback synchronously or asynchronously? The only reliable way is to read the documentation.
You can also write a test to find out if documentation is not available:
var t = "this is async";
some_function(function(){
t = "this is sync";
});
console.log(t);
How asynchronous code work
Javascript, per se, doesn't have any feature to make functions asynchronous. If you want to write an asynchronous function you have two options:
Use another asynchronous function such as setTimeout or web workers to execute your logic.
Write it in C.
As for how the C coded functions (such as setTimeout) implement asynchronous execution? It all has to do with the event loop (or mostly).
The Event Loop
Inside the web browser there is this piece of code that is used for networking. Originally, the networking code could only download one thing: the HTML page itself. When Mosaic invented the <img> tag the networking code evolved to download multiple resources. Then Netscape implemented progressive rendering of images, they had to make the networking code asynchronous so that they can draw the page before all images are loaded and update each image progressively and individually. This is the origin of the event loop.
In the heart of the browser there is an event loop that evolved from asynchronous networking code. So it's not surprising that it uses an I/O primitive as its core: select() (or something similar such as poll, epoll etc. depending on OS).
The select() function in C allows you to wait for multiple I/O operations in a single thread without needing to spawn additional threads. select() looks something like:
select (max, readlist, writelist, errlist, timeout)
To have it wait for an I/O (from a socket or disk) you'd add the file descriptor to the readlist and it will return when there is data available on any of your I/O channels. Once it returns you can continue processing the data.
The javascript interpreter saves your callback and then calls the select() function. When select() returns the interpreter figures out which callback is associated with which I/O channel and then calls it.
Conveniently, select() also allows you to specify a timeout value. By carefully managing the timeout passed to select() you can cause callbacks to be called at some time in the future. This is how setTimeout and setInterval are implemented. The interpreter keeps a list of all timeouts and calculates what it needs to pass as timeout to select(). Then when select() returns in addition to finding out if there are any callbacks that needs to be called due to an I/O operation the interpreter also checks for any expired timeouts that needs to be called.
So select() alone covers almost all the functionality necessary to implement asynchronous functions. But modern browsers also have web workers. In the case of web workers the browser spawns threads to execute javascript code asynchronously. To communicate back to the main thread the workers must still interact with the event loop (the select() function).
Node.js also spawns threads when dealing with file/disk I/O. When the I/O operation completes it communicates back with the main event loop to cause the appropriate callbacks to execute.
Hopefully this answers your question. I've always wanted to write this answer but was to busy to do so previously. If you want to know more about non-blocking I/O programming in C I suggest you take a read this: http://www.gnu.org/software/libc/manual/html_node/Waiting-for-I_002fO.html
For more information see also:
Is nodejs representing Reactor or Proactor design pattern?
Performance of NodeJS with large amount of callbacks
First of all, if something is not Async, it means it's blocking. So the javascript runner stops on that line until that function is over (that's what a readFileSync would do).
As we all know, fs is a IO library, so that kind of things take time (tell the hardware to read some files is not something done right away), so it makes a lot of sense that anything that does not require only the CPU, it's async, because it takes time, and does not need to freeze the rest of the code for waiting another piece of hardware (while the CPU is idle).
I hope this solves your doubts.
A callback is not necessarily asynchronous. Execution depends entirely on how fs.readFile decides to treat the function parameter.
In JavaScript, you can execute a function asynchronously using for example setTimeout.
Discussion and resources:
How does node.js implement non-blocking I/O?
Concurrency model and Event Loop
Wikipedia:
There are two types of callbacks, differing in how they control data flow at runtime: blocking callbacks (also known as synchronous callbacks or just callbacks) and deferred callbacks (also known as asynchronous callbacks).
Nobody has actually asked this (from all the 'suggestions' I'm getting and also from searching before I asked here).
So why is node.js asynchronous?
From what I have deduced after some research:
Languages like PHP and Python are scripting languages (I could be wrong about the actual languages that are scripting languages) whilst JavaScript isn't. (I suppose this derives from the fact that JS doesn't compile?)
Node.js runs on a single thread whilst scripting languages use multiple threads.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Maybe the answer is found somewhere stated above, but I'm still not sure.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
PS. I know some of you will ask "why would you want to make JS synchronous?" in your answers, but the truth is that I don't. I'm just asking these types of questions because I'm sure there are more people out there than just myself that have thought about such questions.
Node.js runs on a single thread whilst scripting languages use multiple threads.
Not technically. Node.js uses several threads, but only one execution thread. The background threads are for dealing with IO to make all of the asynchronous goodness work. Dealing with threads efficiently is a royal pain, so the next best option is to run in an event loop so code can run while background threads are blocked on IO.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Not necessarily. You can preserve state in an asynchronous system pretty easily. For example, in Javascript, you can use bind() to bind a this to a function, thereby preserving state explicitly when the function returns:
function State() {
// make sure that whenever doStuff is called it maintains its state
this.doStuff = this.doStuff.bind(this);
}
State.prototype.doStuff = function () {
};
Asynchronous means not waiting for an operation to finish, but registering a listener instead. This happens all the time in other languages, notably anything that needs to accept input from the user. For example, in a Java GUI, you don't block waiting for the user to press a button, but you register a listener with the GUI.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
Technically, all languages are synchronous, even Javascript. However, Javascript works a lot better in an asynchronous design because it was designed to be single threaded.
Basically there are two types of programs:
CPU bound- the only way to make it go faster is to get more CPU time
IO bound- spends a lot of time waiting for data, so a faster processor won't matter
Video games, number crunchers and compilers are CPU bound, whereas web servers and GUIs are generally IO bound. Javascript is relatively slow (because of how complex it is), so it wouldn't be able to compete in a CPU bound scenario (trust me, I've written my fair share of CPU-bound Javascript).
Instead of coding in terms of classes and objects, Javascript lends itself to coding in terms of simple functions that can be strung together. This works very well in asynchronous design, because algorithms can be written to process data incrementally as it comes in. IO (especially network IO) is very slow, so there's quite a bit of time between packets of data.
Example
Let's suppose you have 1000 live connections, each delivering a packet every millisecond, and processing each packet takes 1 microsecond (very reasonable). Let's also assume each connection sends 5 packets.
In a single-threaded, synchronous application, each connection will be handled in series. The total time taken is (5*1 + 5*.001) * 1000 milliseconds, or ~5005 milliseconds.
In a single-threaded, asynchronous application, each connection will be handled in parallel. Since every packet takes 1 millisecond, and processing each packet takes .001 milliseconds, we can process every connection's packet between packets, so our formula becomes: 1000*.001 + 5*1 milliseconds, or ~6 milliseconds.
The traditional solution to this problem was to create more threads. This solved the IO problem, but then when the number of connections rose, so did the memory usage (threads cost lots of memory) and CPU usage (multiplexing 100 threads onto 1 core is harder than 1 thread on 1 core).
However, there are downsides. If your web application happens to also need to do some heavy number crunching, you're SOL because while you're crunching numbers, connections need to wait. Threading solves this because the OS can swap out your CPU-intensive task when data is ready for a thread waiting on IO. Also, node.js is bound to a single core, so you can't take advantage of your multi-core processor unless you spin up multiple instances and proxy requests.
Javascript does not compile into anything. It's "evaluated" at runtime, just like PHP & Ruby. Therefore it is a scripting language just like PHP/Ruby. (it's official name is actually ECMAScript).
The 'model' that Node adheres to is a bit different than PHP/Ruby. Node.js uses an 'event loop' (the single thread) that has the one goal of taking network requests and handling them very quickly, and if for any reason it encounters an operation that takes a while (API request, database query -- basically anything involving I.O. (input/output)) it passes that off to a background 'worker' thread and goes off to do something else while the worker thread waits for the long task to complete. When that happens the main 'event loop' will take the results and continue deal with them.
PHP/Ruby following a threading model. Essentially, for every incoming network request, the application server spins up an isloated thread or process to handle the request. This does not scale tremendously well and Node's approach is cited as one of its core strengths compared to this model.
Asynchronous means stateless and that the connection is persistent
whilst synchronous is the (almost) opposite.
No. Synchronous instructions are completed in a natural order, from first to last. Asynchronous instructions mean that if a step in the flow of a program takes a relatively long time, the program will continue executing operations and simply return to this operation when complete.
Could JavaScript be made into a synchronous language?
Certain operations in JavaScript are synchronous. Others are asynchronous.
For example:
Blocking operations:
for(var k = 0; k < 1; k = k - 1;){
alert('this will quickly get annoying and the loop will block execution')
alert('this is blocked and will never happen because the above loop is infinite');
Asynchronous:
jQuery.get('/foo', function (result) { alert('This will occur 2nd, asynchronously'); });
alert('This will occur 1st. The above operation was skipped over and execution continued until the above operation completes.');
Could JavaScript be made into a synchronous language?
Javascript is not an "asynchronous language"; rather, node.js has a lot of asynchronous APIs. Asynchronous-ness is a property of the API and not the language. The ease with which functions can be created and passed around in javascript makes it convenient to pass callback functions, which is one way to handle control flow in an asynchronous API, but there's nothing inherently asynchronous about javascript. Javascript can easily support synchronous APIs.
Why is node.js asynchronous?
Node.js favors asynchronous APIs because it is single-threaded. This allows it to efficiently manage its own resources, but requires that long-running operations be non-blocking, and asynchronous APIs are a way to allow for control of flow with lots of non-blocking operations.