JavaScript and Multi-threading by browser for async operations?

I'm experimenting with laying out data in indexedDB object stores and using Promise.all to extract the data, build HTML, and add it to different sections of a tabbed display within a single-page web application.
If Promise.all is used to extract the data from different object stores or different key ranges of the same store and then build HTML fragments and insert them before resolving each individual promise, is the browser really performing these steps concurrently such that the process will complete quicker?
To be more specific, there is one promise for the extraction of data for each tab in the display. If the transaction completes successfully then the data is passed to a function that builds and inserts the HTML and, when it returns, the promise resolves. The promises for each tab are grouped in the Promise.all and, when that resolves, the program navigates to the first tab and displays it.
Is this really working on multiple things at the same time or is it just providing a timing method to show the first tab only after it is known that data and HTML for all tabs have been successfully gathered and built?
Is it accurate that this is quicker than chaining the promises to run one after the other with then statements but not faster than just kicking off each promise individually and placing a then on the first tab's promise to display it?
I know JS is a single-thread language but the documentation for some of these methods reads that "in a separate thread..."; so, I don't quite understand if the browser is performing things at the same time or not.
Thank you.

First off, let's just stipulate that there's no use of webWorker threads so we're just talking about regular Javascript in a web page.
In that case, there's only ONE thread of Javascript running at any given time and the whole system is event driven (more on that later).
So, when running actual Javascript code things can only run one at a time. But, as soon as you make a function call that has a native-code backed non-blocking, asynchronous implementation, then the Javascript calls out to the native code, the asynchronous operation is initiated and then it immediately returns control back to the Javascript interpreter so more Javascript can run. In this manner, you can start multiple asynchronous operations at once that will proceed independent of the Javascript interpreter.
How independent they are and how much they run in parallel depends upon the implementation of those asynchronous operations. If they are all accessing the same database, there may or may not be some real parallelism between database operations available - it all depends upon the database implementation.
Now, let's assume you did start multiple database operations and you are tracking when they are all done with Promise.all(). As each one finishes, it will insert an event in the JS event queue. When the JS interpreter has nothing else to do, it will pull the next event from the event queue and run the callback associated with the completion of that asynchronous operation. When using Promise.all(), that callback will (among other things) mark that single operation as done and store its result for later access. When all the operations you are tracking with Promise.all() have completed in this manner, the promise that Promise.all() returns will resolve and you can get access to the array of results.
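As a minimal sketch of that bookkeeping, the following uses a hypothetical loadTab() helper in which setTimeout stands in for an IndexedDB read (the helper name, tab names and delays are assumptions made for illustration, not the question's actual code). It shows that the operations may complete in any order while Promise.all() still hands back the results in the order of the input array:

```javascript
// Hypothetical stand-in for an asynchronous read: resolves after `ms` milliseconds.
function loadTab(name, ms, finished) {
  return new Promise((resolve) => {
    setTimeout(() => {
      finished.push(name);           // record the order of completion
      resolve(`<div>${name}</div>`); // pretend this is the HTML built for the tab
    }, ms);
  });
}

// Start all three operations right away, then wait for all of them.
function loadAllTabs() {
  const finished = [];
  const pending = [
    loadTab("first", 30, finished),
    loadTab("second", 10, finished),
    loadTab("third", 20, finished),
  ];
  return Promise.all(pending).then((html) => ({ html, finished }));
}

loadAllTabs().then(({ html, finished }) => {
  console.log(finished); // completion order: second, third, first
  console.log(html);     // results keep the order of the input array
});
```

All three timers are started before any callback runs, which is the sense in which the operations are "in flight" at the same time.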
Now to your specific questions...
If Promise.all is used to extract the data from different object stores or different key ranges of the same store and then build HTML fragments and insert them before resolving each individual promise, is the browser really performing these steps concurrently such that the process will complete quicker?
If each asynchronous operation is capable of running efficiently in parallel with others (at the native code level), then "yes", the total time to completion will be shorter if the operations are run in parallel than if they are sequenced.
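The difference between sequencing and running in parallel can be sketched as follows. The operation() helper is hypothetical and uses setTimeout in place of real work, so it only illustrates the ordering of starts and finishes, not how a particular database behaves:

```javascript
// Hypothetical asynchronous operation; setTimeout stands in for the real work.
function operation(name, log) {
  return new Promise((resolve) => {
    log.push(`start ${name}`);
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve(name);
    }, 20);
  });
}

// Sequenced: each operation starts only after the previous one has finished.
async function runSequenced(names) {
  const log = [];
  for (const name of names) {
    await operation(name, log);
  }
  return log;
}

// Parallel: all operations are started first, then awaited together.
async function runParallel(names) {
  const log = [];
  await Promise.all(names.map((name) => operation(name, log)));
  return log;
}

runSequenced(["a", "b"]).then((log) => console.log(log.join(", ")));
// start a, end a, start b, end b
runParallel(["a", "b"]).then((log) => console.log(log.join(", ")));
// start a, start b, end a, end b
```

In the sequenced version the waits add up; in the parallel version they overlap, which is where any time saving comes from.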
Is this really working on multiple things at the same time or is it just providing a timing method to show the first tab only after it is known that data and HTML for all tabs have been successfully gathered and built?
That depends upon the specific asynchronous operations and if multiple of them can be run in parallel. Usually, they can, but not always.
Is it accurate that this is quicker than chaining the promises to run one after the other with then statements but not faster than just kicking off each promise individually and placing a then on the first tab's promise to display it?
Usually, yes.
I know JS is a single-thread language but the documentation for some of these methods reads that "in a separate thread..."; so, I don't quite understand if the browser is performing things at the same time or not.
The browser runs your Javascript (the executing of actual Javascript instructions) as a single thread (ignoring webWorkers here) with no parallelism. But, when you call an asynchronous operation, that operation has an implementation in native code and that native code behind it is free to use threads or other asynchronous, non-blocking OS APIs/tools to do its job. That may allow parallelism between separate asynchronous calls, thus allowing multiple calls to proceed in parallel.
The IndexedDB interface contains many asynchronous interfaces which means it is event driven and must be running asynchronously with some native code behind it. You would have to test a given implementation of that database in a given browser to see how well it parallelizes multiple requests in-flight at the same time. Each browser implementation could have different characteristics.

Related

Why does Javascript have much fewer blocking functions than Python

Moving from Javascript to Python, and looking at asyncio has me a little confused.
As someone who is new to the fundamental concepts of concurrency, I just assumed a superficial understanding of Javascript concurrency.
A basic understanding from using async / await in Javascript:
If we run any processes inside an async function, and await the response of the function, we are essentially waiting for the function to set a value on the Promise.
Makes total sense - when the Promise is given a value, we can also use callbacks such as .then() to handle the response. Alternatively, just await.
Whatever the underlying implementation of asynchronicity here is (for example all processes running on a single thread with an event loop), should it matter how we interface with it?
Now, I move to Python and start playing with asyncio. We have Futures, just like Promises. All of a sudden, I can't use my standard libraries, such as requests.get(...), but I need to use non-blocking network requests in libraries such as aiohttp.
What does blocking / non-blocking mean here? I assume it means the single thread that the event loop is on is blocked, so we can't process other functions in parallel.
So my 2 questions then are:
What causes the single thread to be blocked? For example in requests.get(...)
Why are most functions non-blocking in Javascript, but not in Python (i.e we don't need specific libraries such as aiohttp).
And what about languages like Go with their goroutines? Is it just that, because it's a new language with concurrency built in from the beginning, the concept of blocking functions doesn't exist? Or is it that Go is not single-threaded, so everything can inherently be parallelised?
Thanks :)
Event loop
Javascript and Python's asyncio make use of a concurrency model based on event loops.
(Note the plural because you could have multiple event loops which handle different kinds of tasks e.g. disk io, network io, ipc, parallel computations etc)
The general concept of an event loop is that, you have a number of things to do, so you put those things in a queue, and once in a while (like every nanosecond), the event loop picks an event from the queue, and runs it for a short while (maybe a millisecond or so), and either shoves it back in the queue if it hasn't completed, or waits until it yields control back to the event loop.
Now to answer some of your questions:
What does blocking / non-blocking mean here? I assume it means the
single thread that the event loop is on is blocked, so we cant process
other functions in parallel.
Blocking event loop
Blocking the event loop occurs when the event loop is running a task, and the task has neither finished nor given back control to the event loop for a period of time longer than the event loop has scheduled it to run.
In the case of python's requests library, it makes use of a synchronous http library, which doesn't respect the event loop. Therefore, running such a task in the loop will starve other tasks, which are patiently waiting their turn to run, until the request is finished.
Why are most functions non-blocking in Javascript, but not in Python
(i.e we don't need specific libraries such as aiohttp).
JS
Any Javascript code can block the event loop while it runs. To avoid blocking, long-running work has to hand control back to the event loop, for example by splitting it into callbacks scheduled via setTimeout. However, if care is not taken, even those callbacks can block the event loop if they run too long without yielding control back to the event loop via another setTimeout call.
(If you've never had to use setTimeout, but have used promises and async network requests in JS, then you are probably relying on an API or library that provides async network IO for you. Popular networking libraries used in browsers (ajax helpers, axios, etc) are based on the XMLHttpRequest API, while fetch is a separate API built into the browser; both provide async network IO.)
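A small sketch of what blocking looks like in Javascript itself: a timer is requested with a delay of 0, but its callback cannot run until the synchronous loop after it has finished. The function name blockingDemo and the 50 ms figure are just illustrative choices:

```javascript
// Shows that a pending timer callback has to wait for synchronous code to finish.
function blockingDemo(busyMs) {
  const order = [];
  return new Promise((resolve) => {
    // Ask for a callback as soon as possible.
    setTimeout(() => {
      order.push("timer callback");
      resolve(order);
    }, 0);

    // Synchronous busy loop: nothing else can run on this thread until it ends.
    const end = Date.now() + busyMs;
    while (Date.now() < end) { /* spin */ }
    order.push("busy loop finished");
  });
}

blockingDemo(50).then((order) => console.log(order));
// [ 'busy loop finished', 'timer callback' ]
```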
Python
In python, the story is slightly different: Before asyncio, there was no such thing as an "event loop". Everything must run to completion before the python interpreter moves on to the next thing. This is part of what makes python very easy to learn (and dare I say, to create...). The reason for this comes in the form of the python GIL, which in simple terms enforces a single order of execution for any python program. I encourage you to read up on why the GIL exists.
And what about languages like Go with their goroutines?
Note: I am not a go programmer, but I read something
How is Go different?
The only difference between the way Go handles goroutines and how python asyncio/js do their event loops is that Go makes more use of OS threads to ensure that goroutines are scheduled fairly and make full use of the machine they run on.
While js callbacks/asyncio tasks will often run in the same thread as the event loop, goroutines are able to run in separate OS threads and over multiple cores, thus giving them higher availability and higher parallelism. (In that case, we could almost consider goroutines to be closer to OS threads in terms of how much time they actually get to run, as compared to green threads which are bound by the amount of time the event loop's thread runs.)

How is JavaScript synchronous if it can only complete one operation at a time and synchronous means things happen at the same time

I'm confused about synchronous JavaScript vs asynchronous. If by default JavaScript is single threaded and can only complete one operation at a time, line by line, wouldn't this be 'asynchronous' i.e. things do not occur at the same time? How is it synchronous?
Also a piece of async code like a promise, the promise allows the rest of the code to run while it waits to resolve. Wouldn't this be synchronous i.e. letting multiple operations happen at once?
I'm confused as this seems the wrong way round in my mind.
Something can be both single-threaded and asynchronous.
Firstly, let's talk about the difference between a thread and a process.
The basic definition is that separate processes have separate memory spaces - they can't access each other's memory.
Whereas separate threads share the same memory space.
If we think of a thread as a queue of instructions, a multithreaded application can have two of these queues of instructions operating at the same time, but each accessing the same memory (and potentially screwing up the state of the memory for the other thread).
JavaScript is single threaded
All this means is that there is this one queue of instructions.
Now, this does mean that JavaScript might not be suitable for doing embarrassingly parallel processing like sorting a trillion numbers using the quick sort algorithm, because you can't make use of a computer's multiple processors.
So how does the asynchronous work?
It comes down to the JavaScript event loop, and the fact that JavaScript is non-blocking.
To give an example, if I write some code that looks like:
const response = await fetch("/api/someData");
or not using async/await:
fetch("/api/someData").then(response => {
  // Use the response here.
});
And say it takes one second for this response to return, the JavaScript engine doesn't just sit there doing nothing until the response returns.
Instead, the event loop continues and it continues processing everything else that it can, until the promise resolves and that code can continue.
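To make that concrete, here is a small sketch in which a hypothetical fakeFetch() (a setTimeout-based stand-in for fetch("/api/someData"), so it runs without a server) is awaited while other code keeps running:

```javascript
// Hypothetical stand-in for fetch("/api/someData"): resolves after 20 ms.
function fakeFetch() {
  return new Promise((resolve) => setTimeout(() => resolve("data"), 20));
}

async function demo() {
  const order = [];
  const pending = (async () => {
    order.push("request started");
    const response = await fakeFetch(); // only this async function pauses here
    order.push(`got ${response}`);
  })();
  // Runs while the request above is still in flight.
  order.push("other work");
  await pending;
  return order;
}

demo().then((order) => console.log(order));
// [ 'request started', 'other work', 'got data' ]
```

The await suspends only the function it appears in; the thread itself goes back to the event loop and picks up whatever else is ready.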
If you want more details on exactly how the event loop works, I recommend reading the Mozilla documentation on the concurrency model and event loop.

synchronous vs async nodejs

Say I have a sample code running in NodeJS
function example() {
  // OPERATION 1
  // OPERATION 2
}
Considering none of the operations require any sort of time out, by default would javascript run both at the same time or finish operation 1 then operation 2?
No two lines of JavaScript ever run simultaneously within the same process. Ever.
See concurrency vs parallelism.
Asynchronous code in Node.js - assuming there are no worker processes involved - is always running with concurrency and never parallelism. Concurrency is easier to program and helps us build complex machinery quickly. To "scale up" you may need to bring in worker processes to do work in parallel.
In your example, if both operations are synchronous, they will run in the order they are written (1, 2). In fact, if operation 1 is synchronous it will always run first no matter what. If both are asynchronous, then how you experience it depends on how long they each take to complete! If operation 1 is asynchronous but operation 2 is synchronous, then they will seem to run in reverse order (2, 1). This has to do with the way functions get scheduled on the event loop, so understanding that will help this all make sense.
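The (2, 1) case can be sketched like this, using setTimeout as an assumed example of an asynchronous operation 1 and a plain statement as operation 2:

```javascript
function runBoth(done) {
  const order = [];
  // Operation 1: asynchronous, its work is scheduled for later.
  setTimeout(() => {
    order.push(1);
    done(order);
  }, 0);
  // Operation 2: synchronous, runs immediately.
  order.push(2);
}

runBoth((order) => console.log(order)); // [ 2, 1 ]
```

Operation 1 is still kicked off first; it is only its result that shows up after operation 2.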
Take a breath. Time for a deep dive.
To be clear, in reality, lines of code always get executed in order. We don't have GOTO and the JavaScript engine does not mysteriously jump around to different places. The key thing to understand is that when someone says a function is asynchronous it is really also partly synchronous. Something is happening synchronously. Otherwise it would be an empty function. Instead, it just means that only a tiny bit of work is done synchronously, usually that simply consists of scheduling work for later, and then the rest of it happens later.
So above when I said that if they are both asynchronous then "it depends", it's merely the completion or result of that function (which you experience via a callback or Promise) whose order is undefined in relation to the completion of other asynchronous functions. This isn't the case for fully synchronous functions simply because the world stops for synchronous functions.
If you have two functions both trying to retrieve the same data from two different sources, one from your hard disk and one from the internet, which will finish first? If they are both asynchronous, then it's a trick question. Probably the hard disk is faster, but don't bet your life on it. Still, one of them technically gets kicked off first, synchronously.
This paradigm of scheduling things for later and not waiting for the result before continuing (non-blocking) is one of the ways Node.js manages to have such great performance even without worker processes / parallelism. For I/O in particular, such as reading a file from the disk, there are "quiet periods" of inactivity where there is nothing for the process to do. In this case, waiting before continuing is a huge waste of time. It is more appropriate to use that opportunity to interleave other functions in the meantime. This is what asynchronous functions do. They delay work to allow us to interleave other work. Usually they achieve this via process.nextTick() or setImmediate().
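As a rough illustration of those two scheduling calls in Node.js (the function name deferredOrder is made up for this sketch), the synchronous part runs first, the process.nextTick() callback runs as soon as the current operation completes, and the setImmediate() callback runs on a later turn of the event loop:

```javascript
function deferredOrder(done) {
  const order = [];
  // Scheduled for a later turn of the event loop.
  setImmediate(() => {
    order.push("setImmediate");
    done(order);
  });
  // Scheduled to run right after the current operation completes.
  process.nextTick(() => order.push("nextTick"));
  // The only part that runs synchronously.
  order.push("synchronous");
}

deferredOrder((order) => console.log(order));
// [ 'synchronous', 'nextTick', 'setImmediate' ]
```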
All that said, nothing in life is free. Delaying work has a cost and if you misuse timers you will slow down your program. The goal is to make everything that has unavoidable delays (like I/O) asynchronous and almost nothing else. However asynchronous behavior "pollutes the stack". Anything that uses an asynchronous function becomes asynchronous in nature. You could return a value synchronously and pretend like it's not the case (make it invisible to the outside world), but that is usually a bad idea because then you cannot propagate errors or the result at all.
If you are still confused about how to look at a program and figure out when everything runs, have a look at async and await. It is a wonderful way to write asynchronous code that looks like more traditional synchronous code. It has been available in Node.js since version 7.6.

Creating an asynchronous api for nodeJS (or browser)

Most NodeJS programmers know why it's bad to block NodeJS's single-threaded event loop. For example, when reading a file, we know it's better to use fs.readFile instead of fs.readFileSync. My question is: how does one go about creating an async API in the first place if the underlying task to be performed by the API is inherently synchronous? In other words, how do I go about creating an API such as fs.readFileSync, and not use the event-loop thread to perform the underlying task? Do I have to go outside of NodeJS & Javascript to do this?
how does one go about creating an asyn API in the first place if the
underlying task to be performed by the API is inherently synchronous?
In node.js, there are the following choices for creating your own asynchronous operation:
1. Base it on existing async operations (such as fs.readFile()). If your main operation itself is synchronous and can only be done synchronously in node.js, then this option would obviously not solve your problem.
2. Break your code into small chunks using setTimeout(), setImmediate() or nextTick() so that other things can interleave with your operation and you won't block other things from sharing the CPU. This doesn't prevent CPU usage, but it does allow sharing of the CPU with other operations. Here's an example of iterating over an array in chunks that doesn't block other operations from sharing the CPU: Best way to iterate over an array without blocking the UI.
3. Move your synchronous operation into another process (either another node.js process or any other process) that can do its thing and then communicate back asynchronously when done using any appropriate inter-process communication mechanism (TCP, stdio, etc.).
4. Write a node.js plug-in where you have the ability to use native OS threading or native IO events and you can then create a new operation in node.js that has an async interface.
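The chunking approach (breaking the work up with setTimeout()) might look roughly like the sketch below. The function name processInChunks and its parameters are assumptions made for this example; it is not the code from the linked answer:

```javascript
// Process `items` in chunks, yielding to the event loop between chunks.
function processInChunks(items, handleItem, chunkSize, done) {
  let index = 0;
  function nextChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      setTimeout(nextChunk, 0); // let other queued work run first
    } else {
      done();
    }
  }
  nextChunk();
}

// Usage: square five numbers, two per chunk.
const squares = [];
processInChunks([1, 2, 3, 4, 5], (n) => squares.push(n * n), 2, () => {
  console.log(squares); // [ 1, 4, 9, 16, 25 ]
});
```

The total CPU time is not reduced; the work is just sliced so that other callbacks get a turn between slices. The chunk size is a trade-off between responsiveness and scheduling overhead.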
Do I have to go outside of NodeJS & Javascript to do this?
Items #1, #2, #3 above can all be done from Javascript. Item #4 involves creating a native code plug-in (where you can have access to native threading).
In other words, how do I go about creating an API such as
fs.readFileSync
To truly create an API like fs.readFileSync() from scratch, you would have to use either option #3 (do it synchronously, but in another process and then communicate back the result asynchronously) or option #4 (access OS services at a lower level than is built into node.js via a native code plug-in).

I know that callback function runs asynchronously, but why?

Which part of syntax provides the information that this function should run in other thread and be non-blocking?
Let's consider simple asynchronous I/O in node.js
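var fs = require('fs');
var path = process.argv[2];
// Read the file asynchronously; the callback runs once the read has completed.
fs.readFile(path, 'utf8', function(err, data) {
  if (err) throw err;
  var lines = data.split('\n');
  console.log(lines.length - 1);
});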
What exactly does the trick so that it happens in the background? Could anyone explain it precisely or paste a link to some good resource? Everywhere I looked there is plenty of info about what a callback is, but nobody explains why it actually works like that.
This is not a question specifically about node.js, it's about the general concept of callbacks in any programming language.
EDIT:
Probably the example I provided is not the best here. So let's not consider this node.js code snippet. I'm asking generally - what does the trick so that the program keeps executing when it encounters a callback function? What is in the syntax that makes the callback concept a non-blocking one?
Thanks in advance!
There is nothing in the syntax that tells you your callback is executed asynchronously. Callbacks can be asynchronous, such as:
setTimeout(function(){
  console.log("this is async");
}, 100);
or it can be synchronous, such as:
an_array.forEach(function(x){
  console.log("this is sync");
});
So, how can you know if a function will invoke the callback synchronously or asynchronously? The only reliable way is to read the documentation.
You can also write a test to find out if documentation is not available:
var t = "this is async";
some_function(function(){
  t = "this is sync";
});
console.log(t);
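The same test can be wrapped in a small helper and tried against one synchronous and one asynchronous caller. The helper name invokesCallbackSynchronously is made up for this sketch:

```javascript
// Returns true if `someFunction` invokes its callback before it returns.
function invokesCallbackSynchronously(someFunction) {
  let returned = false;
  let sync = false;
  someFunction(() => {
    if (!returned) sync = true; // called while someFunction was still running
  });
  returned = true;
  return sync;
}

console.log(invokesCallbackSynchronously((cb) => [1].forEach(cb)));   // true
console.log(invokesCallbackSynchronously((cb) => setTimeout(cb, 0))); // false
```

Note that this only tells you what happened on one particular call; a function is free to call back synchronously in some cases and asynchronously in others, which is another reason to check the documentation.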
How asynchronous code works
Javascript, per se, doesn't have any feature to make functions asynchronous. If you want to write an asynchronous function you have two options:
Use another asynchronous function such as setTimeout or web workers to execute your logic.
Write it in C.
As for how the functions implemented in C (such as setTimeout) achieve asynchronous execution: it all (or mostly) has to do with the event loop.
The Event Loop
Inside the web browser there is this piece of code that is used for networking. Originally, the networking code could only download one thing: the HTML page itself. When Mosaic invented the <img> tag the networking code evolved to download multiple resources. Then, when Netscape implemented progressive rendering of images, they had to make the networking code asynchronous so that they could draw the page before all images were loaded and update each image progressively and individually. This is the origin of the event loop.
In the heart of the browser there is an event loop that evolved from asynchronous networking code. So it's not surprising that it uses an I/O primitive as its core: select() (or something similar such as poll, epoll etc. depending on OS).
The select() function in C allows you to wait for multiple I/O operations in a single thread without needing to spawn additional threads. select() looks something like:
select (max, readlist, writelist, errlist, timeout)
To have it wait for an I/O (from a socket or disk) you'd add the file descriptor to the readlist and it will return when there is data available on any of your I/O channels. Once it returns you can continue processing the data.
The javascript interpreter saves your callback and then calls the select() function. When select() returns the interpreter figures out which callback is associated with which I/O channel and then calls it.
Conveniently, select() also allows you to specify a timeout value. By carefully managing the timeout passed to select() you can cause callbacks to be called at some time in the future. This is how setTimeout and setInterval are implemented. The interpreter keeps a list of all timeouts and calculates what it needs to pass as the timeout to select(). Then, when select() returns, in addition to finding out if there are any callbacks that need to be called due to an I/O operation, the interpreter also checks for any expired timeouts whose callbacks need to be called.
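The timeout bookkeeping described above can be modelled in a few lines. This is a toy sketch written in Javascript for readability, with made-up names (TimerList, nextTimeout, runExpired) and explicit `now` values instead of a real clock; it is not how any actual interpreter is written:

```javascript
// Toy model of the list of timeouts an interpreter might keep.
class TimerList {
  constructor() {
    this.timers = [];
  }

  // Register a callback to run `delay` ms after `now`.
  add(callback, delay, now) {
    this.timers.push({ callback, due: now + delay });
  }

  // Timeout to hand to a select()-style wait: time until the earliest timer,
  // or null (wait indefinitely) when no timers are pending.
  nextTimeout(now) {
    if (this.timers.length === 0) return null;
    const earliest = Math.min(...this.timers.map((t) => t.due));
    return Math.max(0, earliest - now);
  }

  // After the wait returns: run and remove every timer that has expired.
  runExpired(now) {
    const expired = this.timers.filter((t) => t.due <= now);
    this.timers = this.timers.filter((t) => t.due > now);
    expired.sort((a, b) => a.due - b.due).forEach((t) => t.callback());
  }
}

const timers = new TimerList();
const fired = [];
timers.add(() => fired.push("b"), 100, 0);
timers.add(() => fired.push("a"), 40, 0);
console.log(timers.nextTimeout(0));  // 40
timers.runExpired(40);
console.log(fired);                  // [ 'a' ]
console.log(timers.nextTimeout(40)); // 60
```

A real event loop would pass the value from nextTimeout() to the wait call, then handle both ready I/O and expired timers each time the wait returns.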
So select() alone covers almost all the functionality necessary to implement asynchronous functions. But modern browsers also have web workers. In the case of web workers the browser spawns threads to execute javascript code asynchronously. To communicate back to the main thread the workers must still interact with the event loop (the select() function).
Node.js also spawns threads when dealing with file/disk I/O. When the I/O operation completes it communicates back with the main event loop to cause the appropriate callbacks to execute.
Hopefully this answers your question. I've always wanted to write this answer but was too busy to do so previously. If you want to know more about non-blocking I/O programming in C I suggest you read this: http://www.gnu.org/software/libc/manual/html_node/Waiting-for-I_002fO.html
For more information see also:
Is nodejs representing Reactor or Proactor design pattern?
Performance of NodeJS with large amount of callbacks
First of all, if something is not async, it is blocking: the Javascript runtime stops on that line until the function is finished (that is what readFileSync does).
As we all know, fs is an IO library, and IO takes time (telling the hardware to read a file is not something that completes right away). So it makes sense for anything that needs more than the CPU to be async: it takes time, and there is no need to freeze the rest of the code waiting for another piece of hardware while the CPU sits idle.
I hope this clears up your doubts.
A callback is not necessarily asynchronous. Execution depends entirely on how fs.readFile decides to treat the function parameter.
In JavaScript, you can execute a function asynchronously using for example setTimeout.
Discussion and resources:
How does node.js implement non-blocking I/O?
Concurrency model and Event Loop
Wikipedia:
There are two types of callbacks, differing in how they control data flow at runtime: blocking callbacks (also known as synchronous callbacks or just callbacks) and deferred callbacks (also known as asynchronous callbacks).
