How does JavaScript's Single Threaded Model handle time consuming tasks? - javascript

This question is regarding the single-threaded model of JavaScript. I understand that JavaScript is non-blocking in nature because of its ability to add callbacks to the async event queue. But if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time, as it is single-threaded? How does Node.js handle such a problem? And is this an unavoidable problem for developers on the front end? I'm asking this question because I have read that it's generally good practice to keep function tasks as small as possible. Is it really because long tasks in JavaScript will actually block other tasks?

But if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time, as it is single-threaded?
Yes.
How does Node.js handle such a problem?
Node.js handles nothing. How you handle concurrency is up to you and your application. Now, Node.js does have a few tools available to you. The first thing you have to understand is that Node.js is basically V8 (JavaScript engine) with a lightweight library split between JavaScript and native code bolted on. While your JavaScript code is single-threaded by nature, the native code can and does create threads to handle your work.
For example, when you ask Node.js to load a file from disk, your request is passed off to native code where a thread pool is used, and your data is loaded from disk. Once your request is made, your JavaScript code continues on. This is the meaning of "non-blocking" in the context of Node.js. Once that file on disk is loaded, the native code passes it off to the Node.js JavaScript library, which then executes your callback with the appropriate parameters. Your code continued to run while the background work was going on, but when your callback is dealing with that data, other JavaScript code is indeed blocked from running.
This architecture allows you to get much of the benefit of multithreaded code without having to actually write any multithreaded code, keeping your application straightforward.
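To make the ordering concrete, here is a minimal sketch using fs.readFile. The temp file, the order array and the loaded promise are names made up for this illustration, not part of any Node.js API. It only shows that the code after the readFile call runs before the callback does.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a small throwaway file so the example has something to read.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'nonblocking-demo-'));
const file = path.join(dir, 'data.txt');
fs.writeFileSync(file, 'hello from disk');

const order = []; // records the order in which things happen

// Wrap the callback-style call in a promise so completion can be observed.
const loaded = new Promise((resolve, reject) => {
  fs.readFile(file, 'utf8', (err, text) => {
    if (err) return reject(err);
    order.push('callback ran');
    resolve(text);
  });
});

// This line runs right away; the read is still being handled by native code.
order.push('code after readFile');

loaded.then((text) => {
  console.log(order); // [ 'code after readFile', 'callback ran' ]
  console.log(text);  // hello from disk
});
```

While the callback itself is running, nothing else in your JavaScript runs, which is why it pays to keep callbacks short.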
I'm asking this question because I have read that it's generally good practice to keep function tasks as small as possible. Is it really because long tasks in JavaScript will actually block other tasks?
My philosophy is always to use what you need. It's true that if a request comes in to your application and you have a lot of JavaScript processing of data that is blocking, other requests will not be processed during this time. Remember though that if you are doing this sort of work, you are likely CPU bound anyway and doing double the work will cause both requests to take longer.
In practice, the majority of web applications are IO bound. They shuffle data from a database, reformat it, and send it out over the network. The part where they handle data is actually not all that time consuming when compared to the amount of time the application is simply waiting to hear back from the upstream data source. It is in these applications where Node.js really shines.
Finally, remember that you can always spawn child processes to better distribute the load. If your application is that rare application where you do 99% of your work load in CPU-bound JavaScript and you have a box with many CPUs and/or cores, split the load across several processes.

Your question is a very large one, so I am just going to focus on one part.
if the callback function does in fact take a long time to complete, won't JavaScript then be blocking everything else during that time, as it is single-threaded? (...) Is it really because long tasks in JavaScript will actually block other tasks?
Non-blocking is a beautiful thing XD
The best practices include:
Breaking every function down into its minimal functional form.
Keeping callbacks asynchronous. THIS is an excellent post on the use of callbacks.
Avoiding stacked operations (like nested loops).
Using setTimeout() to break up potentially blocking code.
And many other things. Node.js is the gold standard of non-blocking code, so it's worth a look.
setTimeout() is one of the most important functions in non-blocking code.
So let's say you make a clock function that looks like this:
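function setTime() {
  var date = new Date();
  var time = date.getTime(); // milliseconds since the Unix epoch
  document.getElementById('id').innerHTML = time;
}
// Problem: this loop never gives control back, so nothing else can run.
while (true) { setTime(); }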
It's quite problematic, because this code will happily loop until the end of time. No other function will ever be called. You want to break up the operation so other things can run.
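function startTime() {
  var date = new Date();
  var time = date.getTime(); // milliseconds since the Unix epoch
  document.getElementById('id').innerHTML = time;
  // Pass the function itself; writing startTime() here would call it
  // immediately and recurse until the stack overflows.
  setTimeout(startTime, 1000);
}
startTime();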
setTimeout() breaks up the loop and runs the function every 1-ish seconds. An infinite loop is a bit of an extreme example. The point is that setTimeout() is great at breaking up large operation chains into smaller ones, making everything more manageable.
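The same idea applies to a finite but long operation. Below is a rough sketch, assuming a made-up helper called sumInChunks (not a built-in), that processes a large array one slice at a time and hands control back with setTimeout between slices so other queued callbacks get a turn.

```javascript
// Hypothetical helper: sums a large array in slices, yielding between slices.
function sumInChunks(numbers, chunkSize, onDone) {
  let index = 0;
  let total = 0;

  function processChunk() {
    const end = Math.min(index + chunkSize, numbers.length);
    for (; index < end; index++) {
      total += numbers[index];
    }
    if (index < numbers.length) {
      // Schedule the next slice; other queued callbacks may run first.
      setTimeout(processChunk, 0);
    } else {
      onDone(total);
    }
  }

  processChunk();
}

const numbers = Array.from({ length: 100000 }, (_, i) => i + 1);

// Some unrelated work queued before the long operation starts.
let otherWorkRan = false;
setTimeout(() => { otherWorkRan = true; }, 0);

const result = new Promise((resolve) => sumInChunks(numbers, 10000, resolve));
result.then((total) => {
  console.log(total);        // 5000050000
  console.log(otherWorkRan); // true: it ran in between two slices
});
```

The chunk size is a trade-off: smaller slices keep the thread more responsive, but each setTimeout adds a little scheduling overhead.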

Related

Writing custom, true Asynchronous functions in Javascript/Node

How do the NodeJS built in functions achieve their asynchronicity?
Am I able to write my own custom asynchronous functions that execute outside of the main thread? Or do I have to leverage the built in functions?
Just a side note: "true asynchronous" doesn't really mean anything, but we can assume you mean parallelism.
Now, depending on what you're doing, you might find there is little to no benefit in using threads in Node. Take Node's file system API, for example: as long as you don't use the sync versions, it is going to run multiple requests in parallel automatically, because Node just passes these requests to its internal worker threads.
That is why it's actually incorrect to say Node is single-threaded; only the JS engine is. You can even prove this by looking at the number of threads a Node.js process uses in your process monitor of choice.
So then you might ask: why do we have worker threads in Node? Well, the V8 JS engine that Node uses is pretty fast these days, so if you wanted to calculate pi to a million digits using JS, you could do it in the main thread, although nothing else could run there until it finished. It would be a shame not to use the extra CPU cores that modern PCs have, and better to keep the main thread doing other things while pi is being calculated inside another thread.
So what about file I/O in Node: would it benefit from being in a worker thread? That depends on what you do with the result. If you were just reading and then writing blocks of data, then no, there would be no benefit. But if you were reading a file and then doing some heavy calculations on its contents in JavaScript (e.g. some custom image compression), then again a worker thread would help.
So in a nutshell, worker threads are great when you need to use JavaScript for heavy calculations; using them for simple I/O may in fact slow things down, due to IPC overheads.
You don't mention in your question what you're trying to run in parallel, so it's hard to say whether doing so would be of benefit.
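As a rough illustration of the heavy-calculation case, here is a sketch using the worker_threads module. The summing loop is only a stand-in for real CPU-bound work, and sumInWorker, workerSource and ticks are names invented for this example. The worker code is passed as a string with the eval option to keep the example in one file; in a real project you would more likely point Worker at a separate file.

```javascript
const { Worker } = require('worker_threads');

// Worker source as a string; { eval: true } tells Worker to run it as code.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) sum += i; // stand-in for heavy work
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the calculation off the main thread.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// The main thread stays free to run timers while the worker computes.
let ticks = 0;
const ticker = setInterval(() => { ticks++; }, 1);

const answer = sumInWorker(1e7).then((sum) => {
  clearInterval(ticker);
  console.log('sum from worker:', sum); // 50000005000000
  console.log('timer ticks on the main thread meanwhile:', ticks);
  return sum;
});
```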
JavaScript is single-threaded; if you want to create a 'thread' you can use https://nodejs.org/api/worker_threads.html.
But you may have heard about async functions and promises in JavaScript. An async function returns a promise by default, and promises are NOT threads. You can create an async function like this:
async function toto() {
return 0;
}
toto().then((d) => console.log(d));
console.log('hello');
Here it will display hello, then 0.
But remember that the .then() callback is executed later because it is attached to a promise; it is not running in parallel, it is just deferred.

Do Timers run on their Own threads in Node.js?

I am a bit confused here. I know JavaScript is a single-threaded language, but while reading about the event loop I learned that in the case of setTimeout or setInterval, JavaScript calls a web API provided by the browser, which spawns a new thread to run the timer. But what happens in the Node.js environment? How do timers execute/work there?
No threads are used for timers in node.js.
Timers in node.js work in conjunction with the event loop and don't use a thread. Timers in node.js are stored in a sorted linked list with the next timer to fire at the start of the linked list. Each time through the event loop, it checks to see if the first timer in the linked list has reached its time. If so, it fires that timer. If not, it runs any other events that are waiting in the event loop.
On each subsequent cycle through the event loop, it keeps checking to see if it's time for the next timer or not. When a new timer is added, it is inserted into the linked list in its proper sorted order. When it fires or is cancelled, it is removed from the linked list.
If the event loop has nothing to do, it may sleep for a brief period of time, but it won't sleep past the due time of the next timer.
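The description above can be modelled in a few lines. This is only a simplified sketch, not Node's actual implementation: TimerList is a made-up class, it uses a sorted array instead of a linked list, and the clock is simulated with a plain loop.

```javascript
// Simplified model of a sorted timer list; NOT Node's real data structure.
class TimerList {
  constructor() {
    this.timers = []; // kept sorted by due time, earliest first
  }

  // Insert at the sorted position, like adding to the sorted list.
  add(dueTime, callback) {
    const timer = { dueTime, callback };
    let i = 0;
    while (i < this.timers.length && this.timers[i].dueTime <= dueTime) i++;
    this.timers.splice(i, 0, timer);
    return timer;
  }

  // A cancelled timer is simply removed from the list.
  cancel(timer) {
    const i = this.timers.indexOf(timer);
    if (i !== -1) this.timers.splice(i, 1);
  }

  // One pass of the loop: only the head of the list needs to be checked.
  runDue(now) {
    while (this.timers.length > 0 && this.timers[0].dueTime <= now) {
      const timer = this.timers.shift();
      timer.callback();
    }
  }
}

const fired = [];
const list = new TimerList();
list.add(30, () => fired.push('c'));
list.add(10, () => fired.push('a'));
const b = list.add(20, () => fired.push('b'));
list.cancel(b);

// Simulated clock: each iteration stands for one trip through the event loop.
for (let now = 0; now <= 40; now += 5) {
  list.runDue(now);
}
console.log(fired); // [ 'a', 'c' ]
```

Because the list is sorted, each trip through the loop only has to look at the first entry, which is why having many pending timers is cheap.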
Other references on the topic:
How does nodejs manage timers internally
Libuv timer code in nodejs
How many concurrent setTimeouts before performance issues?
Multiple Client Requests in NodeJs
Looking for a solution between setting lots of timers or using a scheduled task queue
Node runs your JavaScript on a single thread, but asynchronous work happens elsewhere. For example, libuv provides a pool of threads (4 by default) that it may use, but won't if there's a better option.
The node documentation says
Node.js runs JavaScript code in the Event Loop (initialization and callbacks), and offers a Worker Pool to handle expensive tasks like file I/O. Node.js scales well, sometimes better than more heavyweight approaches like Apache. The secret to the scalability of Node.js is that it uses a small number of threads to handle many clients. If Node.js can make do with fewer threads, then it can spend more of your system's time and memory working on clients rather than on paying space and time overheads for threads (memory, context-switching). But because Node.js has only a few threads, you must structure your application to use them wisely.
A more detailed look at the event loop
No. Timers are just scheduled on the same thread and will call their callbacks when the time expires.
Depending on what OS you are on and what JavaScript interpreter you use, they will use various APIs, from poll to epoll to kqueue to overlapped I/O on Windows, but in general asynchronous APIs have similar features. So let's ignore platform differences and look at a cross-platform API that exists on all OSes: the POSIX select() system call.
The select function in C looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
Here nfds is the highest-numbered file descriptor in any of the three sets, plus one (file descriptors include network sockets); readfds is the set of file descriptors you are waiting to read from; writefds is the set of file descriptors you are waiting to write to; exceptfds is the set of file descriptors watched for exceptional conditions; and timeout is the maximum time the call will wait.
This system call blocks - yes, in non-blocking, asynchronous code there is a blocking system call. The main difference between non-blocking code and blocking threaded code is that the entire program blocks in only one place, the select() function (or whatever equivalent you use).
This function only returns if any of the file descriptors have activity on them or if the timeout expires.
By managing the timeout and calculating the next value of timeout on each pass, you can implement a function like setTimeout.
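To show the idea without writing C, here is a rough sketch in Node.js. It uses Atomics.wait as a stand-in for select() with no file descriptors: a single blocking call whose only way out is the timeout. mySetTimeout, runLoop and blockFor are invented names, and a real event loop would of course also wake up for I/O.

```javascript
// Stand-in for select(): blocks this thread until the timeout expires.
const sleepCell = new Int32Array(new SharedArrayBuffer(4));
function blockFor(ms) {
  Atomics.wait(sleepCell, 0, 0, ms);
}

const pending = []; // timers waiting to fire

// Hypothetical setTimeout replacement built on the loop below.
function mySetTimeout(callback, delayMs) {
  pending.push({ due: Date.now() + delayMs, callback });
}

function runLoop() {
  while (pending.length > 0) {
    // Work out how long we may block: until the earliest timer is due.
    pending.sort((a, b) => a.due - b.due);
    const wait = Math.max(0, pending[0].due - Date.now());

    blockFor(wait); // the single place where the whole program blocks

    // Fire every timer whose time has come.
    const now = Date.now();
    while (pending.length > 0 && pending[0].due <= now) {
      pending.shift().callback();
    }
  }
}

const log = [];
mySetTimeout(() => log.push('second'), 40);
mySetTimeout(() => log.push('first'), 10);
runLoop();
console.log(log); // [ 'first', 'second' ]
```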
I've written much deeper explanations of how this works in answers to the following related questions:
I know that callback function runs asynchronously, but why?
Event Queuing in NodeJS
how node.js server is better than thread based server
Node js architecture and performance
Performance of NodeJS with large amount of callbacks
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
I recommend you at least browse each of the answers I wrote above because they are almost all non-duplicates. They sometimes overlap but explain different aspects of asynchronous code execution.
The gist of it is that javascript does not execute code in parallel to implement timers. It doesn't need to. Instead it waits in parallel. Once you understand the difference between running code in parallel and waiting (doing nothing) in parallel you will understand how things like node.js achieve high performance and how events work better.

Multithreading javascript

I want to create a real thread which manages some operations in JavaScript.
After some searching, I found 'Web Workers', 'setTimeout' and 'setInterval'.
The problem is that 'Web Workers' don't have access to global variables and therefore can't modify my global arrays directly (or I do not know how).
'setTimeout' is not really what I need.
'setInterval' solves my problem; however, over time my operations will probably take longer, so I am afraid that two intervals could overlap.
Finally, I need an infinite loop which executes a series of operations one after another. Does it exist, or do I have to content myself with 'setInterval'? Is there an alternative with jQuery or something else? If not, can I expect developers to make one available in the near future?
I'm going to assume you're talking about in a web browser.
JavaScript in web browsers has a single main UI thread, and then zero or more web worker threads. Web workers are indeed isolated from the main UI thread (and each other) and so don't have access to globals (other than their own). This is intentional, it makes both implementing the environment and using it dramatically simpler and less error-prone. (Even if that isolation weren't enforced, it's good practice for multi-threaded programming anyway.) You send messages to, and receive messages from, web workers via postMessage and the message event.
JavaScript threads (the main UI thread and any web workers) work via a thread-specific task queue (aka "job queue"): Anything that needs to happen on a JavaScript thread (the initial run of the code when a page loads, handling of an event, timer callbacks [more below]) adds a task to the queue. The JavaScript engine runs a loop: Pick up the next task, run it, pick up the next, run it, etc. When there are no tasks, the thread goes quiet waiting for a task to arrive.
setTimeout doesn't create a separate thread, it just schedules a task (a call to a callback) to be added to the task queue for that same thread after a delay (the timeout). Once the timeout occurs, the task is queued, and when the task reaches the front of the queue the thread will handle it.
setInterval does exactly what setTimeout does, but schedules a recurring callback: Once the timeout occurs, it queues the task, then sets up another timeout to queue the task again later. (The rules around the timing are a bit complex.)
If you just want something to recur, forever, at intervals, and you want that thing to have access to global variables in the main UI thread, then you either:
Use setInterval once, which will set up recurring calls back to your code, or
Use setTimeout, and every time you get your callback, use setTimeout again to schedule the next one.
From your description, it sounds as though you may be calling setInterval more than once (for instance, on each callback), which quickly bogs down the thread as you're constantly telling it to do more and more work.
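For the second option, a minimal sketch might look like this. repeatWithGap is a made-up helper name; the point is only that the next run is scheduled after the current one has finished, so two runs cannot overlap and the callback can still touch variables on the same thread.

```javascript
// Hypothetical helper: runs `task`, then waits `delayMs` before the next run.
function repeatWithGap(task, delayMs) {
  let stopped = false;
  let handle = null;

  function tick() {
    if (stopped) return;
    task();
    // Only schedule the next run once this one has completed.
    if (!stopped) handle = setTimeout(tick, delayMs);
  }

  handle = setTimeout(tick, delayMs);

  // Return a function the caller can use to end the loop.
  return function stop() {
    stopped = true;
    clearTimeout(handle);
  };
}

const sharedList = []; // an ordinary variable the callback modifies directly
let runs = 0;

const finished = new Promise((resolve) => {
  const stop = repeatWithGap(() => {
    runs++;
    sharedList.push(runs);
    if (runs === 3) {
      stop();
      resolve(sharedList);
    }
  }, 10);
});

finished.then((list) => console.log(list)); // [ 1, 2, 3 ]
```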
The last thing is easy: web workers start their work when they get a message (onmessage) and sit idle otherwise. (That's highly simplified, of course.)
Global variables are not good for real multi-threading, and even worse with the reduced facilities JavaScript offers. You have to rewrite your workers to work standalone with only the information given.
Subworkers have a messaging system which you might be able to make good use of.
But the main problem with JavaScript is: once asynchronous, always asynchronous. There is no way to "join" threads, no "wait4" or anything similar. The only thing that can do both is XMLHttpRequest, so you could do it via a web server, but I doubt the lag that causes would do any good. BTW: synchronous XMLHttpRequest is deprecated according to Mozilla, which also has a page listing the cases where a synchronous request is necessary or at least very useful.

JavaScript and single-threadedness

I always hear that JavaScript is single-threaded; that when JavaScript is executed, it's all run in the same global mosh pit, all in a single thread.
While that may be true, that single execution thread may spawn new threads, asynchronously receiving data back in the main thread, correct? For example, when an XMLHttpRequest is sent, doesn't the browser create a new thread that performs the HTTP transaction, then invoke callbacks back in the main thread when the XMLHttpRequest returns?
What about timers--setTimeout and setInterval? How do those work?
Is this single-threadedness the result of the language? What has stopped JavaScript from having multi-threaded execution before the new Web Workers draft?
XMLHttpRequest, notably, does not block the current thread. However, its specifics within the runtime are not outlined in any specification. It may run in a separate thread or within the current thread, making use of non-blocking I/O.
setTimeout and setInterval set timers that, when they run down to zero, add an item for execution (either a line of code or a function/callback) to the execution queue, starting the JavaScript engine if code execution has stopped. In other words, they tell the JavaScript engine to do something after it has finished doing whatever it's doing currently. To see this in action, make several setTimeout calls within one function and call it.
Your JavaScript itself is single-threaded. It may, however, interact with other threads in the browser (which is frequently written with something like C and C++). This is how asynchronous XHR's work. The browser may create a new thread (or it may re-use an existing one with an event loop.)
Timers and intervals will try to make your JavaScript run later, but if you have a while(1){ ; } running don't expect a timer or interval to interrupt it.
(edit: left something out.)
The single-threadedness is largely a result of the ECMA specification. There are really no language constructs for dealing with multiple threads. It wouldn't be impossible to write a JavaScript interpreter with multiple threads and the tools to interact with them, but no one really does that. Certainly no one will do it in a web browser; it would mess everything up. (If you're doing something server-side like Node.js, you'll see that they have eschewed multithreading in the JavaScript proper in favor of a snazzy event loop, and optional multi-processing.)
See this post for a description of how the javascript event queue works, including how it's related to ajax calls.
The browser certainly uses at least one native OS thread/process to handle the actual interface to the OS to retrieve system events (mouse, keyboard, timers, network events, etc...). Whether there is more than one native OS-level thread is dependent upon the browser implementation and isn't really relevant to Javascript behavior. All events from the outside world go through the javascript event queue and no event is processed until a previous javascript thread of execution is completed and the next event is then pulled from the queue given to the javascript engine.
The browser may have other threads to do the job, but your JavaScript code will still be executed in one thread. Here is how it would work in practice.
In the case of a timeout, the browser will create a separate thread to wait for the timeout to expire, or use some other mechanism to implement the actual timing logic. When the timeout expires, a message is placed on the main event queue that tells the runtime to execute your handler, and that will happen as soon as the message is picked up by the main thread.
AJAX request would work similarly. Some browser internal thread may actually connect to server and wait for the response and once response is available place appropriate message on main event queue so main thread executes the handler.
In all cases your code will get executed by main thread. This is not different from most other UI system except that browser hides that logic from you. On other platforms you may need to deal with separate threads and ensure execution of handlers on UI thread.
Putting it more simply than talking in terms of threads, in general (for the browsers I'm aware of) there will only be one block of JavaScript executing at any given time.
When you make an asynchronous Ajax request or call setTimeout or setInterval the browser may manage them in another thread, but the actual JS code in the callbacks will not execute until some point after the currently executing block of code finishes. It just gets queued up.
A simple test to demonstrate this is if you put a piece of relatively long running code after a setTimeout but in the same block:
// Pass a function rather than a string of code to setTimeout.
setTimeout(function () { alert('Timeout!'); }, 5);
alert("After setTimeout; before loop");
for (var i = 0, x = 0; i < 2000000; i++) { x += i; }
alert("After loop");
If you run the above you'll see the "After setTimeout" alert, then there'll be a pause while the loop runs, then you'll see "After loop", and only after that will you see "Timeout!" - even though clearly much longer than 5ms has passed (especially if you take a while to close the first alert).
An often-quoted reason for the single-thread is that it simplifies the browser's job of rendering the page, because you don't get the situation of lots of different threads of JavaScript all trying to update the DOM at the same time.
Javascript is a language designed to be embedded. It can and has been used in programs that execute javascript concurrently on different operating threads. There isn't much demand for an embedded language to explicitly control the creation of new threads of execution, but it could certainly be done by providing a host object with the required capabilities. The WHATWG actually includes a justification for their decision not to push a standard concurrent execution capability for browsers.

In Node.js, if I am writing a long-running function, should I be using setTimeout

or something else to queue up the rest of my function, and use callbacks? Or does Node handle that automatically?
I imagine that I would need to start my code and, if there are other things that need to occur, I should be giving up my function's control to give other events control. Is this the case? Or can I be stingy and Node will cut off my function when I have used enough time?
Thanks.
If your long-running function does a lot of I/O just make sure that you do this in a non-blocking way. This is how node.js achieves concurrency even though it only has a single thread: As soon as any task needs to wait for something, another task gets the CPU.
If your long-running function needs uninterrupted CPU time (or the I/O cannot be made asynchronous), then you probably need to fork out a separate process, because otherwise everyone else will have to wait until you are done.
Or can I be stingy and Node will cut off my function when I have used enough time?
No. This is totally cooperative multi-tasking. Node cannot preempt you.
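You can see the lack of preemption with a small experiment. This is just a sketch: the 5 ms and 100 ms figures are arbitrary, and timerFired is a name made up for the example. The timer asks to run after 5 ms, but it cannot fire until the busy loop gives the thread back.

```javascript
const start = Date.now();

// Ask for a callback after 5 ms and record how long it really took.
const timerFired = new Promise((resolve) => {
  setTimeout(() => resolve(Date.now() - start), 5);
});

// Hog the thread for roughly 100 ms; nothing can interrupt this loop.
while (Date.now() - start < 100) {
  // busy-waiting on purpose
}

timerFired.then((delay) => {
  // The measured delay is at least as long as the busy loop, not 5 ms.
  console.log('asked for 5 ms, actually waited ' + delay + ' ms');
});
```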
You should put your long-running function, or the code which takes long to execute, into a separate process, because otherwise it can, for example, block other incoming requests while it is executing. From the node.js website:
But what about multiple-processor concurrency? Aren't threads
necessary to scale programs to multi-core computers? You can start new
processes via child_process.fork() these other processes will be
scheduled in parallel.
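Here is a rough sketch of that approach with child_process.fork(). To stay self-contained it writes the child script to a temp directory first; the file name child.cjs, the helper sumInChildProcess and the summing loop are all made up for illustration, and in a real application the child would be a normal file in your project.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a hypothetical child script to a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
const childScript = path.join(dir, 'child.cjs');
fs.writeFileSync(childScript, `
  // Runs in a separate process: do the CPU-heavy part here.
  process.on('message', (limit) => {
    let sum = 0;
    for (let i = 1; i <= limit; i++) sum += i;
    // Exit once the result has been handed to the parent.
    process.send(sum, () => process.exit(0));
  });
`);

// Hypothetical helper: sends the job to a forked process and awaits the reply.
function sumInChildProcess(limit) {
  return new Promise((resolve, reject) => {
    const child = fork(childScript);
    child.once('message', resolve);
    child.once('error', reject);
    child.send(limit);
  });
}

const total = sumInChildProcess(1000000);
total.then((sum) => console.log(sum)); // 500000500000
```

Starting a process is not free, so this only pays off when the work is heavy enough to outweigh the startup and messaging cost.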
I would suggest reading or watching these articles/presentations in order to get a bigger picture on this topic:
Understanding the node.js event loop
Understanding event loops and writing great code for Node.js
YUI Theater — Tom Hughes-Croucher: “How to Stop Writing Spaghetti Code” (45 min.)
