Multithreading javascript - javascript

I want to create a real thread that manages some operations in JavaScript.
After some searching, I found 'Web Workers', 'setTimeout' and 'setInterval'.
The problem is that 'Web Workers' don't have access to global variables and therefore can't modify my global arrays directly (or I do not know how).
'setTimeout' is not really what I need.
'setInterval' solves my problem, but my operations will probably take longer as time goes on, so I am afraid that two intervals will overlap.
Ultimately I need an infinite loop that executes a series of operations one after another. Does such a thing exist, or do I have to content myself with 'setInterval'? Is there an alternative with jQuery or another library? If not, can I expect browser developers to make one available in the near future?

I'm going to assume you're talking about in a web browser.
JavaScript in web browsers has a single main UI thread, and then zero or more web worker threads. Web workers are indeed isolated from the main UI thread (and each other) and so don't have access to globals (other than their own). This is intentional: it makes both implementing the environment and using it dramatically simpler and less error-prone. (Even if that isolation weren't enforced, it's good practice for multi-threaded programming anyway.) You send messages to, and receive messages from, web workers via postMessage and the message event.
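Here is a rough sketch of that message passing (browser only; workerMain and sharedNumbers are names made up for this example). The main thread keeps ownership of the array, sends the worker a copy, and applies the result when the reply arrives. Building the worker from a Blob URL is just a trick to keep the example in one file; normally the worker code would live in its own script.

```javascript
// Code that runs inside the worker. It only sees the data sent to it.
function workerMain() {
  self.onmessage = (event) => {
    const doubled = event.data.map((n) => n * 2);
    self.postMessage(doubled); // reply to the main thread
  };
}

// A "global array" owned by the main thread (hypothetical example data).
const sharedNumbers = [1, 2, 3];

// Guarded so the snippet does nothing outside a browser-like environment.
if (typeof Worker !== 'undefined') {
  const source = `(${workerMain.toString()})()`;
  const url = URL.createObjectURL(new Blob([source], { type: 'text/javascript' }));
  const worker = new Worker(url);

  worker.onmessage = (event) => {
    // Only the main thread ever modifies the array.
    sharedNumbers.splice(0, sharedNumbers.length, ...event.data);
    console.log(sharedNumbers);
    worker.terminate();
  };

  worker.postMessage(sharedNumbers); // the worker receives a copy, not the array itself
}
```

So the worker never touches your globals; it hands results back and the main thread decides what to do with them.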
JavaScript threads (the main UI thread and any web workers) work via a thread-specific task queue (aka "job queue"): Anything that needs to happen on a JavaScript thread (the initial run of the code when a page loads, handling of an event, timer callbacks [more below]) adds a task to the queue. The JavaScript engine runs a loop: Pick up the next task, run it, pick up the next, run it, etc. When there are no tasks, the thread goes quiet waiting for a task to arrive.
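To make the "pick up the next task, run it" loop concrete, here is a toy model of one thread's task queue. It is only an illustration under my own naming (ToyEventLoop is not a real API, and real engines are far more involved), but it shows why a task queued while another task runs has to wait its turn.

```javascript
// Toy model of a single thread's task queue (illustration only).
class ToyEventLoop {
  constructor() {
    this.queue = [];
    this.log = [];
  }

  // Anything that needs to happen on this thread adds a task.
  enqueue(name, fn) {
    this.queue.push({ name, fn });
  }

  run() {
    while (this.queue.length > 0) {
      const task = this.queue.shift(); // pick up the next task
      this.log.push(task.name);
      task.fn(this);                   // run it to completion
    }
    // A real loop would now go quiet until another task arrives.
  }
}

const loop = new ToyEventLoop();
// The initial script queues a timer callback while it is running.
loop.enqueue('initial script', (l) => l.enqueue('timer callback', () => {}));
loop.enqueue('click handler', () => {});
loop.run();
console.log(loop.log); // [ 'initial script', 'click handler', 'timer callback' ]
```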
setTimeout doesn't create a separate thread, it just schedules a task (a call to a callback) to be added to the task queue for that same thread after a delay (the timeout). Once the timeout occurs, the task is queued, and when the task reaches the front of the queue the thread will handle it.
setInterval does exactly what setTimeout does, but schedules a recurring callback: Once the timeout occurs, it queues the task, then sets up another timeout to queue the task again later. (The rules around the timing are a bit complex.)
If you just want something to recur, forever, at intervals, and you want that thing to have access to global variables in the main UI thread, then you either:
Use setInterval once, which will set up recurring calls back to your code, or
Use setTimeout, and every time you get your callback, use setTimeout again to schedule the next one.
From your description, it sounds as though you may be calling setInterval more than once (for instance, on each callback), which quickly bogs down the thread as you're constantly telling it to do more and more work.
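A minimal sketch of the second option, assuming a helper I'm calling runForever (not a built-in): the next run is only scheduled after the current one has finished, so two runs can never overlap no matter how long a run takes.

```javascript
// Runs `step` repeatedly. The next run is scheduled only after the
// current one returns, so runs cannot overlap.
function runForever(step, delayMs) {
  let timer = null;
  let stopped = false;

  function tick() {
    if (stopped) return;
    step(); // may take any amount of time
    if (!stopped) timer = setTimeout(tick, delayMs);
  }

  timer = setTimeout(tick, delayMs);

  // Call the returned function to end the loop.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Usage: the callback runs on the main thread, so it can use globals.
const samples = [];
const stop = runForever(() => samples.push(Date.now()), 100);
setTimeout(stop, 350); // stop after a few runs so this demo ends
```

The trade-off compared to setInterval is that the gap between runs is the delay plus however long the run took, rather than a fixed period.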

The last thing is easy: web workers start their work when they get a message (via onmessage) and sit idle otherwise. (That's highly simplified, of course.)
Global variables are not good for real multi-threading and even worse with the reduced thing JavaScript offers. You have to rewrite your workers to work standalone with only the information given.
Subworkers have a messaging system which you might be able to make good use of.
But the main problem with JavaScript is: once asynchronous, always asynchronous. There is no way to "join" threads or a "wait4" or something similar. The only thing that can do both is XMLHttpRequest, so you could do it over a web server, but I doubt the lag that causes would do any good. BTW: synchronous XMLHttpRequest is deprecated according to Mozilla, which also has a page listing the cases where a synchronous request is necessary or at least very useful.

Related

Why does Javascript have much fewer blocking functions than Python

Moving from Javascript to Python, and looking at asyncio has me a little confused.
As someone who is new to the fundamental concepts of concurrency, I just assumed a superficial understanding of Javascript concurrency.
A basic understanding from using async / await in Javascript:
If we run any processes inside an async function, and await the response of the function, we are essentially waiting for the function to set a value on the Promise.
Makes total sense - when the Promise is given a value, we can also use callbacks such as .then() to handle the response. Alternatively, just await.
Whatever the underlying implementation of asynchronicity here is (for example all processes running on a single thread with an event loop), should it matter how we interface with it?
Now, I move to Python and start playing with asyncio. We have Futures, just like Promises. All of a sudden, I can't use my standard libraries, such as requests.get(...), but I need to use non-blocking network requests in libraries such as aiohttp.
What does blocking / non-blocking mean here? I assume it means the single thread that the event loop is on is blocked, so we can't process other functions in parallel.
So my 2 questions then are:
What causes the single thread to be blocked? For example in requests.get(...)
Why are most functions non-blocking in Javascript, but not in Python (i.e we don't need specific libraries such as aiohttp).
And what about languages like Go with their goroutines? Is it just that, because it's a new language with concurrency built in from the beginning, the concept of blocking functions doesn't exist? Or is it that Go isn't limited to a single thread, so everything can inherently be parallelised?
Thanks :)
Event loop
JavaScript and Python's asyncio make use of a concurrency model based on event loops.
(Note the plural, because you could have multiple event loops which handle different kinds of tasks, e.g. disk io, network io, ipc, parallel computations etc.)
The general concept of an event loop is that you have a number of things to do, so you put those things in a queue, and every so often (say, every nanosecond) the event loop picks an event from the queue and runs it for a short while (maybe a millisecond or so), and then either shoves it back in the queue if it hasn't completed or waits until it yields control back to the event loop.
Now to answer some of your questions:
"What does blocking / non-blocking mean here? I assume it means the single thread that the event loop is on is blocked, so we cant process other functions in parallel."
Blocking event loop
Blocking the event loop occurs when the event loop is running a task, and the task has either not finished or given back control to the event-loop, for a period of time longer than the event loop has scheduled it to run.
In the case of python's requests library, they make use of a synchronous http library, which doesn't respect the event loop; Therefore, running such a task in the loop will starve other tasks which are patiently waiting their turn to run, until the request is finished.
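The same starvation is easy to reproduce in JavaScript, which makes for a compact demo. This is only a sketch: blockFor is a made-up busy-wait standing in for any synchronous call (such as a blocking HTTP request) that never gives control back to the loop.

```javascript
// Stand-in for a synchronous call that never yields to the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on this thread can run
  }
}

const loopStart = Date.now();
let observedDelay = null;

// Ask for a callback after 10 ms.
setTimeout(() => {
  observedDelay = Date.now() - loopStart;
  console.log(`timer asked for 10 ms, ran after ${observedDelay} ms`);
}, 10);

// Hog the thread for 100 ms. The timer callback is ready after 10 ms,
// but it cannot run until this synchronous code has finished.
blockFor(100);
```

The reported delay ends up being at least the 100 ms spent blocking, not the 10 ms that was requested.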
"Why are most functions non-blocking in Javascript, but not in Python (i.e we don't need specific libraries such as aiohttp)."
JS
Everything in Javascript can block the event loop. The only way not to block the event loop is to make heavy use of callbacks via setTimeout. However, if care is not taken, even those callbacks can block the event loop if they run too long without yielding control back to the event loop via another setTimeout call.
(If you've never had to use setTimeout, but have used promises and async network requests in JS, then you are probably making use of a library or API that does the asynchronous work for you. Popular networking libraries used in browsers, such as jQuery's ajax and axios, are built on the XMLHttpRequest API, which provides async network IO; fetch is a separate built-in browser API that is asynchronous as well.)
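As a sketch of what "yielding control back via another setTimeout call" can look like, here is a hypothetical helper (processInChunks is my own name) that works through an array a few items at a time and lets the event loop run between chunks.

```javascript
// Processes `items` in chunks, yielding to the event loop between chunks.
function processInChunks(items, chunkSize, handleItem, onDone) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      setTimeout(runChunk, 0); // yield, then continue with the next chunk
    } else if (onDone) {
      onDone();
    }
  }

  runChunk(); // the first chunk runs right away
}

// Usage: square five numbers, two per chunk.
const squares = [];
processInChunks([1, 2, 3, 4, 5], 2, (n) => squares.push(n * n), () => {
  console.log(squares); // [ 1, 4, 9, 16, 25 ]
});
```

Smaller chunks keep the thread more responsive at the cost of more scheduling overhead, so the chunk size is something to tune.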
Python
In Python, the story is slightly different: before asyncio, there was no such thing as an "event loop". Everything must run to completion before the Python interpreter moves on to the next thing. This is part of what makes Python very easy to learn (and dare I say, to create...). The reason for this comes in the form of the Python GIL, which in simple terms enforces a single order of execution for any Python program. I encourage you to click that link, and read why the GIL exists.
And what about languages like Go with their goroutines?
Note: I am not a go programmer, but I read something
How is Go different?
The only difference between the way go handles goroutines and how python asyncio/js do their event loops, is that go makes more use of os threads to ensure that threads are scheduled fairly and make full use of the machine they run in.
While js callbacks/asyncio tasks will often run in the same thread as the event loop, goroutines are able to run in separate OS threads and over multiple cores, thus giving them higher availability and higher parallelism. (In that case, we could almost consider goroutines to be closer to OS threads in terms of how much time they actually get to run, as compared to green threads which are bound by the amount of time the event loop's thread runs.)

Do Timers run on their Own threads in Node.js?

I am a bit confused here. I know JavaScript is a single-threaded language, but while reading about the event loop I learned that, in the case of setTimeout or setInterval, JavaScript calls a web API provided by the browser, which spawns a new thread to run the timer. But what happens in the Node.js environment? How do timers execute/work there?
No threads are used for timers in node.js.
Timers in node.js work in conjunction with the event loop and don't use a thread. Timers in node.js are stored in a sorted linked list with the next timer to fire at the start of the linked list. Each time through the event loop, it checks to see if the first timer in the linked list has reached its time. If so, it fires that timer. If not, it runs any other events that are waiting in the event loop.
On each subsequent cycle through the event loop, it keeps checking to see if its time for the next timer or not. When a new timer is added, it is inserted into the linked list in its proper sorted order. When it fires or is cancelled, it is removed from the linked list.
If the event loop has nothing to do, it may sleep for a brief period of time, but it won't sleep past the time at which the next timer is due.
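A toy version of that bookkeeping, to make the description concrete. This is not node's actual code: ToyTimerList is a made-up class, it uses a plain array instead of a linked list, and the clock is passed in by hand so the behavior is easy to follow.

```javascript
// Toy timer list, kept sorted so the next timer to fire is always first.
class ToyTimerList {
  constructor() {
    this.timers = [];
  }

  // Insert in sorted position by fire time.
  add(fireAt, callback) {
    const timer = { fireAt, callback };
    let i = 0;
    while (i < this.timers.length && this.timers[i].fireAt <= fireAt) i++;
    this.timers.splice(i, 0, timer);
    return timer;
  }

  // A cancelled timer is simply removed from the list.
  cancel(timer) {
    const i = this.timers.indexOf(timer);
    if (i !== -1) this.timers.splice(i, 1);
  }

  // One pass of the event loop: fire every timer whose time has come.
  runDue(now) {
    while (this.timers.length > 0 && this.timers[0].fireAt <= now) {
      this.timers.shift().callback();
    }
  }
}

const list = new ToyTimerList();
const fired = [];
list.add(300, () => fired.push('c'));
list.add(100, () => fired.push('a'));
const b = list.add(200, () => fired.push('b'));
list.cancel(b);

list.runDue(50);  // nothing is due yet
list.runDue(150); // fires 'a'
list.runDue(400); // fires 'c'
console.log(fired); // [ 'a', 'c' ]
```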
Other references on the topic:
How does nodejs manage timers internally
Libuv timer code in nodejs
How many concurrent setTimeouts before performance issues?
Multiple Client Requests in NodeJs
Looking for a solution between setting lots of timers or using a scheduled task queue
Node runs on a single thread but asynchronous work happens elsewhere. For example, libuv provides a pool of 4 threads that it may use, but won't if there's a better option.
The node documentation says
Node.js runs JavaScript code in the Event Loop (initialization and callbacks), and offers a Worker Pool to handle expensive tasks like file I/O. Node.js scales well, sometimes better than more heavyweight approaches like Apache. The secret to the scalability of Node.js is that it uses a small number of threads to handle many clients. If Node.js can make do with fewer threads, then it can spend more of your system's time and memory working on clients rather than on paying space and time overheads for threads (memory, context-switching). But because Node.js has only a few threads, you must structure your application to use them wisely.
A more detailed look at the event loop
No. Timers are just scheduled on the same thread and will call their callbacks when the time expires.
Depending on what OS you are on and what JavaScript interpreter you use, they will use various APIs, from poll to epoll to kqueue to overlapped I/O on Windows, but in general asynchronous APIs have similar features. So let's ignore platform differences and look at a cross-platform API that exists on all OSes: the POSIX select() system call.
The select function in C looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, struct timeval *timeout);
Where nfds is the highest-numbered file descriptor in any of the three sets plus 1, readfds is the set of file descriptors (including network sockets) you are waiting to read from, writefds is the set of file descriptors you are waiting to write to, exceptfds is the set of file descriptors to watch for exceptional conditions (such as out-of-band data on a socket) and timeout is the maximum time the function may block.
This system call blocks - yes, in non-blocking, asynchronous code there is a piece of blocking system call. The main difference between non-blocking code and blocking threaded code is that the entire program blocks in only one place, the select() function (or whatever equivalent you use).
This function only returns if any of the file descriptors have activity on them or if the timeout expires.
By managing the timeout and calculating the next value of the timeout argument on each pass, you can implement a function like setTimeout.
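To sketch the idea without any real system calls: the loop below keeps a sorted list of pending timers, computes how long it may "block" until the earliest one, and then fires whatever is due. The clock is simulated and the blocking step is only a stand-in for select(); simulateTimerLoop and its fields are names I made up for this illustration.

```javascript
// Simulated event loop: the only "blocking" step is where select() would be.
function simulateTimerLoop(timers, startTime) {
  let now = startTime;
  const fired = [];
  const waits = [];
  const pending = [...timers].sort((a, b) => a.at - b.at);

  while (pending.length > 0) {
    // The next timeout value: time remaining until the earliest timer.
    const timeout = Math.max(0, pending[0].at - now);
    waits.push(timeout);

    // Stand-in for blocking in select(..., timeout). With no file
    // descriptor activity, the call returns when the timeout expires.
    now += timeout;

    // Fire every timer that is now due.
    while (pending.length > 0 && pending[0].at <= now) {
      fired.push({ name: pending.shift().name, time: now });
    }
  }
  return { fired, waits };
}

const result = simulateTimerLoop(
  [{ name: 'late', at: 250 }, { name: 'early', at: 100 }, { name: 'same', at: 100 }],
  0
);
console.log(result.waits);                    // [ 100, 150 ]
console.log(result.fired.map((f) => f.name)); // [ 'early', 'same', 'late' ]
```

In a real loop the wait could also end early because a socket became readable, in which case the loop handles that event and then recomputes the timeout.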
I've written much deeper explanations of how this works in answers to the following related questions:
I know that callback function runs asynchronously, but why?
Event Queuing in NodeJS
how node.js server is better than thread based server
Node js architecture and performance
Performance of NodeJS with large amount of callbacks
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
I recommend you at least browse each of the answers I wrote above because they are almost all non-duplicates. They sometimes overlap but explain different aspects of asynchronous code execution.
The gist of it is that javascript does not execute code in parallel to implement timers. It doesn't need to. Instead it waits in parallel. Once you understand the difference between running code in parallel and waiting (doing nothing) in parallel you will understand how things like node.js achieve high performance and how events work better.

Is Javascript event loop task queue overflow possible?

Is it possible to define a boundary that shouldn't be crossed for the application to scale well regarding task scheduling (over)use?
Questions :
Is there a certain cost of doing setTimeout? Let's say 0.1 ms of CPU time? It is certainly an order of magnitude cheaper than spawning a thread in other environments. But is there any cost at all?
Is it better to avoid using setTimeout for micro tasks that take like 1-2 ms?
Is there something that doesn't like scheduling? For instance I noticed of some sort of IndexedDb starvation for write locks when scheduling Store retrieval and other things
Can DOM operations be scheduled safely ?
I'm asking because I started using Scala.js and an Rx implementation Monifu that is using scheduling at massive scale. Sometimes one line of code submits like 5 tasks to an event loop's queue so basically I'm asking myself, is there anything like task queue overflow that would slow the performance down? I'm asking this question especially when running test suites where hundreds of tasks might be enqueued per second.
Which leads to another question, is it possible to list cases when one should use RunNow/Trampoline scheduler and when Queue/Async scheduler in regards to Rx? I'm wondering about this every time I write stuff like obs.buffer(3).last.flatMap{..} which itself schedules multiple tasks
Some notes about scheduling in Monifu - Monifu tries to collapse asynchronous pipelines, so if the downstream observers are synchronous in nature, then Monifu will avoid sending tasks into the Scheduler. Monifu also does back-pressure, so it controls how many tasks are submitted into the Scheduler, therefore you cannot end up in a situation in which the browser's queue blows up.
For example, something like this ... Observable.range(0,1000).foldLeft(0)(_+_).map(_ + 10).filter(_ % 2 == 0) is only sending a single task in the scheduler for starting that initial loop, otherwise the whole pipeline is entirely synchronous if the observer is also synchronous and should not send any other tasks in that queue. And it sends the first task in the queue because it has no idea about how large that source will be and usually subscribing to a data-source is done in relation to some UI updates that you don't want to block.
There are 3 large exceptions:
you're using a data-source that doesn't support back-pressure (like a web-socket connection)
you're having a real asynchronous boundary in the receiver (i.e. the observer), which can happen for example when communicating with external services and that's a real Future that you don't know when it will be complete
Some solutions possible ...
if the server communication doesn't support back-pressure, the easiest thing to do is to modify the server to support it - also, normal HTTP requests are naturally back-pressured (i.e. it's as easy as Observable.interval(3.seconds).flatMap(_ => httpRequest("...")))
if that's not an option, Monifu has buffering strategies ... so you can have an unbounded queue, but you can also have a queue that triggers buffer overflow and closes the connection, or buffering that tries to do back-pressure, you can also start dropping new events when the buffer is full and I'm working on another buffering strategy for dropping older events - with the purpose of avoiding blown queues
if you're using "merge" on a source of sources that can be unlimited, then don't do that ;-)
if you're doing requests to external services, then try optimizing those - for example if you want to track the history of events by sending them to a web service, you can group data and do batched requests and so on
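To illustrate the buffering strategies in general terms, here is a small sketch of a bounded buffer with three overflow policies. This is not Monifu's API (that is Scala, and the names below are invented for the example); it only shows the difference between dropping new events, dropping old events, and failing on overflow.

```javascript
// Bounded buffer with a configurable overflow policy (illustration only).
// onOverflow is one of: 'dropNew', 'dropOld', 'fail'.
class BoundedBuffer {
  constructor(capacity, onOverflow) {
    this.capacity = capacity;
    this.onOverflow = onOverflow;
    this.items = [];
  }

  // Returns true if the item was stored, false if it was dropped.
  push(item) {
    if (this.items.length < this.capacity) {
      this.items.push(item);
      return true;
    }
    switch (this.onOverflow) {
      case 'dropNew':
        return false; // keep what we have, ignore the newcomer
      case 'dropOld':
        this.items.shift(); // discard the oldest event to make room
        this.items.push(item);
        return true;
      default:
        throw new Error('buffer overflow'); // e.g. close the connection
    }
  }

  take() {
    return this.items.shift();
  }
}

const keepOldest = new BoundedBuffer(2, 'dropNew');
const keepNewest = new BoundedBuffer(2, 'dropOld');
for (const event of [1, 2, 3]) {
  keepOldest.push(event);
  keepNewest.push(event);
}
console.log(keepOldest.items); // [ 1, 2 ]
console.log(keepNewest.items); // [ 2, 3 ]
```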
BTW - on the issue of browser-side and scheduling of tasks, one thing I'm worrying about is that Monifu does not break work efficiently enough. In other words it probably should break longer synchronous loops into smaller ones, because what's worse than suffering performance issues are latency issues visible in the UI, because some loop is blocking your UI updates. I would rather have multiple smaller tasks submitted to the Scheduler, instead of a bigger one. In the browser you basically have cooperative multi-tasking, everything is done on the same thread, including UI updates, which means it's a very bad idea to have pieces of work that block this thread for too long.
That said, I'm now in the process of optimizing and paying more attention to the Javascript runtime. On setTimeout it is being used because it's more standard than setImmediate, however I'll do some work on these aspects.
But if you have concrete samples whose performance sucks, please communicate them, as most issues can be fixed.
Cheers,

How do I delegate some javascript (rendering the page) to have higher importance than other javascript (fetching data)

Right now my mobile web app goes and talks to a few APIs when it is launched. These calls are intertwined with the javascript that renders the page.
When they fire at the same time page rendering gets all choppy, possibly because the rendering is using hardware acceleration, or maybe it's just normal on mobile (iphone) when running too much JS at the same time.
I want to architect the app in a way that if something like a user taking an action to change the rendering of the page, or anything related to the UI, is fired, it takes precedence over any other JS.
I want the experience to be quick and snappy even if it makes some actions (talking with APIs) take longer than they should.
Any idea how to achieve something like this?
Thanks!!!
A given javascript thread runs to completion without interruption as Javascript is single threaded (except for Web Workers, but I don't think that's what we're talking about here).
As such, the only way to prioritize work is to create a queue of work to be done. Do a unit of work from the queue and then decide which unit of work on the queue to do next. If all your work to be done is in the queue, then you can decide which items should be done first when it's time to grab the next unit of work off the queue.
The smaller/shorter the units of work are, the more granular your switching between tasks can be. If you use setTimeout() with a very short time between each item in the queue, that gives a chance for any UI events (like clicks or other timers) to run.
You can even fire off a bunch of ajax requests and as the responses come in, you can put the responses in the queue to be parsed/handled when there's time.
The reason for the choppy behavior is that you have a limit of how many Ajax requests can be outstanding.
You could write a javascript function / object that manages a priority queue of Ajax requests along with a callback for each. Requests are fired off in order of priority / position in the queue. New requests with high priority are executed before less important ones this way even if they may have been requested later, something like the jquery Ajax Manager plugin + priorities added.
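A bare-bones sketch of the queue-of-work idea, assuming a made-up PriorityWorkQueue class: each unit of work gets a priority, the highest priority runs first, and a setTimeout between units gives UI events a chance to get in.

```javascript
// Runs queued units of work one per timer tick, highest priority first.
class PriorityWorkQueue {
  constructor() {
    this.items = [];
    this.seq = 0;
    this.scheduled = false;
  }

  add(priority, work) {
    this.items.push({ priority, seq: this.seq++, work });
    // Higher priority first; first-in first-out among equal priorities.
    this.items.sort((a, b) => b.priority - a.priority || a.seq - b.seq);
    this.schedule();
  }

  // Runs a single unit of work. Returns false if the queue was empty.
  runNext() {
    const item = this.items.shift();
    if (item) item.work();
    return Boolean(item);
  }

  // Arrange for one unit to run soon, then reschedule if more remain.
  schedule() {
    if (this.scheduled || this.items.length === 0) return;
    this.scheduled = true;
    setTimeout(() => {
      this.scheduled = false;
      this.runNext();
      this.schedule();
    }, 0);
  }
}

// Usage: the UI update was added second but runs first.
const queue = new PriorityWorkQueue();
const ran = [];
queue.add(1, () => ran.push('parse API response'));
queue.add(10, () => ran.push('update UI'));
queue.add(1, () => ran.push('prefetch'));
```

Keep each unit small; a single long unit still blocks everything behind it, whatever its priority.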

JavaScript and single-threadedness

I always hear that JavaScript is single-threaded; that when JavaScript is executed, it's all run in the same global mosh pit, all in a single thread.
While that may be true, that single execution thread may spawn new threads, asynchronously receiving data back to the main thread, correct? For example, when an XMLHttpRequest is sent, doesn't the browser create a new thread that performs the HTTP transaction, then invoke callbacks back in the main thread when the XMLHttpRequest returns?
What about timers--setTimeout and setInterval? How do those work?
Is this single-threadedness the result of the language? What has stopped JavaScript from having multi-threaded execution before the new Web Workers draft?
XMLHttpRequest, notably, does not block the current thread. However, its specifics within the runtime are not outlined in any specification. It may run in a separate thread or within the current thread, making use of non-blocking I/O.
setTimeout and setInterval set timers that, when run down to zero, add an item for execution, either a line of code or a function/callback, to the execution stack, starting the JavaScript engine if code execution has stopped. In other words, they tell the JavaScript engine to do something after it has finished doing whatever it's doing currently. To see this in action, set multiple setTimeout(s) within one method and call it.
Your JavaScript itself is single-threaded. It may, however, interact with other threads in the browser (which is frequently written with something like C and C++). This is how asynchronous XHR's work. The browser may create a new thread (or it may re-use an existing one with an event loop.)
Timers and intervals will try to make your JavaScript run later, but if you have a while(1){ ; } running don't expect a timer or interval to interrupt it.
(edit: left something out.)
The single-threadedness is largely a result of the ECMA specification. There's really no language constructs for dealing with multiple threads. It wouldn't be impossible to write a JavaScript interpreter with multiple threads and the tools to interact with them, but no one really does that. Certainly no one will do it in a web browser; it would mess everything up. (If you're doing something server-side like Node.js, you'll see that they have eschewed multithreading in the JavaScript proper in favor of a snazzy event loop, and optional multi-processing.)
See this post for a description of how the javascript event queue works, including how it's related to ajax calls.
The browser certainly uses at least one native OS thread/process to handle the actual interface to the OS to retrieve system events (mouse, keyboard, timers, network events, etc...). Whether there is more than one native OS-level thread is dependent upon the browser implementation and isn't really relevant to Javascript behavior. All events from the outside world go through the javascript event queue and no event is processed until a previous javascript thread of execution is completed and the next event is then pulled from the queue given to the javascript engine.
Browser may have other threads to do the job but your Javascript code will still be executed in one thread. Here is how it would work in practice.
In the case of a timeout, the browser will create a separate thread to wait for the timeout to expire, or use some other mechanism to implement the actual timing logic. When the timeout expires, a message is placed on the main event queue that tells the runtime to execute your handler, and that will happen as soon as the message is picked up by the main thread.
AJAX request would work similarly. Some browser internal thread may actually connect to server and wait for the response and once response is available place appropriate message on main event queue so main thread executes the handler.
In all cases your code will get executed by main thread. This is not different from most other UI system except that browser hides that logic from you. On other platforms you may need to deal with separate threads and ensure execution of handlers on UI thread.
Putting it more simply than talking in terms of threads, in general (for the browsers I'm aware of) there will only be one block of JavaScript executing at any given time.
When you make an asynchronous Ajax request or call setTimeout or setInterval the browser may manage them in another thread, but the actual JS code in the callbacks will not execute until some point after the currently executing block of code finishes. It just gets queued up.
A simple test to demonstrate this is if you put a piece of relatively long running code after a setTimeout but in the same block:
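// Ask for a callback after 5 ms; it cannot run until the current block finishes.
setTimeout(function () { alert('Timeout!'); }, 5);
alert("After setTimeout; before loop");
// Keep the thread busy for a while.
for (var i = 0, x = 0; i < 2000000; i++) { x += i; }
alert("After loop");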
If you run the above you'll see the "After setTimeout" alert, then there'll be a pause while the loop runs, then you'll see "After loop", and only after that will you see "Timeout!" - even though clearly much longer than 5ms has passed (especially if you take a while to close the first alert).
An often-quoted reason for the single-thread is that it simplifies the browser's job of rendering the page, because you don't get the situation of lots of different threads of JavaScript all trying to update the DOM at the same time.
Javascript is a language designed to be embedded. It can and has been used in programs that execute javascript concurrently on different operating threads. There isn't much demand for an embedded language to explicitly control the creation of new threads of execution, but it could certainly be done by providing a host object with the required capabilities. The WHATWG actually includes a justification for their decision not to push a standard concurrent execution capability for browsers.
