Is Javascript event loop task queue overflow possible? - javascript

Is it possible to define a boundary that shouldn't be crossed for the application to scale well regarding task scheduling (over)use?
Questions :
Is there a measurable cost to calling setTimeout, say 0.1 ms of CPU time? It is certainly an order of magnitude cheaper than spawning a thread in other environments, but is there any cost at all?
Is it better to avoid using setTimeout for micro tasks that take around 1-2 ms?
Is there anything that doesn't cope well with scheduling? For instance, I noticed some sort of IndexedDB starvation for write locks when scheduling Store retrieval and other things.
Can DOM operations be scheduled safely ?
I'm asking because I started using Scala.js and an Rx implementation Monifu that is using scheduling at massive scale. Sometimes one line of code submits like 5 tasks to an event loop's queue so basically I'm asking myself, is there anything like task queue overflow that would slow the performance down? I'm asking this question especially when running test suites where hundreds of tasks might be enqueued per second.
Which leads to another question, is it possible to list cases when one should use RunNow/Trampoline scheduler and when Queue/Async scheduler in regards to Rx? I'm wondering about this every time I write stuff like obs.buffer(3).last.flatMap{..} which itself schedules multiple tasks

Some notes about scheduling in Monifu - Monifu tries to collapse asynchronous pipelines, so if the downstream observers are synchronous in nature, then Monifu will avoid sending tasks into the Scheduler. Monifu also does back-pressure, so it controls how many tasks are submitted into the Scheduler, therefore you cannot end up in a situation in which the browser's queue blows up.
For example, something like this ... Observable.range(0,1000).foldLeft(0)(_+_).map(_ + 10).filter(_ % 2 == 0) is only sending a single task in the scheduler for starting that initial loop, otherwise the whole pipeline is entirely synchronous if the observer is also synchronous and should not send any other tasks in that queue. And it sends the first task in the queue because it has no idea about how large that source will be and usually subscribing to a data-source is done in relation to some UI updates that you don't want to block.
There are 3 large exceptions:
you're using a data-source that doesn't support back-pressure (like a web-socket connection)
you have a real asynchronous boundary in the receiver (i.e. the observer), which can happen for example when communicating with external services, where the result is a real Future and you don't know when it will complete
Some solutions possible ...
if the server communication doesn't support back-pressure, the easiest thing to do is to modify the server to support it. Also, normal HTTP requests are naturally back-pressured (i.e. it's as easy as Observable.interval(3.seconds).flatMap(_ => httpRequest("...")))
if that's not an option, Monifu has buffering strategies ... so you can have an unbounded queue, but you can also have a queue that triggers buffer overflow and closes the connection, or buffering that tries to do back-pressure, you can also start dropping new events when the buffer is full and I'm working on another buffering strategy for dropping older events - with the purpose of avoiding blown queues
if you're using "merge" on a source of sources that can be unlimited, then don't do that ;-)
if you're doing requests to external services, then try optimizing those - for example if you want to track the history of events by sending them to a web service, you can group data and do batched requests and so on
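The buffering strategies listed above can be illustrated with a small sketch. This is not Monifu's API: the `BoundedBuffer` class and the strategy names `dropNew`, `dropOld`, and `fail` are hypothetical names chosen only for illustration.

```javascript
// Hypothetical bounded buffer; not Monifu's actual implementation.
class BoundedBuffer {
  constructor(capacity, strategy) {
    this.capacity = capacity;
    this.strategy = strategy; // 'dropNew' | 'dropOld' | 'fail'
    this.items = [];
  }

  // Returns true if the event was stored, false if it was dropped.
  push(event) {
    if (this.items.length < this.capacity) {
      this.items.push(event);
      return true;
    }
    if (this.strategy === 'dropNew') {
      return false; // discard the incoming event
    }
    if (this.strategy === 'dropOld') {
      this.items.shift(); // discard the oldest buffered event
      this.items.push(event);
      return true;
    }
    // 'fail': signal overflow so the caller can close the connection
    throw new Error('buffer overflow');
  }

  // Removes and returns the oldest buffered event, or undefined if empty.
  poll() {
    return this.items.shift();
  }
}
```

Whichever strategy is chosen, the queue stays bounded, which is the point of the advice above: a fast producer cannot grow the queue without limit.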
BTW - on the issue of browser-side scheduling of tasks, one thing I'm worrying about is that Monifu does not break work up efficiently enough. In other words, it probably should break longer synchronous loops into smaller ones, because what's worse than suffering performance issues are latency issues visible in the UI, when some loop is blocking your UI updates. I would rather have multiple smaller tasks submitted to the Scheduler instead of one bigger task. In the browser you basically have cooperative multi-tasking: everything is done on the same thread, including UI updates, which means it's a very bad idea to have pieces of work that block this thread for too long.
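Breaking a long synchronous loop into smaller tasks can be sketched in plain JavaScript as follows. The helper `processInChunks` and its parameters are hypothetical and only illustrate the idea; this is not how Monifu schedules work internally.

```javascript
// Hypothetical helper: process `items` in slices of `chunkSize`,
// yielding to the event loop between slices so other tasks can run.
function processInChunks(items, handleItem, chunkSize, onDone) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      // Queue the next slice as a separate task instead of looping on.
      setTimeout(runChunk, 0);
    } else {
      onDone();
    }
  }

  runChunk(); // the first slice runs immediately
}
```

A smaller `chunkSize` gives the UI more chances to update at the price of more scheduled tasks, which is the trade-off described above.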
That said, I'm now in the process of optimizing and paying more attention to the Javascript runtime. On setTimeout it is being used because it's more standard than setImmediate, however I'll do some work on these aspects.
But if you have concrete samples whose performance sucks, please communicate them, as most issues can be fixed.
Cheers,

Related

Do Timers run on their Own threads in Node.js?

I am a bit confused here. I know JavaScript is a single-threaded language, but while reading about the event loop I learned that for setTimeout or setInterval, JavaScript calls a web API provided by the browser, which spawns a new thread to run the timer. What happens in the node.js environment? How do timers execute there?
No threads are used for timers in node.js.
Timers in node.js work in conjunction with the event loop and don't use a thread. Timers in node.js are stored in a sorted linked list with the next timer to fire at the start of the linked list. Each time through the event loop, it checks to see if the first timer in the linked list has reached its time. If so, it fires that timer. If not, it runs any other events that are waiting in the event loop.
On each subsequent cycle through the event loop, it keeps checking to see if it's time for the next timer or not. When a new timer is added, it is inserted into the linked list in its proper sorted order. When it fires or is cancelled, it is removed from the linked list.
If the event loop has nothing to do, it may sleep for a brief period of time, but it won't sleep past the time at which the next timer is due.
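The behaviour described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not node.js source code: it uses a sorted array where the answer describes a linked list, and the names `TimerList`, `runDue`, and `msUntilNext` are made up for illustration.

```javascript
// Simplified model of a sorted timer collection; not Node's actual code.
class TimerList {
  constructor() {
    this.timers = []; // kept sorted by dueTime, earliest first
  }

  // Insert in sorted position; equal due times keep insertion order.
  add(dueTime, callback) {
    const timer = { dueTime, callback };
    let i = 0;
    while (i < this.timers.length && this.timers[i].dueTime <= dueTime) i++;
    this.timers.splice(i, 0, timer);
    return timer;
  }

  // A cancelled timer is simply removed from the collection.
  cancel(timer) {
    const i = this.timers.indexOf(timer);
    if (i !== -1) this.timers.splice(i, 1);
  }

  // One pass of the loop: fire every timer whose time has come.
  runDue(now) {
    while (this.timers.length > 0 && this.timers[0].dueTime <= now) {
      this.timers.shift().callback();
    }
  }

  // How long the loop may sleep without missing the next timer.
  msUntilNext(now) {
    if (this.timers.length === 0) return Infinity;
    return Math.max(0, this.timers[0].dueTime - now);
  }
}
```

Only the first entry ever needs to be inspected on each pass, which is why keeping the timers sorted makes the per-iteration check cheap.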
Other references on the topic:
How does nodejs manage timers internally
Libuv timer code in nodejs
How many concurrent setTimeouts before performance issues?
Multiple Client Requests in NodeJs
Looking for a solution between setting lots of timers or using a scheduled task queue
Node runs JavaScript on a single thread, but asynchronous work happens elsewhere. For example, libuv provides a pool of threads (4 by default) that it may use, but it won't if there's a better option.
The node documentation says
Node.js runs JavaScript code in the Event Loop (initialization and callbacks), and offers a Worker Pool to handle expensive tasks like file I/O. Node.js scales well, sometimes better than more heavyweight approaches like Apache. The secret to the scalability of Node.js is that it uses a small number of threads to handle many clients. If Node.js can make do with fewer threads, then it can spend more of your system's time and memory working on clients rather than on paying space and time overheads for threads (memory, context-switching). But because Node.js has only a few threads, you must structure your application to use them wisely.
A more detailed look at the event loop
No. Timers are just scheduled on the same thread and will call their callbacks when the time expires.
Depending on which OS you are on and which JavaScript interpreter you use, various APIs are used, from poll to epoll to kqueue to overlapped I/O on Windows, but in general asynchronous APIs have similar features. So let's ignore platform differences and look at a cross-platform API that exists on all OSes: the POSIX select() system call.
The select function in C looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
Where nfds is the highest-numbered file descriptor in any of the three sets plus one (file descriptors include network sockets), readfds is the set of file descriptors you are waiting to read from, writefds is the set of file descriptors you are waiting to write to, exceptfds is the set of file descriptors watched for exceptional conditions (such as out-of-band data on a socket), and timeout is the maximum time the function may block.
This system call blocks - yes, in non-blocking, asynchronous code there is a piece of blocking system call. The main difference between non-blocking code and blocking threaded code is that the entire program blocks in only one place, the select() function (or whatever equivalent you use).
This function only returns if any of the file descriptors have activity on them or if the timeout expires.
By managing the timeout and calculating its next value before each call, you can implement a function like setTimeout.
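To make the idea concrete, here is a toy event loop in JavaScript that derives each wait from the earliest pending timer. It is a sketch under stated assumptions: `fakeSelect` merely advances a simulated clock instead of blocking on real file descriptors, and `createLoop` and `mySetTimeout` are hypothetical names.

```javascript
// Toy event loop with a simulated clock; no real system calls are made.
function createLoop() {
  let now = 0;       // simulated time in milliseconds
  const timers = []; // pending { due, callback } entries

  function mySetTimeout(callback, delay) {
    timers.push({ due: now + delay, callback });
  }

  // Stand-in for select(): "blocks" until the timeout elapses.
  // A real loop would also return early on file descriptor activity.
  function fakeSelect(timeoutMs) {
    now += timeoutMs;
  }

  function run() {
    while (timers.length > 0) {
      timers.sort((a, b) => a.due - b.due);
      // The single place where the whole program waits.
      fakeSelect(Math.max(0, timers[0].due - now));
      while (timers.length > 0 && timers[0].due <= now) {
        timers.shift().callback();
      }
    }
  }

  return { mySetTimeout, run, currentTime: () => now };
}
```

Every timer in this model shares the same single wait, so adding more timers adds bookkeeping but no extra threads.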
I've written much deeper explanations of how this works in answers to the following related questions:
I know that callback function runs asynchronously, but why?
Event Queuing in NodeJS
how node.js server is better than thread based server
Node js architecture and performance
Performance of NodeJS with large amount of callbacks
Does javascript process using an elastic racetrack algorithm
Is there any other way to implement a "listening" function without an infinite while loop?
I recommend you at least browse each of the answers I wrote above because they are almost all non-duplicates. They sometimes overlap but explain different aspects of asynchronous code execution.
The gist of it is that javascript does not execute code in parallel to implement timers. It doesn't need to. Instead it waits in parallel. Once you understand the difference between running code in parallel and waiting (doing nothing) in parallel you will understand how things like node.js achieve high performance and how events work better.
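A small sketch of "waiting in parallel": three timers are started together, and although their callbacks run one at a time on the same thread, the waits overlap. The helper names `sleep` and `waitInParallel` are illustrative, not part of any library.

```javascript
// Resolve with `label` after `ms` milliseconds; no JavaScript runs while waiting.
function sleep(ms, label) {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Start all three waits at once and record the order in which they finish.
function waitInParallel() {
  const finished = [];
  const record = (label) => { finished.push(label); };
  return Promise.all([
    sleep(90, 'slow').then(record),
    sleep(10, 'fast').then(record),
    sleep(50, 'medium').then(record),
  ]).then(() => finished);
}
```

Calling `waitInParallel()` should take roughly as long as the longest wait rather than the sum of all three, because nothing is executing while the timers are pending.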

Multithreading javascript

I want to create a real thread which manages some operations in javascript.
After some searching, I found 'Web Workers', 'setTimeout' and 'setInterval'.
The problem is that 'Web Workers' don't have access to global variables and therefore can't modify my global arrays directly (or I do not know how).
'setTimeout' is not really what I need.
'setInterval' solves my problem; however, it is likely that over time my operations could take longer, so I am afraid that two intervals will overlap.
Ultimately I need an infinite loop that executes a series of operations one after another. Does this exist, or do I have to content myself with 'setInterval'? Is there an alternative with jQuery or another library? If not, can I expect it to become available in the near future?
I'm going to assume you're talking about in a web browser.
JavaScript in web browsers has a single main UI thread, and then zero or more web worker threads. Web workers are indeed isolated from the main UI thread (and each other) and so don't have access to globals (other than their own). This is intentional, it makes both implementing the environment and using it dramatically simpler and less error-prone. (Even if that isolation weren't enforced, it's good practice for multi-threaded programming anyway.) You send messages to, and receive messages from, web workers via postMessage and the message event.
JavaScript threads (the main UI thread and any web workers) work via a thread-specific task queue (aka "job queue"): Anything that needs to happen on a JavaScript thread (the initial run of the code when a page loads, handling of an event, timer callbacks [more below]) adds a task to the queue. The JavaScript engine runs a loop: Pick up the next task, run it, pick up the next, run it, etc. When there are no tasks, the thread goes quiet waiting for a task to arrive.
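The loop described above can be modelled with a toy queue. This is only an illustration of the concept; real engines implement the task queue natively, and `createTaskQueue` and `runUntilIdle` are hypothetical names.

```javascript
// Toy model of a per-thread task queue.
function createTaskQueue() {
  const tasks = [];
  return {
    // Anything that needs to happen on the thread adds a task.
    enqueue(task) {
      tasks.push(task);
    },
    // Pick up the next task, run it to completion, repeat until idle.
    runUntilIdle() {
      let ran = 0;
      while (tasks.length > 0) {
        const task = tasks.shift();
        task(); // a running task may enqueue further tasks
        ran++;
      }
      return ran;
    },
  };
}
```

A task enqueued by a running task lands at the back of the queue, so it runs only after everything that was already waiting.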
setTimeout doesn't create a separate thread, it just schedules a task (a call to a callback) to be added to the task queue for that same thread after a delay (the timeout). Once the timeout occurs, the task is queued, and when the task reaches the front of the queue the thread will handle it.
setInterval does exactly what setTimeout does, but schedules a recurring callback: Once the timeout occurs, it queues the task, then sets up another timeout to queue the task again later. (The rules around the timing are a bit complex.)
If you just want something to recur, forever, at intervals, and you want that thing to have access to global variables in the main UI thread, then you either:
Use setInterval once, which will set up recurring calls back to your code, or
Use setTimeout, and every time you get your callback, use setTimeout again to schedule the next one.
From your description, it sounds as though you may be calling setInterval more than once (for instance, on each callback), which quickly bogs down the thread as you're constantly telling it to do more and more work.
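The second option (re-arming setTimeout from inside the callback) can be sketched like this. `repeatWithGap` is a hypothetical helper name; the point is that the next run is scheduled only after the current one has returned, so runs cannot overlap or pile up even when the work becomes slower.

```javascript
// Runs `work` repeatedly with `gapMs` between the end of one run
// and the start of the next. Returns a function that stops the loop.
function repeatWithGap(work, gapMs) {
  let timerId = null;
  let stopped = false;

  function tick() {
    work();
    if (!stopped) {
      // Schedule the next run only now that this one has finished.
      timerId = setTimeout(tick, gapMs);
    }
  }

  timerId = setTimeout(tick, gapMs);

  return function stop() {
    stopped = true;
    clearTimeout(timerId);
  };
}
```

Because `work` runs on the main thread, it can read and modify global variables directly, which was the original objection to web workers.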
The last thing is easy: web workers start their work when they get a message (onmessage) and sit idle otherwise. (That's highly simplified, of course.)
Global variables are not good for real multi-threading, and even worse with the reduced form of threading that JavaScript offers. You have to rewrite your workers to work standalone with only the information given to them.
Subworkers have a messaging system which you might be able to make good use of.
But the main problem with JavaScript is: once asynchronous, always asynchronous. There is no way to "join" threads, no "wait4" or anything similar. The only thing that can do both is XMLHttpRequest, so you could do it via a web server, but I doubt the lag that causes would do any good. BTW: synchronous XMLHttpRequest is deprecated, according to Mozilla, which also has a page listing the cases where a synchronous request is necessary or at least very useful.

Is node.js event loop like an ajax call?

I am confused with node.js' advantages over other tech. I've been reading this article : http://www.toptal.com/nodejs/why-the-hell-would-i-use-node-js and this How to decide when to use Node.js? to familiarize myself with it and have left me confused.
I am familiar with cpu intensive task like the computation of the Fibonacci series but that's where my understanding ends.
For example, I have a Rest API that does all the computation or recommendation and is housed on a different server from the machine running node, then node.js won't have any trouble with having to deal with cpu intensive task. Just call the api then tell the client that your request is acknowledged.
I can't stop comparing node.js with a simple ajax call that sends the request from a form to the server, displays a ticker, and then shows the result. I am guessing that node.js is a web server doing lots of "ajax"-type calls and handling concurrent connections.
Are my assumptions correct?
Is it also correct to assume that retrieving data from a database is an io operation but creating a complex report from that data a cpu intensive one?
You are right about handling many ajax requests; however, that's also true in a worker-based model (PHP/Python worker threads).
The main difference is that in an event-based system there is only one worker doing all the computation parts of the code (such as filtering data, additional processing, etc.). When it calls I/O operations such as reading from a file or a database, node doesn't control how long they take; instead of waiting for them to finish, it puts a callback in the queue and moves on to the next item in the queue (if any).
As an analogy, think of a pizza outlet where a single person takes the order, hands it over to the kitchen and, once it's ready, cuts it, packs it, and gives it to the customer. Wherever there is a wait, he just moves on to the next task. This is what node does: that person won't hang around next to the kitchen until the pizza is cooked.
For the worker-based approach, think of bank tellers: there are a few of them (maybe 5 or so), and each takes every kind of request, but they don't switch between customers or requests.
Refer to these resources for a deeper understanding of how JavaScript event loop works.
https://www.youtube.com/watch?v=8aGhZQkoFbQ
http://latentflip.com/loupe/
I can't answer all your doubts, but would like you to have some clarity over AJAX.
AJAX - Asynchronous JavaScript + XML is a technique to make requests to a server. Nodejs server knows how to handle such requests, but saying that is the only thing it can do is absolutely wrong. Nodejs is single threaded, hence async. Whether it is good for CPU intensive tasks, I would say why not, unless you want to solve issues in a multithreaded fashion.

Why is node.js asynchronous?

Nobody has actually asked this (from all the 'suggestions' I'm getting and also from searching before I asked here).
So why is node.js asynchronous?
From what I have deduced after some research:
Languages like PHP and Python are scripting languages (I could be wrong about the actual languages that are scripting languages) whilst JavaScript isn't. (I suppose this derives from the fact that JS doesn't compile?)
Node.js runs on a single thread whilst scripting languages use multiple threads.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Maybe the answer is found somewhere stated above, but I'm still not sure.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
PS. I know some of you will ask "why would you want to make JS synchronous?" in your answers, but the truth is that I don't. I'm just asking these types of questions because I'm sure there are more people out there than just myself that have thought about such questions.
Node.js runs on a single thread whilst scripting languages use multiple threads.
Not technically. Node.js uses several threads, but only one execution thread. The background threads are for dealing with IO to make all of the asynchronous goodness work. Dealing with threads efficiently is a royal pain, so the next best option is to run in an event loop so code can run while background threads are blocked on IO.
Asynchronous means stateless and that the connection is persistent whilst synchronous is the (almost) opposite.
Not necessarily. You can preserve state in an asynchronous system pretty easily. For example, in Javascript, you can use bind() to bind a this to a function, thereby preserving state explicitly when the function returns:
function State() {
  // make sure that whenever doStuff is called it maintains its state
  this.doStuff = this.doStuff.bind(this);
}
State.prototype.doStuff = function () {
};
Asynchronous means not waiting for an operation to finish, but registering a listener instead. This happens all the time in other languages, notably anything that needs to accept input from the user. For example, in a Java GUI, you don't block waiting for the user to press a button, but you register a listener with the GUI.
My second and last question related to this topic is this:
Could JavaScript be made into a synchronous language?
Technically, all languages are synchronous, even Javascript. However, Javascript works a lot better in an asynchronous design because it was designed to be single threaded.
Basically there are two types of programs:
CPU bound- the only way to make it go faster is to get more CPU time
IO bound- spends a lot of time waiting for data, so a faster processor won't matter
Video games, number crunchers and compilers are CPU bound, whereas web servers and GUIs are generally IO bound. Javascript is relatively slow (because of how complex it is), so it wouldn't be able to compete in a CPU bound scenario (trust me, I've written my fair share of CPU-bound Javascript).
Instead of coding in terms of classes and objects, Javascript lends itself to coding in terms of simple functions that can be strung together. This works very well in asynchronous design, because algorithms can be written to process data incrementally as it comes in. IO (especially network IO) is very slow, so there's quite a bit of time between packets of data.
Example
Let's suppose you have 1000 live connections, each delivering a packet every millisecond, and processing each packet takes 1 microsecond (very reasonable). Let's also assume each connection sends 5 packets.
In a single-threaded, synchronous application, each connection will be handled in series. The total time taken is (5*1 + 5*.001) * 1000 milliseconds, or ~5005 milliseconds.
In a single-threaded, asynchronous application, the waiting for all connections overlaps. Since a packet arrives on each connection every 1 millisecond, and processing each packet takes .001 milliseconds, we can process every connection's packet between packets, so our formula becomes: 1000*.001 + 5*1 milliseconds, or ~6 milliseconds.
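The two estimates can be reproduced with a few lines of arithmetic. The constant and function names below are made up for illustration; the numbers and formulas are the ones from the example.

```javascript
// Numbers from the example, all in milliseconds.
const connections = 1000;
const packetsPerConnection = 5;
const packetIntervalMs = 1;  // one packet arrives per millisecond
const processingMs = 0.001;  // 1 microsecond of CPU time per packet

// Synchronous: each connection is waited on and processed one after another.
function synchronousTotalMs() {
  const perConnection = packetsPerConnection * (packetIntervalMs + processingMs);
  return perConnection * connections;
}

// Asynchronous: the waits overlap; only the processing is serial.
// This follows the formula in the text: 1000*.001 + 5*1.
function asynchronousTotalMs() {
  return connections * processingMs + packetsPerConnection * packetIntervalMs;
}
```

The results are approximately 5005 ms and 6 ms respectively, so almost all of the synchronous total is time spent waiting rather than computing.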
The traditional solution to this problem was to create more threads. This solved the IO problem, but then when the number of connections rose, so did the memory usage (threads cost lots of memory) and CPU usage (multiplexing 100 threads onto 1 core is harder than 1 thread on 1 core).
However, there are downsides. If your web application happens to also need to do some heavy number crunching, you're SOL because while you're crunching numbers, connections need to wait. Threading solves this because the OS can swap out your CPU-intensive task when data is ready for a thread waiting on IO. Also, node.js is bound to a single core, so you can't take advantage of your multi-core processor unless you spin up multiple instances and proxy requests.
Javascript is not compiled ahead of time into anything. It's "evaluated" at runtime, just like PHP & Ruby. Therefore it is a scripting language just like PHP/Ruby. (Its official name is actually ECMAScript.)
The 'model' that Node adheres to is a bit different from PHP/Ruby. Node.js uses an 'event loop' (the single thread) that has the one goal of taking network requests and handling them very quickly. If for any reason it encounters an operation that takes a while (API request, database query, basically anything involving I/O (input/output)), it passes that off to a background 'worker' thread and goes off to do something else while the worker thread waits for the long task to complete. When that happens, the main 'event loop' takes the results and continues dealing with them.
PHP/Ruby follow a threading model. Essentially, for every incoming network request, the application server spins up an isolated thread or process to handle the request. This does not scale tremendously well, and Node's approach is cited as one of its core strengths compared to this model.
Asynchronous means stateless and that the connection is persistent
whilst synchronous is the (almost) opposite.
No. Synchronous instructions are completed in a natural order, from first to last. Asynchronous instructions mean that if a step in the flow of a program takes a relatively long time, the program will continue executing operations and simply return to this operation when complete.
Could JavaScript be made into a synchronous language?
Certain operations in JavaScript are synchronous. Others are asynchronous.
For example:
Blocking operations:
for (var k = 0; k < 1; k = k - 1) {
  alert('this will quickly get annoying and the loop will block execution');
}
alert('this is blocked and will never happen because the above loop is infinite');
Asynchronous:
jQuery.get('/foo', function (result) { alert('This will occur 2nd, asynchronously'); });
alert('This will occur 1st. The above operation was skipped over and execution continued until the above operation completes.');
Could JavaScript be made into a synchronous language?
Javascript is not an "asynchronous language"; rather, node.js has a lot of asynchronous APIs. Asynchronous-ness is a property of the API and not the language. The ease with which functions can be created and passed around in javascript makes it convenient to pass callback functions, which is one way to handle control flow in an asynchronous API, but there's nothing inherently asynchronous about javascript. Javascript can easily support synchronous APIs.
Why is node.js asynchronous?
Node.js favors asynchronous APIs because it is single-threaded. This allows it to efficiently manage its own resources, but requires that long-running operations be non-blocking, and asynchronous APIs are a way to allow for control of flow with lots of non-blocking operations.

How do I delegate some javascript (rendering the page) to have higher importance than other javascript (fetching data)

Right now my mobile web app goes and talks to a few APIs when it is launched. These calls are intertwined with the javascript that renders the page.
When they fire at the same time, page rendering gets all choppy, possibly because the rendering is using hardware acceleration, or maybe it's just normal on mobile (iPhone) when running too much JS at the same time.
I want to architect the app so that if something like a user action that changes the rendering of the page, or anything else related to the UI, is fired, it takes precedence over any other JS.
I want the experience to be quick and snappy even if it makes some actions (talking with APIs) take longer than they should.
Any idea how to achieve something like this?
Thanks!!!
A given javascript thread runs to completion without interruption as Javascript is single threaded (except for Web Workers, but I don't think that's what we're talking about here).
As such, the only way to prioritize work is to create a queue of work to be done. Do a unit of work from the queue and then decide which unit of work on the queue to do next. If all your work to be done is in the queue, then you can decide which items should be done first when it's time to grab the next unit of work off the queue.
The smaller/shorter the units of work are, the more granular your switching between tasks can be. If you use setTimeout() with a very short time between each item in the queue, that gives a chance for any UI events (like clicks or other timers) to run.
You can even fire off a bunch of ajax requests and as the responses come in, you can put the responses in the queue to be parsed/handled when there's time.
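The queue-based approach might look like the following sketch. The names `createWorkQueue` and `add`, and the convention that a lower number means higher priority, are assumptions made for illustration only.

```javascript
// Hypothetical work queue: lower `priority` numbers run first (0 = UI work).
// `schedule` decides how to yield between units; by default setTimeout(fn, 0).
function createWorkQueue(schedule = (fn) => setTimeout(fn, 0)) {
  const pending = [];
  let running = false;

  function runNext() {
    if (pending.length === 0) {
      running = false;
      return;
    }
    // Pick the most urgent unit; ties keep insertion order.
    let best = 0;
    for (let i = 1; i < pending.length; i++) {
      if (pending[i].priority < pending[best].priority) best = i;
    }
    const [unit] = pending.splice(best, 1);
    unit.work();
    // Yield before the next unit so UI events and timers get a chance to run.
    schedule(runNext);
  }

  return {
    add(priority, work) {
      pending.push({ priority, work });
      if (!running) {
        running = true;
        schedule(runNext);
      }
    },
  };
}
```

For example, rendering work could be added with priority 0 and the parsing of API responses with priority 2; a response that arrives early then still waits until the pending UI work has run.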
The reason for the choppy behavior is that you have a limit of how many Ajax requests can be outstanding.
You could write a javascript function / object that manages a priority queue of Ajax requests along with a callback for each. Requests are fired off in order of priority / position in the queue. New requests with high priority are executed before less important ones this way even if they may have been requested later, something like the jquery Ajax Manager plugin + priorities added.
