How to keep a reactive UI while doing heavy computation with WebAssembly? - javascript

I use a C++ library that does heavy image processing at load time, compiled with Emscripten and embedded in an Angular application.
This code freezes the UI for a few seconds, which is never nice for the user.
I guess the two options here are
to split the heavy computation into several asynchronous calls
to use threads (WebWorkers)
I'm not sure how feasible each one is, though; it depends on the computation code.
What are the advantages/drawbacks of each? What's the usual way to deal with heavy computation with JS/WASM?

I've done both.
Just asynchronous would still be on the main thread, so it doesn't have any benefit in your situation. However, if you can split the processing into smaller chunks and feed them over time to requestIdleCallback, it can be pretty effective. The downside of that is that you have no real control over when (if ever) it will finish. Depending on how critical the output is, it might not be the best for you. The upside is that you do have access to all of the APIs and not just the ones available to web workers. Here's an example implementation of "splitting into smaller chunks and feeding them to requestIdleCallback", and here's how you use it.
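In sketch form, that chunking pattern looks roughly like this (a minimal sketch; runWhenIdle is a made-up name, and it assumes the heavy job can be split into an array of small task functions):

    // Hypothetical sketch: run an array of small work chunks during idle time.
    function runWhenIdle(tasks, onDone) {
        requestIdleCallback(function step(deadline) {
            // Keep working as long as the browser reports idle time left.
            while (tasks.length > 0 && deadline.timeRemaining() > 0) {
                var chunk = tasks.shift();
                chunk(); // one small piece of the heavy computation
            }
            if (tasks.length > 0) {
                requestIdleCallback(step); // resume during the next idle period
            } else {
                onDone();
            }
        });
    }

As noted above, nothing guarantees when (or whether) the queue drains; that is the trade-off of staying on the main thread.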
Using a web worker thread has the big upside of completely freeing up the main thread, which will allow you to get the result faster. The downsides are that you only have access to the web worker APIs, you have to find a way to communicate your output back to the main thread (if necessary), and it will still eat up a lot of CPU (meaning that while the main thread is technically "free" it might still be slowed down). If you can split your task into smaller chunks, you could even spawn more than one thread and get a smoother / faster user experience.
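A rough sketch of that setup (the file name, drawResult and heavyProcessing are placeholders, not something from the original post; transferring the pixel buffer avoids copying it):

    // main.js -- spawn the worker and hand it the pixel data
    var worker = new Worker('image-worker.js');

    worker.onmessage = function (e) {
        drawResult(e.data); // placeholder: render the processed buffer on the main thread
    };

    // Listing the buffer in the transfer list moves it instead of copying it.
    worker.postMessage(imageData.data.buffer, [imageData.data.buffer]);

    // image-worker.js -- runs on its own thread, free to block as long as it wants
    self.onmessage = function (e) {
        var pixels = new Uint8Array(e.data);
        var result = heavyProcessing(pixels); // placeholder, e.g. the Emscripten-compiled routine
        self.postMessage(result.buffer, [result.buffer]);
    };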
main thread
might freeze the page (or alternatively take a long time / never finish)
gives you access to all of the APIs
easier to start with, a little more complicated once you get into idle-until-urgent designs
passing data to / from won't be a bottleneck
web worker
frees up the main thread (but might still slow down the client as a whole)
not all APIs are available
pretty straightforward once you get a handle on communicating w/ the main thread
gets complicated if you try to spawn more workers to work in parallel
if your output isn't straightforward to structure as a SharedArrayBuffer, passing large amounts of data back might be an issue / a bottleneck
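For that last point, a minimal sketch of the SharedArrayBuffer route (current browsers only expose SharedArrayBuffer on cross-origin isolated pages, and the worker file name is a placeholder):

    // main.js -- allocate memory that both threads can see
    var shared = new SharedArrayBuffer(1024 * 1024); // 1 MiB of shared memory
    var worker = new Worker('worker.js');
    worker.postMessage(shared); // the buffer is shared with the worker, not copied

    // worker.js
    self.onmessage = function (e) {
        var view = new Uint8Array(e.data);
        // Write results straight into the shared memory; the main thread
        // sees them without any further copying.
        view[0] = 42;
        self.postMessage('done');
    };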

Related

What exactly are web workers and when to use them

I was reading up something about XMLHttpRequest (Is there any reason to use a synchronous XMLHttpRequest?) here on SO where I read on a thread from 2010 that, with the introduction of 'threads' in HTML5, developers might start to use synchronous APIs. Searching a bit on google, I found the MDN page on web workers.
I have been writing JavaScript and Node for about a year now (assume a beginner), and I have yet to encounter something that makes use of these web workers. Maybe I need to read more code.
Now my question is, even though they seem to be very useful, why aren't they seen much in the wild? Also, what are the general use cases and guidelines for using them? Is it possible to reap the benefits of multithreaded processing in a Node.js environment? If so, why are all Node.js APIs still asynchronous?
Thank you.
A web-worker is strictly a clientside thing, so it has nothing to do with Node.js (EDIT: actually, see this module).
You might have heard that JavaScript is strictly single-threaded: if a function is doing some heavy calculation, nothing else is getting done, including animating icons, repainting the window, nothing. Thus, clientside JS should always avoid heavy computation, large loops and anything else that might usurp the thread for more than a fraction of a second.
Web-workers are the solution for that. Each web-worker is running in its own thread, and it can block as much as it wants - it won't affect the normal operation of the web page. The tradeoff is that it cannot have any access to the DOM: the fact that it doesn't affect the rendering means you cannot affect rendering with it. :) If a web-worker wants to render something, it would have to send a message to the main thread to do it.
Implementation-wise, each web-worker needs to be in a separate JS file. The reason why you don't see more of them is probably twofold: the average Joe probably doesn't know how to use them, and they are only needed when you need serious computation and don't want it to block your main thread - which is not that common in the first place, and when it is, the computation is commonly offloaded to the server (on clientside) or to separate processes (in Node.js).
Read more on HTML5 Rocks.

Why is node.js not suitable for heavy CPU apps?

Node.js servers are very efficient concerning I/O and large numbers of client connections. But why is Node.js not suitable for heavy CPU apps in comparison to a traditional multithreading server?
I read it here: Felix Baumgarten
Node is, despite its asynchronous event model, by nature single threaded. When you launch a Node process, you are running a single process with a single thread on a single core. So your code will not be executed in parallel; only I/O operations are parallel, because they are executed asynchronously. As such, long running CPU tasks will block the whole server and are usually a bad idea.
Even though a single Node process works like that, it is possible to have multiple Node processes running in parallel. That way you could still benefit from a multi-core architecture, although a single Node process does not. You would just need some load balancer in front that distributes requests across all your Node processes.
Another option would be to have the CPU work in separate processes and make Node interact with those instead of doing the work itself.
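A minimal sketch of that last option using Node's built-in child_process module (the file names and the compute function are placeholders):

    // server.js -- keep the event loop responsive, fork the heavy work out
    const { fork } = require('child_process');

    const child = fork('./cpu-work.js');
    child.on('message', (result) => {
        console.log('result from child process:', result);
    });
    child.send({ input: 'some data' }); // kick off the computation

    // cpu-work.js -- a separate OS process with its own event loop
    process.on('message', (msg) => {
        const result = compute(msg.input); // placeholder for the CPU-heavy work
        process.send(result);
    });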
Related things to read:
Node.js and CPU intensive requests
Understanding the node.js event loop
A simple Node.js server is single-threaded, meaning that any operation that takes a long time to execute will block the rest of your program from running. Node.js apps manage to maintain a high level of concurrency by working as a series of events. When an event handler is waiting for something to happen (such as reading from the database), it tells Node to go ahead and process another event in the meantime. But since a single thread can only execute one instruction at a time, this approach can't save you from a function that needs to keep actively executing for a long time. In a multithreaded architecture, even if one function takes a long time to compute the result, other threads can still process other requests — and as long as you have a core that is not fully used at the time, there's a good chance they can do it about as quickly as if no other requests were running at all.
In order to deal with this, production Node.js apps that expect to hog a lot of CPU will usually be run in clusters. This means that instead of having several threads in one program's memory space, you run several instances of the same program under the control of one "master" instance. Each process is single-threaded, but since you have several of them, you end up gaining the benefits of multiple threads.
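In code, that setup looks roughly like this with Node's built-in cluster module (a sketch, not a production configuration):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
        // The master only forks workers; each worker is a full copy of this program.
        for (let i = 0; i < os.cpus().length; i++) {
            cluster.fork();
        }
    } else {
        // Each worker runs its own single-threaded event loop on the shared port.
        http.createServer((req, res) => {
            res.end(`handled by worker ${process.pid}`);
        }).listen(3000);
    }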
Node is great if you have asynchronous tasks, because those are handed off to the worker pool behind the scenes. But if you run CPU-intensive tasks (where you heavily use the CPU), for example sorting a billion users by name, that is quite an intense task, it runs synchronously, and it will block other code from running.
So it's not a good idea to use Node for those kinds of applications. Technically you can find alternatives to address those kinds of tasks; the example above is better addressed in a DB, and then passing that result along works well.
In the same way, avoid intense tasks and keep your CPU cool for better performance.
You can have a look at this package, the-computer, which may help you do some CPU-intensive work in a single instance of a Node.js app in a simple way.
It is definitely not as efficient as raw C++ libs, but it can cover most general computing cases, keeping you in the Node.js garden while allowing you to leverage the cores of the CPU.
Node.js runs JavaScript code in a single thread, which means that your code can only do one task at a time. However, Node.js itself is multithreaded and provides hidden threads through the libuv library, which handles I/O operations like reading files from a disk or network requests. Through the use of hidden threads, Node.js provides asynchronous methods that allow your code to make I/O requests without blocking the main thread.
Although Node.js has hidden threads, you cannot use them to offload CPU-intensive tasks, such as complex calculations, image resizing, or video compression. Since JavaScript is single-threaded, when a CPU-intensive task runs it blocks the main thread, and no other code executes until the task completes. Without using other threads, the only way to speed up a CPU-bound task is to increase the processor speed.
💡 Node.js introduced the worker-threads module, which allows you to create threads and execute multiple JavaScript tasks in parallel. Once a thread finishes a task, it sends a message to the main thread that contains the result of the operation so that it can be used with other parts of the code. The advantage of using worker threads is that CPU-bound tasks don’t block the main thread and you can divide and distribute a task to multiple workers to optimize it.
ref: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js
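A minimal sketch of the worker_threads pattern described above, running the same file as both the main thread and the worker (the summation loop just stands in for any CPU-bound task):

    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
        // Main thread: spawn a worker and wait for its result.
        const worker = new Worker(__filename, { workerData: { n: 1e9 } });
        worker.on('message', (sum) => console.log('result:', sum));
    } else {
        // Worker thread: the blocking loop runs here without freezing the main thread.
        let sum = 0;
        for (let i = 0; i < workerData.n; i++) sum += i;
        parentPort.postMessage(sum);
    }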

Advantages of Web Workers and how they were achieved before?

I have read about Web Workers on http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html and I think I understand their purpose, but I am wondering if one of the main purposes of web workers, namely "allows long tasks to be executed without yielding to keep the page responsive." could be already achieved without web workers?
Registering callbacks, for example, also allows long tasks to be executed and only interrupts when they are ready, without blocking; isn't that the same?
Callbacks allow you to manage concurrency, that is, handling tasks, though not always in an easy way.
Not only do web workers let you handle concurrency more easily, they also give you parallelism, that is, tasks really running in parallel: they don't necessarily block each other and they don't block the UI.
Before web workers, in order to have a long-running JavaScript task in your browser you had to micro-manage it and cut it into small parts so the UI could stay responsive. And of course having more than one long-running task was even more complex.
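That micro-management typically looked something like the sketch below, where items and processItem are placeholders for the real work:

    // Classic pre-web-worker trick: do a small slice of work, then yield
    // back to the browser with setTimeout so it can repaint and handle input.
    function processInSlices(items) {
        var i = 0;
        function slice() {
            var end = Math.min(i + 100, items.length); // 100 items per slice
            for (; i < end; i++) {
                processItem(items[i]); // placeholder for the real per-item work
            }
            if (i < items.length) {
                setTimeout(slice, 0); // let the UI thread breathe, then continue
            }
        }
        slice();
    }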
We know web browsers have improved a lot over the past few years, and it is primarily because of the work done on their engines, e.g. V8 (Google) and Chakra (Microsoft). JavaScript still runs in a single thread. The problem with a single-threaded architecture is that it blocks the code and the UI becomes unresponsive when a complex script is running. There are various ways to solve this problem:
Offload work to the server, but to make apps faster a fat client is preferred
Use asynchronous calls, but a complex ecosystem of many async calls & promises can lead to callback hell
Leverage multi-threading. Interesting!
Web Workers solve this issue by providing the capability of multi-threading in JavaScript.

Should Node.js be used for intensive processing?

Let's say I'm building a 3-tier web site, with Mongo DB on the back end and some really lightweight javascript in the browser (let's say just validation on forms, maybe a couple of fancy controls which fire off some AJAX requests).
I need to choose a technology for the 'middle' tier (we could segment this into sub-tiers but that detail isn't the focus here, just the overall technology choice), where I want to crunch some raw data coming out of the DB, and render this into some HTML which I push to the browser. A fairly typical thin-client web architecture.
My safe choice would be to just implement this middle tier in Java, using some libraries like Jongo to talk to the Mongo DB and maybe Jackson to marshal/unmarshal JSON to talk to my fancy controls when they make AJAX requests. And some Java templating framework for rendering my HTML on the server.
However, I'm really intrigued by the idea of throwing all this out the window and using Node.js for this middle tier, for the following reasons:
I like javascript (the good parts), and let's say for this application's business logic it would be more expressive than Java.
It's javascript everywhere. No need to switch between languages, and indeed the OO and functional paradigms, when working anywhere on the stack. There's no translation plumbing between the tiers, JSON is supported natively everywhere.
I can reuse validation logic on the client and server.
If in the future I decide to do the HTML rendering client-side in the browser, I can reuse the existing templates with something like Backbone with a pretty minimal refactoring / retesting effort.
If you're at this point and like Node, all the above will seem obvious. So I should choose Node right?
BUT... this is where it falls down for me: as we all know Node is based around a single-threaded async I/O web server model. This is great for my scalability and performance in terms of servicing requests for data, but what about my business logic? What about my template rendering? Won't this stuff cause a huge bottleneck for all requests on the single thread?
Two obvious solutions come to mind, but neither of them sits right:
Keep the 'blocking' business logic in there and just use a cluster of Node instances and a load balancer, to service requests in true parallel. Ok great, so why isn't Node just multi-threaded in the first place? Or was this always the idea, to Keep It Simple Stupid and avoid the possibility of multi-threaded complexity in the base case, making the programmer do the extra setup work on top of this if multi-core processing power is desired?
Keep a single node instance, and keep it non-blocking by just calling out to some Java implementation of my business logic running on some other, multi-threaded, app server. Ok, this option completely nullifies every pro I listed of using Node (in fact it adds complexity over just using Java), other than the possible gains in performance and scalability for CRUD requests to the DB.
Which leads me finally to the point of my question - am I missing some huge important piece of the Node puzzle, have I just got my facts completely wrong, or is Node just unsuitable for crunching business logic on the server? Put another way, is Node just useful for sitting over a database and servicing many CRUD requests in a more performant and scalable way than some other implementation which blocks on I/O? And you have to do all your business logic in some tier below, or even client-side, to maintain any reasonable levels of performance and scalability?
Considering all the buzz over Node, I'd rather hoped it brought more to the table than this. I'd love to be convinced otherwise!
On any given system you have N cpus available (1-64, or whatever N happens to be). In any CPU-intensive application, you're going to be stuck with a throughput of N cpus. There's no magical way to fix that by adding more than N threads/processes/whatever. Either your code has to be more efficient, or you need more CPUs. More threads won't help.
One of the little-appreciated facts about multiple-CPU performance is that if you need to run N+1 CPU-intensive operations at the same time, your throughput per CPU goes down quite a bit. A CPU-intensive process tends to hang on to that CPU for a very long time before giving it up, starving the other tasks pretty badly. In the majority of cases, it is blocking I/O and the concomitant task-switching that makes modern OS multitasking work as well as it does. If more of our every-day common tasks were CPU-bound, we would discover we needed a lot more CPUs in our machines than we do currently.
The nice thing that Node.js brings to the server party efficiency-wise is a thorough use of each thread. Ideally, you end up with less task switching. This isn't a huge win, but having N threads handling N*C connections asynchronously is going to have a performance advantage over N*C blocking threads running on the same number of CPUs. But the bottom line on CPUs remains the same: if you have more than N worth of actual CPU work to be done, you're going to feel some pain.
The last time I looked at the Node.js API there was a way to launch a server with one listener plus one worker thread per CPU. If you can do that, I would be inclined to go with Node.js provided a few caveats are met:
The Javascript-everywhere approach buys you some simplicity. For something complicated, I would be concerned about the asynchronous programming style making things harder rather than easier.
The template-processing and other CPU-intensive tasks aren't appreciably slower in Node.js than your other language/platform choices.
The database drivers are reliable.
There is one downside that I can see:
If a thread crashes, you lose all of the connections being serviced by that thread.
Finally, try to remember that programmer time is generally more expensive than servers or bandwidth.

mootools: I want to implement architecture similar to Big pipe in Facebook

I am developing an application in MooTools. I have used the Request class to implement pipelining.
I want to develop a superior method to handle client-server requests. I referred to the following article to understand how BigPipe works at Facebook.
http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
In Facebook, a JavaScript function is called on arrival of any server response to update the data on the user's screen (see the screenshot).
http://img815.imageshack.us/img815/5154/facebookna.jpg
If I get a basic model of such an architecture, I can start building the application using that code.
Can someone please provide me with such a basic model?
Till now I have designed an architecture in which response_data is stored in a global variable and then a function is called to update the data on the user's screen (I used a synchronous Request here), which is very slow.
So which method is superior, synchronous or asynchronous?
Firstly, thanks for the read, it was a very interesting blog post.
You may want to look into this library, which was inspired by Facebook's BigPipe. Note: I'm not endorsing it as I've never used it, but building it yourself is not trivial.
With regard to whether synchronous or asynchronous is better, that depends. Synchronous is simpler: the dependencies are obvious, and there's no overhead. Asynchronous is only an advantage if your resources are not fully utilised and your processing can be easily broken down into independent blocks. I can't tell what you're trying to do, so you need to decide for yourself where the performance bottleneck actually is, and whether architecting your application so that multiple sections can be downloaded, processed and rendered in parallel will actually provide an advantage.
As an example, if you're downloading a single, massive block of data to be rendered as a table in the browser, then breaking that data into multiple parallel downloads will improve performance - at the cost of creating some queuing system to deal with out-of-order responses. On the other hand, though technically slower, batching the download into synchronous blocks so that one block is downloaded and rendered before the next one is requested will still do wonders for perceived performance, and is a much simpler alternative.
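For example, firing one asynchronous MooTools Request per page section and rendering each block as its response arrives might look roughly like this (a sketch; the endpoint URLs and renderBlock are placeholders):

    // Each section downloads in parallel and renders as soon as its data arrives,
    // in whatever order the responses come back.
    ['header', 'feed', 'sidebar'].each(function (section) {
        new Request.JSON({
            url: '/pagelet/' + section,     // placeholder endpoint per section
            onSuccess: function (data) {
                renderBlock(section, data); // placeholder: inject the block into the page
            }
        }).send();
    });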
