What exactly are web workers and when to use them

What exactly are web workers and when to use them - javascript

I was reading up something about XMLHttpRequest (Is there any reason to use a synchronous XMLHttpRequest?) here on SO where I read on a thread from 2010 that, with the introduction of 'threads' in HTML5, developers might start to use synchronous APIs. Searching a bit on google, I found the MDN page on web workers.
I am writing Javascript and Node from about a year now (assume a beginner), and I am still to encounter something that makes use of these web workers. Maybe I need to read more code.
Now my question is, even though they seem to be very useful, why isn't it seen much in the wild? Also, what are the general use cases and guidelines when using them? Is it possible to reap the multithreaded processing benefits in Nodejs environment? If so, why are all Nodejs APIs still asynchronous?
Thank you.

A web-worker is strictly a clientside thing, so it has nothing to do with Node.js (EDIT: actually, see this module).
You might have heard that JavaScript is strictly single-threaded: if a function is doing some heavy calculation, nothing else is getting done, including animating icons, repainting the window, nothing. Thus, clientside JS should always avoid heavy computation, large loops and anything else that might usurp the thread for more than a fraction of a second.
Web-workers are the solution for that. Each web-worker is running in its own thread, and it can block as much as it wants - it won't affect the normal operation of the web page. The tradeoff is that it cannot have any access to the DOM: the fact that it doesn't affect the rendering means you cannot affect rendering with it. :) If a web-worker wants to render something, it would have to send a message to the main thread to do it.
Implementation-wise, each web-worker needs to be in a separate JS file. The reason why you don't see more of them is probably twofold: the average Joe probably doesn't know how to use them, and they are only needed when you need serious computation and don't want it to block your main thread - which is not that common in the first place, and when it is, the computation is commonly offloaded to the server (on clientside) or to separate processes (in Node.js).
Read more on HTML5 Rocks.

Related

Can node prevent an infinite loop?

Since node runs a single threaded model with event looping I wonder how node prevents the entire application to fail if you write a code like:
while(true){ doSomething()}
where doSomething is a synchronous function (a blocking piece of code)
Note that it doesn't make any sense to write a function like doSomething but nothing prevents you to make a mistake
The problem here is that, since it's single threaded, it won't allow any other parts of the application to run (for instance, a web server would stop accepting new connections) because this function would never end. In a Multi threaded environment you would loose this thread alone.
Is there anything that node can do for you to prevent these kind of problems?

I wonder how node prevents the entire application to fail if you write an infinite loop
nodejs does not prevent such an infinite loop. It will just run that loop forever or until some resource is exhausted (if the loop is consuming some resource like memory).
If node can't prevent this kind of situations, is this a design fault or there's no way to prevent these kind of problems?
I don't think most people consider it a design fault - though that's purely an opinion and different people may have a different opinion. It is a consequence of the way nodejs was designed which has many other benefits.
The only way to prevent such problems is to not write faulty code that does this. Honestly, it's not too hard to avoid writing this type of code once you're aware that it's an issue to avoid.
The problem here is that, since it's single threaded, it won't allow any other parts of the application to run (for instance, a web server would stop accepting new connections) because this function would never end. In a Multi threaded environment you would loose this thread alone
Correct. This is something you learn when coding in nodejs. I've never found it a hard thing to avoid. nodejs is an single-threaded event driven system, not a multi-threaded system. As such, you program with events, not long running loops that poll or check conditions. It is a rather straightforward concept to learn and use once you understand this is how nodejs works. It is different than some other environments. But, how to use asynchronous operations in nodejs is just something you have to learn to program in that environment. It's not avoidable and is just part of the character of nodejs. There is no way that nodejs could have the type of architecture it has without having to learn this to program in it. If you want a different architecture (for whatever personal reason), then pick a different environment, not nodejs.
The single-threadedness massively simplifies many other things (far, far fewer opportunities for race conditions) and improves scalability in some circumstances (with asynchronous I/O) vs. threaded environments. For situations where you want multiple CPUs to be applied to your problem, it is generally straightforward in node.js to either use the built-in clustering module or to fire up worker processes and feed them work. Data is often shared among multiple processes via some sort of database (either file-based or RAM-based) that handles much of the multi-process synchronization for you.

It doesn't. This seems like less of a question and more an open statement. Node will loop infinitely and all your parallel code will stop running.

it's not possible to find such issue in the node.js program itself. however a node.js script with an infinite loop will use lead to 100% cpu . so this can be monitored and you can use tools to restart the program. I don't recommand to do this, you should fix your infinite loop first, but it s sometimes hard to find the issue with large codebase. last time it happened to be I used a remote debugger to find the infinite loop.

What does it mean when they say JavaScript is single-threaded?

It says in a lot of website that JavaScript is single-threaded. When they say this, do they mean the JavaScript runtime?
I might be misunderstanding something but isn't JavaScript just a programming language and the programs you create with it should be the ones labeled as single-threaded? But maybe I'm not understanding something so can someone please explain what I am don't getting?

JavaScript, the language, is nearly silent on the topic of threading. Whether it's single- or multi-threaded is a matter of the environment in which it runs. There are single-threaded JavaScript environments, and multi-threaded JavaScript environments. The only real requirement the specification makes is that a thread have a job queue and that once a job (a unit of code to run, like a call to an event handler) is started, it runs to completion on the thread before another job from that queue is started. That is, JavaScript has run-to-completion semantics.
JavaScript on browsers isn't single-threaded and hasn't been for years. There is one main thread (the thread the UI is handled on), and any number of web worker threads. The web workers don't have direct access to the UI, they send messaegs to the UI thread, and it does the UI updates. The threads don't directly share data, they share data explicitly via messaging. This separation makes programming multi-threaded code dramatically simpler and less error-prone than if multiple threads had access to the UI and the same common data area. (Writing correct multi-threaded code where any thread can access anything at any time is hard.)
Outside the browser, JavaScript in NodeJS is run on a single thread. There was a fork of Node to add multi-threading, but I don't think it ever went anywhere.
JavaScript on the JVM (Rhino, Nashorn) has always been multi-threaded, supported by the threading facilities of the JVM.

While TJC's answer is of course correct, I don't think it addresses the issue of what people actually mean when they say that "JavaScript is single-threaded". What they're actually summarising (inaccurately) is that the run-time must behave as if has a single thread of execution, which cannot be pre-empted, and which must run to completion. The actual runtime can do anything it likes as long as the end result behaves in this way.
This means that while a JavaScript program may appear to be massively parallel with lot of threads interacting with each other, it's actually nothing of the sort. A kernel controls everything using the queue, event loop, and run-to-completion semantics (briefly) described here.
This is exactly the same problem that Hardware Description Languages (VHDL, Verilog, SystemC (though not actually a language), and so on) face. They give the illusion of massive parallelism by having a runtime kernel cycle between 'processes' which are not pre-emptible, and which must run until defined suspend points. The point of this is to ensure that models are executed in a determinate, repeatable fashion.
The difference between the HDLs and JS is that this is very well defined and fundamental for HDLs, while it's glossed over for JS. This is an extract from the SystemC LRM, which covers it briefly - it's much better defined in the VHDL LRM, for example.
Since process instances execute without interruption, only a single
process instance can be running at any one time, and no other process
instance can execute until the currently executing process instance
has yielded control to the kernel. A process shall not pre-empt or
interrupt the execution of another process. This is known as
co-routine semantics or co-operative multitasking.

Run code off the main thread?

I know it's possible to run js-ctypes off the main thread so it acts async by using ChromeWorker. But ChromeWorkers can't use XPCOM.
So I was wondering if there is a way to run other synchronous stuff off the main thread?
I was hoping to use it for things like nsIZipWriter, nsIToolkitProfileService::Lock/Unlock`, etc.

In Javascript, the only way to run off-the-main-thread code is WebWorker/ChromeWorker, which indeed does not have XPCOM access.
Actually, there used to be a way to use XPCOM from workers, and I was initially upset when it got removed again, but now I appreciate that it was the right thing to do: Much (most?) of XPCOM is not thread-safe, not even when using what appears to be self-contained instances of XPCOM classes, because in the end many of things end up calling some non-thread-safe services as part of their implementation. This leads to data and/or memory corruption and eventual crashes and data loss. Problem here was/is that it does not always corrupt memory, because there is not always a data race, and instead just causes havoc each X-times you run the code. People often used to develop and test their stuff and it happened to worked or at least looked like it worked, but once more people (aka. the users) started executing code, crashes started to pile up.
It is possible to run code off-the-main-thread in C++ code, but it has the same problem, much of XPCOM not being thread-safe, and therefore you'll need to be vary careful what you run in a different thread, i.e. only access stuff that was explicitly marked thread-safe, but even with such a marker there might be thread-safety bugs.
So, you cannot use XPCOM in another thread from JS (unless there are dedicated components doing this for you, like nsIAsyncStreamCopier) and even running XPCOM in another thread from C++ requires a lot of knowledge, skill and time to debug things if there are crashes after all.
If you really want to, then things like a zip-writer could be reasonably easy implemented in JS and run in a Worker. E.g. the zip format isn't particularly hard to implement, in particular if you don't need actual compression, and OS.File allows you to mostly conveniently do file I/O from a worker.

I think yes you can run sync stuff in an async way.
See: https://developer.mozilla.org/en-US/Add-ons/Code_snippets/Threads

mootools: I want to implement architecture similar to Big pipe in Facebook

I am developing an application in mootools. I have used Reqeust class to implement pipelining it.
I want to develop a superior method to handle client server requests. I referred the following article to understand how big pipe works in facebook.
http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
In facebook, a javascript function is called on arrival of any server response to update data user screen. (see the screenshot)
http://img815.imageshack.us/img815/5154/facebookna.jpg
if i get a basic model of such architecture, i can start building application using that
code.
can some one please provide me such a basic model?
Till now i have designed an architecture in which response_data is stored in a global variable and then a function called to update data to user screen.(Used synchronous Request here) which is very slow.
so which method is superior 'synchronous or Asynchronous'?

Firstly, thanks for the read, it was a very interesting blog post.
You may want to look into this libary which was inspired by Facebook's BigPipe. Note: I'm not endorsing it as I've never used it, but building it yourself is not trivial.
With regards to whether synchronous and asynchronous is better, that depends. Synchronous is simpler - the dependencies are obvious, and there's no overhead. Asynchronous is only an advantage if your resources are not fully utilised, and your processing can be easily broken down into independant blocks. I can't tell what you're trying to do, so you need to make the decision yourself where the performance bottleneck actually is, and whether architecting your application such that multiple sections can be downloaded, processed and rendered in parallel will actually provide an advantage.
As an example, if you're downloading a single, massive block of data to be rendered as a table in the browser, then breaking that data into multiple parallel downloads will improve performance - at the cost of creating some queuing system to deal with out-of-order responses. On the other hand, though technically slower, batching the download into synchronous blocks so that one block is downloaded and rendered before the next one is requested, will still do wonders to perceived performance, and is a much simpler alternative.

Ruby plugin for web browser?

Am I correct that if someone wrote a Ruby plugin for a web browser and a user installed that plugin then it would be possible to replace javascript with ruby on the frontend?
Aren't there any plugins for this? Or even for using other languages than javascript on the browser side?

You could use http://ironruby.net/ in a Silverlight Plugin, but I have not a clue about how easy DOM interaction is this way.
But I BEG YOU don't do it! Please, use the Open Web Stack to solve your problems.
If you don't leave your Ruby world of comfort, you will not only hurt your users experience "WTF? Why do I need Silverlight for this page?" but you will also get stuck in your small little Ruby world without learning anything new and exciting.
It would be better for both of you, if you'd just go ahead and learn JavaScript.
Because remember: "Learning is a good thing!"

One thing is A FACT: as of 2010 JavaScript does not have a thread stopping "sleep" function (other than the one that just burns CPU cycles).
I have been working with JavaScript for at least a year before posting this comment and I have come to a conclusion that the lack of a thread-stopping sleep function is a real show-stopper for threading related code.
A consequence of the lack of the sleep function is that it's not possible to simulate a Ruby/C#/C++/etc. like threading model in JavaScript, which in turn means that it's not possible to translate any of the threading enabled languages to JavaScript, no matter, what one does, unless the JavaScript is supplemented with a (preferably non-CPU-cycle-burning) sleep function.
If one surfs around, then one can find many comments that state that the sleep function is not even necessary, that the setTimeout is sufficient, etc., but I guess that people, who state that, have not tried to implement a threading framework in JavaScript. (Think of mutexes, critical sections. I refuse to go into a discussion that the critical sections/synchronization are/is not necessary for cases, where widget content consists of multiple data components that form an "atomic whole".)
The second show-stopper for the whole DOM-model is the implementation that renders DOM elements IN THE BACKGROUND THREAD.
Here's, what happens:
In Javascript:
create_my_awsome_widget_in_DOM();
edit_my_awsome_widget_by_editing_DOM_inside_it()
if_we_are_lucky_we_reach_here_without_crashing_the_app()
As the DOM is rendered in background (read: in a separate thread), there will be a race condition between the thread that initiated the DOM editing, by making a call to the create_my_awsome_widget_in_DOM(), and the DOM rendering. If the rendering thread is "quick enough" to render the DOM before the JavasSript thread calls the edit_my_awsome_widget_by_editing_DOM_inside_it(), everything works fine, but if it's the other way around, then the JavaScript starts to modify region of the DOM that does not (yet) exist.
Essentially it means that due to the background DOM rendering the create_my_awsome_widget_in_DOM() and edit_my_awsome_widget_by_editing_DOM_inside_it() are executed in a random order and obviously the application crashes, if the edit_my_awsome_widget_by_editing_DOM_inside_it() is called before the create_my_awsome_widget_in_DOM().

There might be a way to do it indirectly. Here is the original presentation at RubyConf 2008. The topic:
This talk is about the many paths towards getting ruby running in your web browser. I'll first talk about why this is even a good idea. I'll then talk briefly about each approach I've investigated and the differing amounts of FAIL I encountered with each. Next I'll focus on the most promising contender, rubyjs, a ruby compiler which outputs javascript.
The project rubyjs still exists, but it appears to be dead. The idea probably was a little too crazy.

mruby seems like an interesting option for running ruby in a web browser:
http://qiezi.me/projects/mruby-web-irb/mruby.html
It's not a typical plugin as it does not require installation, it's javascript (compiled from C) running ruby code.

Technically that would be correct, assuming the browser/plugin also provided an extensive API to deal with the DOM and such. I am not aware of any plugins that make this possible, but it's an interesting idea.

Develop Reference

JavaScript is the programming language of the Web.