I am currently on a big Javascript project with a lot of libraries.
I would like to have some part of this project to run on separate thread. There is already something inJavascript doing that : the web workers.
Though, the web workers can't access the window object, and a lot of the libraries use it. Is there a way to automatically change the call to the window object (in the libraries used for the web workers), into a message sent to the parent thread ?
Then, the parent thread would perform the action that the worker want and send back the result to the worker.
Is it possible to do that ? And id yes, do you have any idea how ?
Thank you !
I'm afraid there's no real solution to this. What you'd probably want is a special object in your worker, which, at every property access, passes the execution to the dispatching thread - which handles the request using the original window object.
To do this, you would need some sort of catch-all accessor method which would run whenever a property is referenced. Sadly, there's no such thing in Javascript, see this detailed discussion (especially T.J. Crowder's answer): Is it possible to implement dynamic getters/setters in JavaScript?
ECMAScript 6 introduces a new mechanism called Proxy (currently supported in FF and IE12 (go figure!)), which would enable you to do these dynamic property lookups, technically - but I feel there's a more fundamental problem with your idea: you're aiming to turn a local call into a message across the boundaries of single threaded environments.
Message passing from and to the worker threads must be asynchronous (as a javascript "thread" cannot be interrupted until it yields), which would mean that even if you do manage to set up a proxy like that, it'd be effectively turning a usually synchronous operation (ie. a property access) into an asynchronous one, which is a pretty big issue, especially if you're looking for a drop-in replacement in order to use some existing libraries.
Related
I was thinking about this today and I realized I don't have a clear picture here.
Here are some statements I think to be true (please correct me if I'm wrong):
the DOM is a collection of interfaces specified by W3C.
when parsing HTML source code, the browser creates a DOM tree which has nodes that implement DOM interfaces.
the ECMAScript spec has no reference of browser host objects (DOM, BOM, HTML5 APIs etc.).
how the DOM is actually implemented depends on browser internals and is probably different among most of them.
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
I am curious about what happens behind the scenes when I call document.getElementById('foo'). Does the call get delegated to browser native code by the interpreter or does the browser have JS implementations of all host objects? Do you know about any optimizations they do in regard to this?
I read this overview of browser internals but it didn't mention anything about this. I will look through the Chrome and FF source when I have time, but I thought about asking here first. :)
All of your bullet points are correct, except:
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
should be "...and translate it to native code". SpiderMonkey (the JS engine in Firefox) worked as a bytecode interpreter for a long time before the current JS speed arms race.
On Mozilla's JS-to-DOM bridge:
The host objects are typically implemented in C++, though there is an experiment underway to implement DOM in JS. So when a web page calls document.getElementById('foo'), the actual work of retrieving the element by its ID is done in a C++ method, as hsivonen noted.
The specific way the underlying C++ implementation gets called depends on the API and also changed over time (note that I'm not involved in the development, so might be wrong about some details, here's a blog post by jst, who was actually involved in creating much of this code):
At the lowest level every JS engine provides APIs to define host objects. For example, the browser can call JS_DefineFunctions (as demonstrated in the SpiderMonkey User Guide) to let the engine know that whenever script calls a function with the specified name, a provided C callback should be called. Same for other aspects of the host objects (e.g. enumeration, property getters/setters, etc.)
For the core ECMAScript functionality and in some tricky DOM cases the JS engine/the browser uses these APIs directly to define host objects and their behaviors, but it requires a lot of common boilerplate code for e.g. checking parameter types, converting them to the appropriate C++ types, error handling etc.
For reasons I won't go into, let's say historically, Mozilla made heavy use of XPCOM for many of its objects, including much of the DOM. One feature of XPCOM is its binding to JS called XPConnect. Among other things, XPConnect can take an interface definition in IDL (such as nsIDOMDocument; or more precisely its compiled representation), expose an object with the specified properties to the script, and later, when a script calls getElementById, perform the necessary parameter checks/conversions and route the call directly to a C++ method (nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn))
The way XPConnect worked was quite inefficient: it registered generic functions as callbacks to be executed when a script accesses a host object, and these generic functions figured out what they needed to do in every particular case dynamically. This post about quickstubs walks you through one example.
"Quick stubs" mentioned in the previous link is a way to optimize JS->C++ calls time by trading some code size for it: instead of always using generic C++ functions that know how to make any kind of call, the specialized code is automatically generated at the Firefox build time for a pre-defined list of "hot" calls.
Later on the JIT (tracemonkey at that time) was taught to generate the code calling C++ methods as part of the native code generated for "hot" paths in JS. I'm not sure how the newer JITs (jaegermonkey) work in this regard.
With "paris bindings" the objects are exposed to webpage JS without any reliance on XPConnect, instead generating all the necessary glue JSClass code based on WebIDL (instead of XPCOM-era IDL). See also posts by developers who worked on this: jst and khuey. Also see How is the web-exposed DOM implemented?
I'm fuzzy on details of the three last points in particular, so take it with a grain of salt.
The most recent improvements are listed as dependencies of bug 622298, but I don't follow them closely.
JS calls to DOM methods like getElementById cause the JS engine to call into the C++ code that implements the DOM. For example, in Firefox, the call ends up in nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn).
As you can see, Firefox maintains a hashtable that maps ids to elements in C++ as an optimization in this case, so it doesn't walk the whole DOM tree looking for the id.
The DOM is implemented as a language-independent library pretty much in all major browser implementations, which means it's in a different library from the Javascript engine. For example in IE, the JS engine is implemented in jscript.dll while the DOM is implemented in mshtml.dll. Safari has Nitro(JS) and WebCore(DOM). Chrome has V8(JS) and WebCore(DOM), and Firefox has SpiderMonkey/TraceMonkey(JS) and Gecko(DOM).
What this means is that anytime your JS has to access the DOM, it has to reach over to the DOM library - which is inherently slow because of all the marshaling that has to take place. An analogy that has been used is 2 pieces of land connected by a toll bridge, any time you touch the DOM, you must cross over the bridge and cross back - paying a performance toll.
References
Video: Building High Performance Web Applications and Sites
Book: High Performance Javascript (Chapter 3 on the DOM)
I have a node server for loading certain scripts that can be written by anyone. I understand that when I fire up my Node server, modules load for the first time in the global scope. When one requests a page, it gets loaded by the "start server" callback; and I can use all the already loaded modules per request. But I haven't encountered a script where global variables get changed during request time and affects every single other instance in the process (maybe there is).
My question is, how safe is it, in terms of server crashing, to alter the global data? Also, suppose that that I have written a proper locking mechanism that will "pause" the server for all instances for a very short amount of time until the proper data is loaded.
Node.js is single threaded. So it's impossible for two separate requests to alter a global variable simultaneously. So in theory, it's safe.
However, if you're doing stuff like keep user A's data temporarily in a variable and then when user A later submits another request use that variable be aware that user B may make a request in between potentially altering user A's data.
For such cases keeping global values in arrays or objects is one way of separating user data. Another strategy is to use a closure which is a common practice in callback-intensive or event/promise oriented libraries such as socket.io.
When it comes to multithreading or multiprocessing, message passing style API like node's built-in cluster module has the same guarantees of not clobbering globals since each process have its own global. There are several multithreading modules that's implemented similarly - one node instance per thread. However, shared memory style APIs can't make such guarantees since each thread is now a real OS thread which may preempt each other and clobber each others memory. So if you ever decide to try out one of the multithreading modules, be aware of this issue.
It is possible to implement fake shared memory using message passing though - sort of like how we do it with ajax or socket.io. So I'd personally avoid shared memory style multithreading unless I really, really need to cooperatively work on a very large dataset that would bog down message passing architectures.
Then again, remember, the web is a giant message passing architecture with the messages being HTML and XML and json. So message passing scales to Google size.
I have a web application based on apache. php, js and jquery. All works fine.
On the client side there is a small library in JS/jquery, offering some generic list handling methods. In the past I used callbacks to handle those few issues where those methods had to behave slightly different. That way I can reuse methods like list handling, dialog handling and stuff for different part of the application. However recently the number of callbacks I had to hand through when stepping into the library grew and I am trying a redesign:
Instead of specifying all callbacks as function arguments I created a central catalog object in the library. Each module of the application registers its own variant of callbacks into that catalog upon initialization. At runtime the methods lookup the required callbacks in that catalog instead of expecting it specified in their list of arguments. This cleans up things quite a lot.
However I have one thing I still cannot get rid of: I require a single argument (I call it context, mode might be another term) that is used by the methods to lookup the required callback in the catalog. This context has to be handed through to all methods. Certainly better than all sorts of different callbacks being specified everywhere, but I wonder if I can get rid of that last one to.
But where do I specify that context, if not as method argument ? I am pretty new to JS and jquery, so I failed to find an approach for this. Apparently I don't want to use global vars, and to be frank I doubt that I can simply store a context in a single variable anyway, since because of all the event handlers and external influences methods might be called in different contexts at the same time, or at least interleaving. So I guess I need something closer to a function stack. Maybe I can simply push a context object to the stack and read that from within the layers of the library that need to know ? The object would be removed when I leave the library again. Certainly other approaches exist too.
Here are so many experienced coders how certainly can give a newbie like a short hint, a starting point that leads to an idea, how to implement this. How is such thing 'usually' done ?
I tried round a while, exploring the arguments.callee.caller hierarchy. I thought maybe I could set a prototype member inside a calling function, then, when execution steps further down I could simply traverse the call stack upwards until I find a caller holding such property and use that value as context.
However I also saw the ongoing discussions that reveal two things: 1.) arguments.callee appears to be depreciated and 2.) it appears to be really expensive. So that is a no go.
I also read about the Function.caller alternative (which appears not to be depreciated and much more efficient, however until now I failed to explore that trail...
As written currently passing the context/mode down simply works by specifying an additional argument in the function calls. It carries a unique string that is used as a key when consulting the catalog. So something like this (not copied, but written as primitive example):
<!-- callbacks -->
callback_inner_task_base:function(arg1,arg2){
// do something with args
}
callback_inner_task_spec:function(arg1,arg2){
// do something with args
}
<!-- catalog -->
Catalog.Callback:function(context,slot){
// some plausibility checks...
return Catalog[context][slot];
}
Catalog.base.slot=callback_inner_task_base;
Catalog.spec.slot=callback_inner_task_spec;
<!-- callee -->
do_something:function(arg1,arg2,context){
...
// callback as taken from the catalog
Catalog.Callback(callback,'inner_task')(arg1,arg2);
...
}
<!-- caller -->
init:function(...){
...
do_something('thing-1',thing-2','base');
do_something('thing-1',thing-2','spec');
...
}
But where do I specify that context, if not as method argument ?
Use a function property, such as Catalog.Callback.context
Use a monad
I've been reading about web workers in HTML5, but I know JavaScript is single-threaded.
How are web workers doing multi-threaded work then? or how are they simulating it if it's not truly multi-threaded?
As several comments have already pointed out, Workers really are multi-threaded.
Some points which may help clarify your thinking:
JavaScript is a language, it doesn't define a threading model, it's not necessarily single threaded
Most browsers have historically been single threaded (though that is changing rapidly: IE, Chrome, Firefox), and most JavaScript implementations occur in browsers
Web Workers are not part of JavaScript, they are a browser feature which can be accessed through JavaScript
A bit late, but I just asked myself the same question and I came up with the following answer:
Javascript in browsers is always single-threaded, and a fundamental consequence is that "concurrent" access to variables (the principal headache of multithreaded programming) is actually not concurrent; this is true with the exception of webworkers, which are actually run in separate threads and concurrent access to variables must be dealt with in a somewhat explicit way.
I am not a JavaScript ninja, but I too was convinced that JavaScript in browser is provided as a single threaded process, without paying much attention to whether it was true or to the rationale behind this belief.
A simple fact that supports this assumption is that when programming in JavaScript you don't have to care about concurrent access to shared variables. Every developer, without even thinking of the problem, writes code as if every access to a variable is consistent.
In other words, you don't need to worry about the so called Memory model.
Actually there is no need of looking at WebWorkers to involve parallel processing in JavaScript. Think of an (asynchronous) AJAX request. And think how carelessly you would handle concurrent access to variables:
var counter = 0;
function asyncAddCounter() {
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
if (this.readyState == 4) {
counter++;
}
};
xhttp.open("GET", "/a/remote/resource", true);
xhttp.send();
}
asyncAddCounter();
counter++;
What is the value of counter at the end of the process? It is 2.
It doesn't matter that it is read and written "concurrently", it will never result in a 1. This means that access to counter is always consistent.
If two threads where really accessing the value concurrently, they both could start off by reading 0 and both write 1 in the end.
In browsers, the actual data-fetching of a remote resource is hidden to the developer, and its inner workings are outside the scope of the JavaScript API (what the browser let's you control in terms of JavaScript instructions). As far as the developer is concerned, the result of the network request is processed by the main thread.
In short, the actual carrying out of the request is not visible, but the invocation of the callback (handling the result by custom JavaScript code) is executed by the main thread.
Possibly, if it wasn't for the webworkers, the term "multithreading" wouldn't ever enter the Javascript world.
The execution of the request and the asynchronous invocation of the callback is actually achieved by using event loops, not multithreading. This is true for several browsers and obviously for Node.js. The following are some references, in some cases a bit obsolete, but I guess that the main idea is still retained nowadays.
Firefox: Concurrency model and Event Loop - JavaScript | MDN
Chrome uses libevent in certain OS.
IE: Understanding the Event Model (Internet Explorer)
This fact is the reason why JavaScript is said to be Event-driven but not multithreaded.
Notice that JavaScript thus allows for asynchronous idioms, but not parallel execution of JavaScript code (outside webworkers). The term asynchronous just denotes the fact that the result of two instructions might be processed in scrambled order.
As for WebWorkers, they are JavaScript APIs that give a developer control over a multithreaded process.
As such, they provide explicit ways to handle concurrent access to shared memory (read and write values in different threads), and this is done, among the others, in the following ways:
you push data to a web worker (which means that the new thread reads data) by structured clone: The structured clone algorithm - Web APIs | MDN. Essentially there is no "shared" variable, instead the new thread is given a fresh copy of the object.
you push data to a web worker by transferring ownership of the value: Transferable - Web APIs | MDN. This means that the just one thread can read its value at any time.
as for the results returned by the web workers (how they "write"), the main thread access the results when prompted to do so (for instance with the instruction thisWorker.onmessage = function(e) {console.log('Message ' + e.data + ' received from worker');}). It must be by means of the usual Event Loop, I must suppose.
the main thread and the web worker access a truly shared memory, the SharedArrayBuffer, which is thread-safely accessed using the Atomic functions. I found this clearly exposed in this article: JavaScript: From Workers to Shared Memory
note: webworkers cannot access the DOM, which is truly shared!
You spawn a .js file as a "worker", and it runs processes in a separate thread. You can pass JSON data back and forth between it and the "main" thread. Workers don't have access to certain things like the DOM, though.
So if, say, you wanted to solve complicated math problems, you could let the user enter things into the browser, pass those variables off to the worker, let it do the computation in the background while in the main thread you let the user do other things, or show a progress bar or something, and then when the worker's done, it passes the answer back, and you print it to the page. You could even do multiple problems asynchronously and pass back the answers out of order as they finish. Pretty neat!
The browser kicks of a thread with the javascript you want to execute. So its a real thread, with this web workers thing, your js is no longer single-threaded.
The answer that claimed that "JavaScript is a language, it doesn't define a threading model, it's not necessarily single-threaded" is directly copy-pasted from a medium article... and it confuses without solving the doubt.
It's not necessarily single-threaded, like all other languages. YES... BUT
Javascript is a LANGUAGE meant for Single-threaded programming, and that is the beauty of it and makes it simple and easy to implement.
It is designed around a single Call stack.
Maybe in the future, with new implementations, it will become a Language for multi-threaded programming... but for now, Mehhhhhh.
The Node V8 is still single-threaded, yet it achieves multi-threaded capabilities by creating worker threads on LIBUV which is written in C++.
Same way, even though Javascript is not meant for Multithreading you can achieve limited multithreading by using Browser APIs.
Every time you open a TAB on a browser, it creates a new thread, and the process is the same with web workers.
It works internally BUT does not have access to any window objects.
Yes, People may call it Multithreaded if it makes em Happy,
But in 2021 the Answer is
"JS is meant for Single-threaded programming,(or a single-threaded language) but limited multi-threading can be achieved by using Browser APIs such as Web Workers"
Actually the main confusion, I think here, comes that people are finding clever ways to do things concurrently. If you think about it JavaScript is clearly not multithreaded and yet we have ways to do things in parallel, so what is going on?
Asking the right question is what will bring the answer here. Who is responsible for the threads? There is one answer above, saying that JS is just a language, not a threading model. Completely true!JavaScript has nothing to do with it. The responsibility falls on V8. Check this link for more information -> https://v8.dev/
So V8 is allowing a single thread per JS Context, which means, no matter how hard you try, spawning a new thread is simply impossible. Yet people spawn so-called workers and we get confused. For this to be answered, I ask you the following. Is it possible to maybe start 2 V8 and both of them to interpret some JS code? Precisely the solution to our problem. Workers communicate with messages because their context is different. They are other things that don't know anything about our context, therefore they need some information that comes in the form of a message.
As we are all aware, JavaScript is single-threaded: all code is queued and executed in a sequence.
Using Web Workers, we can run JavaScript processes concurrently (or at least, as close to concurrently as this language allows). The primary benefit of this approach is to handle the manipulation of data in background threads without interfering with the user-interface.
Using web worker:
Web Workers allow you to run JavaScript in parallel on a web page, without blocking the user interface.
Web workers executes in separate thread
Need to host all the worker code in separate file
They aren’t automatically garbage collected, So you need to control them.
To run worker use worker.postMessage(“”);
To stop worker there are two methods terminate() from caller code
and close() from Worker itself
Instantiating a worker will cost some memory.
Web Workers run in an isolated thread. As a result, the code that they execute needs to be contained in a separate file. But before we do that, the first thing to do is create a new Worker object in your main page. The constructor takes the name of the worker script:
var worker = new Worker('task.js');
If the specified file exists, the browser will spawn a new worker thread, which is downloaded asynchronously. The worker will not begin until the file has completely downloaded and executed. If the path to your worker returns an 404, the worker will fail silently.
After creating the worker, start it by calling the postMessage() method:
worker.postMessage(); // Start the worker.
Communicating with a Worker via Message Passing
Communication between a work and its parent page is done using an event model and the postMessage() method. Depending on your browser/version, postMessage() can accept either a string or JSON object as its single argument. The latest versions of the modern browsers support passing a JSON object.
Below is a example of using a string to pass 'Hello World' to a worker in doWork.js. The worker simply returns the message that is passed to it.
Main script:
var worker = new Worker('doWork.js');
worker.addEventListener('message', function(e) {
console.log('Worker said: ', e.data);
}, false);
worker.postMessage('Hello World'); // Send data to our worker.
doWork.js (the worker):
self.addEventListener('message', function(e) {
self.postMessage(e.data); // Send data back to main script
}, false);
When postMessage() is called from the main page, our worker handles that message by defining an onmessage handler for the message event. The message payload (in this case 'Hello World') is accessible in Event.data. This example demonstrates that postMessage() is also your means for passing data back to the main thread. Convenient!
References:
http://www.tothenew.com/blog/multi-threading-in-javascript-using-web-workers/
https://www.html5rocks.com/en/tutorials/workers/basics/
https://dzone.com/articles/easily-parallelize-jobs-using-0
I just found we can intercept the javascript alert() native call and hook the user code before the actual execution. check out the sample code..
function Test(){
var alertHook=function(aa){
this.alert(aa);
}
this.alert("aa");
this.alert = alertHook;
alert("aa");
}
so everytime i call alert("aa") is been intercepted by my alertHook local function. But the below implementation with the small change does not work.
function Test(){
var alertHook=function(aa){
alert(aa);
}
alert("aa");
alert = alertHook; //throws Microsoft JScript runtime error: Object doesn't support this action
alert("aa");
}
it throws Microsoft JScript runtime error: Object doesn't support this action.
I dont know how this.alert = alertHook; let me intercept the call, but alert=alertHook; not.??
So i assume using this to intercept any native js methods.? is that right?
And is that acceptable? because this way i can completely replacing any native JS calls with my own methods??
UPDATE:
I asked is that acceptable? because how this is a good approach having eval() and letting users to replace native function calls?
And its responsibility of a language to protect developers from the misleading features, replacing the native js calls in a window level(or in a common framework js file) would crash the whole system.. isn't it??
i may be wrong in my opinion because i dont understand the reason behind this feature..? I never seen a language that let developer to replace its own implementation..
Depending on how Test(); is being called, this should be the window Object.
I believe Microsoft allows overwriting native JS functions only by specifying the window object.
So window.alert = alertHook; should work anywhere.
is it acceptable?
Yes it is. This is a major strength for the flexibility of the language, although I'm sure there's better alternatives instead of overwriting native behavior.
Overwriting native JavaScript functions isn't really a security issue. It could be one if you're running someone elses code that does it; but if you're running someone elses code there's a lot of other security issues you should be concerned about.
In my opinion, it never is good practice to redefine the native functions. It's rather better to use wrappers (for instance, create a debug function that directs its output to alert or console.log or ignores the calls or whatever suits your needs).
As for why JScript throws an exception with your second example and not the first one, it's easy. In the first example, you create a property called alert in your local scope, so when you refer alert you'll be referring this.alert rather than window.alert. In the second example, the alert you're referencing is the one from window, so assigning a different function to it will fail.
And its responsibility of a language to protect developers from the misleading features, replacing the native js calls in a window level(or in a common framework js file) would crash the whole system.. isn't it??
Not true, replacing the native call only hooks into it, replaces it: it does not rewrite the native at all. Crashing the "whole" system; JavaScript runs in a Virtual Machine, it's interpreted, so the chance of crashing the "whole" system (i.e. Blue Screen of Death?) is very very small. If so: it's not the programmers fault, but the implementation of JavaScript which is causing the error.
You can consider it as a feature: for instance, if you load a JavaScript from someone else's hand, you can reimplement some functions to extend.
Protection to the programmer is like keeping a dog on the leash: only unleash it, when you trust the dog! Since JavaScript runs in a Virtual Machine, any programmer can be unleashed -- if the implementation is secure enough, which it is (most of the time?)