Transferring data using Emscripten Worker API without copying


Is there a way to get Emscripten to transfer, and not copy, data between web workers and the main UI thread?
Emscripten has an API that manages communication between Web Workers, which I believe just uses the postMessage / onmessage mechanism under the hood. Looking at the source for the Emscripten Worker API, it appears that it doesn't use the transferList option when it calls postMessage, and so the data gets copied.
Actually, I think it gets copied at least twice: first by the browser between the threads, and then a second time by Emscripten to get it into the Emscripten-managed heap-space. And if you want the data to continue to survive on the receiving end after the callback, it would have to be copied a third time, as according to the docs the data passed to the callback is only guaranteed to exist during the callback.
Repeating my question from the top: is there a way to get Emscripten to avoid all this copying by transferring, and not copying, data between web workers and the main UI thread?

This is possible if you make use of SharedArrayBuffer. Recently, the Emscripten developers added experimental pthreads support to Emscripten, which uses this feature. However, only Firefox Nightly supports SharedArrayBuffer at the moment, so this is not widely adopted yet.
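To illustrate what "shared, not copied" means here, the following is a minimal sketch. It assumes Node's worker_threads MessageChannel as a stand-in for a browser worker's postMessage channel, and it does not use Emscripten's API at all; the variable names are made up for the example.

```javascript
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

// One block of shared memory, viewed as four 32-bit integers.
const shared = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const senderView = new Int32Array(shared);

// Post the buffer through a channel. A SharedArrayBuffer is shared with
// the receiver rather than cloned, and the sender keeps access to it.
const { port1, port2 } = new MessageChannel();
port1.postMessage(shared);
const received = receiveMessageOnPort(port2).message;
const receiverView = new Int32Array(received);

// A write through the receiver's view is visible through the sender's view,
// because both views refer to the same memory.
Atomics.store(receiverView, 0, 42);
const seenBySender = Atomics.load(senderView, 0);
console.log(seenBySender);

port1.close();
port2.close();
```

Because both sides can touch the same bytes at once, the Atomics operations (or some other synchronization) are what keep reads and writes well defined.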

Related

Web Workers - do they create actual threads?

I have always thought that web workers create separate threads, but today I ran into the spec on the W3C website. Below is a quote about web workers:
This allows for thread-like operation with message-passing as the
coordination mechanism.
The question is: if it is only thread-like and not an actual thread, what is the advantage (performance-wise) of using this technology?
Any help will be appreciated!

Yes, web workers create actual threads (or processes, the spec is flexible on this). According to the Web Workers specification, when a worker is created the first step is:
Create a separate parallel execution environment (i.e. a separate thread or process or equivalent construct), and run the rest of these steps in that context.
For the purposes of timing APIs, this is the official moment of creation of the worker.
(W3C Web Workers specification section 4.4)
So it is explicitly specified that code running in Web Workers runs in actual threads or processes.
Although it is possible to implement Workers without threads (note the "equivalent construct" language) for use on systems that don't support threads, all browser implementations implement Web Workers as threads.

A web worker runs in a single thread isolated from the main thread. The way workers pass messages around is thread-like, and it works differently depending on whether you're using dedicated workers (which can only be accessed from the script that created them) or shared workers (which can be accessed by any script within the same domain via a port object).
EDIT:
Updated answer to reflect my comment from months ago. While a SINGLE web worker runs in an isolated thread it doesn't mean each additional worker will run in the same thread.

According to MDN,
Web Workers are a mechanism by which a script operation can be made to run in a background thread separate from the main execution thread of a web application. The advantage of this is that laborious processing can be performed in a separate thread, allowing the main (usually the UI) thread to run without being blocked/slowed down.
So, as I read it, each worker does not necessarily create its own thread; what is guaranteed is that workers run in a background thread separate from the main one.
I guess that, as with other things, the implementation and approach may differ from browser to browser.

May a Web Worker render on WebGL-Canvas?

I don't understand how web workers work... Do web workers run in parallel, or are they just preempted?
Is it safe for a web worker to render to a WebGL context?
If only a web worker renders to the WebGL context, and my main "thread" never touches that context, is it safe for the web worker to render to it?

When a Web Worker is created, you give it a URI pointing to a JavaScript file. It loads that JavaScript file in a new OS-level thread. As of this writing you can't control affinity to specific cores or thread priorities, but the underlying thread is real and unfettered. By default, JavaScript running in a Web Worker's thread has NO access to the DOM: you are not given access to the window object or any DOM-related classes.
The semantics of the Web Worker thread make it nearly completely unmoored from the default DOM thread. First thing that's interesting is that the Web Worker can run in 100% CPU usage infinite loops without worrying about freezing up the UI. This means the dreaded "Warning: Unresponsive script" message box cannot be triggered by a Web Worker!
The tradeoff of being unmoored is that your ability to synchronize and communicate between the DOM and worker threads is limited. The explicit conduit between worker and DOM is the postMessage() API for sending data and the onmessage event for receiving it. You can call postMessage() with strings and objects, in which case your data is cloned from the source thread's heap into the target thread's heap. onmessage events are only received by your Web Worker when it is idle. This means that, in order for onmessage events to be delivered from the DOM to your Web Worker in a timely fashion, the worker must yield frequently; this may put a wrinkle in the way you want to write your code.
It's important to understand that there is a special class of "Transferable Objects" in modern JavaScript implementations where objects that you send to postMessage() are not cloned, but rather ownership of the object is transferred from one thread to another. These are the types of data you want to send to postMessage() whenever possible; any time data is cloned when calling postMessage(), you create TONS of garbage to be GC'd and system performance will suffer.
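The difference between cloning and transferring can be sketched as follows. This assumes Node's worker_threads MessageChannel as a stand-in for the channel between a page and a browser worker; the second argument to postMessage() is the transfer list, and the variable names are made up for the example.

```javascript
const { MessageChannel, receiveMessageOnPort } = require("node:worker_threads");

const { port1, port2 } = new MessageChannel();

// Case 1: no transfer list, so the buffer is cloned.
// The sender keeps its own, still usable, copy.
const clonedSource = new Uint8Array([1, 2, 3, 4]).buffer;
port1.postMessage(clonedSource);
const clonedCopy = receiveMessageOnPort(port2).message;
const lengthAfterClone = clonedSource.byteLength;

// Case 2: the buffer is named in the transfer list, so ownership moves.
// The sender's buffer is detached and reports a length of zero.
const transferredSource = new Uint8Array([5, 6, 7, 8]).buffer;
port1.postMessage(transferredSource, [transferredSource]);
const transferredCopy = receiveMessageOnPort(port2).message;
const lengthAfterTransfer = transferredSource.byteLength;

console.log({ lengthAfterClone, lengthAfterTransfer });

port1.close();
port2.close();
```

After a transfer the sender can no longer read the data, which is the price of avoiding the copy.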
The collection of Transferable Object types out there has been steadily growing, and the Mozilla Developer Network, WHATWG, and W3C are great places to watch the spec research in this area. I couldn't even tell you all of the things that can be transferred across threads in the browser nowadays, but if I made you a comprehensive list, it'd likely be out of date in a year or less.
Regarding your original question, on Firefox 44+, you can now partially transfer a Canvas to a WebWorker via the HTMLCanvasElement#transferControlToOffscreen function. transferControlToOffscreen creates a Transferable OffscreenCanvas object that you can postMessage over to a Web Worker. On the Web Worker, you can acquire a webgl CanvasContext and issue drawing commands to the canvas from the worker thread without having direct access to the actual canvas tag's DOM that still lives over with the DOM thread.
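The hand-off described above could look roughly like this. It is a sketch only: the two function names are made up for the example, and it assumes a browser that supports transferControlToOffscreen and WebGL inside workers.

```javascript
// Main thread: hand rendering control of a <canvas> element to a worker.
function transferCanvasToWorker(canvas, worker) {
  const offscreen = canvas.transferControlToOffscreen();
  // OffscreenCanvas is transferable, so it is also named in the transfer list.
  worker.postMessage({ canvas: offscreen }, [offscreen]);
  return offscreen;
}

// Worker side: get a WebGL context from the received canvas and clear it.
// Returns false when no WebGL context could be created.
function handleCanvasMessage(event) {
  const gl = event.data.canvas.getContext("webgl");
  if (!gl) return false;
  gl.clearColor(0.0, 0.0, 0.0, 1.0); // opaque black
  gl.clear(gl.COLOR_BUFFER_BIT);
  return true;
}
```

In a page, the first function would be called with the canvas element and a Worker instance, and the worker script would assign the second function to self.onmessage.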
https://developer.mozilla.org/en-US/docs/Web/API/HTMLCanvasElement/transferControlToOffscreen
https://hacks.mozilla.org/2016/01/webgl-off-the-main-thread/

This question has an answer that basically states you can't use webgl from a web worker as web workers don't have access to the DOM and you have to call getContext() on a canvas object to get the webgl context.

Is there a way to do multi-threaded coding in NodeJS?

Based on my understanding, only I/O in NodeJS is non-blocking. If we do, for example, lots of heavy math operations, other users cannot access the server until it's done.
I was wondering if there is a non-blocking way to do heavy computation in NodeJS? Just curious.

If you have long-running calculations that you want to do with Node, then you will need to start up a separate process to handle those calculations. Generally this would be done by creating some number of separate worker processes and passing the calculations off to them. By doing this, you keep the main Node event loop unblocked.
On the implementation side of things, you have two main options.
The first is to manually spin off child processes using Node's child_process API functions. This one is nice because your calculations wouldn't even have to be JavaScript. The child process running the calculations could even be written in C or something.
Alternatively, the Web Worker specification has several implementations available through NPM if you search for 'worker'. I can't speak to how well these work since I haven't used any, but they are probably the way to go.
Update
I'd go with option one now. The current child process APIs support sending messages and objects between processes easily in a worker-like way, so there is little reason to use a separate worker module.
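As a rough illustration of option one, here is a minimal sketch that sends a number to a child Node process over the built-in IPC channel and gets the result back as a message. To keep it in a single file it uses spawn with an "ipc" stdio slot and an inline child script; with a separate worker file, child_process.fork would be the more usual call. The function name, the message shape, and the naive Fibonacci workload are all made up for the example.

```javascript
const { spawn } = require("node:child_process");

// Source of the child program, kept inline so the sketch is one file.
// The child waits for one number, does the heavy work, replies, and exits.
const CHILD_SOURCE = `
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  process.once("message", (n) => {
    process.send({ input: n, result: fib(n) }, () => process.disconnect());
  });
`;

// Runs the calculation in a separate process and resolves with its reply.
// The parent's event loop stays free while the child is busy.
function computeInChild(n) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", CHILD_SOURCE], {
      stdio: ["ignore", "inherit", "inherit", "ipc"],
    });
    child.once("message", resolve);
    child.once("error", reject);
    child.once("exit", (code) => {
      // Has no effect if the promise was already resolved by a reply.
      if (code !== 0) reject(new Error(`child exited with code ${code}`));
    });
    child.send(n);
  });
}

computeInChild(20).then((reply) => console.log(reply.result));
```

A real server would probably keep a small pool of such children alive instead of starting a new process for every request.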

You can use Hook.io to run a separate node process for your heavy computation and communicate between the two. Hook.io is particularly useful because it has auto-healing meshes meaning that if one of your hooks (processes) crashes it can be restarted automatically.

Use multiple NodeJS instances and communicate over sockets.

Use multiple node instances and communicate over node-zeromq, HTTP, plain TCP sockets, IPC (e.g. unix domain sockets), JSON-RPC or other means. Or use the web workers API as suggested above.
The multiple instances approach has its advantages and disadvantages. The disadvantages are the burden of starting those instances and of implementing your own exchange protocols. The advantage is that scaling to many computers (as opposed to many cores/processors within a single computer) is possible.

I think this is way too late, but this is an awesome feature of Node.js that you have to know about.
So far, the only way to do multithreading is by spawning new processes.
But Node.js has a built-in messaging feature between forked Node processes.
http://nodejs.org/docs/latest/api/child_processes.html#child_process.fork
Great work from the devs: you can pass objects etc. as messages to your children and back.

You can use node cluster module.
https://nodejs.org/api/cluster.html

I would use JXCore - it's a polished nodejs based engine which runs your code but has several options including the multi threading you are searching for. Running this in production is a charm!
Project's source: https://github.com/jxcore/jxcore
Features include:
Support for core Node.JS features
Embeddable Interface
Publish to Mobile Platforms (Android, iOS ..)
Supports Multiple JavaScript Engines
Multi-threading Capabilities
Process Configuration & Monitor
In-memory File System
Application Packaging
Support for the latest JavaScript features (ES6, ASM.JS ...)
Support for Universal Windows Platform (uwp) api

multi threading using an iframe

I am trying to simulate multi threading using an iframe but I have come across a situation which I do not know if it actually utilizes the iframe process (thread) on its own.
For instance, if I call a method which lives inside an iframe, will it run on the thread created by the iframe or on the main parent window's thread?
If it is the latter, is it possible to change the scope so that the iframe calls the method (so that the program uses a different thread from that of the parent window)?
EDIT:
Maybe I should have been clearer about this, but I do not want to use WebWorkers simply because they would not give me access to the DOM elements.

If you want to run some background tasks just use WebWorkers.
Generally you don't need to multi thread js code. You should use event loops instead.

Take a look at Using web workers from the MDN docs.
The Worker interface spawns real OS-level threads, and concurrency can
cause interesting effects in your code if you aren't careful. However,
in the case of web workers, the carefully controlled communication
points with other threads means that it's actually very hard to cause
concurrency problems. There's no access to non-thread safe components
or the DOM and you have to pass specific data in and out of a thread
through serialized objects. So you have to work really hard to cause
problems in your code.
John Resig wrote Computing with JavaScript Web Workers back in 2009 on this topic. However, according to When can I use, there is no IE support until IE10, so it may not fit your needs.

Why doesn't Node.js have a native DOM?

When I discovered that Node.js was built using the V8 JavaScript engine, I thought:
Great, web scraping will be easier as the page
will be rendered like in the browser, with a
"native" DOM supporting XPath and any AJAX calls on
the page executed.
Why doesn't it have a native DOM when it uses the same JavaScript engine as Chrome?
Why doesn't it have a mode to run JavaScript in retrieved pages?
What am I not understanding about JavaScript engines vs the engine in a web browser?
Many thanks!

The DOM is the DOM, and the JavaScript implementation is simply a separate entity. The DOM represents a set of facilities that a web browser exposes to the JavaScript environment. There's no requirement however that any particular JavaScript runtime will have any facilities exposed via the global object.
What Node.js is is a stand-alone JavaScript environment completely independent of a web browser. There's no intrinsic link between web browsers and JavaScript; the DOM is not part of the JavaScript language or specification or anything.
I use the old Rhino Java-based JavaScript implementation in my Java-based web server. That environment also has nothing at all to do with any DOM. It's my own application that's responsible for populating the global object with facilities to do what I need it to be able to do, and it's not a DOM.
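The idea that the host application, not the language, decides what the global object exposes can be sketched with Node's built-in vm module. This is only an illustration; readConfig is a hypothetical host-provided function invented for the example.

```javascript
const vm = require("node:vm");

// A bare context: the language built-ins are present,
// but there are no host objects such as `document`.
const bare = vm.createContext({});
const bareDocumentType = vm.runInContext("typeof document", bare);
const bareJsonParseType = vm.runInContext("typeof JSON.parse", bare);

// The embedder chooses what to put on the global object.
// `readConfig` is a hypothetical facility supplied by the host.
const hosted = vm.createContext({
  readConfig: (key) => `value-of-${key}`,
});
const hostedResult = vm.runInContext("readConfig('port')", hosted);

console.log(bareDocumentType, bareJsonParseType, hostedResult);
```

A browser does the same thing on a larger scale: it populates the global object with window, document, and the rest of the DOM before running page scripts.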
Note that there are projects like jsdom if you want a virtual DOM in your Node project. Because of its very nature as a server-side platform, a DOM is a facility that Node can do without and still make perfect sense for a wide variety of server applications. That's not to say that a DOM might not be useful to some people, but it's just not in the same category of services as things like process control, I/O, networking, database interop, and so on.
There may be some "official" answer to the question "why?" out there, but it's basically just the business of those who maintain Node (the Node Foundation now). If some intrepid developer out there decides that Node should ship by default with a set of modules to support a virtual DOM, and successfully works and works and makes that happen, then Node will have a DOM.

P.S: When reading this question I was also wondering if V8 (node.js is built on top of this) had a DOM
Why when it uses the same JS engine as Chrome doesn't it have a native
DOM?
But I searched google and found Google's V8 page which recites the following:
JavaScript is most commonly used for client-side scripting in a
browser, being used to manipulate Document Object Model (DOM) objects
for example. The DOM is not, however, typically provided by the
JavaScript engine but instead by a browser. The same is true of
V8—Google Chrome provides the DOM. V8 does however provide all the
data types, operators, objects and functions specified in the ECMA
standard.
node.js uses V8 and not Google Chrome.
Likewise, why doesn't it have a mode to run JS in retrieved pages?
I also think we don't really need it that bad. Ryan Dahl created node.js as one man (single programmer). Maybe now he (his team) will develop this, but I was already extremely amazed by the amount of code he produced (crazy). He wanted to make a non-blocking easy/efficient library, which I think he did a mighty good job at.
But then again, another developer created a module which is pretty good and actively developed (today) at https://github.com/tmpvar/jsdom.
What am I not understanding about Javascript engines vs the engine in
a web browser? :)
Those are different things as is hopefully clear from the quote above.

The Document Object Model (DOM in short) is a programming interface for HTML and XML documents and it represents the page so that programs can change the document structure, style, and content. More on this subject.
The necessary distinction between client-side (browser) and server-side (Node.js) and their main goals:
Client-side: accessing and displaying information of the web
Server-side: providing stable and reliable ways to deliver web information
Why is there no DOM in Node.js by default?
By default, Node.js doesn't have access, nor have any knowledge about the actual DOM in your own browser. Node.js just delivers the data, that will be used by your own browser to process and render the whole website, the DOM included. The server provides the data to your browser to use and process. That is the intended way.
Why wouldn't you want to access the DOM in Node.js?
Accessing your browser's actual DOM using Node.js would simply be outside the server's purpose. Your own browser's role is to display the data coming from the server. However, it is certainly possible, and there are multiple solutions of varying depth to pre-render, manipulate or change the DOM using AJAX calls. We'll see what future trends will bring.
Why would you want to access the DOM in Node.js?
By default, you shouldn't access your own, actual DOM (or at least some data from it) using Node.js. Client-side and server-side are separated in terms of role, functionality, and responsibility based on years of experience and knowledge. There are, however, several situations where there are solid reasons to do so:
Gathering usage data (A/B testing, UI/UX efficiency and feedback)
Headless testing (Development, automation, web-scraping)
How can you access the DOM in Node.js?
jsdom: pure-JavaScript implementation, good for testing your own DOM/browser-related project
cheerio: great solution if you like/often use jQuery
puppeteer: Google's own way to provide headless testing using Google Chrome
own solution (your possible future project link here)
Although these solutions do not provide a way to access your browser's own, actual DOM by default, you can create a project to send some form of data about your DOM to the server, then use/render/manipulate that data based on your needs.
...and yes, web-scraping and web development in terms of tools and utilities became more sophisticated and certainly easier in several fields.

node.js chose not to include it in their standard library. For any functionality, there is an inevitable tradeoff between comprehensiveness, scalability, and maintainability.
That doesn't mean it's not potentially useful. There is at least one JavaScript DOM implementation intended for NodeJS (among other CommonJS implementations).

You seem to have a flawed assumption that V8 and the DOM are inextricably related; that's not the case. The DOM is actually handled by WebKit. V8 doesn't handle the DOM; it handles JavaScript calls to the DOM. Don't let this discourage you: Node.js has carved out a significant niche in the realtime server market, but don't let anybody tell you it's just for servers. Node makes it possible to build almost anything with JavaScript.
It is possible to do what you're talking about. For example, there is the very good jsdom library if you really need access to the DOM, as well as node-htmlparser. There are also some really good scraping libraries that take advantage of these, like apricot.

2018 answer: mainly for historical reasons, but this may change in future.
Historically, very little DOM manipulation was done on the server. Additionally, as other answers allude to, the JS stdlib and the DOM are separate libraries: if you're using Node for, say, Unix scripting, then HTMLElement, NodeList, etc. aren't really relevant to that.
However: server-side DOM manipulation is now a very common part of delivering web apps. Web servers need to understand the structure of pages, and, if asked to render a resource as HTML, deliver HTML content that reflects the initial state of a web application. This means web apps load much faster than if the server simply delivers a stub page and has the browsers then do the work of filling in the real content. Currently this is done with JSDom and similar, but in the same way node has Request and Response objects built in, having DOM functions maintained as part of the stdlib would help with this task.

Javascript != browser. Javascript as a language is not tied to browsers; node.js is simply an implementation of Javascript that is intended for servers, not browsers. Hence no DOM.

If you read DOM as 'linked objects immediately accessible from my script', then the answer is 'it does, but it's very different from the set of objects available to a web document script'. The main reason is that Node is 'evented I/O for V8', not 'HTML tree objects for V8'.

Node is a runtime environment, it does not render a DOM like a browser.

Because there isn't a DOM. DOM stands for Document Object Model. There is no document in Node, so there is no DOM to manipulate. That is definitely a browser thing.
You can use a library like cheerio though which gives you some simple DOM manipulation.
Node is server-level JavaScript. It's just the language applied to a basic system API, more like C++ or Java.

It seems people have answered 'why' but not 'how'. A quick answer to 'how' is that in a web browser, a document object is exposed (hence DOM, Document Object Model). On Windows this object is called the document object. You can refer to this page and look at the methods it exposes for handling HTML documents, like createElement. I don't use node.js and haven't done COM programming in a while, but I'd imagine you could use the DOM in node.js by simply calling the COM object IHTMLDocument3. Of course, for other platforms like Mac OS X or Linux you would probably have to use something from their OS API. This should allow you to easily build a webpage server-side using the DOM, or to scrape incoming web pages.

Node.js is for serverside programming. There is no DOM to be rendered in the server.

1) What does it mean for it to have a Document Object Model? There's no document to represent.
2) Most of the time you're not retrieving pages. You can, but most Node apps probably won't be.
3) Without a document and a browser, JavaScript is just another programming language. So you may as well ask why there isn't a DOM in C# or Java.
