Convert between Blobs and ArrayBuffers without copying? - javascript

I can't find any performance info in the documentation about converting between ArrayBuffers and Blobs in Javascript. I believe the standard methods are:
const blob = new Blob([arrayBuffer])
and
const resp = new Response(blob)
const ab = await resp.arrayBuffer()
Note that I'm only concerned with Blobs that are in-memory, not File-based (so no I/O time is involved.) I did some timings in a codepen: https://codepen.io/oberbrunner/pen/ExjWLWY
which shows that both of those operations scale in time according to the size of the ArrayBuffer, so they are likely copying the data.
In fact, just creating a new Response(blob) takes longer with a bigger blob, if my timing code is accurate. The newer Blob.arrayBuffer() method also appears to be O(n), although it's faster than the Response route; it is not yet well supported by browsers as of 2020.
So my question is: is there authoritative documentation on the expected performance of these operations? Am I just using them wrong? Are there faster (constant-time, zero-copy) ways?
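For readers who want to repeat the measurement, here is a minimal sketch of a timing harness for the two Blob-to-ArrayBuffer routes mentioned above. It is not the code from the codepen; timeConversions is a hypothetical helper name, and the sketch assumes an environment with global Blob, Response and performance (a current browser or a recent Node release).

```javascript
// Time both Blob -> ArrayBuffer routes for a list of buffer sizes.
// timeConversions is a hypothetical helper, not part of any standard API.
async function timeConversions(sizes) {
  const rows = [];
  for (const size of sizes) {
    const blob = new Blob([new ArrayBuffer(size)]);

    // Route 1: wrap the blob in a Response and read it back.
    let start = performance.now();
    const viaResponse = await new Response(blob).arrayBuffer();
    const responseMs = performance.now() - start;

    // Route 2: the newer Blob.arrayBuffer() method.
    start = performance.now();
    const viaBlob = await blob.arrayBuffer();
    const blobMs = performance.now() - start;

    rows.push({
      size,
      responseMs,
      blobMs,
      // Sanity check that both routes returned the full payload.
      bytesMatch: viaResponse.byteLength === size && viaBlob.byteLength === size,
    });
  }
  return rows;
}
```

Calling timeConversions([1 << 20, 1 << 24, 1 << 26]).then(console.table) prints one row per size. Single runs are noisy, so repeat the measurement a few times before reading anything into the numbers.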

At the risk of answering an older question, for posterity -- the authoritative documentation you are looking for would be the [relevant sections of] File API and ECMAScript-2021 specifications for Blob and ArrayBuffer respectively.
As far as I have been able to determine, neither specification mandates a particular allocation scheme for either class of object. ECMAScript's specification of ArrayBuffer may appear more suggestive of a particular allocation mechanism, but only within the scope of the specification itself; I cannot find anything like "you must allocate the data on the heap". The specification of Blob is purposefully even more abstract and vague about where and how the data comprising the "blob" is actually allocated, if at all.
Most user agents, guided in part by these specifications, will tend to naively allocate an ArrayBuffer's data on the heap at the corresponding length, whereas Blob objects may be backed more efficiently, for example by memory-mapped files (in the case of File blobs), which on some operating systems means they are backed by the page file: no RAM is reserved at all until some operation on the blob requires it (such as converting it to an ArrayBuffer).
So a conversion from a Blob to an ArrayBuffer, while technically not specified to be O(n), would be so in most user agents, since the latter class facilitates actual immediate read and write data access (by element index etc), while Blob does not allow any immediate data access itself.
Now, I said "most user agents", because technically you could design a very elaborate data access mechanism with, say, copy-on-write semantics, where no data is allocated when obtaining an ArrayBuffer from a Blob (through any of the methods you described in your question or the more modern APIs like Blob's arrayBuffer method) until the corresponding data needs to be read and/or modified.
In other words, all your benchmarks only pertain to concrete implementations (read: the user agents you have tried), which are free to implement either class and the operations related to it however they see fit, as long as they do not violate the corresponding specifications.
However, if enough people start to employ aggressive conversion of blobs back and forth, there isn't anything stopping vendors like Mozilla or Google from optimizing their implementations into something like what is described above (copy-on-write), which may or may not make them into O(log(n)) operations, for instance. After all, JavaScript was for a very long time called an "interpreted" language -- but today calling it an interpreted language would be a bit of a misnomer, what with Chrome's V8 and Firefox's SpiderMonkey compiling native code out of it, in the name of optimization. And, again, they are free to do so -- no applicable specification of the language or a related environment mandates it be "interpreted". Same with blobs and array buffers.
Unfortunately, in practice, this means we have to live with what actually runs our programs that use blobs and array buffers -- which incur a real O(n) cost when you need to do something useful with a blob, normally.
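One observable consequence can be checked directly: whatever an engine does internally, the ArrayBuffer that comes back from a Blob has to behave as independent storage. The following is a minimal sketch, assuming an environment with a global Blob that supports arrayBuffer(); roundTrip and demo are hypothetical names used only for illustration.

```javascript
// Round-trip an ArrayBuffer through a Blob.
async function roundTrip(sourceBuffer) {
  const blob = new Blob([sourceBuffer]);
  return blob.arrayBuffer();
}

// Mutate the round-tripped bytes and report whether the source changed.
async function demo() {
  const source = new Uint8Array([1, 2, 3, 4]);
  const result = new Uint8Array(await roundTrip(source.buffer));
  result[0] = 99; // write to the buffer obtained from the blob
  return {
    sourceFirstByte: source[0],   // still the original value
    resultFirstByte: result[0],   // the value just written
    sameBuffer: result.buffer === source.buffer,
  };
}
```

This only shows that the two buffers are logically separate. It says nothing about when, or whether, an engine physically copies the bytes, which is exactly the freedom the specifications leave open.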

Related

Which one is preferable: Buffer.from() or TextEncoder.encode()?

From my understanding and API docs, in Node the following are equivalent and return an Uint8Array:
Buffer.from(someString, 'utf-8')
(new TextEncoder()).encode(someString)
Is either of those on the way of becoming deprecated? Does someone know of any considerations that make either Buffer or TextEncoder/TextDecoder preferable over the other, if all that’s needed is converting UTF-8 strings to and from Uint8Arrays?
From my understanding, Buffer is Node’s original implementation of binary blobs, from before an equivalent feature made its way into browser JS runtimes.
After browsers went with a different API, the Node runtime incorporated that as well (which makes sense from a code portability standpoint), and preserved the original Buffer support.
As a result, in Node there are multiple ways of achieving roughly similar results when it comes to binary blobs, where some ways will also work in browsers while others won’t. Buffer.from()/TextEncoder.encode() might be one such pair.
I’m not sure if there’s any performance gain to be had by choosing “Node classic” Buffer API over browser-compatible TextEncoder.
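As a quick check of the equivalence claimed in the question, here is a minimal sketch that encodes the same string both ways in Node and compares the results. It assumes Node (Buffer is not available in browsers without a polyfill), and the sample string is arbitrary.

```javascript
// Encode the same string with both APIs (Node only, because of Buffer).
const text = 'h\u00e9llo'; // contains one non-ASCII character

const viaEncoder = new TextEncoder().encode(text);
const viaBuffer = Buffer.from(text, 'utf-8');

// Both hold the same UTF-8 bytes.
const sameBytes =
  viaEncoder.length === viaBuffer.length &&
  viaEncoder.every((byte, i) => byte === viaBuffer[i]);

// Buffer is a subclass of Uint8Array, so it works with TextDecoder too.
const bufferIsUint8Array = viaBuffer instanceof Uint8Array;
const decoded = new TextDecoder().decode(viaBuffer);
```

Since the bytes are identical, the choice mostly comes down to portability: the TextEncoder/TextDecoder version runs unchanged in browsers.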

Is there a way to get the address of a variable or object in javascript? [duplicate]

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.
If it were possible at all, it would be very dependent on the JavaScript engine. More modern JavaScript engines compile their code using a just-in-time compiler, and messing with their internal variables would be bad for either performance or stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?
It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
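The difference between reassigning a parameter and mutating the object it refers to can be seen in a few lines. This is a minimal sketch with hypothetical function names:

```javascript
// Reassigning the parameter only rebinds the local name.
function reassign(obj) {
  obj = { value: 'replaced' };
}

// Mutating a property changes the object the caller also refers to.
function mutate(obj) {
  obj.value = 'mutated';
}

const original = { value: 'initial' };

reassign(original);
const afterReassign = original.value; // 'initial'

mutate(original);
const afterMutate = original.value; // 'mutated'
```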
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.
You can, using a side channel, but you can't do anything useful with it other than attacking browser security!
The closest thing to virtual addresses are ArrayBuffers. If one virtual address within an ArrayBuffer is identified, the remaining addresses are also known, as both the memory addresses and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate a virtual address into a physical memory address.
Browser engines always allocate ArrayBuffers page-aligned. The first byte of the ArrayBuffer is therefore at the beginning of a new physical page and has its 12 least significant bits set to ‘0’.
If a large chunk of memory is allocated, browser engines typically use mmap to allocate it, which is optimized to allocate 2 MB transparent huge pages (THP) instead of 4 KB pages.
As these physical pages are mapped on demand, i.e., as soon as the first access to the page occurs, iterating over the array indices results in page faults at the beginning of a new page. The time to resolve a page fault is significantly higher than a normal memory access. Thus, you can learn the index at which a new 2 MB page starts. At this array index, the underlying physical page has its 21 least significant bits set to ‘0’.
This answer is not trying to provide a proof of concept because I don’t have time for this, but I may be able to do so in the future. This answer is an attempt to point the right direction to the person asking the question.
Sources,
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf
I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

Growable 8-bit byte buffer in JavaScript

I am looking for a type that would allow me to manipulate growable byte buffers in JavaScript; essentially, an equivalent to the Python bytearray type. My criteria for a solution are that the buffer should be:
Growable: the buffer must support resizing; for this reason ArrayBuffer/Uint8Array (on their own, at least) will not do.
Efficient: I should have a reasonable expectation that using the buffer does not generate unreasonable amounts of overhead even in (relatively) naïve engines; for this reason, an Array of numbers will not do.
Portable: this needs to run in a browser, so nothing specific to node.js; though an ideal solution would only rely directly on APIs specified in ECMA and WHATWG standards.
Is there an 8-bit byte buffer type that fulfils these criteria?
There is a ResizableArrayBuffer specification under development. As of this writing, it is at stage 2, so it has no major implementations. For now, Arrays will have to do.
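One workaround that stays within the stated criteria, though it is not what the answer above proposes, is to wrap a Uint8Array and reallocate with capacity doubling when it fills up. The following is a minimal sketch of that idea; GrowableBytes and its method names are hypothetical, not part of any standard.

```javascript
// A growable byte buffer backed by a Uint8Array (hypothetical class).
class GrowableBytes {
  constructor(initialCapacity = 16) {
    this.bytes = new Uint8Array(initialCapacity); // backing store
    this.length = 0;                              // bytes actually in use
  }

  // Grow the backing store (doubling) until it can hold `needed` bytes.
  ensureCapacity(needed) {
    if (needed <= this.bytes.length) return;
    let capacity = Math.max(this.bytes.length, 1);
    while (capacity < needed) capacity *= 2;
    const grown = new Uint8Array(capacity);
    grown.set(this.bytes.subarray(0, this.length));
    this.bytes = grown;
  }

  // Append a single byte.
  push(byte) {
    this.ensureCapacity(this.length + 1);
    this.bytes[this.length++] = byte;
  }

  // Append all bytes from a Uint8Array or array of numbers.
  append(chunk) {
    this.ensureCapacity(this.length + chunk.length);
    this.bytes.set(chunk, this.length);
    this.length += chunk.length;
  }

  // Return a copy trimmed to the bytes in use.
  toUint8Array() {
    return this.bytes.slice(0, this.length);
  }
}
```

Doubling keeps appends amortized constant time, at the cost of one copy per growth step and up to twice the memory actually needed.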

Web Workers - If objects are passed by value is the memory usage doubled

Let's say I'm doing some image processing whereby I get the pixel data from an HTML5 canvas using .toDataURL().
If the outputted array takes up 100MB of memory and I then pass it to a webworker to do some computation, since the object is passed to the worker by value will I now have two copies of this array, thus taking up 200MB of memory?
To avoid doubling the memory usage, use the Transferable Object based APIs. The MDN docs mention this is a zero-copy operation, so you can expect it will not double your memory usage.
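Since it sounds like you're dealing with an array already, this is probably a small change, assuming it is a typed array: pass the array as the message, and list its underlying ArrayBuffer in the transfer list that postMessage takes as its second argument:
var theOutputtedArray = ...;
// The second argument must be an array of transferable objects (here, the ArrayBuffer).
worker.postMessage(theOutputtedArray, [theOutputtedArray.buffer]);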
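The effect of a transfer is visible on the sending side: the buffer is detached and the sender can no longer read it. Here is a minimal sketch that uses a MessageChannel in place of a real worker, since a port's postMessage takes the same message and transfer-list arguments; the pixels array is a stand-in for real image data.

```javascript
// Stand-in for image data; a real canvas buffer would be much larger.
const pixels = new Uint8Array(1024);

// A MessageChannel port accepts the same (message, transferList) arguments
// as Worker.postMessage, so it can stand in for a worker here.
const { port1, port2 } = new MessageChannel();

const lengthBefore = pixels.byteLength;

// Transfer ownership of the underlying ArrayBuffer instead of copying it.
port1.postMessage(pixels, [pixels.buffer]);

// The sender's view is now detached and reports zero bytes.
const lengthAfter = pixels.byteLength;

port1.close();
port2.close();
```

Because the sender loses access after the transfer, the worker has to post the buffer back (again with a transfer list) if the main thread needs the data afterwards.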
This older blog post from the Chrome developers (5 years ago) discusses how it differs from regular message passing, and gives some demo code.
This slightly newer blog post from a RedHat dev gives a very in-depth comparison of the performance & stability of three different ways of passing large objects between the main thread and worker threads.
Happy reading.
