Growable 8-bit byte buffer in JavaScript - javascript

I am looking for a type that would allow me to manipulate growable byte buffers in JavaScript; essentially, an equivalent to the Python bytearray type. My criteria for a solution are that the buffer should be:
Growable: the buffer must support resizing; for this reason ArrayBuffer/Uint8Array (on their own, at least) will not do.
Efficient: I should have a reasonable expectation that using the buffer does not generate unreasonable amounts of overhead even in (relatively) naïve engines; for this reason, an Array of numbers will not do.
Portable: this needs to run in a browser, so nothing specific to node.js; though an ideal solution would only rely directly on APIs specified in ECMA and WHATWG standards.
Is there an 8-bit byte buffer type that fulfils these criteria?

There is a resizable ArrayBuffer proposal under development. As of this writing, it is at stage 2 of the TC39 process, so it has no major implementations. For now, Arrays will have to do.
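In the meantime, a growable buffer can be hand-rolled on top of Uint8Array with amortized capacity doubling, much like Python's bytearray does internally. A minimal sketch (the class and method names here are my own, not a standard API):

```javascript
// Growable byte buffer over a plain Uint8Array.
// Appends are amortized O(1) thanks to capacity doubling.
class ByteBuffer {
  constructor(capacity = 16) {
    this.buf = new Uint8Array(capacity);
    this.length = 0;
  }
  push(...bytes) {
    this.ensure(this.length + bytes.length);
    this.buf.set(bytes, this.length);
    this.length += bytes.length;
  }
  ensure(needed) {
    if (needed <= this.buf.length) return;
    let cap = this.buf.length * 2;
    while (cap < needed) cap *= 2;
    const next = new Uint8Array(cap); // allocate larger backing store
    next.set(this.buf);               // copy old contents over
    this.buf = next;
  }
  // View of the filled region; no copy is made.
  bytes() {
    return this.buf.subarray(0, this.length);
  }
}
```

This relies only on ECMA-262 typed arrays, so it is portable across browsers and Node.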

Related

Which one is preferable: Buffer.from() or TextEncoder.encode()?

From my understanding and the API docs, in Node the following are equivalent and return a Uint8Array:
Buffer.from(someString, 'utf-8')
(new TextEncoder()).encode(someString)
Is either of those on the way to being deprecated? Does anyone know of any considerations that make either Buffer or TextEncoder/TextDecoder preferable over the other, if all that’s needed is converting UTF-8 strings to and from Uint8Arrays?
From my understanding, Buffer is Node’s original implementation of binary blobs, from before an equivalent feature made its way into browser JS runtimes.
After browsers went with a different API, the Node runtime incorporated that as well (which makes sense from a code-portability standpoint) and preserved the original Buffer support.
As a result, in Node there are multiple ways of achieving roughly similar results when it comes to binary blobs, where some ways will also work in the browser while others won’t. Buffer.from()/TextEncoder.encode() might be one of them.
I’m not sure if there’s any performance gain to be had by choosing the “Node classic” Buffer API over the browser-compatible TextEncoder.
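The two really do produce identical bytes, which can be checked directly. A quick sketch (Node-only, since it uses Buffer; TextEncoder/TextDecoder are the portable half):

```javascript
// Both calls produce the same UTF-8 bytes. Buffer is Node-specific;
// TextEncoder is a WHATWG standard available in browsers and Node alike.
const someString = 'héllo';

const viaBuffer = Buffer.from(someString, 'utf-8');      // Buffer (a Uint8Array subclass)
const viaEncoder = new TextEncoder().encode(someString); // plain Uint8Array

// Byte-for-byte identical:
console.log(Buffer.compare(viaBuffer, Buffer.from(viaEncoder)) === 0); // true

// And back to a string, portably:
console.log(new TextDecoder().decode(viaEncoder) === someString); // true
```

Note that TextEncoder always encodes UTF-8, whereas Buffer.from accepts other encodings; for the UTF-8 case they are interchangeable.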

Convert between Blobs and ArrayBuffers without copying?

I can't find any performance info in the documentation about converting between ArrayBuffers and Blobs in JavaScript. I believe the standard methods are:
const blob = new Blob([arrayBuffer])
and
const resp = new Response(blob)
const ab = await resp.arrayBuffer()
Note that I'm only concerned with Blobs that are in-memory, not File-based (so no I/O time is involved.) I did some timings in a codepen: https://codepen.io/oberbrunner/pen/ExjWLWY
which shows that both of those operations scale in time according to the size of the ArrayBuffer, so they are likely copying the data.
In fact, just creating a new Response(blob) takes longer with a bigger blob, if my timing code is accurate. The newer Blob.prototype.arrayBuffer() method also appears to be O(n), although it's faster than the Response route; it was not yet well supported by browsers in 2020.
So my question is: is there authoritative documentation on the expected performance of these operations? Am I just using them wrong? Are there faster (constant-time, zero-copy) ways?
At the risk of answering an older question, for posterity: the authoritative documentation you are looking for would be the relevant sections of the File API and ECMAScript 2021 specifications, for Blob and ArrayBuffer respectively.
As far as I have been able to determine, neither specification mandates a particular allocation scheme for either class of object. While the ECMAScript specification of ArrayBuffer may appear more suggestive of a particular allocation mechanism, that is only so within the scope of the specification itself; I cannot find anything like "you must allocate the data on the heap". The specification of Blob is purposefully even more abstract and vague with regard to where and how the data comprising the "blob" is actually allocated, if at all.
Most user agents, in part guided by these specifications, fall somewhere in between: an ArrayBuffer is typically backed by naively allocated heap memory of the corresponding length, while a constructed Blob may be efficiently backed by a memory-mapped file (in the case of File blobs, for instance), which on some operating systems means it is backed by the page file; no RAM is reserved at all until some operation on the blob (like converting it to an ArrayBuffer) requires it.
So a conversion from a Blob to an ArrayBuffer, while technically not specified to be O(n), will be so in most user agents, since the latter class facilitates actual immediate read and write access to the data (by element index, through typed-array views), while a Blob does not allow any immediate data access itself.
Now, I said "most user agents" because technically you could design a very elaborate data-access mechanism with copy-on-write semantics, where no data is allocated when obtaining an ArrayBuffer from a Blob (through any of the methods you described in your question, or more modern APIs like Blob's arrayBuffer method) until the corresponding data actually needs to be read and/or modified.
In other words, all your benchmarks only pertain to concrete implementations (read: the user agents you have tried), which are free to implement either class and its related operations however they see fit, as long as they do not violate their corresponding specifications.
However, if enough people start to employ aggressive conversion of blobs back and forth, there isn't anything stopping vendors like Mozilla or Google from optimizing their implementations into something like the copy-on-write scheme described above, which may or may not make these sub-O(n) operations in practice. After all, JavaScript was for a very long time called an "interpreted" language, but today that would be a bit of a misnomer, what with Chrome's V8 and Firefox's SpiderMonkey compiling it to native code in the name of optimization. And, again, they are free to do so; no applicable specification of the language or a related environment mandates that it be "interpreted". Same with blobs and array buffers.
Unfortunately, in practice, this means we have to live with whatever actually runs our programs that use blobs and array buffers, and that normally incurs a real O(n) cost whenever you need to do something useful with a blob.
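For reference, the round trip discussed above, written against the modern promise-based API (a sketch; it assumes an environment where Blob is available globally, i.e. browsers or Node 18+):

```javascript
// ArrayBuffer -> Blob -> ArrayBuffer. In current engines each direction
// still copies the data, consistent with the O(n) timings above.
async function roundTrip() {
  const src = new Uint8Array([1, 2, 3, 4]);
  const blob = new Blob([src]);         // typed array / ArrayBuffer -> Blob
  const ab = await blob.arrayBuffer();  // Blob -> ArrayBuffer
  return new Uint8Array(ab);
}
```

`roundTrip()` resolves to a Uint8Array holding the original bytes; the source buffer is left untouched, which is one reason implementations copy.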

If v8 optimizes ArrayBuffer like Uint*Array

I have checked out Where to use ArrayBuffer vs typed array in JavaScript? but it doesn't describe whether ArrayBuffer is optimized by V8 or not. So say you have different chunks of integers or floats in the ArrayBuffer; I'm wondering if they will be optimized by V8 as if they were a Uint8Array, etc.
V8 developer here. ArrayBuffers are just data containers; I don't see what you would optimize about them. What kind of optimizations would you expect for "chunks of integers or floats"?
Typed arrays are views onto ArrayBuffers; the answer to the post you linked explains that nicely. Typed arrays provide index-based access to their elements (and V8's optimizing compiler has good support for such accesses); ArrayBuffers provide no way to access their elements (so the same optimizations do not apply).
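A minimal illustration of that distinction: the ArrayBuffer itself exposes no element access at all, while any number of typed-array views over it do, and those indexed accesses are what the optimizing compiler handles well:

```javascript
const ab = new ArrayBuffer(8);        // 8 raw bytes; no way to read them directly
const bytes = new Uint8Array(ab);     // view: eight uint8 elements
const doubles = new Float64Array(ab); // view: one float64, over the same memory

// Indexed access always goes through a view, never the buffer itself:
bytes[7] = 1;

console.log(bytes.buffer === doubles.buffer); // true: both views share `ab`
```

Because both views alias the same storage, a write through one is visible through the other; the buffer is only the container.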

Better performant alternatives of JSON on mobile devices

I am building a webgl application. It requires deserialization of ~15MB of data (this is the size of a single object; I will have around 10 of those in my application), and the bigger portion (90%) of this data is a few arrays of floating-point numbers which need to be deserialized into Float32Arrays in JavaScript.
Currently I am using JSON. Since my data contains lots of repeating numbers it is highly compressible, and I am happy with the network performance. I am also happy with its performance on desktop. However, loading the data, deserializing it into plain JS arrays, and then converting them to Float32Arrays takes a lot of time on mobile devices.
I considered using protobuf, but I saw this on https://protobuffers.codeplex.com/
Protocol Buffers are not designed to handle large messages. If you are
dealing in messages larger than a megabyte each, it may be time to
consider an alternate strategy.
So what can I do to improve the performance of my application? What serialization/deserialization methods should I test?
Please walk me through this process and help me test my alternatives; I'll put more details if you ask anything in the comments section.
If your object is essentially one big array of floats, you could send the raw bytes instead of a JSON-encoded string.
XMLHttpRequest supports responseType = "arraybuffer". With that, your "parsing step" is reduced to var floats = new Float32Array(xhr.response).
It also reduces the memory impact of the task, because you don't need to keep a 15MB string, plus an intermediate array containing maybe 20MB of doubles, plus the resulting Float32Array containing another 10MB (half the size of the doubles), all alive at about the same time.
Instead you have one ArrayBuffer containing only the raw bytes, plus a Float32Array that references that data in memory.
If this doesn't work for you, maybe you could explain the nature/structure of the data that you send around, or the code you use in the backend, if the serialization is the problem.
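A sketch of that approach with the newer fetch API (the URL is a placeholder; this assumes the server writes the Float32Array's underlying bytes verbatim):

```javascript
// Client side: no JSON parse, no intermediate Array of numbers.
async function loadFloats(url) {
  const resp = await fetch(url);
  const ab = await resp.arrayBuffer(); // one ArrayBuffer with the raw bytes
  return new Float32Array(ab);         // view onto those bytes, no copy
}

// The same round trip, shown locally: these are the bytes the server
// would send, and the view the client would reconstruct from them.
const original = new Float32Array([1.5, -2.25, 3.75]);
const wireBytes = new Uint8Array(original.buffer);  // 12 raw bytes on the wire
const decoded = new Float32Array(wireBytes.buffer); // same values, zero copy
```

One caveat: `new Float32Array(buffer)` uses the platform's byte order. Practically all current client hardware is little-endian, but DataView is the strictly portable route if that matters.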

Accessing binary data from Javascript, Ajax, IE: can responseBody be read from Javascript (not VB)?

First of all, I am aware of this question:
How do I load binary image data using Javascript and XMLHttpRequest?
and specifically best answer therein, http://emilsblog.lerch.org/2009/07/javascript-hacks-using-xhr-to-load.html.
So accessing binary data from Javascript works with Firefox (and later versions of Chrome actually seem to work too; I don't know about Opera). So far so good.
But I am still hoping to find a way to access binary data with a modern IE (ideally IE 6, but at least IE 7+), without using VB.
It has been mentioned that XHR.messageBody would not work (if it contains zero bytes), but I was wondering if this might have been resolved with newer versions; or if there might be alternate settings that would allow simple binary data access.
Specific use case for me is that of accessing data returned by a web service that is encoded using a binary data transfer format (including byte combinations that are not legal in UTF-8 encoding).
It's possible with IE10, using responseType=arraybuffer or blob. You only had to wait for a few years...
http://msdn.microsoft.com/en-us/library/ie/br212474%28v=vs.94%29.aspx
http://msdn.microsoft.com/en-us/library/ie/hh673569%28v=vs.85%29.aspx
Ok, I have found some interesting leads, although not completely good solution yet.
One obvious thing I tried was to play with encodings. There are 2 obvious things that really should work:
Latin-1 (aka ISO-8859-1): it is a single-byte encoding, mapping one-to-one onto Unicode. So theoretically it should be enough to declare a content type of "text/plain; charset=ISO-8859-1" and get character-per-byte. Alas, due to the idiotic logic of browsers (and the even more idiotic mandate by HTML 5!), some transcoding occurs which changes the high control-character range (codes 128 - 159) in strange ways. Apparently this is due to the mandatory assumption that the encoding really is Windows-1252 (why? For some silly reason... but it is what it is)
UCS-2 is a fixed-length 2-byte encoding that predated UTF-16; it simply splits 16-bit character codes into 2 bytes. Alas, browsers do not seem to support it.
UTF-16 might work, theoretically, but there is the problem of surrogate pair characters (0xD800 - 0xDFFF) which are reserved. And if byte pairs that encode these characters are included, corruption occurs.
However: it seems the conversion for Latin-1 might be reversible, and if so, I bet I could make use of it after all. All mutations are from 1 byte (0x00 - 0xFF) into larger-than-byte values, and there are no ambiguous mappings, at least in Firefox. If this holds true for other browsers, it will be possible to map the values back and undo the ill effects of the automatic transcoding. And that would then work for multiple browsers, including IE (with the caveat of needing something special to deal with null values).
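That reverse mapping can be built mechanically rather than hard-coded, by asking a windows-1252 decoder what each of the 256 byte values turns into. A sketch (using the modern TextDecoder, which did not exist when this question was asked; per the WHATWG Encoding spec the windows-1252 mapping is unambiguous, so the round trip is lossless):

```javascript
// Build a map from the characters a windows-1252 decode produces
// back to the original byte values 0x00-0xFF.
const win1252 = new TextDecoder('windows-1252');
const charToByte = new Map();
for (let b = 0; b < 256; b++) {
  const ch = win1252.decode(new Uint8Array([b]));
  charToByte.set(ch.charCodeAt(0), b);
}

// Undo the browser's transcoding: recover the raw bytes from the string.
function stringToBytes(s) {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const b = charToByte.get(s.charCodeAt(i));
    if (b === undefined) throw new Error('not a windows-1252 decoded string');
    out[i] = b;
  }
  return out;
}
```

Since every one of the 256 byte values decodes to a distinct code point (the "undefined" slots like 0x81 pass through as C1 controls), the map has no collisions and recovery is exact.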
Finally, some useful links for conversions of datatypes are:
http://www.merlyn.demon.co.uk/js-exact.htm#IEEE (to handle floating points to/from binary IEEE representation)
http://jsfromhell.com/classes/binary-parser (for general parsing)
You can use the JScript "VBArray" object to get at these bytes in IE (without using VBScript):
var data = new VBArray(xhr.responseBody).toArray();
I guess the answer is a plain "no", as per this post: how do I access XHR responseBody (for binary data) from Javascript in IE?
(or: "use VBScript to help")
