Does V8 optimize ArrayBuffer like Uint*Array? - javascript

I have checked out "Where to use ArrayBuffer vs typed array in JavaScript?", but it doesn't say whether ArrayBuffer is optimized by V8 or not. So say you have different chunks of integers or floats inside an ArrayBuffer: will V8 optimize access to them the way it does for a Uint8Array, etc.?

V8 developer here. ArrayBuffers are just data containers; I don't see what there is to optimize about them. What kind of optimizations would you expect for "chunks of integers or floats"?
Typed arrays are views onto ArrayBuffers; the answer to the post you linked explains that nicely. Typed arrays provide index-based access to their elements (and V8's optimizing compiler has good support for such accesses); ArrayBuffers provide no way to access their elements (so the same optimizations do not apply).
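A minimal illustration of the difference (plain JavaScript, nothing V8-specific):

const buffer = new ArrayBuffer(16);  // 16 raw bytes; no way to read or write them directly
console.log(buffer[0]);              // undefined -- an ArrayBuffer has no indexed elements

const view = new Uint8Array(buffer); // a typed-array view interprets those bytes as elements
view[0] = 255;                       // indexed access -- this is what the optimizing compiler handles
console.log(view[0]);                // 255

// Several views can share the same buffer under different interpretations.
const floats = new Float32Array(buffer, 4, 3); // bytes 4..15 viewed as three 32-bit floats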

Related

Growable 8-bit byte buffer in JavaScript

I am looking for a type that would allow me to manipulate growable byte buffers in JavaScript; essentially, an equivalent of the Python bytearray type. My criteria for a solution are that the buffer should be:
Growable: the buffer must support resizing; for this reason ArrayBuffer/Uint8Array (on their own, at least) will not do.
Efficient: I should have a reasonable expectation that using the buffer does not generate unreasonable amounts of overhead even in (relatively) naïve engines; for this reason, an Array of numbers will not do.
Portable: this needs to run in a browser, so nothing specific to node.js; though an ideal solution would only rely directly on APIs specified in ECMA and WHATWG standards.
Is there an 8-bit byte buffer type that fulfils these criteria?
There is a resizable ArrayBuffer proposal under development. As of this writing it is at stage 2, so it has no major implementations. For now, plain Arrays will have to do.
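In the meantime, a common workaround is to wrap a Uint8Array and grow it by doubling, amortizing the copies. A minimal sketch (the class and method names here are illustrative, not a standard API):

// Growable byte buffer backed by a Uint8Array (illustrative sketch).
class ByteArray {
  constructor(initialCapacity = 16) {
    this.bytes = new Uint8Array(initialCapacity);
    this.length = 0;
  }
  push(byte) {
    if (this.length === this.bytes.length) {
      // Double the capacity and copy -- amortized O(1) per push.
      const bigger = new Uint8Array(this.bytes.length * 2);
      bigger.set(this.bytes);
      this.bytes = bigger;
    }
    this.bytes[this.length++] = byte;
  }
  // Return a view of just the bytes written so far (no copy).
  toUint8Array() {
    return this.bytes.subarray(0, this.length);
  }
}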

Convert between Blobs and ArrayBuffers without copying?

I can't find any performance information in the documentation about converting between ArrayBuffers and Blobs in JavaScript. I believe the standard methods are:
const blob = new Blob([arrayBuffer])
and
const resp = new Response(blob)
const ab = await resp.arrayBuffer()
Note that I'm only concerned with Blobs that are in-memory, not File-based (so no I/O time is involved). I did some timings in a codepen: https://codepen.io/oberbrunner/pen/ExjWLWY
which shows that both of those operations scale in time according to the size of the ArrayBuffer, so they are likely copying the data.
In fact, just creating a new Response(blob) takes longer with a bigger blob, if my timing code is accurate. Also, the newer Blob.prototype.arrayBuffer() method appears to be O(n) as well, although it's faster than the Response route; it was not yet well supported by browsers in 2020.
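For reference, that newer call is just:
const ab2 = await blob.arrayBuffer(); // promise-based, but still appears to scale with blob size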
So my question is: is there authoritative documentation on the expected performance of these operations? Am I just using them wrong? Are there faster (constant-time, zero-copy) ways?
At the risk of answering an older question, for posterity -- the authoritative documentation you are looking for would be the [relevant sections of] File API and ECMAScript-2021 specifications for Blob and ArrayBuffer respectively.
As far as I have been able to determine, neither specification mandates a particular allocation scheme for either class of object. The ECMAScript specification of ArrayBuffer may appear more suggestive of a particular data allocation mechanism, but only within the scope of the specification itself -- I cannot find anything like "you must allocate the data on the heap". The specification of Blob is purposefully even more abstract and vague with regard to where and how the data comprising the blob is actually allocated, if at all.
Most user agents, guided in part by these specifications, fall somewhere in between: an ArrayBuffer is typically backed by a naive heap allocation of the corresponding length, while a constructed blob may be backed more efficiently, for instance by a memory-mapped file (in the case of File-backed blobs), in which case it is essentially backed by the page file on some operating systems -- meaning no RAM is reserved at all until some operation on the blob requires it (such as converting it to an ArrayBuffer).
So a conversion from a Blob to an ArrayBuffer, while technically not specified to be O(n), will be O(n) in most user agents, since an ArrayBuffer must support immediate read and write access to its data (by element index through a view, and so on), while a Blob allows no immediate data access at all.
Now, I said "most user agents" because technically you could design an elaborate data access mechanism with copy-on-write semantics in which no data is copied when obtaining an ArrayBuffer from a Blob (through any of the methods you described in your question, or more modern APIs like Blob's arrayBuffer method) until the corresponding data actually needs to be read and/or modified.
In other words, all your benchmarks pertain only to concrete implementations (read: the user agents you have tried), which are free to implement either class, and the operations relating them, however they see fit, as long as they do not violate their corresponding specifications.
However, if enough people start aggressively converting blobs back and forth, there is nothing stopping vendors like Mozilla or Google from optimizing their implementations toward something like the copy-on-write scheme described above, which may or may not make these into O(log(n)) operations, for instance. After all, JavaScript was for a very long time called an "interpreted" language, but calling it that today would be a bit of a misnomer, with Chrome's V8 and Firefox's SpiderMonkey compiling it to native code in the name of optimization. And, again, they are free to do so -- no applicable specification of the language or a related environment mandates that it be "interpreted". The same goes for blobs and array buffers.
Unfortunately, in practice, this means we have to live with whatever actually runs our programs that use blobs and array buffers -- and that normally means incurring a real O(n) cost whenever you need to do something useful with a blob.

What is the best way to compile JavaScript-like structures to static, fast C++?

While developing a compiler from a language very similar to JavaScript to C++, I need a way to represent data structures. JavaScript's main data structures are arrays and hash tables. Arrays are more straightforward: I can use a vector of untyped pointers. It needs to be a vector because JS arrays are dynamic, and of pointers because JS arrays can hold any kind of object, for example:
var array = [1,2,[3,4],"test"];
I can't see a way to represent this other than that (is there one?). For the hashes, I could use something similar, except with a string-hashing step on every access.
The problem is that JavaScript engines JIT-compile hashes (objects) into something like real C++ objects with fixed layouts, which are probably much faster than hash tables. So I'm afraid my attempt to generate C++ like that will actually result in slower code than the JavaScript version!
Does that make sense?
What would be the best approach to my compiler?
If this is an AOT compiler, you can obviously only handle hash keys that are visible at compile time. In that case you can turn hash accesses with known keys into array accesses, giving each known key a small integer as an index.
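A sketch of that idea, written in JavaScript for illustration (the names are made up; a real compiler would do this over its AST):

// Give each statically-known key a small integer slot.
const slots = new Map();
function slotFor(key) {
  if (!slots.has(key)) slots.set(key, slots.size);
  return slots.get(key);
}

// At compile time, obj.x and obj.y become fixed indices...
const X = slotFor("x"); // 0
const Y = slotFor("y"); // 1

// ...so the generated C++ can index a plain vector instead of hashing:
//   fields[X] = 1;      // was: obj.x = 1
//   return fields[Y];   // was: return obj.y
// Only keys computed at runtime still need a real hash table.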

When to use Float32Array instead of Array in JavaScript

When does it make sense to use a Float32Array instead of a standard JavaScript Array for browser applications?
This performance test shows Float32Array to be, in general, slower; and if I understand correctly, a standard Array stores numbers as 64-bit floats, so there is no advantage in precision.
Aside from any possible performance hit, Float32Array also has the disadvantage of readability - having to use a constructor:
a = new Float32Array(2);
a[0] = 3.5;
a[1] = 4.5;
instead of an array literal:
a = [3.5, 4.5];
I'm asking this because I'm using the library glMatrix, which defaults to Float32Array, and wondering if there's any reason I shouldn't force it to use Array instead, which would allow me to use array literals.
I emailed the developer of glMatrix and my answer below includes his comments (points 2 & 3):
Creating a new object is generally quicker with Array than Float32Array. The gain is significant for small arrays, but is less (environment dependent) with larger arrays.
Accessing data from a TypedArray (eg. Float32Array) is often faster than from a normal array, which means that most array operations (aside from creating a new object) are faster with TypedArrays.
As also stated by @emidander, glMatrix was developed primarily for WebGL, which requires that vectors and matrices be passed as Float32Array. So, for a WebGL application, the potentially costly conversion from Array to Float32Array would need to be included in any performance measurement.
So, not surprisingly, the best choice is application dependent:
If arrays are generally small, and/or number of operations on them is low so that the constructor time is a significant proportion of the array's lifespan, use Array.
If code readability is as important as performance, then use Array (i.e. use [], instead of a constructor).
If arrays are very large and/or are used for many operations, then use a TypedArray.
For WebGL applications (or other applications that would otherwise require a type conversion), use Float32Array (or other TypedArray).
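A rough way to check these trade-offs in your own target environment (a micro-benchmark sketch; results vary considerably by engine and array size):

const N = 1000000;

console.time("Array: create + fill");
const plain = [];
for (let i = 0; i < N; i++) plain[i] = i * 0.5;
console.timeEnd("Array: create + fill");

console.time("Float32Array: create + fill");
const typed = new Float32Array(N);
for (let i = 0; i < N; i++) typed[i] = i * 0.5;
console.timeEnd("Float32Array: create + fill");

// Access-heavy workload: sum all elements.
console.time("Array: sum");
let s1 = 0;
for (let i = 0; i < N; i++) s1 += plain[i];
console.timeEnd("Array: sum");

console.time("Float32Array: sum");
let s2 = 0;
for (let i = 0; i < N; i++) s2 += typed[i];
console.timeEnd("Float32Array: sum");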
I would assume that the glMatrix library uses Float32Array because it is primarily used in WebGL-applications, where matrices are represented as Float32Arrays (http://www.khronos.org/registry/webgl/specs/1.0/#5.14.10).
In today's browser implementations, using Float32Array hurts both writability and performance compared with vanilla Arrays. It seems that even the gl-matrix authors agree that the library needs to be refactored to remove the Float32Array dependency: https://github.com/toji/gl-matrix/issues/359

Does node.js provide a real array implementation?

I am using node.js as my server platform and I need to process a non-sparse array of 65,000 items.
JavaScript arrays are not true arrays but actually hashes. Index access involves converting the index to a string and then doing a hash lookup (see the Arrays section in http://www.crockford.com/javascript/survey.html).
So, my question is this: does node.js implement a real array? One where resizing and deleting items have a cost, but which provides true random access without any index-to-string-then-hash-lookup?
Thanks.
EDIT
I may be asking for too much, but my array stores JavaScript objects, not numbers, and I cannot break it into many typed arrays, each holding number primitives or strings, because the objects have nested sub-objects. Trying to use typed arrays would result in unmaintainable code.
EDIT2
I must be missing something. Why does it have to be all or nothing? Either true JavaScript with no true arrays, or a C-style extension with no JavaScript benefits. Does having a true array of JavaScript (untyped) objects contradict the nature of JavaScript in any way? Java and C# have List<Object>, which is essentially what I am looking for; C# comes even closer with List<DynamicObject>.
Node.js has the JavaScript typed arrays: Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array.
I think they are what you are asking for.
Node.js does offer a Buffer class that is probably what you're looking for:
A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap. A Buffer cannot be resized.
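For example (a short sketch; Buffer.alloc and the read/write helpers shown here are current Node APIs, newer than this answer):

const buf = Buffer.alloc(65000);  // zero-filled raw memory outside the V8 heap
buf[0] = 255;                     // direct byte access, no hash lookup
buf.writeUInt32LE(123456, 4);     // typed write at a byte offset
console.log(buf.readUInt32LE(4)); // 123456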
Not intrinsically, no.
However, depending on your level of expertise, you could write a "true" array extension using Node's C/C++ addon facility. See http://nodejs.org/api/addons.html
You want to use Low Level JavaScript (LLJS) to manipulate everything directly in C-style.
http://mbebenita.github.com/LLJS/
Notice that, according to the link above, an LLJS array is more like the array you are looking for (a true C-like array) than a JavaScript array.
There is an implementation of LLJS for Node.js available, so maybe you do not have to write your own node.js C extension. Perhaps this implementation will do the trick: https://github.com/mbebenita/LLJS
