When does it make sense to use a Float32Array instead of a standard JavaScript Array for browser applications?
This performance test shows Float32Array to be, in general, slower - and if I understand correctly a standard Array stores numbers as 64bit - so there is no advantage in precision.
Aside from any possible performance hit, Float32Array also has the disadvantage of readability - having to use a constructor:
a = new Float32Array(2);
a[0] = 3.5;
a[1] = 4.5;
instead an array literal
a = [3.5, 4.5];
I'm asking this because I'm using the library glMatrix which defaults to Float32Array - and wondering if there's any reason I shouldn't force it to use Array instead which will allow me to use array literals.
I emailed the developer of glMatrix and my answer below includes his comments (points 2 & 3):
Creating a new object is generally quicker with Array than Float32Array. The gain is significant for small arrays, but is less (environment dependent) with larger arrays.
Accessing data from a TypedArray (eg. Float32Array) is often faster than from a normal array, which means that most array operations (aside from creating a new object) are faster with TypedArrays.
As also stated by #emidander, glMatrix was developed primarily for WebGL, which requires that vectors and matrices be passed as Float32Array. So, for a WebGL application, the potentially costly conversion from Array to Float32Array would need to be included in any performance measurement.
So, not surprisingly, the best choice is application dependent:
If arrays are generally small, and/or number of operations on them is low so that the constructor time is a significant proportion of the array's lifespan, use Array.
If code readability is as important as performance, then use Array (i.e. use [], instead of a constructor).
If arrays are very large and/or are used for many operations, then use a TypedArray.
For WebGL applications (or other applications that would otherwise require a type conversion), use Float32Array (or other TypedArray).
I would assume that the glMatrix library uses Float32Array because it is primarily used in WebGL-applications, where matrices are represented as Float32Arrays (http://www.khronos.org/registry/webgl/specs/1.0/#5.14.10).
In today browsers implementation, using Float32Array has impact in both writibility and performance if compared against vanilla Arrays. It seems that even gl-matrix authors agreed that the library need to be refactored to remove Float32Array dependency: https://github.com/toji/gl-matrix/issues/359
Related
I have checked out Where to use ArrayBuffer vs typed array in JavaScript? but it doesn't describe if ArrayBuffer is optimized by v8 or not. So say you have in there different chunks of integers or floats in the ArrayBuffer, wondering if they will be optimized by v8 like they are a Uint8Array, etc.
V8 developer here. ArrayBuffers are just data containers, I don't see what you would optimize about them. What kind of optimizations would you expect for "chunks of integers or floats"?
Typed arrays are views onto ArrayBuffers; the answer to the post you linked explains that nicely. Typed arrays provide index-based access to their elements (and V8's optimizing compiler has good support for such accesses); ArrayBuffers provide no way to access their elements (so the same optimizations do not apply).
Javascript being a dynamic language, why is it mandatory to mention the size of the buffer when it is created?
var buffer = new Buffer(10);
I should think it's likely that Buffer instances use typed arrays behind the scenes for efficiency, or even low-level arrays (as Buffer is a native part of Node, which is written in C++, not JavaScript). Indeed, looking at node_buffer.cc, that appears to be the case. Typed arrays or low-level arrays are fixed-size, allocate-on-creation structures.
Side note: new Buffer(size) is deprecated; use Buffer.alloc instead.
From Node.js documentation :
Instances of the Buffer class are similar to arrays of integers but
correspond to fixed-sized, raw memory allocations outside the V8 heap.
The size of the Buffer is established when it is created and cannot be
resized.
Since arrays themselves need that their size be specified at initialization hence similarly for Buffer.
I have been messing with Canvas a lot lately, developing some ideas I have for a web-based game. As such I've recently run into Javascript Typed Arrays. I've done some reading for example at MDN and I just can't understand anything I'm finding. It seems most often, when someone is explaining Typed Arrays, they use analogies to other languages that are a little beyond my understanding.
My experience with "programming," if you can call it that (and not just front-end scripting), is pretty much limited to Javascript. I do feel as though I understand Javascript pretty well outside of this instance, however. I have deeply investigated and used the Object.prototype structure of Javascript, and more subtle factors such as variable referencing and the value of this, but when I look at any information I've found about Typed Arrays, I'm just lost.
With this frame-of-reference in mind, can you describe Typed Arrays in a simple, usable way? The most effective depicted use-case, for me, would be something to do with Canvas image data. Also, a well-commented Fiddle would be most appreciated.
In typed programming languages (to which JavaScript kinda belongs) we usually have variables of fixed declared type that can be dynamically assigned values.
With Typed Arrays it's quite the opposite.
You have a fixed chunk of data (represented by ArrayBuffer) that you do not access directly. Instead this data is accessed by views. Views are created at run time and they effectively declare some portion of the buffer to be of a certain type. These views are sub-classes of ArrayBufferView. The views define the certain continuous portion of this chunk of data as elements of an array of a certain type. Once the type is declared browser knows the length and content of each element, as well as a number of such elements. With this knowledge browsers can access individual elements much more efficiently.
So we dynamically assigning a type to a portion of what actually is just a buffer. We can assign multiple views to the same buffer.
From the Specs:
Multiple typed array views can refer to the same ArrayBuffer, of different types,
lengths, and offsets.
This allows for complex data structures to be built up in the ArrayBuffer.
As an example, given the following code:
// create an 8-byte ArrayBuffer
var b = new ArrayBuffer(8);
// create a view v1 referring to b, of type Int32, starting at
// the default byte index (0) and extending until the end of the buffer
var v1 = new Int32Array(b);
// create a view v2 referring to b, of type Uint8, starting at
// byte index 2 and extending until the end of the buffer
var v2 = new Uint8Array(b, 2);
// create a view v3 referring to b, of type Int16, starting at
// byte index 2 and having a length of 2
var v3 = new Int16Array(b, 2, 2);
The following buffer and view layout is created:
This defines an 8-byte buffer b, and three views of that buffer, v1,
v2, and v3. Each of the views refers to the same buffer -- so v1[0]
refers to bytes 0..3 as a signed 32-bit integer, v2[0] refers to byte
2 as a unsigned 8-bit integer, and v3[0] refers to bytes 2..3 as a
signed 16-bit integer. Any modification to one view is immediately
visible in the other: for example, after v2[0] = 0xff; v21 = 0xff;
then v3[0] == -1 (where -1 is represented as 0xffff).
So instead of declaring data structures and filling them with data, we take data and overlay it with different data types.
I spend all my time in javascript these days, but I'll take a stab at quick summary, since I've used typed arrays in other languages, like Java.
The closest thing I think you'll find in the way of comparison, when it comes to typed arrays, is a performance comparison. In my head, Typed Arrays enable compilers to make assumptions they can't normally make. If someone is optimizing things at the low level of a javascript engine like V8, those assumptions become valuable. If you can say, "Data will always be of size X," (or something similar), then you can, for instance, allocate memory more efficiently, which lets you (getting more jargon-y, now) reduce how many times you go to access memory and it's not in a CPU cache. Accessing CPU cache is much faster than having to go to RAM, I believe. When doing things at a large scale, those time savings add up quick.
If I were to do up a jsfiddle (no time, sorry), I'd be comparing the time it takes to perform certain operations on typed arrays vs non-typed arrays. For example, I imagine "adding 100,000 items" being a performance benchmark I'd try, to compare how the structures handle things.
What I can do is link you to: http://jsperf.com/typed-arrays-vs-arrays/7
All I did to get that was google "typed arrays javascript performance" and clicked the first item (I'm familiar with jsperf, too, so that helped me decide).
Take the following code example:
var myObject = {};
var i = 100;
while (i--) {
myObject["foo"+i] = new Foo(i);
}
console.log(myObject["foo42"].bar());
I have a few questions.
What kind of data structure do the major engines (IE, Mozilla, Chrome, Safari) use for storing key-value pairs? I'd hope it's some kind Binary Search tree, but I think they may use linked lists (due to the fact iterating is done in insertion order).
If they do use a search tree, is it self balancing? Because the above code with a conventional search tree will create an unbalanced tree, causing worst case scenario of O(n) for searching, rather than O(log n) for a balanced tree.
I'm only asking this because I will be writing a library which will require efficient retrieval of keys from a data structure, and while I could implement my own or an existing red-black tree I would rather use native object properties if they're efficient enough.
The question is hard to answer for a couple reasons. First, the modern browsers all heavily and dynamically optimize code while it is executing so the algorithms chosen to access the properties might be different for the same code. Second, each engine uses different algorithms and heuristics to determine which access algorithm to use. Third, the ECMA specification dictates what the result of must be, not how the result is achieved so the engines have a lot of freedom to innovate in this area.
That said, given your example all the engines I am familiar with will use some form of a hash table to retrieve the value associated with foo42 from myobject. If you use an object like an associative array JavaScript engines will tend to favor a hash table. None that I am aware of use a tree for string properties. Hash tables are worst case O(N), best case O(1) and tend to be closer to O(1) than O(N) if the key generator is any good. Each engine will have a pattern you could use to get it to perform O(N) but that will be different for each engine. A balanced tree would guarantee worst case O(log N) but modifying a balanced tree while keeping it balanced is not O(log N) and hash tables are more often better than O(log N) for string keys and are O(1) to update (once you determine you need to, which is the same big-O as read) if there is space in the table (periodically O(N) to rebuild the table but the tables usually double in space which means you will only pay O(N) 7 or 8 times for the life of the table).
Numeric properties are special, however. If you access an object using integer numeric properties that have few or no gaps in range, that is, use the object like it is an array, the values will tend to be stored in a linear block of memory with O(1) access. Even if your access has gaps the engines will probably shift to a sparse array access which will probably be, at worst, O(log N).
Accessing a property by identifier is also special. If you access the property like,
myObject.foo42
and execute this code often (that is, the speed of this matters) and with the same or similar object this is likely to be optimized into one or two machine instructions. What makes objects similar also differs for each engine but if they are constructed by the same literal or function they are more likely to be treated as similar.
No engine that does at all well on the JavaScript benchmarks will use the same algorithm for every object. They all must dynamically determine how the object is being used and try to adjust the access algorithm accordingly.
Arrays in JavaScript are very easy to modify by adding and removing items. It somewhat masks the fact that most languages arrays are fixed-size, and require complex operations to resize. It seems that JavaScript makes it easy to write poorly performing array code. This leads to the question:
What performance (in terms of big O time complexity) can I expect from JavaScript implementations in regards to array performance?
I assume that all reasonable JavaScript implementations have at most the following big O's.
Access - O(1)
Appending - O(n)
Prepending - O(n)
Insertion - O(n)
Deletion - O(n)
Swapping - O(1)
JavaScript lets you pre-fill an array to a certain size, using new Array(length) syntax. (Bonus question: Is creating an array in this manner O(1) or O(n)) This is more like a conventional array, and if used as a pre-sized array, can allow O(1) appending. If circular buffer logic is added, you can achieve O(1) prepending. If a dynamically expanding array is used, O(log n) will be the average case for both of those.
Can I expect better performance for some things than my assumptions here? I don't expect anything is outlined in any specifications, but in practice, it could be that all major implementations use optimized arrays behind the scenes. Are there dynamically expanding arrays or some other performance-boosting algorithms at work?
P.S.
The reason I'm wondering this is that I'm researching some sorting algorithms, most of which seem to assume appending and deleting are O(1) operations when describing their overall big O.
NOTE: While this answer was correct in 2012, engines use very different internal representations for both objects and arrays today. This answer may or may not be true.
In contrast to most languages, which implement arrays with, well, arrays, in Javascript Arrays are objects, and values are stored in a hashtable, just like regular object values. As such:
Access - O(1)
Appending - Amortized O(1) (sometimes resizing the hashtable is required; usually only insertion is required)
Prepending - O(n) via unshift, since it requires reassigning all the indexes
Insertion - Amortized O(1) if the value does not exist. O(n) if you want to shift existing values (Eg, using splice).
Deletion - Amortized O(1) to remove a value, O(n) if you want to reassign indices via splice.
Swapping - O(1)
In general, setting or unsetting any key in a dict is amortized O(1), and the same goes for arrays, regardless of what the index is. Any operation that requires renumbering existing values is O(n) simply because you have to update all the affected values.
guarantee
There is no specified time complexity guarantee for any array operation. How arrays perform depends on the underlying datastructure the engine chooses. Engines might also have different representations, and switch between them depending on certain heuristics. The initial array size might or might not be such an heuristic.
reality
For example, V8 uses (as of today) both hashtables and array lists to represent arrays. It also has various different representations for objects, so arrays and objects cannot be compared. Therefore array access is always better than O(n), and might even be as fast as a C++ array access. Appending is O(1), unless you reach the size of the datastructure and it has to be scaled (which is O(n)). Prepending is worse. Deletion can be even worse if you do something like delete array[index] (don't!), as that might force the engine to change its representation.
advice
Use arrays for numeric datastructures. That's what they are meant for. That's what engines will optimize them for. Avoid sparse arrays (or if you have to, expect worse performance). Avoid arrays with mixed datatypes (as that makes internal representations more complex).
If you really want to optimize for a certain engine (and version), check its sourcecode for the absolute answer.