V8: Heterogeneous Array Literals - javascript

I'm getting lost in the weeds with V8 source as well as articles on the subject and I came across a blog post which stated:
If you are forced to fill up an array with heterogeneous elements, let
V8 know early on by using an array literal especially with fixed-size
small arrays.
let array = [77, 88, 0.5, true]; //V8 knows to not allocate multiple times.
If this is true, then why is it true? Why an array literal? What's so special about that vs. creating an array via a constructor? Being new to the V8 source, it's difficult to track down where the difference between homogeneous and heterogeneous arrays lies.
Also, if an answerer can point me towards the relevant V8 source, that'd be appreciated.
EDIT: slight clarification on my question (array literal vs. array constructor)

From this blog post provided by Mathias, a V8 developer:
Common elements kinds
While running JavaScript code, V8 keeps track of what kind of elements
each array contains. This information allows V8 to optimize any
operations on the array specifically for this type of element. For
example, when you call reduce, map, or forEach on an array, V8 can
optimize those operations based on what kind of elements the array
contains.
Take this array, for example:
const array = [1, 2, 3];
What kinds of elements does it contain? If you’d ask the typeof
operator, it would tell you the array contains numbers. At the
language-level, that’s all you get: JavaScript doesn’t distinguish
between integers, floats, and doubles — they’re all just numbers.
However, at the engine level, we can make more precise distinctions.
The elements kind for this array is PACKED_SMI_ELEMENTS. In V8, the
term Smi refers to the particular format used to store small integers.
(We’ll get to the PACKED part in a minute.)
Later adding a floating-point number to the same array transitions it to a more generic elements kind:
const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS
Adding a string literal to the array changes its elements kind once again.
const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS
array.push('x');
// elements kind: PACKED_ELEMENTS
[…]
V8 assigns an elements kind to each array.
The elements kind of an array is not set in stone — it can change at runtime. In the earlier example, we transitioned from PACKED_SMI_ELEMENTS to PACKED_ELEMENTS.
Elements kind transitions can only go from specific kinds to more general kinds.
Thus, if you're constantly adding different types of data to an array at run time, the V8 engine has to keep transitioning its elements kind behind the scenes, losing the more specific optimizations.
As for constructor vs. array literal:
If you don’t know all the values ahead of time, create an array using the array literal, and later push the values to it:
const arr = [];
arr.push(10);
This approach ensures that the array never transitions to a holey elements kind. As a result, V8 can optimize any future operations on the array more efficiently.
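By contrast, here is a minimal sketch of what happens when you size the array up front with the constructor (elements kinds as described in the blog post quoted above):
// Literal plus push: the array stays packed throughout.
const a = [];
a.push(10);             // elements kind: PACKED_SMI_ELEMENTS

// Constructor with a length: the array starts out with holes.
const b = new Array(3); // elements kind: HOLEY_SMI_ELEMENTS
b[0] = 10;              // still holey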
Also, to clarify what is meant by holey,
Creating holes in the array (i.e. making the array sparse) downgrades
the elements kind to its “holey” variant. Once the array is marked as holey, it’s holey forever — even if it’s packed later!
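A short sketch of how an ordinary array becomes holey (again following the blog post's elements kinds):
const c = [1, 2, 3]; // elements kind: PACKED_SMI_ELEMENTS
c[5] = 6;            // indices 3 and 4 are now holes
// elements kind: HOLEY_SMI_ELEMENTS
c[3] = 4;
c[4] = 5;            // every slot is filled again, but the kind stays holey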
It might also be worth mentioning that V8 currently has 21 different element kinds.
More resources
V8 Internals for JavaScript Developers - a talk by Mathias Bynens
JavaScript Engines - How Do They Even? - a talk by Franziska Hinkelmann

Related

JavaScript Objects and Sorted Integer Keys under the hood

I have searched a bunch of other resources but have not been able to find a quality answer to this question.
JavaScript objects sort their integer keys in ascending order, not insertion order.
const lookup = {};
lookup['1'] = 1;
lookup['3'] = 3;
lookup['2'] = 2;
console.log(Object.keys(lookup)); // ['1', '2', '3']
That much is simple. But what is the big O notation of that internal sorting process? Some sort algorithm must be happening under the hood to sort those keys as they are inserted but I can't find out which one it is.
Array.sort() with a length of <= 10 is Insertion Sort and
Array.sort() with a length > 10 is Quick Sort
But Array.sort() is reordering an object's keys based on the sorting of its values.
How does JavaScript under the hood sort its keys on insertion?
(V8 developer here.)
It depends, as so often.
If the integer-keyed properties are sufficiently dense, the object will use an array under the hood to store them. Compared to sorting algorithms, that'd be closest to "radix sort", with the notable difference that there is no explicit sorting step: the sort order arises as a "free" side effect of the way elements are stored. When lookup[2] = ... is executed, the value will be written into the respective slot of the array. If the array isn't big enough, a new array is allocated, and existing entries are copied over; since that doesn't happen too often, the cost of an insertion is still "O(1) amortized". When getting the list of integer-keyed properties, then the array is already in sorted order.
If the integer-keyed properties are too sparse, the object will switch to using a dictionary as the backing store. Hash-based dictionaries store entries in their own "random" order, so in that case Object.keys() and similar operations actually have to perform an explicit sorting step. Looks like we're currently relying on C++'s std::sort for that, but that's very much an implementation detail that could change (not just on V8's side, also how std::sort is implemented depends on the standard library that V8 is linked against).
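Both cases can be observed from plain JavaScript, since integer-like keys are enumerated in ascending numeric order either way (a sketch; which backing store is chosen is, as noted, an implementation detail):
const sparse = {};
sparse[1000000] = 'a'; // sparse enough to likely use dictionary mode
sparse[5] = 'b';
console.log(Object.keys(sparse)); // ['5', '1000000'], sorted either way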
Array.sort() with a length of <= 10 is Insertion Sort and Array.sort() with a length > 10 is Quick Sort
No, not any more. We switched to TimSort in 2018.

How are untyped javascript arrays laid out in memory considering they're not homogeneous?

I'm curious about the performance characteristics of untyped JavaScript arrays, since they're not homogeneous, and was wondering how that is dealt with internally.
For example, if I have a number and some arbitrary object in an array, are they stored contiguously in memory? Are all primitives boxed and the array just contains pointers to everything? Is it an implementation detail of the VM?
It depends on the JavaScript engine implementation.
But in general in JavaScript arrays, integers and floats are stored by-value and all other objects by-reference.
In V8 the array type will be either PACKED_ELEMENTS or HOLEY_ELEMENTS (depending on how the array was created/populated) and each string will additionally be stored separately on the heap.
To verify, use the %DebugPrint function in a debug build of the V8 engine (you can get one using the jsvu tool):
d8> var a = [1, 2, 'aaa']; %DebugPrint(a);
DebugPrint: 000003B13FECFC89: [JSArray]
 - elements: 0x03b13fecfc31 <FixedArray[3]> {
           0: 1
           1: 2
           2: 0x00c73b3e0fe1 <String[#3]: aaa>
   }
Ryan Peden seems to have done some checking on all the juicy details (and fairly recently):
https://ryanpeden.com/how-do-javascript-arrays-work-under-the-hood/

How are the JavaScript Arrays internally resizing?

I've been trying to implement a collection type of class (similar to List found in C#) in JavaScript that has some custom functionalities. I also wanted it to be somewhat optimized (I've read some articles on how to properly use JavaScript Arrays).
I thought to myself "if we don't define an initial size to an Array and we keep adding objects to it, internally it will have to allocate a new size for each insertion, that must be slow. I can avoid this by allocating a new size myself (changing the array length), somewhat similar to how it is done in C#, doubling in size whenever the max capacity is reached (I know it's not this trivial but it's a start)".
I tried to implement this idea and found out that it is way slower (about 10 times slower):
// This simplified approach of my implementation is faster...
var array = [];
var counter = 0;
function addItem(newItem) {
    array[++counter] = newItem;
}
// ...than this version that resizes the array when a limit is reached
var array = [];
array.length = INITIAL_SIZE;
/*
Alternatively:
var array = new Array(INITIAL_SIZE);
*/
var counter = 0;
function addItem(newItem) {
    // CheckCapacity checks if the maximum size is reached and,
    // if it is, grows array.length to the new size
    if (CheckCapacity(counter + 1)) {
        array[++counter] = newItem;
    }
}
Before testing this, I thought to myself, "since I've set a new size for the array when I call CheckCapacity(counter + 1), internally it (the JavaScript Array) won't have to make as many operations compared to the first function, since I make sure that there is space available, more than necessary", i.e., the array[++counter] = newItem line in the second function should be faster compared to the same one in the first function.
I've even used different arrays which contained pre-calculated sizes for the one holding the items; it still was slower.
So back to my question: how does the implementation of a JavaScript Array allocate the necessary size? Am I correct to assume that not much can be done to speed this process up? To me it made sense that one of the drawbacks of having an object (the JavaScript Array) that dynamically allocates more memory each time a new item is added would be the loss of speed (unless it has pretty good algorithms implemented, but I don't know, hence my question).
In JavaScript, an Array is an abstraction. How it is implemented (and when allocation and resizing is performed) is left up to the JavaScript engine - the ECMAScript specification does not dictate how this is done. So there is basically no precise way to know.
In practice, JavaScript engines are very clever about how they allocate memory and make sure not to allocate too much. In my opinion, they are far more sophisticated than C#'s List, because JavaScript engines can dynamically change the underlying data structure depending on the situation. The algorithms vary, but most will consider whether there are any "holes" in your array:
var array = [];
array[0] = "foo";            // Is a resizable array
array[1] = "bar";            // Is a resizable array
array[2] = "baz";            // Is a resizable array
array[1000000] = "hello";    // Is now a hash table
console.log(array[1000000]); // "hello"
If you use arrays normally and use contiguous keys starting at zero, then there are no "holes" and most JavaScript engines will represent the JavaScript array by using a resizable array data structure. Now consider the fourth assignment: I've created a so-called "hole" roughly a million slots wide (the hole spans indices 3-999999). It turns out JavaScript engines are clever enough not to allocate ~1 million slots in memory for this massive hole. Having detected the hole, the engine will now represent the JavaScript array using a dictionary / hash-table-like data structure to save space. It won't store space for the hole, just four mappings: (0, "foo"), (1, "bar"), (2, "baz"), (1000000, "hello").
Unfortunately, accessing the array is now slower for the engine, because it has to compute a hash and look up the entry. When there are no holes, we use a resizable array and get quicker access times; when there is a hole, access is slower. The common terminology is to say an array is a dense array when it has no holes (it uses a resizable array, giving better performance), and a sparse array when it has one or more holes (it uses a hash table, giving slower performance). For best performance in general, try to use dense arrays.
Now to finish off, let me tell you that the following is a bad idea:
var array = new Array(1000000);
array[0] = "foo"; // Is a hash table
The array above has a hole of size ~1 million (it's like this: ["foo", undefined, undefined, ... undefined]) and therefore it is using a hash table as the underlying data structure. So implementing the resizing yourself is a bad idea: it will create a hole and cause worse performance, not better. You're only confusing the JavaScript engine.
This is what your code was doing: your array always had a hole in it and was therefore using a hash table as the underlying data structure, giving slower performance compared to an array without any holes (i.e., the first version of your code).
Am I correct to assume that not much can be done to speed this process up?
Yes, there is little to be done on the user's side regarding pre-allocation of space. To speed up JavaScript arrays in general, you want to avoid creating sparse arrays (avoid creating holes); a short sketch follows the list below:
Don't pre-allocate using new Array(size). Instead "grow as you go". The engine will work out the size of the underlying resizable array itself.
Use contiguous integer keys starting at 0. Don't start from a big integer. Don't add keys that are not integers (e.g. don't use strings as keys).
Try not to delete keys in the middle of arrays (don't delete the element at index 5 from an array with indices 0-9 filled in).
Don't convert to and from dense and sparse arrays (i.e. don't repeatedly add and remove holes). There's an overhead for the engine to convert to and from the resizable array vs hash-table representations.
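A minimal sketch of the contrast (illustrative only; the exact thresholds and representations vary by engine):
// Dense: contiguous indices from 0, stays a resizable array
var dense = [];
for (var i = 0; i < 1000; i++) {
    dense.push(i);
}

// Sparse: a huge hole up front, likely dictionary/hash-table mode
var sparse = [];
sparse[1000000] = 42; // ~1 million holes before this index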
The disadvantage of [JavaScript Arrays over C# Lists is that they] dynamically allocate more memory each time a new item is added
No, not necessarily. C# Lists and JavaScript Arrays are basically the same when the JavaScript array has no holes. Both are resizable arrays. The difference is that:
C# Lists give the user more control over the behaviour of the resizable array. In JavaScript, you have no control over it -- it's inside the engine.
C# Lists allow the user to preallocate memory for better performance, whereas in JavaScript you should let the engine automatically work out how to preallocate memory in the underlying resizable array for better performance.

how does javascript move to a specific index in an array?

This is more a general question about the inner workings of the language. I was wondering how JavaScript gets the value at an index. For example, when you write array[index], does it loop through the array till it finds it, or by some other means? The reason I ask is that I have written some code where I am looping through arrays to match values and find points on a grid, and I am wondering whether performance would be increased by creating an array like array[gridX][gridY], or if it would make a difference. What I am doing now is going through a flat array of objects with grid points as properties, like this:
var array = [{x:1,y:3}];
then looping through and using those coordinates within the object properties to identify and use the values contained in the object.
My thought is that by implementing a multidimensional grid I would access points more directly, as I can specify a grid point by saying array[1][3] instead of looping through and doing:
for (var a = 0; a < array.length; a += 1) {
    if (array[a].x === 1 && array[a].y === 3) {
        return array[a];
    }
}
or something of the like.
any insight would be appreciated, thanks!
For example when you write array[index] does it loop through the array till it finds it? or by some other means?
This is implementation-defined. JavaScript objects can have both numeric and string keys, and the very first JavaScript implementations did do this slow looping to access things.
However, nowadays most browsers are more efficient and store arrays in two parts, a packed array for numeric indexes and a hash table for the rest. This means that accessing a numeric index (for dense arrays without holes) is O(1) and accessing string keys and sparse arrays is done via hash tables.
I am wondering if performance would be increased by creating an array like array[gridX][gridY] or if it will make a difference. what I am doing now is going through a flat array of objects with grid points as properties like this array[{x:1,y:3}]
Go with the 2-dimension array. It's a much simpler solution and is most likely going to be efficient enough for you.
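A minimal sketch of that approach (the dimensions and property names here are illustrative):
var WIDTH = 10;
var grid = [];
for (var x = 0; x < WIDTH; x++) {
    grid[x] = [];
}
grid[1][3] = { terrain: 'forest' };
var cell = grid[1][3]; // direct indexed access, no scanning required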
Another reason to do this is that when you use an object as an array index what actually happens is that the object is converted to a string and then that string is used as a hash table key. So array[{x:1,y:3}] is actually array["[object Object]"]. If you really wanted, you could override the toString method so not all grid points serialize to the same value, but I don't think it's worth the trouble.
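For completeness, a sketch of that toString idea (all identifiers here are hypothetical):
function Point(x, y) {
    this.x = x;
    this.y = y;
}
Point.prototype.toString = function () {
    return this.x + ',' + this.y; // each point now serializes uniquely
};

var lookup = {};
lookup[new Point(1, 3)] = 'found'; // the key becomes the string '1,3'
console.log(lookup[new Point(1, 3)]); // 'found'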
Whether it's an array or an object, the underlying structure in any modern JavaScript engine is a hash table. Need to prove it? Allocate an array of 1000000000 elements and notice the speed and lack of memory growth. JavaScript arrays are a special case of Object that provides a length property and restricts the keys to integers, but they can be sparse.
So you are really chaining hash tables together. When you nest tables, as in a[x][y], you are creating multiple hash tables, and resolving an object requires multiple lookups.
But which is faster? Here is a jsperf testing the speed of allocation and access, respectively:
http://jsperf.com/hash-tables-2d-versus-1d
http://jsperf.com/hash-tables-2d-versus-1d/2
On my machines, the nested approach is faster.
Intuition is no match for the profiler.
Update: It was pointed out that in some limited instances, arrays really are arrays underneath. But since arrays are specialized objects, you'll find that in those same instances, objects are implemented as arrays as well (i.e., {0:'hello', 1:'world'} is internally implemented as an array). But this shouldn't frighten you away from using arrays with trillions of elements, because that special case will be discarded once it no longer makes sense.
To answer your initial question, in JavaScript, arrays are nothing more than a specialized type of object. If you set up a new Array like this:
var someArray = new Array(1, 2, 3);
You end up with an Array object with a structure that looks, more or less, like this (note: this is strictly in regard to the data it is storing; there is a LOT more to an Array object):
someArray = {
    0: 1,
    1: 2,
    2: 3
}
What the Array object does add to the equation, though, is built-in operations that allow you to interact with it in the [1, 2, 3] form that you are used to. push(), for example, will use the array's length property to figure out where the next value should be added, create the value in that position, and increment the length property.
However (getting back to the original question), there is nothing in the array structure that makes accessing the values it stores any different from accessing any other property. There is no inherent looping or anything like that; accessing someArray[0] is essentially the same as accessing someArray.length.
In fact, the only reason you have to access standard array values using the someArray[N] format is that the stored values are number-indexed, and you cannot directly access object properties that begin with a number using the "dot" technique (i.e., someArray.0 is invalid, someArray[0] is not).
Now, admittedly, that is a pretty simplistic view of the Array object in JavaScript, but, for the purposes of your question, it should be enough. If you want to know more about the inner workings, there is TONS of information to be found here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
single-dimension hash table with direct access:
var array = {
    x1y3: 999,
    x10y100: 0
};

function getValue(x, y) {
    return array['x' + x + 'y' + y];
}
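And a quick usage check:
getValue(1, 3);    // 999
getValue(10, 100); // 0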

How do modern browsers implement JS Array, specifically adding elements?

By this I mean when calling .push() on an Array object and JavaScript increases the capacity (in number of elements) of the underlying "array". Also, if there is a good resource for finding this sort of information for JS, that would be helpful to include.
edit
It seems that the JS Array is like an object literal with special properties. However, I'm interested in a lower level of detail: how browsers implement this in their respective JS engines.
There cannot be any single correct answer to this question. An array's mechanism for expanding is an internal implementation detail and can vary from one JS implementation to another. In fact, the Tamarin engine has two different internal implementations for arrays, depending on whether it determines the array is going to be sequential or sparse.
This answer is wrong. Please see Samuel Neff's answer and the following resources:
http://news.qooxdoo.org/javascript-array-performance-oddities-characteristics
http://jsperf.com/array-popuplation-direction
Arrays in JavaScript don't have a capacity since they aren't real arrays. They're actually just object hashes with a length property and properties of "0", "1", "2", etc. When you do .push() on an array, it effectively does:
ary[ ary.length++ ] = the_new_element; // set via hash
JavaScript does include a mechanism to declare the length of your array, like:
var foo = new Array(3);
alert(foo.length); // alerts 3
But since arrays are dynamic in JavaScript, there is no reason to do this; you don't have to manually allocate your arrays. The above example does not create a fixed-length array, it just creates an array of length 3 whose slots are empty.
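A quick check of that behavior (runnable in any modern engine; the comments reflect commonly observed results):
var foo = new Array(3);
console.log(foo.length); // 3
console.log(0 in foo);   // false: index 0 is a hole, not a stored undefined
foo.push('x');
console.log(foo.length); // 4: push appends at the existing length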
// Edit: I either misread your question or you changed it; sorry, I don't think this is what you were asking.
