JavaScript Objects and Sorted Integer Keys under the hood

JavaScript Objects and Sorted Integer Keys under the hood - javascript

I have searched a bunch of other resources but have not been able to find a quality answer to this question.
JavaScript objects sort their integer keys in ascending order, not insertion order.
const lookup = {}
lookup['1'] = 1
lookup['3'] = 3
lookup['2'] = 2
console.log(Object.keys(lookup)) -> ['1', '2', '3']
That much is simple. But what is the big O notation of that internal sorting process? Some sort algorithm must be happening under the hood to sort those keys as they are inserted but I can't find out which one it is.
Array.sort() with a length of <= 10 is Insertion Sort and
Array.sort() with a length > 10 is Quick Sort
But Array.sort() is reordering an object's keys based off of the sorting of its values.
How does JavaScript under the hood sort its keys on insertion?

(V8 developer here.)
It depends, as so often.
If the integer-keyed properties are sufficiently dense, the object will use an array under the hood to store them. Compared to sorting algorithms, that'd be closest to "radix sort", with the notable difference that there is no explicit sorting step: the sort order arises as a "free" side effect of the way elements are stored. When lookup[2] = ... is executed, the value will be written into the respective slot of the array. If the array isn't big enough, a new array is allocated, and existing entries are copied over; since that doesn't happen too often, the cost of an insertion is still "O(1) amortized". When getting the list of integer-keyed properties, then the array is already in sorted order.
If the integer-keyed properties are too sparse, the object will switch to using a dictionary as the backing store. Hash-based dictionaries store entries in their own "random" order, so in that case Object.keys() and similar operations actually have to perform an explicit sorting step. Looks like we're currently relying on C++'s std::sort for that, but that's very much an implementation detail that could change (not just on V8's side, also how std::sort is implemented depends on the standard library that V8 is linked against).
Array.sort() with a length of <= 10 is Insertion Sort and Array.sort() with a length > 10 is Quick Sort
No, not any more. We switched to TimSort in 2018.

Related

Realm-JS: Performant way to find the index of an element in sorted results list

I am searching for a perfomant way to find the index of a given realm-object in a sorted results list.
I am aware of this similar question, which was answered with using indexOf, so my current solution looks like this:
const sortedRecords = realm.objects('mySchema').sorted('time', true) // 'time' property is a timestamp
// grab element of interest by id (e.g. 123)
const item = realm.objectForPrimaryKey('mySchema','123')
// find index of that object in my sorted results list
const index = sortedRecords.indexOf(item)
My basic concern here is performance for lager datasets. Is the indexOf implementation of a realm-list improved for this in any way, or is it the same as from a JavaScript array? I know there is the possibility to create indexed properties, would indexing the time property improve the performance in this case?
Note:
In the realm-js api documentation, the indexOf section does not reference to Array.prototype.indexOf, as other sections do. This made me optimistic it's an own implementation, but it's not stated clearly.

Realm query methods return a Results object which is quite different from an Array object, the main difference is that the first one can change over time even without calling methods on it: adding and/or deleting record to the source schema can result in a change to Results object.
The only common thing between Results.indexOf and Array.indexOf is the name of the method.
Once said that is easy to also say that it makes no sense to compare the efficiency of the two methods.
In general, a problem common to all indexOf implementations is that they need a sequential scan and in the worst case (i.e. the not found case) a full scan is required. The wort implemented indexOf executed against 10 elements has no impact on program performances while the best implemented indexOf executed against 1M elements can have a severe impact on program performances. When possible it's always a good idea avoiding to use indexOf on large amounts of data.
Hope this helps.

V8: Heterogeneous Array Literals

I'm getting lost in the weeds with V8 source as well as articles on the subject and I came across a blog post which stated:
If you are forced to fill up an array with heterogeneous elements, let
V8 know early on by using an array literal especially with fixed-size
small arrays.
let array = [77, 88, 0.5, true]; //V8 knows to not allocate multiple times.
If this is true, then why is it true? Why an array literal? What's so special about that vs creating an array via a constructor? Being new to the V8 source, it's difficult to track-down where the difference in homogeneous/heterogeneous arrays lie.
Also, if an answerer can point me towards the relevant V8 source, that'd be appreciated.
EDIT: slight clarification on my question (array literal vs. array constructor)

From this blog post provided by Mathias, a V8 developer:
Common elements kinds
While running JavaScript code, V8 keeps track of what kind of elements
each array contains. This information allows V8 to optimize any
operations on the array specifically for this type of element. For
example, when you call reduce, map, or forEach on an array, V8 can
optimize those operations based on what kind of elements the array
contains.
Take this array, for example:
const array = [1, 2, 3];
What kinds of elements does it contain? If you’d ask the typeof
operator, it would tell you the array contains numbers. At the
language-level, that’s all you get: JavaScript doesn’t distinguish
between integers, floats, and doubles — they’re all just numbers.
However, at the engine level, we can make more precise distinctions.
The elements kind for this array is PACKED_SMI_ELEMENTS. In V8, the
term Smi refers to the particular format used to store small integers.
(We’ll get to the PACKED part in a minute.)
Later adding a floating-point number to the same array transitions it to a more generic elements kind:
const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS
Adding a string literal to the array changes its elements kind once again.
const array = [1, 2, 3];
// elements kind: PACKED_SMI_ELEMENTS
array.push(4.56);
// elements kind: PACKED_DOUBLE_ELEMENTS
array.push('x');
// elements kind: PACKED_ELEMENTS
....
V8 assigns an elements kind to each array.
The elements kind of an array is not set in stone — it can change at runtime. In the earlier example, we transitioned from PACKED_SMI_ELEMENTS to PACKED_ELEMENTS.
Elements kind transitions can only go from specific kinds to more general kinds.
THUS, behind the scenes, if you're constantly adding different types of data to the array at run time, the V8 engine has to adjust behind the scenes, losing the default optimization.
As far as constructor vs. array literal
If you don’t know all the values ahead of time, create an array using the array literal, and later push the values to it:
const arr = [];
arr.push(10);
This approach ensures that the array never transitions to a holey elements kind. As a result, V8 can optimize any future operations on the array more efficiently.
Also, to clarify what is meant by holey,
Creating holes in the array (i.e. making the array sparse) downgrades
the elements kind to its “holey” variant. Once the array is marked as holey, it’s holey forever — even if it’s packed later!
It might also be worth mentioning that V8 currently has 21 different element kinds.
More resources
V8 Internals for JavaScript Developers - a talk by Mathias Bynens
JavaScript Engines - How Do They Even? - a talk by Franziska Hinkelmann

how does javascript move to a specific index in an array?

This is more a general question about the inner workings of the language. I was wondering how javascript gets the value of an index. For example when you write array[index] does it loop through the array till it finds it? or by some other means? the reason I ask is because I have written some code where I am looping through arrays to match values and find points on a grid, I am wondering if performance would be increased by creating and array like array[gridX][gridY] or if it will make a difference. what I am doing now is going through a flat array of objects with gridpoints as properties like this:
var array = [{x:1,y:3}];
then looping through and using those coordinates within the object properties to identify and use the values contained in the object.
my thought is that by implementing a multidimensional grid it would access them more directly as can specify a gridpoint by saying array[1][3] instead of looping through and doing:
for ( var a = 0; a < array.length; a += 1 ){
if( array[a].x === 1 && array[a].y === 3 ){
return array[a];
}
}
or something of the like.
any insight would be appreciated, thanks!

For example when you write array[index] does it loop through the array till it finds it? or by some other means?
This is implementation defined. Javascript can have both numeric and string keys and the very first Javascript implementations did do this slow looping to access things.
However, nowadays most browsers are more efficient and store arrays in two parts, a packed array for numeric indexes and a hash table for the rest. This means that accessing a numeric index (for dense arrays without holes) is O(1) and accessing string keys and sparse arrays is done via hash tables.
I am wondering if performance would be increased by creating and array like array[gridX][gridY] or if it will make a difference. what I am doing now is going through a flat array of objects with gridpoints as properties like this array[{x:1,y:3}]
Go with the 2 dimension array. Its a much simpler solution and is most likely going to be efficient enough for you.
Another reason to do this is that when you use an object as an array index what actually happens is that the object is converted to a string and then that string is used as a hash table key. So array[{x:1,y:3}] is actually array["[object Object]"]. If you really wanted, you could override the toString method so not all grid points serialize to the same value, but I don't think its worth the trouble.

Whether it's an array or an object, the underlying structure in any modern javascript engine is a hashtable. Need to prove it? Allocate an array of 1000000000 elements and notice the speed and lack of memory growth. Javascript arrays are a special case of Object that provides a length method and restricts the keys to integers, but it's sparse.
So, you are really chaining hashtables together. When you nest tables, as in a[x][y], you creating multiple hashtables, and it will require multiple visits to resolve an object.
But which is faster? Here is a jsperf testing the speed of allocation and access, respectively:
http://jsperf.com/hash-tables-2d-versus-1d
http://jsperf.com/hash-tables-2d-versus-1d/2
On my machines, the nested approach is faster.
Intuition is no match for the profiler.
Update: It was pointed out that in some limited instances, arrays really are arrays underneath. But since arrays are specialized objects, you'll find that in these same instances, objects are implemented as arrays as well (i.e., {0:'hello', 1:'world'} is internally implemented as an array. But this shouldn't frighten you from using arrays with trillions of elements, because that special case will be discarded once it no longer makes sense.

To answer your initial question, in JavaScript, arrays are nothing more than a specialized type of object. If you set up an new Array like this:
var someArray = new Array(1, 2, 3);
You end up with an Array object with a structure that looks more-or-less, like this (Note: this is strictly in regards to the data that it is storing . . . there is a LOT more to an Array object):
someArray = {
0: 1,
1: 2,
2: 3
}
What the Array object does add to the equation, though, is built in operations that allow you to interact with it in the [1, 2, 3] concept that you are used to. push() for example, will use the array's length property to figure out where the next value should be added, creates the value in that position, and increments the length property.
However (getting back to the original question), there is nothing in the array structure that is any different when it comes to accessing the values that it stores than any other property. There is no inherent looping or anything like that . . . accessing someArray[0] is essentially the same as accessing someArray.length.
In fact, the only reason that you have to access standard array values using the someArray[N] format is that, the stored array values are number-indexed, and you cannot directly access object properties that begin with a number using the "dot" technique (i.e., someArray.0 is invalid, someArray[0] is not).
Now, admittedly, that is a pretty simplistic view of the Array object in JavaScript, but, for the purposes of your question, it should be enough. If you want to know more about the inner workings, there is TONS of information to be found here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array

single dimension hash table with direct access:
var array = {
x1y3: 999,
x10y100: 0
};
function getValue(x, y) {
return array['x' + x + 'y' + y];
}

Javascript hash table vs associative arrays

I'm new to javascript and I cannot find answer about some issues, one is the following:
Why in js would be useful an hash table data structure when we can use objects as associative arrays that is simple to use and have a very good performance?

why in js would be useful an hash table data structure when we can use
objects as associative arrays that is simple to use and have a very
good performance?
The only reason I can imaging to implement an hash table data structure is if you want to preserve the ordering of the elements. The for…in loop for objects doesn't guarantee to you in which order the properties ("keys") are returned, even if commonly you can obtain them in chronological order (last added, last returned). But it's not guarantee. For instance, old version of Opera returned the properties in an apparently random order. So, if you need an "ordered" hash table, you need to implement it by yourself.

A javascript hashtable and javascript associative array (and javascript objects) are all the same thing in the underline implementation. These are just different syntax.
So :
var a = {};
a.id = "aa";
is same as:
var a = new Object();
a.id = "aa";
Which is is same as:
var a = {};
a["id"] = "aa";

Associative arrays and hashtables are the same.
These expressions usually means the same too: dictionary, associative array, hashmap, map, hashtable and... the list goes on.
There is a difference between simple ARRAY and HASHTABLE though.
If you want to search an item in an array, the more items you have inside it, the longer will the search takes.
For example searching an item in an array with 10000 elements could takes 100 times longer than in an array with 100 elements.
With hashtables, no matter how many elements you have, the performance remains the same. (Be it 100 elements or even 99999999 elements...)

JavaScript sort with numbers

The following program (taken from a tutorial) prints the numbers in an array in order from lowest to highest. In this case, the result will be 2,4,5,13,31
My question relates to the paramaters "a" and "b" for the function compareNumbers. When the function is called in numArray.sort(compareNumbers) what numbers will be the parameters a and b for the function. Does it just move along the array. For example, start with a=13 and b=2? After that, does the function run again comparing a=2 and b=31? or would it next compare a=31 and b=4?
Can someone please explain how that part works and also how it manages to sort them from lowest to highest? I don`t see how the function manages to do the necessary calculations on the numbers in the array.
function compareNumbers(a,b) {
return a - b;
}
var numArray = [13,2,31,4,5];
alert(numArray.sort(compareNumbers));

The particular pairs that get passed in depend on the sorting algorithm being used. As the algorithm tries to go about sorting the range, it needs to be able to compare pairs of values to determine their ordering. Whenever this happens, it will call your function to get that comparison.
Because of this, without inside knowledge about how the sorting algorithm works, you cannot predict what pairs will get compared. The choice of algorithm will directly determine what elements get compared and in what order.
Interestingly, though, you can actually use the comparison function to visualize how the sort works or to reverse-engineer the sorting algorithm! The website sortviz.org has many visualizations of sorting algorithms generated by passing custom comparators into sorting functions that track the positions of each element. If you take a look, you can see how differently each algorithm moves its elements around.
Even more interestingly, you can use comparison functions as offensive weapons! Some sorting algorithms, namely quicksort, have particular inputs that can cause them to run much more slowly than usual. In "A Killer Adversary for Quicksort," the author details how to use a custom comparator to deliberate construct a bad input for a sorting algorithm.
Hope this helps!

The two parameters will be elements of your array. The system will compare enough pairs to be able to sort them correctly. Nothing else is guaranteed.
There are lots of things the sort method could be doing under the hood; see, e.g., http://en.wikipedia.org/wiki/Sorting_algorithm for some of them. Most Javascript implementations probably use some variant of either quicksort or mergesort.
(Here are super-brief descriptions of those. Quicksort is: pick an element in the array, rearrange the array to put everything smaller than that in front of everything larger, then sort the "smaller" and "larger" bits. Mergesort is: sort the first half of the array, sort the second half of the array, and then merge the two sorted halves. In both cases you need to sort smaller arrays, which you do with the same algorithm until you get to arrays so small that sorting them is trivial. In both cases, good practical implementations do all sorts of clever things I haven't mentioned.)

It will be called for all pairs of a,b that sorting algorithm need to get all array sorted. Check out http://en.wikipedia.org/wiki/Sorting_algorithm for brief list of sorting algorithms.

When you pass a function to Array.sort(), it expects two parameters and returns a numerical value.
If you return a negative value, the first parameter will be placed before the second parameter in the array.
If you return a positive value, the first parameter will be placed after the second parameter in the array.
If you return 0, they will stay in their current position.
By doing return a - b;, you are returning a negative number if a is less than b (2 - 13 = -11), a positive number if b is less than a (13 - 2 = 11), and zero if they are even (13 - 13 = 0).
As far as which numbers are compared in what order, I believe that is up to the javascript engine to determine.
Check out the documentation on javascript array sorting at the MDC Doc Center for more detailed information.
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/sort
(BTW, I always check the MDC Doc Center for any questions about how javascript works, they have the best information on the language AFAIK.)

Develop Reference

JavaScript is the programming language of the Web.

JavaScript Objects and Sorted Integer Keys under the hood - javascript

Related

Realm-JS: Performant way to find the index of an element in sorted results list

V8: Heterogeneous Array Literals

how does javascript move to a specific index in an array?

Javascript hash table vs associative arrays

JavaScript sort with numbers

Categories

Resources