How to pre-allocate a dense array in Javascript?

How to pre-allocate a dense array in Javascript? - javascript

When using the new Array(size) ctor, if size is not a constant, JS seems to create a sparse array in some browsers (at least in Chrome), causing access to be much slower than when using the default ctor, as shown here.
That is exactly the opposite of what I want: I pre-allocate an array of given size to avoid dynamic re-allocation and thereby improving performance. Is there any way to achieve that goal?
Please note that this question is not about the ambiguity of the new Array(size) ctor. I posted a recommendation on that here.

100000 is 1 past the pre-allocation threshold, 99999 still pre-allocates and as you can see it's much faster
http://jsperf.com/big-array-initialize/5

Pre-allocated vs. dynamically grown is only part of the story. For
preallocated arrays, 100,000 just so happens to be the threshold where
V8 decides to give you a slow (a.k.a. "dictionary mode") array.
Also, growing arrays on demand doesn't allocate a new array every time
an element is added. Instead, the backing store is grown in chunks
(currently it's grown by roughly 50% each time it needs to be grown,
but that's a heuristic that might change over time).
You can find more information ..here.Thanks ..:)

Related

How are arrays implemented in JavaScript? What happened to the good old lists?

JavaScript provides a variety of data structures to be used ranging from simple objects over arrays, sets, maps, the weak variants as well as ArrayBuffers.
Over the half past year I found myself in the spot to recreate some of the more common structures like Dequeues, count maps and mostly different variants of trees.
While looking at the Ecma specification I could not find a description on how arrays implemented on a memory level, supposedly this is up to the underlying engine?
Contrary to languages I am used to, arrays in JavaScript have a variable length, similar to list. Does that mean that elements are not necessarily aligned next to each other in memory? Does a splice push and pop actually result in new allocation if a certain threshold is reached, similar to for example ArrayLists in Java? I am wondering if arrays are the way to go for queues and stacks or if actual list implementations with references to the next element might be suited in JavaScript in some cases (e.g. regarding overhead opposed to the native implementation of arrays?).
If someone has some more in-depth literature, please feel encouraged to link them here.

While looking at the Ecma specification I could not find a description on how arrays implemented on a memory level, supposedly this is up to the underlying engine?
The ECMAScript specification does not specify or require a specific implementation. That is up to the engine that implements the array to decide how best to store the data.
Arrays in the V8 engine have multiple forms based on how the array is being used. A sequential array with no holes that contains only one data type is highly optimized into something similar to an array in C++. But, if it contains mixed types or if it contains holes (blocks of the array with no value - often called a sparse array), it would have an entirely different implementation structure. And, as you can imagine it may be dynamically changed from one implementation type to another if the data in the array changes to make it incompatible with its current optimized form.
Since arrays have indexed, random access, they are not implemented as linked lists internally which don't have an efficient way to do random, indexed access.
Growing an array may require reallocating a larger block of memory and copying the existing array into it. Calling something like .splice() to remove items will have to copy portions of the array down to the lower position.
Whether or not it makes more sense to use your own linked list implementation for a queue instead of an array depends upon a bunch of things. If the queue gets very large, then it may be faster to deal with the individual allocations of a list so avoid having to copy large portions of the queue around in order to manipulate it. If the queue never gets very large, then the overhead of a moving data in an array is small and the extra complication of a linked list and the extra allocations involved in it may not be worth it.
As an extreme example, if you had a very large FIFO queue, it would not be particularly optimal as an array because you'd be adding items at one end and removing items from the other end which would require copying the entire array down to insert or remove an item from the bottom end and if the length changed regularly, the engine would probably regularly have to reallocate too. Whether or not that copying overhead was relevant in your app or not would need to be tested with an actual performance test to see if it was worth doing something about.
But, if your queue was always entirely the same data type and never had any holes in it, then V8 can optimize it to a C++ style block of memory and when calling .splice() on that to remove an item can be highly optimized (using CPU block move instructions) which can be very, very fast. So, you'd really have to test to decide if it was worth trying to further optimize beyond an array.
Here's a very good talk on how V8 stores and optimizes arrays:
Elements Kinds in V8
Here are some other reference articles on the topic:
How do JavaScript arrays work under the hood
V8 array source code
Performance tips in V8
How does V8 optimize large arrays

Complexity of Array methods

In a team project we need to remove the first element of an array, thus I called Array.prototype.shift(). Now a guy saw Why is pop faster than shift?, thus suggested to first reverse the array, pop, and reverse again, with Array.prototype.pop() and Array.prototype.reverse().
Intiutively this will be slower (?), since my approach takes O(n) I think, while the other needs O(n), plus O(n). Of course in asymptotic notation this will be the same. However notice the verb I used, think!
Of course I could write some, use jsPerf and benchmark, but this takes time (in contrast with deciding via the time complexity signs (e.g. a O(n3) vs O(n) algorithm).
However, convincing someone when using my opinion is much harder than pointing him to the Standard (if it refered to complexity).
So how to find the Time Complexity of these methods?
For example in C++ std::reverse() clearly states:
Complexity
Linear in half the distance between first and last: Swaps elements.

how to find the Time Complexity of these methods?
You cannot find them in the Standard.
ECMAScript is a Standard for scripting languages. JavaScript is such a language.
The ECMA specification does not specify a bounding complexity. Every JavaScript engine is free to implement its own functionality, as long as it is compatible with the Standard.
As a result, you have to benchmark with jsPerf, or even look at the souce code of a specific JavaScript Engine, if you would like.
Or, as robertklep's comment mentioned:
"The Standard doesn't mandate how these methods should be implemented. Also, JS is an interpreted language, so things like JIT and GC may start playing a role depending on array sizes and how often the code is called. In other words: benchmarking is probably your only option to get an idea on how different JS engines perform".
There are further evidence for this claim ([0], [1], [2]).

An undisputable method is to do a comparative benchmark (provided you do it correctly and the other party is not bad faith). Don't care about theoretical complexities.
Otherwise you will spend a hard time convincing someone who doesn't see the obvious: a shift moves every element once, while two reversals move it twice.
By the way, a shift is optimal, as every element has to move at least once. And if properly implemented as a memmov, it is very fast (while a single reversal cannot be as fast).

Looping : 10,000 arrays (time wasting) vs 1 global array (memory wasting)

For performance reason, I have a game map divided in 10,000 tiles. Each tile has an array of entities. Thus, entities entering/leaving a tile is pushed/removed into/from the corresponding array.
On the other hand, to avoid to loop on the 10,000 arrays to update all the entities every x ms, what is the most efficient way to handle arrays ?
Besides the tiles array, should I create one global array containing all the entities from all the tiles ? Is it not memory wasting ?

This is a common dichotomy in performance tuning, processing vs memory.
My answer as always, (especially regarding performance) is "it depends". The only real way to know is to measure both and see if the speed/memory usage fits within some acceptable bounds. eg. If you want 60fps you have to be <16ms per frame, no questions asked.
My suggestion would be to implement it in whichever way is the most logical to the reader (ie. you/your team) and only once it works mangle the code for performance gains (and then, only once you can measure them). This prevents you "optimizing" code and making it unreadable/unmaintainable (or, at least less so) without having evidence it is required and that it is an actual improvement.

Why set arrayLength in "new Array(arrayLength)"?

For example:
array1 = new Array(5); array2 = new Array(10);
Both console.log(array1) and console.log(array2) would return [].
then, what is the role of arrayLength here?

JavaScript hides a lot of the details that you'd typically have to deal with when working with arrays in many other languages. For example, an array in JavaScript can grow automatically as you push values to it.
However, under the covers, the runtime is still dealing with the same sort of memory allocation issues that languages like C or Java make visible to the developer. The runtime may set aside a little extra memory for the array to grow into, and then once it runs out of contiguous memory space, it'll reallocate a larger piece of memory somewhere else and copy all of the values from the first set of memory to another location.
(This is vastly oversimplifying things, but hopefully you get the general idea.)
If you know ahead of time exactly how many items you can expect to put into the array, using new Array(number) will give the runtime a hint that it can begin by allocating that much memory, and avoid the need for the memory to be reallocated and copied around as you grow it.
It's in this light, for example, that this page suggests the following practices to achieve maximum performance in the V8 javascript engine:
Don't pre-allocate large Arrays (e.g. > 64K elements) to their maximum size, instead grow as you go
Initialize using array literals for small fixed-sized arrays
Preallocate small arrays (<64k) to correct size before using them

Passing a number to the Array constructor sets the length property of the array without setting the indices of the items (which is why your console.log isn't showing anything).
To quote JavaScript Garden:
Being able to set the length of the array in advance is only useful in a few cases, like repeating a string, in which it avoids the use of a loop.
Here's an example of doing just that:
new Array(count + 1).join(stringToRepeat);

Is it worth creating a LinkedList in Java Script

I am currently working on a project that requires me to iterate through a list of values and add a new value in between each value already in the list. This is going to be happening for every iteration so the list will grow exponentially. I decided that implementing the list as a Linked List would be a great idea. Now, JS has no default Linked List data structure, and I have no problem creating one.
But my question is, would it be worth it to create a simple Linked List from scratch, or would it be a better idea to just create an array and use splice() to insert each element? Would it, in fact, be less efficient due to the overhead?

Use a linked list, in fact most custom implementations done well in user javascript will beat built-in implementations due to spec complexity and decent JITting. For example see https://github.com/petkaantonov/deque
What george said is literally 100% false on every point, unless you take a time machine to 10 years ago.
As for implementation, do not create external linked list that contains values but make the values naturally linked list nodes. You will otherwise use way too much memory.

Inserting each element with splice() would be slower indeed (inserting n elements takes O(n²) time). But simply building a new array (appending new values and appending the values from the old one in lockstep) and throwing away the old one takes linear time, and most likely has better constant factors than manipulating a linked list. It might even take less memory (linked list nodes can have surprisingly large space overhead, especially if they aren't intrusive).

Javascript is an interpreted language. If you want to implement a linked list then you will be looping a lot! The interpreter will perform vely slowly. The built-in functions provided by the intrepreter are optimized and compiled with the interpreter so they will run faster. I would choose to slice the array and then concatenate everything again, it should be faster then implementing your own data structure.
As well javascript passes by value not by pointer/reference so how are you going to implement a linked list?

Develop Reference

JavaScript is the programming language of the Web.