How are JavaScript arrays stored in memory - javascript

So, I was thinking how arrays are stored in memory in JavaScript.
I already read "How are JavaScript arrays represented in physical memory?", but I couldn't find my answer.
What I'm thinking about is more the memory location of the array units. In C, for example, you need to define the size of an array when you define it. With this, C reserves a whole block of memory, and it can work out the exact location of each unit.
For example:
int array[10]; // C knows the memory location of the 1st item of the array
array[3] = 1; // C can do that, because it can calculate the location
// of array[3] as (address of array[0]) + 3 * sizeof(int)
In JS, you can grow the size of an array after allocating memory to other stuff, which means JS doesn't work with the "block" type of array.
But if arrays are not a single block of memory, how does JS calculate where each unit is? Do JS arrays follow a linked-list type of structure?

One thing I would recommend to everyone: Node.js recently became a first-class citizen of Chrome V8, so it is worth studying V8 to see not only how it handles these implementation details but also why.
First, this article should prove beneficial to readers because of its focus on writing optimized isomorphic JavaScript:
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e
The above article goes into details about how the JIT (Just In Time) compiler works, so you should be able to derive the exact questions you have after reading it.
Here is an excerpt:
Arrays: avoid sparse arrays where keys are not incremental numbers. Sparse arrays which don’t have every element inside them are a hash table. Elements in such arrays are more expensive to access. Also, try to avoid pre-allocating large arrays. It’s better to grow as you go. Finally, don’t delete elements in arrays. It makes the keys sparse.
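To make "sparse" concrete, here is a minimal sketch of how holes appear and how to avoid them. It only shows behavior you can observe from the language; which internal representation an engine picks for each array is an implementation detail, and the variable names are just for illustration.

```javascript
// Dense array: every index from 0 to length - 1 holds an element.
const dense = [];
for (let i = 0; i < 5; i++) {
  dense.push(i * 10); // grow as you go
}

// Sparse array: assigning far past the end leaves holes.
const sparse = [];
sparse[0] = 'first';
sparse[1000] = 'last'; // indices 1..999 are holes, not stored values

// delete removes the element but keeps the length, leaving a hole.
const deleted = ['a', 'b', 'c'];
delete deleted[1];

console.log(dense.length, Object.keys(dense).length);   // 5 5
console.log(sparse.length, Object.keys(sparse).length); // 1001 2
console.log(deleted.length, 1 in deleted);              // 3 false

// To remove an element without leaving a hole, splice shifts the rest down.
const spliced = ['a', 'b', 'c'];
spliced.splice(1, 1);
console.log(spliced.length, spliced[1]); // 2 c
```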
Second, I would also recommend reading this and then working outward with respect to V8:
http://www.jayconrod.com/posts/52/a-tour-of-v8-object-representation
Third, as a matter of critical bonus facts, I read this answer a while ago and I mentally revisit it from time to time. I am extremely surprised I just found it now. I literally Googled "stack overflow optimize train tracks" and found it. Thanks Google: Why is it faster to process a sorted array than an unsorted array?
Yes, that answer does have 27,000 positive votes.
That answer talks about branch prediction, and I would like you to be aware of it because it has implications for how you work with data in general, not just arrays. Again, note the first article I linked, and pay attention where it describes the order of keys on an Object.
Performance can be optimized by understanding the implementation details and understanding why the problems were solved that way.
Finally, everything is an Object in JavaScript unless it is a scalar value, which we call primitives--String, Number, Boolean, etc.
Here is an example for thought provoking purposes:
const arr = ['one', 'two', 'three']
const sameArr = {
0: 'one',
1: 'two',
2: 'three',
}
We could then destructure our Array as if it were an Object:
const yolo = ['one', 'two', 'three']
const {
0: one,
1: two,
2: three,
} = yolo
console.log('Pretty cool:', one, two, three)
You can get some hints from that example as to why changing the order of keys could wreak havoc on the underlying hash table. Just because you can't see the keys doesn't mean they aren't there and affected.
In the above example, if it were a map, you could do sameArr.get('0') and JavaScript would reasonably know exactly where that is in the numerical table.
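A small sketch of the point about hidden keys. It reuses arr and sameArr from the example above; asMap is a name introduced here for illustration. Nothing in it shows how an engine stores the data, only that array indices behave like string property keys.

```javascript
const arr = ['one', 'two', 'three'];
const sameArr = { 0: 'one', 1: 'two', 2: 'three' };

// Array indices are property keys, and property keys are strings.
console.log(Object.keys(arr).join());     // 0,1,2
console.log(arr['1'] === arr[1]);         // true
console.log(typeof Object.keys(arr)[0]);  // string

// The plain object reads the same way, but it is not an array.
console.log(sameArr[1]);                  // two
console.log(Array.isArray(sameArr));      // false
console.log(sameArr.length);              // undefined

// Built from the object's entries, a Map gets the string keys '0', '1', '2'.
const asMap = new Map(Object.entries(sameArr));
console.log(asMap.get('0'));              // one
console.log(asMap.get(0));                // undefined (Map does not coerce keys)
```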
I would also recommend being careful reading old JavaScript material because of the overhauls of ES6. I feel the most comfortable directing you to V8 material.

JavaScript is an implementation of the ECMAScript standard, and that standard does not say how values are laid out in memory. Those details are not standardized and are specific to each vendor's implementation. In short, the low-level details of how the language is implemented are a black box: you can certainly dive into the internals of a particular vendor's implementation, but there is no standard for this and implementations vary from one vendor to another.

Related

How are arrays implemented in JavaScript? What happened to the good old lists?

JavaScript provides a variety of data structures to be used ranging from simple objects over arrays, sets, maps, the weak variants as well as ArrayBuffers.
Over the past half year I found myself recreating some of the more common structures, like deques, count maps and, mostly, different variants of trees.
While looking at the Ecma specification I could not find a description of how arrays are implemented at the memory level. Supposedly this is up to the underlying engine?
Contrary to languages I am used to, arrays in JavaScript have a variable length, similar to lists. Does that mean that elements are not necessarily next to each other in memory? Do splice, push and pop actually result in a new allocation once a certain threshold is reached, similar to, for example, ArrayLists in Java? I am wondering whether arrays are the way to go for queues and stacks, or whether actual list implementations with references to the next element might be better suited in JavaScript in some cases (e.g. regarding their overhead compared to the native implementation of arrays).
If someone has some more in-depth literature, please feel encouraged to link them here.
While looking at the Ecma specification I could not find a description of how arrays are implemented at the memory level. Supposedly this is up to the underlying engine?
The ECMAScript specification does not specify or require a specific implementation. That is up to the engine that implements the array to decide how best to store the data.
Arrays in the V8 engine have multiple forms based on how the array is being used. A sequential array with no holes that contains only one data type is highly optimized into something similar to an array in C++. But, if it contains mixed types or if it contains holes (blocks of the array with no value - often called a sparse array), it would have an entirely different implementation structure. And, as you can imagine it may be dynamically changed from one implementation type to another if the data in the array changes to make it incompatible with its current optimized form.
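As a rough illustration of those forms, the sketch below walks one array through the kinds of changes that make V8 switch representations. The names in the comments (PACKED_SMI_ELEMENTS and so on) are taken from V8's own "elements kinds" terminology; the transitions themselves are not observable from standard JavaScript, so treat the comments as a description of documented engine behavior rather than something this code proves.

```javascript
// Only small integers, no holes: the most specific kind (PACKED_SMI_ELEMENTS).
const a = [1, 2, 3];

// A non-integer number generalizes the array to doubles (PACKED_DOUBLE_ELEMENTS).
a.push(4.5);

// A non-number generalizes it again to arbitrary values (PACKED_ELEMENTS).
a.push('x');

// Writing past the end creates holes, so the array becomes "holey"
// (HOLEY_ELEMENTS). According to the V8 documentation these transitions
// only go from specific to general, never back.
a[9] = 'far';

// What the language itself lets us observe:
console.log(a.length); // 10
console.log(4 in a);   // true  (the 'x' element)
console.log(7 in a);   // false (a hole)
```

If you want to see the actual kind, V8's debugging shell (or Node started with --allow-natives-syntax) exposes %DebugPrint for that purpose.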
Since arrays have indexed, random access, they are not implemented as linked lists internally which don't have an efficient way to do random, indexed access.
Growing an array may require reallocating a larger block of memory and copying the existing array into it. Calling something like .splice() to remove items will have to copy portions of the array down to the lower position.
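To picture the reallocate-and-copy step, here is a minimal sketch of a growable list built on a fixed-size block. GrowableInts, the capacity-doubling policy and the reallocations counter are all assumptions made for illustration; this is not how V8 (or any particular engine) actually sizes its backing stores.

```javascript
class GrowableInts {
  constructor(initialCapacity = 4) {
    this.storage = new Int32Array(initialCapacity); // fixed-size contiguous block
    this.length = 0;
    this.reallocations = 0; // counts how often we had to move to a bigger block
  }

  push(value) {
    if (this.length === this.storage.length) {
      // Block is full: allocate a larger one and copy everything over.
      const bigger = new Int32Array(this.storage.length * 2);
      bigger.set(this.storage);
      this.storage = bigger;
      this.reallocations++;
    }
    this.storage[this.length++] = value;
  }

  get(index) {
    return index >= 0 && index < this.length ? this.storage[index] : undefined;
  }

  removeAt(index) {
    if (index < 0 || index >= this.length) return;
    // Copy elements [index + 1, length) down by one position.
    this.storage.copyWithin(index, index + 1, this.length);
    this.length--;
  }
}

const list = new GrowableInts();
for (let i = 0; i < 10; i++) list.push(i);
list.removeAt(0);
console.log(list.length, list.get(0), list.reallocations); // 9 1 2
```

Doubling keeps the number of reallocations small relative to the number of pushes, which is why appending stays cheap on average even though an individual push may copy the whole block.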
Whether or not it makes more sense to use your own linked list implementation for a queue instead of an array depends upon a bunch of things. If the queue gets very large, then it may be faster to deal with the individual allocations of a list to avoid having to copy large portions of the queue around in order to manipulate it. If the queue never gets very large, then the overhead of moving data in an array is small, and the extra complication of a linked list and the extra allocations involved in it may not be worth it.
As an extreme example, a very large FIFO queue would not be particularly optimal as an array: you'd be adding items at one end and removing them from the other, which requires copying the rest of the array down whenever you insert or remove an item at the bottom end. If the length changed regularly, the engine would probably have to reallocate regularly too. Whether that copying overhead is relevant in your app would need to be measured with an actual performance test to see if it is worth doing something about.
But if your queue always holds a single data type and never has any holes in it, then V8 can optimize it to a C++-style block of memory, and calling .splice() on that to remove an item can be highly optimized (using CPU block move instructions), which can be very, very fast. So you'd really have to test to decide whether it is worth trying to optimize beyond an array.
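For comparison, this is roughly what the linked-list alternative looks like. LinkedQueue is a hypothetical name and a minimal sketch, not a tuned implementation; it trades the copying done by shift() or splice() for one small allocation per item, and only a measurement in your own app can say which is cheaper.

```javascript
class LinkedQueue {
  constructor() {
    this.head = null; // next item to leave the queue
    this.tail = null; // most recently added item
    this.size = 0;
  }

  enqueue(value) {
    const node = { value, next: null }; // one allocation per item
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.size++;
  }

  dequeue() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next; // no other items are moved
    if (!this.head) this.tail = null;
    this.size--;
    return value;
  }
}

// Same FIFO behavior with a linked list and with a plain array.
const linked = new LinkedQueue();
const plain = [];
for (const job of ['a', 'b', 'c']) {
  linked.enqueue(job);
  plain.push(job);
}
console.log(linked.dequeue(), plain.shift()); // a a
console.log(linked.size, plain.length);       // 2 2
```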
Here's a very good talk on how V8 stores and optimizes arrays:
Elements Kinds in V8
Here are some other reference articles on the topic:
How do JavaScript arrays work under the hood
V8 array source code
Performance tips in V8
How does V8 optimize large arrays

What are the semantic differences between `Set` and a `Map` in JavaScript?

It seems that everything you can do with Set you can do with Map? Is this correct?
What are the semantic differences between a Set and a Map?
Edit: the linked "dupe" does not enumerate the semantic differences between the two.
I have retracted my close vote.
A quick google of 'set vs. hashtable' or 'set vs. hashmap' turns up numerous SO questions, mostly in the Java tag, but I didn't see a single answer that actually tackled the difference in a good conceptual way (although a few linked to relevant resources).
Let's start with what data structures are: namely containers for values. The values can be whatever for the most part. Some data structures are homogeneous, some aren't. Some have restrictions (e.g. Map can have arbitrary keys but POJOs can only have string or symbol keys), some don't, some are ordered, some aren't, etc. All of these tradeoffs generally boil down to performance.
A Set is a data structure that holds unique values. Let's compare to an array:
Array.from(new Set([1,2,2,3])).toString() === [1,2,2,3].toString();
// false
Like arrays or lists, Sets in JavaScript are linear: you can traverse them in order*. But unlike arrays (more like lists) Sets are not indexed. You can't say new Set([1])[0]; to get at the first value.
Maps on the other hand *ahem* map keys to values (indexed). If I have a Map new Map([['a',1]]), then .get('a') will return 1. Order is not generally considered important for Maps; that's what keys are for. Nor is uniqueness of values: new Map([['a', 1], ['b', 1]]) stores the value 1 twice** and you can access it from either key.
Even if, like me, you are primarily a self-taught programmer, I highly recommend familiarizing yourself with basic data structures as it offers valuable insight into problem identification and general solutions. If you find yourself using Array.prototype.shift a lot for instance, you probably wanted a FIFO queue/linked list instead.
* Sets in general are unordered, the retention of insertion order is a JavaScript thing.
** The underlying implementation may as an optimization store it only once, but that is an implementation detail and opaque to you the user.
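A short sketch of the difference in practice (tags, counts and seen are illustrative names). It also shows why "everything you can do with Set you can do with Map" is technically true: a Map whose values you ignore behaves like a Set, it just says less about your intent.

```javascript
// Set: membership and uniqueness, no keys or indexes.
const tags = new Set(['js', 'v8', 'js']);
console.log(tags.size);       // 2
console.log(tags.has('v8'));  // true
console.log(tags[0]);         // undefined (Sets are not indexed)

// Map: look a value up by its key.
const counts = new Map();
for (const word of ['a', 'b', 'a']) {
  counts.set(word, (counts.get(word) || 0) + 1);
}
console.log(counts.get('a')); // 2
console.log(counts.get('b')); // 1

// A Map with throwaway values can imitate a Set.
const seen = new Map();
for (const tag of ['js', 'v8', 'js']) seen.set(tag, true);
console.log([...seen.keys()].join()); // js,v8
console.log([...tags].join());        // js,v8
```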

Does v8 still have C-style arrays that are contiguous blocks of memory, and how can I make sure I am using them?

There seems to be a conventional wisdom that arrays are represented as hashmaps from indices to values in v8. The only source I found that states otherwise is this:
https://www.youtube.com/watch?feature=player_detailpage&v=XAqIpGU8ZZk#t=994s
Seems authoritative, however, it dates back to 2012. A lot could have changed since.
Is it still true that
var a1 = Array(1000) is a contiguous array under the hood (unless you exceed array's boundaries) and var a2 = [] is not?
V8 will use true arrays if it can. For instance, if you fill the array in a contiguous way, don't use delete on it, etc. Basically, if you use it as though it were a true array (but one that magically grows for you), V8 is likely to be able to keep using a true array under the covers.
If your data is a fit for one of the typed arrays (Int8Array, Uint8Array, Uint8ClampedArray, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, or Float64Array), you can use them to ensure you're dealing with a true array.
Re the comment you added under the question: I don't have a specific reference I can cite for the above. The V8 source code is, of course, available on the V8 site, but digging through it for all places where arrays might fall back to dictionary behavior would probably be more work than you (or I) are going to want to do. :-)
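For what the typed-array route looks like, here is a minimal sketch (samples and bytes are illustrative names). It shows the properties that make a typed array behave like a fixed block of one element type; it says nothing about how V8 stores ordinary arrays.

```javascript
// Fixed length, zero-filled, one element type.
const samples = new Float64Array(4);
console.log(samples.length);      // 4
console.log(samples[0]);          // 0

// Values are converted to the element type on write.
samples[1] = '2.5';
console.log(samples[1]);          // 2.5

// There is nothing past the end, and no way to grow in place.
console.log(samples[10]);         // undefined
console.log(typeof samples.push); // undefined

// Integer types wrap values that do not fit in the element size.
const bytes = new Uint8Array([255, 256, -1]);
console.log(Array.from(bytes).join()); // 255,0,255
```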

linked list vs arrays for dictionaries

I was recently asked in an interview about the advantages and disadvantages of linked lists and arrays for implementing a dictionary of words, and also what the best data structure for implementing it is. This is where I messed things up. After googling, I couldn't find an answer specific to dictionaries, only general linked list vs. array explanations. What is the best answer to the above question?
If you're just going to use it for lookups, then an array is the obvious best choice of the two. You can build the dictionary from a list of words in O(n log n)--just build an array and sort it. Lookups are O(log n) with a binary search.
Although you can build a linked list of words in O(n), lookups will require, on average, that you look at n/2 words. The difference is pretty large. Given an English dictionary of 128K words, a linked list lookup will take on average 64,000 string comparisons. A binary search will require at most 17.
In addition, a linked list of n words will occupy more memory than an array of n words, because you need the next pointer in the list.
If you need the ability to update the dictionary, you'll probably still want to use an array if updates are infrequent compared to lookups (which is almost certainly the case). I can't think of a real-world example of a dictionary of words that's updated more frequently than it's queried.
As others have pointed out, neither array nor linked list is the best choice for a dictionary of words. But of the two options you're given, array is superior in almost all cases.
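The build-then-search approach can be sketched as follows. binarySearch is written here for illustration, and the "dictionary" of 128,000 zero-padded numbers is a stand-in for real words so that the example stays self-contained; steps counts how many entries were examined.

```javascript
function binarySearch(sortedWords, target) {
  let low = 0;
  let high = sortedWords.length - 1;
  let steps = 0;
  while (low <= high) {
    const mid = (low + high) >>> 1; // middle of the remaining range
    steps++;
    if (sortedWords[mid] === target) return { index: mid, steps };
    if (sortedWords[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return { index: -1, steps };
}

// Build: collect the words, then sort once.
const words = ['pear', 'apple', 'fig', 'kiwi', 'banana'].sort();
console.log(binarySearch(words, 'kiwi').index);  // 3
console.log(binarySearch(words, 'grape').index); // -1

// Stand-in for a 128,000-word dictionary: zero-padded numbers sort like words.
const big = Array.from({ length: 128000 }, (_, i) => String(i).padStart(6, '0'));
let worst = 0;
for (const w of big) {
  worst = Math.max(worst, binarySearch(big, w).steps);
}
console.log(worst); // 17
```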
There is no one answer.
The two obvious choices would be something based on a hash table if you only want to look up individual items, or something based on a balanced tree if you want to look up ranges of items.
A sorted array can work well if you do a lot of searching and relatively little insertion or deletion. Finding situations where linked lists are preferred is rather more difficult. Depending on the situation (especially such things as finding all the words that start with, say, "ste"), tries can also work extremely well (and often do well at minimizing the storage needed for a given set of data as well).
Those are really broad categories though, not specific implementations. There are also variations such as extensible hashing and distributed hash tables that can be useful in specific situations (and also have somewhat tree-like properties, so things like range-based searching can be reasonably efficient).
Best data structure for implementing dictionaries is suffix trees. You can also have a look at tries.
Well, if you're building a dictionary, you'd want it to be a sorted structure. So you're going for a sorted-array or a sorted linked-list.
For a linked list retrieval is O(n) since you have to examine all words until you find the one you need. For a sorted array, you can use binary search to find the right location, which is O(log n).
For a sorted array, insertion is O(log n) to find the right location (binary search) and then O(n) to insert because you need to push everything down. For a linked list, it would be O(n) to find the location and then O(1) to insert because you only have to adjust pointers. The same applies for deletion.
Since you aren't going to be updating a dictionary much, you can just build and then sort the array in O(n log n) time (using quicksort, for example). After that, lookup is O(log n) using binary search. Furthermore, as delnan mentioned below, using an array has the advantage that everything you access is sequential in memory; i.e., the data are localized (locality of reference). This minimizes cache misses (which are expensive). With a linked list, the data are spread out all over and there is no guarantee that they are close together, which increases the chance of cache misses. With this in mind, given the two options, use the array.
You can do an even better job if you implement a sorted hashmap using a red-black tree (your tree entries, which are the keys can be coupled with a hashmap); here search, insert, and delete are O(log n). But it really depends on your behavior profile; if you're only doing lookup, a simple hashmap would be best (O(1) retrieval).
Another interesting data-structure you can use is a Trie, where insertion and lookup are O(m); m being the length of the string.
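A minimal trie sketch, to show where the O(m) comes from: each operation walks one node per character of the word. The Trie class and its method names are made up for this example, and each node uses a Map for its children, which is one of several reasonable choices.

```javascript
class Trie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true; // marks the end of a complete word
  }

  // Follows the characters of prefix; returns the last node or null.
  findNode(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return null;
    }
    return node;
  }

  has(word) {
    const node = this.findNode(word);
    return node !== null && node.isWord;
  }

  startsWith(prefix) {
    return this.findNode(prefix) !== null;
  }
}

const dictionary = new Trie();
for (const w of ['steam', 'steel', 'step', 'stone']) dictionary.insert(w);

console.log(dictionary.has('step'));       // true
console.log(dictionary.has('ste'));        // false (a prefix, not a word)
console.log(dictionary.startsWith('ste')); // true
console.log(dictionary.startsWith('sta')); // false
```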

How would you explain Javascript Typed Arrays to someone with no programming experience outside of Javascript?

I have been messing with Canvas a lot lately, developing some ideas I have for a web-based game. As such I've recently run into Javascript Typed Arrays. I've done some reading for example at MDN and I just can't understand anything I'm finding. It seems most often, when someone is explaining Typed Arrays, they use analogies to other languages that are a little beyond my understanding.
My experience with "programming," if you can call it that (and not just front-end scripting), is pretty much limited to Javascript. I do feel as though I understand Javascript pretty well outside of this instance, however. I have deeply investigated and used the Object.prototype structure of Javascript, and more subtle factors such as variable referencing and the value of this, but when I look at any information I've found about Typed Arrays, I'm just lost.
With this frame-of-reference in mind, can you describe Typed Arrays in a simple, usable way? The most effective depicted use-case, for me, would be something to do with Canvas image data. Also, a well-commented Fiddle would be most appreciated.
In typed programming languages (to which JavaScript kinda belongs) we usually have variables of fixed declared type that can be dynamically assigned values.
With Typed Arrays it's quite the opposite.
You have a fixed chunk of data (represented by an ArrayBuffer) that you do not access directly. Instead, this data is accessed through views. Views are created at run time and they effectively declare some portion of the buffer to be of a certain type. These views are sub-classes of ArrayBufferView. A view defines a certain contiguous portion of this chunk of data as the elements of an array of a certain type. Once the type is declared, the browser knows the length and content of each element, as well as the number of such elements. With this knowledge browsers can access individual elements much more efficiently.
So we are dynamically assigning a type to a portion of what actually is just a buffer. We can assign multiple views to the same buffer.
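Since the question asks about Canvas image data, here is a small sketch of the same idea with pixels. In a browser, the data property of the object returned by a 2D context's getImageData() is a Uint8ClampedArray with four bytes (R, G, B, A) per pixel; to keep this runnable without a canvas, the array below is a hand-made stand-in for it, and pixelBytes and pixelWords are illustrative names.

```javascript
// Stand-in for ImageData.data: 2 pixels, 4 bytes each (R, G, B, A).
const pixelBytes = new Uint8ClampedArray([
  255, 0, 0, 255, // opaque red
  0, 0, 255, 255, // opaque blue
]);

// A second view over the same buffer: one 32-bit element per pixel.
const pixelWords = new Uint32Array(pixelBytes.buffer);
console.log(pixelBytes.length, pixelWords.length); // 8 2

const firstPixelBefore = pixelWords[0];

// Invert the colors through the byte view, leaving alpha untouched.
for (let i = 0; i < pixelBytes.length; i += 4) {
  pixelBytes[i] = 255 - pixelBytes[i];         // red
  pixelBytes[i + 1] = 255 - pixelBytes[i + 1]; // green
  pixelBytes[i + 2] = 255 - pixelBytes[i + 2]; // blue
}

console.log(Array.from(pixelBytes).join()); // 0,255,255,255,255,255,0,255

// The 32-bit view sees the change, because both views share one buffer.
console.log(pixelWords[0] !== firstPixelBefore); // true
```

The numeric value of each 32-bit element depends on the byte order of the machine, which is why the sketch only checks that it changed.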
From the Specs:
Multiple typed array views can refer to the same ArrayBuffer, of different types,
lengths, and offsets.
This allows for complex data structures to be built up in the ArrayBuffer.
As an example, given the following code:
// create an 8-byte ArrayBuffer
var b = new ArrayBuffer(8);
// create a view v1 referring to b, of type Int32, starting at
// the default byte index (0) and extending until the end of the buffer
var v1 = new Int32Array(b);
// create a view v2 referring to b, of type Uint8, starting at
// byte index 2 and extending until the end of the buffer
var v2 = new Uint8Array(b, 2);
// create a view v3 referring to b, of type Int16, starting at
// byte index 2 and having a length of 2
var v3 = new Int16Array(b, 2, 2);
The following buffer and view layout is created:
This defines an 8-byte buffer b, and three views of that buffer, v1,
v2, and v3. Each of the views refers to the same buffer -- so v1[0]
refers to bytes 0..3 as a signed 32-bit integer, v2[0] refers to byte
2 as a unsigned 8-bit integer, and v3[0] refers to bytes 2..3 as a
signed 16-bit integer. Any modification to one view is immediately
visible in the other: for example, after v2[0] = 0xff; v2[1] = 0xff;
then v3[0] == -1 (where -1 is represented as 0xffff).
So instead of declaring data structures and filling them with data, we take data and overlay it with different data types.
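The spec example can be run as is, with a few log lines added to check the claims. The only changes are const instead of var and the console.log calls.

```javascript
// An 8-byte buffer and three views over it, as in the spec example.
const b = new ArrayBuffer(8);
const v1 = new Int32Array(b);       // bytes 0..7 as two 32-bit integers
const v2 = new Uint8Array(b, 2);    // bytes 2..7 as six unsigned bytes
const v3 = new Int16Array(b, 2, 2); // bytes 2..5 as two 16-bit integers

console.log(v1.length, v2.length, v3.length); // 2 6 2

// Write two bytes through v2...
v2[0] = 0xff;
v2[1] = 0xff;

// ...and read the same two bytes through v3 as one signed 16-bit integer.
console.log(v3[0]); // -1

// v1[0] covers bytes 0..3, so it changed too. Its exact value depends
// on the machine's byte order, so only check that it is no longer 0.
console.log(v1[0] !== 0); // true

// Writes go the other way as well.
v3[0] = 0;
console.log(v2[0], v2[1]); // 0 0
```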
I spend all my time in javascript these days, but I'll take a stab at quick summary, since I've used typed arrays in other languages, like Java.
The closest thing I think you'll find in the way of comparison, when it comes to typed arrays, is a performance comparison. In my head, Typed Arrays enable compilers to make assumptions they can't normally make. If someone is optimizing things at the low level of a javascript engine like V8, those assumptions become valuable. If you can say, "Data will always be of size X," (or something similar), then you can, for instance, allocate memory more efficiently, which lets you (getting more jargon-y, now) reduce how many times you go to access memory and it's not in a CPU cache. Accessing CPU cache is much faster than having to go to RAM, I believe. When doing things at a large scale, those time savings add up quick.
If I were to do up a jsfiddle (no time, sorry), I'd be comparing the time it takes to perform certain operations on typed arrays vs non-typed arrays. For example, I imagine "adding 100,000 items" being a performance benchmark I'd try, to compare how the structures handle things.
What I can do is link you to: http://jsperf.com/typed-arrays-vs-arrays/7
All I did to get that was google "typed arrays javascript performance" and clicked the first item (I'm familiar with jsperf, too, so that helped me decide).
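If you want to try such a comparison yourself, a rough sketch could look like the one below. The helper names (sum, timeIt) and the choice of summing 100,000 numbers are assumptions for illustration. A single timed run like this is not a real benchmark: results vary between engines, machines and runs, and JIT warm-up can dominate, so no timings are claimed here.

```javascript
function sum(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Runs fn once and logs roughly how long it took.
function timeIt(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(label + ': ' + (Date.now() - start) + ' ms');
  return result;
}

const N = 100000;

// Plain array, grown one element at a time.
const plainNumbers = [];
for (let i = 0; i < N; i++) plainNumbers.push(i);

// Typed array, allocated once at its final size.
const typedNumbers = new Float64Array(N);
for (let i = 0; i < N; i++) typedNumbers[i] = i;

const plainSum = timeIt('plain array sum', () => sum(plainNumbers));
const typedSum = timeIt('typed array sum', () => sum(typedNumbers));

// Both structures hold the same data, so the results must agree.
console.log(plainSum === typedSum); // true
```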
