What are the semantic differences between `Set` and a `Map` in JavaScript?

It seems that everything you can do with Set you can do with Map? Is this correct?
What are the semantic differences between a Set and a Map?
Edit: the linked "dupe" does not enumerate the semantic differences between the two.

I have retracted my close vote.
A quick google of 'set vs. hashtable' or 'set vs. hashmap' turns up numerous SO questions, mostly in the Java tag, but I didn't see a single answer that actually tackled the difference in a good conceptual way (although a few linked to relevant resources).
Let's start with what data structures are: namely containers for values. The values can be whatever for the most part. Some data structures are homogeneous, some aren't. Some have restrictions (e.g. Map can have arbitrary keys but POJOs can only have string or symbol keys), some don't, some are ordered, some aren't, etc. All of these tradeoffs generally boil down to performance.
A Set is a data structure that holds unique values. Let's compare to an array:
Array.from(new Set([1,2,2,3])).toString() === [1,2,2,3].toString();
// false
Like arrays or lists, Sets in JavaScript are linear: you can traverse them in order*. But unlike arrays (and more like lists), Sets are not indexed. You can't say new Set([1])[0]; to get the first element: it evaluates to undefined.
Maps on the other hand *ahem* map keys to values (indexed). If I have a Map new Map([['a',1]]), then .get('a') will return 1. Order is not generally considered important for Maps; that's what key indexes are for. Nor is uniqueness of values: new Map([['a', 1], ['b', 1]]) stores the value 1 twice** and you can access it from either key.
Even if, like me, you are primarily a self-taught programmer, I highly recommend familiarizing yourself with basic data structures as it offers valuable insight into problem identification and general solutions. If you find yourself using Array.prototype.shift a lot for instance, you probably wanted a FIFO queue/linked list instead.
* Sets in general are unordered, the retention of insertion order is a JavaScript thing.
** The underlying implementation may as an optimization store it only once, but that is an implementation detail and opaque to you the user.
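To make the comparison concrete, here is a minimal sketch using only the built-in Set and Map (the variable names are made up for illustration). It also shows why "you can do it all with a Map" is technically true but awkward: emulating a Set with a Map leaves the value slot unused.

```javascript
// A Set stores each value once; a Map associates keys with values.
const tags = new Set(['a', 'b', 'b', 'c']);
const setSize = tags.size;            // duplicates collapse
const hasB = tags.has('b');           // membership test, no index access
const firstByIndex = tags[0];         // undefined: Sets are not indexed

const scores = new Map([['a', 1], ['b', 1]]);
const scoreOfA = scores.get('a');     // lookup by key
const mapValues = [...scores.values()]; // the value 1 appears under two keys

// Emulating a Set with a Map works, but the value slot is simply ignored.
const seen = new Map();
for (const v of ['a', 'b', 'b', 'c']) seen.set(v, true);
const emulatedKeys = [...seen.keys()];

console.log(setSize, hasB, firstByIndex, scoreOfA, mapValues, emulatedKeys);
```

The semantic difference is in what each one promises: a Set answers "is this value present?", a Map answers "what is associated with this key?".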

Related

How are arrays implemented in JavaScript? What happened to the good old lists?

JavaScript provides a variety of data structures, ranging from simple objects through arrays, sets, maps and their weak variants to ArrayBuffers.
Over the past half year I found myself having to recreate some of the more common structures like deques, count maps and, mostly, different variants of trees.
While looking at the Ecma specification I could not find a description of how arrays are implemented on a memory level; supposedly this is up to the underlying engine?
Contrary to languages I am used to, arrays in JavaScript have a variable length, similar to lists. Does that mean that elements are not necessarily aligned next to each other in memory? Do splice, push and pop actually result in a new allocation if a certain threshold is reached, similar to, for example, ArrayLists in Java? I am wondering if arrays are the way to go for queues and stacks, or if actual list implementations with references to the next element might be better suited in JavaScript in some cases (e.g. regarding overhead compared to the native implementation of arrays?).
If someone has some more in-depth literature, please feel encouraged to link them here.

While looking at the Ecma specification I could not find a description of how arrays are implemented on a memory level; supposedly this is up to the underlying engine?
The ECMAScript specification does not specify or require a specific implementation. That is up to the engine that implements the array to decide how best to store the data.
Arrays in the V8 engine have multiple forms based on how the array is being used. A sequential array with no holes that contains only one data type is highly optimized into something similar to an array in C++. But, if it contains mixed types or if it contains holes (blocks of the array with no value - often called a sparse array), it would have an entirely different implementation structure. And, as you can imagine it may be dynamically changed from one implementation type to another if the data in the array changes to make it incompatible with its current optimized form.
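The internal representation V8 picks is not visible from script, but the holes themselves are. A small sketch (plain JavaScript, variable names made up) of how an array ends up with holes and how you can detect them:

```javascript
// Packed: every index from 0 to length - 1 holds a value.
const packed = [1, 2, 3];

// Writing far past the end leaves holes at indexes 3..9.
const holey = [1, 2, 3];
holey[10] = 4;

const lengthAfterWrite = holey.length;   // length covers the holes too
const indexFiveExists = 5 in holey;      // a hole is not a stored undefined
const visited = [];
holey.forEach((value) => visited.push(value)); // forEach skips holes

console.log(packed.length, lengthAfterWrite, indexFiveExists, visited);
```

Which storage strategy an engine chooses for each of these arrays is an implementation detail; the code only shows the observable difference.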
Since arrays have indexed, random access, they are not implemented as linked lists internally which don't have an efficient way to do random, indexed access.
Growing an array may require reallocating a larger block of memory and copying the existing array into it. Calling something like .splice() to remove items will have to copy portions of the array down to the lower position.
Whether or not it makes more sense to use your own linked list implementation for a queue instead of an array depends upon a bunch of things. If the queue gets very large, then it may be faster to deal with the individual allocations of a list to avoid having to copy large portions of the queue around in order to manipulate it. If the queue never gets very large, then the overhead of moving data in an array is small and the extra complication of a linked list and the extra allocations involved in it may not be worth it.
As an extreme example, if you had a very large FIFO queue, it would not be particularly optimal as an array, because you'd be adding items at one end and removing items from the other end. Inserting or removing an item at the bottom end requires copying the entire array down, and if the length changed regularly, the engine would probably have to reallocate regularly too. Whether that copying overhead is relevant in your app would need to be tested with an actual performance test to see if it is worth doing something about.
But, if your queue was always entirely the same data type and never had any holes in it, then V8 can optimize it to a C++ style block of memory, and calling .splice() on that to remove an item can be highly optimized (using CPU block move instructions), which can be very, very fast. So, you'd really have to test to decide if it was worth trying to further optimize beyond an array.
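For comparison, here is a minimal sketch of the linked-list alternative discussed above. LinkedQueue is a hypothetical class written for this example, not a built-in; it trades one small allocation per item for enqueue and dequeue operations that never copy the remaining items.

```javascript
// Hypothetical LinkedQueue: a singly linked FIFO queue.
class LinkedQueue {
  constructor() {
    this.head = null; // next node to dequeue
    this.tail = null; // most recently enqueued node
    this.size = 0;
  }

  enqueue(value) {
    const node = { value, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.size += 1;
  }

  dequeue() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (!this.head) this.tail = null; // queue became empty
    this.size -= 1;
    return value;
  }
}

const queue = new LinkedQueue();
queue.enqueue('a');
queue.enqueue('b');
queue.enqueue('c');
const first = queue.dequeue();
const second = queue.dequeue();

// The array equivalent: push to enqueue, shift to dequeue.
const arrayQueue = ['a', 'b', 'c'];
const firstFromArray = arrayQueue.shift();

console.log(first, second, queue.size, firstFromArray);
```

Both versions return items in the same order; which one is faster for a given queue size is exactly the thing you would have to measure.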
Here's a very good talk on how V8 stores and optimizes arrays:
Elements Kinds in V8
Here are some other reference articles on the topic:
How do JavaScript arrays work under the hood
V8 array source code
Performance tips in V8
How does V8 optimize large arrays

Hashtable vs objects In javascript

I'm new to data structures and I'm learning them in JavaScript.
My question is:
Why do we need hash tables when we have objects in JavaScript?
Can anybody give me a situation where hash tables will be more useful than objects?

"Hashtable" is called different things in different languages. Java has Hashtable and HashMap, Ruby has Hash, Python has dict... in JavaScript, it's called Map.
Objects' keys are limited to strings and symbols (anything else is converted to a string); Map keys can be anything.
Objects support inheritance; a Map only contains what is specifically put into it.
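A short sketch of both points, using only built-ins (the variable names are made up): an object coerces non-symbol keys to strings and sees inherited properties, while a Map keeps keys as they are and contains only what was put in.

```javascript
const obj = {};
obj[1] = 'number key';
obj['1'] = 'string key';          // same property: the number was coerced to '1'
const objKeyCount = Object.keys(obj).length;
const inheritedVisible = 'toString' in obj;   // found via Object.prototype

const map = new Map();
map.set(1, 'number key');
map.set('1', 'string key');       // a distinct key: no coercion
const objectKey = { id: 42 };
map.set(objectKey, 'object key'); // objects work as keys, compared by identity
const mapKeyCount = map.size;
const mapHasToString = map.has('toString');
const byObject = map.get(objectKey);

console.log(objKeyCount, inheritedVisible, mapKeyCount, mapHasToString, byObject);
```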

I think you mean Map instead of HashTable. IMHO a Map may be more useful and perform better if you need any of the following:
to keep the insertion order of key/value pairs;
frequent additions and removals;
keys that are not strings or symbols.
You can find more information on MDN.

The MDN docs on this are quite helpful: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map#Objects_and_maps_compared
Most notably, using a map gives you the advantage of using anything as a key, maps retain order, and may perform better when constantly adding and removing values.
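The ordering point is easy to see in a small sketch (built-ins only, made-up keys). Plain objects list integer-like keys first in ascending order, then the other string keys in insertion order; a Map iterates strictly in insertion order.

```javascript
const obj = {};
const map = new Map();
for (const key of ['b', 2, 'a', 1]) {
  obj[key] = true;
  map.set(key, true);
}

const objectOrder = Object.keys(obj);  // integer-like keys first, ascending
const mapOrder = [...map.keys()];      // exactly the insertion order

map.delete('b');
map.set('b', true);                    // re-adding moves 'b' to the end
const afterReinsert = [...map.keys()];

console.log(objectOrder, mapOrder, afterReinsert);
```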

How are JavaScript arrays stored in memory

So, I was thinking how arrays are stored in memory in JavaScript.
I already read How are JavaScript arrays represented in physical memory? , but I couldn't find my answer.
What I'm thinking is more about the memory location of the array units. In C for example, you need to define the size of the array when you define them. With this, C defines a whole block of memory, and it can look the exact location of each unit.
For example:
int array[10]; // C knows the memory location of the 1st item of the array
array[3] = 1; // C can do that, because it can calculate the location
// of array[3] as (address of array) + 3 * sizeof(int)
In JS, you can grow the size of an array after allocating memory to other stuff, which means JS doesn't work with the "block" type of array.
But if arrays are not a single block of memory, how does JS calculate where each unit is? Do JS arrays follow a linked list type of structure?

One thing I would point out to everyone is that node.js recently became a first-class citizen of Chrome V8, so I would recommend studying V8 to see not only how it handles these implementation details but also why.
First, this article should prove beneficial to readers because of its focus on writing optimized isomorphic JavaScript:
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e
The above article goes into details about how the JIT (Just In Time) compiler works, so you should be able to derive the exact questions you have after reading it.
Here is an excerpt:
Arrays: avoid sparse arrays where keys are not incremental numbers. Sparse arrays which don’t have every element inside them are a hash table. Elements in such arrays are more expensive to access. Also, try to avoid pre-allocating large arrays. It’s better to grow as you go. Finally, don’t delete elements in arrays. It makes the keys sparse.
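As a small illustration of the last tip in that excerpt (plain JavaScript, variable names made up): delete removes the element but leaves a hole, whereas splice removes it and closes the gap.

```javascript
const withDelete = ['a', 'b', 'c'];
delete withDelete[1];                        // leaves a hole at index 1
const lengthAfterDelete = withDelete.length; // length is unchanged
const holeAtOne = !(1 in withDelete);        // index 1 no longer exists

const withSplice = ['a', 'b', 'c'];
withSplice.splice(1, 1);                     // removes 'b' and shifts 'c' down
const lengthAfterSplice = withSplice.length;

console.log(lengthAfterDelete, holeAtOne, lengthAfterSplice, withSplice);
```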
Second, I would also recommend reading this and then working outward with respect to V8:
http://www.jayconrod.com/posts/52/a-tour-of-v8-object-representation
Third, as a matter of critical bonus facts, I read this answer a while ago and I mentally revisit it from time to time. I am extremely surprised I just found it now. I literally Googled "stack overflow optimize train tracks" and found it. Thanks Google: Why is it faster to process a sorted array than an unsorted array?
Yes, that answer does have 27,000 positive votes.
That article talks about branch prediction, and I would like you to be aware of that because it could have some implications on how you work with data in general not just arrays. Again, note the first article I linked, and pay attention while it is describing the order of keys on an Object.
Performance can be optimized by understanding the implementation details and understanding why the problems were solved that way.
Finally, everything is an Object in JavaScript unless it is a scalar value, which we call primitives--String, Number, Boolean, etc.
Here is an example for thought provoking purposes:
const arr = ['one', 'two', 'three']
const sameArr = {
0: 'one',
1: 'two',
2: 'three',
}
We could then destructure our Array as if it were an Object:
const yolo = ['one', 'two', 'three']
const {
0: one,
1: two,
2: three,
} = yolo
console.log('Pretty cool:', one, two, three)
You can get some hints from that example as to why changing the order of keys could wreak havoc on the underlying hash table. Just because you can't see the keys doesn't mean they aren't there and affected.
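In the above example, if sameArr were a Map keyed by the strings '0', '1' and '2', you could do sameArr.get('0') and JavaScript would reasonably know exactly where that is in the table. (Map keys are not coerced, so a Map keyed by the number 0 would need sameArr.get(0) instead.)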
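A quick sketch of the hidden keys in question, using only built-ins (variable names made up): array indexes are property keys stored as strings, whereas a Map built from the same entries keeps the numeric keys as numbers.

```javascript
const arr = ['one', 'two', 'three'];
const sameViaString = arr['0'] === arr[0];  // index 0 is the property key '0'
const arrayKeys = Object.keys(arr);         // the keys you normally don't see

const asMap = new Map(arr.entries());       // keys are the numbers 0, 1, 2
const byNumber = asMap.get(0);
const byString = asMap.get('0');            // a different key, so nothing is found

console.log(sameViaString, arrayKeys, byNumber, byString);
```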
I would also recommend being careful reading old JavaScript material because of the overhauls of ES6. I feel the most comfortable directing you to V8 material.

Unlike C, where the language itself defines an array as one contiguous block of memory, JavaScript is an ECMAScript implementation, and the specification says nothing about memory layout. The details of the implementation are not standardized and are specific to each vendor's implementation. In short, the low-level details of how the language is implemented are a black box: you can certainly dive into the internals of a particular vendor's implementation, but there is no standard for this and implementations will vary from one vendor to another.

What mathematical tool is similar to a JavaScript object?

I am trying to write documentation on a piece of Javascript code but I am having trouble describing the objects made by the code in a concise and understandable way. It is especially difficult because the objects have nested objects (often multiple layers).
Is there any mathematics that involves things with keys and attached values?
If not, how best can I describe an object with multiple nested objects in a concise manner?
Note: Just showing an example of an object is not enough as the structure changes often. Also, there are mathematical relationships between the keys and the values (coupon dates as keys and coupon payments as values).

I would say that Javascript objects are functions or mappings, in that they map keys to values.
Beyond that, it is hard to compare... the domain can encompass numbers, and a subset of all strings. As simple as that is to say, I'm not sure what mathematical field (etc) the domain would be equivalent to!
The range would, of course, be worse, as values in the range can be numbers, strings, booleans, undefined, further objects, or functions. However, I think the concept of an object being a mapping is fairly intuitive.
This doesn't include the prototype style inheritance, but I'm not sure how deep you want to go...
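One way to write the "object as mapping" idea down, as a sketch rather than a standard notation: treat an object as a partial function with a finite domain, whose values are either plain values or further such functions. The symbols K (keys), P (non-object values), d_i (coupon dates) and p_i (coupon payments) are assumptions introduced here for illustration.

```latex
% An object o maps finitely many keys to values; values may themselves be objects.
o : K \rightharpoonup V, \qquad V = P \,\cup\, (K \rightharpoonup V)

% Nested access o.k_1.k_2 is repeated application:
o.k_1.k_2 \;=\; \bigl(o(k_1)\bigr)(k_2)

% A coupon schedule with dates d_1 < \dots < d_n and payments p_1, \dots, p_n:
c(d_i) = p_i \quad \text{for } i = 1, \dots, n
```

Documenting the relationship between keys and values (for example, how p_i is computed from d_i) then describes the structure without listing a concrete object.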

I saw a comment on it earlier: JavaScript objects pretty much follow the associative array abstract data type, which is a mathematical concept in its own right, since computer science is basically a subset of applied mathematics. If you need a true mathematical representation, there's relational algebra, which was created for relational databases (close enough) and is essentially an extension of set theory... just remember that math doesn't necessarily mean it's clear and concise – Patrick Barr yesterday

Can I increase lookup speed by positioning properties in object?

I've seen a lot of questions about the fastest way to access object properties (like using . vs []), but can't seem to find whether it's faster to retrieve object properties that are declared higher than others in object literal syntax.
I'm working with an object that could contain up to 40,000 properties, each of which is an Array of length 2. I'm using it as a lookup by value.
I know that maybe 5% of the properties will be the ones I need to retrieve most often. Is either of the following worth doing for increased performance (decreased lookup time)?
1. Set the most commonly needed properties at the top of the object literal syntax?
2. If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Or, is there a better way?

I did a js perf here: http://jsperf.com/object-lookup-perf
I basically injected 40000 props with random keys into an object, saved the "first" and "last" keys and looked them up in different tests. I was surprised by the result, because accessing the first was 35% slower than accessing the last entry.
Also, having an object of 5 or 40000 entries didn’t make any noticeable difference.
The test case can most likely be improved and I probably missed something, but here is a start for you.
Note: I only tested Chrome.

Yes, something like "indexOf" searches front to back, so placing common items higher in the list will return them faster. Most "basic" search algorithms are simple linear scans from the first element onward. At least for arrays.

If you have so many properties, they must be computed, no? So you can replace the (most probably string) key computation by an integer hash computation, then use this hash as an index into a regular array.
You might even use one single array by putting the two values of entry i in slots 2*i and 2*i+1.
If you can use a typed array here, do it; you could not go faster.
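A minimal sketch of that layout, under two loudly stated assumptions: every key can be turned into a dense integer id between 0 and count - 1, and both values of each pair are numbers (otherwise a typed array does not apply). The helper names setPair and getPair are made up for this example.

```javascript
// Assumption: each key maps to a dense integer id in [0, count).
const count = 40000;

// One flat, zero-filled buffer: slots 2*id and 2*id + 1 hold the pair for id.
const pairs = new Float64Array(2 * count);

function setPair(id, first, second) {
  pairs[2 * id] = first;
  pairs[2 * id + 1] = second;
}

function getPair(id) {
  return [pairs[2 * id], pairs[2 * id + 1]];
}

setPair(123, 1.5, 2.5);
const stored = getPair(123);
const untouched = getPair(7); // never written, so still zeros

console.log(stored, untouched, pairs.length);
```

How the string keys get turned into ids is the part this sketch leaves out; a real hash would also need a strategy for collisions.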

Set the most commonly needed properties at the top of the object literal syntax?
No. Choose readability over performance. If you've got few enough properties that you use a literal in the code, it won't matter anyway; and you should order the properties in a logical sequence.
Property lookup in objects is usually based on hash maps, and position should not make a substantial difference. Depending on the implementation of the hash, some lookups might be negligibly slower, but I'd guess this is quite random and depends heavily on the applied optimisations. It should not matter.
If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Yes. If you've got really huge objects (with thousands of properties), this is a good idea. Depending on the used data structure, the size of the object might influence the lookup time, so if you've got a smaller object for the more frequent properties it should be faster. It's possible that different structures are chosen for the two objects, which could perform better than the single one - especially if you know beforehand in which object to look. However you will need to test this hypothesis with your actual data, and you should beware of premature [micro-]optimisation.
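The two-object idea can be sketched as follows. makeTieredLookup and the sample data are hypothetical, made up for illustration; whether this is actually faster than a single object is exactly the hypothesis that needs testing with real data.

```javascript
// Own-property check that ignores anything inherited from Object.prototype.
const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Hypothetical helper: look in the small "hot" object first, then the large one.
function makeTieredLookup(hot, cold) {
  return function lookup(key) {
    if (hasOwn(hot, key)) return hot[key];
    return hasOwn(cold, key) ? cold[key] : undefined;
  };
}

// Made-up sample data: each property is an array of length 2, as in the question.
const cold = {};
for (let i = 0; i < 40000; i += 1) cold['key' + i] = [i, i * 2];
const hot = { key5: cold.key5, key17: cold.key17 };

const lookup = makeTieredLookup(hot, cold);
const fromHot = lookup('key5');
const fromCold = lookup('key39999');
const missing = lookup('toString'); // inherited names are not treated as hits

console.log(fromHot, fromCold, missing);
```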
