Is Set a hashed collection in JavaScript? - javascript

I was asking myself this question. Is Set a hashed collection in JavaScript?
For example, will Set.prototype.has iterate the entire Set, or do implementations use an internal hash table to locate an item within the collection?

The ECMAScript 2015 specification says that:
Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection.
Obviously they can't force a particular JS engine to actually do that, but in practice JS engines will do the right thing.

The ES6 specification does not require a specific implementation, but it does indicate that it should be better than O(n) (so better than a linear lookup). And, since the purpose of the Set object is to be efficient at looking up items, it almost certainly uses some sort of efficient lookup scheme such as a hash table.
If you want to know for sure how it works, you'd have to look at the open source code for the Firefox or Chrome implementations.
You could also benchmark it to prove that the lookup speed is not O(n), but something more efficient than that.
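A rough benchmark sketch along those lines, using performance.now() (available in browsers and modern Node); the sizes and the million-lookup loop are arbitrary choices. If lookups were O(n), growing the Set a thousandfold should slow has dramatically; in practice the two timings stay in the same ballpark:

// Rough sketch: compare Set#has timings at two very different sizes.
function timeHas(n) {
  const s = new Set();
  for (let i = 0; i < n; i++) s.add(i);
  const t0 = performance.now();
  let hits = 0;
  for (let i = 0; i < 1e6; i++) {
    if (s.has(i % n)) hits++;   // always a hit; keeps the loop from being optimized away
  }
  return performance.now() - t0;
}
console.log(timeHas(1e3), timeHas(1e6)); // similar times => sublinear lookup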

Related

MongoDB multiple sort properties: How is precedence determined?

According to Mongo's docs, you can specify multiple sort keys like this:
{ $sort : { age : -1, posts: 1 } }
Which they say will sort first by age (descending) then by posts (ascending).
But the sort query is a Javascript object. To the best of my knowledge, although implementations typically iterate over properties in the order they were created, that's not actually part of ECMAScript's spec: object properties officially have no order.
Is MongoDB really relying on arbitrary behavior that could vary by implementation, am I wrong about the ECMAScript spec, or am I missing something in the Mongo docs that lets you tune the precedence some other way?
The console is special: its objects are actually ordered, unlike normal ECMAScript objects, which is how this can happen.
Here is a linked answer from a 10gen employee ( https://stackoverflow.com/a/18514551/383478 ) which states:
Among other things, order of fields is always preserved.
N.B.: It is worth noting that V8 (which has run the MongoDB shell and map/reduce since about v2.2) keeps object properties ordered in practice anyway.
The only reliable way in non-V8 JS to keep order is to do key lookups, as in: How to keep an Javascript object/array ordered while also maintaining key lookups?
Yes, you are wrong about the ECMAScript spec: in practice properties retain their insertion order, which is why drivers for some languages recommend forms that also maintain order in the structure being converted (e.g. Perl orders "hashes" by key name by default; use Tie::IxHash to change that).
At any rate, this is not "really" JavaScript anyhow, but BSON. The behavior is borrowed, so the statement remains the same: the order you specify is preserved.
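You can check the practical behavior yourself; own string keys that don't look like array indices enumerate in insertion order (and this has been specified for Object.keys since ES2015):

// Insertion order of non-integer string keys is preserved,
// which is exactly what the $sort document relies on.
const sortSpec = { age: -1, posts: 1 };
console.log(Object.keys(sortSpec)); // ["age", "posts"]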

Can I increase lookup speed by positioning properties in object?

I've seen a lot of questions about the fastest way to access object properties (like using . vs []), but can't seem to find whether it's faster to retrieve object properties that are declared higher than others in object literal syntax.
I'm working with an object that could contain up to 40,000 properties, each of which is an Array of length 2. I'm using it as a lookup by value.
I know that maybe 5% of the properties will be the ones I need to retrieve most often. Is either of the following worth doing for increased performance (decreased lookup time)?
Set the most commonly needed properties at the top of the object literal syntax?
If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Or, is there a better way?
I did a js perf here: http://jsperf.com/object-lookup-perf
I basically injected 40000 props with random keys into an object, saved the "first" and "last" keys and looked them up in different tests. I was surprised by the result, because accessing the first was 35% slower than accessing the last entry.
Also, having an object of 5 or 40000 entries didn’t make any noticeable difference.
The test case can most likely be improved and I probably missed something, but there is a start for you.
Note: I only tested Chrome.
Yes: something like indexOf searches front to back, so placing common items earlier will return them faster. Most "basic" search algorithms are simple top-down linear scans. At least for arrays.
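A trivial illustration of that front-to-back scan:

// indexOf compares elements from index 0 onward, so earlier items are
// found with fewer comparisons.
const list = ['common', 'rare1', 'rare2'];
console.log(list.indexOf('common')); // 0: found after one comparison
console.log(list.indexOf('rare2'));  // 2: scanned the whole array first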
If you have so many properties, they must be computed, no? So you could replace the (most probably string) key computation with an integer hash computation, then use this hash as an index into a regular array.
You might even use one single array by putting the values in the 2*i-th and (2*i+1)-th slots.
If you can use a typed array here, do it; you won't go faster than that.
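A minimal sketch of that layout (the integer hash function itself is assumed; here the index i is taken as given):

// Hypothetical flat layout: entry i's pair lives in slots 2*i and 2*i+1.
const N = 40000;
const table = new Float64Array(2 * N);   // typed array: compact, fast access

function put(i, a, b) {
  table[2 * i] = a;
  table[2 * i + 1] = b;
}
function get(i) {
  return [table[2 * i], table[2 * i + 1]];
}

put(42, 1.5, 2.5);
console.log(get(42)); // [1.5, 2.5]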
Set the most commonly needed properties at the top of the object literal syntax?
No. Choose readability over performance. If you've got few enough properties that you use a literal in the code, it won't matter anyway; and you should order the properties in a logical sequence.
Property lookup in objects is usually based on hash maps, and position should not make a substantial difference. Depending on the hash implementation, some lookups might be negligibly slower, but I'd guess this is fairly random and depends heavily on the applied optimisations. It should not matter.
If #1 has no effect, should I create two separate objects, one with the most common 5% of properties, search that one first, then if the property isn't found there, then look through the object with all the less-common properties?
Yes. If you've got really huge objects (with thousands of properties), this is a good idea. Depending on the data structure used, the size of the object might influence the lookup time, so if you've got a smaller object for the more frequent properties it should be faster. It's possible that different internal structures are chosen for the two objects, which could perform better than the single one, especially if you know beforehand in which object to look. However, you will need to test this hypothesis with your actual data, and you should beware of premature [micro-]optimisation.
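A sketch of that two-object scheme (the hot/cold names and contents are illustrative; measure against your real data):

// `hot` holds the ~5% most frequently needed keys; `cold` holds the rest.
const hot = { foo: [1, 2] };
const cold = { bar: [3, 4] };

function lookup(key) {
  const v = hot[key];
  return v !== undefined ? v : cold[key];
}

console.log(lookup('bar')); // [3, 4]
// Note: if stored values may themselves be undefined, check with
// Object.prototype.hasOwnProperty.call(hot, key) instead.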

hash table - how often is the hash calculated for a given key?

I was asked this during an interview. My immediate answer was for every read and write. The interviewer then asked, "Are you sure the hash isn't cached in the table somewhere?"
This made me second-guess myself. In the end, I stuck to my original answer, but out of curiosity, I figured I'd ask the question here.
Also note that this interview was for a JavaScript position but the question wasn't necessarily specific to JavaScript.
So, in general, is a key's hash computed once or for every read/write? What about specific to JavaScript?
Of course it depends on the implementation, and even if you ask about JS there are several implementations (V8, SpiderMonkey, MSFT etc.).
It also depends on the application. If your application most frequently uses the last item put into the hash table, then it could make sense to cache the hash somehow. In some cases this would be preferable.
I guess the interviewer just tried to see how you handle second-guessing...
It depends on the hash table and the key types, and on whether we're talking about the key used to read/write or the keys already in the table. The hash values of the former can be, and sometimes are, cached in the key object itself (example: strings in Python). The hash values of the latter can be, and sometimes are, cached in the table: instead of (key, value) pairs you store (hash, key, value) triples.
In both cases, the decision depends on the kind of keys: are they large and expensive to hash? Is it worth the extra space and memory traffic? For example, it's probably a clear win for strings longer than a couple dozen characters, and probably useless or harmful for 2D points. Also note that the cached hash values can be used to avoid full key comparisons, which might be useful but doesn't seem as important.
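As an illustration of the first option (caching the hash on the key itself), a toy sketch in JavaScript; the 31-multiplier string hash is just a placeholder, not what any real engine does:

class CachedKey {
  constructor(s) {
    this.s = s;
    this._hash = null;                 // not yet computed
  }
  hash() {
    if (this._hash === null) {         // first call: compute and remember
      let h = 0;
      for (let i = 0; i < this.s.length; i++) {
        h = (h * 31 + this.s.charCodeAt(i)) | 0;
      }
      this._hash = h;
    }
    return this._hash;                 // later calls: cached value, O(1)
  }
}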

Look up elements in a generic tree

I have a nested JSON object, similar to this.
In my case, I have a unique id field of type int (say, instead of name above). This is not a binary tree; it rather depicts a parent-child relationship. I wanted an easy way to look up the child tree (children) rooted at, say, id = 121. In a brute-force way, I could compare all nodes until I find one and return its children. But I was thinking of keeping a map of {id, node}, for example {"121" : root[1][10]..[1]}. This may be super wasteful of memory (unless it stores a reference into the array). Not sure there is a better way.
I have control over what is sent from the server, so I may augment the data structure above, but I need a quick way to get the child tree for a given node id on the client side.
EDIT:
I am considering keeping another data structure, a map of {id, []ids}, where ids is the ordered path from the root. Any better way?
Objects in JavaScript are true pointer-based objects, meaning that you can keep multiple references to them without using much more memory. Why not do a single traversal to assign the sub-objects to a new id-based parent object? Unless your hierarchical object is simply enormous, this should be very fast.
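A minimal sketch of that single traversal, assuming each node has an id and an optional children array:

function buildIndex(root) {
  const byId = {};
  (function walk(node) {
    byId[node.id] = node;                    // store a reference, not a copy
    (node.children || []).forEach(walk);
  })(root);
  return byId;
}
// const index = buildIndex(tree);
// index[121] is the subtree rooted at id 121 (the numeric key is coerced to a string).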
In light of best practice and what would happen if the application you're building were to scale to millions of users, you might rethink whether you really want the server to do more work. The client's computer is sitting there, ready to provide you with remote computing power for FREE. Why move the work load to the server causing it to process fewer client requests per second? That may not be a direction you want to go.
Here is a fiddle demonstrating this index-building technique. You run through it once, and use the index over and over as you please. It only takes 4 or 5 ms to build said index. There is no performance problem!
One more note: if you are concerned with bandwidth, one simple way to help is to trim down your JSON. Don't put quotes around object key names (strictly speaking it is then a JavaScript object literal rather than JSON), use one-letter key names, and don't use whitespace and line breaks. That will get you a very large improvement. Performing this change on your example JSON, it goes from 11,792 characters to 5,770, only 49% of the original size!
One minor note is that object keys in javascript are always Strings. The numeric ids I added to your example JSON are coerced to strings when used as a key name. This should be no impediment to usage, but it is a subtle difference that you may want to be aware of.
I don't assume that the ids are somehow ordered, but still it might help to prune at least parts of the tree if you add to each node the information about the minimum and maximum id value of its children (and sub... children).
This can quite easily be achieved on the server side, and when searching the tree you can check whether the id you're looking for is within the id range of a node before stepping inside and searching all its children.
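A sketch of that pruned search, assuming the server annotates every node with the minId/maxId of its subtree (the node itself included):

function findById(node, id) {
  if (node.id === id) return node;
  for (const child of node.children || []) {
    if (id >= child.minId && id <= child.maxId) {  // skip out-of-range subtrees
      const hit = findById(child, id);
      if (hit) return hit;
    }
  }
  return null;
}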

What is the complexity of retrieval/insertion in JavaScript associative arrays (dynamic object properties) in the major javascript engines?

Take the following code example:
var myObject = {};
var i = 100;
while (i--) {
    myObject["foo" + i] = new Foo(i);
}
console.log(myObject["foo42"].bar());
I have a few questions.
What kind of data structure do the major engines (IE, Mozilla, Chrome, Safari) use for storing key-value pairs? I'd hope it's some kind of binary search tree, but I think they may use linked lists (due to the fact that iteration is done in insertion order).
If they do use a search tree, is it self-balancing? Because the above code with a conventional search tree would create an unbalanced tree, causing a worst-case scenario of O(n) for searching, rather than O(log n) for a balanced tree.
I'm only asking this because I will be writing a library which will require efficient retrieval of keys from a data structure, and while I could implement my own or an existing red-black tree I would rather use native object properties if they're efficient enough.
The question is hard to answer for a couple of reasons. First, the modern browsers all heavily and dynamically optimize code while it is executing, so the algorithms chosen to access the properties might differ for the same code. Second, each engine uses different algorithms and heuristics to determine which access algorithm to use. Third, the ECMA specification dictates what the result must be, not how the result is achieved, so the engines have a lot of freedom to innovate in this area.
That said, given your example, all the engines I am familiar with will use some form of hash table to retrieve the value associated with foo42 from myObject. If you use an object like an associative array, JavaScript engines will tend to favor a hash table. None that I am aware of use a tree for string properties. Hash tables are worst-case O(N), best-case O(1), and tend to be much closer to O(1) than O(N) if the hash function is any good. Each engine will have a pattern you could use to get it to perform at O(N), but that will be different for each engine.

A balanced tree would guarantee worst-case O(log N), but keeping a tree balanced has overhead of its own, and hash tables are more often better than O(log N) for string keys. They are also O(1) to update (once you determine you need to, which is the same big-O as a read) if there is space in the table; rebuilding the table is periodically O(N), but tables usually double in size, so you will only pay that O(N) 7 or 8 times over the life of the table.
Numeric properties are special, however. If you access an object using integer numeric properties that have few or no gaps in range, that is, use the object like it is an array, the values will tend to be stored in a linear block of memory with O(1) access. Even if your access has gaps the engines will probably shift to a sparse array access which will probably be, at worst, O(log N).
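A rough illustration of that distinction (the exact thresholds at which engines switch representations are internal heuristics, so the comments describe typical behavior, not a guarantee):

const dense = {};
for (let i = 0; i < 1000; i++) dense[i] = i;  // contiguous integer keys:
                                              // typically stored like an array, O(1) access
const sparse = {};
sparse[0] = 'a';
sparse[1e9] = 'b';                            // huge gap: engines typically fall back
                                              // to a dictionary/sparse representation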
Accessing a property by identifier is also special. If you access the property like,
myObject.foo42
and execute this code often (that is, the speed of this matters) and with the same or similar objects, it is likely to be optimized into one or two machine instructions. What makes objects similar also differs for each engine, but if they are constructed by the same literal or function they are more likely to be treated as similar.
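For instance (a sketch of the idea, not a guarantee of any particular engine's behavior): objects built by the same constructor tend to share an internal shape, so a repeated p.x access can be served from a cached offset instead of a hash lookup.

function Point(x, y) { this.x = x; this.y = y; }

const pts = [new Point(1, 2), new Point(3, 4)];
let sum = 0;
for (const p of pts) {
  sum += p.x;   // monomorphic access: every p has the same shape, so the
}               // engine can remember where x lives
console.log(sum); // 4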
No engine that does at all well on the JavaScript benchmarks will use the same algorithm for every object. They all must dynamically determine how the object is being used and try to adjust the access algorithm accordingly.
