I recently happened to think about object property access times in JavaScript and came across this question, which seemed to reasonably suggest that access should be constant time. This also made me wonder whether there is a limit on object property key lengths. Apparently modern browsers support key lengths of up to 2^30, which seems quite good for a hash function. That said:
Is anyone aware of the kinds of hash functions used by JS engines?
Is it possible to experimentally create collisions of JavaScript's property accessors?
Is anyone aware of the kinds of hash functions used by JS engines?
Yes, their developers are certainly aware of the hash functions and the problems they have. In fact, attacks based on hash collisions were demonstrated in 2011 against a variety of languages, among others as a DoS attack against node.js servers.
The V8 team solved the issue; you can read about the details at https://v8project.blogspot.de/2017/08/about-that-hash-flooding-vulnerability.html.
Is it possible to experimentally create collisions of JavaScript's property accessors?
It appears so: https://github.com/hastebrot/V8-Hash-Collision-Generator
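To get a feel for what a collision means here, consider this toy illustration. It uses a simple 31-based string hash, similar in spirit to Java's String.hashCode, and emphatically not the hash V8 actually uses:

// Toy 31-based string hash -- NOT V8's real hash function, just a
// minimal stand-in to show what a collision looks like.
function toyHash(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0;
  }
  return h;
}

console.log(toyHash("Aa")); // 2112
console.log(toyHash("BB")); // 2112 -- different strings, same hash

Finding such collisions against a real engine's keyed hash is far harder by design, which is exactly the point of the fix described above.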
Related
I was asking myself this question: is Set a hashed collection in JavaScript? For example, will Set.prototype.has iterate the entire Set, or do implementations use an internal hash table to locate an item within the collection?
The ECMAScript 2015 specification says that:
Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection.
Obviously they can't force a particular JS engine to actually do that, but in practice JS engines will do the right thing.
The ES6 specification does not require a specific implementation, but it does indicate that it should be better than O(n) (so better than a linear lookup). And, since the purpose of the Set object is to be efficient at looking up items, it almost surely uses some sort of efficient lookup system like a hash.
If you want to know for sure how it works, you'd have to look at the open source code for the Firefox or Chrome implementations.
You could also benchmark it to prove that the lookup speed is not O(n), but something more efficient than that.
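As a rough sanity check rather than a rigorous benchmark (the sizes below are arbitrary, and timings vary by engine):

// Sketch: if Set.prototype.has were O(n), growing the Set 1000x
// should make lookups roughly 1000x slower. In practice it doesn't.
function timeLookups(size) {
  const set = new Set();
  for (let i = 0; i < size; i++) set.add("key" + i);
  let hits = 0;
  const t0 = Date.now();
  for (let i = 0; i < 1e6; i++) if (set.has("key" + (i % size))) hits++;
  return (Date.now() - t0) + " ms (" + hits + " hits)";
}

console.log(timeLookups(1000));
console.log(timeLookups(1000000)); // same ballpark, not ~1000x slower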
What naming conventions are people using for variables that hold JSON-serializable objects? We'd like the name of the variable to remind us to only store information in the object that can be serialized into JSON without losing information. Applications include HTTP sessions, non-searchable database columns, data logging, and serializing restorable application state.
The obvious contenders, at least from the perspective of someone programming in JavaScript for node.js, seem to fall short:
prefixJSON - JSON is actually a serialization syntax, so this would properly be a string in JSON format, not an object
prefixInfo - info is often used in node.js for any kind of map, including ones that hold functions and instances of ES6 classes
prefixMap - Same issue as the info suffix
prefixData - Doesn't really suggest a constraint on the type
The best I can do is prefixJSONInfo, prefixJSONData, or prefixJSONObject, but I was hoping for something more succinct and readable. The prefix may be lengthy and descriptive too.
This question mainly applies to programming languages that support variant-type variables, such as JavaScript. These variables are meant to hold a mishmash, but the programmer needs to be reminded to limit the types of values that are thrown into the mishmash.
In this case, it seems that a specific acronym might serve a good purpose.
JSONSerializable
written:
JSS or Jss
Examples:
prefixJss
customerDataJss
sessionInfoJss
Or maybe even:
JSONSerializableObject
prefixJSO
prefixJso
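For illustration, with the hypothetical Jss suffix the constraint is visible at the call site and cheap (if imperfect) to spot-check with a round trip during development:

// Hypothetical convention: the Jss suffix promises the value survives
// a JSON round trip without losing information.
const sessionInfoJss = { userId: 42, roles: ["admin"], lastSeen: "2017-08-01" };

// Cheap, imperfect spot check during development:
console.assert(
  JSON.stringify(JSON.parse(JSON.stringify(sessionInfoJss))) === JSON.stringify(sessionInfoJss),
  "sessionInfoJss must round-trip through JSON without loss"
);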
My $0.02: this might be more opinion-based, as I don't think there are any globally accepted standards for these things. There are best practices, such as the ones you mentioned (short but descriptive). It's up to the developer or team to determine which short and descriptive versions they like (team preference).
Trust me though, I understand how difficult naming things is sometimes. :)
Does JSON.parse in modern browsers use eval() internally for evaluating and executing the dynamic code?
I ask because I have been looking through Douglas Crockford's JSON library. Its parse() also uses eval(), but only after preprocessing the text prior to the actual evaluation, such as:

A guard against problematic Unicode characters in the text.
A check that the text does not show malicious intent.

Do modern browsers that support JSON.parse natively perform this kind of preprocessing, or do they follow another approach?
No, JSON.parse() doesn't use eval()
This is by design: since eval() is able to execute any arbitrary JavaScript code you feed it, it could execute things you wouldn't want it to. So JSON.parse() does what it says on the tin: it actually parses the whole string and reconstructs an entire object tree.
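A quick way to see the difference (the alert call is just a stand-in for arbitrary code):

const text = '{"x": alert("pwned")}';

// eval would happily execute the embedded code:
// eval("(" + text + ")"); // runs alert("pwned")

// JSON.parse rejects it, because a function call is not valid JSON:
try {
  JSON.parse(text);
} catch (e) {
  console.log(e.name); // "SyntaxError"
}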
JSON.parse is usually delegated to an internal function implemented in "native" code, where "native" means whatever is considered "native" in the context of your browser's JavaScript engine (could be compiled machine code, could be bytecode for a VM, etc.). I don't think there's any strong requirement on that.
Differences in the Implementations?
JSON (the notation) itself is codified by RFC 4627.
Regarding the implementation of the JSON object and its methods, all modern browsers implementing it should behave the same, as they should follow the same specification for ECMAScript 5's JSON object. However, there's always the chance of defects. For instance, V8 originally contained this nasty bug.
Also, note that the implementations linked in the comments above are for you to add JSON.parse() support to browsers that do not support it natively (also known as "these damn old browsers you sometimes need to support"). But that doesn't mean it's necessarily how native implementations work.
For instance, for Google's V8 implementation used in Chrome, see json.js which invokes native code from json_parser.h.
It would be a very funny thing to do, if you think about it.
To understand why, see if this analogy helps: you're traveling with your boss to a country where you speak the language but she doesn't. Since you're fluent, you will serve two roles: as her assistant (doing tasks for her) as well as her translator (telling her what things mean).
So you have these two jobs, which are complementary. Your boss could tell you to do something--in any language you both understand (say, English)--as well as ask you to tell her what something says, like a sign or a document. She could even do both: hand you a set of instructions written in this other language and say, "This was given to me by someone I trust. Please do everything it says here."
In this analogy, reading signs or documents to your boss is like JSON.parse. Your boss handing you instructions and telling you to do everything they say is like eval.
If JavaScript engines used eval internally for JSON.parse, that would be analogous to your boss asking you what a document says, and you choosing to act out everything written in the document in order to explain it to her. Instead of just reading it.
The question title almost says it all: do longer keys make for slower lookup?
Is:
someObj["abcdefghijklmnopqrstuv"]
Slower than:
someObj["a"]
Another sub-question is whether the type of the characters in the string used as a key matters. Are alphanumeric key strings faster?
I tried to do some research; there doesn't seem to be much info online about this. Any help/insight would be extremely appreciated.
In general, no. In the majority of languages, string literals are 'interned', which hashes them and makes their lookup much faster. There may be some discrepancies between different JavaScript engines, but overall, if they're implemented well (cough IE cough), it should be fairly equal. Especially since JavaScript engines are constantly being developed, this is (probably) an easy thing to optimize, and the situation will improve over time.
However, some engines also have limits on the length of strings that are interned. YMMV on different browsers. We can also get some insight from the jsperf test (linked in the comments for the question). Firefox obviously does much more aggressive interning.
As for the types of characters, the string is treated as just a bunch of bytes no matter the charset, so that probably won't matter either. Engines might optimize keys that can be used in dot notation, but I don't have any evidence for that.
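If you want to check this yourself, a crude micro-benchmark along these lines will do (the iteration count is arbitrary, results vary by engine, and a clever JIT may skew things):

// Sketch: compare property reads through a short key vs. a long key.
const shortKey = "a";
const longKey = "abcdefghijklmnopqrstuv";
const obj = { [shortKey]: 1, [longKey]: 2 };

function time(key) {
  let sum = 0;
  const t0 = Date.now();
  for (let i = 0; i < 1e8; i++) sum += obj[key];
  console.log(key.length, "chars:", Date.now() - t0, "ms", sum);
}

time(shortKey);
time(longKey); // typically about the same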
The performance is the same if we are talking about Chrome, which uses the V8 JavaScript engine. Based on the V8 design documentation, you can see from "fast property access" and "dynamic machine code generation" that in the end those keys end up being compiled like any other C++ class member.
Enumerating the keys of JavaScript objects yields the keys in the order of insertion:
> for (key in {'z':1,'a':1,'b':1}) { console.log(key); }
z
a
b
This is not part of the standard, but is widely implemented (as discussed here):
ECMA-262 does not specify enumeration order. The de facto standard is to match
insertion order, which V8 also does, but with one exception:
V8 gives no guarantees on the enumeration order for array indices (i.e., a property
name that can be parsed as a 32-bit unsigned integer).
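The array-index exception is easy to observe in V8 (and, since ES2015, the spec itself mandates this ordering for methods like Object.keys): integer-like keys come out first, in ascending numeric order, ahead of the remaining keys in insertion order:

var obj = {};
obj["b"] = 1;
obj["2"] = 1;
obj["a"] = 1;
obj["1"] = 1;

for (var key in obj) console.log(key);
// In V8: 1, 2, b, a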
Is it acceptable practice to rely on this behavior when constructing Node.js libraries?
Absolutely not! It's not a matter of style so much as a matter of correctness.
If you depend on this "de facto" standard, your code might fail on an ECMA-262 5th Edition compliant interpreter, because that spec does not specify the enumeration order. Moreover, the V8 engine might change its behavior in the future, say in the interest of performance.
Definitely do not rely on the order of the keys. If the standard doesn't specify an order, then implementations are free to do as they please. Hash tables often underlie objects like these, and you have no way of knowing when one might be used. JavaScript has many implementations, and they are all competing to be the fastest. Key order will vary between implementations, if not now, then in the future.
No. Rely on the ECMAScript standard, or you'll have to argue with the developers about whether a "de facto standard" exists like the people on that bug.
It's not advised to rely on it naively.
You should also do your best to stick to the spec/standard.
However, there are often cases where the spec or standard limits what you can do. In programming I'm not sure I've encountered many implementations that don't deviate from or extend the specification, often because the specification doesn't cater to everything.
Sometimes people relying on specifics of an implementation have test cases for that, though it's hard to make a reliable test case for keys being in order: such a test may succeed by accident, or rather, it's difficult behavior to reliably reproduce.
If you do rely on an implementation-specific behavior then you must document that. If your project requires portability (code to run on other people's setups out of your control, with maximum compatibility), then it's not a good idea to rely on implementation-specific behavior such as key order.
Where you do have full control of the implementation being used, it's entirely up to you which implementation specifics you use, keeping in mind you may be forced to cater to portability due to the common need or desire to upgrade implementations.
The best form of documentation for cases like this is inline, in the code itself, often with the intention of at least making it easy to identify areas to be changed should you switch from an implementation guaranteeing order to one not doing so.
You can make up whatever format you like, but it could be something like...
/** #portability: insertion_ordered_keys */
for (let key in object) console.log(key);
You might even wrap such cases up in code:
forEachKeyInOrderOfInsertion(object, console.log)
Again, likely something less verbose, but enough to identify the cases that depend on that behavior.
Where your implementation guarantees key order, you just translate that to the same as the original for loop.
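A minimal sketch of such a wrapper, assuming the host engine does preserve insertion order for non-index string keys (the point being that the assumption now lives in exactly one documented place):

/** #portability: insertion_ordered_keys -- relies on the host engine
 *  enumerating own string keys in insertion order. */
function forEachKeyInOrderOfInsertion(object, callback) {
  for (const key of Object.keys(object)) {
    callback(key);
  }
}

forEachKeyInOrderOfInsertion({ z: 1, a: 1, b: 1 }, console.log); // z, a, b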
You can use a JS function for that with platform detection, templating like the C preprocessor, transpiling, etc. You might also want to wrap object creation, and be very careful about things crossing boundaries. If something loses order before reaching you (like a JSON decode of input from a client over the network) then you'll likely not have a solution to that solely within your library; this can happen even when someone else is simply calling your library.
Though you'll likely not need those; at a minimum, mark the cases where you do something that might break later and document that the potential exists.
An obvious exception is when the implementation guarantees consistency. In that case you will probably be wasting your time decorating everything if it's not really a point of variability and is already documented via the implementation. The implementation often is a spec or has its own; you can choose to stick to that rather than a more generalised spec.
Ultimately in each case you'll need to make a judgement call; you may also choose to take a chance. As long as you're fully aware of the potential problems, including the potential of wasting time avoiding problems you won't necessarily have (that is, you know all the stakes and have considered your circumstances), then it's up to you what to do. There's no "should" or "shouldn't"; it's case specific.
If you're making public node.js libraries or libraries to be widely distributed beyond the scope of your control, then I'd say it's not good to rely on implementation specifics. Instead, at least include a disclaimer in the release notes that the library only caters to your stack, and that if people want to use it elsewhere they can fix it and put in a pull request. Otherwise, if it's not documented, it should be fixed.