Which is Faster? Select by ID, or Select by Index?

Backbone provides options to select models from collections both by ID (a unique identifier attribute assigned to every model) and by index. Which of these is the faster way to access items from a collection?
Cracking open Backbone.js, I can see that collection.get(id) (the select-by-ID function) uses a simple object-literal look-up, and collection.at(index) (the select-by-index function) uses a simple array look-up.
from Backbone.js:
collection.get(id):
// Get a model from the set by id.
get: function(obj) {
  if (obj == null) return void 0;
  return this._byId[obj] || this._byId[obj.id] || this._byId[obj.cid];
}
collection.at(index):
// Get the model at the given index.
at: function(index) {
  return this.models[index];
}
Because of this, the answer should come down to which is faster: array access or object-literal access (assuming here that .get is passed a raw ID, not a model with an id or cid property on it).

According to this JSPerf, select by index (using collection.at(index)) is generally faster than select by ID (using collection.get(id)), but how much faster varies widely by browser. On Chrome and at least one of the versions of Firefox I tested, the difference is negligible but still systematically in favor of select by index; in IE11, however, select by index is consistently (and almost exactly) twice as fast.
The moral of the story here is to use select by index whenever possible; hashed object retrieval is fast and convenient, but it lacks the raw efficiency of an indexed look-up.
To access a property of a hash, JavaScript engines must go through an additional look-up step; this, combined with the overall complexity of objects, makes them a less-than-ideal choice for any script where performance is a consideration.
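To see the difference for yourself, here is a rough micro-benchmark sketch (my own setup, not the linked JSPerf test) that mirrors what at and get do internally:
// Build N models indexed both by array position and by id,
// mirroring collection.models and collection._byId.
var N = 100000;
var models = []; // stands in for collection.models
var byId = {};   // stands in for collection._byId
for (var i = 0; i < N; i++) {
  var model = { id: 'id' + i, value: i };
  models.push(model);
  byId[model.id] = model;
}

console.time('by index');
for (var j = 0; j < N; j++) { var a = models[j]; }
console.timeEnd('by index');

// Note: the string concatenation here adds a constant overhead
// that the array-index loop does not pay.
console.time('by id');
for (var k = 0; k < N; k++) { var b = byId['id' + k]; }
console.timeEnd('by id');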

Related

Realm-JS: Performant way to find the index of an element in sorted results list

I am searching for a performant way to find the index of a given realm-object in a sorted results list.
I am aware of this similar question, which was answered by using indexOf, so my current solution looks like this:
const sortedRecords = realm.objects('mySchema').sorted('time', true) // 'time' property is a timestamp
// grab element of interest by id (e.g. 123)
const item = realm.objectForPrimaryKey('mySchema','123')
// find index of that object in my sorted results list
const index = sortedRecords.indexOf(item)
My basic concern here is performance for larger datasets. Is the indexOf implementation of a realm Results list optimized in any way, or is it the same as a JavaScript array's? I know there is the possibility of creating indexed properties; would indexing the time property improve the performance in this case?
Note:
In the realm-js API documentation, the indexOf section does not reference Array.prototype.indexOf, as other sections do. This made me optimistic that it's an independent implementation, but it's not stated clearly.
Realm query methods return a Results object, which is quite different from an Array. The main difference is that the former can change over time even without calling methods on it: adding and/or deleting records in the source schema can result in a change to the Results object.
The only thing Results.indexOf and Array.indexOf have in common is the name of the method.
That said, it makes little sense to compare the efficiency of the two methods.
In general, a problem common to all indexOf implementations is that they need a sequential scan, and in the worst case (i.e. the not-found case) a full scan is required. The worst-implemented indexOf executed against 10 elements has no impact on program performance, while the best-implemented indexOf executed against 1M elements can have a severe impact. When possible, it's always a good idea to avoid using indexOf on large amounts of data.
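If the list is sorted on a unique time value, one way to avoid that scan is a manual binary search over the sorted results. The sketch below is my own, not a realm-js API; it reuses the sortedRecords and item variables from the question and assumes time values are unique:
// Binary search over a Results list sorted descending by 'time'
// (sorted('time', true) sorts in reverse, so larger times come first).
function sortedIndexOf(sortedRecords, item) {
  var lo = 0, hi = sortedRecords.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    var t = sortedRecords[mid].time;
    if (t === item.time) return mid;
    if (t > item.time) lo = mid + 1; // target is further down the list
    else hi = mid - 1;
  }
  return -1; // not found
}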
Hope this helps.

Javascript: Efficiently move items in and out of a fixed-size array

If I have an array that I want to be of fixed size N for the purpose of caching the most recent of N items, then once limit N is reached, I'll have to get rid of the oldest item while adding the newest item.
Note: I don't care if the newest item is at the beginning or end of the array, just as long as the items get removed in the order that they are added.
The obvious ways are either:
push() and shift() (so that cache[0] contains the oldest item), or
unshift() and pop() (so that cache[0] contains the newest item)
Basic idea:
var cache = [], limit = 10000;

function cacheItem( item ) {
  // In case we want to do anything with the oldest item
  // before it's gone forever.
  var oldest = [];
  cache.push( item );
  // Use WHILE and > instead of just IF in case the cache
  // was altered by more than one item at some point.
  while ( cache.length > limit ) {
    oldest.push( cache.shift() );
  }
  return oldest;
}
However, I've read about performance issues with shift and unshift, since they alter the beginning of the array and move everything else around, but unfortunately, one of those methods has to be used to do it this way!
Qs:
Are there other ways to do this that would be better performance-wise?
If the two ways I already mentioned are the best, are there specific advantages/disadvantages I need to be aware of?
Conclusion
After doing some more research into data structures (I've never programmed in other languages, so if it's not native to JavaScript, I likely haven't heard of it!) and doing a bunch of benchmarking in multiple browsers, with both small and large arrays as well as small and large numbers of reads/writes, here's what I found:
The 'circular buffer' method proposed by Bergi is hands-down THE best as far as performance goes (for reasons explained in the answer and comments), and hence it has been accepted as the answer. However, it's not as intuitive, and it makes it difficult to write your own 'extra' functions (since you always have to take the offset into account). If you're going to use this method, I recommend an already-created one like this circular buffer on GitHub.
The 'unshift/pop' method is much more intuitive, and performs fairly well, except at the most extreme numbers.
The 'copyWithin' method is, sadly, terrible for performance (tested in multiple browsers), quickly creating unacceptable latency. It also has no IE support. It's such a simple method! I wish it worked better.
The 'linked list' method, proposed in the comments by Felix Kling, is actually a really good option. I initially disregarded it because it seemed like a lot of extra stuff I didn't need, but to my surprise....
What I actually needed was a Least Recently Used (LRU) Map (which employs a doubly-linked list). Now, since I didn't specify my additional requirements in my original question, I'm still marking Bergi's answer as the best answer to that specific question. However, since I needed to know if a value already existed in my cache, and if so, mark it as the newest item in the cache, the additional logic I had to add to my circular buffer's add() method (primarily indexOf()) made it not much more efficient than the 'unshift/pop' method. HOWEVER, the performance of the LRUMap in these situations blew both of the other two out of the water!
So to summarize:
Linked List -- most options while still maintaining great performance
Circular Buffer -- best performance for just adding and getting
Unshift / Pop -- most intuitive and simplest
copyWithin -- terrible performance currently, no reason to use
If I have an array that caches the most recent of N items, once limit N is reached, I'll have to get rid of the oldest while adding the newest.
You are not looking to copy stuff around within the array, which would take O(n) steps every time.
Instead, this is the perfect use case for a ring buffer. Just keep an offset to the "start" and "end" of the list, then access your buffer with that offset and modulo its length.
var cache = new Array(10000);
cache.offset = 0;

function cacheItem(item) {
  cache[cache.offset++] = item;
  cache.offset %= cache.length;
}

function cacheGet(i) { // backwards, 0 is most recent
  return cache[(cache.offset - 1 - i + cache.length) % cache.length];
}
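A quick usage sketch (my own) showing the buffer wrapping past its capacity:
// Write more items than the buffer holds, then read back the newest ones.
for (var i = 0; i < 10005; i++) cacheItem(i);
console.log(cacheGet(0)); // 10004 (most recent)
console.log(cacheGet(1)); // 10003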
You could use Array#copyWithin.
The copyWithin() method shallow copies part of an array to another location in the same array and returns it, without modifying its size.
Description
copyWithin works like C and C++'s memmove and is a high-performance method to shift the data of an Array. This especially applies to the TypedArray method of the same name. The sequence is copied and pasted as one operation; the pasted sequence will have the copied values even when the copy and paste regions overlap.
The copyWithin function is intentionally generic, it does not require that its this value be an Array object.
The copyWithin method is a mutable method. It does not alter the length of this, but will change its content and create new properties if necessary.
var array = [0, 1, 2, 3, 4, 5];
array.copyWithin(0, 1);
console.log(array); // [1, 2, 3, 4, 5, 5]
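For completeness, here is one way (my own sketch, reusing the question's cache and limit variables) the copyWithin suggestion could be applied to the fixed-size cache; note that the asker's benchmarks above found this approach slow in practice:
function cacheItem(item) {
  if (cache.length >= limit) {
    cache.copyWithin(0, 1);          // shift everything left by one slot
    cache[cache.length - 1] = item;  // overwrite the freed last slot
  } else {
    cache.push(item);
  }
}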
You need to splice the existing item and put it at the front using unshift (as the newest item). If the item doesn't already exist in your cache, then you can unshift and pop.
function cacheItem( item ) {
  var index = cache.indexOf( item );
  // Move an existing item to the front; otherwise make room for
  // the new item by dropping the oldest once the cache is full.
  if ( index != -1 ) {
    cache.splice( index, 1 );
  } else if ( cache.length >= limit ) {
    cache.pop();
  }
  cache.unshift( item );
}
item needs to be a String or Number; otherwise you'll need to write your own look-up using findIndex to locate an object (if item is an object).
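For reference, the LRU behavior the asker ultimately settled on can be sketched with a plain ES2015 Map (the class and names below are my own illustration, not the LRUMap library mentioned above). Map preserves insertion order, so deleting and re-inserting a key marks it as most recently used:
class LRUCache {
  constructor(limit) {
    this.limit = limit;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    var value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.limit) {
      // First key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}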

Can one add React Keys after instantiation?

I’m making a collection of React Elements and displaying them; what follows is a trivial example to frame the problem of how-would-one-modify-a-preexisting-instantiated-element only.
var c = [
  <div>A</div>,
  <div>B</div>,
  // ...
  <div>Z</div>
];
var ListComponents = React.createClass({
  render: function() {
    return <div>{c}</div>;
  }
});
ReactDOM.render(<ListComponents/>, document.getElementById('root'));
While the code above “works,” it renders a console message I’d rather not ignore:
Warning: Each child in an array or iterator should have a unique "key" prop.
Check the render method of `ListComponents`.
See https://fb.me/react-warning-keys for more information.
Superficially, I could just add a unique key="…" string to each element in c and be done with it.
However, that seems quite verbose, especially since I have the data in an indexed array and a functional language that, in theory, can assign each key its matching index value without my having to enter it manually as a source literal.
I’d love to be able to just do this...
c.forEach( (e,i) => e.key = i ); // ...or call some setter
What’s the *right* React-way to do this -and- keep the code clean?
ADDENDUM:
...for the curious or those that want to just say add a key field...
The collection I'm using is actually an array of tuples containing meta-data and a corresponding React Element, a custom Component, or some huge JSX block. The example above overly trivializes what the actual data looks like as well as its irregularities.
As the source data itself is quite long, updated often, and not maintained by a developer, it is highly prone to missed key fields or duplicate values from manual entry. Hence the desire to do it entirely programmatically. I cannot count on the data owners to do it properly. They can't read code, so ideally I'd rather not mess up the data structures with a lot of "programming goop."
The collection is manipulated a few times, putting various runs of certain elements into other dynamically created wrappers, so that the final collection is actually generated by a few transformations, filters, and maps before it is ultimately displayed.
A major shout out to Wes Bos, who came up with a clever solution that works!
The code is a simple one liner and does exactly what I was looking for:
c = c.map( (el,key) => React.cloneElement(el, {key} ));
We're building a new collection using the .cloneElement() method, which I was unaware of. That was what I needed, it turns out.
In the .map() operation, the lambda function is passed both the element and the index. Its return value is a cloned element, but with the key property set.
By cleverly naming the index parameter key, the shorthand { key } stands for the expression { key: key }. This object's properties augment the cloned element.
In the end, I end up with a new collection of identical elements, each with a key property set to its index.
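Expanded for clarity, the one-liner above is equivalent to:
// Clone each element, merging in a key prop equal to its array index.
c = c.map(function(el, i) {
  return React.cloneElement(el, { key: i });
});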

finding all keys that map to a value in javascript. And efficient alternatives

I'm working on an application in javascript where each user is in a room. No two users share the same name and no two rooms share the same name. Currently I have it set up like this:
var userroommap = {
  username: "room",
  username2: "room",
  username3: "room2"
}
getting the room a user is in is as simple as
userroommap["user"]
but in order to get all users who are present in a room, I would have to iterate over the entire userroommap, like so:
for (var x in userroommap) {
  if (userroommap[x] == "room") {
    // user x is present in room
  }
}
In my application I must know which users are in which rooms very often so I am considering using another object to hold all users in a room, something like:
var roomusermap = {
  room: ["username", "username2"],
  room2: ["username3"]
}
Adding users to a room is trivial, because all you have to do is append to an array; removing a username from a room, however, requires iterating over the array, which is a comparatively expensive operation. This is already a decent solution to my problem, but I became curious whether there is a better one. So: is there a better way to (i) store the roomusermap, perhaps without arrays, or, alternatively, (ii) find all users in a room?
The data structure described in the other answer (a forward map paired with a reverse map) is called a BiMap.
A BiMap ideally provides the same performance for value-to-keys look-ups as for key-to-values look-ups. It is typically implemented by internally managing two separate maps (one with a forward mapping {key: values} and one with a reverse mapping {value: keys}).
Here's an existing implementation to use if you're not rolling your own: https://www.npmjs.com/package/bimap
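A minimal sketch of the two-internal-maps idea, applied to the question's users and rooms (my own illustration, not the npm bimap API):
var userToRoom = {};  // forward map: user -> room
var roomToUsers = {}; // reverse map: room -> set of users

function join(user, room) {
  leave(user); // keep both maps consistent if the user moves rooms
  userToRoom[user] = room;
  (roomToUsers[room] = roomToUsers[room] || {})[user] = true;
}

function leave(user) {
  var room = userToRoom[user];
  if (room === undefined) return;
  delete userToRoom[user];
  delete roomToUsers[room][user];
}

function usersInRoom(room) {
  return Object.keys(roomToUsers[room] || {});
}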
Unless you've identified a genuine, real-world performance problem, I'd stick with the simple solution.
That said, a few thoughts for you:
All modern JavaScript engines give you the Object.keys function, which returns an array of an object's own enumerable properties. This may be more efficient than your for-in loop for two reasons:
It's happening within the engine's code, which lets the engine optimize
for-in looks for enumerable properties in prototype objects, whereas Object.keys knows it's only supposed to look in that specific object
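As an illustration (my own), the question's for-in loop rewritten with Object.keys and filter:
var usersInRoom = Object.keys(userroommap).filter(function(username) {
  return userroommap[username] == "room";
});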
Your roomusermap can contain a map per room; it doesn't need to use arrays.
var roomusermap = {
  room: {
    username: user,
    username2: user2
  },
  room2: {
    username3: user3
  }
};
Adding a user to a room becomes:
userroommap[username] = roomname;
roomusermap[roomname][username] = user;
Removing a user is:
delete userroommap[username];
delete roomusermap[roomname][username];
If you're seeing performance problems with those map objects, something to keep in mind is that removing a property from an object (delete) puts the object into "dictionary mode" on several JavaScript engines (having previously been in a more optimized state), significantly impacting the time required to look up properties on that object.
So in the very hypothetical case where the property lookup performance starts to be an issue, you could consider storing undefined rather than deleting the property. E.g., instead of:
delete userroommap[username];
delete roomusermap[roomname][username];
you'd do
userroommap[username] = undefined;
roomusermap[roomname][username] = undefined;
However, you'd have to adjust your checks for whether a user is in a room, and you couldn't use Object.keys (on its own) to get the list anymore since you have to weed out the properties with the value undefined. You could use Object.keys with filter:
var map = roomusermap[roomname];
var users = Object.keys(map).filter(function(username) {
  return map[username] !== undefined;
});
So you'd really want to do that only if you've identified a genuine problem caused by objects going into dictionary mode.

Dojo restricting queries?

I was reading the dojo query tutorial and saw
// retrieve an array of nodes with the class name "odd"
// from the first list using a selector
var odds1 = query("#list .odd");
// retrieve an array of nodes with the class name "odd"
// from the first list using a DOM node
var odds2 = query(".odd", document.getElementById("list"));
and they explain that odds2 is faster than odds1 because odds2 searches for .odd only within the #list node rather than the whole HTML DOM. What I'm wondering is: what are the advantages of doing odds1 (other than cleaner code, I guess)? It seems to me that in any case where query is searching for nodes within an id'd element, the odds2 style should always be used (assuming proper id/class HTML use), so why doesn't Dojo automatically parse the query string in odds1 to call odds2?
Looking at the code (http://svn.dojotoolkit.org/src/dojo/trunk/query.js for query and http://svn.dojotoolkit.org/src/dojo/trunk/selector/acme.js for the default selector engine), it appears that the "big" performance improvement comes from the fact that the initial DOMNode list is reduced when you give the query method some help with document.getElementById("list"). It also appears you can just pass the query method a string of the parent node's id and achieve the same performance:
query(".odd", "list");
Then there is the fact that you can cache the parent DOMNode by storing the result of document.getElementById("list") in a variable and reusing it. However, in general, readability (in matters this trivial) tends to trump performance. Considering the number of problems that a bad JavaScript interpreter can hide, having readable code can end up saving you a lot of trouble.
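A sketch of that caching idea (the .even class here is hypothetical, purely to show reuse of the cached node):
// Look the parent node up once, then reuse it for every query.
var list = document.getElementById("list");
var odds = query(".odd", list);
var evens = query(".even", list);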
