IndexedDB Performance of Index: Select and Insert - javascript

Question 1:
Q1.1
What is the complexity of IndexedDB (JavaScript version) when doing a select or insert? Are the indexes actually "indexed"? Are they sorted or hashed? For example, when we use IDBKeyRange.only, does it take O(1), O(log n), or O(n) time?
Q1.2
What about IDBKeyRange.bound? Does it sort the index first and then do the select?
Q1.3
What is the performance of IDBObjectStore.add()?
Q1.4
For index.openCursor(), is the index sorted in advance?
Question 2:
We are using IDBObjectStore.createIndex() to create indexes.
If the answer to Question 1 is yes (meaning the indexes really are indexed), how do we create an index with the option of being indexed or not? In other words, can I choose for some indexes to be sorted or hashed while others are not? Do we have this choice?

The specification doesn't mandate performance characteristics. It's reasonable to assume an implementation using a B-tree, and therefore O(log n) operations, but you'd need to test the implementations in actual browsers. If a particular browser performs poorly on common operations, it may be worth reporting it as an issue.
Q1.1 What is the complexity of IndexedDB (JavaScript version) when doing a select or insert? Are the indexes actually "indexed"? Are they sorted or hashed?
Indexes are sorted. https://w3c.github.io/IndexedDB/#index-construct
Q1.2 What about IDBKeyRange.bound? Does it sort the index first and then do the select?
The lookup is against the sorted index.
Q1.3 What is the performance of IDBObjectStore.add()?
Assuming a B-tree, O(log n)
Q1.4 For index.openCursor(), is the index sorted in advance?
Yes.
Technically, though, an implementation could create the index lazily; the specification does not require any particular performance or implementation strategy, just that the results are indistinguishable.
Question 2: ... how do we create an index with the option of being indexed or not? In other words, can I choose for some indexes to be sorted or hashed while others are not? Do we have this choice?
No - the API doesn't expose such an option.
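For reference, here is a minimal sketch of the calls discussed above; the database, store, and index names are made up for illustration:
// A sketch: an object store "people" with an index on the "age" property.
const request = indexedDB.open("example-db", 1);
request.onupgradeneeded = (event) => {
  const db = event.target.result;
  const store = db.createObjectStore("people", { keyPath: "id" });
  // createIndex only takes unique/multiEntry options; there is no
  // "sorted" vs "hashed" switch: the index is always sorted by key.
  store.createIndex("by_age", "age", { unique: false });
};
request.onsuccess = (event) => {
  const db = event.target.result;
  const tx = db.transaction("people", "readonly");
  const index = tx.objectStore("people").index("by_age");
  // Range lookup against the sorted index: ages 18 through 30 inclusive.
  const range = IDBKeyRange.bound(18, 30);
  index.openCursor(range).onsuccess = (e) => {
    const cursor = e.target.result;
    if (cursor) {
      console.log(cursor.value); // records come back in ascending age order
      cursor.continue();
    }
  };
};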

Related

Efficiency of list.Reverse() in JavaScript?

Does anyone know how efficient .reverse() is in JS for reversing a list of objects?
I’m curious if there will be any meaningful performance loss for sorting a list (of say 100 objects) in ascending order and reversing it rather than just sorting the list in descending order.
Edit:
So I ran some tests of sorting an array descending and then ascending with reverse. The results are pretty interesting:
(Running in chromium)
1000 items in random order, descending normal sort, 10000 trials: 0.3681 ms
1000 items in random order, ascending sort then reverse, 10000 trials: 0.3651 ms
1000 items in ascending order, normal descending sort, 10000 trials: 0.0282 ms
1000 items in ascending order, ascending sort then reverse, 10000 trials: 0.0247 ms
Seems that .reverse() is not a very costly operation, especially when compared to sort.
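A rough sketch of how such a measurement can be taken (this is not the poster's exact harness; the sizes and trial counts simply mirror the numbers above):
// Rough timing sketch; numbers will vary by engine and machine.
function timeSort(makeArray, sortFn, trials) {
  let total = 0;
  for (let t = 0; t < trials; t++) {
    const a = makeArray();
    const start = performance.now();
    sortFn(a);
    total += performance.now() - start;
  }
  return total / trials; // average ms per trial
}

const randomArray = () => Array.from({ length: 1000 }, () => Math.random());

console.log("descending sort:",
  timeSort(randomArray, a => a.sort((x, y) => y - x), 10000));
console.log("ascending sort + reverse:",
  timeSort(randomArray, a => a.sort((x, y) => x - y).reverse(), 10000));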
The ECMA specification is vague about Array#sort, so the answer depends on the actual implementation. Assuming we are talking about V8 (Chrome, Node.js), we can say that for lists with more than 10 elements the time complexity is O(n log n), while the time complexity of Array#reverse is O(n).
Given that, we can confidently say that it would be better to sort in descending order directly, since sort + reverse is evidently more expensive than just sort; sorting in descending order costs the same as sorting in ascending order (it is just a different comparator).
UPDATE: As stated by Jonas in the comment below, if you can spot a pattern in your list (e.g. the list is already sorted ascending), then you could probably just reverse it and save an O(n log n) operation. Trying to understand the shape of your data is always the first step in performance optimisation.
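In code, the options discussed above look roughly like this (a minimal sketch):
const nums = [5, 1, 4, 2, 3];

// Sort descending directly with a comparator: one O(n log n) pass.
const desc1 = [...nums].sort((a, b) => b - a);

// Sort ascending, then reverse: O(n log n) plus an extra O(n) pass.
const desc2 = [...nums].sort((a, b) => a - b).reverse();

// If the data is already sorted ascending, reversing alone is O(n).
const alreadyAscending = [1, 2, 3, 4, 5];
const desc3 = [...alreadyAscending].reverse();

console.log(desc1, desc2, desc3); // all [5, 4, 3, 2, 1]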
.reverse() works in O(n), as is evident from the specification: it does a pairwise swap of elements.
To answer your question: sort + reverse is inevitably more expensive than sort alone; there is no shortcut taken (as there might be with a doubly linked list or some other data structure).

linked list vs arrays for dictionaries

I was recently asked in an interview about the advantages and disadvantages of linked lists and arrays for implementing a dictionary of words, and also what the best data structure for implementing it would be. This is where I messed things up. After googling, I couldn't find an answer specific to dictionaries, only general linked-list-versus-array explanations. What is the best answer to the above question?
If you're just going to use it for lookups, then an array is the obvious best choice of the two. You can build the dictionary from a list of words in O(n log n)--just build an array and sort it. Lookups are O(log n) with a binary search.
Although you can build a linked list of words in O(n), lookups will require, on average, that you look at n/2 words. The difference is pretty large. Given an English dictionary of 128K words, a linked list lookup will take on average 64,000 string comparisons. A binary search will require at most 17.
In addition, a linked list of n words will occupy more memory than an array of n words, because you need the next pointer in the list.
If you need the ability to update the dictionary, you'll probably still want to use an array if updates are infrequent compared to lookups (which is almost certainly the case). I can't think of a real-world example of a dictionary of words that's updated more frequently than it's queried.
As others have pointed out, neither array nor linked list is the best choice for a dictionary of words. But of the two options you're given, array is superior in almost all cases.
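A minimal sketch of that array approach: build once, sort once, then answer lookups with binary search (the word list here is just for illustration):
// Build the dictionary once: O(n log n).
const words = ["pear", "apple", "fig", "banana", "cherry"].sort();

// Lookup with binary search: O(log n).
function contains(sortedWords, target) {
  let lo = 0, hi = sortedWords.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedWords[mid] === target) return true;
    if (sortedWords[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

console.log(contains(words, "fig"));   // true
console.log(contains(words, "grape")); // false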
There is no one answer.
The two obvious choices would be something based on a hash table if you only want to look up individual items, or something based on a balanced tree if you want to look up ranges of items.
A sorted array can work well if you do a lot of searching and relatively little insertion or deletion. Finding situations where linked lists are preferred is rather more difficult. Depending on the situation (especially such things as finding all the words that start with, say, "ste"), tries can also work extremely well (and often do well at minimizing the storage needed for a given set of data as well).
Those are really broad categories though, not specific implementations. There are also variations such as extensible hashing and distributed hash tables that can be useful in specific situations (and also have somewhat tree-like properties, so things like range-based searching can be reasonably efficient).
The best data structure for implementing a dictionary is a suffix tree. You can also have a look at tries.
Well, if you're building a dictionary, you'd want it to be a sorted structure. So you're going for a sorted-array or a sorted linked-list.
For a linked list retrieval is O(n) since you have to examine all words until you find the one you need. For a sorted array, you can use binary search to find the right location, which is O(log n).
For a sorted array, insertion is O(log n) to find the right location (binary search) and then O(n) to insert because you need to push everything down. For a linked list, it would be O(n) to find the location and then O(1) to insert because you only have to adjust pointers. The same applies for deletion.
Since you aren't going to be updating a dictionary much, you can just build and then sort the array in O(n log n) time (using quicksort, for example). After that, lookup is O(log n) using binary search. Furthermore, as delnan mentioned below, using an array has the advantage that everything you access is sequential in memory; i.e., the data are localized (locality of reference). This minimizes cache misses (which are expensive). With a linked list, the data are spread out all over and there is no guarantee that they are close together, which increases the chance of cache misses. With this in mind, given the two options, use the array.
You can do an even better job if you implement a sorted hashmap using a red-black tree (your tree entries, which are the keys can be coupled with a hashmap); here search, insert, and delete are O(log n). But it really depends on your behavior profile; if you're only doing lookup, a simple hashmap would be best (O(1) retrieval).
Another interesting data structure you can use is a trie, where insertion and lookup are O(m), with m being the length of the string.
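A minimal sketch of such a trie, where both insert and contains walk one node per character, i.e. O(m):
class TrieNode {
  constructor() {
    this.children = new Map();
    this.isWord = false;
  }
}

class Trie {
  constructor() { this.root = new TrieNode(); }

  insert(word) {              // O(m), m = word length
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  contains(word) {            // O(m)
    let node = this.root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (!node) return false;
    }
    return node.isWord;
  }
}

const trie = new Trie();
["car", "card", "care"].forEach(w => trie.insert(w));
console.log(trie.contains("card")); // true
console.log(trie.contains("ca"));   // false (prefix only)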

Did I just sort in O(n) on JavaScript?

Using underscorejs library, I tried to abuse the indexing of a JavaScript object, in order to sort an array a of integers or strings:
_(a).chain().indexBy(_.identity).values().value()
I realize it is kind of a "hack", but it actually yielded a sorted array in O(n) time...
Am I dreaming?
You aren't actually sorting anything.
Instead, you're building a hashtable and traversing it in hash order, which may be the same as sorted order for some sets.
It is possible to sort in O(n) using bucket sort (http://en.wikipedia.org/wiki/Bucket_sort), which I believe is what you attempted to write here, but as mentioned above you can't rely on the order of an object's values.
It is possible to sort this way in O(n) only if you have a limited number of possible values.
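For comparison, a minimal counting-sort sketch (a simple special case of the bucket idea) for non-negative integers in a known, limited range; this is what actually gives O(n), without relying on object key ordering:
// Counting sort for non-negative integers in a known range: O(n + k).
function countingSort(values, maxValue) {
  const counts = new Array(maxValue + 1).fill(0);
  for (const v of values) counts[v]++;
  const sorted = [];
  for (let v = 0; v <= maxValue; v++) {
    for (let c = 0; c < counts[v]; c++) sorted.push(v);
  }
  return sorted;
}

console.log(countingSort([3, 1, 4, 1, 5, 9, 2, 6], 9)); // [1, 1, 2, 3, 4, 5, 6, 9]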
Your algorithm is not a comparison sort:
A comparison sort is a type of sorting algorithm that only reads the list elements through a single abstract comparison operation (often a "less than or equal to" operator or a three-way comparison) that determines which of two elements should occur first in the final sorted list.
You are using knowledge about the structure of the values (i.e. knowing that they're integers or strings) in your algorithm, by using those integers/strings as indexes. You are not adhering to the limitations imposed on a comparison sort, and thus you are not restricted by the O(n log n) lower bound on its time complexity.
Yes, you are dreaming :-)
It beggars belief that you would have found such a holy grail by accident. If that sequence of operations is a comparison-based sort, people who know this stuff have actually proven that it cannot be done in O(n) time.
I strongly suggest you run that code with dataset sizes of 10, 100, 1000, and so on and you'll see your assumption is incorrect.
Then check to see if you are actually sorting the array or whether this is just an artifact of its organisation. It seems very likely that the indexBy is simply creating an index structure where the order just happens to be the sort order you want, not something that would be guaranteed for all inputs.
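One easy check, assuming a modern engine that follows the ES2015+ own-property ordering rules: integer-like keys are enumerated in ascending numeric order, while string keys keep insertion order, so the trick only appears to sort numbers:
// Integer-like keys happen to come back in ascending numeric order...
const byNumber = {};
for (const v of [3, 1, 2]) byNumber[v] = v;
console.log(Object.values(byNumber)); // [1, 2, 3], which looks "sorted"

// ...but string keys keep insertion order, so the trick does not sort them.
const byString = {};
for (const v of ["banana", "apple", "cherry"]) byString[v] = v;
console.log(Object.values(byString)); // ["banana", "apple", "cherry"]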

What is the complexity of retrieval/insertion in JavaScript associative arrays (dynamic object properties) in the major javascript engines?

Take the following code example:
var myObject = {};
var i = 100;
while (i--) {
  myObject["foo" + i] = new Foo(i);
}
console.log(myObject["foo42"].bar());
I have a few questions.
What kind of data structure do the major engines (IE, Mozilla, Chrome, Safari) use for storing key-value pairs? I'd hope it's some kind of binary search tree, but I think they may use linked lists (since iteration is done in insertion order).
If they do use a search tree, is it self-balancing? The above code with a conventional, unbalanced search tree would create a degenerate tree, giving a worst case of O(n) for searching rather than the O(log n) of a balanced tree.
I'm only asking this because I will be writing a library which will require efficient retrieval of keys from a data structure, and while I could implement my own red-black tree or use an existing one, I would rather use native object properties if they're efficient enough.
The question is hard to answer for a couple of reasons. First, modern browsers all heavily and dynamically optimize code while it is executing, so the algorithm chosen to access the properties might differ even for the same code. Second, each engine uses different algorithms and heuristics to determine which access algorithm to use. Third, the ECMA specification dictates what the result must be, not how the result is achieved, so the engines have a lot of freedom to innovate in this area.
That said, given your example, all the engines I am familiar with will use some form of hash table to retrieve the value associated with foo42 from myObject. If you use an object like an associative array, JavaScript engines will tend to favor a hash table. None that I am aware of use a tree for string properties. Hash tables are worst case O(N), best case O(1), and tend to be closer to O(1) than O(N) if the key generator is any good. Each engine has a pattern you could use to make it perform at O(N), but that pattern differs per engine. A balanced tree would guarantee worst case O(log N), but keeping a tree balanced while modifying it is not free, and hash tables are more often better than O(log N) for string keys. They are also O(1) to update (once you determine you need to, which has the same big-O as a read) as long as there is space in the table; the table periodically needs an O(N) rebuild, but since it usually doubles in size you only pay that O(N) cost 7 or 8 times over the life of the table.
Numeric properties, however, are special. If you access an object using integer numeric properties with few or no gaps in the range (that is, you use the object like an array), the values will tend to be stored in a linear block of memory with O(1) access. Even if your accesses have gaps, the engine will probably shift to a sparse-array representation, which will probably be, at worst, O(log N).
Accessing a property by identifier is also special. If you access the property like
myObject.foo42
and execute this code often (that is, when its speed matters) with the same or similar objects, the access is likely to be optimized into one or two machine instructions. What makes objects "similar" also differs per engine, but if they are constructed by the same literal or function they are more likely to be treated as similar.
No engine that does at all well on the JavaScript benchmarks will use the same algorithm for every object. They all must dynamically determine how the object is being used and try to adjust the access algorithm accordingly.
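Since the specification pins none of this down, the practical approach for the library described in the question is to measure dynamic string-keyed lookups in the engines you care about. A rough sketch (treat the numbers as ballpark only):
// Rough micro-benchmark of dynamic string-keyed property retrieval.
// Results are engine- and version-specific.
const table = {};
const n = 100000;
for (let i = 0; i < n; i++) table["foo" + i] = i;

let sum = 0;
const start = performance.now();
for (let i = 0; i < n; i++) sum += table["foo" + ((i * 7919) % n)];
console.log("lookups took", (performance.now() - start).toFixed(2), "ms; checksum", sum);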

Big O of JavaScript arrays

Arrays in JavaScript are very easy to modify by adding and removing items. This somewhat masks the fact that in most languages arrays are fixed-size and require complex operations to resize. JavaScript seems to make it easy to write poorly performing array code. This leads to the question:
What performance (in terms of big O time complexity) can I expect from JavaScript implementations in regards to array performance?
I assume that all reasonable JavaScript implementations have at most the following big O's.
Access - O(1)
Appending - O(n)
Prepending - O(n)
Insertion - O(n)
Deletion - O(n)
Swapping - O(1)
JavaScript lets you pre-fill an array to a certain size, using the new Array(length) syntax. (Bonus question: is creating an array in this manner O(1) or O(n)?) This is more like a conventional array, and if used as a pre-sized array, it can allow O(1) appending. If circular-buffer logic is added, you can achieve O(1) prepending. If a dynamically expanding array is used, O(log n) will be the average case for both of those.
Can I expect better performance for some things than my assumptions here? I don't expect anything is outlined in any specifications, but in practice, it could be that all major implementations use optimized arrays behind the scenes. Are there dynamically expanding arrays or some other performance-boosting algorithms at work?
P.S.
The reason I'm wondering this is that I'm researching some sorting algorithms, most of which seem to assume appending and deleting are O(1) operations when describing their overall big O.
NOTE: While this answer was correct in 2012, engines use very different internal representations for both objects and arrays today. This answer may or may not be true.
In contrast to most languages, which implement arrays with, well, arrays, in JavaScript Arrays are objects, and values are stored in a hashtable, just like regular object values. As such:
Access - O(1)
Appending - Amortized O(1) (sometimes resizing the hashtable is required; usually only insertion is required)
Prepending - O(n) via unshift, since it requires reassigning all the indexes
Insertion - Amortized O(1) if the value does not exist; O(n) if you want to shift existing values (e.g., using splice).
Deletion - Amortized O(1) to remove a value, O(n) if you want to reassign indices via splice.
Swapping - O(1)
In general, setting or unsetting any key in a dict is amortized O(1), and the same goes for arrays, regardless of what the index is. Any operation that requires renumbering existing values is O(n) simply because you have to update all the affected values.
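A quick, rough way to see the difference between the amortized O(1) and O(n) cases above in your own engine (a sketch; absolute numbers will vary):
// Rough comparison of appending (amortized O(1)) vs prepending (O(n) reindexing per call).
function time(label, fn) {
  const start = performance.now();
  fn();
  console.log(label, (performance.now() - start).toFixed(2), "ms");
}

const n = 20000;
time("push (append)", () => { const a = []; for (let i = 0; i < n; i++) a.push(i); });
time("unshift (prepend)", () => { const a = []; for (let i = 0; i < n; i++) a.unshift(i); });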
guarantee
There is no specified time-complexity guarantee for any array operation. How arrays perform depends on the underlying datastructure the engine chooses. Engines might also have different representations and switch between them depending on certain heuristics. The initial array size might or might not be such a heuristic.
reality
For example, V8 uses (as of today) both hashtables and array lists to represent arrays. It also has several different representations for objects, so arrays and objects cannot be compared directly. Therefore array access is always better than O(n), and might even be as fast as a C++ array access. Appending is O(1), unless you reach the size of the datastructure and it has to be scaled (which is O(n)). Prepending is worse. Deletion can be even worse if you do something like delete array[index] (don't!), as that might force the engine to change its representation.
advice
Use arrays for numeric datastructures. That's what they are meant for. That's what engines will optimize them for. Avoid sparse arrays (or if you have to, expect worse performance). Avoid arrays with mixed datatypes (as that makes internal representations more complex).
If you really want to optimize for a certain engine (and version), check its sourcecode for the absolute answer.
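As a small sketch of that advice: remove elements with splice rather than delete, since delete leaves a hole and makes the array sparse:
const a = [10, 20, 30, 40];

// Avoid: delete leaves a hole, making the array sparse.
delete a[1];
console.log(a);      // [10, empty, 30, 40]
console.log(1 in a); // false: index 1 is now a hole

// Prefer: splice removes the element and renumbers the rest (O(n), but dense).
const b = [10, 20, 30, 40];
b.splice(1, 1);
console.log(b);      // [10, 30, 40]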
