It may be because Sets are relatively new to Javascript but I haven't been able to find an article, on StackO or anywhere else, that talks about the performance difference between the two in Javascript. So, what is the difference, in terms of performance, between the two? Specifically, when it comes to removing, adding and iterating.
Ok, I have tested adding, iterating and removing elements from both an array and a set. I ran a "small" test, using 10 000 elements and a "big" test, using 100 000 elements. Here are the results.
Adding elements to a collection
It would seem that the .push array method is about 4 times faster than the .add set method, no matter the number of elements being added.
Iterating over and modifying elements in a collection
For this part of the test I used a for loop to iterate over the array and a for of loop to iterate over the set. Again, iterating over the array was faster. This time it would seem that it is exponentially so as it took twice as long during the "small" tests and almost four times longer during the "big" tests.
Removing elements from a collection
Now this is where it gets interesting. I used a combination of a for loop and .splice to remove some elements from the array and I used for of and .delete to remove some elements from the set. For the "small" tests, it was about three times faster to remove items from the set (2.6 ms vs 7.1 ms) but things changed drastically for the "big" test where it took 1955.1 ms to remove items from the array while it only took 83.6 ms to remove them from the set, 23 times faster.
Conclusions
At 10k elements, both tests ran comparable times (array: 16.6 ms, set: 20.7 ms) but when dealing with 100k elements, the set was the clear winner (array: 1974.8 ms, set: 83.6 ms) but only because of the removing operation. Otherwise the array was faster. I couldn't say exactly why that is.
I played around with some hybrid scenarios where an array was created and populated and then converted into a set where some elements would be removed, the set would then be reconverted into an array. Although doing this will give much better performance than removing elements in the array, the additional processing time needed to transfer to and from a set outweighs the gains of populating an array instead of a set. In the end, it is faster to only deal with a set. Still, it is an interesting idea, that if one chooses to use an array as a data collection for some big data that doesn't have duplicates, it could be advantageous performance wise, if there is ever a need to remove many elements in one operation, to convert the array to a set, perform the removal operation, and convert the set back to an array.
Array code:
var timer = function(name) {
var start = new Date();
return {
stop: function() {
var end = new Date();
var time = end.getTime() - start.getTime();
console.log('Timer:', name, 'finished in', time, 'ms');
}
}
};
var getRandom = function(min, max) {
return Math.random() * (max - min) + min;
};
var lastNames = ['SMITH', 'JOHNSON', 'WILLIAMS', 'JONES', 'BROWN', 'DAVIS', 'MILLER', 'WILSON', 'MOORE', 'TAYLOR', 'ANDERSON', 'THOMAS'];
var genLastName = function() {
var index = Math.round(getRandom(0, lastNames.length - 1));
return lastNames[index];
};
var sex = ["Male", "Female"];
var genSex = function() {
var index = Math.round(getRandom(0, sex.length - 1));
return sex[index];
};
var Person = function() {
this.name = genLastName();
this.age = Math.round(getRandom(0, 100))
this.sex = "Male"
};
var genPersons = function() {
for (var i = 0; i < 100000; i++)
personArray.push(new Person());
};
var changeSex = function() {
for (var i = 0; i < personArray.length; i++) {
personArray[i].sex = genSex();
}
};
var deleteMale = function() {
for (var i = 0; i < personArray.length; i++) {
if (personArray[i].sex === "Male") {
personArray.splice(i, 1)
i--
}
}
};
var t = timer("Array");
var personArray = [];
genPersons();
changeSex();
deleteMale();
t.stop();
console.log("Done! There are " + personArray.length + " persons.")
Set code:
var timer = function(name) {
var start = new Date();
return {
stop: function() {
var end = new Date();
var time = end.getTime() - start.getTime();
console.log('Timer:', name, 'finished in', time, 'ms');
}
}
};
var getRandom = function (min, max) {
return Math.random() * (max - min) + min;
};
var lastNames = ['SMITH','JOHNSON','WILLIAMS','JONES','BROWN','DAVIS','MILLER','WILSON','MOORE','TAYLOR','ANDERSON','THOMAS'];
var genLastName = function() {
var index = Math.round(getRandom(0, lastNames.length - 1));
return lastNames[index];
};
var sex = ["Male", "Female"];
var genSex = function() {
var index = Math.round(getRandom(0, sex.length - 1));
return sex[index];
};
var Person = function() {
this.name = genLastName();
this.age = Math.round(getRandom(0,100))
this.sex = "Male"
};
var genPersons = function() {
for (var i = 0; i < 100000; i++)
personSet.add(new Person());
};
var changeSex = function() {
for (var key of personSet) {
key.sex = genSex();
}
};
var deleteMale = function() {
for (var key of personSet) {
if (key.sex === "Male") {
personSet.delete(key)
}
}
};
var t = timer("Set");
var personSet = new Set();
genPersons();
changeSex();
deleteMale();
t.stop();
console.log("Done! There are " + personSet.size + " persons.")
OBSERVATIONS:
Set operations can be understood as snapshots within the execution stream.
We are not before a definitive substitute.
The elements of a Set class have no accessible indexes.
Set class is an Array class complement, useful in those scenarios where we need to store a collection on which to apply basic addition,
Deletion, checking and iteration operations.
I share some test of performance. Try to open your console and copypaste the code below.
Creating an array (125000)
var n = 125000;
var arr = Array.apply( null, Array( n ) ).map( ( x, i ) => i );
console.info( arr.length ); // 125000
1. Locating an Index
We compared the has method of Set with Array indexOf:
Array/indexOf (0.281ms) | Set/has (0.053ms)
// Helpers
var checkArr = ( arr, item ) => arr.indexOf( item ) !== -1;
var checkSet = ( set, item ) => set.has( item );
// Vars
var set, result;
console.time( 'timeTest' );
result = checkArr( arr, 123123 );
console.timeEnd( 'timeTest' );
set = new Set( arr );
console.time( 'timeTest' );
checkSet( set, 123123 );
console.timeEnd( 'timeTest' );
2. Adding a new element
We compare the add and push methods of the Set and Array objects respectively:
Array/push (1.612ms) | Set/add (0.006ms)
console.time( 'timeTest' );
arr.push( n + 1 );
console.timeEnd( 'timeTest' );
set = new Set( arr );
console.time( 'timeTest' );
set.add( n + 1 );
console.timeEnd( 'timeTest' );
console.info( arr.length ); // 125001
console.info( set.size ); // 125001
3. Deleting an element
When deleting elements, we have to keep in mind that Array and Set do not start under equal conditions. Array does not have a native method, so an external function is necessary.
Array/deleteFromArr (0.356ms) | Set/remove (0.019ms)
var deleteFromArr = ( arr, item ) => {
var i = arr.indexOf( item );
i !== -1 && arr.splice( i, 1 );
};
console.time( 'timeTest' );
deleteFromArr( arr, 123123 );
console.timeEnd( 'timeTest' );
set = new Set( arr );
console.time( 'timeTest' );
set.delete( 123123 );
console.timeEnd( 'timeTest' );
Read the full article here
Just the Property Lookup, little or zero writes
If property lookup is your main concern, here are some numbers.
JSBench tests https://jsbench.me/3pkjlwzhbr/1
// https://jsbench.me/3pkjlwzhbr/1
// https://docs.google.com/spreadsheets/d/1WucECh5uHlKGCCGYvEKn6ORrQ_9RS6BubO208nXkozk/edit?usp=sharing
// JSBench forked from https://jsbench.me/irkhdxnoqa/2
var theArr = Array.from({ length: 10000 }, (_, el) => el)
var theSet = new Set(theArr)
var theObject = Object.assign({}, ...theArr.map(num => ({ [num]: true })))
var theMap = new Map(theArr.map(num => [num, true]))
var theTarget = 9000
// Array
function isTargetThereFor(arr, target) {
const len = arr.length
for (let i = 0; i < len; i++) {
if (arr[i] === target) {
return true
}
}
return false
}
function isTargetThereForReverse(arr, target) {
const len = arr.length
for (let i = len; i > 0; i--) {
if (arr[i] === target) {
return true
}
}
return false
}
function isTargetThereIncludes(arr, target) {
return arr.includes(target)
}
// Set
function isTargetThereSet(numberSet, target) {
return numberSet.has(target)
}
// Object
function isTargetThereHasOwnProperty(obj, target) {
return obj.hasOwnProperty(target)
}
function isTargetThereIn(obj, target) {
return target in obj
}
function isTargetThereSelectKey(obj, target) {
return obj[target]
}
// Map
function isTargetThereMap(numberMap, target) {
return numberMap.has(target)
}
Array
for loop
for loop (reversed)
array.includes(target)
Set
set.has(target)
Object
obj.hasOwnProperty(target)
target in obj <- 1.29% slower
obj[target] <- fastest
Map
map.has(target) <- 2.94% slower
Results from January 2021, Chrome 87
Results from other browsers are most welcome, please update this answer.
You can use this spreadsheet to make a nice screenshot.
JSBench test forked from Zargold's answer.
For the iteration part of your question, I recently ran this test and found that Set much outperformed an Array of 10,000 items (around 10x the operations could happen in the same timeframe). And depending on the browser either beat or lost to Object.hasOwnProperty in a like for like test. Another interesting point is that Objects do not have officially guaranteed order, whereas Set in JavaScript is implemented as an OrderedSet and does maintain the order of insertion.
Both Set and Object have their "has" method performing in what seems to be amortized to O(1), but depending on the browser's implementation a single operation could take longer or faster. It seems that most browsers implement key in Object faster than Set.has(). Even Object.hasOwnProperty which includes an additional check on the key is about 5% faster than Set.has() at least for me on Chrome v86.
https://jsperf.com/set-has-vs-object-hasownproperty-vs-array-includes/1
Update: 11/11/2020: https://jsbench.me/irkhdxnoqa/2
In case you want to run your own tests with different browsers/environments.
At this period of time (when this test ran): Chrome's V8 clearly only optimized for Objects: The following is a snapshot for Chrome v86 in November 2020.
For loop: 104167.14 ops/s ‡ 0.22% Slowest
Array.includes: 111524.8 ops/s ‡ 0.24% 1.07x more ops/s than for loop (9k iterations for both)
For loop reversed: 218074.48 ops/s ‡ 0.59% 1.96x more ops/s than non-reversed Array.includes (9k iterations)
Set.has: 154744804.61 ops/s ‡ 1.88% 709.6x more ops/s than for loop reverse (only 1k iterations since target is on right side)
hasOwnProperty: 161399953.02 ops/s ‡ 1.81% 1.043x more ops/s than Set.has
key in myObject: 883055194.54 ops/s ‡ 2.08% ... 5x more ops/sec than myObject.hasOwnProperty.
Update: 11/10/2022: I re-ran (2 years after my original image) the same tests on Safari and Chrome today and had some interesting results: TLDR Set is equally fast if not faster than using key in Object and way faster than using Object.hasOwnProperty for both browsers. Chrome also has somehow dramatically optimized Array.includes to the extent that it is in the same realm of speed as Object/Set look up time (whereas for loops take 1000+x longer to complete).
For Safari Set is significantly faster than key in Object and Object.hasOwnProperty is barely in the same realm of speed. All array variants (for loops/includes) are as expected dramatically slower than set/object look ups.
Snapshot 11/10/2022: Tested On Safari v16.1 Operations per second (higher = faster):
mySet.has(key): 1,550,924,292.31
key in myObject: 942,192,599.63 (39.25% slower aka using Set you can perform around 1.6x more operations per second
myObject.hasOwnProperty(key): 21,363,224.51 (98.62% slower) aka you can perform about 72.6x more Set.has operations as hasOwnProperty checks in 1 second.
Reverse For loop 619,876.17 ops/s (target is 9,000 out of 10,000-so reverse for loop means iterating only 1,000 times vs 9,000) meaning you can do 2502x more Set look ups than for loop checks even when you know the item's position is advantageous.
for loop: 137,434 ops/s: as expected is even slower but surprisingly not much slower: Reverse for loop which involves 1/9th the loop iterations is only about 4.5x faster than for loop.
Array.includes(target) 111,076 ops/s is a bit slower still than the for loop manually checking for target you can perform 1.23x checks manually for each check of includes.
On Chrome v107.0.5304.87 11/10/2022: It is no longer true that Set significantly underperforms Object in operation: they now nearly tie. (Though the expected behavior is that set would outperform Object in due to the smaller possible of options with a set vs an object and how this is the behavior in Safari.) Notably impressive Array.includes has apparently been significantly optimized in Chrome (v8) for at least this type of test:
Object in finished 792894327.81 ops/s ‡ 2.51% Fastest
Set.prototype.has finished 790523864.11 ops/s ‡ 2.22% Fastest
Array.prototype.includes finished 679373215.29 ops/s ‡ 1.82% 14.32% slower
Object.hasOwnProperty finished 154217006.71 ops/s ‡ 1.31% 80.55% slower
for loop finished 103015.26 ops/s + 0.98% 99.99% slower
My observation is that a Set is always better with two pitfalls for large arrays in mind :
a) The creation of Sets from Arrays must be done in a for loop with a precached length.
slow (e.g. 18ms) new Set(largeArray)
fast (e.g. 6ms)
const SET = new Set();
const L = largeArray.length;
for(var i = 0; i<L; i++) { SET.add(largeArray[i]) }
b) Iterating could be done in the same way because it is also faster than a for of loop ...
See https://jsfiddle.net/0j2gkae7/5/
for a real life comparison to
difference(), intersection(), union() and uniq() ( + their iteratee companions etc.) with 40.000 elements
Let's consider the case where you want to maintain a set of unique values. Using a Set:
set.add(value);
and with an array:
if (arr.indexOf(value) === -1)
arr.push(value);
While Set has better algorithmic complexity (O(1) or O(log(n)) implementation depending), it likely has a bit more overhead in maintaining its internal tree/table. At what size does the overhead of the Set become worth it? Here is the data I gathered from benchmarking for an average use case (see benchmarking code):
Less than about ~60 unique elements, the array is faster. Greater than ~60 elements and a Set becomes faster.
console.time("set")
var s = new Set()
for(var i = 0; i < 10000; i++)
s.add(Math.random())
s.forEach(function(e){
s.delete(e)
})
console.timeEnd("set")
console.time("array")
var s = new Array()
for(var i = 0; i < 10000; i++)
s.push(Math.random())
s.forEach(function(e,i){
s.splice(i)
})
console.timeEnd("array")
Those three operations on 10K items gave me:
set: 7.787ms
array: 2.388ms
Related
Link to the problem
In a nutshell:
N is an integer that represents a number of counters and the max counter allowed;
A is an array that represents the operation done on a specific counter (for example, if A[0] is 1 and N is 3, we need to add 1 to counter[0]);
If an element in A is N+1, all elements of the counter should be changed to the largest number in the counter array.
I submitted the code I wrote and got only 60% in performance. Why is that? Any way I should approach a problem next time to make it more efficient? How can I improve?
function solution(N,A){
let counters = Array(N).fill(0);
let maxCounter = 0;
for(i=0;i<A.length;i++){
if(A[i]<=N){
counters[A[i]-1]++
if(counters[A[i]-1]>maxCounter){maxCounter = counters[A[i]-1]};
}
else if(A[i]===N+1){
counters = Array(N).fill(maxCounter)
}
}
return counters
}
Edit: I didn't know that this website wasn't meant for questions regarding code improvement, thanks, I will ask somewhere else.
The 60% score for efficiency is because of the two last test cases where over 10,000 "max counter" operations get performed. https://app.codility.com/demo/results/trainingA86B4F-NDB/
Each of those operations has to iterate through the counter array, which may have as many as 100,000 elements. That works out to a total of 1 billion writes, so the reason for the performance problem comes quickly in sight.
To improve this and bring this number down, we can eliminate the needless consecutive "max counter" operations by, for example, introducing a flag denoting whether the counter array is already maxed and there is no need to iterate through it all over again.
Sample code:
const solution = (n, arr) => {
const counter = new Array(n).fill(0);
let max = 0, counterMaxed = false;
for (let curr of arr) {
if (curr > n) {
if (!counterMaxed) { counter.fill(max); counterMaxed = true; }
continue;
}
curr--; counter[curr]++; counterMaxed = false;
if (counter[curr] > max) { max = counter[curr]; }
}
return counter;
};
This gets a straight 100% score:
https://app.codility.com/demo/results/training3H48RM-6EG/
One possible improvement would be, when you need to fill the whole array with the new max counters, don't create an entirely new array to do so - instead, change the existing array.
else if(A[i]===N+1){
counters.fill(maxCounter)
}
This could have a large effect if there are a whole lot of counters.
Here is a solution using object which will generate the actual array only once (after all operations are applied).
The "tracker" object will only ever hold the indices & values of those counters which have an operation. Say, N (ie, "num" number of counters) is 50,000 but only 5,000 counters have an explicit operation in A (ie, "arr" array), only those 5,000 elements will be tracked (using the "tracker" object).
Code Snippet
// alternative solution using object
const counterOps = (num, arr) => {
// the "tracker" object we will use to populate the result array
const tracker = {};
// when "num+1" to reset all elements to "max", then
// the "max" value is stored in "lastMaxSet"
let lastMaxSet = 0;
// helper method to get current max from "tracker"
const getMax = obj => Math.max(...Object.values(obj));
// helper method to "set" "all" values to "max"
const setMax = obj => {
lastMaxSet = getMax(obj);
Object.keys(obj).forEach(k => obj[k] = lastMaxSet);
};
// iterate through "arr" and "apply" each "operation"
arr.forEach(elt => {
if (elt === num + 1) {
// "reset to max" operation is applied
setMax(tracker);
} else {
// a particular counter is incremented
const k = elt - 1;
tracker[k] ??= lastMaxSet;
tracker[k]++;
}
});
// the "tracker" object is used to generate
// the result-array on-the-fly
return [...Array(num).fill(lastMaxSet)].map(
(val, idx) => idx in tracker ? tracker[idx] : val
);
};
console.log(counterOps(5, [3, 4, 4, 6, 1, 4, 4]));
.as-console-wrapper { max-height: 100% !important; top: 0 }
Please try out and share feedback if it helped at all (with performance).
I am looking for an algorithm to merge multiple sorted sequences, lets say X sorted sequences with n elements, into one sorted sequence in javascript , can you provide some examples?
note: I do not want to use any library.
Trying to solve https://icpc.kattis.com/problems/stacking
what will be the minimal number of operations needed to merge sorted arrays, under conditions :
Split: a single stack can be split into two stacks by lifting any top portion of the stack and putting it aside to form a new stack.
Join: two stacks can be joined by putting one on top of the other. This is allowed only if the bottom plate of the top stack is no larger than the top plate of the bottom stack, that is, the joined stack has to be properly ordered.
History
This problem has been solved for more than a century, going back to Hermann Hollerith and punchcards. Huge sets of punchcards, such as those resulting from a census, were sorted by dividing them into batches, sorting each batch, and then merging the sorted batches--the so-called
"merge sort". Those tape drives you see spinning in 1950's sci-fi movies were most likely merging multiple sorted tapes onto one.
Algorithm
All the algorithms you need can be found at https://en.wikipedia.org/wiki/Merge_algorithm. Writing this in JS is straightforward. More information is available in the question Algorithm for N-way merge. See also this question, which is an almost exact duplicate, although I'm not sure any of the answers are very good.
The naive concat-and-resort approach does not even qualify as an answer to the problem. The somewhat naive take-the-next-minimum-value-from-any-input approach is much better, but not optimal, because it takes more time than necessary to find the next input to take a value from. That is why the best solution using something called a "min-heap" or a "priority queue".
Simple JS solution
Here's a real simple version, which I make no claim to be optimized, other than in the sense of being able to see what it is doing:
const data = [[1, 3, 5], [2, 4]];
// Merge an array or pre-sorted arrays, based on the given sort criteria.
function merge(arrays, sortFunc) {
let result = [], next;
// Add an 'index' property to each array to keep track of where we are in it.
arrays.forEach(array => array.index = 0);
// Find the next array to pull from.
// Just sort the list of arrays by their current value and take the first one.
function findNext() {
return arrays.filter(array => array.index < array.length)
.sort((a, b) => sortFunc(a[a.index], b[b.index]))[0];
}
// This is the heart of the algorithm.
while (next = findNext()) result.push(next[next.index++]);
return result;
}
function arithAscending(a, b) { return a - b; }
console.log(merge(data, arithAscending));
The above code maintains an index property on each input array to remember where we are. The simplistic alternative would be to shift the element from the front of each array when it is its turn to be merged, but that would be rather inefficient.
Optimizing finding the next array to pull from
This naive implementation of findNext, to find the array to pull the next value from, simply sorts the list of inputs by the first element, and takes the first array in the result. You can optimize this by using a "min-heap" to manage the arrays in sorted order, which removes the need to resort them each time. A min-heap is a tree, consisting of nodes, where each node contains a value which is the minimum of all values below, with left and right nodes giving additional (greater) values, and so on. You can find information on a JS implementation of a min-heap here.
A generator solution
It might be slightly cleaner to write this as a generator which takes a list of iterables as inputs, which includes arrays.
// Test data.
const data = [[1, 3, 5], [2, 4]];
// Merge an array or pre-sorted arrays, based on the given sort criteria.
function* merge(iterables, sortFunc) {
let next;
// Create iterators, with "result" property to hold most recent result.
const iterators = iterables.map(iterable => {
const iterator = iterable[Symbol.iterator]();
iterator.result = iterator.next();
return iterator;
});
// Find the next iterator whose value to use.
function findNext() {
return iterators
.filter(iterator => !iterator.result.done)
.reduce((ret, cur) => !ret || cur.result.value < ret.result.value ? cur : ret,
null);
}
// This is the heart of the algorithm.
while (next = findNext()) {
yield next.result.value;
next.result = next.next();
}
}
function arithAscending(a, b) { return a - b; }
console.log(Array.from(merge(data, arithAscending)));
The naive approach is concatenating all the k sequences, and sort the result. But if each sequence has n elements, the the cost will be O(k*n*log(k*n)). Too much!
Instead, you can use a priority queue or heap. Like this:
var sorted = [];
var pq = new MinPriorityQueue(function(a, b) {
return a.number < b.number;
});
var indices = new Array(k).fill(0);
for (var i=0; i<k; ++i) if (sequences[i].length > 0) {
pq.insert({number: sequences[i][0], sequence: i});
}
while (!pq.empty()) {
var min = pq.findAndDeleteMin();
sorted.push(min.number);
++indices[min.sequence];
if (indices[min.sequence] < sequences[i].length) pq.insert({
number: sequences[i][indices[min.sequence]],
sequence: min.sequence
});
}
The priority queue only contains at most k elements simultaneously, one for each sequence. You keep extracting the minimum one, and inserting the following element in that sequence.
With this, the cost will be:
k*n insertions to a heap of k elements: O(k*n)
k*n deletions in a heap of k elements: O(k*n*log(k))
Various constant operations for each number: O(k*n)
So only O(k*n*log(k))
Just add them into one big array and sort it.
You could use a heap, add the first element of each sequence to it, pop the lowest one (that's your first merged element), add the next element from the sequence of the popped element and continue until all sequences are over.
It's much easier to just add them into one big array and sort it, though.
This is a simple javascript algo I came up with. Hope it helps. It will take any number of sorted arrays and do a merge. I am maintaining an array for index of positions of the arrays. It basically iterates through the index positions of each array and checks which one is the minimum. Based on that it picks up the min and inserts into the merged array. Thereafter it increments the position index for that particular array. I feel the time complexity can be improved. Will post back if I come up with a better algo, possibly using a min heap.
function merge() {
var mergedArr = [],pos = [], finished = 0;
for(var i=0; i<arguments.length; i++) {
pos[i] = 0;
}
while(finished != arguments.length) {
var min = null, selected;
for(var i=0; i<arguments.length; i++) {
if(pos[i] != arguments[i].length) {
if(min == null || min > arguments[i][pos[i]]) {
min = arguments[i][pos[i]];
selected = i;
}
}
}
mergedArr.push(arguments[selected][pos[selected]]);
pos[selected]++;
if(pos[selected] == arguments[selected].length) {
finished++;
}
}
return mergedArr;
}
This is a beautiful question. Unlike concatenating the arrays and applying a .sort(); a simple dynamical programming approach with .reduce() would yield a result in O(m.n) time complexity. Where m is the number of arrays and n is their average length.
We will handle the arrays one by one. First we will merge the first two arrays and then we will merge the result with the third array and so on.
function mergeSortedArrays(a){
return a.reduce(function(p,c){
var pc = 0,
cc = 0,
len = p.length < c.length ? p.length : c.length,
res = [];
while (p[pc] !== undefined && c[cc] !== undefined) p[pc] < c[cc] ? res.push(p[pc++])
: res.push(c[cc++]);
return p[pc] === undefined ? res.concat(c.slice(cc))
: res.concat(p.slice(pc));
});
}
var sortedArrays = Array(5).fill().map(_ => Array(~~(Math.random()*5)+5).fill().map(_ => ~~(Math.random()*20)).sort((a,b) => a-b));
sortedComposite = mergeSortedArrays(sortedArrays);
sortedArrays.forEach(a => console.log(JSON.stringify(a)));
console.log(JSON.stringify(sortedComposite));
OK as per #Mirko Vukušić's comparison of this algorithm with .concat() and .sort(), this algorithm is still the fastest solution with FF but not with Chrome. The Chrome .sort() is actually very fast and i can not make sure about it's time complexity. I just needed to tune it up a little for JS performance without touching the essence of the algorithm at all. So now it seems to be faster than FF's concat and sort.
function mergeSortedArrays(a){
return a.reduce(function(p,c){
var pc = 0,
pl =p.length,
cc = 0,
cl = c.length,
res = [];
while (pc < pl && cc < cl) p[pc] < c[cc] ? res.push(p[pc++])
: res.push(c[cc++]);
if (cc < cl) while (cc < cl) res.push(c[cc++]);
else while (pc < pl) res.push(p[pc++]);
return res;
});
}
function concatAndSort(a){
return a.reduce((p,c) => p.concat(c))
.sort((a,b) => a-b);
}
var sortedArrays = Array(5000).fill().map(_ => Array(~~(Math.random()*5)+5).fill().map(_ => ~~(Math.random()*20)).sort((a,b) => a-b));
console.time("merge");
mergeSorted = mergeSortedArrays(sortedArrays);
console.timeEnd("merge");
console.time("concat");
concatSorted = concatAndSort(sortedArrays);
console.timeEnd("concat");
5000 random sorted arrays of random lengths between 5-10.
es6 syntax:
function mergeAndSort(arrays) {
return [].concat(...arrays).sort()
}
function receives array of arrays to merge and sort.
*EDIT: as cought by #Redu, above code is incorrect. Default sort() if sorting function is not provided, is string Unicode. Fixed (and slower) code is:
function mergeAndSort(arrays) {
return [].concat(...arrays).sort((a,b)=>a-b)
}
I have array and I want to remove N elements from its head.
Lets say, array (with floats) has 1M elements and I want first 500K to go out. I have two ways, call shift 500K times in loop or call splice(0,500000).
The thing is, first solution is horrible idea (it's very very slow). Second is slow too because splice returns removed part from array in a new array (well, it just allocate 500K floats and throw them out of window).
In my app, I'm doing some things with really big matrices, and unfortunately, elements removal via splice is slow for me. Is there some faster way how to achieve it?
I would expect that Array#slice would be at least as fast as either of those options and probably faster. It does mean temporarily allocating duplicated memory, but 1M numbers is only about 64MB of memory (assuming the JavaScript engine has been able to use a true array under the covers), so temporarily having the original 64MB plus the 32MB for the ones you want to keep before releasing the original 64MB seems fairly cheap:
array = array.slice(500000);
This also has the advantage that it won't force the JavaScript engine into using an object rather than an array under the covers. (Other things you're doing may cause that, but...)
You've said you're doing this with floats, you might look at using Float64Array rather than untyped arrays. That limits the operations you can perform, but ensures that you don't end up with unoptimized arrays. When you delete entries from arrays, you can end up with unoptimized arrays with markedly slower access times than optimized arrays, as they end up being objects with named properties rather than offset accesses. (A good JavaScript engine will keep them optimized if it can; using typed arrays would help prevent you from blowing its optimizations.)
This (dashed off and quite certainly flawed) NodeJS test suggests that splice is anywhere from 60% to 95% slower than slice, and that V8 does a great job keeping the array optimized as the result for the typed array is virtually identical to the result for the untyped array in the slice case:
"use strict";
let sliceStats = createStats();
let sliceTypedStats = createStats();
let spliceStats = createStats();
for (let c = 0; c < 100; ++c) {
if (test(buildUntyped, sliceStats, testSlice).length != 500000) throw new Error("1");
if (test(buildTyped, sliceTypedStats, testSlice).length != 500000) throw new Error("2");
if (test(buildUntyped, spliceStats, testSplice).length != 500000) throw new Error("3");
console.log(c);
}
console.log("slice ", avg(sliceStats.sum, sliceStats.count));
console.log("sliceTyped", avg(sliceTypedStats.sum, sliceTypedStats.count));
console.log("splice ", avg(spliceStats.sum, spliceStats.count));
function avg(sum, count) {
return (sum / count).toFixed(3);
}
function createStats() {
return {
count: 0,
sum: 0
};
}
function buildUntyped() {
let a = [];
for (let n = 0; n < 1000000; ++n) {
a[n] = Math.random();
}
return a;
}
function buildTyped() {
let a = new Float64Array(1000000);
for (let n = 0; n < 1000000; ++n) {
a[n] = Math.random();
}
return a;
}
function test(build, stats, f) {
let a;
let ignore = 0;
let start = Date.now();
for (let i = 0; i < 10; ++i) {
let s = Date.now();
a = build();
ignore += Date.now() - s;
a = f(a);
}
stats.sum += Date.now() - start - ignore;
++stats.count;
return a;
}
function testSlice(a) {
return a.slice(500000);
}
function testSplice(a) {
a.splice(0, 500000);
return a;
}
Immutable.js solves this problem by structural sharing. It does not copy the entries as splice would do but returns a reference on the included
parts of the array. You would need to move your array to the Immutable.js data structure and then call the immutable operation splice.
Why would the Sizzle selector engine use push.apply( results.... ) over results.push(...) it seems unnecessary to me. Can someone explain the motivation?
To elaborate, I've become interested in writing/borrowing bits from sizzle for a lighter weight selector engine. I figure I don't need some things like :contains(text) which would reduce the weight even further. So reading through the source I see
var arr = [],
push = arr.push
results = results || [];
....
push.apply( results, context.getElementsByTagName( selector ) );
The code makes sense, except wouldn't it be simpler to use
results.push( context.getElementsByTagName( selector ) );
I don't intend to be naggy about such a minor convention, I just want to know if I'm missing something like a context issue.
It is instead of:
results.concat(array)
Because concat creates an extra array, but push.apply won't:
push.apply(results, array)
The results array is cached and no extra arrays are created.
But you could also do:
results.push.apply(results, array)
I'm not sure why the need for arr.
Edit:
I'm thinking the need for the extra arr might be to convert the pseudo-array that getElementsByTagName returns into a real array.
Looking over the code again (after taking a break). Around line 205, Sizzle checks if the selector pattern is an ID and uses results.push
elem = context.getElementById( m );
results.push( elem );
return results;
Line 237 onwards is for Elements or Classes and uses getElementsByTagName or getElementsByClassName along with push.apply( results, ... ).
I assume its a short hand version of
for( elem in context.getElementsByClassName( m ) ) {
results.push( elem );
}
As is the case in the Mozzila docs example https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/apply
// short hand
var max = Math.max.apply(null, numbers);
var min = Math.min.apply(null, numbers);
/* vs. simple loop based algorithm */
max = -Infinity, min = +Infinity;
for (var i = 0; i < numbers.length; i++) {
if (numbers[i] > max)
max = numbers[i];
if (numbers[i] < min)
min = numbers[i];
}
EDIT:
From my original question results.push( context.getElementsByTagName( selector ) ); would result in an unwanted Object. This pushes the one argument of type NodeList into results.
Example:
var a = [1, 2, 3], b = [], c =[];
b.push( a ); // b.length = 1, now we have a multidimensional array
[].push.apply( c, a ); // c.length = 3, we now have a clean array, not a NodeList
As a side result of testing some code I wrote a small function to compare the speed of using the array.push(value) method vs direct addressing array[n] = value. To my surprise the push method often showed to be faster especially in Firefox and sometimes in Chrome. Just out of curiosity: anyone has an explanation for it?
Here's the test (note: rewritten 2023/02/10)
const arrLen = 10_000;
const x = [...Array(10)].map( (_, i) => testArr(arrLen, i));
console.log(`Array length: ${arrLen}\n--------\n${x.join(`\n`)}`);
function testArr(n, action) {
let arr = [];
const perfStart = performance.now();
const methods =
` for (; n; n--) arr.push(n)
for (; i < n; i += 1) { arr[i] = i; }
for (; i < n; i += 1) arr.push(i)
while (--n) arr.push(n)
while (i++ < n) arr.push(n)
while (--n) arr.splice(0, 0, n)
while (--n) arr.unshift(n)
while (++i < n) arr.unshift(i)
while (--n) arr.splice(n - 1, 0, n)
while (n--) arr[n] = n`.split(`\n`).map(v => v.trim());
const report = i => `${methods[i]}: ${
(performance.now() - perfStart).toFixed(2)} milliseconds`;
let i = 0;
switch (action) {
case 0: for (; n; n--) arr.push(n)
case 1: for (; i < n; i += 1) { arr[i] = i; } break;
case 2: for (let i = 0; i < n; i += 1) arr.push(i); break;
case 3: while (--n) arr.push(n); break;
case 4: while (i++ < n) arr.push(n); break;
case 5: while (--n) arr.splice(0, 0, n); break;
case 6: while (--n) arr.unshift(n)
case 7: while (++i < n) arr.unshift(i); break;
case 8: while (--n) arr.splice(n - 1, 0, n); break;
default: while (n--) arr[n] = n;
}
return report(action);
}
.as-console-wrapper {
max-height: 100% !important;
}
All sorts of factors come into play, most JS implementations use a flat array that converts to sparse storage if it becomes necessary later on.
Basically the decision to become sparse is a heuristic based on what elements are being set, and how much space would be wasted in order to remain flat.
In your case you are setting the last element first, which means the JS engine will see an array that needs to have a length of n but only a single element. If n is large enough this will immediately make the array a sparse array -- in most engines this means that all subsequent insertions will take the slow sparse array case.
You should add an additional test in which you fill the array from index 0 to index n-1 -- it should be much, much faster.
In response to #Christoph and out of a desire to procrastinate, here's a description of how arrays are (generally) implemented in JS -- specifics vary from JS engine to JS engine but the general principle is the same.
All JS Objects (so not strings, numbers, true, false, undefined, or null) inherit from a base object type -- the exact implementation varies, it could be C++ inheritance, or manually in C (there are benefits to doing it in either way) -- the base Object type defines the default property access methods, eg.
interface Object {
put(propertyName, value)
get(propertyName)
private:
map properties; // a map (tree, hash table, whatever) from propertyName to value
}
This Object type handles all the standard property access logic, the prototype chain, etc.
Then the Array implementation becomes
interface Array : Object {
override put(propertyName, value)
override get(propertyName)
private:
map sparseStorage; // a map between integer indices and values
value[] flatStorage; // basically a native array of values with a 1:1
// correspondance between JS index and storage index
value length; // The `length` of the js array
}
Now when you create an Array in JS the engine creates something akin to the above data structure. When you insert an object into the Array instance the Array's put method checks to see if the property name is an integer (or can be converted into an integer, e.g. "121", "2341", etc.) between 0 and 2^32-1 (or possibly 2^31-1, i forget exactly). If it is not, then the put method is forwarded to the base Object implementation, and the standard [[Put]] logic is done. Otherwise the value is placed into the Array's own storage, if the data is sufficiently compact then the engine will use the flat array storage, in which case insertion (and retrieval) is just a standard array indexing operation, otherwise the engine will convert the array to sparse storage, and put/get use a map to get from propertyName to value location.
I'm honestly not sure if any JS engine currently converts from sparse to flat storage after that conversion occurs.
Anyhoo, that's a fairly high level overview of what happens and leaves out a number of the more icky details, but that's the general implementation pattern. The specifics of how the additional storage, and how put/get are dispatched differs from engine to engine -- but this is the clearest i can really describe the design/implementation.
A minor addition point, while the ES spec refers to propertyName as a string JS engines tend to specialise on integer lookups as well, so someObject[someInteger] will not convert the integer to a string if you're looking at an object that has integer properties eg. Array, String, and DOM types (NodeLists, etc).
These are the result I get with your test
on Safari:
Array.push(n) 1,000,000 values: 0.124
sec
Array[n .. 0] = value
(descending) 1,000,000 values: 3.697
sec
Array[0 .. n] = value (ascending)
1,000,000 values: 0.073 sec
on FireFox:
Array.push(n) 1,000,000 values: 0.075 sec
Array[n .. 0] = value (descending) 1,000,000 values: 1.193 sec
Array[0 .. n] = value (ascending) 1,000,000 values: 0.055 sec
on IE7:
Array.push(n) 1,000,000 values: 2.828 sec
Array[n .. 0] = value (descending) 1,000,000 values: 1.141 sec
Array[0 .. n] = value (ascending) 1,000,000 values: 7.984 sec
According to your test the push method seems to be better on IE7 (huge difference), and since on the other browsers the difference is small, it seems to be the push method really the best way to add element to an array.
But I created another simple test script to check what method is fast to append values to an array, the results really surprised me, using Array.length seems to be much faster compared to using Array.push, so I really don't know what to say or think anymore, I'm clueless.
BTW: on my IE7 your script stops and browsers asks me if I want to let it go on (you know the typical IE message that says: "Stop runnign this script? ...")
I would recoomend to reduce a little the loops.
push() is a special case of the more general [[Put]] and therefore can be further optimized:
When calling [[Put]] on an array object, the argument has to be converted to an unsigned integer first because all property names - including array indices - are strings. Then it has to be compared to the length property of the array in order to determine whether or not the length has to be increased. When pushing, no such conversion or comparison has to take place: Just use the current length as array index and increase it.
Of course there are other things which will affect the runtime, eg calling push() should be slower than calling [[Put]] via [] because the prototype chain has to be checked for the former.
As olliej pointed out: actual ECMAScript implementations will optimize the conversion away, ie for numeric property names, no conversion from string to uint is done but just a simple type check. The basic assumption should still hold, though its impact will be less than I originally assumed.
array[n] = value, when previously initialised with a length (like new Array(n)), is faster than array.push, when ascending when n >= 90.
From inspecting the javascript source code of your page, your Array[0 .. n] = value (ascending) test does not initialize the array with a length in advance.
So Array.push(n) sometimes comes ahead on the first run, but on subsequent runs of your test then Array[0 .. n] = value (ascending) actually consistently performs best (in both Safari and Chrome).
If the code is modified so it initialises an array with a length in advance like var array = new Array(n) then Array[0 .. n] = value (ascending) shows that array[n] = value performs 4.5x to 9x faster than Array.push(n) in my rudimentary running of this specific test code.
This is consistent with other tests, like #Timo Kähkönen reported. See specifically this revision of the test he mentioned: https://jsperf.com/push-method-vs-setting-via-key/10
The modified code, so you may see how I edited it and initialised the array in a fair manner (not unnecessarily initialising it with a length for the array.push test case):
function testArr(n, doPush){
var now = new Date().getTime(),
duration,
report = ['<b>.push(n)</b>',
'<b>.splice(0,0,n)</b>',
'<b>.splice(n-1,0,n)</b>',
'<b>[0 .. n] = value</b> (ascending)',
'<b>[n .. 0] = value</b> (descending)'];
doPush = doPush || 5;
if (doPush === 1) {
var arr = [];
while (--n) {
arr.push(n);
}
} else if (doPush === 2) {
var arr = [];
while (--n) {
arr.splice(0,0,n);
}
} else if (doPush === 3) {
var arr = [];
while (--n) {
arr.splice(n-1,0,n);
}
} else if (doPush === 4) {
var arr = new Array(n);
for (var i = 0;i<n;i++) {
arr[i] = i;
}
} else {
while (--n) {
var arr = [];
arr[n] = n;
}
}
/*console.log(report[doPush-1] + '...'+ arr.length || 'nopes');*/
duration = ((new Date().getTime() - now)/1000);
$('zebradinges').innerHTML += '<br>Array'+report[doPush-1]+' 1.000.000 values: '+duration+' sec' ;
arr = null;
}
Push adds it to the end, while array[n] has to go through the array to find the right spot. Probably depends on browser and its way to handle arrays.