On my website I have many arrays of data.
For example: a vertices array, a colors array, a sizes array...
I'm working with large numbers of items, up to tens of millions.
Before adding the data to the arrays I need to process it.
Until now I did this in the main thread, and that froze my website for X seconds.
It froze because of the processing and because of adding the processed data to the arrays.
Today I 'moved' (a lot of work) the processing into web workers, but the processed data is still being added in the main thread. I managed to get rid of the freeze caused by the processing, but not the one caused by the adding.
The adding is simply done by array.push() or array.splice().
I've read some articles about how arrays work, and found out that when we add an item to an array, the array is fully copied to a new place in memory of size array.length + 1, and the value is added there. This makes my data pushing slow.
I also read that typed arrays are much faster. But for that I would need to know the size of the array in advance, which I don't, and creating a big typed array with an extra counter and managing insertion of items in the middle (not just at the end) would be a lot of code changes, which I don't want to do at this time.
So, to my question:
I have a TypedArray coming back from the web worker, and I need to put it into a regular array. What is the most performant way to do that?
(Today I'm running a loop and pushing the items one after the other.)
EDIT
Example of how the website works:
The client adds a count of items, let's say 100,000.
The items' raw data is collected and sent to the worker.
The worker processes all the information and sends back the processed data as typed arrays (so they can be used as transferable objects). In the main thread we add the processed data to the arrays, either at the end or at some specific index.
2nd round: the client adds another 100,000 items. They are sent to the worker and the result is added to the main-thread arrays.
The 3rd round can be 10 items, the 4th round 10,000, the 5th round can remove indices 10-2000, and so on.
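A rough sketch of that round trip on the main-thread side (the worker file name 'processor.js' and the message shapes are assumptions, not the original code):

var worker = new Worker('processor.js');

function sendBatch(rawData) {
    // copy the raw data into a typed array and hand its buffer to the worker
    var raw = Float32Array.from(rawData);
    worker.postMessage({ raw: raw }, [raw.buffer]); // second argument: transfer, don't copy
}

worker.onmessage = function (e) {
    var processed = e.data.processed; // a Float32Array sent back as a transferable
    // ...add `processed` to the main-thread arrays here (at the end or at a given index)
};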
I did some more research using the comments and thought about another direction.
I tried the typedArray.set method and discovered that it is very, very fast.
Setting 10 million items with set took 0.004 seconds, compared to 0.866 seconds with array.push. I split the 10 million items into 10 arrays just to make sure the set method isn't only faster when starting from index 0.
This way I think I will even implement my own insertAtIndex on top of the TypedArray, which moves all later items forward and sets the new one(s) at the right index.
In addition, I can use TypedArray.subarray to fetch my sub-data according to the real amount of data in the array (which does not copy the data), useful for uploading the data to the buffer (WebGL).
I said I wanted to work with regular arrays, but I don't think I would get this performance boost otherwise. And it's not that much work when I wrap my own MyNewTypedArray around a TypedArray, with push, splice, and my own implementations.
Hope this info helps someone.
var maxCount = 10000000;
var a = new Float32Array(maxCount);
var aSimple = [];
var arrays = [];
var div = 10;
var arrayLen = maxCount / div;
for (var arraysIdx = 0; arraysIdx < div; arraysIdx++) {
var b = new Float32Array(arrayLen);
for (var i = 0; i < b.length; i++) {
b[i] = i * (arraysIdx + 1);
}
arrays.push(b);
}
var timeBefore = new Date().getTime();
for (var currArrayIdx = 0; currArrayIdx < arrays.length; currArrayIdx++) {
a.set(arrays[currArrayIdx], currArrayIdx * arrayLen);
}
var timeAfter = new Date().getTime();
good.innerHTML = (timeAfter - timeBefore) / 1000 + " sec.\n";
timeBefore = new Date().getTime();
for (var currArrayIdx = 0; currArrayIdx < arrays.length; currArrayIdx++) {
for (var i = 0; i < arrayLen; i++) {
aSimple.push(arrays[currArrayIdx][i]);
}
}
timeAfter = new Date().getTime();
bad.innerHTML = (timeAfter - timeBefore) / 1000 + " sec.\n";
Using set of TypedArray:
<div id='good' style='background-color:lightGreen'>working...</div>
Using push of Array:
<div id='bad' style='background-color:red'>working...</div>
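A minimal sketch of the MyNewTypedArray wrapper idea described above; the class name, growth policy, and method names are my own, not the original implementation:

class GrowableFloat32Array {
    constructor(capacity) {
        this.data = new Float32Array(capacity || 1024);
        this.length = 0; // real amount of data in the array
    }
    _ensure(extra) {
        if (this.length + extra <= this.data.length) return;
        var capacity = this.data.length;
        while (capacity < this.length + extra) capacity *= 2;
        var next = new Float32Array(capacity);
        next.set(this.data.subarray(0, this.length));
        this.data = next;
    }
    push(values) { // append a whole typed array with one set() call
        this._ensure(values.length);
        this.data.set(values, this.length);
        this.length += values.length;
    }
    insertAt(index, values) { // shift the tail forward, then set() the new block
        this._ensure(values.length);
        this.data.copyWithin(index + values.length, index, this.length);
        this.data.set(values, index);
        this.length += values.length;
    }
    removeRange(start, end) { // e.g. "remove indices 10-2000"
        this.data.copyWithin(start, end, this.length);
        this.length -= (end - start);
    }
    view() { // zero-copy view, e.g. for gl.bufferData
        return this.data.subarray(0, this.length);
    }
}

Both push and insertAt stay fast because they rely on set and copyWithin (single memmove-style operations) instead of element-by-element pushes.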
Related
I have an array of items, and for each of the item in the array, I need to do some check against the rest of the items in the same array.
Here is the code I am using:
const myArray = [ ...some stuff ];
let currentItem;
let nextItem;
for (let i = 0; i < myArray.length; i++) {
currentItem = myArray[i];
for (let j = i + 1; j < myArray.length; j++) {
nextItem = myArray[j];
doSomeComparision(currentItem, nextItem);
}
}
While this works, I need to find a more efficient algorithm because it slows down significantly if the array is very big.
Can someone provide some advice on how to make this algorithm better?
Edit 1
I apologize.
I should have provided more context around what I am trying to do here.
I am using the loop above with a HalfEdge data structure, a.k.a. DCEL.
Basically, a HalfEdge is an object with 3 properties:
class HalfEdge {
  constructor(head, tail) {
    this.head = head; // some (x, y, z) coords
    this.tail = tail; // some (x, y, z) coords
    this.twin = null; // reference to another HalfEdge
  }
}
A twin of a given HalfEdge is defined like so:
/**
* if two Half-Edges are twins:
* Edge A TAIL ----> HEAD
* = =
* Edge B HEAD <---- TAIL
*/
My array contains many HalfEdges, and for each HalfEdge in the array, I want to find its twin (i.e., one that satisfies the condition above).
Basically, I am comparing two 3D vectors (one from currentItem, the other from nextItem).
Edit 2
Fixed typo in code example (i.e., from let j = 0 to let j = i + 1)
Here is a linear-time solution to your problem. I am not that familiar with JavaScript, so I feel more comfortable giving you the algorithm in pseudo-code.
lookup := hashtable()
for i .. myArray.length
twin_id := lookup[myArray[i].tail, myArray[i].head]
if twin_id != null
myArray[i].twin := twin_id
myArray[twin_id].twin := i
else
lookup[myArray[i].head, myArray[i].tail] = i
The idea is to construct a hash table of (head, tail) pairs, and to check if a (tail, head) pair already exists that matches the current node's. If so, they are twins, and mark them as such, otherwise update the hash table with a new entry. Every element is looped over exactly once, and insertion / retrieval from the hash table is done in constant time.
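For reference, a hedged JavaScript version of that pseudo-code, assuming the head and tail coordinates expose x, y, z fields and compare exactly (the key format is my own choice):

function linkTwins(halfEdges) {
    const lookup = new Map();
    const key = (a, b) => a.x + ',' + a.y + ',' + a.z + '|' + b.x + ',' + b.y + ',' + b.z;
    for (let i = 0; i < halfEdges.length; i++) {
        const edge = halfEdges[i];
        // an earlier edge stored under its (head, tail) matches our (tail, head) if it is our twin
        const twinIdx = lookup.get(key(edge.tail, edge.head));
        if (twinIdx !== undefined) {
            edge.twin = halfEdges[twinIdx];
            halfEdges[twinIdx].twin = edge;
        } else {
            lookup.set(key(edge.head, edge.tail), i);
        }
    }
}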
I don't know whether there's any kind of specific algorithm that is more efficient, but the following optimizations come to my mind immediately:
- Let j start at i + 1, otherwise you are comparing every pair of items twice.
- Store myArray.length in a variable outside the loops, as the same lookup is done in both loop headers.
- If the comparison is any kind of direct "equal / larger" check, it could help to sort the array first.
Update on Edit 1
I think the optimization depends on the number of expected matches. I.e., if all HalfEdge objects have a twin, then I think your current approach with the changes above is already pretty optimal.
However, if the percentage of expected twins is rather low, then I would suggest the following:
- Extract a list of all heads and a list of all tails, sort them, and compare them against each other. Remember which heads have found a twin tail.
- Then, do your original loops again, but only enter the inner loop for the heads which found a match, as sketched below.
Not sure this is optimal, but I hope you get my approach.
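A rough sketch of that filtering idea, using a sorted list of tail keys and a binary search (assuming, as above, that the coordinates expose x, y, z and compare exactly):

const keyOf = (p) => p.x + ',' + p.y + ',' + p.z;

// extract and sort all tail keys once
const tailKeys = myArray.map((e) => keyOf(e.tail)).sort();

function hasMatchingTail(k) { // binary search in the sorted tail keys
    let lo = 0, hi = tailKeys.length - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >> 1;
        if (tailKeys[mid] === k) return true;
        if (tailKeys[mid] < k) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}

for (let i = 0; i < myArray.length; i++) {
    // only enter the inner loop for heads that have some matching tail at all
    if (!hasMatchingTail(keyOf(myArray[i].head))) continue;
    for (let j = i + 1; j < myArray.length; j++) {
        doSomeComparision(myArray[i], myArray[j]);
    }
}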
Without knowing more about the type of the items:
1) You should first sort your array; afterwards the comparison can be done forward only. The sort costs O(n log n) on top of the O(n^2) comparison pass, but depending on the type of your items it can enable further improvements.
2) Starting the inner loop from i + 1 roughly halves the number of comparisons; the complexity is still O(n^2), but the constant factor drops.
const myArray = [ ...some stuff ].sort((a,b) => sortingComparison(a,b)); // sorting comparison must return a number
let currentItem;
let nextItem;
for (let i = 0; i < myArray.length; i++) {
currentItem = myArray[i];
for (let j = i + 1; j < myArray.length; j++) {
nextItem = myArray[j];
doSomeComparision(currentItem, nextItem);
}
}
Bonus:
Here is some fancy functional code (if you are aiming for raw performance, the for-loop versions are faster):
function compare(value, array) {
  array.forEach((nextValue) => {
    // Do your comparison here
    // nextValue === value
  });
}

const myArray = [items]
myArray
  .sort((a, b) => (a - b))
  .forEach((v, idx) => compare(v, myArray.slice(idx + 1)))
Since the values are 3D coordinates, build an octree (O(N)) and insert the items keyed on their HEAD values. Then, for each edge, look up its TAIL value in the already-built octree (O(N * k * log(N)) overall), with each node containing at most k edges, which means only k comparisons in the lowest-level node reached for each TAIL. Finding each TAIL may also require descending up to log(N) levels of the octree.
That is O(N) (with the constant of building the octree) plus O(N * k * log(N)), with a low enough k edges per node (and log N levels of octree).
When you look up a TAIL in the octree, any HEAD with the same value will be in the same node (with at most k elements), and any "close enough" HEAD value will be inside that lowest-level node or its closest neighbors.
Are you looking for an exact HEAD == TAIL match, or is some tolerance used? Tolerance might call for a "loose octree", in my opinion.
If each edge has a defined length, you can constrain the search radius by that value, provided edges are symmetric both ways.
For up to 5k-10k edges there may be only 5-10 levels in the octree, depending on the edges-per-node limit; if that limit is picked to be around 2-4, each HEAD would need only 10-40 operations to find its twin edge with the same TAIL value.
A node process of mine receives a sample point every half a second, and I want to update the history chart of all the sample points I receive.
The chart should be an array which contains the downsampled history of all points from 0 to the current point.
In other words, the maximum length of the array should be l. If I received more sample points than l, I want the chart array to be a downsampled-to-l version of the whole history.
To express it with code:
const CHART_LENGTH = 2048
createChart(CHART_LENGTH)
onReceivePoint = function(p) {
// p can be considered a number
const chart = addPointToChart(p)
// chart is an array representing all the samples received, from 0 to now
console.assert(chart.length <= CHART_LENGTH)
}
I already have a working downsampling function with number arrays:
function downsample (arr, density) {
let i, j, p, _i, _len
const downsampled = []
for (i = _i = 0, _len = arr.length; _i < _len; i = ++_i) {
p = arr[i]
j = ~~(i / arr.length * density)
if (downsampled[j] == null) downsampled[j] = 0
downsampled[j] += Math.abs(arr[i] * density / arr.length)
}
return downsampled
}
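For example, with this implementation downsample([1, 2, 3, 4], 2) returns [1.5, 3.5]: each output entry ends up being (roughly) the average of the absolute values of the samples in its bucket.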
One trivial way of doing this would obviously be saving all the points I receive into an array and applying the downsample function whenever the array grows. This would work, but since this piece of code would run on a server, possibly for months and months in a row, it would eventually make the supporting array grow so much that the process would run out of memory.
The question is: is there a way to construct the chart array by re-using the previous contents of the chart itself, to avoid maintaining a growing data structure? In other words, is there a constant-memory-complexity solution to this problem?
Please note that the chart must contain the whole history since sample point #0 at any moment, so charting the last n points would not be acceptable.
The only operation that does not distort the data and that can be used several times is aggregation of an integer number of adjacent samples. You probably want 2.
More specifically: if you find that adding a new sample would exceed the array bounds, do the following: start at the beginning of the array and average every two subsequent samples. This halves the array size and gives you space to add new samples. While doing so, keep track of the current cluster size c (the number of samples that constitute one entry in the array). You start with one; every reduction multiplies the cluster size by two.
Now the problem is that you cannot add new samples directly to the array any more because they have a completely different scale. Instead, you should average the next c samples to a new entry. It turns out that it is sufficient to store the number of samples n in the current cluster to do this. So if you add a new sample s, you would do the following.
n++
if n = 1
append s to array
else
//update the average
last array element += (s - last array element) / n
if n = c
n = 0 //start a new cluster
So the memory that you actually need is the following:
the history array with predefined length
the number of elements in the history array
the current cluster size c
the number of elements in the current cluster n
The size of the additional memory does not depend on the total number of samples, hence O(1).
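A sketch of this scheme in JavaScript (class and property names are mine; it assumes the maximum length is even, as in CHART_LENGTH = 2048):

class DownsampledChart {
    constructor(maxLength) {
        this.maxLength = maxLength; // predefined length of the history array
        this.chart = [];            // the history array itself
        this.c = 1;                 // current cluster size
        this.n = 0;                 // samples already folded into the current cluster
    }
    add(s) {
        if (this.n === 0 && this.chart.length === this.maxLength) {
            // no room for a new cluster: average adjacent pairs, halving the array
            const halved = [];
            for (let i = 0; i < this.chart.length; i += 2) {
                halved.push((this.chart[i] + this.chart[i + 1]) / 2);
            }
            this.chart = halved;
            this.c *= 2;
        }
        this.n++;
        if (this.n === 1) {
            this.chart.push(s);                 // start a new cluster
        } else {
            const last = this.chart.length - 1; // update the running average
            this.chart[last] += (s - this.chart[last]) / this.n;
        }
        if (this.n === this.c) this.n = 0;      // the cluster is full
    }
}

From the question's onReceivePoint this would just be chart.add(p), and chart.chart never grows beyond CHART_LENGTH entries.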
I run this test in different node versions:
function test() {
var i;
var bigArray = {};
var start = new Date().getTime();
for (i=0; i<100000; i+=1) {
bigArray[i] = {};
var j= Math.floor(Math.random() * 10000000);
bigArray[i]["a" + j] = i.toString(32);
if (i % 1000 === 0) console.log(i);
}
var end = new Date().getTime();
var time = end - start;
console.log('Execution time: ' + time);
}
test();
As you can see, it just creates an object with 100000 fields, where each field is just an object with a single field. The key of this inner object is forced to be alphanumeric (if the key is numeric, it performs normally).
When I run this test in different JavaScript implementations/versions I get these results:
v0.8.28 -> 2716 ms
v0.10.40 -> 73570 ms
v0.12.7 -> 92427 ms
iojs v2.4.0 -> 510 ms
chrome -> 1473 ms
I have also tried to run this test in an asynchronous loop (each loop step in a different tick), but the results are similar to the ones shown above.
I can't understand why this test is so expensive in newer node versions.
Why is it so slow?
Is there any special v8 flag that can improve this test?
In order to handle large and sparse arrays, there are two types of array storage internally:
Fast Elements: linear storage for compact key sets
Dictionary Elements: hash table storage otherwise
It's best not to cause the array storage to flip from one type to another.
Therefore:
Use contiguous keys starting at 0 for Arrays
Don't pre-allocate large Arrays (e.g. > 64K elements) to their maximum size, instead grow as you go
Don't delete elements in arrays, especially numeric arrays
Don't load uninitialized or deleted elements
Source and more info: http://www.html5rocks.com/en/tutorials/speed/v8/
PS: this is supposed to improve considerably in the upcoming node.js+io.js version.
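A small sketch illustrating the "fast elements vs. dictionary elements" point above (exact thresholds and timings vary between V8 versions, so treat it as an illustration, not a benchmark):

function fillContiguous(n) {
    var a = [];
    for (var i = 0; i < n; i++) a[i] = i; // contiguous keys starting at 0: fast elements
    return a;
}

function fillWithHole(n) {
    var a = [];
    a[n * 1000] = 1;                      // a huge sparse index likely flips the array to dictionary mode
    for (var i = 0; i < n; i++) a[i] = i;
    return a;
}

console.time('contiguous'); fillContiguous(1000000); console.timeEnd('contiguous');
console.time('sparse');     fillWithHole(1000000);   console.timeEnd('sparse');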
I am profiling my javascript code intended to be used on embedded browser on Android (PhoneGap).
Basically I need a very large bitfield (200k+ bits) for my calculations.
I've tried to put them into an array of unsigned integers, with each item storing 32 bits. This indeed reduced memory usage, but made execution drastically slower (over 30 seconds for simply iterating over and reversing all bits in the bitfield on a modern PC!).
Then I made a good old-fashioned array of booleans. This increased memory usage (but it was still less than 15 MB on Android for the entire PhoneGap framework around my code). Profiling showed me that the initial step in my algorithm, setting all elements of the bitfield to 1 (a simple for loop), takes half of the execution time (~1.5 seconds on a PC, more than a few minutes on Android). I can rewrite my code so the default value is 0 instead of 1 (reversing all conditions), but I still don't know how to set such a large array to all zeroes quickly.
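For reference, the 32-bits-per-element variant from the first attempt would look roughly like this (the helper names are hypothetical, and TypedArray.prototype.fill needs a reasonably modern engine):

var BIT_COUNT = 200000;
var words = new Uint32Array(Math.ceil(BIT_COUNT / 32));

function setBit(i)   { words[i >> 5] |= (1 << (i & 31)); }
function clearBit(i) { words[i >> 5] &= ~(1 << (i & 31)); }
function getBit(i)   { return (words[i >> 5] >>> (i & 31)) & 1; }

// clearing or setting the whole field is a single native call
words.fill(0);          // all zeroes
words.fill(0xFFFFFFFF); // all ones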
Edit adding my code, as requested:
var count = 200000;
var myArr = [];
myArr.length = count;
for(var i = 0; i < count ; i++)
myArr[i] = true;
Could someone show me how to clear a very large array quickly, or is there a faster way to store and operate on large bitfields in JavaScript?
See if this is a faster way to create the array:
var myArray = [true];
var desiredLength = 200000;
while (myArray.length < desiredLength) {
myArray = myArray.concat(myArray);
}
if (myArray.length > desiredLength) {
myArray.splice(desiredLength);
}
I've added a few more test cases to the jsperf page that Asad linked in his comment. By far the fastest in my browser (Chrome 23.0.1271.101 on Mac OS X 10.8.2) is this one:
var count = 200000;
var myArr = [];
for (var i = 0; i < count; i++) {
myArr.push(true);
}
Why pre-fill the array in the first place? Use undefined to your advantage. Remember that undefined acts as a falsy value, so it will behave exactly like 0/false when you do a boolean check.
var myArray = new Array(200000);
if (myArray[1]) {
//I am a truthy value
} else {
//I am a falsey value
}
So when you initialize the array this way, there is no reason to pre-fill! That means no extra processing, and you take advantage of the sparse Array!
If I remove one element from an array using splice() like so:
arr.splice(i, 1);
Will this be O(n) in the worst case because it shifts all the elements after i? Or is it constant time, with some linked list magic underneath?
Worst case should be O(n) (copying all n-1 elements to new array).
A linked list would be O(1) for a single deletion.
For those interested, I've made this lazily-crafted benchmark. (Please don't run it on Windows XP/Vista.) As you can see from it, though, the time looks fairly constant (i.e. O(1)), so who knows what they're doing behind the scenes to make this crazy fast. Note that, regardless, the actual splice is VERY fast.
Rerunning an extended benchmark directly in the V8 shell suggests O(n). Note though that you need huge array sizes to get a runtime that's likely to affect your code. This should be expected: if you look at the V8 code, it uses memmove to create the new array.
The Test:
I took the advice in the comments and wrote a simple test to time splicing a data set of 3,000 arrays, each containing 3,000 items. The test would simply splice the
first item in the first array
second item in the second array
third item in the third array
...
3000th item in the 3000th array
I pre-built the array to keep things simple.
The Findings:
The weirdest thing is that the number of cases where the splice takes longer than 1ms grows linearly as you increase the size of the dataset.
I went as far as testing it for a dataset of 300,000 on my machine (but the SO snippet tends to crash after 3,000).
I also noticed that the number of splice()s that took longer than 1ms for a given dataset (30,000 in my case) was random. So I ran the test 1,000 times and plotted the results, and they looked like a normal distribution, leading me to believe that the randomness was just caused by scheduler interrupts.
This goes against my hypothesis and @Ivan's guess that splice()ing from the beginning of an array has O(n) time complexity.
Below is my test:
let data = []
const results = []
const dataSet = 3000
function spliceIt(i) {
data[i].splice(i, 1)
}
function test() {
for (let i=0; i < dataSet; i++) {
let start = Date.now()
spliceIt(i);
let end = Date.now()
results.push(end - start)
}
}
function setup() {
data = (new Array(dataSet)).fill().map(arr => new Array(dataSet).fill().map(el => 0))
}
setup()
test()
// console.log("data before test", data)
// console.log("data after test", data)
// console.log("all results: ", results)
console.log("results that took more than 1ms: ", results.filter(r => r >= 1))
Hi!
I did an experiment myself and would like to share my findings. The experiment was very simple: we ran 100 splice operations on an array of size n and calculated the average time each splice call took. Then we varied the size of n to check how it behaves.
This graph summarizes our findings for big numbers:
For big numbers it seems to behave linearly.
We also checked with "small" numbers (they were still quite big, just not as big):
In this case it seems to be constant.
If I had to decide on one option I would say it is O(n), because that is how it behaves for big numbers. Bear in mind, though, that the linear behaviour only shows up for VERY big numbers.
However, it is hard to give a definitive answer, because the array implementation in JavaScript depends A LOT on how the array is declared and manipulated.
I recommend this stackoverflow discussion and this quora discussion to understand how arrays work.
I ran it in node v10.15.3 and the code used is the following:
const f = async () => {
const n = 80000000;
const tries = 100;
const array = [];
for (let i = 0; i < n; i++) { // build initial array
array.push(i);
}
let sum = 0;
for (let i = 0; i < tries; i++) {
const index = Math.floor(Math.random() * (n));
const start = new Date();
array.splice(index, 1); // UNCOMMENT FOR OPTION A
// array.splice(index, 0, -1); // UNCOMMENT FOR OPTION B
const time = new Date().getTime() - start.getTime();
sum += time;
array.push(-2); // UNCOMMENT FOR OPTION A, to keep it of size n
// array.pop(); // UNCOMMENT FOR OPTION B, to keep it of size n
}
console.log('for an array of size', n, 'the average time of', tries, 'splices was:', sum / tries);
};
f();
Note that the code has an Option B: we did the same experiment with the three-argument splice function, inserting an element instead. It behaved similarly.