Memory-efficient downsampling (charting) of a growing array - javascript

A node process of mine receives a sample point every half a second, and I want to update the history chart of all the sample points I receive.
The chart should be an array which contains the downsampled history of all points from 0 to the current point.
In other words, the maximum length of the array should be l. If I received more sample points than l, I want the chart array to be a downsampled-to-l version of the whole history.
To express it with code:
const CHART_LENGTH = 2048
createChart(CHART_LENGTH)
onReceivePoint = function (p) {
  // p can be considered a number
  const chart = addPointToChart(p)
  // chart is an array representing all the samples received, from 0 to now
  console.assert(chart.length <= CHART_LENGTH)
}
I already have a working downsampling function with number arrays:
function downsample (arr, density) {
  let i, j, p, _i, _len
  const downsampled = []
  for (i = _i = 0, _len = arr.length; _i < _len; i = ++_i) {
    p = arr[i]
    j = ~~(i / arr.length * density)
    if (downsampled[j] == null) downsampled[j] = 0
    downsampled[j] += Math.abs(p * density / arr.length)
  }
  return downsampled
}
One trivial way of doing this would obviously be to save all the points I receive into an array and apply the downsample function whenever the array grows. This would work, but, since this piece of code would run on a server, possibly for months and months in a row, it would eventually make the supporting array grow so much that the process would run out of memory.
The question is: is there a way to construct the chart array by re-using the previous contents of the chart itself, to avoid maintaining a growing data structure? In other words, is there a constant-memory-complexity solution to this problem?
Please note that the chart must contain the whole history since sample point #0 at any moment, so charting the last n points would not be acceptable.

The only operation that does not distort the data and that can be used several times is aggregation of an integer number of adjacent samples. You probably want 2.
More specifically: if you find that adding a new sample would exceed the array bounds, do the following: start at the beginning of the array and average each pair of adjacent samples. This halves the array size and gives you space to add new samples. While doing so, you should keep track of the current cluster size c (the number of samples that constitute one entry in the array). You start with one; every reduction doubles the cluster size.
Now the problem is that you cannot add new samples directly to the array any more because they have a completely different scale. Instead, you should average the next c samples to a new entry. It turns out that it is sufficient to store the number of samples n in the current cluster to do this. So if you add a new sample s, you would do the following.
n++
if n = 1
    append s to array
else
    // update the average
    last array element += (s - last array element) / n
if n = c
    n = 0 // start a new cluster
So the memory that you actually need is the following:
- the history array with predefined length
- the number of elements in the history array
- the current cluster size c
- the number of elements in the current cluster n
The size of the additional memory does not depend on the total number of samples, hence O(1).
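For concreteness, here is a minimal JavaScript sketch of this scheme, reusing the names from the question (CHART_LENGTH, addPointToChart); the halving step averages adjacent pairs as described, and only the fixed-size chart plus two counters are kept:

const CHART_LENGTH = 2048

const chart = [] // history array, never longer than CHART_LENGTH
let c = 1        // cluster size: samples per chart entry
let n = 0        // samples accumulated in the current cluster

function addPointToChart (p) {
  // if the chart is full and the current cluster is complete,
  // halve the chart by averaging adjacent pairs and double the cluster size
  if (n === 0 && chart.length === CHART_LENGTH) {
    for (let i = 0; i < CHART_LENGTH / 2; i++) {
      chart[i] = (chart[2 * i] + chart[2 * i + 1]) / 2
    }
    chart.length = CHART_LENGTH / 2
    c *= 2
  }
  n++
  if (n === 1) {
    chart.push(p) // start a new cluster
  } else {
    chart[chart.length - 1] += (p - chart[chart.length - 1]) / n // running average
  }
  if (n === c) n = 0 // cluster complete
  return chart
}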

Related

Complete the algorithm for generating Sudoku puzzles

I have the code below, but there's a problem with the current implementation of the function entriesToDel: it will not always produce n distinct co-ordinates to be turned blank in a puzzle. It should always produce an array with n elements, each a two-element array storing the co-ordinates of an entry, and all co-ordinates in the output should be distinct, but I'm not sure how to do that as I'm quite new to this.
// a function to randomly select n (row,column) entries of a 2d array
function entriesToDel(n) {
  var array = [];
  for (var i = 0; i < n; i++) {
    var row = Math.round(3*Math.random());
    var col = Math.round(3*Math.random());
    array.push([row,col]);
  }
  return array;
}
A simple solution would be to use a set instead of an array and add random numbers until you have enough unique entries.
In case you don't know what a set is: a set works similarly to an array, but it does not allow duplicates (duplicates will be ignored). (It also does not give elements a fixed position; you can't access an entry of a set with set[i].)
const size = 3 // define size of grid in variable
function entriesToDel(n) {
  // create set of positions
  const positions = new Set() // create set
  while (positions.size < n) { // repeat until set has enough entries
    positions.add(Math.floor((size ** 2) * Math.random())) // add random position (duplicates will be ignored by the set)
  }
  // convert set of positions to array of coordinates
  const cords = []
  for (const position of positions) { // iterate through positions
    // convert position to row and column and add to array
    const row = Math.floor(position / size)
    const column = position % size
    cords.push([row, column])
  }
  return cords
}
This solution is nice and simple. An issue is that if you're really unlucky, the random number generator will keep on generating the same numbers, resulting in a possibly very long calculation time. If you want to exclude this (very unlikely) possibility, it would be reasonable to use the more complex solution provided by James.

Fastest way to compare each item in array with rest of array?

I have an array of items, and for each item in the array, I need to do some check against the rest of the items in the same array.
Here is the code I am using:
const myArray = [ ...some stuff ];
let currentItem;
let nextItem;
for (let i = 0; i < myArray.length; i++) {
  currentItem = myArray[i];
  for (let j = i + 1; j < myArray.length; j++) {
    nextItem = myArray[j];
    doSomeComparision(currentItem, nextItem);
  }
}
While this works, I need to find a more efficient algorithm because it slows down significantly if the array is very big.
Can someone provide some advice on how to make this algorithm better?
Edit 1
I apologize.
I should have provided more context around what I am trying to do here.
I am using the loop above with a HalfEdge data structure, a.k.a. DCEL.
Basically, a HalfEdge is an object with 3 properties:
class HalfEdge {
  head // some (x,y,z) coords
  tail // some (x,y,z) coords
  twin // reference to another HalfEdge
}
A twin of a given HalfEdge is defined like so:
/**
* if two Half-Edges are twins:
* Edge A TAIL ----> HEAD
* = =
* Edge B HEAD <---- TAIL
*/
My array contains many HalfEdges, and for each HalfEdge in the array, I want to find its twin (i.e., one that satisfies the condition above).
Basically, I am comparing two 3D vectors (one from currentItem, the other from nextItem).
Edit 2
Fixed typo in code example (i.e., from let j = 0 to let j = i + 1)
Here is a linear-time solution to your problem. I am not that familiar with JavaScript, so I'll feel more comfortable giving you the algorithm in pseudo-code.
lookup := hashtable()
for i .. myArray.length
    twin_id := lookup[myArray[i].tail, myArray[i].head]
    if twin_id != null
        myArray[i].twin := twin_id
        myArray[twin_id].twin := i
    else
        lookup[myArray[i].head, myArray[i].tail] := i
The idea is to construct a hash table of (head, tail) pairs, and to check if a (tail, head) pair already exists that matches the current node's. If so, they are twins, and mark them as such, otherwise update the hash table with a new entry. Every element is looped over exactly once, and insertion / retrieval from the hash table is done in constant time.
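If it is useful, here is one possible JavaScript rendering of that pseudo-code; the key helper (a string built from the coordinates), the linkTwins name and the use of exact coordinate equality are my own assumptions:

function linkTwins (halfEdges) {
  const key = (a, b) => `${a.x},${a.y},${a.z}|${b.x},${b.y},${b.z}`
  const lookup = new Map()
  for (let i = 0; i < halfEdges.length; i++) {
    const e = halfEdges[i]
    const twinId = lookup.get(key(e.tail, e.head)) // does a matching (tail, head) entry exist?
    if (twinId !== undefined) {
      e.twin = halfEdges[twinId] // mark both edges as twins
      halfEdges[twinId].twin = e
    } else {
      lookup.set(key(e.head, e.tail), i) // otherwise register this edge's (head, tail)
    }
  }
}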
I don't know whether there's any kind of specific algorithm that is more efficient, but the following optimizations come to my mind immediately:
- Let j start with i+1, otherwise you are comparing all items twice against each other.
- Initialize a variable with myArray.length outside the loops, as the same lookup is otherwise done in both loops.
- If the comparison is any kind of direct 'equal / larger' check, it could help to sort the array first.
Update on Edit 1
I think the optimization depends on the number of expected matches. I.e., if all HalfEdge objects have a twin, then I think your current approach with the changes above is already pretty optimal.
However, if the percentage of expected twins is rather low, then I would suggest the following:
- Extract a list of all heads and a list of all tails, sort them, and compare them against each other, remembering which heads have found a twin tail.
- Then do your original loops again, but only enter the inner loop for the heads which found a match.
Not sure this is optimal, but I hope you get my approach.
Without knowing more information about the type of items:
1) You should first sort your array; afterwards the comparison can be done forward only. Sorting adds O(n log n), and depending on the type of your items it can lead to further improvements.
2) Starting the internal loop from i + 1 halves the number of comparisons.
const myArray = [ ...some stuff ].sort((a,b) => sortingComparison(a,b)); // sorting comparison must return a number
let currentItem;
let nextItem;
for (let i = 0; i < myArray.length; i++) {
  currentItem = myArray[i];
  for (let j = i + 1; j < myArray.length; j++) {
    nextItem = myArray[j];
    doSomeComparision(currentItem, nextItem);
  }
}
Bonus:
Here is some fancy functional code (if you are aiming for raw performance, the for-loop versions are faster):
function compare(value, array) {
  array.forEach((nextValue) => {
    // Do your comparison here
    // nextValue === value
  })
}
const myArray = [items]
myArray
  .sort((a,b) => (a-b))
  .forEach((v, idx) => compare(v, myArray.slice(idx + 1)))
Since the values are 3D coordinates, build an octree (O(N)) and add items by their HEAD values. Then, for each edge, look up its TAIL value in the already-built octree (O(N·k·log N)), with each node containing at most k edges, which means only k comparisons at the lowest-level node of each TAIL. Finding each TAIL may also require travelling up to log(N) levels of the octree from top to bottom.
That is O(N) for building the octree plus O(N·k·log N) with a small enough limit of k edges per node (and log N levels of octree).
When you look up a TAIL in the octree, any HEAD with the same value will be in the same node of at most k elements, and any "close enough" HEAD value will be inside that lowest-level node or its closest neighbours.
Are you looking for an exact HEAD == TAIL match, or is some tolerance used? Tolerance could need a "loose octree" imo.
If each edge has a defined length, then you can constrain the search radius by this value, provided edges are symmetric both ways.
For up to 5k-10k edges, there may be only 5-10 levels in the octree depending on the edges-per-node limit; if this limit is picked to be around 2-4, then each HEAD would need only 10-40 operations to find its twin edge with the same TAIL value.
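Not an octree, but a rough sketch of the same bucketing idea using a uniform grid hash on HEAD positions (linkTwinsSpatial and cellSize are assumed names); for exact matches one cell lookup is enough, while a tolerance search would also have to check the neighbouring cells:

function linkTwinsSpatial (halfEdges, cellSize) {
  const cellKey = p =>
    `${Math.floor(p.x / cellSize)},${Math.floor(p.y / cellSize)},${Math.floor(p.z / cellSize)}`
  const buckets = new Map()
  // index every edge by the cell its HEAD falls into
  for (const e of halfEdges) {
    const k = cellKey(e.head)
    if (!buckets.has(k)) buckets.set(k, [])
    buckets.get(k).push(e)
  }
  // for each edge, compare only against the few edges whose HEAD shares the TAIL's cell
  for (const e of halfEdges) {
    for (const cand of buckets.get(cellKey(e.tail)) || []) {
      if (cand !== e &&
          cand.head.x === e.tail.x && cand.head.y === e.tail.y && cand.head.z === e.tail.z &&
          cand.tail.x === e.head.x && cand.tail.y === e.head.y && cand.tail.z === e.head.z) {
        e.twin = cand
        cand.twin = e
      }
    }
  }
}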

More efficient way to copy repeating sequence into TypedArray?

I have a source Float32Array that I create a secondary Float32Array from. I have a sequence of values model that I want to copy as a repeating sequence into the secondary Float32Array. I am currently doing this operation using a reverse while loop.
sequence = [1, 0, 0, 0, 0, 1, 0, 0, 2, 0, 1, 0];
n = 3179520; // divisible by sequence length
modelBuffs = new Float32Array(n);

var v = modelBuffs.length;
while (v > 0) {
  // XTransform
  modelBuffs[v-12] = sequence[0];
  modelBuffs[v-11] = sequence[1];
  modelBuffs[v-10] = sequence[2];
  modelBuffs[v-9]  = sequence[3];
  // YTransform
  modelBuffs[v-8]  = sequence[4];
  modelBuffs[v-7]  = sequence[5];
  modelBuffs[v-6]  = sequence[6];
  modelBuffs[v-5]  = sequence[7];
  // ZTransform
  modelBuffs[v-4]  = sequence[8];
  modelBuffs[v-3]  = sequence[9];
  modelBuffs[v-2]  = sequence[10];
  modelBuffs[v-1]  = sequence[11];
  v -= 12;
}
Unfortunately, n can be unknown. I may have to do a significant refactor if there is no alternative solution. I am hoping that I can set the sequence once and there is a copy in place/ repeating fill / bitwise operation to repeat the initial byte sequence.
Edit: simplified the example input
A fast method to fill an array with a repeated sequence is to double the filled length of the buffer on each iteration, using the copyWithin() method of the typed array. You could use set() as well, by creating a different view of the same underlying ArrayBuffer, but it's simpler to use the former for this purpose.
Using for example 1234 as source, the first initial iteration fill will be 1:1, or 4 indices in this case:
1234
From there we will use destination buffer as source for the remaining fill, so second iteration fills 8 indices:
12341234
Third iteration fills 16 indices:
1234123412341234
Fourth iteration fills 32 indices:
12341234123412341234123412341234
and so forth.
If the last segment's length doesn't land on a power of 2, you can simply take the difference between the last fill position and the remaining length of the buffer and use that for the last iteration.
var
  srcBuffer = new Uint8Array([1,2,3,4]), // any view type will do
  dstBuffer = new Uint8Array(1<<14),     // 16 kb
  len = dstBuffer.length,                // important: use indices length, not byte-length
  sLen = srcBuffer.length,
  p = sLen;                              // set initial position = source sequence length

var startTime = performance.now();

// step 1: copy source sequence to the beginning of dest. array
// todo: dest. buffer might be smaller than source. Check for this here.
dstBuffer.set(srcBuffer);

// step 2: copy existing data doubling segment length per iteration
while (p < len) {
  if (p + sLen > len) sLen = len - p; // if not power of 2, truncate last segment
  dstBuffer.copyWithin(p, 0, sLen);   // internal copy
  p += sLen;                          // add current length to offset
  sLen <<= 1;                         // double length for next segment
}

var time = performance.now() - startTime;
console.log("done", time + "ms");
console.log(dstBuffer);
If the array is very long it will obviously take some time regardless. In those cases you could consider using a Web Worker with the new SharedArrayBuffer, so that you can do the copying in a different thread without having to copy or transfer the data to and from it. The gain is merely that the main thread is not blocked, with little extra overhead from dealing with the buffer, since the internals of copyWithin() are already close to optimal for this purpose. The cons are the async aspect combined with the overhead from the event system (i.e., whether this is useful depends on the case).
A different approach is to use WebAssembly where you write the buffer fill code in C/C++, compile and expose methods to take source and destination buffers, then call that from JavaScript. I don't have any example for this case.
In both of these latter cases you will run into compatibility issues with (not that much) older browsers.

The best performant way to push items into array?

On my website I have many arrays with data,
for example: a vertices array, a colors array, a sizes array...
I'm working with big amounts of items, up to tens of millions.
Before adding the data into the arrays I need to process it.
Until now, I did this in the main thread, and it made my website freeze for X seconds.
It froze because of the processing and because of adding the processed data into the arrays.
Today I 'moved' (did a lot of work) the processing into web workers, but the processed data is still being added in the main thread. I managed to save the freezing time of the processing but not of the adding.
The adding is simply done by array.push() or array.splice().
I've read some articles about how arrays work and found out that when you add an item to an array, the array is fully copied to a new place in memory with size array.length + 1, and the value is added there. This makes my data pushing slow.
I also read that typed arrays are much faster. But for this I would need to know the size of the array, which I don't know, and creating a big typed array with an extra counter and managing insertion of items in the middle (and not at the end of the array) would be a lot of code change, which I don't want to do at this time.
So, for my question:
I have a TypedArray returned from the web worker, and I need to put it into a regular array. What is the best-performing way to do that?
(Today I'm looping and pushing the items one after the other.)
EDIT
Example of how the website works:
The client adds a count of items, let's say 100,000.
The items' raw data is collected and sent to the worker.
The worker processes all the information and sends back the processed data as a typed array (to use it as a transferable object). In the main thread we add the processed data to the arrays - at the end or at some specific index.
2nd round: the client adds another 100,000 items; they are sent to the worker and the result is added to the main-thread arrays.
The 3rd round can be 10 items, the 4th round 10,000, the 5th round can remove indices 10-2000, ...
Did some more research using the comments and thought about another direction.
I've tried using the typedArray.set method and discovered that it is very, very fast.
Setting 10 million items took 0.004 seconds, compared to 0.866 seconds with array.push. I separated the 10 million items into 10 arrays just to make sure the set method is not only faster when starting from index 0.
This way I think I would even implement my own insertAtIndex using the TypedArray, shifting all the items forward and setting the new one(s) at the right index (a sketch of that idea follows after the benchmark below).
In addition, I can use TypedArray.subarray to fetch my sub-data according to the real amount of data in the array (which does not copy the data) - useful for uploading the data to the buffer (WebGL).
I said I wanted to work with regular arrays, but I don't think I would get this performance boost otherwise. And it is not so much work to wrap MyNewTypedArray around a TypedArray with my own push, splice, etc. implementations.
Hope this info helped anyone
var maxCount = 10000000;
var a = new Float32Array(maxCount);
var aSimple = [];

var arrays = [];
var div = 10;
var arrayLen = maxCount / div;
for (var arraysIdx = 0; arraysIdx < div; arraysIdx++) {
  var b = new Float32Array(arrayLen);
  for (var i = 0; i < b.length; i++) {
    b[i] = i * (arraysIdx + 1);
  }
  arrays.push(b);
}

var timeBefore = new Date().getTime();
for (var currArrayIdx = 0; currArrayIdx < arrays.length; currArrayIdx++) {
  a.set(arrays[currArrayIdx], currArrayIdx * arrayLen);
}
var timeAfter = new Date().getTime();
good.innerHTML = (timeAfter - timeBefore) / 1000 + " sec.\n";

timeBefore = new Date().getTime();
for (var currArrayIdx = 0; currArrayIdx < arrays.length; currArrayIdx++) {
  for (var i = 0; i < arrayLen; i++) {
    aSimple.push(arrays[currArrayIdx][i]);
  }
}
timeAfter = new Date().getTime();
bad.innerHTML = (timeAfter - timeBefore) / 1000 + " sec.\n";
Using set of TypedArray:
<div id='good' style='background-color:lightGreen'>working...</div>
Using push of Array:
<div id='bad' style='background-color:red'>working...</div>
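As a follow-up to the insertAtIndex idea mentioned above, here is a possible sketch (insertAtIndex and usedLength are assumed names, not part of the code above): shift the tail of the buffer forward with copyWithin, write the new values with set, and keep a separate logical-length counter:

// returns the new logical length; assumes the buffer has spare capacity
function insertAtIndex (buffer, usedLength, index, values) {
  // shift existing items [index, usedLength) forward by values.length
  buffer.copyWithin(index + values.length, index, usedLength)
  // write the new values into the gap
  buffer.set(values, index)
  return usedLength + values.length
}

// usage: a Float32Array with spare capacity
const buf = new Float32Array(16)
buf.set([1, 2, 3, 4])
let used = 4
used = insertAtIndex(buf, used, 2, [9, 9]) // buf now starts 1, 2, 9, 9, 3, 4
console.log(buf.subarray(0, used))         // view of the real data, no copy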

Efficiently ordering line segments into a loop

I'm using a library (JavaScript-Voronoi) which produces an array of line segments that represent a closed polygon. These segments appear unordered, both the order in which the segments appear as well as the ordering of the points for each end of the segment.
(Edit: As noted in a comment below, I was wrong: the segments from the library are well-ordered. However, the question stands as written: let's assume that the segments do not have any ordering, as this makes it more generally useful.)
For example:
var p1 = {x:13.6,y:13.1}, p2 = {x:37.2,y:35.8}, p3 = {x:99.9,y:14.6},
    p4 = {x:99.9,y:45.5}, p5 = {x:33.7,y:66.7};
var segments = [
  { va:p1, vb:p2 },
  { va:p3, vb:p4 },
  { va:p5, vb:p4 },
  { va:p3, vb:p2 },
  { va:p1, vb:p5 } ];
Notice how the first segment links to the last (they share a common point), and to the next-to-last. It is guaranteed that every segment shares an end with exactly one other segment.
I would like to convert this into a list of points to generate a proper SVG polygon:
console.log( orderedPoints(segments) );
// [
// {"x":33.7,"y":66.7},
// {"x":13.6,"y":13.1},
// {"x":37.2,"y":35.8},
// {"x":99.9,"y":14.6},
// {"x":99.9,"y":45.5}
// ]
It doesn't matter whether the points are in clockwise or counter-clockwise order.
The following code is what I've come up with, but in the worst-case scenario it will take n^2+n point comparisons. Is there a more efficient algorithm for joining all these together?
function orderedPoints(segs){
  segs = segs.concat(); // make a mutable copy
  var seg = segs.pop(), pts = [seg.va], link = seg.vb;
  for (var ct=segs.length; ct--;){
    for (var i=segs.length; i--;){
      if (segs[i].va==link){
        seg = segs.splice(i,1)[0]; pts.push(seg.va); link = seg.vb;
        break;
      } else if (segs[i].vb==link){
        seg = segs.splice(i,1)[0]; pts.push(seg.vb); link = seg.va;
        break;
      }
    }
  }
  return pts;
}
If your polygon is convex, you can pick the middle point of each line segment, then use a convex-hull algorithm to find the convex polygon formed by these middle points; after that, because you know the arrangement of the middles and also which middle belongs to which segment, you can find the arrangement in the original array.
If you just want to find a convex hull, use a convex-hull algorithm directly; it's O(n log n), which is fast enough. You can also find a Quickhull algorithm in JavaScript here. Quickhull is O(n log n) on average, and although its worst case is O(n^2), it is fast in practice because of a smaller constant factor.
But in the case of a general algorithm:
Set one end of each segment as first and the other end as second (randomly).
Sort your segments by their first x and put them in array First; then, within First, sort segments with the same first x by their first y, and store two extra ints per structure to save the start and end positions of the items with the same first x.
Then sort your segments again by their second x (and y) value in the same way, and make array Second.
Both of the above steps are O(n log n).
Now pick the first segment in array First and search for its second x value in both arrays First and Second; when you find matching x values, search for the y values in the related subarray (you have the start and end positions of items with the same x). You know there is only one segment with this order (and it is not the current segment), so finding the next segment takes O(log n), and because in all there are n-1 next segments, the whole process (including preprocessing) takes O(n log n), which is much faster than O(n^2).
It should be possible to turn the points into a (double, unordered?) linked list in linear time:
for (var i=0; i<segments.length; i++) {
  var a = segments[i].va,
      b = segments[i].vb;
  // nexts being the two adjacent points (in unknown order)
  if (a.nexts) a.nexts.push(b); else a.nexts = [b];
  if (b.nexts) b.nexts.push(a); else b.nexts = [a];
}
Now you can iterate it to build the array:
var prev = segments[0].va,
    start = segments[0].vb, // start somewhere, in some direction
    points = [],
    cur = start;
do {
  points.push(cur);
  var nexts = cur.nexts,
      next = nexts[0] == prev ? nexts[1] : nexts[0];
  delete cur.nexts; // un-modify the object
  prev = cur;
  cur = next;
} while (cur && cur != start)
return points;
If you do not want to modify the objects, an ECMAScript 6 Map (with object keys) would come in handy. As a workaround, you could use a JSON serialisation of your point coordinates as keys of a normal object; however, you are then limited to polygons that do not contain a coordinate twice. Or just use the unique voronoiId property that your library adds to the vertices to identify them.
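For example, a possible Map-based variant of the same loop, keyed on that voronoiId property (this assumes every vertex carries a voronoiId and that the segments form a single closed loop):

function orderedPoints (segments) {
  const adj = new Map() // voronoiId -> the two neighbouring points
  for (const { va, vb } of segments) {
    if (!adj.has(va.voronoiId)) adj.set(va.voronoiId, [])
    if (!adj.has(vb.voronoiId)) adj.set(vb.voronoiId, [])
    adj.get(va.voronoiId).push(vb)
    adj.get(vb.voronoiId).push(va)
  }
  const points = []
  let prev = segments[0].va
  const start = segments[0].vb
  let cur = start
  do {
    points.push(cur)
    const [a, b] = adj.get(cur.voronoiId) // the two neighbours of the current point
    const next = a === prev ? b : a       // continue away from where we came from
    prev = cur
    cur = next
  } while (cur !== start)
  return points
}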
For a convex polygon, you don't even need to know the side segments; you just need a bunch of vertices. The procedure to order the vertices is pretty simple.
1) Average all the vertices together to get a point inside the polygon. Note that this doesn't even need to be the centroid; it just needs to be a point inside the polygon. Call this point C.
2) For each vertex V[i], compute the angle that the line segment from V[i] to C forms with the line segment from V[i] to V[i]+(1,0). Call this a[i].
3) Sort the angles, using the vertices as satellite data.
The sorted vertices are in order around the polygon. There are some redundancies that you can remove. Steps 1 and 2 run in linear time; step 3 runs in O(n log n).
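A small sketch of these three steps, assuming plain {x, y} vertex objects (orderedVertices is an assumed name); it sorts by the angle of each vertex around C, which gives the same cyclic order:

function orderedVertices (vertices) {
  // step 1: average the vertices to get a point C inside the convex polygon
  const C = { x: 0, y: 0 }
  for (const v of vertices) {
    C.x += v.x / vertices.length
    C.y += v.y / vertices.length
  }
  // steps 2 + 3: compute each vertex's angle around C and sort by it
  const angle = v => Math.atan2(v.y - C.y, v.x - C.x)
  return vertices.slice().sort((a, b) => angle(a) - angle(b))
}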
