Optimizing an algorithm subtracting first elements from last elements - javascript

I'm trying to optimize a slow part of a larger algorithm.
I have an array of random-looking numbers:
[295, 292, 208, 393, 394, 291, 182, 145, 175, 288, 71, 86, 396, 422]
(actual arrays are much longer), and an index: N (N = 5)
What I want to do is subtract 1 from each of the last M elements, once for every one of the first N elements that is smaller than it.
So (pseudo-code):
for a = 1..5
    for b = 6..Total
        if ary[a] < ary[b]
            ary[b]--;
Obviously this is a horribly inefficient O(N^2) algorithm. I'm trying to think of a faster way to do it, but can't. It seems like I should be able to pre-compute the values to subtract somehow, and reduce it to:
for a = 1..5
    // ???
for b = 6..Total
    ary[b] -= ???
but I'm missing something.
[edit] I feel like an idiot. I didn't properly explain what I want, fixed.
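For reference, a direct JavaScript translation of that quadratic loop (0-based indexing; firstCount plays the role of N = 5):

const ary = [295, 292, 208, 393, 394, 291, 182, 145, 175, 288, 71, 86, 396, 422];
const firstCount = 5; // the index N from above, used as a 0-based boundary

for (let a = 0; a < firstCount; a++) {
  for (let b = firstCount; b < ary.length; b++) {
    if (ary[a] < ary[b]) ary[b]--; // decrement once per smaller head element
  }
}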

Let's reorganize the loops:
for b = 6..Total
    for a = 1..5
        if ary[a] < ary[b]
            ary[b] -= ary[a];
The result will be the same. But now the logic is clearer: from each element of the second part of the array you subtract ary[1], then ary[2], and so on, until some ary[a] is bigger than what remains of ary[b].
This can be optimized in the following way. Calculate the cumulative sums of the first part of the array: let sum[1] = ary[1], sum[2] = sum[1] + ary[2], sum[3] = sum[2] + ary[3], and so on. This can be done in O(N).
Now for each b you need to find the largest a_last such that sum[a_last] < ary[b] but sum[a_last+1] >= ary[b] -- this means that from ary[b] you will subtract ary[1] + ... + ary[a_last] = sum[a_last]. You can find such an a_last using binary search in O(log N), making the overall algorithm O(N log N).
The pseudocode:
sum[0] = 0
for a = 1..5
    sum[a] = sum[a-1] + ary[a]
for b = 6..Total
    a_last = maximal a such that sum[a] < ary[b] // use binary search
    ary[b] -= sum[a_last]
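A sketch of this pseudocode in JavaScript (0-based indexing, with firstCount playing the role of the 5; subtractPrefixSums is a made-up name, and note the caveat in the next answer that this subtracts sums of values rather than a count of hits):

function subtractPrefixSums(ary, firstCount) {
  // Cumulative sums of the first part: sum[k] = ary[0] + ... + ary[k-1].
  const sum = [0];
  for (let a = 0; a < firstCount; a++) sum.push(sum[a] + ary[a]);

  for (let b = firstCount; b < ary.length; b++) {
    // Binary search for the largest k with sum[k] < ary[b].
    let lo = 0, hi = firstCount;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (sum[mid] < ary[b]) lo = mid; else hi = mid - 1;
    }
    ary[b] -= sum[lo];
  }
  return ary;
}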

It seems to me that Petr's solution doesn't follow the specification of the problem: we should subtract 1 (one) for each "hit", not the VALUE of ary[a].
Here's an alternative:
place the test values ary[j], j=0..N-1 in a SORTED array: sortedSmallAry[N].
Then run over all b, i.e. b=N, ..., Total-1, and for each ary[b]:
run j = 0..N-1 over sortedSmallAry, test whether sortedSmallAry[j] is smaller than ary[b], and count the hits; exit as soon as the test fails (because the array is sorted). If N is relatively large you can instead use a binary search in sortedSmallAry to determine how many of its elements satisfy the condition.
Subtract the hitcount from ary[b].
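A JavaScript sketch of this answer's approach (0-based indexing, firstCount = N; subtractHitCounts is a made-up name, and each hit count is computed against the current ary[b] as the answer describes):

function subtractHitCounts(ary, firstCount) {
  const sortedSmallAry = ary.slice(0, firstCount).sort((x, y) => x - y);

  for (let b = firstCount; b < ary.length; b++) {
    // Binary search: how many sorted head elements are smaller than ary[b]?
    let lo = 0, hi = firstCount;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (sortedSmallAry[mid] < ary[b]) lo = mid + 1; else hi = mid;
    }
    ary[b] -= lo; // subtract the hit count
  }
  return ary;
}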

Related

Efficient way to compute the median of an array of canvases in JavaScript

I have an array of N HTMLCanvasElements that come from N frames of a video, and I want to compute the "median canvas" in the sense that every component (r, g, b, opacity) of every pixel is the median of the corresponding component in all the canvases.
The video frames are 1280x720, so the pixel data for every canvas (obtained with canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height).data) is a Uint8ClampedArray of length 3,686,400.
The naive way to compute the median is to:
prepare a result Uint8ClampedArray of length 3,686,400
prepare a temporary array of length N
loop from 0 to 3,686,399
a) loop over the N canvases to fill the array
b) compute the median of the array
c) store the median to the result array
But it's very slow, even for 4 canvases.
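For reference, a minimal sketch of that naive loop, assuming canvases is an array of the N canvas elements (the variable names are made up for illustration):

const datas = canvases.map(c =>
  c.getContext('2d').getImageData(0, 0, c.width, c.height).data);
const length = datas[0].length;               // 3,686,400 for 1280x720
const result = new Uint8ClampedArray(length);
const tmp = new Array(datas.length);          // one value per canvas

for (let i = 0; i < length; i++) {
  for (let k = 0; k < datas.length; k++) tmp[k] = datas[k][i];  // a)
  result[i] = d3.median(tmp);                                   // b) + c)
}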
Is there an efficient way (or existing code) to do that? My question is very similar to Find median of list of images, but I need to do this in JavaScript, not Python.
Note: for b), I use d3.median() which doesn't work on typed arrays, as far as I understand, so that it implies converting to numbers, then converting back to Uint8Clamped.
Note 2: I don't know much of GLSL shaders, but maybe using the GPU would be a way to get faster results. It would require to pass data from the CPU to the GPU though, which takes time if done repeatedly.
Note 3: the naive solution is there: https://observablehq.com/#severo/compute-the-approximate-median-image-of-a-video
You wrote
I use d3.median() which doesn't work on typed arrays…
Although that is not exactly true, it points in the right direction. Internally d3.median() uses the d3.quantile() method, which starts off like this:
export default function quantile(values, p, valueof) {
values = Float64Array.from(numbers(values, valueof));
As you can see, this in fact does make use of typed arrays, it is just not your Uint8ClampedArray but a Float64Array instead. Because floating-point arithmetic is much more computation-intensive than its integer counterpart (including the conversion itself) this has a dramatic effect on the performance of your code. Doing this some 3 million times in a tight loop kills the efficiency of your solution.
Since you are retrieving all your pixel values from a Uint8ClampedArray you can be sure that you are always dealing with integers, though. That said, it is fairly easy to build a custom function median(values) derived from d3.median() and d3.quantile():
function median(values) {
  // No conversion to floating point values needed.
  var n = values.length;
  if (!n) return;
  if (n < 2) return d3.min(values);
  var i = (n - 1) * 0.5,
      i0 = Math.floor(i),
      value0 = d3.max(d3.quickselect(values, i0).subarray(0, i0 + 1)),
      value1 = d3.min(values.subarray(i0 + 1));
  return value0 + (value1 - value0) * (i - i0);
}
On top of getting rid of the problematic conversion on the first line this implementation additionally applies some more micro-optimizations because in your case you are always looking for the 2-quantile (i.e. the median). That might not seem much at first, but doing this multiple million times in a loop it does make a difference.
With minimal changes to your own code you can call it like this:
// Instead of medianImageData.data[i] = d3.median(arr); use the custom median():
medianImageData.data[i] = median(arr);
Have a look at my working fork of your Observable notebook.

Why -Infinity is like a base for comparing numbers to return max?

In this code I can't seem to understand why -Infinity behaves like a base value, so that comparing the numbers against it returns the biggest number from an array of numbers.
function max(...numbers) {
  let result = -Infinity;
  for (let number of numbers) {
    if (number > result) result = number;
  }
  return result;
}
It is confusing at first and probably in your mind a solution would sound like this:
let result = 0;
The problem is that when we want to find the MAXIMUM value of an array we need to compare every element against a running best. Starting the maximum at -Infinity is more of a convention: it simply means that the biggest element seen so far is the lowest value we can express. We assume at first that the biggest number is -Infinity, then compare each element of the array with this base value, and whenever an element is bigger than the current result we replace the result with that element. Doing this over the whole array is how we find the maximum element.
You could pick a different starting point, but never a hard-coded number of your own (do that ONLY if the exercise explicitly asks for it).
If you would pick for example:
let result = 0;
then you might have a problem. Maybe the numbers are all negative, for example [-3, -13, -5, -99], but you already set the biggest number to 0, so every comparison would fail and the result would be useless.
So keep in mind that it is good practice, in this case, to set the base value to -Infinity, or, if you would like to take another approach, to set the base value to the first element of the array.
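For example, calling the max() function above with only negative inputs:

max(-3, -13, -5);   // -3, because every element beats the -Infinity base
// With let result = 0; as the base instead, no negative input would ever replace it,
// so the function would wrongly return 0, a value that is not even in the array.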
When using this to find the max of a series of numbers, you loop through the array and each number is compared with the current result, which starts at -Infinity. Since the loop runs from left to right, the result updates itself each time it finds a bigger number. I tried this comparison method with an actual number as the starting value:
let edge = 2; // an actual starting number instead of -Infinity
let array1 = [1, 2, 3, 4, 5, 6, 8, 9, 100, 200];
function maxwell() {
  for (let checker of array1) {
    if (checker > edge) edge = checker;
  }
  return edge;
}
console.log(maxwell()); // 200

Selecting an element from an array 35% more often

I have a list with N elements, where K elements are "special" and the rest are "normal". What I'm trying to do is pick an element at random, but special elements should be selected 35% more often than normal items.
For example:
var myList = [
{id: 1, special: 0},
{id: 2, special: 1} // <= special item
];
After 600 selections, the normal element should be selected roughly 255 times, and the special element about 35% more often than that, roughly 345 times.
This is different from the suggested duplicate question because my weights do not add up to 1. I can have an arbitrary number of elements in my list, and zero or more of them are special. The weight is always 1.35 for special items, and 1.0 for normal items.
Your question is ambiguous with regards to the "35% more often" part. (1) Does it mean that special values as a whole are chosen 35% more than normal values as a whole? (2) Or does it mean that special values are simply weighted 1.35 and normal values are weighted 1?
The two question variants I have described have different answers.
Answer (1)
Note you must always have at least one special and at least one normal value.
We know that every time you sample a value it is either Special or Normal, but not both:
P(Special) + P(Normal) = 1
We know that the likelihood of Special is 35% larger than the likelihood of Normal:
P(Special) = 1.35 * P(Normal)
This is a system of two linear equations with two unknowns. Here is its solution:
P(Normal) = 20 / 47
P(Special) = 27 / 47
Simply divide your set of values into two sets, Specials and Normals. Now to sample do the following:
Sample r uniformly from [0, 1].
If r < 20 / 47, then uniformly sample from Normals.
Otherwise, uniformly sample from Specials.
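In JavaScript this could look like the following sketch (specials, normals and pick are made-up names, reusing the myList shape from the question; both groups must be non-empty):

const specials = myList.filter(item => item.special);
const normals = myList.filter(item => !item.special);

function pick() {
  // Choose the group with probability 20/47 vs 27/47, then pick uniformly within it.
  const group = Math.random() < 20 / 47 ? normals : specials;
  return group[Math.floor(Math.random() * group.length)];
}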
Answer (2)
Randomly select an item from the list.
If it is special or Math.random() < 1 / 1.35, then you are done.
Else, return to step 1.
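A sketch of this rejection loop (again using myList from the question; pickWeighted is a made-up name):

function pickWeighted() {
  while (true) {
    const item = myList[Math.floor(Math.random() * myList.length)];
    // Specials always pass; normals pass with probability 1 / 1.35,
    // which makes each special item effectively 1.35 times as likely as a normal one.
    if (item.special || Math.random() < 1 / 1.35) return item;
  }
}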
You could take an approach where you first determine whether to trigger your "higher probability" code, and then pull from your custom list instead.
if (Math.random() < 0.35) {
  element = randChoice(myCustomListOfMoreProbableChoices);
} else {
  element = randChoice(myListOfAllChoices);
}
Create an array of indexes into your main array, then randomly choose an element from that index array to select from your main array:
var mainArray = [1, 2, 3];
var indexesArray = [0, 0, 0, 1, 2]; //1 has 3/5 chance of being chosen while 2 and 3 only have 1/5
Randomly select an index from indexesArray using a random number, then use it to index mainArray:
var randomFromIndexArray = indexesArray[Math.floor(Math.random() * indexesArray.length)];
mainArray[randomFromIndexArray];

How to split an array into two subsets and keep sum of sub-values of array as equal as possible

I really need a master of algorithms here! So the thing is I got for example an array like this:
[
[870, 23]
[970, 78]
[110, 50]
]
and I want to split it up, so that it looks like this:
// first array
[
[970, 78]
]
// second array
[
[870, 23]
[110, 50]
]
so now, why do I want it to look like this?
Because I want to keep the sum of sub values as equal as possible. So 970 is about 870 + 110 and 78 is about 23 + 50.
So in this case it's very easy, because if you just split them and only look at the first sub-value it's already correct, but I want to check both values and keep them as equal as possible, so that it also works with an array that has 100 sub-arrays! So if anyone can tell me an algorithm for this it would be really great!
Scales:
~1000 elements (sublists) in the array
Elements are integers up to 10^9
I am looking for a "close enough solution" - it does not have to be the exact optimal solution.
First, as already established, the problem is NP-Hard, with a reduction from the Partition Problem.
Reduction:
Given an instance of the Partition Problem, create sublists of size 1 each. The result is exactly this problem.
Conclusion from the above:
This problem is NP-Hard, and there is no known polynomial solution.
Second, any exponential or pseudo-polynomial solution will take far too long, due to the scale of the problem.
Third, that leaves us with heuristics and approximation algorithms.
I suggest the following approach:
Normalize the scales of the sublists, so all the elements will be in the same scale (say, all normalized to the range [-1, 1], or all normalized to a standard normal distribution).
Create a new list, in which, each element will be the sum of the matching sublist in the normalized list.
Use some approximation or heuristical solution that was developed for the subset-sum / partition problem.
The result will not be optimal, but optimal is really unattainable here.
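A rough sketch of this recipe, using per-column maxima for the normalization and a simple largest-first greedy split as the partition heuristic (both are just one possible choice; splitBalanced and its internals are made-up names):

function splitBalanced(lists) {
  // 1. Normalize each column to [0, 1] so both values carry equal weight.
  const maxPerCol = [0, 1].map(j => Math.max(...lists.map(l => l[j])) || 1);
  // 2. One score per sublist: the sum of its normalized values.
  const scored = lists.map((l, idx) => ({
    idx,
    score: l[0] / maxPerCol[0] + l[1] / maxPerCol[1]
  }));
  // 3. Greedy partition: biggest score first, always into the lighter set.
  scored.sort((a, b) => b.score - a.score);
  const sets = [{ total: 0, items: [] }, { total: 0, items: [] }];
  for (const s of scored) {
    const target = sets[0].total <= sets[1].total ? sets[0] : sets[1];
    target.total += s.score;
    target.items.push(lists[s.idx]);
  }
  return [sets[0].items, sets[1].items];
}

On the example from the question, splitBalanced([[870, 23], [970, 78], [110, 50]]) returns [[970, 78]] in one set and [[870, 23], [110, 50]] in the other, matching the split the asker wanted.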
From what I gather from the discussion under the original post, you're not searching for a single splitting point, but rather you want to distribute all pairs among two sets, such that the sums in each of the two sets are approximately equal.
Since a close enough solution is acceptable, maybe you could try an approach based on simulated annealing?
(see http://en.wikipedia.org/wiki/Simulated_annealing)
In short, the idea is that you start out by randomly assigning each pair to either the Left or the Right set.
Next, you generate a new state by either
a) moving a randomly selected pair from the Left to the Right set,
b) moving a randomly selected pair from the Right to the Left set, or
c) doing both.
Next, determine if this new state is better or worse than the current state. If it is better, use it.
If it is worse, take it only if it is accepted by the acceptance probability function, which is a function
that initially allows worse states to be used, but favours them less and less as time moves on (or the "temperature decreases", in SA terms).
After a large number of iterations (say 100,000), you should have a pretty good result.
Optionally, rerun this algorithm multiple times because it may get stuck in local optima (although the acceptance probability function attempts to counter this).
Advantages of this approach are that it's simple to implement, and you can decide for yourself how long
you want it to continue searching for a better solution.
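A minimal sketch along these lines (anneal is a made-up name; the linear temperature schedule and the 1e6 acceptance scale are ad-hoc choices that would need tuning; pairs is the array of [x, y] pairs, and assignment[i] === true means pair i is in the Right set):

function anneal(pairs, iterations) {
  iterations = iterations || 100000;
  const assignment = pairs.map(() => Math.random() < 0.5);

  // Running column sums for both sets.
  const left = [0, 0], right = [0, 0];
  pairs.forEach((p, i) => {
    const target = assignment[i] ? right : left;
    target[0] += p[0];
    target[1] += p[1];
  });

  // Cost: how far apart the two sets are, summed over both components.
  const cost = () =>
    Math.abs(left[0] - right[0]) + Math.abs(left[1] - right[1]);

  // Move pair i from its current set to the other one.
  function move(i) {
    const from = assignment[i] ? right : left;
    const to = assignment[i] ? left : right;
    from[0] -= pairs[i][0]; from[1] -= pairs[i][1];
    to[0] += pairs[i][0];   to[1] += pairs[i][1];
    assignment[i] = !assignment[i];
  }

  let current = cost();
  for (let t = 0; t < iterations; t++) {
    const temperature = 1 - t / iterations;     // cools linearly from 1 to 0
    const i = Math.floor(Math.random() * pairs.length);
    move(i);
    const next = cost();
    const worseBy = next - current;
    if (worseBy <= 0 ||
        Math.random() < Math.exp(-worseBy / (temperature * 1e6 + 1e-9))) {
      current = next;                           // accept the new state
    } else {
      move(i);                                  // reject: undo the move
    }
  }
  return assignment;
}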
I'm assuming that we're just looking for a place in the middle of the array to split it into its first and second part.
It seems like a linear algorithm could do this. Something like this in JavaScript.
// `arrays` is assumed to be the input list of [value1, value2] pairs.
var arrayLength = 2;
var tolerance = 10;
// Initialize the two sums.
var firstSum = [];
var secondSum = [];
for (var j = 0; j < arrayLength; j++)
{
    firstSum[j] = 0;
    secondSum[j] = 0;
    for (var i = 0; i < arrays.length; i++)
    {
        secondSum[j] += arrays[i][j];
    }
}
// Try splitting at every place in "arrays".
// Try to get the sums as close as possible.
for (var i = 0; i < arrays.length; i++)
{
    var goodEnough = true;
    for (var j = 0; j < arrayLength; j++)
    {
        if (Math.abs(firstSum[j] - secondSum[j]) > tolerance)
            goodEnough = false;
    }
    if (goodEnough)
    {
        alert("split before index " + i);
        break;
    }
    // Update the sums for the new position.
    for (var j = 0; j < arrayLength; j++)
    {
        firstSum[j] += arrays[i][j];
        secondSum[j] -= arrays[i][j];
    }
}
Thanks for all the answers. The brute-force approach was a good idea, and the NP-hardness observations are relevant too, but it turns out that this is a multiple knapsack problem and can be solved using this pdf document.

Even sets of integers

Given an array of integers, heights, I would like to split it into n sets, each with equal totalHeight (the sum of the values in the set), or as close to equal as possible. There must be a fixed distance, gap, between each value in a set. Sets do not have to have the same number of values.
For example, supposing:
heights[0, 1, 2, 3, 4, 5] = [120, 78, 110, 95, 125, 95]
n = 3
gap = 10
Possible arrangements would be:
a[0, 1], b[2, 3], c[4, 5] giving totalHeight values of
a = heights[0] + gap + heights[1] = 120 + 10 + 78 = 208
b = heights[2] + gap + heights[3] = 110 + 10 + 95 = 215
c = heights[4] + gap + heights[5] = 125 + 10 + 95 = 230
a[0], b[1, 2, 3], c[4, 5] giving totalHeight values of
a = heights[0] = 120
b = heights[1] + gap + heights[2] + gap + heights[3] = 303
c = heights[4] + gap + heights[5] = 125 + 10 + 95 = 230
And so on. I want to find the combination that gives the most evenly-sized sets. So in this example the first combination is better since it gives an overall error of:
max - min = 230 - 208 = 22
Whereas the second combination gives an error of 183. I'm trying to do this in JavaScript, but I'm just looking for some sort of outline of an algorithm. Pseudo code or whatever would be great. Any help would be highly appreciated.
MY POOR ATTEMPTS: Obviously one way of solving this would be to just try every possible combination. That would be horrible though once heights gets large.
Another method I tried is to get the expected mean height of the sets, calculated as the sum of the values in heights divided by n. Then I tried to fill each set individually by getting as close to this average as possible. It works alright in some cases, but it's too slow.
NOTE: If it helps, I would be happy to have symmetric sets. So for example, with sets (a, b, c), a = b. Or with five sets (a, b, c, d, e), a = b and c = d. I think this would be even more difficult to implement but I could be wrong.
EDIT: For anyone who may be interested, the best I could come up with was the following algorithm:
Sort heights in descending order.
Create n sets.
Put the first n values from heights into the first slot of each set. i.e. put the n largest values at the start of each set. Remove the values from heights as they are added.
While heights.count > 0
Find the smallest totalHeight (including gap) in each of the n sets.
Add the next value in heights to this set (and remove the value from heights).
Then there's some little algorithm at the end where each set can make x number of swaps with the other sets, if the totalHeight gets closer to the average. I'm keeping x small because this process could go on forever.
It's not terrible, but obviously not perfect.
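For what it's worth, a sketch of the greedy part of that EDIT (without the final swapping pass), assuming positive heights; greedySets is a made-up name:

function greedySets(heights, n, gap) {
  const sorted = [...heights].sort((a, b) => b - a);   // descending
  // Seed each of the n sets with one of the n largest values.
  const sets = sorted.slice(0, n).map(h => ({ items: [h], total: h }));

  for (const h of sorted.slice(n)) {
    // Find the set with the smallest totalHeight (including gaps) and add to it.
    let smallest = sets[0];
    for (const s of sets) if (s.total < smallest.total) smallest = s;
    smallest.items.push(h);
    smallest.total += gap + h;
  }
  return sets;
}

// greedySets([120, 78, 110, 95, 125, 95], 3, 10) gives totals of 215, 225 and 213.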
Seems like it is NP-hard: the Partition problem (a special case of Subset Sum) reduces to it.
Your second approach (finding the mean, then attempting to fill each set with a sum as close to it as possible) seems like a good practical one. You say this is too slow... the following implementation is O(n*m log n), where m is the maximum number of elements allowed in a set. If m can be very large, this could be quite slow; however, if m is constrained to a certain range it approaches O(n log n), which is about as fast as you are going to get.
Find the mean height of all values. h_mean = Sum(h) / n; O(n).
Sort all heights. O(n log n).
Examine the highest and lowest height.
Add the value which is furthest from the mean to a new set.
Remove this value from the sorted heights.
Repeat for max_number allowed in set = 1 .. m (m < n / 2)
{
    Repeat
    {
        If the set mean is higher than the overall mean:
            Add the lowest value from the sorted heights.
            Remove this value from the sorted heights.
        If the set mean is lower than the overall mean:
            Add the highest value from the sorted heights.
            Remove this value from the sorted heights.
        Recalculate the set mean (taking account of the gap).
        If the new set mean is further from h_mean than the last one, OR
        if the set has too many elements:
            break
    }
    Until all numbers are used.
    Keep track of the standard deviations for this assignment.
    If this assignment is the best so far, keep it.
}
This isn't going to give a provably optimal solution, but it's simple and that has a lot going for it...
Note, in this algorithm, all sets have the same number of elements, m. You repeat an iteration for different values of m, say, 2, 3, 4 (note that m should be a factor of N). Each set ends up with approximately m * mean_height for total height.
You may ask, well what if N is prime?
Then clearly, one set will come up short on total value.
Does this mean this algorithm is useless?
Not at all. It's simple and it should produce a good first attempt at a solution. You may wish to use this algorithm first, then refine the result using optimization techniques (such as selectively swapping heights between sets).
