I have a list with N elements, where K elements are "special" and the rest are "normal". What I'm trying to do is pick an element at random, but special elements should be selected 35% more often than normal items.
For example:
var myList = [
  {id: 1, special: 0},
  {id: 2, special: 1} // <= special item
];
After 600 selections, the normal element should be selected roughly 255 times, and the special element should be selected 35% more often than that, or roughly 345 times.
This is different from the suggested duplicate question because my weights do not add up to 1. I can have any arbitrary number of elements in my list, and zero or more of them are special. The weight is always 1.35 for special items, and 1.0 for normal items.
Your question is ambiguous with regards to the "35% more often" part. (1) Does it mean that special values as a whole are chosen 35% more than normal values as a whole? (2) Or does it mean that special values are simply weighted 1.35 and normal values are weighted 1?
The two question variants I have described have different answers.
Answer (1)
Note you must always have at least one special and at least one normal value.
We know that every time you sample a value it is either Special or Normal, but not both:
P(Special) + P(Normal) = 1
We know that the likelihood of Special is 35% larger than the likelihood of Normal:
P(Special) = 1.35 * P(Normal)
This is a system of two linear equations with two unknowns. Here is its solution:
P(Normal) = 20 / 47
P(Special) = 27 / 47
Simply divide your set of values into two sets, Specials and Normals. Now to sample do the following:
Sample r uniformly from [0, 1].
If r < 20 / 47, then uniformly sample from Normals.
Otherwise, uniformly sample from Specials.
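A minimal JavaScript sketch of this two-bucket sampling, assuming the myList shape from the question (the names normals, specials and sampleAnswer1 are my own):
// Split the items into two buckets once, up front.
var normals = myList.filter(function (item) { return !item.special; });
var specials = myList.filter(function (item) { return item.special; });

function sampleAnswer1() {
  // P(Normal) = 20/47, P(Special) = 27/47
  if (Math.random() < 20 / 47) {
    return normals[Math.floor(Math.random() * normals.length)];
  }
  return specials[Math.floor(Math.random() * specials.length)];
}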
Answer (2)
1. Randomly select an item from the list.
2. If it is special or Math.random() < 1 / 1.35, then you are done.
3. Else, return to step 1.
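A sketch of this rejection-sampling loop in JavaScript (again assuming the myList shape from the question; sampleAnswer2 is my own name):
function sampleAnswer2() {
  while (true) {
    // Step 1: pick uniformly from the whole list.
    var item = myList[Math.floor(Math.random() * myList.length)];
    // Step 2: always keep special items; keep normal items with probability 1 / 1.35.
    if (item.special || Math.random() < 1 / 1.35) {
      return item;
    }
    // Step 3: otherwise try again.
  }
}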
You could take an approach where you first determine whether you want to trigger your "higher probability" code, and then pull from your custom list instead.
if (Math.random() < 0.35) {
  element = randChoice(myCustomListOfMoreProbableChoices);
} else {
  element = randChoice(myListOfAllChoices);
}
Create an array of indexes to be chosen for your main array, then randomly choose an element from that array to select from your main array:
var mainArray = [1, 2, 3];
var indexesArray = [0, 0, 0, 1, 2]; // 1 has a 3/5 chance of being chosen, while 2 and 3 each have only 1/5
Randomly select an index from indexesArray using a random number, then:
mainArray[randomFromIndexArray];
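For completeness, the missing step could look like this (a sketch; the variable names randomFromIndexArray and chosen are my own):
// Pick a random position in indexesArray, then use the value stored there
// as the index into mainArray.
var randomFromIndexArray = indexesArray[Math.floor(Math.random() * indexesArray.length)];
var chosen = mainArray[randomFromIndexArray];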
Related
I receive one or more trees of nodes.
Nodes may or may not have ID properties on them.
Currently I am iterating through the tree and adding random 8-digit numbers to nodes which do not have an ID property. As I do not expect more than 10k nodes in a tree, the chance of a collision is very small.
Still, I am considering how best to reduce the length of the IDs to maybe 4 digits while making sure there are no collisions within one tree. What comes to mind is to iterate once through the tree gathering existing IDs into a Set, and then again adding new IDs while checking against the Set that there are no collisions. The Set would have to be reset for each tree.
I would appreciate your opinion on this matter and any advice on more performant ways of achieving it.
Appendix A:
I am considering the following (simplified to 0-9) issue. If I have a Set of existing IDs [0, 1, 2, 5, 8, 9], I would have to generate random numbers until I get e.g. 4 (no collision), which I am concerned would be a bit slow on a larger Set and is surely not the optimal route.
Here is a really simple approach which will generate an array of unused numbers within a given MAX range.
const MAX = 30;
const usedNumbers = [3, 4, 12, 13, 14, 16, 23, 27];
// https://stackoverflow.com/questions/3746725/how-to-create-an-array-containing-1-n
const notUsedNumbers = Array.from(Array(MAX), (_, i) => i+1).filter(i => !usedNumbers.includes(i));
console.log(notUsedNumbers);
And link to fiddle: https://jsfiddle.net/L9r6anq1/
10^8 possibilities with random selection means you have a 50% chance of collision with only 10^4 objects (see Birthday Paradox); those are not "very small" odds. Reducing that to only 10^4 possibilities with 10^4 objects means collisions will approach 100% as you get toward the end and, if you ever have 10k+1 objects one day, the process will never terminate.
In general, if you want to use a relatively short ID space, you're going to need a very efficient conflict detection system, e.g. keeping all assigned (or not assigned) values in a scratchpad, or just give up on randomly assigning values and go sequentially.
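A sketch of the Set-based two-pass idea from the question, combined with sequential assignment so there is no retry loop at all (the tree shape with a children array and the property name id are assumptions):
function assignIds(rootNodes) {
  var used = new Set();
  var missing = [];

  // First pass: collect existing IDs and remember nodes that need one.
  (function walk(nodes) {
    nodes.forEach(function (node) {
      if (node.id !== undefined) {
        used.add(node.id);
      } else {
        missing.push(node);
      }
      if (node.children) walk(node.children);
    });
  })(rootNodes);

  // Second pass: hand out the smallest unused integers instead of random draws,
  // so IDs stay short and there is never a collision retry.
  var next = 0;
  missing.forEach(function (node) {
    while (used.has(next)) next++;
    node.id = next;
    used.add(next);
  });
}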
I am implementing a simple application, where I have a bunch of lineHeights stored like so:
lineHeights = [2, 4, 5, 7, 2, 4, 5, /* ... */]
Now, given two heights a and b, I need to find the range of lines in between a and b.
For example, if the heights are [2, 2, 4, 5, 2], then the range of lines in between 3 and 7 would be [1, 2] as lines 1 to 2 are contained by the range of heights given.
A naive implementation would be to store the lines as an array, and travel up the array to see which lines are in between the heights given, as shown in the following pseudo code (which I haven't tested, but I hope you can understand what I'm showing):
get_line_range_between_heights (lines, start_height, end_height)
    index = 0
    height = lines[index]
    while height < start_height
        index = index + 1
        height = height + lines[index]
    result_ranges = [index]
    while height < end_height
        index = index + 1
        height = height + lines[index]
        push index into result_ranges
    return result_ranges
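A JavaScript rendering of that naive scan, for reference (a sketch following the pseudocode above, not hardened against out-of-range heights):
function getLineRangeBetweenHeights(lines, startHeight, endHeight) {
  var index = 0;
  var height = lines[0];
  // Advance until the running total reaches startHeight.
  while (height < startHeight) {
    index++;
    height += lines[index];
  }
  var resultRanges = [index];
  // Keep collecting lines until the running total reaches endHeight.
  while (height < endHeight) {
    index++;
    height += lines[index];
    resultRanges.push(index);
  }
  return resultRanges;
}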
However, the simple implementation comes at a cost - while inserting and removing heights is fast, querying becomes an O(n) operation, where n is the number of lines
So my question is - is there a sort of data structure (for example something like a binary search tree) which ideally would have insert, delete and search (for lines in between two heights) operations better than O(n), and which is specialised for this sort of problem (ideally with an implementation, or a link to one)?
I am having issues with understanding dynamic programming solutions to various problems, specifically the coin change problem:
"Given a value N, if we want to make change for N cents, and we have infinite supply of each of S = { S1, S2, .. , Sm} valued coins, how many ways can we make the change? The order of coins doesn’t matter.
For example, for N = 4 and S = {1,2,3}, there are four solutions: {1,1,1,1},{1,1,2},{2,2},{1,3}. So output should be 4. For N = 10 and S = {2, 5, 3, 6}, there are five solutions: {2,2,2,2,2}, {2,2,3,3}, {2,2,6}, {2,3,5} and {5,5}. So the output should be 5."
There is another variation of this problem where the solution is the minimum number of coins to satisfy the amount.
These problems appear very similar, but the solutions are very different.
Number of possible ways to make change: the optimal substructure for this is DP(m,n) = DP(m-1, n) + DP(m, n-Sm) where DP is the number of solutions for all coins up to the mth coin and amount=n.
Minimum amount of coins: the optimal substructure for this is
DP[i] = Min{ DP[i-d1], DP[i-d2],...DP[i-dn] } + 1 where i is the total amount and d1..dn represent each coin denomination.
Why is it that the first one required a 2-D array and the second a 1-D array? Why is the optimal substructure for the number of ways to make change not "DP[i] = DP[i-d1]+DP[i-d2]+...DP[i-dn]" where DP[i] is the number of ways i amount can be obtained by the coins. It sounds logical to me, but it produces an incorrect answer. Why is that second dimension for the coins needed in this problem, but not needed in the minimum amount problem?
LINKS TO PROBLEMS:
http://comproguide.blogspot.com/2013/12/minimum-coin-change-problem.html
http://www.geeksforgeeks.org/dynamic-programming-set-7-coin-change/
Thanks in advance. Every website I go to only explains how the solution works, not why other solutions do not work.
Let's first talk about the number of ways, DP(m,n) = DP(m-1, n) + DP(m, n-Sm). This is indeed correct, because either you use the mth denomination or you avoid it. Now you ask why we don't write it as DP[i] = DP[i-d1] + DP[i-d2] + ... + DP[i-dn]. This leads to over-counting. Take the example n=4, m=2 and S={1,3}. According to your recurrence, dp[4] = dp[1] + dp[3] (taking dp[1] = 1 as a base case). Then dp[3] = dp[2] + dp[0] (again dp[0] = 1 by the base case), and applying the same rule, dp[2] = dp[1] = 1. So in total you get 3 when the answer is supposed to be just 2: (1,3) and (1,1,1,1). This happens because
your second method treats (1,3) and (3,1) as two different solutions. Your second method applies to the case where order matters, which is also a standard problem.
Now to your second question: you say that the minimum number of denominations can
be found by DP[i] = Min{ DP[i-d1], DP[i-d2], ..., DP[i-dn] } + 1. This is correct, because when finding the minimum number of coins, order does not matter. Why is this a linear / 1-D DP? Although the DP array is 1-D, each state depends on up to m states, unlike your first solution where the array is 2-D but each state depends on at most 2 states. So in both cases the running time, which is (number of states * number of states each state depends on), is the same: O(nm). Both are correct; the 1-D solution just saves memory. So you can find the minimum either with the 1-D array method or in 2-D using the recurrence
dp(m,n) = min(dp(m-1,n), 1 + dp(m, n-Sm)). (Just use min in your first recurrence.)
Hope I cleared the doubts; do post if something is still unclear.
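For reference, a minimal JavaScript sketch of the 1-D minimum-coins recurrence above (the function name minCoins and the use of Infinity for unreachable amounts are my choices):
function minCoins(amount, coins) {
  // dp[i] = minimum number of coins needed to make i; Infinity means "not reachable".
  var dp = new Array(amount + 1).fill(Infinity);
  dp[0] = 0;
  for (var i = 1; i <= amount; i++) {
    for (var c = 0; c < coins.length; c++) {
      if (coins[c] <= i && dp[i - coins[c]] + 1 < dp[i]) {
        dp[i] = dp[i - coins[c]] + 1;
      }
    }
  }
  return dp[amount];
}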
This is a very good explanation of the coin change problem using Dynamic Programming.
The code is as follows:
public static int change(int amount, int[] coins) {
    int[] combinations = new int[amount + 1];
    combinations[0] = 1;
    for (int coin : coins) {
        for (int i = 1; i < combinations.length; i++) {
            if (i >= coin) {
                combinations[i] += combinations[i - coin];
                //printAmount(combinations);
            }
        }
        //System.out.println();
    }
    return combinations[amount];
}
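For example, with amount = 4 and coins = {1, 2, 3} this returns 4, matching the four combinations {1,1,1,1}, {1,1,2}, {2,2} and {1,3} listed in the question.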
I'm trying to optimize a slow part of a larger algorithm.
I have an array of random-looking numbers:
[295, 292, 208, 393, 394, 291, 182, 145, 175, 288, 71, 86, 396, 422]
(actual arrays are much longer), and an index: N (N = 5)
What I want to do is subtract 1 from each of the last M elements, once for each of the first N elements that is smaller than it.
So (pseudo-code):
for a = 1..5
    for b = 6..N
        if ary[a] < ary[b]
            ary[b]--;
Obviously this is a horribly inefficient O(N^2) algorithm. I'm trying to think of a faster way to do it, but can't. It seems like I should be able to pre-compute the values to subtract somehow, and reduce it to:
for a = 1..5
    // ???
for b = 6..N
    ary[b] -= ???
but I'm missing something.
[edit] I feel like an idiot. I didn't properly explain what I want, fixed.
Let's reorganize the loops:
for b = 6..N
    for a = 1..5
        if ary[a] < ary[b]
            ary[b] -= ary[a];
The result will be the same. But now the logic is clearer: from each element of the second part of the array you subtract ary[1], then ary[2] and so on, until some ary[a] is bigger than what remains of ary[b].
This can be optimized in the following way. Calculate the cumulative sums of the first part of the array: let sum[1]=ary[1], sum[2]=sum[1]+ary[2], sum[3]=sum[2]+ary[3], and so on. This can be done in O(N).
Now for each b you need to find such a_last that sum[a_last]<ary[b], but sum[a_last+1]>=ary[b] -- this will mean that from ary[b] you will subtract ary[1]+...+ary[a_last]=sum[a_last]. You can find such a_last using binary search in O(log N), thus making the overall algorithm O(N log N).
The pseudocode:
sum[0] = 0
for a = 1..5
    sum[a] = sum[a-1] + ary[a]
for b = 6..N
    a_last = maximal a such that sum[a] < ary[b]   // use binary search
    ary[b] -= sum[a_last]
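A JavaScript sketch of that pseudocode (assuming all values are positive so the prefix sums are increasing; n is the split index, i.e. N = 5 in the question, and the function name is mine):
function subtractPrefixSums(ary, n) {
  // sums[a] = ary[0] + ... + ary[a-1], so sums[0] = 0.
  var sums = [0];
  for (var a = 0; a < n; a++) {
    sums.push(sums[a] + ary[a]);
  }
  for (var b = n; b < ary.length; b++) {
    // Binary search for the largest a with sums[a] < ary[b].
    var lo = 0, hi = n;
    while (lo < hi) {
      var mid = (lo + hi + 1) >> 1;
      if (sums[mid] < ary[b]) lo = mid; else hi = mid - 1;
    }
    ary[b] -= sums[lo];
  }
  return ary;
}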
It seems to me that Petr's solution doesn't follow the specification of the problem: we should subtract 1 (one) for each "hit", not the VALUE of ary[a].
Here's an alternative:
Place the test values ary[j], j = 0..N-1, in a SORTED array: sortedSmallAry[N].
Then run over all b, i.e. b = N, ..., Total-1, and for each ary[b]:
Run j = 0..N-1 over sortedSmallAry, test whether sortedSmallAry[j] is smaller than ary[b], count the hits, and exit as soon as the test fails (because the array is sorted). If N is relatively large you can use a binary search in sortedSmallAry to determine how many of its elements satisfy the condition.
Subtract the hitcount from ary[b].
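A sketch of this count-based variant in JavaScript (sort the first N values once, then binary search for the hit count; the function name is mine):
function subtractHitCounts(ary, n) {
  // Sorted copy of the first n values.
  var sorted = ary.slice(0, n).sort(function (x, y) { return x - y; });
  for (var b = n; b < ary.length; b++) {
    // Binary search: number of sorted values strictly smaller than ary[b].
    var lo = 0, hi = n;
    while (lo < hi) {
      var mid = (lo + hi) >> 1;
      if (sorted[mid] < ary[b]) lo = mid + 1; else hi = mid;
    }
    ary[b] -= lo; // subtract 1 per hit, as the original question asks
  }
  return ary;
}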
Given an array of integers heights, I would like to split these into n sets, each with equal totalHeight (the sum of the values in the set), or as close to equal as possible. There must be a fixed distance, gap, between each value in a set. Sets do not have to have the same number of values.
For example, supposing:
heights[0, 1, 2, 3, 4, 5] = [120, 78, 110, 95, 125, 95]
n = 3
gap = 10
Possible arrangements would be:
a[0, 1], b[2, 3], c[4, 5] giving totalHeight values of
a = heights[0] + gap + heights[1] = 120 + 10 + 78 = 208
b = heights[2] + gap + heights[3] = 110 + 10 + 95 = 215
c = heights[4] + gap + heights[5] = 125 + 10 + 95 = 230
a[0], b[1, 2, 3], c[4, 5] giving totalHeight values of
a = heights[0] = 120
b = heights[1] + gap + heights[2] + gap + heights[3] = 303
c = heights[4] + gap + heights[5] = 125 + 10 + 95 = 230
And so on. I want to find the combination that gives the most evenly-sized sets. So in this example the first combination is better since it gives an overall error of:
max - min = 230 - 208 = 22
Whereas the second combination gives an error of 183. I'm trying to do this in JavaScript, but I'm just looking for some sort of outline of an algorithm. Pseudo code or whatever would be great. Any help would be highly appreciated.
MY POOR ATTEMPTS: Obviously one way of solving this would be to just try every possible combination. That would be horrible though once heights gets large.
Another method I tried is to get the expected mean height of the sets, calculated as the sum of the values in heights divided by n. Then I tried to fill each set individually by getting as close to this average as possible. It works alright in some cases, but it's too slow.
NOTE: If it helps, I would be happy to have symmetric sets. So for example, with sets (a, b, c), a = b. Or with five sets (a, b, c, d, e), a = b and c = d. I think this would be even more difficult to implement but I could be wrong.
EDIT: For anyone who may be interested, the best I could come up with was the following algorithm:
Sort heights in descending order.
Create n sets.
Put the first n values from heights into the first slot of each set. i.e. put the n largest values at the start of each set. Remove the values from heights as they are added.
While heights.count > 0
    Find the set with the smallest totalHeight (including gaps).
    Add the next value in heights to this set (and remove the value from heights).
Then there's some little algorithm at the end where each set can make x number of swaps with the other sets, if the totalHeight gets closer to the average. I'm keeping x small because this process could go on forever.
It's not terrible, but obviously not perfect.
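For reference, a minimal JavaScript sketch of that greedy assignment (the final swap pass is omitted; the function name and set representation are my own):
function greedySplit(heights, n, gap) {
  // Sort descending, then seed each of the n sets with one of the n largest values.
  var remaining = heights.slice().sort(function (a, b) { return b - a; });
  var sets = [];
  for (var i = 0; i < n; i++) {
    sets.push([remaining.shift()]);
  }
  // totalHeight of a set = sum of its values plus one gap per join.
  function totalHeight(set) {
    return set.reduce(function (sum, h) { return sum + h; }, 0) + gap * (set.length - 1);
  }
  // Repeatedly give the next (largest remaining) value to the currently shortest set.
  while (remaining.length > 0) {
    var shortest = sets[0];
    for (var j = 1; j < sets.length; j++) {
      if (totalHeight(sets[j]) < totalHeight(shortest)) shortest = sets[j];
    }
    shortest.push(remaining.shift());
  }
  return sets;
}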
It seems like this problem is NP-hard: it generalizes the Partition problem (a special case of the Subset Sum problem).
Your second approach, finding the mean (total height / n) and then attempting to fill each set as close to that mean as possible, seems like a good practical approach. You say this is too slow... the following implementation is O(n*m log n), where m is the maximum number of elements allowed in a set. If m can be very large, then this could be quite slow; however, if m is constrained to be within a certain range then it could approach O(n log n), which is about as fast as you are going to get.
Find the mean height of all values. h_mean = Sum(h) / n; O(n).
Sort all heights. O(n log n).
Examine the highest and lowest height.
Add the value which is furthest from the mean to a new set.
Remove this value from the sorted heights.
Repeat for max_number allowed in set = 1 .. m (m < n / 2)
{
    Repeat:
    {
        If the set mean is higher than the mean:
            Add the lowest value from the sorted heights.
            Remove this value from the sorted heights.
        If the set mean is lower than the mean:
            Add the highest value from the sorted heights.
            Remove this value from the sorted heights.
        Recalculate the set mean (taking account of the gap).
        If the new set mean is further from the h_mean than the last OR
        if the set has too many elements:
            break
    }
    Until all numbers are used.
    Keep track of the standard deviations for this assignment.
    If this assignment is the best so far, keep it.
}
This isn't going to give a provably optimal solution, but it's simple and that has a lot going for it...
Note, in this algorithm, all sets have the same number of elements, m. You repeat an iteration for different values of m, say, 2, 3, 4 (note that m should be a factor of N). Each set ends up with approximately m * mean_height for total height.
You may ask, well what if N is prime?
Then clearly, one set will fall short on total value.
Does this mean this algorithm is useless?
Not at all. It's simple and it should produce a good first attempt at a solution. You may wish to use this algorithm first, then refine the result using optimization techniques (such as selectively swapping heights between sets).