Math.random in regard to arrays - javascript

I am confused about how arrays work in tandem with functions like Math.random(). Since the Math.random() function selects a number greater than or equal to 0 and less than 1, what specific number is assigned to each variable in an array? For example, in the code below, what number would have to be selected to print out 1? What number would have to be selected to print out jaguar?
var examples= [1, 2, 3, 56, "foxy", 9999, "jaguar", 5.4, "caveman"];
var example= examples[Math.round(Math.random() * (examples.length-1))];
console.log(example);
Is each element in an array assigned a position number equal to x/n (x being the position number relative to the first element and n being the number of elements)? Since examples has 9 elements, would 1 be at position 1/9 and would 9999 be at position 6/9?

Math.round() vs. Math.floor()
The first thing to note: Math.round() is never the right function to use when you're dealing with a value returned by Math.random(). It should be Math.floor() instead, and then you don't need that -1 correction on the length. This is because Math.random() returns a value that is >= 0 and < 1.
This is a bit tricky, so let's take a specific example: an array with three elements. As vihan1086's excellent answer explains, the elements of this array are numbered 0, 1, and 2. To select a random element from this array, you want an equal chance of getting any one of those three values.
Let's see how that works out with Math.round( Math.random() * (array.length - 1) ). The array length is 3, so we will multiply Math.random() by 2. Now we have a value n that is >= 0 and < 2. We round that number to the nearest integer:
If n is >= 0 and < .5, it rounds to 0.
If n is >= .5 and < 1.5, it rounds to 1.
If n is >= 1.5 and < 2, it rounds to 2.
So far so good. We have a chance of getting any of the three values we need, 0, 1, or 2. But what are the chances?
Look closely at those ranges. The middle range (.5 up to 1.5) is twice as long as the other two ranges (0 up to .5, and 1.5 up to 2). Instead of an equal chance for any of the three index values, we have a 25% chance of getting 0, a 50% chance of getting 1, and a 25% chance of 2. Oops.
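To see that skew for yourself, here is a minimal simulation (my sketch, plain JavaScript, nothing assumed beyond the standard Math API):
// Tally a million samples of Math.round(Math.random() * 2)
var counts = [0, 0, 0];
for (var i = 0; i < 1000000; i++) {
  counts[Math.round(Math.random() * 2)]++;
}
console.log(counts); // roughly [250000, 500000, 250000]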
Instead, we need to multiply the Math.random() result by the entire array length of 3, so n is >= 0 and < 3, and then floor that result: Math.floor( Math.random() * array.length ). It works like this:
If n is >= 0 and < 1, it floors to 0.
If n is >= 1 and < 2, it floors to 1.
If n is >= 2 and < 3, it floors to 2.
Now we clearly have an equal chance of hitting any of the three values 0, 1, or 2, because each of those ranges is the same length.
Keeping it simple
Here is a recommendation: don't write all this code in one expression. Break it up into simple functions that are self-explanatory and make sense. Here's how I like to do this particular task (picking a random element from an array):
// Return a random integer in the range 0 through n - 1
function randomInt( n ) {
return Math.floor( Math.random() * n );
}
// Return a random element from an array
function randomElement( array ) {
return array[ randomInt(array.length) ];
}
Then the rest of the code is straightforward:
var examples = [ 1, 2, 3, 56, "foxy", 9999, "jaguar", 5.4, "caveman" ];
var example = randomElement( examples );
console.log( example );
See how much simpler it is this way? Now you don't have to do that math calculation every time you want to get a random element from an array, you can simply call randomElement(array).

There is quite a bit happening, so I'll break it up:
Math.random
You got the first part right. Math.random will generate a number >= 0 and < 1. Math.random can return exactly 0, but the chance is vanishingly small (on the order of 10^-16; you are far more likely to be struck by lightning). It will produce a number such as:
0.6687583869788796
Let's stop there for a second
Arrays and their indexes
Each item in an array has an index, or position, ranging from 0 upward. In JavaScript, arrays start at zero, not one. Here's a chart:
[ 'foo', 'bar', 'baz' ]
Now the indexes are as following:
name | index
-----|------
foo | 0
bar | 1
baz | 2
To get an item from its index, use []:
fooBarBazArray[0]; // foo
fooBarBazArray[2]; // baz
Array length
Now the array length won't be the same as the largest index. It will be the length as if we counted the items. So the above array will return 3. Each array has a length property which contains its length:
['foo', 'bar', 'baz'].length; // Is 3
More Random Math
Now let's take a look at this randomizing thing:
Math.round(Math.random() * (mathematics.length-1))
There is a lot going on. Let's break it down:
Math.random()
So first we generate a random number.
* mathematics.length - 1
The goal of this expression is to generate a random array index. We need to subtract 1 from the length to get the highest index.
First Part conclusions
This now gives us a number ranging from 0 - max array index. On the sample array I showed earlier:
Math.random() * (['foo', 'bar', 'baz'].length - 1)
Now there is a little problem:
We actually want a random number that runs from 0 up to the full length (exclusive), so that flooring it can produce every index, including the highest. That means the -1 shouldn't be there. Let's fix this code:
Math.random() * ['foo', 'bar', 'baz'].length
Running this code, I get:
2.1972009977325797
1.0244733088184148
0.1671080442611128
2.0442249791231006
1.8239217158406973
Finally
To get our random index, we have to turn this from an ugly decimal into a nice integer: Math.floor will basically truncate the decimal off.
Math.floor results:
2
0
2
1
2
We can put this code in the [] to select an item in the array at the random index.
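For example, on the sample array, the whole selection reads (a quick sketch):
var fooBarBazArray = ['foo', 'bar', 'baz'];
var randomItem = fooBarBazArray[Math.floor(Math.random() * fooBarBazArray.length)];
console.log(randomItem); // 'foo', 'bar', or 'baz' with equal probability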

You're looking at simple multiplication, and a bug in your code. It should reference the array 'examples' that you are selecting from, instead of something you haven't mentioned called 'mathematics':
var example = examples[Math.round(Math.random() * (examples.length-1))];
^^
Then you're just multiplying a random number by the number of things in the array. So the maximum random number is 1 and if there are 50 things in your array you multiply the random number by 50, and now the maximum random number is 50.
And all the smaller random numbers (0 to 1) are also scaled 50x and now spread from (0 to 50) with roughly the same randomness to them. Then you round it to the nearest whole number, which gives a random index into your array from 0 to n, and you can do element[thatNumber] to pick it out.
Full examples:
Math.random() returns numbers between 0 and 1 (it can return 0 but chances of that are incredibly small):
Math.random()
0.11506261994225964
Math.random()
0.5607304393516861
Math.random()
0.5050221864582
Math.random()
0.4070177578793308
Math.random()
0.6352060229006462
Multiply those numbers by something to scale them up; 1 x 10 = 10 and so Math.random() * 10 = random numbers between 0 and 10.
Math.random() * n returns numbers between 0 and n:
Math.random() * 10
2.6186012867183326
Math.random() * 10
5.616868671026196
Math.random() * 10
0.7765205189156167
Math.random() * 10
6.299650241067698
Then Math.round(number) knocks the decimals off and leaves the nearest whole number, between 0 and 10:
Math.round(Math.random() * 10)
5
Then you select that numbered element:
examples[ Math.round(Math.random() * 10) ];
And you use .length-1 because indexing counts from 0 and finishes at length-1 (see vihan1086's explanation, which has lots about array indexing).
This approach is not very good at being random - in particular, it's much less likely to pick the first and last elements. I didn't realise when I wrote this, but Michael Geary's answer is much better - avoiding Math.round() and not using length-1.

This is an old question but I will provide a new and shorter solution to get a random item from an array.
Math.random
It returns a number between 0 and 1 (1 not included).
Bitwise not ~
This operator returns the bitwise negation of the value you provide, which for an integer a works out to -(a + 1), so:
a = 5
~a // -6
It also forgets about decimals, so for instance:
a = 5.95
~a // -6 (the .95 is dropped first, then 5 becomes -6)
Since it skips the decimals, it behaves somewhat like Math.floor for non-negative numbers (apart from flipping the sign, of course).
Doubled operators
The logical NOT operator !, used to coerce a value to a boolean type (!!null // false), forces the conversion by double negation.
If we use the same idea with ~ for numbers, we force a number down to an integer: ~~5.999 // 5
Therefore,
TLDR;
getRandom = (arr, len = arr.length) => arr[~~(Math.random() * len)]
example:
getRandom([1,2,3,4,5]) // random item between 1 and 5
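One caveat worth adding (my note, not part of the original answer): ~~ only works on values that fit in a signed 32-bit integer, and it truncates toward zero rather than flooring, so it only substitutes for Math.floor on non-negative numbers in range:
~~5.9 // 5, same as Math.floor(5.9)
~~-5.9 // -5, but Math.floor(-5.9) is -6
~~(Math.random() * len) // safe here: the value is non-negative and small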


is nanoid's random algorithm really better than random % alphabet? [duplicate]

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.
So rand() is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX, which is a constant defined in cstdlib (see this article for a general overview on rand()).
Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!
When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11
When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11
When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11
This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.
So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1. In this case, along with our earlier assumption that rand() returns a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.
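You can check those probabilities empirically. Here is a small JavaScript sketch in which a fake rand() with RAND_MAX = 10 stands in for the C function (an assumption made just for this demo):
// rand() stand-in: integers 0..10, all equally likely
function rand() { return Math.floor(Math.random() * 11); }
var counts = [0, 0, 0];
for (var i = 0; i < 1100000; i++) counts[rand() % 3]++;
console.log(counts); // close to [400000, 400000, 300000], i.e. 4/11, 4/11, 3/11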
So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:
int x;
do {
x = rand();
} while (x >= n);
but that's inefficient for low values of n, since you only have an n/RAND_MAX chance of getting a value in your range, and so you'll need to perform RAND_MAX/n calls to rand() on average.
A more efficient formula approach would be to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:
int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
For small values of n, this will rarely require more than one call to rand().
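The same rejection idea carries over to other languages. For instance, a JavaScript sketch using the Web Crypto API as the 32-bit generator (assuming a runtime that exposes crypto.getRandomValues):
// Uniform integer in [0, n): reject draws from the biased tail,
// mirroring the C loop above with RAND_MAX = 2**32 - 1.
function uniformInt(n) {
  var limit = Math.floor(2 ** 32 / n) * n; // largest multiple of n <= 2**32
  var buf = new Uint32Array(1);
  do {
    crypto.getRandomValues(buf);
  } while (buf[0] >= limit);
  return buf[0] % n;
}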
Works cited and further reading:
CPlusPlus Reference
Eternally Confuzzled
Repeatedly selecting a random number until it lands in range is a good way to remove the bias.
Update
We can make the code fast if we search for an x in a range whose size is divisible by n.
// Assumptions
// rand() in [0, RAND_MAX]
// n in (0, RAND_MAX]
int x;
// Keep searching for an x in a range divisible by n
do {
x = rand();
} while (x >= RAND_MAX - (RAND_MAX % n));
x %= n;
The above loop should be very fast, say 1 iteration on average.
user1413793 is correct about the problem. I'm not going to discuss that further, except to make one point: yes, for small values of n and large values of RAND_MAX, the modulo bias can be very small. But using a bias-inducing pattern means that you must consider the bias every time you calculate a random number and choose different patterns for different cases. And if you make the wrong choice, the bugs it introduces are subtle and almost impossible to unit test. Compared to just using the proper tool (such as arc4random_uniform), that's extra work, not less work. Doing more work and getting a worse solution is terrible engineering, especially when doing it right every time is easy on most platforms.
Unfortunately, the implementations of the solution are all incorrect or less efficient than they should be. (Each solution has various comments explaining the problems, but none of the solutions have been fixed to address them.) This is likely to confuse the casual answer-seeker, so I'm providing a known-good implementation here.
Again, the best solution is just to use arc4random_uniform on platforms that provide it, or a similar ranged solution for your platform (such as Random.nextInt on Java). It will do the right thing at no code cost to you. This is almost always the correct call to make.
If you don't have arc4random_uniform, then you can use the power of opensource to see exactly how it is implemented on top of a wider-range RNG (ar4random in this case, but a similar approach could also work on top of other RNGs).
Here is the OpenBSD implementation:
/*
* Calculate a uniformly distributed random number less than upper_bound
* avoiding "modulo bias".
*
* Uniformity is achieved by generating new random numbers until the one
* returned is outside the range [0, 2**32 % upper_bound). This
* guarantees the selected random number will be inside
* [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
* after reduction modulo upper_bound.
*/
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
u_int32_t r, min;
if (upper_bound < 2)
return 0;
/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}
return r % upper_bound;
}
It is worth noting the latest commit comment on this code for those who need to implement similar things:
Change arc4random_uniform() to calculate 2**32 % upper_bound as
-upper_bound % upper_bound. Simplifies the code and makes it the
same on both ILP32 and LP64 architectures, and also slightly faster on
LP64 architectures by using a 32-bit remainder instead of a 64-bit
remainder.
Pointed out by Jorden Verwer on tech#
ok deraadt; no objections from djm or otto
The Java implementation is also easily findable (see previous link):
public int nextInt(int n) {
if (n <= 0)
throw new IllegalArgumentException("n must be positive");
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
Definition
Modulo Bias is the inherent bias in using modulo arithmetic to reduce an output set to a subset of the input set. In general, a bias exists whenever the mapping between the input and output set is not equally distributed, as in the case of using modulo arithmetic when the size of the output set is not a divisor of the size of the input set.
This bias is particularly hard to avoid in computing, where numbers are represented as strings of bits: 0s and 1s. Finding truly random sources of randomness is also extremely difficult, but is beyond the scope of this discussion. For the remainder of this answer, assume that there exists an unlimited source of truly random bits.
Problem Example
Let's consider simulating a die roll (0 to 5) using these random bits. There are 6 possibilities, so we need enough bits to represent the number 6, which is 3 bits. Unfortunately, 3 random bits yields 8 possible outcomes:
000 = 0, 001 = 1, 010 = 2, 011 = 3
100 = 4, 101 = 5, 110 = 6, 111 = 7
We can reduce the size of the outcome set to exactly 6 by taking the value modulo 6, however this presents the modulo bias problem: 110 yields a 0, and 111 yields a 1. This die is loaded.
Potential Solutions
Approach 0:
Rather than rely on random bits, in theory one could hire a small army to roll dice all day and record the results in a database, and then use each result only once. This is about as practical as it sounds, and more than likely would not yield truly random results anyway (pun intended).
Approach 1:
Instead of using the modulus, a naive but mathematically correct solution is to discard results that yield 110 and 111 and simply try again with 3 new bits. Unfortunately, this means that there is a 25% chance on each roll that a re-roll will be required, including each of the re-rolls themselves. This is clearly impractical for all but the most trivial of uses.
Approach 2:
Use more bits: instead of 3 bits, use 4. This yields 16 possible outcomes. Of course, re-rolling anytime the result is greater than 5 makes things worse (10/16 = 62.5%) so that alone won't help.
Notice that 2 * 6 = 12 < 16, so we can safely take any outcome less than 12 and reduce that modulo 6 to evenly distribute the outcomes. The other 4 outcomes must be discarded, and then re-rolled as in the previous approach.
Sounds good at first, but let's check the math:
4 discarded results / 16 possibilities = 25%
In this case, 1 extra bit didn't help at all!
That result is unfortunate, but let's try again with 5 bits:
32 % 6 = 2 discarded results; and
2 discarded results / 32 possibilities = 6.25%
A definite improvement, but not good enough in many practical cases. The good news is, adding more bits will never increase the chances of needing to discard and re-roll. This holds not just for dice, but in all cases.
As demonstrated, however, adding 1 extra bit may not change anything. In fact, if we increase our roll to 6 bits, the probability remains 6.25%.
This raises 2 additional questions:
If we add enough bits, is there a guarantee that the probability of a discard will diminish?
How many bits are enough in the general case?
General Solution
Thankfully the answer to the first question is yes. The problem with 6 is that 2^x mod 6 flips between 2 and 4, which coincidentally are a multiple of 2 from each other, so that for an odd x > 1,
[2^x mod 6] / 2^x == [2^(x+1) mod 6] / 2^(x+1)
Thus 6 is an exception rather than the rule. It is possible to find larger moduli that yield consecutive powers of 2 in the same way, but eventually this must wrap around, and the probability of a discard will be reduced.
Without offering further proof, in general using double the number of bits required will provide a smaller, usually insignificant, chance of a discard.
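To put numbers on that claim, the discard probability for k random bits and modulus n is (2^k mod n) / 2^k, which a few lines can tabulate (my sketch, not part of the original answer):
function discardChance(bits, n) {
  return (2 ** bits % n) / 2 ** bits;
}
[3, 4, 5, 6, 12].forEach(function (bits) {
  console.log(bits, discardChance(bits, 6)); // 0.25, 0.25, 0.0625, 0.0625, ~0.00098
});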
Proof of Concept
Here is an example program that uses OpenSSL's libcrypto to supply random bytes. When compiling, be sure to link to the library with -lcrypto, which most everyone should have available.
#include <iostream>
#include <cassert>
#include <cstdint>
#include <limits>
#include <openssl/rand.h>
volatile uint32_t dummy;
uint64_t discardCount;
uint32_t uniformRandomUint32(uint32_t upperBound)
{
assert(RAND_status() == 1);
uint64_t randomPool; // scratch value filled with random bytes
// 2**64 % upperBound, computed without overflowing 64 bits
uint64_t discard = (std::numeric_limits<uint64_t>::max() - upperBound + 1) % upperBound;
RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
while(randomPool > (std::numeric_limits<uint64_t>::max() - discard)) {
RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
++discardCount;
}
return randomPool % upperBound;
}
int main() {
discardCount = 0;
const uint32_t MODULUS = (1ul << 31)-1;
const uint32_t ROLLS = 10000000;
for(uint32_t i = 0; i < ROLLS; ++i) {
dummy = uniformRandomUint32(MODULUS);
}
std::cout << "Discard count = " << discardCount << std::endl;
}
I encourage playing with the MODULUS and ROLLS values to see how many re-rolls actually happen under most conditions. A sceptical person may also wish to save the computed values to a file and verify the distribution is uniform.
Mark's Solution (The accepted solution) is Nearly Perfect.
int x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
However, it has a caveat which discards 1 valid set of outcomes in any scenario where RAND_MAX (RM) is 1 less than a multiple of N (Where N = the Number of possible valid outcomes).
ie, when the 'count of values discarded' (I) is equal to N, then they are actually a valid set (V), not an invalid set.
What causes this is at some point Mark loses sight of the difference between N and Rand_Max.
N is a set whose valid members are comprised only of positive integers, as it contains a count of responses that would be valid. (eg: Set N = {1, 2, 3, ... n } )
Rand_Max, however, is a set which (as defined for our purposes) includes any number of non-negative integers.
In its most generic form, what is defined here as Rand_Max is the set of all valid outcomes, which could theoretically include negative numbers or non-numeric values.
Therefore Rand_Max is better defined as the set of "Possible Responses".
However, N operates against the count of the values within the set of valid responses, so even as defined in our specific case, Rand_Max will be a value one less than the total number of values it contains.
Using Mark's Solution, values are discarded when: X >= RM - RM % N
EG:
Rand_Max Value (RM) = 255
Valid Outcomes (N) = 4
When X >= 252, discarded values for X are: 252, 253, 254, 255
So, if Random Value Selected (X) = {252, 253, 254, 255}
Number of discarded Values (I) = RM % N + 1 == N
IE:
I = RM % N + 1
I = 255 % 4 + 1
I = 3 + 1
I = 4
X >= ( RM - RM % N )
255 >= (255 - 255 % 4)
255 >= (255 - 3)
255 >= (252)
Discard Returns $True
As you can see in the example above, when the value of X (the random number we get from the initial function) is 252, 253, 254, or 255 we would discard it even though these four values comprise a valid set of returned values.
IE: When the count of the values Discarded (I) = N (The number of valid outcomes) then a Valid set of return values will be discarded by the original function.
If we describe the difference between the values N and RM as D, ie:
D = (RM - N)
Then as the value of D becomes smaller, the percentage of unneeded re-rolls due to this method increases at each natural multiple. (When RAND_MAX is NOT equal to a prime number, this is of valid concern.)
EG:
RM=255 , N=2 Then: D = 253, Lost percentage = 0.78125%
RM=255 , N=4 Then: D = 251, Lost percentage = 1.5625%
RM=255 , N=8 Then: D = 247, Lost percentage = 3.125%
RM=255 , N=16 Then: D = 239, Lost percentage = 6.25%
RM=255 , N=32 Then: D = 223, Lost percentage = 12.5%
RM=255 , N=64 Then: D = 191, Lost percentage = 25%
RM=255 , N= 128 Then D = 127, Lost percentage = 50%
Since the percentage of re-rolls needed increases the closer N comes to RM, this can be of valid concern at many different values depending on the constraints of the system running the code and the values being looked for.
To negate this we can make a simple amendment, as shown here:
int x;
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
This provides a more general version of the formula which accounts for the additional peculiarities of using modulus to define your max values.
Examples of using a small value for RAND_MAX, where RAND_MAX + 1 is a multiple of N.
Mark's original Version:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X >= (RAND_MAX - ( RAND_MAX % n ) )
When X >= 2 the value will be discarded, even though the set is valid.
Generalized Version 1:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X > (RAND_MAX - ( ( RAND_MAX % n ) + 1 ) % n )
When X > 3 the value would be discarded, but this is not a value in the set RAND_MAX so there will be no discard.
Additionally, in the case where N should be the number of values in RAND_MAX, you could set N = RAND_MAX + 1, unless RAND_MAX = INT_MAX.
Loop-wise you could just use N = 1, and any value of X will be accepted, and put an IF statement in for your final multiplier. But perhaps you have code that has a valid reason to return a 1 when the function is called with n = 1...
So it may be better to use 0, which would normally provide a Div 0 Error, when you wish to have n = RAND_MAX + 1.
Generalized Version 2:
int x;
if (n != 0) {
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
} else {
x = rand();
}
Both of these solutions resolve the issue with needlessly discarded valid results which will occur when RM+1 is a product of n.
The second version also covers the edge case scenario when you need n to equal the total possible set of values contained in RAND_MAX.
The modified approach in both is the same and allows for a more general solution to the need of providing valid random numbers and minimizing discarded values.
To reiterate:
The Basic General Solution which extends Mark's example:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
The Extended General Solution which Allows one additional scenario of RAND_MAX+1 = n:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
if (n != 0) {
do {
x = rand();
} while (x > (RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n) ) );
x %= n;
} else {
x = rand();
}
In some languages (particularly interpreted languages) doing the calculation of the compare-operation outside of the while condition may lead to faster results, as it is a one-time calculation no matter how many re-tries are required. YMMV!
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x; // Resulting random number
int y; // One-time calculation of the compare value for x
if (n != 0) {
y = RAND_MAX - ( ( ( RAND_MAX % n ) + 1 ) % n ); // inside the guard so n = 0 never divides
do {
x = rand();
} while (x > y);
x %= n;
} else {
x = rand();
}
There are two usual complaints with the use of modulo.
One is valid for all generators. It is easier to see in a limit case. If your generator has a RAND_MAX of 2 (which isn't compliant with the C standard) and you want only 0 or 1 as values, using modulo will generate 0 twice as often (when the generator produces 0 or 2) as it generates 1 (when the generator produces 1). Note that this is true as soon as you don't drop values: whatever mapping you use from the generator's values to the wanted ones, one value will occur twice as often as the other.
Some kinds of generators have their least significant bits less random than the others, at least for some of their parameters, but sadly those parameters have other interesting characteristics (such as being able to have RAND_MAX one less than a power of 2). The problem is well known, and for a long time library implementations have probably avoided it (for instance, the sample rand() implementation in the C standard uses this kind of generator but drops the 16 least significant bits), but some like to complain about that and you may have bad luck.
Using something like
int alea(int n){
assert (0 < n && n <= RAND_MAX);
int partSize =
n == RAND_MAX ? 1 : 1 + (RAND_MAX-n)/(n+1);
int maxUsefull = partSize * n + (partSize-1);
int draw;
do {
draw = rand();
} while (draw > maxUsefull);
return draw/partSize;
}
to generate a random number between 0 and n will avoid both problems (and it avoids overflow with RAND_MAX == INT_MAX)
BTW, C++11 introduced standard ways to do the reduction, and generators other than rand().
With a RAND_MAX value of 3 (in reality it should be much higher than that but the bias would still exist) it makes sense from these calculations that there is a bias:
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
random_between(1, 3) % 2 = more likely a 1
In this case, the % 2 is what you shouldn't do when you want a random number between 0 and 1. You could get a random number between 0 and 2 by doing % 3 though, because in this case: RAND_MAX is a multiple of 3.
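A quick tally bears this out (a sketch; Math.floor stands in for the uniform generator):
var counts = [0, 0];
for (var i = 0; i < 300000; i++) {
  var r = 1 + Math.floor(Math.random() * 3); // uniform on {1, 2, 3}
  counts[r % 2]++;
}
console.log(counts); // roughly [100000, 200000]: 1 comes up twice as often as 0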
Another method
There are simpler ways, but to add to the other answers, here is my solution to get a random number between 0 and n - 1, so n different possibilities, without bias.
the number of bits (not bytes) needed to encode the number of possibilities is the number of bits of random data you'll need
encode the number from random bits
if this number is >= n, restart (no modulo).
Truly random data is not easy to obtain, so why use more bits than needed?
Below is an example in Smalltalk, using a cache of bits from a pseudo-random number generator. I'm no security expert so use at your own risk.
next: n
| bitSize r from to |
n < 0 ifTrue: [^0 - (self next: 0 - n)].
n = 0 ifTrue: [^nil].
n = 1 ifTrue: [^0].
cache isNil ifTrue: [cache := OrderedCollection new].
cache size < (self randmax highBit) ifTrue: [
Security.DSSRandom default next asByteArray do: [ :byte |
(1 to: 8) do: [ :i | cache add: (byte bitAt: i)]
]
].
r := 0.
bitSize := n highBit.
to := cache size.
from := to - bitSize + 1.
(from to: to) do: [ :i |
r := r bitAt: i - from + 1 put: (cache at: i)
].
cache removeFrom: from to: to.
r >= n ifTrue: [^self next: n].
^r
Modulo reduction is a commonly seen way to make a random integer generator avoid the worst case of running forever.
When the range of possible integers is unknown, however, there is no way in general to "fix" this worst case of running forever without introducing bias. It's not just modulo reduction (rand() % n, discussed in the accepted answer) that will introduce bias this way, but also the "multiply-and-shift" reduction of Daniel Lemire, or if you stop rejecting an outcome after a set number of iterations. (To be clear, this doesn't mean there is no way to fix the bias issues present in pseudorandom generators. For example, even though modulo and other reductions are biased in general, they will have no issues with bias if the range of possible integers is a power of 2 and if the random generator produces unbiased random bits or blocks of them.)
The following answer of mine discusses the relationship between running time and bias in random generators, assuming we have a "true" random generator that can produce unbiased and independent random bits. The answer doesn't even involve the rand() function in C because it has many issues. Perhaps the most serious here is the fact that the C standard does not explicitly specify a particular distribution for the numbers returned by rand(), not even a uniform distribution.
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
As the accepted answer indicates, "modulo bias" has its roots in the low value of RAND_MAX. He uses an extremely small value of RAND_MAX (10) to show that if RAND_MAX were 10 and you tried to generate a number between 0 and 2 using %, the following outcomes would result:
rand() % 3 // if RAND_MAX were only 10, gives
output of rand() | rand()%3
0 | 0
1 | 1
2 | 2
3 | 0
4 | 1
5 | 2
6 | 0
7 | 1
8 | 2
9 | 0
So there are 4 outputs of 0's (4/10 chance) and only 3 outputs of 1 and 2 (3/10 chances each).
So it's biased. The lower numbers have a better chance of coming out.
But that only shows up so obviously when RAND_MAX is small. Or more specifically, when the number you are modding by is large compared to RAND_MAX.
A much better solution than looping (which is insanely inefficient and shouldn't even be suggested) is to use a PRNG with a much larger output range. The Mersenne Twister algorithm has a maximum output of 4,294,967,295. As such, doing MersenneTwister::genrand_int32() % 10 will, for all intents and purposes, be equally distributed, and the modulo bias effect will all but disappear.
I just wrote some code for Von Neumann's Unbiased Coin Flip Method, which should theoretically eliminate any bias in the random number generation process. More info can be found at http://en.wikipedia.org/wiki/Fair_coin
int unbiased_random_bit() {
int x1, x2, prev;
prev = 2;
x1 = rand() % 2;
x2 = rand() % 2;
for (;; x1 = rand() % 2, x2 = rand() % 2)
{
if (x1 ^ x2) // 01 -> 1, or 10 -> 0.
{
return x2;
}
else if (x1 & x2)
{
if (!prev) // 0011
return 1;
else
prev = 1; // 1111 -> continue, bias unresolved
}
else
{
if (prev == 1)// 1100
return 0;
else // 0000 -> continue, bias unresolved
prev = 0;
}
}
}
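For comparison, the core Von Neumann trick on its own (without the extra bias-resolution bookkeeping above) fits in a few lines of JavaScript; here biasedFlip is a made-up stand-in for any biased coin:
// Flip the biased coin twice; keep the first flip only when the two differ.
// P(01) equals P(10), so the kept bit is unbiased.
function biasedFlip() { return Math.random() < 0.3 ? 1 : 0; } // say, 30% ones
function unbiasedBit() {
  for (;;) {
    var a = biasedFlip(), b = biasedFlip();
    if (a !== b) return a;
  }
}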

Improvements in Algo/code for the following HackerRank problem

I'm aware SO is not a place for homework, hence I'm being very specific about the scope of the question.
I was trying to solve this problem on HackerRank: Array Manipulation - Crush. The problem statement is quite simple and I implemented the following code:
function arrayManipulation(n, queries) {
const arr = new Array(n).fill(0)
for (let j = 0; j < queries.length; j++) {
const query = queries[j];
let i = query[0] - 1; // let, not const: the loop below increments i
const limit = query[1];
const value = query[2];
while (i < limit) {
arr[i++] += value;
}
}
return Math.max.apply(null, arr);
}
Now, it works fine for half the test-cases but breaks with the following message: Terminated due to timeout for test-cases 7 - 13, as the time limit is 1 sec.
So the question is: what are the areas where I can improve this code? In my understanding there is not much scope with the current algorithm (I may be wrong), so how can I improve the algorithm?
Note: Not looking for alternates using array functions like .map or .reduce, as for is faster. Also, using Math.max.apply(context, array) as it is faster than having a custom loop. Attaching references for them.
References:
How might I find the largest number contained in a JavaScript array?
Javascript efficiency: 'for' vs 'forEach'
We can make some observations about this problem:
Keep a running sum representing the current value as we iterate from the start to the end of the array.
Break each operation (a b k) into two operations: (a k) and (b -k), where (a k) means adding k to the running sum at position a, and (b -k) means subtracting k from the sum at position b.
If we sort all of these operations first by position, then by operator (addition preceding subtraction), we always obtain the correct result.
Time complexity is O(q log q), where q is the number of queries.
Example:
a b k
1 5 3
4 8 7
6 9 1
we will break it into
(1 3) (5 -3) (4 7) (8 -7) (6 1) (9 -1)
Sort them:
(1 3) (4 7) (5 -3) (6 1) (8 -7) (9 -1)
Then go through one by one:
Start sum = 0
-> (1 3) -> sum = 3
-> (4 7) -> sum = 10
-> (5 -3) -> sum = 7
-> (6 1) -> sum = 8
-> (8 -7) -> sum = 1
-> (9 -1) -> sum = 0
The max sum is 10 -> answer for the problem.
My Java code which passed all tests https://ideone.com/jNbKHa
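For reference, a JavaScript version of the same sweep might look like this (a sketch of the idea above, not a translation of the linked Java code):
function arrayManipulation(n, queries) {
  // Break each (a, b, k) into (a, +k) and (b, -k), sort by position
  // with additions first on ties, then sweep once tracking the max.
  var events = [];
  queries.forEach(function (q) {
    events.push([q[0], q[2]], [q[1], -q[2]]);
  });
  events.sort(function (x, y) { return x[0] - y[0] || y[1] - x[1]; });
  var sum = 0, max = 0;
  events.forEach(function (e) {
    sum += e[1];
    if (sum > max) max = sum;
  });
  return max;
}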
This algorithm will help.
https://www.geeksforgeeks.org/difference-array-range-update-query-o1/
Using this algorithm you can solve the problem in O(n+q), where n = size of the array and q = number of queries.
Why will your brute force solution not pass all test cases?
Today's systems can perform about 10^8 operations in one second. Keep this in mind: you have to process N = 10^7 elements per query in the worst case. As you are using two nested for loops (one for adding the value k to a range of elements, the other for processing the m queries), the complexity of your solution is O(NM).
A solution with O(NM) complexity has to handle (10^7 * 10^5) = 10^12 operations in the worst case, which cannot be computed in 1 sec at all.
That is the reason you will get the time out error for your brute force solution.
So you need to optimise your code, which can be done with the help of a prefix sum array.
Instead of adding k to all the elements within a range from a to b in the array, accumulate the difference array.
Whenever we add a value at some index of an array and then apply the prefix sum algorithm, that value gets added to every element from that index to the end of the array.
ex- n=5, m=1, a=2 b=5 k=5
i 0.....1.....2.....3.....4.....5.....6 //take array of size N+2 to avoid index out of bound
A[i] 0 0 0 0 0 0 0
Add k=5 at a=2:
A[a]=A[a]+k // start index from which k should be added
i 0.....1.....2.....3.....4.....5.....6
A[i] 0 0 5 0 0 0 0
Now apply the prefix sum algorithm:
i 0.....1.....2.....3.....4.....5.....6
A[i] 0 0 5 5 5 5 5
So you can see k=5 got added to every element till the end after applying the prefix sum, but we only want it added up to index b. To negate the extra effect, we also add -k at index b+1, so that only the range [a, b] gets the addition of k.
A[b+1]=A[b+1]-k // to remove the effect of the previously added k after the bth index
That's why we add -k to the initial array along with +k.
i 0.....1.....2.....3.....4.....5.....6
A[i] 0 0 5 0 0 0 -5
Now apply the prefix sum algorithm:
i 0.....1.....2.....3.....4.....5.....6
A[i] 0 0 5 5 5 5 0
You can see now k=5 got added from a=2 to b=5, which was expected.
Here we are only updating two indices for every query so complexity will be O(1).
Now apply the same algorithm to the input:
# 0.....1.....2.....3.....4.....5.....6 //taken array of size N+2 to avoid index out of bound
5 3 # 0 0 0 0 0 0 0
1 2 100 # 0 100 0 -100 0 0 0
2 5 100 # 0 100 100 -100 0 0 -100
3 4 100 # 0 100 100 0 0 -100 -100
To calculate the max prefix sum, accumulate the difference array up to N while taking the maximum accumulated prefix.
After performing all the operations, now apply the prefix sum algorithm:
i 0.....1.....2.....3.....4.....5.....6
A[i] 0 100 200 200 200 100 0
Now you can traverse this array to find max which is 200.
Traversing the array will take O(N) time, and updating the two indices for each query will take O(1) * number of queries (M).
Overall complexity = O(N) + O(M) = O(N+M)
That means (10^7 + 10^5) operations, which is less than 10^8 (per second).
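Translated back to the JavaScript of the original question, the difference-array approach might look like this (a sketch; the diff array has n + 1 slots so index b stays in range):
function arrayManipulation(n, queries) {
  var diff = new Array(n + 1).fill(0);
  for (var j = 0; j < queries.length; j++) {
    var a = queries[j][0], b = queries[j][1], k = queries[j][2];
    diff[a - 1] += k; // +k takes effect from index a onward (a is 1-based)
    diff[b] -= k;     // cancel it after index b
  }
  var sum = 0, max = 0;
  for (var i = 0; i <= n; i++) {
    sum += diff[i];
    if (sum > max) max = sum;
  }
  return max;
}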
I think the trick is not to actually perform the manipulations on arrays.
You can simply track the changes in index-intervals.
Keep a sorted list of intervals ( sorted by begin-index).
e.g. Input: Internal representation
5 3 NOTHING TO DO
1 2 100 [1 2 value 100]
2 5 100 [1 1 value 100][2 2 value 200(100+100)][3 5 value 100]
3 4 100 [1 1 value 100][2 2 value 200(100+100)][3 4 value 200(100+100)][5 5 value 100]
as an optimization you could merge intervals with same value
-> [1 1 value 100][2 4 value 200][5 5 value 100]
In the last step you iterate through your intervals and take the highest value.

Math.round(Math.random()) combination seems to be generating too many 0's in my javascript program

I am trying to make a really simple script to generate a 2d array in which every nth array does not contain its own index as an element, but contains a random amount of other randomised index values as its elements, and cannot be empty. The following is a little code I wrote to attempt to achieve this:
totElList = []
numEls = 1000
for (i=0;i<numEls;i++) {
totElList[i] = []
for (j=0;j<numEls;j++) {
totElList[i][j] = j
}
}
for (i in totElList) {
totsplice = Math.round(Math.random()*(numEls-1))
totElList[i].splice(i,1)
for (j=0;j<totsplice;j++) {
rand = Math.round(Math.random()*totElList[i].length)
while (typeof(totElList[i][rand]) === undefined) {rand = Math.round(Math.random()*totElList[i].length)}
totElList[i].splice(rand,1)
}
}
The problem is when I run this, the totElList array seems to contain more 0's than any other number, even though I assumed elements would be removed at random. I did a test to confirm this: the amount of 0's is always the maximum out of all possible values for a given numEls. I am guessing this is something to do with the workings of Math.random() and Math.round(), but I am unsure. Could somebody please give me some insight? Thank you.
Instead of Math.round, take Math.floor which works better for indices, because it is zero based.
Example with one decimal digit and a factor of 2, used as an index for an array of length 2.
As you can see, with Math.round the second half of each number's range is moved up to the next index, which results in a wrong index, and with the greatest random values you get an index outside of the wanted length.
value round floor comment for round
----- ----- ----- -----------------
0.0 0 0
0.1 0 0
0.2 0 0
0.3 0 0
0.4 0 0
0.5 1 0 wrong index
0.6 1 0 wrong index
0.7 1 0 wrong index
0.8 1 0 wrong index
0.9 1 0 wrong index
1.0 1 1
1.1 1 1
1.2 1 1
1.3 1 1
1.4 1 1
1.5 2 1 wrong index
1.6 2 1 wrong index
1.7 2 1 wrong index
1.8 2 1 wrong index
1.9 2 1 wrong index
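To see the two distributions directly, you can tally both mappings over many samples (a quick sketch):
var roundCounts = [0, 0, 0];
var floorCounts = [0, 0];
for (var i = 0; i < 1000000; i++) {
  var r = Math.random() * 2;
  roundCounts[Math.round(r)]++; // ~[25%, 50%, 25%]: the ends get half a share each
  floorCounts[Math.floor(r)]++; // ~[50%, 50%]: even
}
console.log(roundCounts, floorCounts);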

JavaScript random number one/zero implementation

Hi I found this piece of JS code which generates zero or one: I don't understand how the pipe (ORing) is involved here?
var randomNum = ((Math.random() * 2 | 0) + 1) - 1; // random number between 0 and 1
I found another way
Math.floor(Math.random()*2)
which accomplishes the same goal. Which one is preferred?
"I don't understand how the pipe (ORing) is involved here?"
The pipe is the bitwise OR operator, and is just used here as a short way to get rid of the fractional part of the random number.
So Math.random() * 2 generates something from 0 to 1.9999999999..., and dropping the decimal gives you 0 or 1.
"Which one is preferred?"
I'd say clarity is preferred in your general code, so Math.floor().
You could also do this:
var randomNum = Math.random() < 0.5 ? 0 : 1;
You could use Math.round(Math.random()), which rounds and returns only zero or one. It is equally distributed.
var i = 1e7,
count = [0, 0];
while (i--) {
count[Math.round(Math.random())]++;
}
console.log(count);

What is the variable result doing in this javascript function

In the code below, if result is set to 1, the code returns the number 1024 (2 to the power of 10). If result is set to 2, the code returns the number 2048 (or 2 to the power of 11). BUT if result is set to 3, the code doesn't return the number 4096 (as I would expect, because 2 to the power of 12) but rather 3072. Why does it return 3072 if result is set to 3, when for 1 and 2 it follows the pattern of 2 to the power of 10 and 11?
function power(base, exponent) {
var result = 1;
for (var count = 0; count < exponent; count++)
result *= base;
return result;
}
show(power(2, 10));
In this code, result is used as an accumulator. The code computes result * base**exponent; result is not part of the exponent.
I think you're confusing yourself. The thing being raised to some power is your first parameter (in your example 2). The exponent is the 2nd parameter (10). Your example works correctly. Pass in 12 as your exponent and you will see your expected result.
The result variable is just a placeholder in which you accumulate your results (2x2x2x2...).
The numbers doing all the work are the ones you pass in as parameters to the function.
So, in your example, you're going to take 2 and multiply it by itself 10 times, each time storing the cumulative result in the variable result. The thing that's throwing you off is that result is initially set to 1, the multiplicative identity, so it doesn't change the outcome. It's built that way so that if you set your exponent to 0 you will end up with a result of 1 (because any number raised to the zeroth power is 1).
Anyway... don't worry about what result is set to. Focus on how the loop works with the interchangeable variable values that are being passed in, in the function call.
Do out the multiplication, starting with result = 3:
3 * 2 = 6
6 * 2 = 12
12 * 2 = 24
24 * 2 = 48
48 * 2 = 96
96 * 2 = 192
192 * 2 = 384
384 * 2 = 768
768 * 2 = 1536
1536 * 2 = 3072
Multiplication table perhaps? Also, what's wrong with Math.pow, just trying to accomplish it yourself?
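For reference, the built-in gives the same answers as the hand-rolled loop (assuming the power function above is in scope):
Math.pow(2, 10) // 1024
2 ** 10 // 1024, the exponentiation operator (ES2016)
power(2, 12) // 4096, what the asker expected from exponent 12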
