I'm looking for a way to influence Math.random().
I have this function to generate a number from min to max:
var rand = function(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
Is there a way to make it more likely to get a low and high number than a number in the middle?
For example; rand(0, 10) would return more of 0,1,9,10 than the rest.
Is there a way to make it more likely to get a low and high number than a number in the middle?
Yes. You want to change the distribution of the numbers generated.
http://en.wikipedia.org/wiki/Random_number_generation#Generation_from_a_probability_distribution
One simple solution would be to generate an array with say, 100 elements.
In those 100 elements represent the numbers you are interested in more frequently.
As a simple example, say you wanted number 1 and 10 to show up more frequently, you could overrepresent it in the array. ie. have number one in the array 20 times, number 10 in the array 20 times, and the rest of the numbers in there distributed evenly. Then use a random number between 0-100 as the array index. This will increase your probability of getting a 1 or a 10 versus the other numbers.
You need a distribution map. Mapping from random output [0,1] to your desired distribution outcome. like [0,.3] will yield 0, [.3,.5] will yield 1, and so on.
Sure. It's not entirely clear whether you want a smooth rolloff so (for example) 2 and 8 are returned more often than 5 or 6, but the general idea works either way.
The typical way to do this is to generate a larger range of numbers than you'll output. For example, lets start with 5 as the base line occurring with frequency N. Let's assume that you want 4 or 7 to occur at frequency 2N, 3 or 8 at frequency 3N, 2 or 9 and frequency 4N and 0 or 10 at frequency 5N.
Adding those up, we need values from 1 to 29 (or 0 to 28, or whatever) from the generator. Any of the first 5 gives an output of 0. Any of the next 4 gives and output of 1. Any of the next 3 gives an output of 2, and so on.
Of course, this doesn't change the values returned by the original generator -- it just lets us write a generator of our own that produces numbers following the distribution we've chosen.
Not really. There is a sequence of numbers that are generated based off the seed. Your random numbers come from the sequence. When you call random, you are grabbing the next element of the sequence.
Can you influence the output of Math.random in javascript (which runs client side)?
No. At least not in any feasible/practical manner.
But what you could do is to create your own random number generator that produces number in the distribution that you need.
There are probably an infinite number of ways of doing it, and you might want to think about the exact shape/curvature of the probability function.
It can be probably be done in one line, but here is a multi-line approach that uses your existing function definition (named rand, here):
var dd = rand(1,5) + rand(0,5);
var result;
if (dd > 5)
result = dd - 5;
else result = 6 - dd;
One basic result is that if U is a random variable with uniform distribution and F is the cumulative distribution you want to sample from, then Y = G(X) where G is the inverse of F has F as its cumulative distribution. This might not necessarily be the most efficient way of doing and generating random numbers from all sort of distributions is a research subfield in and of itself. But for a simple transformation it might just do the trick. Like in your case, F(x) could be 4*(x-.5)^3+.5, it seems to satisfy all constraints and is easy to invert and use as a transformation of the basic random number generator.
Related
I am trying to find the first number in the Fibonacci sequence to contain over 1000 digits.
Given a number n (e.g. 4), I found a way to find what place the first number with n-digits has in the Fibonacci sequence as well as a way to find the number given its place in the sequence.
Say, for example, you need to know the first number with 4 digits in the Fibonacci sequence as well as its place in the sequence. My code would work like this:
var phi = (1+Math.sqrt(5))/2;
var nDigits = 4;
var fEntry = Math.ceil(2 + Math.log(Math.pow(10, nDigits-
1))/Math.log(phi));
var fNumber = 2 * Math.pow(phi, fEntry);
console.log(fEntry);
console.log(fNumber);
In the console you would see fEntry (that is, the place the number has in the Fibonacci sequence) and fNumber (the number you're looking for). If you want to find the first number with 4 digits and its place in the sequence, for example, you'll get number 1597 at place 17, which is correct.
So far so good.
Problems arise when I want to find big numbers. I need to find the first number with 1000 digits in the Fibonacci sequence, but when I write nDigits = 1000 and run the code, the console displays "Infinity" for fEntry and for fNumber. I guess the reason is that my code involves calculations with numbers higher than what Javascript can deal with.
How can I find that number and avoid Infinity?
How can I find that number and avoid Infinity?
You can't, with the number type. Although it can hold massive values, it loses integer accuracy after Number.MAX_SAFE_INTEGER (9,007,199,254,740,991):
const a = Number.MAX_SAFE_INTEGER;
console.log(a); // 9007199254740991
console.log(a + 1); // 9007199254740992, so far so good
console.log(a + 2); // 9007199254740992, oh dear...
You can use the new BigInt on platforms that support it. Alternately, any of several "big int" libraries that store the numbers as strings of digits (literally).
I have this simple test in nodejs, I left it running overnight and could not get Math.random() to repeat. I realize that sooner or later the values (or even the whole sequence) will repeat, but is there any reasonable expectancy as to when it is going to happen?
let v = {};
for (let i = 0;; i++) {
let r = Math.random();
if (r in v) break;
v[r] = r;
}
console.log(i);
It is browser specific:
https://www.ecma-international.org/ecma-262/6.0/#sec-math.random
20.2.2.27
Math.random ( ) Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo
randomly with approximately uniform distribution over that range,
using an implementation-dependent algorithm or strategy. This function
takes no arguments.
Each Math.random function created for distinct code Realms must
produce a distinct sequence of values from successive calls.
The requirement here is just pseudo-random with uniform distribution.
Here's a blog post from V8 (Chrome and NodeJs's Javascript Engine).
https://v8.dev/blog/math-random
Where they say they are using xorshift128+, which has a maximal period of 2^128 -1.
Related (on another site): Acceptable to rely on random ints being unique?
Also extremely related: How many double numbers are there between 0.0 and 1.0?
Mathematically, there are an infinite number of real numbers between 0 and 1. However, there are only a finite number of possible values that Math.Random could generate (because computers only have a finite number of bits to represent numbers). Let's say that there are N possible values that it could generate. Then, by the Pigeonhole Principle, there is a 100% chance of getting at least one duplicate value once you generate exactly N + 1 values.
At this point, the Birthday Paradox demonstrates that you should start seeing duplicates surprisingly quickly. According to this "paradox" (which isn't a true paradox, just counterintuitive), given a room with only 23 people, there's a greater than 50% chance of two of them having the same birthday.
Returning to our example, the rule of thumb for calculating this (see the linked Wikipedia article) suggests that Math.Random reaches a 50% probability of duplicates once you generate approximately sqrt(N) numbers.
From the linked Stack Overflow question, if we assume that there are 7,036,874,417,766 numbers between 0 and 1 like the accepted answer says (and please read the linked question for a more detailed explanation of how many there actually are), then sqrt(7036874417766) is just over 2.652 million, which isn't actually all that many. If you are generating 10,000 random numbers per second, you'd reach 50% probability in approximately 737 hours, which is just under 31 days. Less fortunately, even at 10,000 per second, it would take approximately 195,468 hours (which is approximately 22.3 years) to reach 100% probability.
Some of the other answers give much higher figures for how many numbers there are, so take your pick.
I have an array that could be any size and I'm interested in a fixed number of elements, but evenly distributed. So say I want 1000 elements. If there are 1000 or less elements in the array, I want them all, so that's easy. If there were 2000 elements in the array, I would want every second one. 3000 elements and I'd want every third one... and so on.
I'm having trouble implementing this. I thought of using modulo and this works for round numbers. e.g. 2000 / 1000 = 2 so a simple if (n % 2 == 0) works here (so skip each element where it evaluates true). Likewise for 1200: the excess is 200 so 1200 / 200 = 6 so if (n % 6 == 0) also works.
However, it breaks down when the number of elements divided by the excess isn't a nice round number:
elements = 2788
limit = 1000
factor = 2788 / (2788 - 1000) = 1.559288
Here the modulo operator won't work as n % 1.55928 is never going to give a remainder of zero. Does anyone know how to achieve this? I'm using JavaScript.
Thanks!
What you want is a uniform discrete distribution. Because the number of elements is not divisible by the limit, you would want some flexibility in the limit to keep it evenly distributed. To do so, you could just perform the sampling in elements/limit intervals, rounding the result to its nearest integer.
If your limit is flexible and you can cover more ground by finding the biggest possible limit (in this case, 697).
I want to normalize an array so that each value is
in [0-1) .. i.e. "the max will never be 1 but the min can be 0."
This is not unlike the random function returning numbers in the same range.
While looking at this, I found that .99999999999999999===1 is true!
Ditto (1-Number.MIN_VALUE) === 1 But Math.ceil(Number.MIN_VALUE) is 1, as it should be.
Some others: Math.floor(.999999999999) is 0
while Math.floor(.99999999999999999) is 1
OK so there are rounding problems in JS.
Is there any way I can normalize a set of numbers to lie in the range [0,1)?
It may help to examine the steps that JavaScript performs of each of your expressions.
In .99999999999999999===1:
The source text .99999999999999999 is converted to a Number. The closest Number is 1, so that is the result. (The next closest Number is 0.99999999999999988897769753748434595763683319091796875, which is 1–2–53.)
Then 1 is compared to 1. The result is true.
In (1-Number.MIN_VALUE) === 1:
Number.MIN_VALUE is 2–1074, about 5e–304.
1–2–1074 is extremely close to one. The exact value cannot be represented as a Number, so the nearest value is used. Again, the nearest value is 1.
Then 1 is compared to 1. The result is true.
In Math.ceil(Number.MIN_VALUE):
Number.MIN_VALUE is 2–1074, about 5e–304.
The ceiling function of that value is 1.
In Math.floor(.999999999999):
The source text .999999999999 is converted to a Number. The closest Number is 0.99999999999900002212172012150404043495655059814453125, so that is the result.
The floor function of that value is 0.
In Math.floor(.99999999999999999):
The source text .99999999999999999 is converted to a Number. The closest Number is 1, so that is the result.
The floor function of 1 is 1.
There are only two surprising things here, at most. One is that the numerals in the source text are converted to internal Number values. But this should not be surprising. Of course text has to be converted to internal representations of numbers, and the Number type cannot perfectly store all the infinitely many numbers. So it has to round. And of course numbers very near 1 round to 1.
The other possibly surprising thing is that 1-Number.MIN_VALUE is 1. But this is actually the same issue: The exact result is not representable, but it is very near 1, so 1 is used.
The Math.floor function works correctly. It never introduces any error, and you do not have to do anything to guarantee that it will round down. It always does.
However, since you want to normalize numbers, it seems likely you are going to divide numbers at some point. When you divide, there may be rounding problems, because many results of division are not exactly representable, so they must be rounded.
However, that is a separate problem, and you have not given enough information in this question to address the specific calculations you plan to do. You should open a separate question for it.
Javascript will treat any number between 0.999999999999999994 and 1 as 1, so just subtract .000000000000000006.
Of course that's not as easy as it sounds, since .000000000000000006 is evaluated as 0 in Javascript, so you could do something like:
function trueFloor(x)
{
x = x * 100;
if(x > .0000000000000006)
x = x - .0000000000000006;
x = Math.floor(x/100);
return x;
}
EDIT: Or at least you'd think you could. Apparently JS casts .99999999999999999 to 1 before passing it to a function, so you'd have to try something like:
trueFloor("0.99999999999999999")
function trueFloor(str)
{
x=str.substring(0,9) + 0;
return Math.floor(x); //=> 0
}
Not sure why you'd need that level of precision, but in theory, I guess it works. You can see a working fiddle here
As long as you cast your insanely precise float as a string, that's probably your best bet.
Please understand one thing: this...
.999999999999999999
... is just a Number literal. Just as
.999999999999999998
.999999999999999997
.999999999999999996
...
... you see the pattern.
How JavaScript treats these literals is completely another story. And yes, this treatment is limited by the number of bits that can be used to store a Number value.
The number of possible floating point literals is infinite by definition - no matter how small is the range set for them. For example, take the ones shown above: how many of numbers very close to 1 you may express? Right, it's infinite: just keep appending 9 to the line.
But the container for each Number value is quite finite: it has 64 bits. That means, it can store 2^64 different values (Infinite, -Infinite and NaN among them) - and that's all.
You want to work with such literals anyway? Use Strings to store them, not Numbers - and some BigMath JS library (take your pick) to work with those values - as Strings, again.
But from your question it looks like you're not, as you talked about array of Numbers - Number values, that is. And in no way there can be .999999999999999999 stored there, as there is no such Number value in JavaScript.
What are some simple ways to hash a 32-bit integer (e.g. IP address, e.g. Unix time_t, etc.) down to a 16-bit integer?
E.g. hash_32b_to_16b(0x12345678) might return 0xABCD.
Let's start with this as a horrible but functional example solution:
function hash_32b_to_16b(val32b) {
return val32b % 0xffff;
}
Question is specifically about JavaScript, but feel free to add any language-neutral solutions, preferably without using library functions.
The context for this question is generating unique IDs (e.g. a 64-bit ID might be composed of several 16-bit hashes of various 32-bit values). Avoiding collisions is important.
Simple = good. Wacky+obfuscated = amusing.
The key to maximizing the preservation of entropy of some original 32-bit 'signal' is to ensure that each of the 32 input bits has an independent and equal ability to alter the value of the 16-bit output word.
Since the OP is requesting a bit-size which is exactly half of the original, the simplest way to satisfy this criteria is to xor the upper and lower halves, as others have mentioned. Using xor is optimal because—as is obvious by the definition of xor—independently flipping any one of the 32 input bits is guaranteed to change the value of the 16-bit output.
The problem becomes more interesting when you need further reduction beyond just half-the-size, say from a 32-bit input to, let's say, a 2-bit output. Remember, the goal is to preserve as much entropy from the source as possible, so solutions which involve naively masking off the two lowest bits with (i & 3) are generally heading in the wrong direction; doing that guarantees that there's no way for any bits except the unmasked bits to affect the result, and that generally means there's an arbitrary, possibly valuable part of the runtime signal which is being summarily discarded without principle.
Following from the earlier paragraph, you could of course iterate with xor three additional times to produce a 2-bit output with the desired property of being equally-influenced by each/any of the input bits. That solution is still optimally correct of course, but involves looping or multiple unrolled operations which, as it turns out, aren't necessary!
Fortunately, there is a nice technique of only two operations which gives the same optimal result for this situation. As with xor, it not only ensures that, for any given 32-bit value, twiddling any input bit will result in a change to the 2-bit output, but also that, given a uniform distribution of input values, the distribution of 2-bit output values will also be perfectly uniform. In the current example, the method divides the 4,294,967,296 possible input values into exactly 1,073,741,824 each of the four possible 2-bit hash results { 0, 1, 2, 3 }.
The method I mention here uses specific magic values that I discovered via exhaustive search, and which don't seem to be discussed very much elsewhere on the internet, at least for the particular use under discussion here (i.e., ensuring a uniform hash distribution that's maximally entropy-preserving). Curiously, according to this same exhaustive search, the magic values are in fact unique, meaning that for each of target bit-widths { 16, 8, 4, 2 }, the magic value I show below is the only value that, when used as I show here, satisfies the perfect hashing criteria outlined above.
Without further ado, the unique and mathematically optimal procedure for hashing 32-bits to n = { 16, 8, 4, 2 } is to multiply by the magic value corresponding to n (unsigned, discarding overflow), and then take the n highest bits of the result. To isolate those result bits as a hash value in the range [0 ... (2ⁿ - 1)], simply right-shift (unsigned!) the multiplication result by 32 - n bits.
The "magic" values, and C-like expression syntax are as follows:
Method
Maximum-entropy-preserving hash for reducing 32 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------ ----------- -----------------------
16 0x80008001 16 (i * 0x80008001) >> 16
8 0x80808081 24 (i * 0x80808081) >> 24
4 0x88888889 28 (i * 0x88888889) >> 28
2 0xAAAAAAAB 30 (i * 0xAAAAAAAB) >> 30
Maximum-entropy-preserving hash for reducing 64 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------------ ----------- -------------------------------
32 0x8000000080000001 32 (i * 0x8000000080000001) >> 32
16 0x8000800080008001 48 (i * 0x8000800080008001) >> 48
8 0x8080808080808081 56 (i * 0x8080808080808081) >> 56
4 0x8888888888888889 60 (i * 0x8888888888888889) >> 60
2 0xAAAAAAAAAAAAAAAB 62 (i * 0xAAAAAAAAAAAAAAAB) >> 62
Notes:
Use unsigned multiply and discard any overflow (64-bit multiply is not needed).
If isolating the result using right-shift (as shown), be sure to use an unsigned shift operation.
Further discussion
I find this all this quite cool. In practical terms, the key information-theoretical requirement is the guarantee that, for any m-bit input value and its corresponding n-bit hash value result, flipping any one of the m source bits always causes some change in the n-bit result value. Now although there are 2ⁿ possible result values in total, one of them is already "in-use" (by the result itself) since "switching" to that one from any other result would be no change at all. This leaves 2ⁿ - 1 result values that are eligible to be used by the entire set of m input values flipped by a single bit.
Let's consider an example; in fact, to show how this technique might seem to border on spooky or downright magical, we'll consider the more extreme case where m = 64 and n = 2. With 2 output bits there are four possible result values, { 0, 1, 2, 3 }. Assuming an arbitrary 64-bit input value 0x7521d9318fbdf523, we obtain its 2-bit hash value of 1:
(0x7521d9318fbdf523 * 0xAAAAAAAAAAAAAAAB) >> 62 // result --> '1'
So the result is 1 and the claim is that no value in the set of 64 values where a single-bit of 0x7521d9318fbdf523 is toggled may have that same result value. That is, none of those 64 other results can use value 1 and all must instead use either 0, 2, or 3. So in this example it seems like every one of the 2⁶⁴ input values—to the exclusion of 64 other input values—will selfishly hog one-quarter of the output space for itself. When you consider the sheer magnitude of these interacting constraints, can a simultaneously satisfying solution overall even exist?
Well sure enough, to show that (exactly?) one does, here are the hash result values, listed in order, for inputs that flipping a single bit of 0x7521d9318fbdf523 (one at a time), from MSB (position 63) down to LSB (0).
3 2 0 3 3 3 3 3 3 0 0 0 3 0 3 3 0 3 3 3 0 0 3 3 3 0 0 3 3 0 3 3 // continued…
0 0 3 0 0 3 0 3 0 0 0 3 0 3 3 3 0 3 0 3 3 3 3 3 3 0 0 0 3 0 0 3 // notice: no '1' values
As you can see, there are no 1 values, which entails that every bit in the source "as-is" must be contributing to influence the result (or, if you prefer, the de facto state of each-and-every bit in 0x7521d9318fbdf523 is essential to keeping the entire overall result from being "not-1"). Because no matter what single-bit change you make to the 64-bit input, the 2-bit result value will no longer be 1.
Keep in mind that the "missing-value" table shown above was dumped from the analysis of just the one randomly-chosen example value 0x7521d9318fbdf523; every other possible input value has a similar table of its own, each one eerily missing its owner's actual result value while yet somehow being globally consistent across its set-membership. This property essentially corresponds to maximally preserving the available entropy during the (inherently lossy) bit-width reduction task.
So we see that every one of the 2⁶⁴ possible source values independently imposes, on exactly 64 other source values, the constraint of excluding one of the possible result values. What defies my intuition about this is that there are untold quadrillions of these 64-member sets, each of whose members also belongs to 63 other, seemingly unrelated bit-twiddling sets. Yet somehow despite this most confounding puzzle of interwoven constraints, it is nevertheless trivial to exploit the one (I surmise) resolution which simultaneously satisfies them all exactly.
All this seems related to something you may have noticed in the tables above: namely, I don't see any obvious way to extend the technique to the case of compressing down to a 1-bit result. In this case, there are only two possible result values { 0, 1 }, so if any/every given (e.g.) 64-bit input value still summarily excludes its own result from being the result for all 64 of its single-bit-flip neighbors, then that now essentially imposes the other, only remaining value on those 64. The math breakdown we see in the table seems to be signalling that a simultaneous result under such conditions is a bridge too far.
In other words, the special 'information-preserving' characteristic of xor (that is, its luxuriously reliable guarantee that, as opposed to and, or, etc., it c̲a̲n̲ and w̲i̲l̲l̲ always change a bit) not surprisingly exacts a certain cost, namely, a fiercely non-negotiable demand for a certain amount of elbow room—at least 2 bits—to work with.
I think this is the best you're going to get. You could compress the code to a single line but the var's are there for now as documentation:
function hash_32b_to_16b(val32b) {
var rightBits = val32b & 0xffff; // Left-most 16 bits
var leftBits = val32b & 0xffff0000; // Right-most 16 bits
leftBits = leftBits >>> 16; // Shift the left-most 16 bits to a 16-bit value
return rightBits ^ leftBits; // XOR the left-most and right-most bits
}
Given the parameters of the problem, the best solution would have each 16-bit hash correspond to exactly 2^16 32-bit numbers. It would also IMO hash sequential 32-bit numbers differently. Unless I'm missing something, I believe this solution does those two things.
I would argue that security cannot be a consideration in this problem, as the hashed value is just too few bits. I believe that the solution I gave provides even distribution of 32-bit numbers to 16-bit hashes
This depends on the nature of the integers.
If they can contain some bit-masks, or can differ by powers of two, then simple XORs will have high probability of collisions.
You can try something like (i>>16) ^ ((i&0xffff) * p) with p being a prime number.
Security-hashes like MD5 are all good, but they are obviously an overkill here. Anything more complex than CRC16 is overkill.
I would say just apply a standard hash like sha1 or md5 and then grab the last 16 bits of that.
Assuming that you expect the least significant bits to 'vary' the most, I think you're probably going to get a good enough distribution by just using the lower 16-bits of the value as a hash.
If the numbers you're going to hash won't have that kind of distribution, then the additional step of xor-ing in the upper 16 bits might be helpful.
Of course this suggestion is if you're intending to use the hash merely for some sort of lookup/storage scheme and aren't looking for the crypto-related properties of non-guessability and non-reversability (which the xor-ing suggestions don't really buy you either).
Something simple like this....
function hash_32b_to_16b(val32b) {
var h = hmac(secretKey, sha512);
var v = val32b;
for(var i = 0; i < 4096; ++i)
v = h(v);
return v % 0xffff;
}