When does Math.random() start repeating? - javascript

I have this simple test in nodejs, I left it running overnight and could not get Math.random() to repeat. I realize that sooner or later the values (or even the whole sequence) will repeat, but is there any reasonable expectancy as to when it is going to happen?
let v = {};
let i = 0; // declared outside the loop so it is still in scope for console.log
for (;; i++) {
  let r = Math.random();
  if (r in v) break;
  v[r] = r;
}
console.log(i);

It is browser specific:
https://www.ecma-international.org/ecma-262/6.0/#sec-math.random
20.2.2.27
Math.random ( ) Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo
randomly with approximately uniform distribution over that range,
using an implementation-dependent algorithm or strategy. This function
takes no arguments.
Each Math.random function created for distinct code Realms must
produce a distinct sequence of values from successive calls.
The requirement here is just pseudo-random with uniform distribution.
Here's a blog post from V8 (Chrome and NodeJs's Javascript Engine).
https://v8.dev/blog/math-random
There they say they are using xorshift128+, which has a maximal period of 2^128 - 1.

Related (on another site): Acceptable to rely on random ints being unique?
Also extremely related: How many double numbers are there between 0.0 and 1.0?
Mathematically, there are an infinite number of real numbers between 0 and 1. However, there are only a finite number of possible values that Math.random could generate (because computers only have a finite number of bits to represent numbers). Let's say that there are N possible values that it could generate. Then, by the Pigeonhole Principle, there is a 100% chance of getting at least one duplicate value once you generate exactly N + 1 values.
At this point, the Birthday Paradox demonstrates that you should start seeing duplicates surprisingly quickly. According to this "paradox" (which isn't a true paradox, just counterintuitive), given a room with only 23 people, there's a greater than 50% chance of two of them having the same birthday.
Returning to our example, the rule of thumb for calculating this (see the linked Wikipedia article) suggests that Math.random reaches roughly a 50% probability of duplicates once you generate approximately sqrt(N) numbers.
From the linked Stack Overflow question, if we assume that there are 7,036,874,417,766 numbers between 0 and 1 like the accepted answer says (and please read the linked question for a more detailed explanation of how many there actually are), then sqrt(7036874417766) is just over 2.652 million, which isn't actually all that many. If you are generating 10,000 random numbers per second, you'd reach roughly 50% probability after about 265 seconds, which is under 5 minutes. On the other hand, even at 10,000 per second, it would take approximately 195,468 hours (which is approximately 22.3 years) to generate all N values and reach 100% probability.
Some of the other answers give much higher figures for how many numbers there are, so take your pick.
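The birthday estimate can be reproduced with a short calculation. This is only a rough sketch, not a statement about any particular engine: it uses the standard birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)), the 7,036,874,417,766 figure quoted above, and an assumed rate of 10,000 calls per second. The function names are made up for this example.

```javascript
// Approximate probability of at least one repeat after n draws
// from N equally likely values (standard birthday approximation).
function collisionProbability(n, N) {
  return 1 - Math.exp((-n * (n - 1)) / (2 * N));
}

// Approximate number of draws needed to reach probability p of a repeat.
function drawsForCollision(N, p = 0.5) {
  return Math.sqrt(2 * N * Math.log(1 / (1 - p)));
}

const N = 7036874417766;     // figure quoted in the answer above
const ratePerSecond = 10000; // assumed generation rate

const draws = drawsForCollision(N);
console.log("draws for ~50%:", Math.round(draws));
console.log("seconds at 10,000/s:", (draws / ratePerSecond).toFixed(1));
console.log("probability check:", collisionProbability(draws, N).toFixed(3));
```

The exact constant for the 50% point is sqrt(2 ln 2), about 1.18, times sqrt(N), so the plain sqrt(N) rule of thumb slightly underestimates it. Plugging in a larger N moves the result out by the square root of the increase.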

Related

Can one encrypt an n-digit number, returning a unique n-digit number?

StackOverflow warns me that I may be down-voted for this question, but I'd appreciate your not doing so, as I post this simply to try to understand a programming exercise I've been posed with, and over which I've been puzzling a while now.
I'm doing some javascript coding exercises and one of the assignments was to devise an "encryption function", encipher, which would encrypt a 4-digit number by multiplying it by a number sufficiently low such that none of its digits exceeds 9, so that a 4-digit number is returned. Thus
encipher(0204)
might yield
0408
where the multiplier would have been 2. -- This is very basic material, simply to practice the Javascript. -- But as far as I can see, the numbers returned can never be deciphered (which is the next part of the exercise). Even if you store a dictionary internal to encipher, along the lines of
{'0408':'2'}, etc
so that you could do a lookup on 0408 and return 0204, these entries could not be assured to be unique. If one for example were to get the number 9999 to be deciphered, one would never know whether the original number was 9999 (multiplied by 1), 3333 (multiplied by 3) or 1111 (multiplied by 9). Is that correct? I realise this is a fairly silly and artificial problem, but I'm trying to understand if the instructions to the exercise are not quite right, or if I'm missing something. Here is the original problem:
Now, let's add one more level of security. After changing the position of the digits, we will multiply each member by a number whose multiplication does not exceed 10. (If it is higher than 10, we will get a two-digit multiplication and the code will no longer be 4 values). Now, implement in another function the decrypter (), which will receive as an argument an encrypted code (and correspondingly multiplied in the section above) and return the decrypted code.
Leaving the exercise behind, I'm just curious whether there exists any way to "encrypt" (when I say "encrypt", I mean at a moderate javascript level, as I'm not a cryptography expert) an n-digit number and return a unique n-digit number?
Thanks for any insights. --
encrypt a 4-digit number by multiplying it by a number sufficiently low such that none of its digits exceeds 9, so that a 4-digit number is returned
If your input is 9999, there is no integer other than 1 or 0 that you can multiply your input by and get a positive number with a maximum of 4 digits. Therefore, there is no solution that involves only integer multiplication. However, integer multiplication can be used as part of an algorithm such as rotating digits (see below).
If instead you're looking for some sort of bijective algorithm (one that uniquely maps A to B and B to A), you can look at something like rotating the digits left or right, reversing the order of the digits, or using a unique mapping of each individual digit to another. Those can also be mixed.
Examples
Rotate
1234 -> 2341
Reverse
1234 -> 4321
Remap digits e.g. 2 mapped to 8, 3 mapped to 1
2323 -> 8181
Note that none of these are cryptographically sound methods to encrypt information, but they do seem to more-or-less meet the objectives of the exercise.
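Here is a minimal sketch of how those reversible transforms might be combined and undone. The substitution table and the helper names are hypothetical (only the 2 -> 8 and 3 -> 1 entries come from the example above), and digits are handled as strings so that leading zeros such as "0204" survive. It is an exercise-level scramble, not encryption.

```javascript
// Work on strings so leading zeros (e.g. "0204") are preserved.
const rotateLeft = (s) => s.slice(1) + s.slice(0, 1);
const rotateRight = (s) => s.slice(-1) + s.slice(0, -1);
const reverseDigits = (s) => [...s].reverse().join("");

// Hypothetical substitution table: digit d is replaced by TABLE[d].
// It must be a permutation of 0-9, otherwise it cannot be inverted.
const TABLE = "5781934026"; // includes 2 -> 8 and 3 -> 1

// Build the inverse table: INVERSE[TABLE[d]] === d.
const INVERSE = [...TABLE]
  .reduce((inv, c, d) => {
    inv[Number(c)] = String(d);
    return inv;
  }, [])
  .join("");

const remap = (s, table) => [...s].map((c) => table[Number(c)]).join("");

// Encipher: rotate, then substitute. Decipher undoes the steps in reverse order.
const encipher = (s) => remap(rotateLeft(s), TABLE);
const decipher = (s) => rotateRight(remap(s, INVERSE));

console.log(encipher("1234"));
console.log(decipher(encipher("0204")));
```

Because every step is a bijection on n-digit strings, the composition is one too: distinct inputs always give distinct outputs of the same length.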

Odds to get the usually excluded upper-bound with Math.random()

This may look more like a math question but as it is exclusively linked to Javascript's pseudo-random number generator I guess it is a good fit for SO. If not, feel free to move it elsewhere.
First off, I'm aware that ES does not specify the algorithm to be used in the pseudo-random number generator - Math.random() -, but it does specify that the range should have an approximate uniform distribution:
15.8.2.14 random ( )
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.
So far, so good. Now I've recently stumbled upon this piece of data from MDN:
Note that as numbers in JavaScript are IEEE 754 floating point numbers with round-to-nearest-even behavior, these ranges, excluding the one for Math.random() itself, aren't exact, and depending on the bounds it's possible in extremely rare cases (on the order of 1 in 2^62) to calculate the usually-excluded upper bound.
Okay. It led me to some testing, the results are (obviously) the same on Chrome console and Firefox's Firebug:
>> 0.99999999999999995
1
>> 0.999999999999999945
1
>> 0.999999999999999944
0.9999999999999999
Let's put it in a simple practical example to make my question more clear:
Math.floor(Math.random() * 1)
Considering the code above, IEEE 754 floating point numbers with round-to-nearest-even behavior, under the assessment of Math.random() range being evenly distributed, I concluded that the odds for it to return the usually excluded upper bound (1 in my code above) would be 0.000000000000000055555..., that is approximately 1/18,000,000,000,000,000.
Looking at the MDN number now, 1/2^62 evaluates to 1/4,611,686,018,427,387,904, that is, over 200 times smaller than the result from my calc.
Am I doing the wrong math? Is Firefox's pseudo-random number generator just not evenly distributed enough as to generate this 200 times difference?
I know how to work around this and I'm aware that such small odds shouldn't even be considered for everyday use, but I'd love to understand what is going on here and whether my math is broken or Mozilla's (I hope it is the former). =] Any input is appreciated.
You should not be worried about rounding the number from Math.random() up to 1.
When I was looking at the implementation (inferred from results I am getting) in the current versions of IE, Chrome, and FF, there are several observations that almost certainly mean that you should always get a number in the interval 0 to 0.11111111111111111111111111111111111111111111111111111 in binary (which is 0.999999999999999944.toString(2) and a few smaller decimal numbers too btw.).
Chrome: Here it is simple. It generates numbers by generating a 32-bit number and dividing it by 2^32 (written loosely as 1 << 32; in JavaScript itself that expression evaluates to 1). (You can see that (1 << 30) * 4 * Math.random() always returns a whole number.)
FF: Here it seems that the number is always generated to be at most 0.11... (53x 1) and it really uses just those 53 binary places. (You can see that Math.random().toString(2).length - 2 does not return more than 53.)
IE: Here it is very similar to FF, except that the number of places can be more if the first digits after a decimal dot are 0 and those will not round to 1 for sure. (You can see that Math.random().toString(2).match(/1[01]*$/)[0].length does not return more than 53).
I think (although I cannot provide any proof now) that any implementation should fall to one of those described groups and have no problem with rounding to 1.
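The rounding behaviour discussed in this thread can be checked directly. The sketch below assumes, purely for illustration, that a generator could return the largest double below 1 (1 - 2^-53); whether a given engine ever produces that exact value is not something this snippet shows. The scale helper is a made-up name for the usual "r * (max - min) + min" idiom.

```javascript
// Largest double strictly below 1: 53 one-bits after the binary point.
const largestBelowOne = 1 - 2 ** -53;

// The decimal literals from the question round to the nearest double.
console.log(0.99999999999999995 === 1);                // rounds up to 1
console.log(0.999999999999999944 === largestBelowOne); // rounds down

// The common "scale into [min, max)" idiom.
const scale = (r, min, max) => r * (max - min) + min;

// With min = 1 and max = 2 the exact result 2 - 2^-53 is not representable;
// the tie is rounded to even, which is 2, the normally excluded upper bound.
console.log(scale(largestBelowOne, 1, 2) === 2);

// Scaling from 0 stays below the bound for this input, so floor gives 9.
console.log(Math.floor(largestBelowOne * 10));
```

This is the situation the MDN note describes: Math.random() itself stays below 1, but the arithmetic applied afterwards can round onto the upper bound for some choices of bounds.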

True or better Random numbers with Javascript [duplicate]

This question already has answers here:
Secure random numbers in javascript?
(10 answers)
Closed 12 months ago.
I have all kinds of resources that rely on javascript random numbers. However, I've been seeing a lot of problems where random isn't so random because of the way I'm generating random numbers.
Is there any javascript resource for me to generate true, or just better random numbers?
I know that I can interface with Random.org, but what other options do I have?
I'm using:
function rand( lowest, highest){
var adjustedHigh = (highest - lowest) + 1;
return Math.floor(Math.random()*adjustedHigh) + parseFloat(lowest);
}
Assuming you're not just seeing patterns where there aren't any, try a Mersenne Twister (Wikipedia article here). There are various implementations like this one on github.
Similar SO question:
Seedable JavaScript random number generator
If you want something closer to truly random, then consider using the random.org API to get truly random numbers, although I would suggest only using that to seed, not for every number, as you need to abide by their usage limits.
Tweaking numbers so they "look random"
I agree with Phil H that humans are so good at finding patterns that they often think they see patterns even in "perfectly random" sequences of numbers (clustering illusion, apophenia, gambler's fallacy, etc).
Plots of true random positions generally have lots of clumps and points that "coincidentally" fall very close together, which looks pretty suspicious.
Artists often take completely randomly generated patterns and "nudge" them to make them appear "more random", even though that careful nudging actually makes the pattern less random (a), (b), (c), (d), etc.
Alternatively, a low-discrepancy sequence sometimes "looks better" than a true random sequence and is much faster to generate.
Fast random number generators
There are many "random number generators" across a whole spectrum from "extremely fast" to "relatively slow" and from "easy for even a human to see patterns" to "unlikely that unassisted humans could ever see any patterns" to "cryptographically secure and, after seeded with adequate amounts of entropy, as far as we can tell, indistinguishable from random to any attacker using less than all the energy produced by humanity for a month."
Non-cryptographic-strength random number generators that still give excellent output (unlikely that unassisted humans could ever see any patterns) include the Mersenne twister, multiply-with-carry, Lagged Fibonacci generator, Well equidistributed long-period linear, Xorshift, etc.
Cryptographic random number techniques that work with some browsers
I hear that Cryptocat and other JavaScript applications use the convenient window.crypto.getRandomValues() or window.msCrypto.getRandomValues() or SubtleCrypto.generateKey() functions that are designed to generate cryptographic random numbers. Unfortunately, the unprefixed window.crypto is not available in IE 11 (which only exposes the prefixed window.msCrypto), and neither is available in IE 10 and below.
Since web browsers use random numbers all the time (for every "https://" page they fetch), it's quite likely that these functions (where available) may run faster than most random number generators written in JavaScript -- even non-cryptographic algorithms.
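As a rough sketch of how getRandomValues() might replace Math.random() for picking an integer in a range: the helper name secureRandomInt is hypothetical, the fallback to Node's webcrypto object is an assumption for running the snippet outside a browser, and rejection sampling is used here to avoid the slight bias a plain modulo would introduce.

```javascript
// Use the Web Crypto object in browsers; fall back to Node's copy of it.
const cryptoObj = globalThis.crypto ?? require("node:crypto").webcrypto;

// Uniform integer in [min, max] (inclusive), for ranges of at most 2^32 values.
function secureRandomInt(min, max) {
  const range = max - min + 1;
  if (!Number.isInteger(min) || !Number.isInteger(max) || range < 1 || range > 2 ** 32) {
    throw new RangeError("min and max must be integers spanning 1 to 2^32 values");
  }
  // Reject draws from the incomplete top slice so every result is equally likely.
  const limit = 2 ** 32 - (2 ** 32 % range);
  const buf = new Uint32Array(1);
  do {
    cryptoObj.getRandomValues(buf);
  } while (buf[0] >= limit);
  return min + (buf[0] % range);
}

console.log(secureRandomInt(36, 42));
```

The loop rarely repeats: at worst just under half of the 32-bit draws are rejected, and for small ranges almost none are.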
Cryptographic random number techniques compatible with ancient and modern browsers
One way to generate true random numbers in JavaScript is to capture mouse events and add them into a pool of entropy, keeping track of some (hopefully conservative) estimate of the entropy added. Once the pool is "full" (estimates indicate that at least 128 bits of entropy have been added), use some cryptographically secure random number generator to generate random numbers from the pool -- typically by using a one-way hash so that a sequence of a few thousand output numbers are not enough to deduce the state of the entropy pool and hence predict the next output number.
One implementation: http://lightsecond.com/passphrase.html
Further reading
window.crypto
Compatibility of window.crypto.getRandomValues()
Secure random numbers in javascript?
https://security.stackexchange.com/questions/20029/generate-cryptographically-strong-pseudorandom-numbers-in-javascript
Is there any built in browser support for crypto random numbers in IE and Webkit? Firefox has window.crypto
Better random function in JavaScript
While looking for an alternative for Math.random I stumbled on this question.
While those are valid answers, the solution that worked for me was simply to call Math.random twice and apply a modulus to the decimals of the resulting float, basically to increase the randomness.
Maybe it will be useful for some who were guided by Google to this question.
Here's a snippet with the function, and one that runs it a million times.
function rand(min, max) {
  return (Math.floor(Math.pow(10, 14) * Math.random() * Math.random()) % (max - min + 1)) + min;
}
// testing rand
function rollRands(min, max, rolls) {
  let roll = 0;
  let counts = {};
  for (let i = min; i <= max; i++) {
    counts[i] = 0;
  }
  while (roll < rolls) {
    roll++;
    counts[rand(min, max)]++;
  }
  return counts;
}
console.log(rollRands(36, 42, 1000000));
Rando.js is cryptographically secure. It's basically window.crypto.getRandomValues() that uses window.msCrypto.getRandomValues() as a failsafe and Math.random() as a last resort failsafe, but it's easier to implement and use. Here's a basic cryptographically secure random [0, 1) number:
console.log(rando());
<script src="https://randojs.com/2.0.0.js"></script>
Nice and easy. If that's all you wanted, you're good to go. If you want it to do more for you, it's also capable of all this:
console.log(rando(5)); //an integer between 0 and 5 (could be 0 or 5)
console.log(rando(5, 10)); //a random integer between 5 and 10 (could be 5 or 10)
console.log(rando(5, "float")); //a floating-point number between 0 and 5 (could be exactly 0, but never exactly 5)
console.log(rando(5, 10, "float")); //a floating-point number between 5 and 10 (could be exactly 5, but never exactly 10)
console.log(rando(true, false)); //either true or false
console.log(rando(["a", "b"])); //{index:..., value:...} object representing a value of the provided array OR false if array is empty
console.log(rando({a: 1, b: 2})); //{key:..., value:...} object representing a property of the provided object OR false if object has no properties
console.log(rando("Gee willikers!")); //a character from the provided string OR false if the string is empty. Reoccurring characters will naturally form a more likely return value
console.log(rando(null)); //ANY invalid arguments return false
//Prevent repetitions by grabbing a sequence and looping through it
console.log(randoSequence(5)); //an array of integers from 0 through 5 in random order
console.log(randoSequence(5, 10)); //an array of integers from 5 through 10 in random order
console.log(randoSequence(["a", "b"])); //an array of {index:..., value:...} objects representing the values of the provided array in random order
console.log(randoSequence({a: 1, b: 2})); //an array of {key:..., value:...} objects representing the properties of the provided object in random order
console.log(randoSequence("Good gravy!")); //an array of the characters of the provided string in random order
console.log(randoSequence(null)); //ANY invalid arguments return false
<script src="https://randojs.com/2.0.0.js"></script>
It supports working with jQuery elements too, but I left that out of this demo so I wouldn't have to source in jQuery. If you need that, just check it out on the GitHub or website.
There seems to be slight confusion here between two things that are very different:
random numbers;
pseudorandom numbers.
Apologies to those who know this already, but the two are worlds apart. Pseudorandom numbers appear random and may even pass sophisticated tests of randomness, but they are deterministic. Because of this, they are useless for cryptography, and may have other flaws where true randomness is required.
True randomness is non-deterministic, and thus unpredictable. A key concept here is one of entropy, or the amount of non-redundant information contained. There is a limited number of ways of obtaining truly random data. 'Good' sources are:
Radioactive decay — often difficult to do;
Background radio noise: contrary to popular belief, this is mostly not related to the background microwave radiation from the big bang, but more parochial;
The noise from electrons moving across a reverse-biased Zener diode: actually quite useful and easy to implement in practice, with simple circuitry.
Other 'sources of randomness' like mouse movements and internal variations in computer disk timing, etc. are often harnessed, but may be less-than-perfect. Generally, I've found that it's easier to access the entropy pool on a Linux system than under Windows, but this may just be personal bias.
If you just want random-appearing numbers, then yes, using the Mersenne twister is a viable option. It jumps around like crazy. If you're generating random numbers to use as e.g. version 4 UUIDs, then you need to be more careful. You can't simply 'add entropy' if it's not there, even by applying deterministic cryptographic functions.
If you intend to use your randomness for cryptography, you should also be intensely aware of the many ways your source of randomness can become compromised. For example, if you're using an Internet-based 'source of randomness', who can tap into this?
Even better, you can use quantum cryptography to generate randomness that is very hard to predict. You can use the ANU Quantum Random Numbers API for some randomness that is coercible into a number similarly output by Math.random.
You can generate a pool of random numbers just by requesting some data asynchronously, because performance.now() gives you time precision up to microseconds. Then use the response time as a salt in a randomising algorithm:
var randomNumbers = [];
for (var i = 0; i < 10; i++) {
  setTimeout(function () {
    var timeStart = performance.now();
    var xhttp = new XMLHttpRequest(); // local variable, so the requests do not share one global
    xhttp.open('GET', 'https://cdn.polyfill.io/v2/polyfill.min.js?rand=' + Math.random(), true);
    xhttp.onload = function () {
      var timeEnd = performance.now() - timeStart;
      // Drop the decimal point and read the digits as a base-10 integer
      var rNumber = parseInt(timeEnd.toString().replace('.', ''), 10);
      randomNumbers.push(rNumber);
    };
    xhttp.send();
  }, i * 10);
}
There are many factors that will affect this time:
browser speed
route one way
server response time
route back
This approach is not suitable for generating millions of numbers, only a few. Maybe concatenate a few results to get a good, long random number.

Influence Math.random()

I'm looking for a way to influence Math.random().
I have this function to generate a number from min to max:
var rand = function(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
Is there a way to make it more likely to get a low and high number than a number in the middle?
For example; rand(0, 10) would return more of 0,1,9,10 than the rest.
Is there a way to make it more likely to get a low and high number than a number in the middle?
Yes. You want to change the distribution of the numbers generated.
http://en.wikipedia.org/wiki/Random_number_generation#Generation_from_a_probability_distribution
One simple solution would be to generate an array with say, 100 elements.
In those 100 elements represent the numbers you are interested in more frequently.
As a simple example, say you wanted the numbers 1 and 10 to show up more frequently; you could overrepresent them in the array, i.e. have the number 1 in the array 20 times, the number 10 in the array 20 times, and the rest of the numbers in there distributed evenly. Then use a random integer from 0 to 99 as the array index. This will increase your probability of getting a 1 or a 10 versus the other numbers.
You need a distribution map: a mapping from the random output in [0, 1) to your desired distribution of outcomes. For example, [0, .3) will yield 0, [.3, .5) will yield 1, and so on.
Sure. It's not entirely clear whether you want a smooth rolloff so (for example) 2 and 8 are returned more often than 5 or 6, but the general idea works either way.
The typical way to do this is to generate a larger range of numbers than you'll output. For example, let's start with 5 as the baseline, occurring with frequency N. Let's assume that you want 4 or 6 to occur at frequency 2N, 3 or 7 at frequency 3N, 2 or 8 at frequency 4N, and 1 or 9 at frequency 5N.
Adding those up, we need values from 1 to 29 (or 0 to 28, or whatever) from the generator. Any of the first 5 gives an output of 1. Any of the next 4 gives an output of 2. Any of the next 3 gives an output of 3, and so on.
Of course, this doesn't change the values returned by the original generator -- it just lets us write a generator of our own that produces numbers following the distribution we've chosen.
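A minimal sketch of that table-driven idea, assuming nine outputs 1 to 9 whose weights 5, 4, 3, 2, 1, 2, 3, 4, 5 add up to the 29 slots mentioned above. The names outputs, weights and weightedRand are made up for the example, and the optional random parameter exists only so the function can be exercised with fixed inputs.

```javascript
// Hypothetical weights for outputs 1..9: the middle value 5 has weight 1,
// and the weight grows by 1 per step towards either end (total 29).
const outputs = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const weights = [5, 4, 3, 2, 1, 2, 3, 4, 5];
const total = weights.reduce((a, b) => a + b, 0);

function weightedRand(random = Math.random) {
  // Pick a slot in [0, total), then walk the table to find which output owns it.
  let slot = Math.floor(random() * total);
  for (let k = 0; k < outputs.length; k++) {
    if (slot < weights[k]) return outputs[k];
    slot -= weights[k];
  }
  return outputs[outputs.length - 1]; // not reached when random() is in [0, 1)
}

// Rough tally of 29,000 draws; the ends should dominate the middle.
const tally = {};
for (let k = 0; k < 29000; k++) {
  const r = weightedRand();
  tally[r] = (tally[r] || 0) + 1;
}
console.log(tally);
```

Changing the shape of the distribution only means editing the weights array; the lookup code stays the same.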
Not really. There is a sequence of numbers that are generated based off the seed. Your random numbers come from the sequence. When you call random, you are grabbing the next element of the sequence.
Can you influence the output of Math.random in javascript (which runs client side)?
No. At least not in any feasible/practical manner.
But what you could do is to create your own random number generator that produces number in the distribution that you need.
There are probably an infinite number of ways of doing it, and you might want to think about the exact shape/curvature of the probability function.
It can probably be done in one line, but here is a multi-line approach that uses your existing function definition (named rand, here):
var dd = rand(1,5) + rand(0,5);
var result;
if (dd > 5)
result = dd - 5;
else result = 6 - dd;
One basic result is that if U is a random variable with uniform distribution on [0, 1] and F is the cumulative distribution you want to sample from, then Y = G(U), where G is the inverse of F, has F as its cumulative distribution. This might not necessarily be the most efficient way of doing it, and generating random numbers from all sorts of distributions is a research subfield in and of itself. But for a simple transformation it might just do the trick. In your case, F(x) could be 4*(x-.5)^3+.5; it seems to satisfy all the constraints and is easy to invert and use as a transformation of the basic random number generator.
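To make the inverse-transform idea concrete, here is a sketch using the F suggested above. Solving u = 4(x - 0.5)^3 + 0.5 for x gives G(u) = 0.5 + cbrt((u - 0.5) / 4). The function name edgeBiasedRand, the optional random parameter, and the final clamp are additions for this example, not part of the original answer.

```javascript
// Target CDF on [0, 1] from the answer above: F(x) = 4 (x - 0.5)^3 + 0.5.
const F = (x) => 4 * (x - 0.5) ** 3 + 0.5;

// Its inverse: G(u) = 0.5 + cbrt((u - 0.5) / 4).
const G = (u) => 0.5 + Math.cbrt((u - 0.5) / 4);

// Integer in [min, max], biased towards both ends of the range.
function edgeBiasedRand(min, max, random = Math.random) {
  const y = G(random());
  // Clamp in case floating-point rounding pushes y up to exactly 1.
  return Math.min(max, Math.floor(y * (max - min + 1)) + min);
}

// Rough tally over 0..10: the counts should form a U shape.
const counts = new Array(11).fill(0);
for (let k = 0; k < 100000; k++) counts[edgeBiasedRand(0, 10)]++;
console.log(counts);
```

The density of this F is 12(x - 0.5)^2, which is zero in the middle and largest at the ends, so middle values come out very rarely. A gentler curve would need a different F.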

Hash 32bit int to 16bit int?

What are some simple ways to hash a 32-bit integer (e.g. IP address, e.g. Unix time_t, etc.) down to a 16-bit integer?
E.g. hash_32b_to_16b(0x12345678) might return 0xABCD.
Let's start with this as a horrible but functional example solution:
function hash_32b_to_16b(val32b) {
return val32b % 0xffff;
}
Question is specifically about JavaScript, but feel free to add any language-neutral solutions, preferably without using library functions.
The context for this question is generating unique IDs (e.g. a 64-bit ID might be composed of several 16-bit hashes of various 32-bit values). Avoiding collisions is important.
Simple = good. Wacky+obfuscated = amusing.
The key to maximizing the preservation of entropy of some original 32-bit 'signal' is to ensure that each of the 32 input bits has an independent and equal ability to alter the value of the 16-bit output word.
Since the OP is requesting a bit-size which is exactly half of the original, the simplest way to satisfy this criterion is to xor the upper and lower halves, as others have mentioned. Using xor is optimal because, as is obvious from the definition of xor, independently flipping any one of the 32 input bits is guaranteed to change the value of the 16-bit output.
The problem becomes more interesting when you need further reduction beyond just half-the-size, say from a 32-bit input to, let's say, a 2-bit output. Remember, the goal is to preserve as much entropy from the source as possible, so solutions which involve naively masking off the two lowest bits with (i & 3) are generally heading in the wrong direction; doing that guarantees that there's no way for any bits except the unmasked bits to affect the result, and that generally means there's an arbitrary, possibly valuable part of the runtime signal which is being summarily discarded without principle.
Following from the earlier paragraph, you could of course iterate with xor three additional times to produce a 2-bit output with the desired property of being equally-influenced by each/any of the input bits. That solution is still optimally correct of course, but involves looping or multiple unrolled operations which, as it turns out, aren't necessary!
Fortunately, there is a nice technique of only two operations which gives the same optimal result for this situation. As with xor, it not only ensures that, for any given 32-bit value, twiddling any input bit will result in a change to the 2-bit output, but also that, given a uniform distribution of input values, the distribution of 2-bit output values will also be perfectly uniform. In the current example, the method divides the 4,294,967,296 possible input values into exactly 1,073,741,824 each of the four possible 2-bit hash results { 0, 1, 2, 3 }.
The method I mention here uses specific magic values that I discovered via exhaustive search, and which don't seem to be discussed very much elsewhere on the internet, at least for the particular use under discussion here (i.e., ensuring a uniform hash distribution that's maximally entropy-preserving). Curiously, according to this same exhaustive search, the magic values are in fact unique, meaning that for each of target bit-widths { 16, 8, 4, 2 }, the magic value I show below is the only value that, when used as I show here, satisfies the perfect hashing criteria outlined above.
Without further ado, the unique and mathematically optimal procedure for hashing 32-bits to n = { 16, 8, 4, 2 } is to multiply by the magic value corresponding to n (unsigned, discarding overflow), and then take the n highest bits of the result. To isolate those result bits as a hash value in the range [0 ... (2ⁿ - 1)], simply right-shift (unsigned!) the multiplication result by 32 - n bits.
The "magic" values, and C-like expression syntax are as follows:
Method

Maximum-entropy-preserving hash for reducing 32 bits to...

Target Bits   Multiplier    Right Shift   Expression [1, 2]
-----------   ----------    -----------   ----------------------
16            0x80008001    16            (i * 0x80008001) >> 16
8             0x80808081    24            (i * 0x80808081) >> 24
4             0x88888889    28            (i * 0x88888889) >> 28
2             0xAAAAAAAB    30            (i * 0xAAAAAAAB) >> 30

Maximum-entropy-preserving hash for reducing 64 bits to...

Target Bits   Multiplier            Right Shift   Expression [1, 2]
-----------   ------------------    -----------   ------------------------------
32            0x8000000080000001    32            (i * 0x8000000080000001) >> 32
16            0x8000800080008001    48            (i * 0x8000800080008001) >> 48
8             0x8080808080808081    56            (i * 0x8080808080808081) >> 56
4             0x8888888888888889    60            (i * 0x8888888888888889) >> 60
2             0xAAAAAAAAAAAAAAAB    62            (i * 0xAAAAAAAAAAAAAAAB) >> 62

Notes:
[1] Use unsigned multiply and discard any overflow (64-bit multiply is not needed).
[2] If isolating the result using right-shift (as shown), be sure to use an unsigned shift operation.
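Since the question is about JavaScript, note that the C-like expressions cannot be copied verbatim: the * operator works on doubles and >> is a signed shift. The following sketch is one way to follow the two notes above, using Math.imul for a wrapping 32-bit multiply, >>> for the unsigned shift, and BigInt for the 64-bit table. The names MAGIC_32, MAGIC_64, hash32 and hash64 are made up for this example.

```javascript
// Multipliers from the 32-bit table, keyed by target width n.
const MAGIC_32 = { 16: 0x80008001, 8: 0x80808081, 4: 0x88888889, 2: 0xaaaaaaab };

// Math.imul keeps only the low 32 bits of the product; >>> shifts unsigned.
function hash32(i, n) {
  return Math.imul(i, MAGIC_32[n]) >>> (32 - n);
}

// Multipliers from the 64-bit table; BigInt gives exact 64-bit arithmetic.
const MAGIC_64 = {
  32: 0x8000000080000001n,
  16: 0x8000800080008001n,
  8: 0x8080808080808081n,
  4: 0x8888888888888889n,
  2: 0xaaaaaaaaaaaaaaabn,
};

function hash64(i, n) {
  // asUintN(64, ...) discards overflow, i.e. multiplies modulo 2^64.
  const product = BigInt.asUintN(64, BigInt.asUintN(64, i) * MAGIC_64[n]);
  return Number(product >> BigInt(64 - n));
}

console.log(hash32(0x12345678, 16).toString(16));
console.log(hash64(0x7521d9318fbdf523n, 2));
```

The 32-bit version stays within ordinary Number arithmetic and should be cheap; the BigInt version is only there to mirror the 64-bit table and is likely to be noticeably slower.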
Further discussion
I find all this quite cool. In practical terms, the key information-theoretical requirement is the guarantee that, for any m-bit input value and its corresponding n-bit hash value result, flipping any one of the m source bits always causes some change in the n-bit result value. Now although there are 2ⁿ possible result values in total, one of them is already "in-use" (by the result itself) since "switching" to that one from any other result would be no change at all. This leaves 2ⁿ - 1 result values that are eligible to be used by the entire set of m input values flipped by a single bit.
Let's consider an example; in fact, to show how this technique might seem to border on spooky or downright magical, we'll consider the more extreme case where m = 64 and n = 2. With 2 output bits there are four possible result values, { 0, 1, 2, 3 }. Assuming an arbitrary 64-bit input value 0x7521d9318fbdf523, we obtain its 2-bit hash value of 1:
(0x7521d9318fbdf523 * 0xAAAAAAAAAAAAAAAB) >> 62 // result --> '1'
So the result is 1 and the claim is that no value in the set of 64 values where a single-bit of 0x7521d9318fbdf523 is toggled may have that same result value. That is, none of those 64 other results can use value 1 and all must instead use either 0, 2, or 3. So in this example it seems like every one of the 2⁶⁴ input values—to the exclusion of 64 other input values—will selfishly hog one-quarter of the output space for itself. When you consider the sheer magnitude of these interacting constraints, can a simultaneously satisfying solution overall even exist?
Well sure enough, to show that (exactly?) one does, here are the hash result values, listed in order, for the inputs obtained by flipping a single bit of 0x7521d9318fbdf523 (one at a time), from MSB (position 63) down to LSB (0).
3 2 0 3 3 3 3 3 3 0 0 0 3 0 3 3 0 3 3 3 0 0 3 3 3 0 0 3 3 0 3 3 // continued…
0 0 3 0 0 3 0 3 0 0 0 3 0 3 3 3 0 3 0 3 3 3 3 3 3 0 0 0 3 0 0 3 // notice: no '1' values
As you can see, there are no 1 values, which entails that every bit in the source "as-is" must be contributing to influence the result (or, if you prefer, the de facto state of each-and-every bit in 0x7521d9318fbdf523 is essential to keeping the entire overall result from being "not-1"). Because no matter what single-bit change you make to the 64-bit input, the 2-bit result value will no longer be 1.
Keep in mind that the "missing-value" table shown above was dumped from the analysis of just the one randomly-chosen example value 0x7521d9318fbdf523; every other possible input value has a similar table of its own, each one eerily missing its owner's actual result value while yet somehow being globally consistent across its set-membership. This property essentially corresponds to maximally preserving the available entropy during the (inherently lossy) bit-width reduction task.
So we see that every one of the 2⁶⁴ possible source values independently imposes, on exactly 64 other source values, the constraint of excluding one of the possible result values. What defies my intuition about this is that there are untold quadrillions of these 64-member sets, each of whose members also belongs to 63 other, seemingly unrelated bit-twiddling sets. Yet somehow despite this most confounding puzzle of interwoven constraints, it is nevertheless trivial to exploit the one (I surmise) resolution which simultaneously satisfies them all exactly.
All this seems related to something you may have noticed in the tables above: namely, I don't see any obvious way to extend the technique to the case of compressing down to a 1-bit result. In this case, there are only two possible result values { 0, 1 }, so if any/every given (e.g.) 64-bit input value still summarily excludes its own result from being the result for all 64 of its single-bit-flip neighbors, then that now essentially imposes the other, only remaining value on those 64. The math breakdown we see in the table seems to be signalling that a simultaneous result under such conditions is a bridge too far.
In other words, the special 'information-preserving' characteristic of xor (that is, its luxuriously reliable guarantee that, as opposed to and, or, etc., it can and will always change a bit) not surprisingly exacts a certain cost, namely, a fiercely non-negotiable demand for a certain amount of elbow room (at least 2 bits) to work with.
I think this is the best you're going to get. You could compress the code to a single line but the var's are there for now as documentation:
function hash_32b_to_16b(val32b) {
  var rightBits = val32b & 0xffff; // Right-most 16 bits
  var leftBits = val32b & 0xffff0000; // Left-most 16 bits
  leftBits = leftBits >>> 16; // Shift the left-most 16 bits to a 16-bit value
  return rightBits ^ leftBits; // XOR the left-most and right-most bits
}
Given the parameters of the problem, the best solution would have each 16-bit hash correspond to exactly 2^16 32-bit numbers. It would also IMO hash sequential 32-bit numbers differently. Unless I'm missing something, I believe this solution does those two things.
I would argue that security cannot be a consideration in this problem, as the hashed value is just too few bits. I believe that the solution I gave provides even distribution of 32-bit numbers to 16-bit hashes
This depends on the nature of the integers.
If they can contain some bit-masks, or can differ by powers of two, then simple XORs will have high probability of collisions.
You can try something like (i>>16) ^ ((i&0xffff) * p) with p being a prime number.
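A sketch of that expression in JavaScript follows. The multiplier 65521 is just an example prime picked for illustration (the answer does not name one), and the final & 0xffff mask is an addition: without it the product can spill past 16 bits.

```javascript
// Example prime chosen for illustration; other primes could be tried.
const P = 65521;

function hashMixed(i) {
  const high = i >>> 16;   // upper 16 bits
  const low = i & 0xffff;  // lower 16 bits
  // Math.imul keeps the multiply within 32 bits; the mask trims to 16 bits.
  return (high ^ Math.imul(low, P)) & 0xffff;
}

console.log(hashMixed(0x12345678).toString(16));
```

Whether this spreads a given data set better than a plain xor fold depends on the inputs, so it is worth measuring collisions on representative values before adopting it.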
Security-hashes like MD5 are all good, but they are obviously an overkill here. Anything more complex than CRC16 is overkill.
I would say just apply a standard hash like sha1 or md5 and then grab the last 16 bits of that.
Assuming that you expect the least significant bits to 'vary' the most, I think you're probably going to get a good enough distribution by just using the lower 16-bits of the value as a hash.
If the numbers you're going to hash won't have that kind of distribution, then the additional step of xor-ing in the upper 16 bits might be helpful.
Of course this suggestion applies if you're intending to use the hash merely for some sort of lookup/storage scheme and aren't looking for the crypto-related properties of non-guessability and non-reversibility (which the xor-ing suggestions don't really buy you either).
Something simple like this....
function hash_32b_to_16b(val32b) {
  // hmac, secretKey and sha512 are placeholders; they are not defined in this snippet
  var h = hmac(secretKey, sha512);
  var v = val32b;
  for (var i = 0; i < 4096; ++i)
    v = h(v);
  return v % 0xffff;
}
