I am trying out Advent of Code 2021, which is basically a collection of coding challenges.
On Day 3, Part 2, I ran into a problem with my loop-based code.
Here is the JS code:
let input = `00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010`
let numbers = input.split("\n")
let onBits = []
let offBits = []
for (let i = 0; i < numbers[0].length; i++) {
  if (numbers.length == 1) {
    console.log(numbers[0])
    break;
  }
  for (let j = 0; j < numbers.length; j++) {
    let bit = parseInt(numbers[j].split("")[i])
    if (bit == 1) {
      onBits.push(numbers[j])
    } else if (bit == 0) {
      offBits.push(numbers[j])
    }
  }
  if (onBits.length > offBits.length) {
    numbers = onBits
    console.log(`${onBits.length} > ${offBits.length}`)
  } else if (onBits.length == offBits.length) {
    numbers = onBits
    console.log(`${onBits.length} == ${offBits.length} so OnBits.`)
  } else if (onBits.length < offBits.length) {
    numbers = offBits
    console.log(`${onBits.length} < ${offBits.length}`)
  }
}
I am basically getting the input and splitting it. Then I find the most common first bit and keep every number that has that as its first bit.
So, for example:
If the most common first bit is 1, the numbers array becomes an array of every number that has 1 as its first bit.
I then repeat this with the resulting array until only one number is left.
But for some reason this goes into an infinite loop.
The offBits array apparently grows rapidly for no apparent reason.
Any help is appreciated.
In this case, I am trying to find the oxygen generator rating.
Sorry for making this complicated.
Full problem (part 1, part 2)
--- Day 3: Binary Diagnostic ---
The submarine has been making some odd creaking noises, so you ask it to produce a diagnostic report just in case.
The diagnostic report (your puzzle input) consists of a list of binary numbers which, when decoded properly, can tell you many useful things about the conditions of the submarine. The first parameter to check is the power consumption.
You need to use the binary numbers in the diagnostic report to generate two new binary numbers (called the gamma rate and the epsilon rate). The power consumption can then be found by multiplying the gamma rate by the epsilon rate.
Each bit in the gamma rate can be determined by finding the most common bit in the corresponding position of all numbers in the diagnostic report. For example, given the following diagnostic report:
00100
11110
10110
10111
10101
01111
00111
11100
10000
11001
00010
01010
Considering only the first bit of each number, there are five 0 bits and seven 1 bits. Since the most common bit is 1, the first bit of the gamma rate is 1.
The most common second bit of the numbers in the diagnostic report is 0, so the second bit of the gamma rate is 0.
The most common value of the third, fourth, and fifth bits are 1, 1, and 0, respectively, and so the final three bits of the gamma rate are 110.
So, the gamma rate is the binary number 10110, or 22 in decimal.
The epsilon rate is calculated in a similar way; rather than use the most common bit, the least common bit from each position is used. So, the epsilon rate is 01001, or 9 in decimal. Multiplying the gamma rate (22) by the epsilon rate (9) produces the power consumption, 198.
Use the binary numbers in your diagnostic report to calculate the gamma rate and epsilon rate, then multiply them together. What is the power consumption of the submarine? (Be sure to represent your answer in decimal, not binary.)
--- Part Two ---
Next, you should verify the life support rating, which can be determined by multiplying the oxygen generator rating by the CO2 scrubber rating.
Both the oxygen generator rating and the CO2 scrubber rating are values that can be found in your diagnostic report - finding them is the tricky part. Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:
Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.
If you only have one number left, stop; this is the rating value for which you are searching.
Otherwise, repeat the process, considering the next bit to the right.
The bit criteria depends on which type of rating value you want to find:
To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.
To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.
For example, to determine the oxygen generator rating value using the same example diagnostic report from above:
Start with all 12 numbers and consider only the first bit of each number. There are more 1 bits (7) than 0 bits (5), so keep only the 7 numbers with a 1 in the first position: 11110, 10110, 10111, 10101, 11100, 10000, and 11001.
Then, consider the second bit of the 7 remaining numbers: there are more 0 bits (4) than 1 bits (3), so keep only the 4 numbers with a 0 in the second position: 10110, 10111, 10101, and 10000.
In the third position, three of the four numbers have a 1, so keep those three: 10110, 10111, and 10101.
In the fourth position, two of the three numbers have a 1, so keep those two: 10110 and 10111.
In the fifth position, there are an equal number of 0 bits and 1 bits (one each). So, to find the oxygen generator rating, keep the number with a 1 in that position: 10111.
As there is only one number left, stop; the oxygen generator rating is 10111, or 23 in decimal.
Then, to determine the CO2 scrubber rating value from the same example above:
Start again with all 12 numbers and consider only the first bit of each number. There are fewer 0 bits (5) than 1 bits (7), so keep only the 5 numbers with a 0 in the first position: 00100, 01111, 00111, 00010, and 01010.
Then, consider the second bit of the 5 remaining numbers: there are fewer 1 bits (2) than 0 bits (3), so keep only the 2 numbers with a 1 in the second position: 01111 and 01010.
In the third position, there are an equal number of 0 bits and 1 bits (one each). So, to find the CO2 scrubber rating, keep the number with a 0 in that position: 01010.
As there is only one number left, stop; the CO2 scrubber rating is 01010, or 10 in decimal.
Finally, to find the life support rating, multiply the oxygen generator rating (23) by the CO2 scrubber rating (10) to get 230.
Use the binary numbers in your diagnostic report to calculate the oxygen generator rating and CO2 scrubber rating, then multiply them together. What is the life support rating of the submarine? (Be sure to represent your answer in decimal, not binary.)
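For reference, the filtering procedure the problem describes can be sketched like this (a corrected sketch, not the original code: here the onBits/offBits lists are rebuilt on every pass instead of being declared once and never cleared, which is why offBits keeps growing in the code above):

```javascript
// Bit-criteria filter: repeatedly split the candidates on the current
// bit position and keep the most (or least) common group.
function filterByBitCriteria(report, keepMostCommon) {
  let candidates = report.slice();
  for (let i = 0; i < report[0].length && candidates.length > 1; i++) {
    // Fresh lists on every pass -- this is the crucial difference.
    const onBits = candidates.filter(n => n[i] === "1");
    const offBits = candidates.filter(n => n[i] === "0");
    if (keepMostCommon) {
      candidates = onBits.length >= offBits.length ? onBits : offBits;
    } else {
      candidates = onBits.length >= offBits.length ? offBits : onBits;
    }
  }
  return candidates[0];
}

const report = ["00100", "11110", "10110", "10111", "10101", "01111",
                "00111", "11100", "10000", "11001", "00010", "01010"];
const oxygen = parseInt(filterByBitCriteria(report, true), 2);  // 10111 -> 23
const co2 = parseInt(filterByBitCriteria(report, false), 2);    // 01010 -> 10
console.log(oxygen * co2); // 230, matching the example walkthrough
```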
Related
I have this simple test in Node.js. I left it running overnight and could not get Math.random() to repeat. I realize that sooner or later the values (or even the whole sequence) will repeat, but is there any reasonable expectancy as to when it is going to happen?
let v = {};
let i; // declared outside the loop so the final console.log(i) can see it
for (i = 0; ; i++) {
  let r = Math.random();
  if (r in v) break;
  v[r] = r;
}
console.log(i);
It is browser specific:
https://www.ecma-international.org/ecma-262/6.0/#sec-math.random
20.2.2.27 Math.random ( )
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.
Each Math.random function created for distinct code Realms must produce a distinct sequence of values from successive calls.
The requirement here is just pseudo-random with uniform distribution.
Here's a blog post from V8 (the JavaScript engine used by Chrome and Node.js):
https://v8.dev/blog/math-random
There they say they are using xorshift128+, which has a maximal period of 2^128 - 1.
Related (on another site): Acceptable to rely on random ints being unique?
Also extremely related: How many double numbers are there between 0.0 and 1.0?
Mathematically, there are an infinite number of real numbers between 0 and 1. However, there are only a finite number of possible values that Math.random could generate (because computers only have a finite number of bits to represent numbers). Let's say that there are N possible values that it could generate. Then, by the pigeonhole principle, you are guaranteed at least one duplicate value once you generate exactly N + 1 values.
At this point, the Birthday Paradox demonstrates that you should start seeing duplicates surprisingly quickly. According to this "paradox" (which isn't a true paradox, just counterintuitive), given a room with only 23 people, there's a greater than 50% chance of two of them having the same birthday.
Returning to our example, the rule of thumb for calculating this (see the linked Wikipedia article) suggests that Math.random reaches a 50% probability of duplicates once you generate approximately sqrt(N) numbers.
From the linked Stack Overflow question, if we assume that there are 7,036,874,417,766 numbers between 0 and 1 like the accepted answer says (and please read the linked question for a more detailed explanation of how many there actually are), then sqrt(7036874417766) is just over 2.65 million, which isn't actually all that many. If you are generating 10,000 random numbers per second, you'd reach 50% probability in under five minutes. Less fortunately, even at 10,000 per second, it would take approximately 195,468 hours (about 22.3 years) to be guaranteed a duplicate.
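To make the rule of thumb concrete, here is a small sketch; the constant sqrt(2 ln 2) ≈ 1.18 gives the 50% point slightly more precisely than the bare sqrt(N) estimate (the function name is mine):

```javascript
// Birthday-problem estimate: the number of draws needed for a 50% chance
// of at least one collision among N equally likely values is roughly
// sqrt(2 * ln(2) * N), i.e. about 1.18 * sqrt(N).
function draws50(N) {
  return Math.sqrt(2 * Math.log(2) * N);
}

const N = 7036874417766; // the count quoted from the linked question
console.log(Math.round(draws50(N))); // roughly 3.1 million draws
```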
Some of the other answers give much higher figures for how many numbers there are, so take your pick.
When I add a bunch of floating-point numbers with JavaScript, what is the error bound on the sum? What error bound should be used to check if two sums are equal?
In a simple script, I add a bunch of floating-point numbers and compare sums. I notice that sometimes the result is not correct (two sums that should be equal are not). I am pretty weak at numerical analysis, but even after reviewing Is floating point math broken? and What Every Computer Scientist Should Know About Floating-Point Arithmetic and Comparing Floating Point Numbers, 2012 Edition I am confused about how best to compare floating-point sums in JavaScript.
First, I was confused by: The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded (as if they were computed exactly then rounded to the nearest floating-point number). If JavaScript is based on the IEEE standard, how can 0.1 + 0.2 != 0.3?
I think I answered this for myself: it's easier for me to think about an example in base 10. If 1/3 is approximated as 0.333...333 and 2/3 is approximated as 0.666...667, then 1/3 + 1/3 = 0.666...666 is exactly rounded (it is the exact sum of the two approximations) but != 0.666...667. Exactly rounded operations still operate on already-rounded inputs, which can still introduce error.
How big is machine epsilon? JavaScript floating-point numbers are apparently 64-bits, and apparently IEEE double precision format machine epsilon is about 1e-16?
When I add a bunch (n) of floating-point numbers (naive summation, without pairwise or Kahan summation), what is the error bound on the sum? Intuitively it is proportional to n. The worst-case example I can think of (again in base 10) is 2/3 - 1/3 - 1/3 + 2/3 - 1/3 - 1/3 + etc. I think each iteration will increment the error term by 1 ULP while the sum remains zero, so both the error term and relative error will grow without bound?
In the section "Errors in Summation" Goldberg is more precise (error term is bounded by n * machine epsilon * sum of the absolute values) but also points out that if the sum is being done in an IEEE double precision format, machine epsilon is about 1e-16, so n * machine epsilon will be much less than 1 for any reasonable value of n (n much less than 1e16). How can this error bound be used to check if two floating-point sums are equal? What relationship between the sums, 1, 1e-16, n, etc. must be true if they are equal?
Another intuition: If the bunch of numbers are all positive (mine are) then although the error term can grow without bound, the relative error will not, because the sum must grow at the same time. In base 10, the worst-case example I can think of (in which the error term grows fastest while the sum grows slowest) is if 1.000...005 is approximated 1.000...000. Repeatedly adding this number will increment the error term by 1/2 ULP (of the summand, 0.000...005) while incrementing the sum by 1 first place unit. The worst relative error is 4.5 ULP (0.000...045, when the sum is 9.000...000) which is (base - 1) / 2 ULP which is 1/2 ULP in base 2?
If two floating-point sums are equal, then their absolute difference must be less than twice the error bound, which is 1 ULP in base 2? So in JavaScript, Math.abs(a - b) < a * 1e-16 + b * 1e-16?
Comparing Floating Point Numbers, 2012 Edition describes another technique for comparing floating-point numbers, also based on relative error. In JavaScript, is it possible to find the number of representable numbers between two floating-point numbers?
The maximum possible error in the sum of n numbers added consecutively is proportional to n², not to n.
The key reason for this is that each addition may have some error proportional to its sum, and those sums keep growing as more additions are made. In the worst case, the sums grow in proportion to n (if you add n x's together, you get nx). So, in the end, there are n sums that have grown in proportion to n, yielding a total possible error proportional to n².
JavaScript is specified by the ECMA Language Specification, which says that IEEE-754 64-bit binary floating-point is used and round-to-nearest mode is used. I do not see any provision allowing extra precision as some languages do.
Suppose all numbers have magnitude at most b, where b is some representable value. If your numbers have a distribution that can be characterized more specifically, then an error bound tighter than described below might be derived.
When the exact mathematical result of an operation is y, and there is no overflow, then the maximum error in IEEE-754 binary floating-point with round-to-nearest mode is 1/2 ULP(y), where ULP(y) is the distance between the two representable values just above and below y in magnitude (using y itself as the “above” value if it is exactly representable). This is the maximum error because y is always either exactly on the midpoint between two bordering values or is on one side or the other, so the distance from y to one of the bordering values is at most the distance from the midpoint to a bordering value.
(In IEEE-754 64-bit binary, the ULP of all numbers less than 2^-1022 in magnitude is 2^-1074. The ULP of all larger powers of two is 2^-52 times the number; e.g., 2^-52 for 1. The ULP for non-powers of two is the ULP of the largest power of two smaller than the number, e.g., 2^-52 for any number above 1 and below 2.)
When the first two numbers in a series are added, the exact result is at most 2b, so the error in this first addition is at most 1/2 ULP(2b). When the third number is added, the result is at most 3b, so the error in this addition is at most 1/2 ULP(3b). The total error so far is at most 1/2 (ULP(2b) + ULP(3b)).
At this point, the addition could round up, so the partial sum so far could be slightly more than 3b, and the next sum could be slightly more than 4b. If we want to compute a strict bound on the error, we could use an algorithm such as:
Let bound = 0.
For i = 2 to n:
bound += 1/2 ULP(i*b + bound).
That is, for each of the additions that will be performed, add an error bound that is 1/2 the ULP of the largest conceivable result given the actual values added plus all the previous errors. (The pseudo-code above would need to be implemented with extended precision or with rounding upward in order to retain mathematical rigor.)
Thus, given only the number of numbers to be added and a bound on their magnitudes, we can pre-compute an error bound without knowing their specific values in advance. This error bound will grow in proportion to n².
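As a sketch in JavaScript (since that is what the question is about), the bound can be computed directly; the ulp helper and function names here are my own illustration, not a standard API:

```javascript
// Distance from |x| to the next representable double above it, obtained
// by nudging the raw bit pattern (assumes x is finite, nonzero, normal).
function ulp(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, Math.abs(x));
  view.setBigUint64(0, view.getBigUint64(0) + 1n);
  return view.getFloat64(0) - Math.abs(x);
}

// Pre-computed error bound for consecutively summing n values of
// magnitude at most b; it grows roughly in proportion to n^2.
function sumErrorBound(n, b) {
  let bound = 0;
  for (let i = 2; i <= n; i++) {
    bound += 0.5 * ulp(i * b + bound);
  }
  return bound;
}
```

As noted above, a rigorous version would round the accumulation upward; this plain version is close enough to illustrate the quadratic growth.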
If this potential error is too high, there are ways to reduce it:
Instead of adding numbers consecutively, they can be split in half, and the sums of the two halves can be added. Each of the halves can be recursively summed in this way. When this is done, the maximum magnitudes of the partial sums will be smaller, so the bounds on their errors will be smaller. E.g., with consecutive additions of 1, we have sums 2, 3, 4, 5, 6, 7, 8, but, with this splitting, we have parallel sums of 2, 2, 2, 2, then 4, 4, then 8.
We can sort the numbers and keep the sums smaller by adding numbers that cancel each other out (complementary positive and negative numbers) or adding smaller numbers first.
The Kahan summation algorithm can be employed to get some extended precision without much extra effort.
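For illustration, a minimal sketch of the Kahan compensated-summation loop:

```javascript
// Kahan (compensated) summation: c carries the low-order bits that were
// lost when y was folded into the running sum.
function kahanSum(values) {
  let sum = 0;
  let c = 0;
  for (const v of values) {
    const y = v - c;    // subtract the previously lost low-order part
    const t = sum + y;  // big + small: low-order bits of y may be lost
    c = (t - sum) - y;  // recover exactly what was lost
    sum = t;
  }
  return sum;
}

console.log(kahanSum(Array(10).fill(0.1)));
```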
Considering one particular case:
Consider adding n non-negative numbers, producing a calculated sum s. Then the error in s is at most (n-1)/2 • ULP(s).
Proof: Each addition has error at most 1/2 ULP(x), where x is the calculated value. Since we are adding non-negative values, the accumulating sum never decreases, so it is never more than s, and its ULP is at most the ULP of s. So the n-1 additions produce at most n-1 errors of ULP(s)/2.
I am trying to understand the way to add, subtract, divide, and multiply by operating on the bits.
It is necessary to do some optimizing in my JavaScript program due to many calculations running after an event has happened.
Using the code below as a reference, I am able to understand that carry holds the result of the AND, and the XOR sets the sum var to the bits that do not match in n1 and n2.
Here is my question. ;) What does shifting (n1 & n2) left by 1 do? What is the goal of doing this? With the XOR it is obvious that there is no need to do anything else with those bits, because their decimal values are OK as they are in the sum var. I can't picture in my head what is being accomplished by the AND-and-shift operation.
function add(n1, n2)
{
  var carry, sum;

  // Find out which bits will result in a carry.
  // Those bits will affect the bits directly to
  // the left, so we shall shift one bit.
  carry = (n1 & n2) << 1;

  // In digital electronics, an XOR gate is also known
  // as a quarter adder. Basically an addition is performed
  // on each individual bit, and the carry is discarded.
  //
  // All I'm doing here is applying the same concept.
  sum = n1 ^ n2;

  // If any bits match in position, then perform the
  // addition on the current sum and the results of
  // the carry.
  if (sum & carry)
  {
    return add(sum, carry);
  }
  // Return the sum.
  else
  {
    return sum ^ carry;
  }
}
The code above works as expected, but it does not handle floating point values. I need the total, including the fractional part, to be returned.
Does anyone have a function that I can use with the above that will help me with floating point values? Or a website with a clear explanation of what I am looking for? I've been searching for the last day or so and cannot find anything to look over.
I got the code above from this resource.
http://www.dreamincode.net/code/snippet3015.htm
Thanks ahead of time!
After thinking about it, a left shift by one position is a multiplication by 2.
By ANDing like this: carry = (n1 & n2) << 1; the carry var holds the bits that are set in the same positions in both n1 and n2. So, if n1 is 4 and n2 is 4, they both hold the same value, and n1 & n2 is 4 again. Left shifting that by one position multiplies it by 2: 4 x 2 = 8, so carry would now equal 8.
1.) n1 = 00000100 (4)
& n2 = 00000100 (4)
2.) n1 & n2 = 00000100, the single value 4
3.) A left shift multiplies by 2 (4 x 2 = 8, or 4 + 4 = 8): carry = (n1 & n2) << 1 shifts all bits over one position
4.) carry now holds a single value of 00001000 = 8
I still cannot find anything on working with floating point values. If anyone has anything, do post a link.
It doesn't work because the code assumes that the floating point numbers are represented as integers, which they aren't. Floating point numbers are represented using the IEEE 754 standard, which breaks a number into three parts: a sign bit, a group of bits representing an exponent, and another group representing a number between 1 (inclusive) and 2 (exclusive), the mantissa. The value is calculated as
(sign is set ? -1 : 1) * mantissa * 2^(exponent - bias)
Where the bias depends on the precision of the floating point number. So the algorithm you use for adding two numbers assumes that the bits represent an integer which is not the case for floating point numbers. Operations such as bitwise-AND and bitwise-OR also don't give the results that you'd expect in an integer world.
Some examples, in double precision, the number 2.3 is represented as (in hex) 4002666666666666, while the number 5.3 is represented as 4015333333333333. OR-ing those two numbers will give you 4017777777777777, which represents (roughly) 5.866666.
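You can reproduce those bit patterns in JavaScript itself with a DataView (the helper names here are just for illustration):

```javascript
// View the raw IEEE-754 bit pattern of a double, and build a double
// back from a bit pattern.
const view = new DataView(new ArrayBuffer(8));
function bitsOf(x) {
  view.setFloat64(0, x);
  return view.getBigUint64(0);
}
function fromBits(b) {
  view.setBigUint64(0, b);
  return view.getFloat64(0);
}

console.log(bitsOf(2.3).toString(16)); // "4002666666666666"
console.log(bitsOf(5.3).toString(16)); // "4015333333333333"
const ored = bitsOf(2.3) | bitsOf(5.3);
console.log(ored.toString(16));        // "4017777777777777"
console.log(fromBits(ored));           // roughly 5.8666...
```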
There are some good pointers on this format; I found the links at http://www.psc.edu/general/software/packages/ieee/ieee.php, http://babbage.cs.qc.edu/IEEE-754/ and http://www.binaryconvert.com/convert_double.html fairly good for understanding it.
Now, if you still want to implement the bitwise addition for those numbers, you can. But you'll have to break the number down into its parts, normalize the numbers to the same exponent (otherwise you won't be able to add them), perform the addition on the mantissas, and finally normalize the result back to the IEEE 754 format. But, as @LukeGT said, you'll likely not get better performance than the JS engine you're running. And some JS implementations don't even support bitwise operations on floating point numbers, so what usually ends up happening is that they first cast the numbers to integers and then perform the operation, which will make your results incorrect as well.
Floating point values have a complicated bit structure, which is very difficult to manipulate with bit operations. As a result, I doubt you could do any better than the Javascript engine at computing them. Floating point calculations are inherently slow, so you should try to avoid them if you're worried about speed.
Try using integers to represent a decimal number to a fixed number of digits instead. For example, if you were working with currency, you could store things in terms of whole cents as opposed to dollars with fractional values.
Hope that helps.
I'm looking for a way to influence Math.random().
I have this function to generate a number from min to max:
var rand = function(min, max) {
return Math.floor(Math.random() * (max - min + 1)) + min;
}
Is there a way to make it more likely to get a low and high number than a number in the middle?
For example, rand(0, 10) would return 0, 1, 9, and 10 more often than the rest.
Is there a way to make it more likely to get a low and high number than a number in the middle?
Yes. You want to change the distribution of the numbers generated.
http://en.wikipedia.org/wiki/Random_number_generation#Generation_from_a_probability_distribution
One simple solution would be to generate an array with say, 100 elements.
In those 100 elements, over-represent the numbers you are interested in.
As a simple example, say you wanted the numbers 1 and 10 to show up more frequently: you could over-represent them in the array, i.e., have the number 1 in the array 20 times, the number 10 in the array 20 times, and the rest of the numbers distributed evenly over the remaining slots. Then use a random number between 0 and 99 as the array index. This will increase your probability of getting a 1 or a 10 versus the other numbers.
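A quick sketch of that idea (the pool doesn't have to be exactly 100 entries; the function name and weights here are made up for illustration):

```javascript
// Build a pool in which favoured values are over-represented, then
// pick uniformly from the pool.
function makeWeightedRand(weights) {
  const pool = [];
  for (const [value, count] of Object.entries(weights)) {
    for (let i = 0; i < count; i++) pool.push(Number(value));
  }
  return () => pool[Math.floor(Math.random() * pool.length)];
}

// 1 and 10 show up 20 times each; everything else 6 times.
const weightedRand = makeWeightedRand({
  0: 6, 1: 20, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6, 7: 6, 8: 6, 9: 6, 10: 20,
});
```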
You need a distribution map, mapping from the random output [0, 1] to your desired distribution outcome: e.g., [0, .3] yields 0, [.3, .5] yields 1, and so on.
Sure. It's not entirely clear whether you want a smooth rolloff so (for example) 2 and 8 are returned more often than 5 or 6, but the general idea works either way.
The typical way to do this is to generate a larger range of numbers than you'll output. For example, let's start with 5 as the baseline, occurring with frequency N. Let's assume that you want 4 or 7 to occur at frequency 2N, 3 or 8 at frequency 3N, 2 or 9 at frequency 4N, and 0 or 10 at frequency 5N.
Adding those up, we need values from 1 to 29 (or 0 to 28, or whatever) from the generator. Any of the first 5 gives an output of 0. Any of the next 4 gives an output of 1. Any of the next 3 gives an output of 2, and so on.
Of course, this doesn't change the values returned by the original generator -- it just lets us write a generator of our own that produces numbers following the distribution we've chosen.
Not really. There is a sequence of numbers that is generated based on the seed. Your random numbers come from that sequence. When you call random, you are grabbing the next element of the sequence.
Can you influence the output of Math.random in javascript (which runs client side)?
No. At least not in any feasible/practical manner.
But what you could do is create your own random number generator that produces numbers in the distribution that you need.
There are probably an infinite number of ways of doing it, and you might want to think about the exact shape/curvature of the probability function.
It can probably be done in one line, but here is a multi-line approach that uses your existing function definition (named rand, here):
var dd = rand(1, 5) + rand(0, 5);
var result;
if (dd > 5) {
  result = dd - 5;
} else {
  result = 6 - dd;
}
One basic result is that if U is a random variable with uniform distribution and F is the cumulative distribution you want to sample from, then Y = G(U), where G is the inverse of F, has F as its cumulative distribution. This might not be the most efficient way of doing it, and generating random numbers from all sorts of distributions is a research subfield in its own right. But for a simple transformation it might just do the trick. In your case, F(x) could be 4*(x-.5)^3+.5; it seems to satisfy all the constraints and is easy to invert and use as a transformation of the basic random number generator.
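Here is a sketch of that transformation plugged into the question's rand(); the inverse of F(x) = 4*(x-.5)^3+.5 is G(u) = .5 + cbrt((u-.5)/4), and the function names are mine:

```javascript
// Inverse-CDF sampling: push a uniform draw through G = F^(-1).
// With F(x) = 4*(x - 0.5)^3 + 0.5, the density F'(x) = 12*(x - 0.5)^2
// vanishes in the middle of [0, 1] and peaks at both ends.
function uShaped() {
  return 0.5 + Math.cbrt((Math.random() - 0.5) / 4);
}

var rand = function(min, max) {
  // Math.min guards the theoretical edge case of rounding up to 1.0.
  return Math.min(Math.floor(uShaped() * (max - min + 1)) + min, max);
};
```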
What are some simple ways to hash a 32-bit integer (e.g. IP address, e.g. Unix time_t, etc.) down to a 16-bit integer?
E.g. hash_32b_to_16b(0x12345678) might return 0xABCD.
Let's start with this as a horrible but functional example solution:
function hash_32b_to_16b(val32b) {
return val32b % 0xffff;
}
Question is specifically about JavaScript, but feel free to add any language-neutral solutions, preferably without using library functions.
The context for this question is generating unique IDs (e.g. a 64-bit ID might be composed of several 16-bit hashes of various 32-bit values). Avoiding collisions is important.
Simple = good. Wacky+obfuscated = amusing.
The key to maximizing the preservation of entropy of some original 32-bit 'signal' is to ensure that each of the 32 input bits has an independent and equal ability to alter the value of the 16-bit output word.
Since the OP is requesting a bit-size which is exactly half of the original, the simplest way to satisfy this criterion is to xor the upper and lower halves, as others have mentioned. Using xor is optimal because—as is obvious from the definition of xor—independently flipping any one of the 32 input bits is guaranteed to change the value of the 16-bit output.
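In JavaScript that xor-fold looks like this (reusing the question's signature; `>>>` keeps the shift unsigned):

```javascript
// Fold the high half onto the low half: each input bit influences
// exactly one output bit.
function hash_32b_to_16b(val32b) {
  return ((val32b >>> 16) ^ val32b) & 0xffff;
}

console.log(hash_32b_to_16b(0x12345678).toString(16)); // "444c"
```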
The problem becomes more interesting when you need further reduction beyond just half-the-size, say from a 32-bit input to, let's say, a 2-bit output. Remember, the goal is to preserve as much entropy from the source as possible, so solutions which involve naively masking off the two lowest bits with (i & 3) are generally heading in the wrong direction; doing that guarantees that there's no way for any bits except the unmasked bits to affect the result, and that generally means there's an arbitrary, possibly valuable part of the runtime signal which is being summarily discarded without principle.
Following from the earlier paragraph, you could of course iterate with xor three additional times to produce a 2-bit output with the desired property of being equally-influenced by each/any of the input bits. That solution is still optimally correct of course, but involves looping or multiple unrolled operations which, as it turns out, aren't necessary!
Fortunately, there is a nice technique of only two operations which gives the same optimal result for this situation. As with xor, it not only ensures that, for any given 32-bit value, twiddling any input bit will result in a change to the 2-bit output, but also that, given a uniform distribution of input values, the distribution of 2-bit output values will also be perfectly uniform. In the current example, the method divides the 4,294,967,296 possible input values into exactly 1,073,741,824 each of the four possible 2-bit hash results { 0, 1, 2, 3 }.
The method I mention here uses specific magic values that I discovered via exhaustive search, and which don't seem to be discussed very much elsewhere on the internet, at least for the particular use under discussion here (i.e., ensuring a uniform hash distribution that's maximally entropy-preserving). Curiously, according to this same exhaustive search, the magic values are in fact unique, meaning that for each of target bit-widths { 16, 8, 4, 2 }, the magic value I show below is the only value that, when used as I show here, satisfies the perfect hashing criteria outlined above.
Without further ado, the unique and mathematically optimal procedure for hashing 32-bits to n = { 16, 8, 4, 2 } is to multiply by the magic value corresponding to n (unsigned, discarding overflow), and then take the n highest bits of the result. To isolate those result bits as a hash value in the range [0 ... (2ⁿ - 1)], simply right-shift (unsigned!) the multiplication result by 32 - n bits.
The "magic" values, and C-like expression syntax are as follows:
Method
Maximum-entropy-preserving hash for reducing 32 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------ ----------- -----------------------
16 0x80008001 16 (i * 0x80008001) >> 16
8 0x80808081 24 (i * 0x80808081) >> 24
4 0x88888889 28 (i * 0x88888889) >> 28
2 0xAAAAAAAB 30 (i * 0xAAAAAAAB) >> 30
Maximum-entropy-preserving hash for reducing 64 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------------ ----------- -------------------------------
32 0x8000000080000001 32 (i * 0x8000000080000001) >> 32
16 0x8000800080008001 48 (i * 0x8000800080008001) >> 48
8 0x8080808080808081 56 (i * 0x8080808080808081) >> 56
4 0x8888888888888889 60 (i * 0x8888888888888889) >> 60
2 0xAAAAAAAAAAAAAAAB 62 (i * 0xAAAAAAAAAAAAAAAB) >> 62
Notes:
Use unsigned multiply and discard any overflow (64-bit multiply is not needed).
If isolating the result using right-shift (as shown), be sure to use an unsigned shift operation.
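For instance, the 32-to-16-bit row can be written in JavaScript like this (Math.imul gives the low 32 bits of the product, i.e. multiplication with overflow discarded, and `>>>` is the unsigned shift; the function name is mine):

```javascript
// Multiply by the magic constant mod 2^32, then keep the top 16 bits.
function hash32to16(i) {
  return Math.imul(i, 0x80008001) >>> 16;
}

console.log(hash32to16(0x12345678).toString(16)); // "3d70"
```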
Further discussion
I find all this quite cool. In practical terms, the key information-theoretical requirement is the guarantee that, for any m-bit input value and its corresponding n-bit hash value result, flipping any one of the m source bits always causes some change in the n-bit result value. Now although there are 2ⁿ possible result values in total, one of them is already "in-use" (by the result itself) since "switching" to that one from any other result would be no change at all. This leaves 2ⁿ - 1 result values that are eligible to be used by the entire set of m input values flipped by a single bit.
Let's consider an example; in fact, to show how this technique might seem to border on spooky or downright magical, we'll consider the more extreme case where m = 64 and n = 2. With 2 output bits there are four possible result values, { 0, 1, 2, 3 }. Assuming an arbitrary 64-bit input value 0x7521d9318fbdf523, we obtain its 2-bit hash value of 1:
(0x7521d9318fbdf523 * 0xAAAAAAAAAAAAAAAB) >> 62 // result --> '1'
So the result is 1 and the claim is that no value in the set of 64 values where a single-bit of 0x7521d9318fbdf523 is toggled may have that same result value. That is, none of those 64 other results can use value 1 and all must instead use either 0, 2, or 3. So in this example it seems like every one of the 2⁶⁴ input values—to the exclusion of 64 other input values—will selfishly hog one-quarter of the output space for itself. When you consider the sheer magnitude of these interacting constraints, can a simultaneously satisfying solution overall even exist?
Well sure enough, to show that (exactly?) one does, here are the hash result values, listed in order, for the inputs obtained by flipping a single bit of 0x7521d9318fbdf523 (one at a time), from the MSB (position 63) down to the LSB (position 0).
3 2 0 3 3 3 3 3 3 0 0 0 3 0 3 3 0 3 3 3 0 0 3 3 3 0 0 3 3 0 3 3 // continued…
0 0 3 0 0 3 0 3 0 0 0 3 0 3 3 3 0 3 0 3 3 3 3 3 3 0 0 0 3 0 0 3 // notice: no '1' values
As you can see, there are no 1 values, which entails that every bit in the source "as-is" must be contributing to influence the result (or, if you prefer, the de facto state of each-and-every bit in 0x7521d9318fbdf523 is essential to keeping the entire overall result from being "not-1"): no matter what single-bit change you make to the 64-bit input, the 2-bit result value will no longer be 1.
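The 2-bit example above is easy to reproduce; here is a small sketch using JavaScript BigInt for the unsigned 64-bit arithmetic (the input value and multiplier are the ones from the text, and the mask replicates "discard any overflow"):

```javascript
// Reproduce the 64-bit-to-2-bit example with BigInt arithmetic.
const MASK64 = (1n << 64n) - 1n;
function hash64to2(v) {
  // Unsigned 64-bit multiply (discarding overflow), then keep the top 2 bits.
  return ((v * 0xAAAAAAAAAAAAAAABn) & MASK64) >> 62n;
}

const x = 0x7521d9318fbdf523n;
console.log(hash64to2(x)); // 1n, matching the text

// Flip each of the 64 bits in turn: none of the results is 1.
for (let k = 0n; k < 64n; k++) {
  const r = hash64to2(x ^ (1n << k));
  if (r === 1n) throw new Error("unexpected 1 at bit " + k);
}
```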
Keep in mind that the "missing-value" table shown above was dumped from the analysis of just the one randomly-chosen example value 0x7521d9318fbdf523; every other possible input value has a similar table of its own, each one eerily missing its owner's actual result value while yet somehow being globally consistent across its set-membership. This property essentially corresponds to maximally preserving the available entropy during the (inherently lossy) bit-width reduction task.
So we see that every one of the 2⁶⁴ possible source values independently imposes, on exactly 64 other source values, the constraint of excluding one of the possible result values. What defies my intuition about this is that there are untold quadrillions of these 64-member sets, each of whose members also belongs to 63 other, seemingly unrelated bit-twiddling sets. Yet somehow despite this most confounding puzzle of interwoven constraints, it is nevertheless trivial to exploit the one (I surmise) resolution which simultaneously satisfies them all exactly.
All this seems related to something you may have noticed in the tables above: namely, I don't see any obvious way to extend the technique to the case of compressing down to a 1-bit result. In this case, there are only two possible result values { 0, 1 }, so if any/every given (e.g.) 64-bit input value still summarily excludes its own result from being the result for all 64 of its single-bit-flip neighbors, then that now essentially imposes the other, only remaining value on those 64. The math breakdown we see in the table seems to be signalling that a simultaneous result under such conditions is a bridge too far.
In other words, the special 'information-preserving' characteristic of xor (that is, its luxuriously reliable guarantee that, as opposed to and, or, etc., it c̲a̲n̲ and w̲i̲l̲l̲ always change a bit) not surprisingly exacts a certain cost, namely, a fiercely non-negotiable demand for a certain amount of elbow room—at least 2 bits—to work with.
I think this is the best you're going to get. You could compress the code to a single line, but the vars are there for now as documentation:
function hash_32b_to_16b(val32b) {
    var rightBits = val32b & 0xffff;      // Right-most (low) 16 bits
    var leftBits  = val32b & 0xffff0000;  // Left-most (high) 16 bits
    leftBits = leftBits >>> 16;           // Shift the high half down to a 16-bit value
    return rightBits ^ leftBits;          // XOR the two 16-bit halves
}
Given the parameters of the problem, the best solution would have each 16-bit hash correspond to exactly 2^16 32-bit numbers. It would also, IMO, hash sequential 32-bit numbers differently. Unless I'm missing something, I believe this solution does both of those things.
I would argue that security cannot be a consideration in this problem, as the hashed value is just too few bits. I believe that the solution I gave provides an even distribution of 32-bit numbers to 16-bit hashes.
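The "exactly 2^16 inputs per hash" claim is easy to spot-check at a smaller scale: this sketch folds 8 bits down to 4 the same way and counts preimages exhaustively (the same balance argument carries over to the 32-to-16 case):

```javascript
// Scaled-down check of the XOR-fold's balance: fold 8 bits to 4
// and count how many inputs land on each 4-bit hash value.
function hash8to4(v) {
  return (v >>> 4) ^ (v & 0xf);
}

const counts = new Array(16).fill(0);
for (let v = 0; v < 256; v++) counts[hash8to4(v)]++;
console.log(counts); // every 4-bit hash value has exactly 16 preimages
```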
This depends on the nature of the integers.
If they can contain some bit-masks, or can differ by powers of two, then simple XORs will have a high probability of collisions.
You can try something like (i>>16) ^ ((i&0xffff) * p) with p being a prime number.
Security hashes like MD5 are all good, but they are obviously overkill here. Anything more complex than CRC16 is overkill.
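To make the collision point concrete: inputs that differ by the same bit in both halves always collide under a plain fold, while the multiply-by-a-prime variant separates them. In this sketch, p = 65521 (the largest 16-bit prime) is just an example choice, and the extra `& 0xffff` mask (not shown in the expression above) brings the result back down to 16 bits:

```javascript
// Plain fold: inputs differing by the same bit in both halves collide.
function plainFold(i) {
  return (i >>> 16) ^ (i & 0xffff);
}

// Suggested variant: multiply the low half by a prime before XORing.
// p = 65521 is an arbitrary example prime; & 0xffff masks to 16 bits.
function primeFold(i) {
  var p = 65521;
  return ((i >>> 16) ^ ((i & 0xffff) * p)) & 0xffff;
}

var a = 0x00000000;
var b = 0x00010001; // same bit set in both halves of the word
console.log(plainFold(a) === plainFold(b)); // true: a collision
console.log(primeFold(a) === primeFold(b)); // false
```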
I would say just apply a standard hash like SHA-1 or MD5 and then grab the last 16 bits of that.
Assuming that you expect the least significant bits to 'vary' the most, I think you're probably going to get a good enough distribution by just using the lower 16-bits of the value as a hash.
If the numbers you're going to hash won't have that kind of distribution, then the additional step of xor-ing in the upper 16 bits might be helpful.
Of course this suggestion is if you're intending to use the hash merely for some sort of lookup/storage scheme and aren't looking for the crypto-related properties of non-guessability and non-reversibility (which the xor-ing suggestions don't really buy you either).
Something simple like this...
function hash_32b_to_16b(val32b) {
    var h = hmac(secretKey, sha512); // pseudocode: some keyed hash function
    var v = val32b;
    for (var i = 0; i < 4096; ++i)
        v = h(v);
    return v & 0xffff; // mask to 16 bits ('% 0xffff' would never yield 65535)
}