Related
I am following a guide on Gameboy emulation and, in a snippet of code I saw the following:
while(true)
{
var op = MMU.rb(Z80._r.pc++); // Fetch instruction
Z80._map[op](); // Dispatch
Z80._r.pc &= 65535; // Mask PC to 16 bits
Z80._clock.m += Z80._r.m; // Add time to CPU clock
Z80._clock.t += Z80._r.t;
}
Where pc is a 16-bit program counter register and 65535 in hexadecimal is 0xFFFF , what is the purpose of masking a 16-bit value with 0xFFFF? As far as I know this does nothing? Or is it something to do with the sign bit?
I think the important part is that you use JavaScript - it has only one numeric type - floating point. But apparently underlying engine can recognize when it should use integers instead - using bit mask is a strong suggestion that we want to use it as integer since bit operations usually doesn't make sense for floats. It also trims all used pits in this particular variable to last 16 - what guarantee you have that earlier it wasn't using bits older than last 16? If all later operations works on assumption that number is 16-bit then without using mask your assumptions are prone to break.
what is the purpose of masking a 16-bit value
None. But there is no 16-bit value - it's just a (floating-point) number in javascript. Only to make it emulate a 16 bit value this number is cut down to 16 bits after the program counter is incremented - it did not overflow from ++ but just did take the value 65536.
That's also what the comment says: // Mask PC to 16 bits.
The short answer: it throws out all bits except the 16 lower bits. That way, when run on a 32/64 bit machine, you'll discard all others.
JS uses >16 bits and to ensure you're working with 16 bits only you discard the rest by AND-ing with 0xFFFF (or 65535). In this particular example the program-counter, which is 16 bits on a gameboy (apparently :P ), is 'wrapped around' to 0 when the value reaches 65536. An if (Z80._r.pc > 65535) Z80._r.pc = 0 would do the same but would probably perform worse. This kind of "trick" is used very often in bit manipulation code.
How to calculator googolplex (10^(10^100)) the leading N (Ex: 100) binary digits from the left ?
I know how to calculate a binary from the right to left, but this may take hundreds of years (Reference) to run...
Don't have an answer, but have a suggestion for further analysis.
if you want it in binary, then you want the bits starting in the Nth bit, where N=X+1 where X is described as follows:
2^X = 10^(10^100)
take log(b=10) =>
X = 10^100 / log(2) ==> ~ 3.3 E 100
Still not sure how to reduce it from there, but maybe playing with logarithm identities might be interesting. If you can compute X, maybe you can come up with a long division algorithm, though the running time argument on your reference makes me imagine the runtime for computing this could be the same. I.e. see you in ~600 years.
Another idea might be to investigate how numeric coprocessors create iEEE mantissas in binary.
Maybe there is an algorithm there you can leverage for something like this.
Just guessing though
So some basic math helps to outline the approach. I'll emphasize some highlights:
You want to compute the left digits accurately, in binary.
Digits, mathematically, are invariant with respect to multiplication/division by the base
Powers are equivalent to multiplication in log space
Multiplication/division by the base is equivalent to adding whole numbers in log space
BitBlitz has the right idea--you can use logarithms to solve this. In particular, take the logarithm of 10 in base 2, multiply that by 10^100, and ignore everything to the left of the (base 2) decimal place. To give you an idea, 10^100 is obviously 100 digits; using the 1K=2^10=10^3 approximation, that makes about 100/3 K or 33K, times 10 which is about 330 bits of shifting the log left to get past all of those bits you don't care about. Once you've flipped through that and start hitting the binary "decimals", you'll be computing the logarithms of the digits--left to right. Gather a huge chunk of such digits, perform the inverse log of it, and your resulting binary digits will match what you want to get.
You're definitely going to need a bignum library for this task; long double just isn't going to cut it. But it should be reasonably possible using this approach to gather a reasonable number of leftmost digits.
I want to normalize an array so that each value is
in [0-1) .. i.e. "the max will never be 1 but the min can be 0."
This is not unlike the random function returning numbers in the same range.
While looking at this, I found that .99999999999999999===1 is true!
Ditto (1-Number.MIN_VALUE) === 1 But Math.ceil(Number.MIN_VALUE) is 1, as it should be.
Some others: Math.floor(.999999999999) is 0
while Math.floor(.99999999999999999) is 1
OK so there are rounding problems in JS.
Is there any way I can normalize a set of numbers to lie in the range [0,1)?
It may help to examine the steps that JavaScript performs of each of your expressions.
In .99999999999999999===1:
The source text .99999999999999999 is converted to a Number. The closest Number is 1, so that is the result. (The next closest Number is 0.99999999999999988897769753748434595763683319091796875, which is 1–2–53.)
Then 1 is compared to 1. The result is true.
In (1-Number.MIN_VALUE) === 1:
Number.MIN_VALUE is 2–1074, about 5e–304.
1–2–1074 is extremely close to one. The exact value cannot be represented as a Number, so the nearest value is used. Again, the nearest value is 1.
Then 1 is compared to 1. The result is true.
In Math.ceil(Number.MIN_VALUE):
Number.MIN_VALUE is 2–1074, about 5e–304.
The ceiling function of that value is 1.
In Math.floor(.999999999999):
The source text .999999999999 is converted to a Number. The closest Number is 0.99999999999900002212172012150404043495655059814453125, so that is the result.
The floor function of that value is 0.
In Math.floor(.99999999999999999):
The source text .99999999999999999 is converted to a Number. The closest Number is 1, so that is the result.
The floor function of 1 is 1.
There are only two surprising things here, at most. One is that the numerals in the source text are converted to internal Number values. But this should not be surprising. Of course text has to be converted to internal representations of numbers, and the Number type cannot perfectly store all the infinitely many numbers. So it has to round. And of course numbers very near 1 round to 1.
The other possibly surprising thing is that 1-Number.MIN_VALUE is 1. But this is actually the same issue: The exact result is not representable, but it is very near 1, so 1 is used.
The Math.floor function works correctly. It never introduces any error, and you do not have to do anything to guarantee that it will round down. It always does.
However, since you want to normalize numbers, it seems likely you are going to divide numbers at some point. When you divide, there may be rounding problems, because many results of division are not exactly representable, so they must be rounded.
However, that is a separate problem, and you have not given enough information in this question to address the specific calculations you plan to do. You should open a separate question for it.
Javascript will treat any number between 0.999999999999999994 and 1 as 1, so just subtract .000000000000000006.
Of course that's not as easy as it sounds, since .000000000000000006 is evaluated as 0 in Javascript, so you could do something like:
function trueFloor(x)
{
x = x * 100;
if(x > .0000000000000006)
x = x - .0000000000000006;
x = Math.floor(x/100);
return x;
}
EDIT: Or at least you'd think you could. Apparently JS casts .99999999999999999 to 1 before passing it to a function, so you'd have to try something like:
trueFloor("0.99999999999999999")
function trueFloor(str)
{
x=str.substring(0,9) + 0;
return Math.floor(x); //=> 0
}
Not sure why you'd need that level of precision, but in theory, I guess it works. You can see a working fiddle here
As long as you cast your insanely precise float as a string, that's probably your best bet.
Please understand one thing: this...
.999999999999999999
... is just a Number literal. Just as
.999999999999999998
.999999999999999997
.999999999999999996
...
... you see the pattern.
How JavaScript treats these literals is completely another story. And yes, this treatment is limited by the number of bits that can be used to store a Number value.
The number of possible floating point literals is infinite by definition - no matter how small is the range set for them. For example, take the ones shown above: how many of numbers very close to 1 you may express? Right, it's infinite: just keep appending 9 to the line.
But the container for each Number value is quite finite: it has 64 bits. That means, it can store 2^64 different values (Infinite, -Infinite and NaN among them) - and that's all.
You want to work with such literals anyway? Use Strings to store them, not Numbers - and some BigMath JS library (take your pick) to work with those values - as Strings, again.
But from your question it looks like you're not, as you talked about array of Numbers - Number values, that is. And in no way there can be .999999999999999999 stored there, as there is no such Number value in JavaScript.
What are some simple ways to hash a 32-bit integer (e.g. IP address, e.g. Unix time_t, etc.) down to a 16-bit integer?
E.g. hash_32b_to_16b(0x12345678) might return 0xABCD.
Let's start with this as a horrible but functional example solution:
function hash_32b_to_16b(val32b) {
return val32b % 0xffff;
}
Question is specifically about JavaScript, but feel free to add any language-neutral solutions, preferably without using library functions.
The context for this question is generating unique IDs (e.g. a 64-bit ID might be composed of several 16-bit hashes of various 32-bit values). Avoiding collisions is important.
Simple = good. Wacky+obfuscated = amusing.
The key to maximizing the preservation of entropy of some original 32-bit 'signal' is to ensure that each of the 32 input bits has an independent and equal ability to alter the value of the 16-bit output word.
Since the OP is requesting a bit-size which is exactly half of the original, the simplest way to satisfy this criteria is to xor the upper and lower halves, as others have mentioned. Using xor is optimal because—as is obvious by the definition of xor—independently flipping any one of the 32 input bits is guaranteed to change the value of the 16-bit output.
The problem becomes more interesting when you need further reduction beyond just half-the-size, say from a 32-bit input to, let's say, a 2-bit output. Remember, the goal is to preserve as much entropy from the source as possible, so solutions which involve naively masking off the two lowest bits with (i & 3) are generally heading in the wrong direction; doing that guarantees that there's no way for any bits except the unmasked bits to affect the result, and that generally means there's an arbitrary, possibly valuable part of the runtime signal which is being summarily discarded without principle.
Following from the earlier paragraph, you could of course iterate with xor three additional times to produce a 2-bit output with the desired property of being equally-influenced by each/any of the input bits. That solution is still optimally correct of course, but involves looping or multiple unrolled operations which, as it turns out, aren't necessary!
Fortunately, there is a nice technique of only two operations which gives the same optimal result for this situation. As with xor, it not only ensures that, for any given 32-bit value, twiddling any input bit will result in a change to the 2-bit output, but also that, given a uniform distribution of input values, the distribution of 2-bit output values will also be perfectly uniform. In the current example, the method divides the 4,294,967,296 possible input values into exactly 1,073,741,824 each of the four possible 2-bit hash results { 0, 1, 2, 3 }.
The method I mention here uses specific magic values that I discovered via exhaustive search, and which don't seem to be discussed very much elsewhere on the internet, at least for the particular use under discussion here (i.e., ensuring a uniform hash distribution that's maximally entropy-preserving). Curiously, according to this same exhaustive search, the magic values are in fact unique, meaning that for each of target bit-widths { 16, 8, 4, 2 }, the magic value I show below is the only value that, when used as I show here, satisfies the perfect hashing criteria outlined above.
Without further ado, the unique and mathematically optimal procedure for hashing 32-bits to n = { 16, 8, 4, 2 } is to multiply by the magic value corresponding to n (unsigned, discarding overflow), and then take the n highest bits of the result. To isolate those result bits as a hash value in the range [0 ... (2ⁿ - 1)], simply right-shift (unsigned!) the multiplication result by 32 - n bits.
The "magic" values, and C-like expression syntax are as follows:
Method
Maximum-entropy-preserving hash for reducing 32 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------ ----------- -----------------------
16 0x80008001 16 (i * 0x80008001) >> 16
8 0x80808081 24 (i * 0x80808081) >> 24
4 0x88888889 28 (i * 0x88888889) >> 28
2 0xAAAAAAAB 30 (i * 0xAAAAAAAB) >> 30
Maximum-entropy-preserving hash for reducing 64 bits to. . .
Target Bits Multiplier Right Shift Expression [1, 2]
----------- ------------------ ----------- -------------------------------
32 0x8000000080000001 32 (i * 0x8000000080000001) >> 32
16 0x8000800080008001 48 (i * 0x8000800080008001) >> 48
8 0x8080808080808081 56 (i * 0x8080808080808081) >> 56
4 0x8888888888888889 60 (i * 0x8888888888888889) >> 60
2 0xAAAAAAAAAAAAAAAB 62 (i * 0xAAAAAAAAAAAAAAAB) >> 62
Notes:
Use unsigned multiply and discard any overflow (64-bit multiply is not needed).
If isolating the result using right-shift (as shown), be sure to use an unsigned shift operation.
Further discussion
I find this all this quite cool. In practical terms, the key information-theoretical requirement is the guarantee that, for any m-bit input value and its corresponding n-bit hash value result, flipping any one of the m source bits always causes some change in the n-bit result value. Now although there are 2ⁿ possible result values in total, one of them is already "in-use" (by the result itself) since "switching" to that one from any other result would be no change at all. This leaves 2ⁿ - 1 result values that are eligible to be used by the entire set of m input values flipped by a single bit.
Let's consider an example; in fact, to show how this technique might seem to border on spooky or downright magical, we'll consider the more extreme case where m = 64 and n = 2. With 2 output bits there are four possible result values, { 0, 1, 2, 3 }. Assuming an arbitrary 64-bit input value 0x7521d9318fbdf523, we obtain its 2-bit hash value of 1:
(0x7521d9318fbdf523 * 0xAAAAAAAAAAAAAAAB) >> 62 // result --> '1'
So the result is 1 and the claim is that no value in the set of 64 values where a single-bit of 0x7521d9318fbdf523 is toggled may have that same result value. That is, none of those 64 other results can use value 1 and all must instead use either 0, 2, or 3. So in this example it seems like every one of the 2⁶⁴ input values—to the exclusion of 64 other input values—will selfishly hog one-quarter of the output space for itself. When you consider the sheer magnitude of these interacting constraints, can a simultaneously satisfying solution overall even exist?
Well sure enough, to show that (exactly?) one does, here are the hash result values, listed in order, for inputs that flipping a single bit of 0x7521d9318fbdf523 (one at a time), from MSB (position 63) down to LSB (0).
3 2 0 3 3 3 3 3 3 0 0 0 3 0 3 3 0 3 3 3 0 0 3 3 3 0 0 3 3 0 3 3 // continued…
0 0 3 0 0 3 0 3 0 0 0 3 0 3 3 3 0 3 0 3 3 3 3 3 3 0 0 0 3 0 0 3 // notice: no '1' values
As you can see, there are no 1 values, which entails that every bit in the source "as-is" must be contributing to influence the result (or, if you prefer, the de facto state of each-and-every bit in 0x7521d9318fbdf523 is essential to keeping the entire overall result from being "not-1"). Because no matter what single-bit change you make to the 64-bit input, the 2-bit result value will no longer be 1.
Keep in mind that the "missing-value" table shown above was dumped from the analysis of just the one randomly-chosen example value 0x7521d9318fbdf523; every other possible input value has a similar table of its own, each one eerily missing its owner's actual result value while yet somehow being globally consistent across its set-membership. This property essentially corresponds to maximally preserving the available entropy during the (inherently lossy) bit-width reduction task.
So we see that every one of the 2⁶⁴ possible source values independently imposes, on exactly 64 other source values, the constraint of excluding one of the possible result values. What defies my intuition about this is that there are untold quadrillions of these 64-member sets, each of whose members also belongs to 63 other, seemingly unrelated bit-twiddling sets. Yet somehow despite this most confounding puzzle of interwoven constraints, it is nevertheless trivial to exploit the one (I surmise) resolution which simultaneously satisfies them all exactly.
All this seems related to something you may have noticed in the tables above: namely, I don't see any obvious way to extend the technique to the case of compressing down to a 1-bit result. In this case, there are only two possible result values { 0, 1 }, so if any/every given (e.g.) 64-bit input value still summarily excludes its own result from being the result for all 64 of its single-bit-flip neighbors, then that now essentially imposes the other, only remaining value on those 64. The math breakdown we see in the table seems to be signalling that a simultaneous result under such conditions is a bridge too far.
In other words, the special 'information-preserving' characteristic of xor (that is, its luxuriously reliable guarantee that, as opposed to and, or, etc., it c̲a̲n̲ and w̲i̲l̲l̲ always change a bit) not surprisingly exacts a certain cost, namely, a fiercely non-negotiable demand for a certain amount of elbow room—at least 2 bits—to work with.
I think this is the best you're going to get. You could compress the code to a single line but the var's are there for now as documentation:
function hash_32b_to_16b(val32b) {
var rightBits = val32b & 0xffff; // Left-most 16 bits
var leftBits = val32b & 0xffff0000; // Right-most 16 bits
leftBits = leftBits >>> 16; // Shift the left-most 16 bits to a 16-bit value
return rightBits ^ leftBits; // XOR the left-most and right-most bits
}
Given the parameters of the problem, the best solution would have each 16-bit hash correspond to exactly 2^16 32-bit numbers. It would also IMO hash sequential 32-bit numbers differently. Unless I'm missing something, I believe this solution does those two things.
I would argue that security cannot be a consideration in this problem, as the hashed value is just too few bits. I believe that the solution I gave provides even distribution of 32-bit numbers to 16-bit hashes
This depends on the nature of the integers.
If they can contain some bit-masks, or can differ by powers of two, then simple XORs will have high probability of collisions.
You can try something like (i>>16) ^ ((i&0xffff) * p) with p being a prime number.
Security-hashes like MD5 are all good, but they are obviously an overkill here. Anything more complex than CRC16 is overkill.
I would say just apply a standard hash like sha1 or md5 and then grab the last 16 bits of that.
Assuming that you expect the least significant bits to 'vary' the most, I think you're probably going to get a good enough distribution by just using the lower 16-bits of the value as a hash.
If the numbers you're going to hash won't have that kind of distribution, then the additional step of xor-ing in the upper 16 bits might be helpful.
Of course this suggestion is if you're intending to use the hash merely for some sort of lookup/storage scheme and aren't looking for the crypto-related properties of non-guessability and non-reversability (which the xor-ing suggestions don't really buy you either).
Something simple like this....
function hash_32b_to_16b(val32b) {
var h = hmac(secretKey, sha512);
var v = val32b;
for(var i = 0; i < 4096; ++i)
v = h(v);
return v % 0xffff;
}
I heard that you could right-shift a number by .5 instead of using Math.floor(). I decided to check its limits to make sure that it was a suitable replacement, so I checked the following values and got the following results in Google Chrome:
2.5 >> .5 == 2;
2.9999 >> .5 == 2;
2.999999999999999 >> .5 == 2; // 15 9s
2.9999999999999999 >> .5 == 3; // 16 9s
After some fiddling, I found out that the highest possible value of two which, when right-shifted by .5, would yield 2 is 2.9999999999999997779553950749686919152736663818359374999999¯ (with the 9 repeating) in Chrome and Firefox. The number is 2.9999999999999997779¯ in IE.
My question is: what is the significance of the number .0000000000000007779553950749686919152736663818359374? It's a very strange number and it really piqued my curiosity.
I've been trying to find an answer or at least some kind of pattern, but I think my problem lies in the fact that I really don't understand the bitwise operation. I understand the idea in principle, but shifting a bit sequence by .5 doesn't make any sense at all to me. Any help is appreciated.
For the record, the weird digit sequence changes with 2^x. The highest possible values of the following numbers that still truncate properly:
for 0: 0.9999999999999999444888487687421729788184165954589843749¯
for 1: 1.9999999999999999888977697537484345957636833190917968749¯
for 2-3: x+.99999999999999977795539507496869191527366638183593749¯
for 4-7: x+.9999999999999995559107901499373838305473327636718749¯
for 8-15: x+.999999999999999111821580299874767661094665527343749¯
...and so forth
Actually, you're simply ending up doing a floor() on the first operand, without any floating point operations going on. Since the left shift and right shift bitwise operations only make sense with integer operands, the JavaScript engine is converting the two operands to integers first:
2.999999 >> 0.5
Becomes:
Math.floor(2.999999) >> Math.floor(0.5)
Which in turn is:
2 >> 0
Shifting by 0 bits means "don't do a shift" and therefore you end up with the first operand, simply truncated to an integer.
The SpiderMonkey source code has:
switch (op) {
case JSOP_LSH:
case JSOP_RSH:
if (!js_DoubleToECMAInt32(cx, d, &i)) // Same as Math.floor()
return JS_FALSE;
if (!js_DoubleToECMAInt32(cx, d2, &j)) // Same as Math.floor()
return JS_FALSE;
j &= 31;
d = (op == JSOP_LSH) ? i << j : i >> j;
break;
Your seeing a "rounding up" with certain numbers is due to the fact the JavaScript engine can't handle decimal digits beyond a certain precision and therefore your number ends up getting rounded up to the next integer. Try this in your browser:
alert(2.999999999999999);
You'll get 2.999999999999999. Now try adding one more 9:
alert(2.9999999999999999);
You'll get a 3.
This is possibly the single worst idea I have ever seen. Its only possible purpose for existing is for winning an obfusticated code contest. There's no significance to the long numbers you posted -- they're an artifact of the underlying floating-point implementation, filtered through god-knows how many intermediate layers. Bit-shifting by a fractional number of bytes is insane and I'm surprised it doesn't raise an exception -- but that's Javascript, always willing to redefine "insane".
If I were you, I'd avoid ever using this "feature". Its only value is as a possible root cause for an unusual error condition. Use Math.floor() and take pity on the next programmer who will maintain the code.
Confirming a couple suspicions I had when reading the question:
Right-shifting any fractional number x by any fractional number y will simply truncate x, giving the same result as Math.floor() while thoroughly confusing the reader.
2.999999999999999777955395074968691915... is simply the largest number that can be differentiated from "3". Try evaluating it by itself -- if you add anything to it, it will evaluate to 3. This is an artifact of the browser and local system's floating-point implementation.
If you wanna go deeper, read "What Every Computer Scientist Should Know About Floating-Point Arithmetic": https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
Try this javascript out:
alert(parseFloat("2.9999999999999997779553950749686919152736663818359374999999"));
Then try this:
alert(parseFloat("2.9999999999999997779553950749686919152736663818359375"));
What you are seeing is simple floating point inaccuracy. For more information about that, see this for example: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems.
The basic issue is that the closest that a floating point value can get to representing the second number is greater than or equal to 3, whereas the closes that the a float can get to the first number is strictly less than three.
As for why right shifting by 0.5 does anything sane at all, it seems that 0.5 is just itself getting converted to an int (0) beforehand. Then the original float (2.999...) is getting converted to an int by truncation, as usual.
I don't think your right shift is relevant. You are simply beyond the resolution of a double precision floating point constant.
In Chrome:
var x = 2.999999999999999777955395074968691915273666381835937499999;
var y = 2.9999999999999997779553950749686919152736663818359375;
document.write("x=" + x);
document.write(" y=" + y);
Prints out: x = 2.9999999999999996 y=3
The shift right operator only operates on integers (both sides). So, shifting right by .5 bits should be exactly equivalent to shifting right by 0 bits. And, the left hand side is converted to an integer before the shift operation, which does the same thing as Math.floor().
I suspect that converting 2.9999999999999997779553950749686919152736663818359374999999
to it's binary representation would be enlightening. It's probably only 1 bit different
from true 3.
Good guess, but no cigar.
As the double precision FP number has 53 bits, the last FP number before 3 is actually
(exact): 2.999999999999999555910790149937383830547332763671875
But why it is
2.9999999999999997779553950749686919152736663818359375
(and this is exact, not 49999... !)
which is higher than the last displayable unit ? Rounding. The conversion routine (String to number) simply is correctly programmed to round the input the the next floating point number.
2.999999999999999555910790149937383830547332763671875
.......(values between, increasing) -> round down
2.9999999999999997779553950749686919152736663818359375
....... (values between, increasing) -> round up to 3
3
The conversion input must use full precision. If the number is exactly the half between
those two fp numbers (which is 2.9999999999999997779553950749686919152736663818359375)
the rounding depends on the setted flags. The default rounding is round to even, meaning that the number will be rounded to the next even number.
Now
3 = 11. (binary)
2.999... = 10.11111111111...... (binary)
All bits are set, the number is always odd. That means that the exact half number will be rounded up, so you are getting the strange .....49999 period because it must be smaller than the exact half to be distinguishable from 3.
I suspect that converting 2.9999999999999997779553950749686919152736663818359374999999 to its binary representation would be enlightening. It's probably only 1 bit different from true 3.
And to add to John's answer, the odds of this being more performant than Math.floor are vanishingly small.
I don't know if JavaScript uses floating-point numbers or some kind of infinite-precision library, but either way, you're going to get rounding errors on an operation like this -- even if it's pretty well defined.
It should be noted that the number ".0000000000000007779553950749686919152736663818359374" is quite possibly the Epsilon, defined as "the smallest number E such that (1+E) > 1."