In JavaScript, how do I ensure floating point numbers stay under 32bits? - javascript

Obviously numbers in JavaScript aren't explicitly typed, but are represented as types by the interpreter. I just saw a thing about Google's V8 JS engine that said it's greatly optimized for 32-bit numbers, but found it odd that many JS programmers would have a need for doubles even with floating point. The only example I could think of personally is dividing two integers, which I do often in order to normalize screen coordinates between 0 and 1, where the interpreter keeps the result as a 64-bit value instead of a 32-bit one. This also seems unlikely to me, but then again I don't know how else someone needing such precision would specify it. So now I'm wondering: is there a way to ensure the quotient of two (not gigantic) integers is under 32 bits in length?

I just saw a thing about Google's V8 JS engine that said it's greatly optimized for 32 bit numbers
This only means that V8 does internally store those numbers as integers when it can deduce that they will stay in the respective range. This is common for counters or array indices, for example.
Is there a way to ensure the quotient of two (not gigantic) integers is under 32 bits in length?
No - all arithmetic operations are carried out as if on 64-bit floating point numbers (like all numbers in JS). The only thing you can do is truncate the result back to a 32-bit integer. You can use the unsigned right shift operator (>>>) for that, which internally casts its operand to an unsigned 32-bit integer:
var q = (a / b) >>> 0;
See What is the JavaScript >>> operator and how do you use it? for details.
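For illustration, a small hedged example of what that truncation does (the commented outputs are what a standard engine prints):
var a = 7, b = 2;
console.log(a / b);          // 3.5 - the division itself still produces a 64-bit double
console.log((a / b) >>> 0);  // 3   - >>> converts its operand to an unsigned 32-bit integer
console.log((-7 / 2) >>> 0); // 4294967293 - note that >>> is unsigned; use | 0 if you need signed truncation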

Related

What's the maximum precision (after the decimal point) of a float in Javascript

An algorithm I'm using needs to squeeze as many levels of precision as possible from a float number in Javascript. I don't mind whether the precision comes from a number that is very large or with a lot of numbers after the decimal point, I just literally need as many numerals in it as possible.
(If you care why, it is for a drag n' drop ranking algorithm which has to deal with a lot of halvings before rebalancing itself. I do also know there are better string-based algorithms but the numerical approach suits my purposes)
The MDN Docs say that:
The JavaScript Number type is a double-precision 64-bit binary format IEEE 754 value, like double in Java or C#. This means it can represent fractional values, but there are some limits to what it can store. A Number only keeps about 17 decimal places of precision; arithmetic is subject to rounding.
How should I best use the "17 decimal places of precision"?
Does the "17 decimal places" mean "17 numerals in total, inclusive of those before and after the decimal point"?
e.g. (adding underscores to represent thousand-separators for readability)
# 17 numerals: safe
111_222_333_444_555_66
# 17 numerals + decimal point: safe
111_222_333_444_555_6.6
1.11_222_333_444_555_66
# 18 numerals: unsafe
111_222_333_444_555_666
# 18 numerals + decimal point: unsafe
1.11_222_333_444_555_666
111_222_333_444_555_66.6
I assume that the precision of the number determines the number of numerals that you can use and that the position of the decimal point in those numerals is effectively academic.
Am I thinking about the problem correctly?
Does the presence of the decimal point have any bearing on the calculation, or is it simply a matter of the number of numerals present?
Should I assume that 17 numerals is safe / 18 is unsafe?
Does this vary by browser (not just today but over say, a 10 year window, should one assume that browser precision may increase)?
Short answer: you can probably squeeze out 15 "safe" digits, and it doesn't matter where you place your decimal point.
It's anyone's guess how the JavaScript standard is going to evolve and use other number representations.
Notice how the MDN doc says "about 17 decimal places"? Right, it's because sometimes you can represent that many digits, and sometimes fewer. That's because the floating point representation doesn't map 1-to-1 to our decimal system.
Even numbers with seemingly less information can give rounding errors.
For example
0.1 + 0.2 => 0.30000000000000004
console.log(0.1 + 0.2);
However, in this case we have a lot of margin in the precision, so you can just ask for the precision you want in order to get rid of the rounding error:
console.log((0.1 + 0.2).toPrecision(1));
For a larger illustration of this, consider the following snippet:
for (let i = 0; i < 22; i++) {
  console.log(Number.MAX_SAFE_INTEGER / (10 ** i));
}
You will see a lot of rounding errors around digit 16; there are cases where even the 16th digit shows a rounding error. If you look here
https://en.wikipedia.org/wiki/IEEE_754
it states that binary64 has about 15.95 decimal digits of precision. That's why I'd guess that 15 digits is the maximum precision you will get out of this.
You'd have to do your operations, and before you save the number back to any representational form, you'd have to call .toPrecision(15).
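For example (a small sketch; the commented outputs assume a conforming engine):
const x = 0.1 + 0.2;                       // 0.30000000000000004
const clamped = Number(x.toPrecision(15)); // 0.3 - 15 significant digits is enough to hide the rounding error
console.log(clamped);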
Finally, this page has some good explanations: https://floating-point-gui.de/formats/fp/
BTW, I got curious by reading this question so I read up as I wrote this answer. There are many people with better knowledge of this than me.
Does the presence of the decimal point have any bearing on the calculation, or is it simply a matter of the number of numerals present?
Kinda. To answer that, you'll need to look into how 64-bit "double precision" floating point numbers are represented in memory. The "number of numerals" roughly translates into "length of the mantissa", which is indeed fixed and independent of the position of the point. However: these are binary digits and a binary point, not decimal digits and the decimal point. They do not correspond to each other directly. And then there's stuff like subnormal numbers.
Should I assume that 17 numerals is safe / 18 is unsafe?
No. In fact, only 15 decimal numerals would be "safe" if that's the representation you're starting with and want to exactly represent as a double.
Does this vary by browser (not just today but over say, a 10 year window, should one assume that browser precision may increase)?
No, it doesn't vary. The JavaScript number type will always be a 64-bit double.
Am I thinking about the problem correctly?
No.
You say you're considering this in the context of a drag'n'drop ranking algorithm, and you don't want to do this string-based. However, thinking about decimal places in numbers is essentially thinking about the string representation of numbers. Don't do that - either go all the way to strings, or treat numbers as binary.
Since you also mention "rebalancing", I assume you want to use numbers to encode the position of each item in a binary tree. That's a reasonable approach, but you really need to consider the binary representation of the number for that. And you really should use integers there, not floating-point numbers, as the logic would be much more complex otherwise. Start by deciding how many bits you want to use. There are some limitations for each, so choose wisely (a small illustration follows the list below):
31/32 bits are what JS bitwise operators on numbers work with. Easily supported by all browsers.
53 bits are the range of integers you can represent exactly with floating-point numbers. Integer arithmetic works as expected up to that size; bitwise operations require extra code.
Fixed multiples of 8 (say, 64 bits) are what you can represent with typed arrays. Bitwise operations can be done part-wise; arithmetic operations require extra code. Or use a BigUint64Array, which gives you the 64 bits as a bigint to calculate with/operate on, but is not supported in old browsers.
Arbitrary precision can be achieved with bigint numbers, which support both bitwise and arithmetic operations, but again don't work in old browsers. Polyfills and bigint libraries are available though.
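As a rough, hedged illustration of the ranges mentioned in the list above (not part of any particular algorithm):
console.log((2 ** 31) | 0);                     // -2147483648 - bitwise operators wrap at 32 bits
console.log(Number.isSafeInteger(2 ** 53 - 1)); // true  - the largest "safe" integer a double can hold
console.log(Number.isSafeInteger(2 ** 53));     // false
console.log((1n << 64n) - 1n);                  // 18446744073709551615n - BigInt has no such limit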

How does V8 store integers like 5?

How does V8 store integers in memory?
For example the integer 5?
I know it stores it in the heap, but how exactly does it store it?
Things like metadata and the actual value itself.
Is there a constant added to the int before storing it?
V8 uses a pointer tagging scheme to distinguish small integers and heap object pointers. 5 would be stored as a Smi type, which is not heap allocated in V8.
You can check out the source code for the Smi class to learn more.
On 32-bit platforms, Smis are a 31-bit signed int with the bottom bit set to 0.
On 64-bit platforms, Smis are a 32-bit signed int, 31 bits of 0 padding, and a 0 for the bottom bit.
Pointers to heap objects have a 1 set for the bottom bit so that V8 can tell the difference between pointers and Smis without extra metadata.
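As a purely illustrative sketch of that tagging scheme (hypothetical helper values, not actual V8 code):
const value = 5;
const smi32 = value << 1;           // 0b1010 - value in the upper 31 bits, bottom tag bit 0 means "Smi"
const smi64 = BigInt(value) << 32n; // value in the upper 32 bits, lower 32 bits (including the tag bit) are 0
console.log(smi32.toString(2));     // "1010"
console.log(smi64.toString(16));    // "500000000"
console.log(smi32 & 1);             // 0 - a pointer to a heap object would have this bit set to 1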
In JavaScript, all numbers are stored as 64-bit floating point values. C and C++ call this type double. There is no distinct "integer" type.
To some degree, you can use integer values naively and get the result you expect, without having to fear rounding errors. These integers are so-called "safe" integers.
All integers in the range [-(2^53 - 1), +(2^53 - 1)] are "safe" integers, as described here. This means that if you add, subtract or multiply integers in that range, and the result is within that range too, then the calculation is without rounding errors.
Of course, all values in JavaScript/V8 are somehow "boxed", because a variable doesn't have a type (except small integers, which use tagged pointers). If you have a variable x that is 5.25, the engine has to know that it is a "number" and that that number is 5.25. So it will take more than 8 bytes of space. You will have to look up the source code of V8 to find out more.

Description of numeric type in javascript

I am looking to describe how numbers are stored in javascript to a lay person. Would the following statement be accurate:
Very large numbers in javascript are often approximated.
However, precision should be guaranteed to 16 digits.
For example, 123455.373849 can always be represented accurately,
but the number 9,007,199,254,740,991,293 may not be.
Is there a better way to explain it, or any inaccuracies in the above statement?
16 digits? No, not really. Integers of up to 53 bits can be represented accurately, as can every number that can be represented as (53-bit integer) * 2 ** (exponent), where the exponent field is 11 bits wide.
Also, there are no 64-bit integers in JavaScript; there are 64-bit floating point numbers (and only 53 bits of those hold the integer part), and BigInts, which can have far more bits.
Very large numbers in javascript are often approximated.
Kind of: very large integers can only be approximated (unless you use BigInts); however, even small non-integers, e.g. 0.1, cannot be represented exactly either.
For example, 123455.373849 can always be represented accurately
No, probably not.
but the number 9,007,199,254,740,991,293 may not be.
Yup, that's far beyond 2 ** 53 - 1.
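A quick hedged check of those points:
console.log(0.1 + 0.2 === 0.3);                             // false - even "small" decimals are approximated
console.log(9007199254740991293 === 9007199254740991294);   // true  - both literals round to the same double
console.log(9007199254740991293n === 9007199254740991294n); // false - BigInt keeps every digit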

How to implement parseFloat

Wondering how a low-level implementation of parseFloat, such as the one in JavaScript, would work.
All the examples I've seen of typecasting resort to using it at some point, such as this, this, or this. On the other hand, there is this file which is quite large (from here).
Wondering if it is just a very complicated function or whether there is a straightforward implementation, and, if it is complicated, how it works in general terms.
Perhaps this is closer to it.
The essential mathematics of parseFloat is very simple, requiring no more than elementary-school arithmetic. If we have a decimal numeral, we can easily convert it to binary by:
Divide the integer part by two. The remainder (zero or one) becomes a bit in a binary numeral we are building. The quotient replaces the integer part, and we repeat until the integer part is zero. For example, starting with 13, we divide to get a quotient of 6 and a remainder of 1. Then we divide 6 to get a quotient of 3 and a remainder of 0. Then 1 and 1, then 0 and 1, and we are done. The bits we produced, in reverse order, were 1101, and that is the binary numeral for 13.
Multiply the sub-integer part by two. The integer part becomes another bit in the binary numeral. Repeat with the sub-integer part until it is zero or we have enough bits to determine the result. For example, with .1875, we multiply by two to get .375, which has an integer part of 0. Doubling again produces .75, which again has an integer part of 0. Next we get 1.5, which has an integer part of 1. Now when the sub-integer part, .5, is doubled, we get 1 with a sub-integer part of 0. The new bits are .0011.
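A minimal sketch of those two steps (illustrative only - this is not a real parseFloat; the function name is made up, the input is assumed to already fit in a Number, and rounding is ignored):
function toBinaryDigits(intPart, fracPart, maxFracBits = 8) {
  let intBits = "";
  while (intPart > 0) {
    intBits = (intPart % 2) + intBits; // the remainder becomes the next bit, read in reverse order
    intPart = Math.floor(intPart / 2);
  }
  let fracBits = "";
  while (fracPart > 0 && fracBits.length < maxFracBits) {
    fracPart *= 2;
    fracBits += Math.floor(fracPart);  // the integer part of the doubled value is the next bit
    fracPart -= Math.floor(fracPart);
  }
  return (intBits || "0") + (fracBits ? "." + fracBits : "");
}
console.log(toBinaryDigits(13, 0.1875)); // "1101.0011", matching the worked example above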
To determine a floating-point number, we need as many bits as fit in the significand (starting with the leading 1 bit from the binary numeral), and, for rounding purposes, we need to know the next bit and whether any bits after that are non-zero. (The information about the extra bits tells us whether the difference between the source value and the bits that fit in the significand is zero, not zero but less than 1/2 of the lowest bit that fits, exactly 1/2 of the lowest bit, or more than 1/2 of the lowest bit. This information is enough to decide whether to round up or down in any of the usual rounding modes.)
The information above tells you when to stop multiplying in the second part of the algorithm. As soon as you have all the significand bits, plus one more, plus you have either one non-zero bit or the sub-integer part is zero, you have all the information you need and can stop.
Then you construct a floating-point value by rounding the bits according to whatever rounding rule you are using (often round-to-nearest-ties-to-even), putting the bits into the significand of a floating-point object, and setting the exponent to record the position of the leading bit of the binary numeral.
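As a hedged sketch of that rounding step (round-to-nearest, ties-to-even on a normalized bit string; the function name is made up, and renormalization after a carry out of the top bit is omitted):
function roundSignificand(bits, sigBits) {
  const digits = bits.replace(".", "");
  const kept = digits.slice(0, sigBits);                   // the bits that fit in the significand
  const guard = digits[sigBits] || "0";                    // the next bit after the significand
  const sticky = digits.slice(sigBits + 1).includes("1");  // any non-zero bit further out?
  let value = parseInt(kept, 2);
  if (guard === "1" && (sticky || value % 2 === 1)) value += 1; // round up if above 1/2, or exactly 1/2 and odd
  return value; // integer significand; the exponent records where the leading bit was
}
console.log(roundSignificand("1101.0011", 4)); // 13 - the discarded bits are below half of the lowest kept bit
console.log(roundSignificand("1101.1000", 4)); // 14 - exactly half, so round to the even neighbour (1110)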
There are some embellishments for checking for overflow or underflow or handling subnormal values. However, the basic arithmetic is simply elementary-school arithmetic.
Problems arise because the above uses arbitrary-size arrays and because it does not support scientific notation where an “e” is used to introduce a decimal exponent, as in “2.79e34”. The above algorithm requires that we maintain all the space needed to multiply and divide decimal numerals of any length given to us. Usually, we do not want to do that, and we also want faster algorithms. Note that supporting scientific notation with the above algorithm would also require arbitrary-size arrays. To fill out the decimal numeral for “2.79e34”, we have to fill an array with “27900000000000000000000000000000000”.
So algorithms are developed to do the conversion in smarter ways. Instead of doing exact calculations, we may do precise calculations but carefully analyze the errors produced to ensure they are too small to prevent us from getting the right answer. Also, data may be prepared in advance, such as tables with information about powers of ten, so that we have approximate values of powers of ten already in binary without having to compute them each time a conversion is performed.
The complications of converting decimal to binary floating-point arise out of this desire for algorithms that are fast and use limited resources. Allowing some errors creates a need for mathematical proofs to ensure the computations are correct, and trying to make the routines fast and resource-efficient leads people to think of clever techniques to use, which become tricky and require proof.

JavaScript 64 bit numeric precision

Is there a way to represent a number with higher than 53-bit precision in JavaScript? In other words, is there a way to represent 64-bit precision number?
I am trying to implement some logic in which each bit of a 64-bit number represents something. I lose the lower significant bits when I try to set bits higher than 2^53.
Math.pow(2,53) + Math.pow(2,0) == Math.pow(2,53)
Is there a way to implement a custom library or something to achieve this?
Google's Closure library has goog.math.Long for this purpose.
The GWT team has added long emulation support, so Java longs really hold 64 bits. Do you want 64-bit floats or whole numbers?
I'd just use either an array of integers or a string.
Numbers in JavaScript are doubles; I think there is a rounding error involved in your equation.
Perhaps I should have added some technical detail. Basically the GWT long emulation uses a tuple of two numbers, the first holding the high 32 bits and the second the low 32 bits of the 64-bit long.
The library of course contains methods for operations like adding two "longs" and getting a "long" result. Within your GWT Java code it just looks like two regular longs - one doesn't need to fiddle with or be aware of the tuple. By using this approach GWT avoids the problem you're probably alluding to, namely "longs" dropping the lower bits of precision, which isn't acceptable in many cases.
Whilst floats are by definition imprecise / approximations of a value, a whole number like a long isn't. GWT always holds a 64-bit long - maths using such longs never loses precision. The exception to this is overflow, but that accurately matches what occurs in Java etc. when you add two very large long values whose sum requires more than 64 bits - e.g. (2^63 - 1) + (2^63 - 1).
Doing the same for floating point numbers would require a similar approach: you would need a library that uses a tuple.
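A minimal sketch of the integer version of that tuple idea (a hypothetical helper, not GWT's actual code): add two unsigned 64-bit values stored as { hi, lo } pairs of 32-bit words.
function addU64(a, b) {
  const lo = (a.lo + b.lo) >>> 0;         // low word, wrapped to 32 bits
  const carry = lo < a.lo ? 1 : 0;        // the low word wrapped around, so carry into the high word
  const hi = (a.hi + b.hi + carry) >>> 0; // high word; overflow wraps, just like a real unsigned 64-bit add
  return { hi: hi, lo: lo };
}
console.log(addU64({ hi: 0, lo: 0xFFFFFFFF }, { hi: 0, lo: 1 })); // { hi: 1, lo: 0 }, i.e. 0x100000000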
The following code might work for you; I haven't tested it yet, however:
BigDecimal for JavaScript
Yes, 11 bits are reserved for the exponent; only 52 bits contain the value, also called the fraction.
JavaScript allows bitwise operations on numbers, but only the first 32 bits are used in those operations, according to the JavaScript specification.
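For example (hedged, but this follows directly from the 32-bit conversion the spec prescribes for bitwise operands):
console.log((2 ** 32 + 5) | 0); // 5 - the operand is reduced modulo 2^32 before the bitwise operation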
I do not understand the misleading GWT/Java/long answers to a JavaScript/double question, though. JavaScript is not Java.
Why would anyone need 64-bit precision in JavaScript?
Longs sometimes hold IDs of things in a DB, so it's important not to lose some of the lower bits... but floating point numbers are most of the time used for calculations. Using floats to hold monetary or similarly exacting values is plain wrong. If you truly need 64-bit precision, do the maths on the server where it's faster, and so on.
