Description of numeric type in javascript - javascript

I am looking to describe how numbers are stored in javascript to a lay person. Would the following statement be accurate:
Very large numbers in javascript are often approximated.
However, precision should be guaranteed to 16 digits.
For example, 123455.373849 can always be represented accurately,
but the number 9,007,199,254,740,991,293 may not be.
Is there a better way to explain it, or any inaccuracies in the above statement?

16 digits? No, not really. Integers of up to 53 bits can be represented exactly, as can every number of the form (53-bit integer) * 2 ** exponent, where the exponent is stored in 11 bits.
Also, there are no 64bit integers in JavaScript, there are 64bit floating point numbers (and only 53bit of that hold the integer part), and BigInts that can have far more bits.
Very large numbers in javascript are often approximated.
Kind of. Very large integers can only be approximated (unless you use BigInts); however, even small non-integers such as 0.1 cannot be represented exactly either.
For example, 123455.373849 can always be represented accurately
No, probably not.
but the number 9,007,199,254,740,991,293 may not be.
Yup, that's far beyond 2 ** 53 - 1.
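The points above can be checked directly in a console. This is only a minimal sketch; the variable names are illustrative and not from the original answer:

```javascript
// Every integer up to 2 ** 53 - 1 has its own exact double.
const maxSafe = Number.MAX_SAFE_INTEGER;                  // 9007199254740991

// One step further, neighbouring integers share one double.
const collapsed = 9007199254740992 === 9007199254740993;  // true

// Small non-integers are approximated too: the double nearest to 0.1
// is slightly larger than 0.1.
const tenth = (0.1).toFixed(20);                          // "0.10000000000000000555"

// BigInt keeps every digit of the large number from the question.
const exactBig = 9007199254740991293n + 1n;               // 9007199254740991294n

// Written as a Number literal, the same value is rounded to the nearest double.
const roundedBig = 9007199254740991293;
const lostDigits = BigInt(roundedBig) !== 9007199254740991293n; // true

console.log(maxSafe, collapsed, tenth, exactBig, lostDigits);
```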

Related

What's the maximum precision (after the decimal point) of a float in Javascript

An algorithm I'm using needs to squeeze as many levels of precision as possible from a float number in Javascript. I don't mind whether the precision comes from a number that is very large or with a lot of numbers after the decimal point, I just literally need as many numerals in it as possible.
(If you care why, it is for a drag n' drop ranking algorithm which has to deal with a lot of halvings before rebalancing itself. I do also know there are better string-based algorithms but the numerical approach suits my purposes)
The MDN Docs say that:
The JavaScript Number type is a double-precision 64-bit binary format IEEE 754 value, like double in Java or C#. This means it can represent fractional values, but there are some limits to what it can store. A Number only keeps about 17 decimal places of precision; arithmetic is subject to rounding.
How should I best use the "17 decimal places of precision"?
Does the 17 decimal places mean "17 numerals in total, inclusive of those before and after the decimal place"
e.g. (adding underscores to represent thousand-separators for readability)
# 17 numerals: safe
111_222_333_444_555_66
# 17 numerals + decimal point: safe
111_222_333_444_555_6.6
1.11_222_333_444_555_66
# 18 numerals: unsafe
111_222_333_444_555_666
# 18 numerals + decimal point: unsafe
1.11_222_333_444_555_666
111_222_333_444_555_66.6
I assume that the precision of the number determines the number of numerals that you can use and that the position of the decimal point in those numerals is effectively academic.
Am I thinking about the problem correctly?
Does the presence of the decimal point have any bearing on the calculation or is it simply a matter of the number of numerals present
Should I assume that 17 numerals is safe / 18 is unsafe?
Does this vary by browser (not just today but over say, a 10 year window, should one assume that browser precision may increase)?
Short answer: you can probably squeeze out 15 "safe" digits, and it doesn't matter where you place your decimal point.
It's anyone's guess how the JavaScript standard is going to evolve and use other number representations.
Notice how the MDN doc says "about 17 decimals"? Right, it's because sometimes you can represent that many digits, and sometimes less. It's because the floating point representation doesn't map 1-to-1 to our decimal system.
Even numbers with seemingly less information will give rounding errors.
For example
0.1 + 0.2 => 0.30000000000000004
console.log(0.1 + 0.2);
However, in this case we have a lot of margin in the precision, so you can just ask for the precision you want to get rid of the rounding error
console.log((0.1 + 0.2).toPrecision(1));
For a larger illustration of this, consider the following snippet:
for (let i = 0; i < 22; i++) {
  console.log(Number.MAX_SAFE_INTEGER / (10 ** i));
}
You will see a lot of rounding errors on digit 16. However, there would be cases where even the 16th decimal shows a rounding error. If you look here
https://en.wikipedia.org/wiki/IEEE_754
it states that binary 64 has 15.95 decimal digits. That's why I'd guess that 15 digits is the max precision you will get out of this.
You'd have to do your operations, and before you save back the number to any representational form, you'd have to do .toPrecision(15).
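As a rough sketch of that last step (the helper name `to15Digits` is made up for illustration), rounding to 15 significant digits before storing could look like this:

```javascript
// Hypothetical helper: round a computed value to 15 significant digits
// before storing or comparing it.
function to15Digits(x) {
  // toPrecision returns a string, so convert back to a Number.
  return Number(x.toPrecision(15));
}

const raw = 0.1 + 0.2;            // 0.30000000000000004
const cleaned = to15Digits(raw);  // 0.3

console.log(raw, cleaned, cleaned === 0.3);
```

Note that this only tidies the decimal representation; the stored value is still a binary double.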
Finally this has some good explanations. https://floating-point-gui.de/formats/fp/
BTW, I got curious by reading this question so I read up as I wrote this answer. There are many people with better knowledge of this than me.
Does the presence of the decimal point have any bearing on the calculation or is it simply a matter of the number of numerals present
Kinda. To answer that, you'll need to look into how 64bit "double precision" floating point numbers are represented in memory. The "number of numerals" roughly translates into "length of the mantissa", which is indeed fixed and independent from the position of the point. However: it's binary digits and a binary point, not decimal digits and the decimal point. They do not correspond to each other directly. And then there's stuff like subnormal numbers.
Should I assume that 17 numerals is safe / 18 is unsafe?
No. In fact, only 15 decimal numerals would be "safe" if that's the representation you're starting with and want to exactly represent as a double.
Does this vary by browser (not just today but over say, a 10 year window, should one assume that browser precision may increase)?
No, it doesn't vary. The JavaScript number type will always be 64bit doubles.
Am I thinking about the problem correctly?
No.
You say you're considering this in the context of a drag'n'drop ranking algorithm, and you don't want to do this string-based. However, thinking about decimal places in numbers is essentially thinking about string representation of numbers. Don't do that - either go all the way to strings, or treat numbers as binary.
Since you also mention "rebalancing", I assume you want to use numbers to encode the position of each item in a binary tree. That's a reasonable approach, but you really need to consider the binary representation of the number for that. And you really should use integers there, not floating-point numbers, as the logic would be much more complex otherwise. Start by deciding how many bits you want to use. There are some limitations for each, so choose wisely:
31/32 bit are what JS bitwise operators for numbers work on. Supported by all browsers easily.
53 bit are the range of integers you can exactly represent with floating-point numbers. Integer arithmetic will work as expected up to that size. Bitwise operations require extra code.
Fixed multiples of 8 (say, 64 bit) are what you can represent with typed arrays. Bitwise operations can be done part-wise, arithmetic operations require extra code. Or use a BigUint64Array that gives you 64 bits as a bigint to calculate with/operate on, but is not supported in old browsers.
Arbitrary precision can be achieved with bigint numbers, which support both bitwise and arithmetic operations, but again don't work in old browsers. Polyfills and bigint libraries are available though.
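To illustrate the 53-bit integer option, here is a minimal sketch of picking a rank between two neighbours by halving the gap. The function name `rankBetween` and the "return null when the gap is used up" convention are assumptions made for this example, not part of any library:

```javascript
// Hypothetical helper: choose an integer rank strictly between two
// neighbouring ranks, or report that no gap is left.
function rankBetween(lo, hi) {
  if (!Number.isSafeInteger(lo) || !Number.isSafeInteger(hi) || lo >= hi) {
    throw new RangeError("ranks must be safe integers with lo < hi");
  }
  const mid = lo + Math.floor((hi - lo) / 2);
  return mid === lo ? null : mid; // null means: time to rebalance
}

// Worst case: keep inserting directly after the same item.
let hi = Number.MAX_SAFE_INTEGER;
let halvings = 0;
for (let mid = rankBetween(0, hi); mid !== null; mid = rankBetween(0, hi)) {
  hi = mid;
  halvings++;
}
console.log(halvings); // number of inserts before a rebalance is needed
```

With non-negative ranks, this worst case allows about 52 halvings in the same spot before the ranks have to be spread out again.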

How to implement parseFloat

Wondering how a low-level implementation of parseFloat such as how it works in JavaScript would be implemented.
All the examples I've seen of typecasting resort to using it at some point, such as this, this, or this. On the other hand, there is this file which is quite large (from here).
Wondering if it is just a very complicated function or there is a straightforward implementation. Wondering just generally how it works if it is too complicated.
Perhaps this is closer to it.
The essential mathematics of parseFloat is very simple, requiring no more than elementary-school arithmetic. If we have a decimal numeral, we can easily convert it to binary by:
Divide the integer part by two. The remainder (zero or one) becomes a bit in a binary numeral we are building. The quotient replaces the integer part, and we repeat until the integer part is zero. For example, starting with 13, we divide to get a quotient of 6 and a remainder of 1. Then we divide 6 to get a quotient of 3 and a remainder of 0. Then 1 and 1, then 0 and 1, and we are done. The bits we produced, in reverse order, were 1101, and that is the binary numeral for 13.
Multiply the sub-integer part by two. The integer part becomes another bit in the binary numeral. Repeat with the sub-integer part until it is zero or we have enough bits to determine the result. For example, with .1875, we multiply by two to get .375, which has an integer part of 0. Doubling again produces .75, which again has an integer part of 0. Next we get 1.5, which has an integer part of 1. Now when the sub-integer part, .5, is doubled, we get 1 with a sub-integer part of 0. The new bits are .0011.
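The two steps above can be sketched in JavaScript as follows. This is only an illustration of the arithmetic, not how any engine implements parseFloat; the function names are made up, and BigInt is used so that the decimal digits are handled exactly:

```javascript
// Integer part: repeated division by two. The remainders are the bits,
// produced from least significant to most significant.
function integerDigitsToBits(digits) {
  let n = BigInt(digits);
  if (n === 0n) return "0";
  let bits = "";
  while (n > 0n) {
    bits = String(n % 2n) + bits; // prepend, which reverses the order
    n /= 2n;                      // BigInt division truncates
  }
  return bits;
}

// Sub-integer part: repeated doubling. The integer part that appears
// after each doubling is the next bit. The fraction is held exactly as
// num / den, where den = 10 ** (number of decimal digits).
function fractionDigitsToBits(digits, maxBits) {
  let num = BigInt(digits);
  const den = 10n ** BigInt(digits.length);
  let bits = "";
  while (num > 0n && bits.length < maxBits) {
    num *= 2n;
    if (num >= den) {
      bits += "1";
      num -= den;
    } else {
      bits += "0";
    }
  }
  return bits;
}

console.log(integerDigitsToBits("13"));          // "1101"
console.log(fractionDigitsToBits("1875", 64));   // "0011"
console.log(fractionDigitsToBits("1", 8));       // "00011001" (0.1 never terminates)
```

As a cross-check, (13.1875).toString(2) gives "1101.0011" in JavaScript.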
To determine a floating-point number, we need as many bits as fit in the significand (starting with the leading 1 bit from the binary numeral), and, for rounding purposes, we need to know the next bit and whether any bits after that are non-zero. (The information about the extra bits tells us whether the difference between the source value and the bits that fit in the significand is zero, not zero but less than 1/2 of the lowest bit that fits, exactly 1/2 of the lowest bit, or more than 1/2 of the lowest bit. This information is enough to decide whether to round up or down in any of the usual rounding modes.)
The information above tells you when to stop multiplying in the second part of the algorithm. As soon as you have all the significand bits, plus one more, plus you have either one non-zero bit or the sub-integer part is zero, you have all the information you need and can stop.
Then you construct a floating-point value by rounding the bits according to whatever rounding rule you are using (often round-to-nearest-ties-to-even), putting the bits into the significand of a floating-point object, and setting the exponent to record the position of the leading bit of the binary numeral.
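The rounding step can be sketched like this, assuming the bits are given as a string starting with the leading 1 and that the string is long enough to contain every non-zero bit that matters. The name `roundTiesToEven` and the string-based interface are choices made for this example only:

```javascript
// Keep the first p bits as the significand, then use the next bit
// (guard) and "any later bit non-zero" (sticky) to round to nearest,
// ties to even.
function roundTiesToEven(bits, p) {
  const kept = BigInt("0b" + bits.slice(0, p).padEnd(p, "0"));
  const guard = bits[p] === "1";                  // exactly the first dropped bit
  const sticky = bits.slice(p + 1).includes("1"); // anything non-zero after it
  const odd = (kept & 1n) === 1n;
  // Round up when above the halfway point, or exactly halfway and odd.
  // If the result needs p + 1 bits, the caller has to shift it right by
  // one and increase the exponent.
  return guard && (sticky || odd) ? kept + 1n : kept;
}

console.log(roundTiesToEven("1000", 3));  // 4n: nothing dropped
console.log(roundTiesToEven("1001", 3));  // 4n: tie, kept value already even
console.log(roundTiesToEven("1011", 3));  // 6n: tie, rounds to even
console.log(roundTiesToEven("10011", 3)); // 5n: more than half, rounds up
```

For a real double, p would be 53.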
There are some embellishments for checking for overflow or underflow or handling subnormal values. However, the basic arithmetic is simply elementary-school arithmetic.
Problems arise because the above uses arbitrary-size arrays and because it does not support scientific notation where an “e” is used to introduce a decimal exponent, as in “2.79e34”. The above algorithm requires that we maintain all the space needed to multiply and divide decimal numerals of any length given to us. Usually, we do not want to do that, and we also want faster algorithms. Note that supporting scientific notation with the above algorithm would also require arbitrary-size arrays. To fill out the decimal numeral for “2.79e34”, we have to fill an array with “27900000000000000000000000000000000”.
So algorithms are developed to do the conversion in smarter ways. Instead of doing exact calculations, we may do precise calculations but carefully analyze the errors produced to ensure they are too small to prevent us from getting the right answer. Also, data may be prepared in advance, such as tables with information about powers of ten, so that we have approximate values of powers of ten already in binary without having to compute them each time a conversion is performed.
The complications of converting decimal to binary floating-point arise out of this desire for algorithms that are fast and use limited resources. Allowing some errors causes a need for mathematical proofs to ensure the computations are correct, and trying to make the routines fast and resource-efficient leads people to think of clever techniques to use, which become tricky and require proof.

In JavaScript, how do I ensure floating point numbers stay under 32bits?

Obviously numbers in JavaScript aren't explicitly typed, but are represented as types by the interpreter. I just saw a thing about Google's V8 JS engine that said it's greatly optimized for 32 bit numbers, but found it odd that many JS programmers would have a need for doubles even with floating point. The only example I can think of personally is dividing two integers, which I do often in order to normalize screen coordinates between 0 and 1, where the interpreter truncates the result at 64 bits instead of 32. This also seems unlikely to me, but then again I don't know how else someone needing such precision would specify it. So now I'm wondering... is there a way to ensure the quotient of two (not gigantic) integers is under 32 bits in length?
I just saw a thing about Google's V8 JS engine that said it's greatly optimized for 32 bit numbers
This only means that V8 does internally store those numbers as integers when it can deduce that they will stay in the respective range. This is common for counters or array indices, for example.
Is there a way to ensure the quotient of two (not gigantic) integers is under 32 bits in length?
No - all arithmetic operations are carried out as if they were 64 bit floating point numbers (like all numbers in JS). The only thing you can do is to truncate the result back to a 32 bit integer. You'll use the unsigned right shift operator (>>>) for that, which internally casts its operands to integers:
var q = (a / b) >>> 0;
See What is the JavaScript >>> operator and how do you use it? for details.
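A short sketch of what the truncation does in practice. The variable names are illustrative; `Math.fround` is not mentioned in the answer above and is included here only as an aside, since it rounds a value to the nearest 32 bit float rather than to an integer:

```javascript
const a = 7;
const b = 2;

// >>> 0 drops the fraction and interprets the result as an unsigned 32 bit integer.
const asUint32 = (a / b) >>> 0;   // 3

// | 0 does the same but keeps the sign (signed 32 bit integer).
const asInt32 = (-a / b) | 0;     // -3

// With >>> 0, negative results wrap around modulo 2 ** 32.
const wrapped = (-a / b) >>> 0;   // 4294967293

// Aside: Math.fround keeps the fraction but rounds to 32 bit float precision.
const asFloat32 = Math.fround(1 / 3);
const lostPrecision = asFloat32 !== 1 / 3; // true

console.log(asUint32, asInt32, wrapped, asFloat32, lostPrecision);
```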

JavaScript floating point

I was wondering what the floating point limitations of JavaScript were. I have the following code that doesn't return a detailed floating point result as in:
2e-2 is the same as 0.02
var numero = 1E-12;
document.write(numero);
returns 1e-12.
What is the max exponential result that JavaScript can handle?
JavaScript is an implementation of ECMAScript, specified in Ecma-262 and ISO/IEC 16262. Ecma-262 specifies that IEEE 754 64-bit binary floating point is used.
In this format, the smallest positive number is 2^-1074 (slightly over 4.94e-324), and the largest finite number is 2^1024 - 2^971 (slightly under 1.798e308). Infinity can be represented, so, in this sense, there is no upper limit to the value of a number in JavaScript.
Numbers in this format have at most 53 bits in their significands (fraction parts). (Numbers under 2^-1022 are subnormal and have fewer bits.) The limited number of bits means that many numbers are not exactly representable, including the .02 in your example. Consequently, the results of arithmetic operations are rounded to the nearest representable values, and errors in chains of calculations may cancel or may accumulate, even catastrophically.
The format also includes some special entities called NaNs (for Not a Number). NaNs may be used to indicate that a number has not been initialized, for special debugging purposes, or to represent the result of an operation for which no number is suitable (such as the square root of –1).
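These limits are exposed as constants on `Number`, so they can be inspected directly. A small sketch, with variable names chosen only for this example:

```javascript
const smallest = Number.MIN_VALUE;        // 5e-324, the smallest positive (subnormal) number
const underflow = Number.MIN_VALUE / 2;   // 0: there is nothing representable in between
const largest = Number.MAX_VALUE;         // 1.7976931348623157e+308
const overflow = Number.MAX_VALUE * 2;    // Infinity

// Rounding to the nearest representable value shows up in simple sums.
const sumIsExact = 0.1 + 0.2 === 0.3;     // false

// NaN results from operations with no suitable numeric answer,
// and it compares unequal even to itself.
const notANumber = Math.sqrt(-1);
const nanIsUnequal = notANumber !== notANumber; // true

console.log(smallest, underflow, largest, overflow, sumIsExact, nanIsUnequal);
```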
The maximum that is less than zero..?
There is a detailed discussion here. Basically, a number smaller than 0.000001 (six or more zeroes after the decimal point) will be displayed in exponential notation.
Of course, you could test this for yourself ;) particularly to see if this behaviour is reliable cross-browser.
Personally, I don't think you should be concerned, or rely, on this behaviour. When you come to display a number just format it appropriately.
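A quick way to test the display behaviour and to format explicitly instead of relying on the default. This is only a sketch with illustrative variable names:

```javascript
// Default string conversion switches to exponential notation for small values.
const sixPlaces = String(0.000001);     // "0.000001"
const sevenPlaces = String(0.0000001);  // "1e-7"

// Explicit formatting avoids depending on that switch.
const fixed = (1e-12).toFixed(12);      // "0.000000000001"
const sci = (0.02).toExponential(1);    // "2.0e-2"

console.log(sixPlaces, sevenPlaces, fixed, sci);
```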

Convert large number in javascript

After pasting the number t=3.7333333258105216E16 in jsconsole.com or in Web Inspector, I get 37333333258105220.
parseFloat(3.7333333258105216E16) gives the same result.
What is the reason ?
JavaScript represents numbers as floats. This storage format consists of 64 bits. One bit is for the sign, 11 bits are for the power of 2 to multiply the number by, and 52 bits are for the number itself (the significand).
Because of the above, numbers can be accurate to 1/2^52, or 1 / 4,503,599,627,370,496, relative to their size. Check out this wikipedia page for more information on floating point numbers.
I tested this out by trying to add one to 4,503,599,627,370,495. It gets to 4,503,599,627,370,496, but does not get past it. Here's the fiddle for testing.
You are encountering floating-point roundoff. JavaScript numbers are implemented as double precision, 64-bit floats according to the IEEE 754 standard.
You can't always accurately represent a floating point decimal number in binary. It is losing precision at the end of the number so it can fit in 64 bits.
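For the specific number in the question, a small sketch (variable names are illustrative) shows what is stored versus what is printed. At this magnitude doubles are spaced 8 apart, and the default conversion prints the shortest digit string that still identifies the stored double:

```javascript
const t = 3.7333333258105216E16;

// Default conversion prints the shortest string that round-trips.
const printed = String(t);                    // "37333333258105220"

// Converting the stored double to BigInt reveals its exact value.
const stored = BigInt(t);                     // 37333333258105216n

// Both spellings parse to the same double.
const sameDouble = 37333333258105220 === 37333333258105216; // true

// The value is outside the range where every integer is distinct.
const safe = Number.isSafeInteger(t);         // false

console.log(printed, stored, sameDouble, safe);
```

If every digit matters, keeping the value as a string or a BigInt avoids the issue.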
