Adding to Number.MAX_VALUE - javascript

The answer to this question may be painfully obvious but I can't find it in the Mozilla docs nor on Google from a cursory search.
If you have some code like this
Number.MAX_VALUE + 1; // Infinity, right?
Number.MIN_VALUE - 1; // -Infinity, right?
I would expect that adding anything to Number.MAX_VALUE would push it over to Infinity, but the result is just Number.MAX_VALUE spat right back at me.
However, when playing around in the Chrome JS console, I noticed that it didn't actually become Infinity until I added/subtracted enough:
Number.MAX_VALUE + Math.pow(100,1000); // now we hit Infinity
Number.MIN_VALUE - Math.pow(100,1000); // -Infinity at last
What is the explanation for this "buffer" between Number.MAX_VALUE and Infinity?

Standardwise...
In ECMAScript, addition of two nonzero finite numbers is implemented as (ECMA-262 §11.6.3 "Applying the Additive Operators to Numbers"):
the sum is computed and rounded to the nearest representable value using IEEE 754 round-to-nearest mode. If the magnitude is too large to represent, the operation overflows and the result is then an infinity of appropriate sign.
IEEE-754's round-to-nearest mode specifies that (IEEE-754 2008 §4.3.1 "Rounding-direction attributes to nearest")
In the following two rounding-direction attributes, an infinitely precise result with magnitude at least b^emax (b − ½ b^(1−p)) shall round to ∞ with no change in sign; here emax and p are determined by the destination format (see 3.3). With:
roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered
roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.
ECMAScript does not specify which round-to-nearest variant is used, but it doesn't matter here because both give the same result. The Number type in ECMAScript is IEEE-754 "double", in which
b = 2
emax = 1023
p = 53,
so the result must be at least 2^1024 - 2^970 ~ 1.7976931348623158 × 10^308 in order to round to Infinity. Otherwise it will just round to MAX_VALUE, because that is closer than Infinity.
Notice that MAX_VALUE = 2^1024 - 2^971, so you need to add at least 2^971 - 2^970 = 2^970 ~ 9.979202 × 10^291 in order to get Infinity. We could check:
>>> Number.MAX_VALUE + 9.979201e291
1.7976931348623157e+308
>>> Number.MAX_VALUE + 9.979202e291
Infinity
Meanwhile, your Math.pow(100,1000) ~ 2^6643.9 is well beyond 2^1024 - 2^970; it already overflows to Infinity on its own.

If you look at Number.MAX_VALUE.toString(2), you'll see that the binary representation of MAX_VALUE is 53 ones followed by 971 zeros. This is because IEEE 754 floating point numbers are made of a mantissa (coefficient) multiplied by a power of 2 (the other half of the floating point number is the exponent). With MAX_VALUE, both the mantissa and the exponent are maxed out, so you see a bunch of ones bit-shifted up a lot.
In short, you need to increase MAX_VALUE by enough to actually affect the mantissa; otherwise your additional value gets lost and rounded away.
Math.pow(2, 969) is the largest power of 2 that will not tip MAX_VALUE into Infinity; Math.pow(2, 970) is the smallest one that will.
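Both claims are easy to check in a console; the comments below show what any conforming IEEE-754 double implementation produces:
var bits = Number.MAX_VALUE.toString(2);
bits.length; // 1024
/^1{53}0{971}$/.test(bits); // true: 53 ones followed by 971 zeros
Number.MAX_VALUE + Math.pow(2, 969); // 1.7976931348623157e+308, still MAX_VALUE
Number.MAX_VALUE + Math.pow(2, 970); // Infinity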

Related

Javascript Round error

I have a problem with rounding numbers.
x = 0.175;
console.log(x.toFixed(2));
// RESULT: 0.17
x = 1.175;
console.log(x.toFixed(2));
// RESULT: 1.18
x = 2.175;
console.log(x.toFixed(2));
// RESULT: 2.17
Why is X.175 (for X != 1) not rounded to X.18?
The problem here is that 0.175 is a repeating decimal in binary (specifically, after a short prefix, it settles down to a repeating 0011 pattern). When represented in a finite floating point representation, this repeating pattern gets truncated. When you change the integer part from 0 to 1 to 2, you are adding one additional bit each time to the integer part of the number, which pushes off one trailing bit. Depending on what bit value gets pushed off, that can change the rounded value enough to affect the visible result. Note that after 2.175, the next change in rounding behavior doesn't occur until 8.175 (after two more low-order bits have been pushed off the representation).
This is the reason behind it:
Squeezing infinitely many real numbers into a finite number of bits requires an approximate representation.
Although there are infinitely many integers, in most programs the result of integer computations can be stored in 32 bits.
In contrast, given any fixed number of bits, most calculations with real numbers will produce quantities that cannot be exactly represented using that many bits. Therefore the result of a floating-point calculation must often be rounded in order to fit back into its finite representation.
x = 0.175;
console.log(x.toFixed(20));
// RESULT: 0.17499999999999998890
x = 1.175;
console.log(x.toFixed(20));
// RESULT: 1.17500000000000004441
x = 2.175;
console.log(x.toFixed(20));
// RESULT: 2.17499999999999982236
This rounding error is the characteristic feature of floating-point computation.
Source : http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
JavaScript has plenty of rounding problems; they're the result of binary machines trying to represent decimal fractions. There are always inaccuracies: sometimes a 5 is rounded up, and other times it is rounded down. It's talked about in these articles and topics:
http://www.jacklmoore.com/notes/rounding-in-javascript/
Avoiding problems with JavaScript's weird decimal calculations
How to deal with floating point number precision in JavaScript?
Even more precise control over the floating-point representation in JavaScript doesn't fix the issue:
> x=2175e-3; x.toFixed(2);
"2.17"
> x=1175e-3; x.toFixed(2);
"1.18"
In cases where it's super important to get predictable results, at least one of these articles suggests using a technique called "epsilon estimation," which is actually at the heart of several definitions in calculus. To learn that fix is probably to learn a lot more than you bargained for.
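As a rough sketch of that idea (the helper below and its choice of tolerance are illustrative, not a standard API):
// Compare with a relative tolerance instead of ===
function nearlyEqual(a, b, tolerance) {
  tolerance = tolerance || Number.EPSILON;
  // scale the tolerance by the larger magnitude so the comparison is relative
  return Math.abs(a - b) <= tolerance * Math.max(Math.abs(a), Math.abs(b));
}
nearlyEqual(0.1 + 0.2, 0.3); // true, even though 0.1 + 0.2 === 0.3 is false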
JavaScript Numbers are Always 64-bit Floating Point.
Unlike many other programming languages, JavaScript does not define different types of numbers, like integers, short, long, floating-point etc.
JavaScript numbers are always stored as double precision floating point numbers, following the international IEEE 754 standard.
The maximum number of decimals is 17, but floating-point arithmetic is not always 100% accurate.
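For example (the classic case, in any IEEE-754 engine):
var x = 0.2 + 0.1; // 0.30000000000000004, not 0.3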

javascript: why is a multiply returning so many decimals? [duplicate]

I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid.
The general question is:
For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be represented for 16-, 32- and 64-bit IEEE-754 systems?
Specifically, I'm only interested in the range of 16-bit and 32-bit numbers accurate to +/-0.5 (the ones place) or +/- 0.0005 (the thousandths place).
For a given IEEE-754 floating point number X, if
2^E <= abs(X) < 2^(E+1)
then the distance from X to the next largest representable floating point number (epsilon) is:
epsilon = 2^(E-52) % For a 64-bit float (double precision)
epsilon = 2^(E-23) % For a 32-bit float (single precision)
epsilon = 2^(E-10) % For a 16-bit float (half precision)
The above equations allow us to compute the following:
For half precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^10. Any X larger than this limit leads to the distance between floating point numbers greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 1. Any X larger than this maximum limit leads to the distance between floating point numbers greater than 0.0005.
For single precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^23. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^13. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.0005.
For double precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^52. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^42. Any X larger than this limit leads to the distance between floating point numbers being greater than 0.0005.
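The gap formulas above can be evaluated directly in JavaScript for the double-precision case; this little helper is only a sketch and only handles normal (non-subnormal), nonzero values:
// distance from x to the next representable double
function ulp(x) {
  var E = Math.floor(Math.log2(Math.abs(x)));
  return Math.pow(2, E - 52);
}
ulp(1); // 2.220446049250313e-16, i.e. Number.EPSILON
ulp(Math.pow(2, 52)); // 1 -- integers between 2^52 and 2^53 are 1 apart
ulp(Math.pow(2, 53)); // 2 -- beyond 2^53 only every 2nd integer is exact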
For floating-point integers (I'll give my answer in terms of IEEE double-precision), every integer between 1 and 2^53 is exactly representable. Beyond 2^53, integers that are exactly representable are spaced apart by increasing powers of two. For example:
Every 2nd integer between 2^53 + 2 and 2^54 can be represented exactly.
Every 4th integer between 2^54 + 4 and 2^55 can be represented exactly.
Every 8th integer between 2^55 + 8 and 2^56 can be represented exactly.
Every 16th integer between 2^56 + 16 and 2^57 can be represented exactly.
Every 32nd integer between 2^57 + 32 and 2^58 can be represented exactly.
Every 64th integer between 2^58 + 64 and 2^59 can be represented exactly.
Every 128th integer between 2^59 + 128 and 2^60 can be represented exactly.
Every 256th integer between 2^60 + 256 and 2^61 can be represented exactly.
Every 512th integer between 2^61 + 512 and 2^62 can be represented exactly.
...
Integers that are not exactly representable are rounded to the nearest representable integer, so the worst case rounding is 1/2 the spacing between representable integers.
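This is easy to see in a console (the results are what any IEEE-754 double engine gives):
Math.pow(2, 53); // 9007199254740992
Math.pow(2, 53) + 1; // 9007199254740992 -- the +1 is lost to rounding
Math.pow(2, 53) + 2; // 9007199254740994 -- every 2nd integer is exact here
Math.pow(2, 54) + 2; // 18014398509481984 -- spacing is now 4, so the +2 is lost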
The precision quoted from Peter R's link to the MSDN ref is probably a good rule of thumb, but of course reality is more complicated.
The fact that the "point" in "floating point" is a binary point and not decimal point has a way of defeating our intuitions. The classic example is 0.1, which needs a precision of only one digit in decimal but isn't representable exactly in binary at all.
If you have a weekend to kill, have a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic. You'll probably be particularly interested in the sections on Precision and Binary to Decimal Conversion.
First off, IEEE 754-1985 has no 16-bit floats; the 2008 revision added a 16-bit "half precision" format with a 5-bit exponent and a 10-bit fraction. IEEE 754 uses a dedicated sign bit, so the positive and negative ranges are the same. Also, the fraction has an implied 1 in front, so you get an extra bit.
If you want accuracy to the ones place, as in you can represent each integer, the answer is fairly simple: the exponent shifts the binary point to the right end of the fraction, so a 10-bit fraction gets you ±2^11.
If you want one bit after the binary point, you give up one bit before it, so you have ±2^10.
Single precision has a 23-bit fraction, so you'd have ±2^24 integers.
How many bits of precision you need after the decimal point depends entirely on the calculations you're doing, and how many you're doing.
2^10 = 1,024
2^11 = 2,048
2^23 = 8,388,608
2^24 = 16,777,216
2^53 = 9,007,199,254,740,992 (double-precision)
2^113 = 10,384,593,717,069,655,257,060,992,658,440,192 (quad-precision)
See also
Double-precision
Half-precision
See IEEE 754-1985: a single-precision value has 1 sign bit, 8 exponent bits and 23 fraction bits, and a normalized value is interpreted as ±(1 + fraction) × 2^(exponent - 127).
Note the (1 + fraction). As #bendin points out, using binary floating point you cannot express simple decimal values such as 0.1, and the implication is that you can introduce rounding errors by doing simple additions many, many times or by calling things like truncation. If you are interested in any sort of precision whatsoever, the only way to achieve it is to use a fixed-point decimal, which is basically a scaled integer.
If I understand your question correctly, it depends on your language.
For C#, check out the MSDN ref: float has 7 digits of precision, and double 15-16 digits.
It took me quite a while to figure out that when using doubles in Java, I wasn't losing significant precision in calculations. Floating point actually has a very good ability to represent numbers to quite reasonable precision. The precision I was losing was immediately upon converting decimal numbers typed by users to the binary floating point representation that is natively supported. I've recently started converting all my numbers to BigDecimal. BigDecimal is much more work to deal with in the code than floats or doubles, since it's not one of the primitive types. But on the other hand, I'll be able to exactly represent the numbers that users type in.

Floating-point error mess

I have been trying to figure this floating-point problem out in javascript.
This is an example of what I want to do:
var x1 = 0;
for (var i = 0; i < 10; i++) {
    x1 += 0.2;
}
However, in this form I get a rounding error: 0.2 -> 0.4 -> 0.6000000000000001, and so on.
I have tried parseFloat, toFixed and Math.round suggested in other threads, but none of them have worked for me. So is there anyone who could make this work? I feel that I have run out of options.
You can almost always ignore the floating point "errors" while you're performing calculations - they won't make any difference to the end result unless you really care about the 17th significant digit or so.
You normally only need to worry about rounding when you display those values, for which .toFixed(1) would do perfectly well.
Whatever happens, you simply cannot coerce the number 0.6 into exactly that value. The closest IEEE 754 double precision value is exactly 0.59999999999999997779553950749686919152736663818359375, which within JS's typical precision limits is displayed as 0.5999999999999999778.
Indeed, JS can't even tell that 0.5999999999999999778 !== (e.g.) 0.5999999999999999300, since their binary representation is the same.
To better understand how the rounding errors accumulate, and to get more insight into what is happening at a lower level, here is a small explanation:
I will assume that IEEE 754 double precision standard is used by underlying software/hardware, with default rounding mode (round to nearest even).
1/5 can be written in base 2 only with an infinitely repeating pattern:
0.00110011001100110011001100110011001100110011001100110011...
But in floating point, the significand - starting at the most significant 1 bit - has to be rounded to a finite number of bits (53).
So there is a small rounding error when representing 0.2 in binary:
0.0011001100110011001100110011001100110011001100110011010
Back to decimal representation, this rounding error corresponds to a small excess 0.000000000000000011102230246251565404236316680908203125 above 1/5
The first operation is then exact, because 0.2+0.2 is like 2*0.2 and thus does not introduce any additional error; it's like shifting the binary point:
  0.0011001100110011001100110011001100110011001100110011010
+ 0.0011001100110011001100110011001100110011001100110011010
---------------------------------------------------------
  0.0110011001100110011001100110011001100110011001100110100
But of course, the excess above 2/5 is doubled 0.00000000000000002220446049250313080847263336181640625
The third operation 0.2+0.2+0.2 will result in this binary number
  0.011001100110011001100110011001100110011001100110011010
+ 0.0011001100110011001100110011001100110011001100110011010
---------------------------------------------------------
  0.1001100110011001100110011001100110011001100110011001110
But unfortunately, it requires 54 bits of significand (the span between leading 1 and trailing 1), so another rounding error is necessary to represent the result as a double:
0.10011001100110011001100110011001100110011001100110100
Notice that the number was rounded upward, because by default floats are rounded to nearest even in case of a perfect tie. We already had an error by excess, so bad luck: the successive errors accumulated rather than cancelled out...
So the excess above 3/5 is now 0.000000000000000088817841970012523233890533447265625
You could reduce this accumulation of errors a bit by using
x1 = i / 5.0
Since 5 is represented exactly in float (101.0 in binary, 3 significand bits are enough), and since that will also be the case of i (up to 2^53), there is a single rounding error when performing the division, and IEEE 754 then guarantees that you get the nearest possible representation.
For example 3/5.0 is represented as:
0.10011001100110011001100110011001100110011001100110011
Back in decimal, this value falls 0.00000000000000002220446049250313080847263336181640625 short of 3/5.
Note that both errors are very tiny, but in the second case (3/5.0) the error is four times smaller in magnitude than with 0.2+0.2+0.2.
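Putting the two approaches side by side in a console (assuming the same IEEE-754 double arithmetic as above):
var byRepeatedAddition = 0;
for (var i = 0; i < 3; i++) {
    byRepeatedAddition += 0.2;
}
byRepeatedAddition; // 0.6000000000000001
3 / 5; // 0.6
byRepeatedAddition === 3 / 5; // false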
Depending on what you're doing, you may want to do fixed-point arithmetic instead of floating point. For example, if you are doing financial calculations in dollars with amounts that are always multiples of $0.01, you can switch to using cents internally, and then convert to (and from) dollars only when displaying values to the user (or reading input from the user). For more complicated scenarios, you can use a fixed-point arithmetic library.
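A minimal sketch of that idea for dollars and cents (the function names here are purely illustrative):
// work in integer cents so every amount is exactly representable
function toCents(dollars) {
  return Math.round(dollars * 100);
}
function addDollars(a, b) {
  return (toCents(a) + toCents(b)) / 100;
}
0.1 + 0.2; // 0.30000000000000004
addDollars(0.1, 0.2); // 0.3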

Add a bunch of floating-point numbers with JavaScript, what is the error bound on the sum?

When I add a bunch of floating-point numbers with JavaScript, what is the error bound on the sum? What error bound should be used to check if two sums are equal?
In a simple script, I add a bunch of floating-point numbers and compare sums. I notice that sometimes the result is not correct (two sums that should be equal are not). I am pretty weak at numerical analysis, but even after reviewing Is floating point math broken? and What Every Computer Scientist Should Know About Floating-Point Arithmetic and Comparing Floating Point Numbers, 2012 Edition I am confused about how best to compare floating-point sums in JavaScript.
First, I was confused by: The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded (as if they were computed exactly then rounded to the nearest floating-point number). If JavaScript is based on the IEEE standard, how can 0.1 + 0.2 != 0.3?
I think I answered this for myself: It's easier for me to think about an example in base 10. If 1/3 is approximated 0.333...333 and 2/3 is approximated 0.666...667, 1/3 + 1/3 = 0.666...666 is exactly rounded (it is the exact sum of two approximations) but != 0.666...667. Intermediate results of exactly rounded operations are still rounded, which can still introduce error.
How big is machine epsilon? JavaScript floating-point numbers are apparently 64-bits, and apparently IEEE double precision format machine epsilon is about 1e-16?
When I add a bunch (n) of floating-point numbers (naive summation, without pairwise or Kahan summation), what is the error bound on the sum? Intuitively it is proportional to n. The worst-case example I can think of (again in base 10) is 2/3 - 1/3 - 1/3 + 2/3 - 1/3 - 1/3 + etc. I think each iteration will increment the error term by 1 ULP while the sum remains zero, so both the error term and relative error will grow without bound?
In the section "Errors in Summation" Goldberg is more precise (error term is bounded by n * machine epsilon * sum of the absolute values) but also points out that if the sum is being done in an IEEE double precision format, machine epsilon is about 1e-16, so n * machine epsilon will be much less than 1 for any reasonable value of n (n much less than 1e16). How can this error bound be used to check if two floating-point sums are equal? What relationship between the sums, 1, 1e-16, n, etc. must be true if they are equal?
Another intuition: If the bunch of numbers are all positive (mine are) then although the error term can grow without bound, the relative error will not, because the sum must grow at the same time. In base 10, the worst-case example I can think of (in which the error term grows fastest while the sum grows slowest) is if 1.000...005 is approximated 1.000...000. Repeatedly adding this number will increment the error term by 1/2 ULP (of the summand, 0.000...005) while incrementing the sum by 1 first place unit. The worst relative error is 4.5 ULP (0.000...045, when the sum is 9.000...000) which is (base - 1) / 2 ULP which is 1/2 ULP in base 2?
If two floating-point sums are equal, then their absolute difference must be less than twice the error bound, which is 1 ULP in base 2? So in JavaScript, Math.abs(a - b) < a * 1e-16 + b * 1e-16?
Comparing Floating Point Numbers, 2012 Edition describes another technique for comparing floating-point numbers, also based on relative error. In JavaScript, is it possible to find the number of representable numbers between two floating-point numbers?
The maximum possible error in the sum of n numbers added consecutively is proportional to n^2, not to n.
The key reason for this is that each addition may have some error proportional to its sum, and those sums keep growing as more additions are made. In the worst case, the sums grow in proportion to n (if you add n x's together, you get nx). So, in the end, there are n sums that have grown in proportion to n, yielding a total possible error proportional to n^2.
JavaScript is specified by the ECMA Language Specification, which says that IEEE-754 64-bit binary floating-point is used and round-to-nearest mode is used. I do not see any provision allowing extra precision as some languages do.
Suppose all numbers have magnitude at most b, where b is some representable value. If your numbers have a distribution that can be characterized more specifically, then an error bound tighter than described below might be derived.
When the exact mathematical result of an operation is y, and there is no overflow, then the maximum error in IEEE-754 binary floating-point with round-to-nearest mode is 1/2 ULP(y), where ULP(y) is the distance between the two representable values just above and below y in magnitude (using y itself as the “above” value if it is exactly representable). This is the maximum error because y is always either exactly on the midpoint between two bordering values or is on one side or the other, so the distance from y to one of the bordering values is at most the distance from the midpoint to a bordering value.
(In IEEE-754 64-bit binary, the ULP of all numbers less than 2^-1022 in magnitude is 2^-1074. The ULP of all larger powers of two is 2^-52 times the number; e.g., 2^-52 for 1. The ULP for non-powers of two is the ULP of the largest power of two smaller than the number, e.g., 2^-52 for any number above 1 and below 2.)
When the first two numbers in a series are added, the exact result is at most 2b, so the error in this first addition is at most 1/2 ULP(2b). When the third number is added, the result is at most 3b, so the error in this addition is at most 1/2 ULP(3b). The total error so far is at most 1/2 (ULP(2b) + ULP(3b)).
At this point, the addition could round up, so the partial sum so far could be slightly more than 3b, and the next sum could be slightly more than 4b. If we want to compute a strict bound on the error, we could use an algorithm such as:
Let bound = 0.
For i = 2 to n:
    bound += 1/2 ULP(i*b + bound).
That is, for each of the additions that will be performed, add an error bound that is 1/2 the ULP of the largest conceivable result given the actual values added plus all the previous errors. (The pseudo-code above would need to be implemented with extended precision or with rounding upward in order to retain mathematical rigor.)
Thus, given only the number of numbers to be added and a bound on their magnitudes, we can pre-compute an error bound without knowing their specific values in advance. This error bound will grow in proportion to n^2.
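A non-rigorous JavaScript sketch of that procedure (as noted above, a rigorous version would need extended precision or upward rounding; ulp() here is the usual gap between normal doubles, and the function names are ours):
function ulp(x) {
  return Math.pow(2, Math.floor(Math.log2(Math.abs(x))) - 52);
}
// pre-compute an error bound for naively summing n values of magnitude <= b
function naiveSumErrorBound(n, b) {
  var bound = 0;
  for (var i = 2; i <= n; i++) {
    // half an ULP of the largest conceivable partial sum so far
    bound += 0.5 * ulp(i * b + bound);
  }
  return bound;
}
naiveSumErrorBound(1000, 1); // a bound for summing 1000 values of magnitude <= 1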
If this potential error is too high, there are ways to reduce it:
Instead of adding numbers consecutively, they can be split in half, and the sums of the two halves can be added. Each of the halves can be recursively summed in this way. When this is done, the maximum magnitudes of the partial sums will be smaller, so the bounds on their errors will be smaller. E.g., with consecutive additions of 1, we have sums 2, 3, 4, 5, 6, 7, 8, but, with this splitting, we have parallel sums of 2, 2, 2, 2, then 4, 4, then 8.
We can sort the numbers and keep the sums smaller by adding numbers that cancel each other out (complementary positive and negative numbers) or adding smaller numbers first.
The Kahan summation algorithm can be employed to get some extended precision without much extra effort.
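For reference, a compact sketch of Kahan summation (the standard algorithm, with illustrative names):
function kahanSum(numbers) {
  var sum = 0;
  var c = 0; // running compensation for lost low-order bits
  for (var i = 0; i < numbers.length; i++) {
    var y = numbers[i] - c; // apply the correction to the next term
    var t = sum + y;
    c = (t - sum) - y; // recover what this addition just lost
    sum = t;
  }
  return sum;
}
var tenFifths = new Array(10).fill(0.2);
tenFifths.reduce(function (a, b) { return a + b; }, 0); // 1.9999999999999998
kahanSum(tenFifths); // 2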
Considering one particular case:
Consider adding n non-negative numbers, producing a calculated sum s. Then the error in s is at most (n-1)/2 • ULP(s).
Proof: Each addition has error at most 1/2 ULP(x), where x is the calculated value. Since we are adding non-negative values, the accumulating sum never decreases, so it is never more than s, and its ULP is at most the ULP of s. So the n-1 additions produce at most n-1 errors of ULP(s)/2.

parseInt rounds incorrectly

I stumbled upon this issue with parseInt and I'm not sure why this is happening.
console.log(parseInt("16980884512690999")); // gives 16980884512691000
console.log(parseInt("169808845126909101"));​ // gives 169808845126909100
I'm clearly not hitting any number limits in JavaScript
(Number.MAX_VALUE = 1.7976931348623157e+308)
Running Win 7 64 bit if that matters.
What am I overlooking?
Don't confuse Number.MAX_VALUE with the maximum accurate value. All numbers in JavaScript are stored as 64-bit floating point, which means you can get high (and low) numbers, but they'll only be accurate to a certain point.
Double floating points (i.e. JavaScript's) have 53 bits of significand precision, which means the highest/lowest "certainly accurate" integer in JavaScript is +/-9007199254740992 (2^53). Numbers above/below that may turn out to be accurate (the ones that simply add 0's on the end, because the exponent bits can be used to represent that).
Or, in the words of ECMAScript: "Note that all the positive and negative integers whose magnitude is no greater than 2^53 are representable in the Number type (indeed, the integer 0 has two representations, +0 and −0)."
Update
Just to add a bit to the existing question, the ECMAScript spec requires that if an integral Number has less than 22 digits, .toString() will output it in standard decimal notation (e.g. 169808845126909100000 as in your example). If it has 22 or more digits, it will be output in normalized scientific notation (e.g. 1698088451269091000000 - an additional 0 - is output as 1.698088451269091e+21).
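For example, on either side of that boundary:
(1e20).toString(); // "100000000000000000000" -- 21 digits, decimal notation
(1e21).toString(); // "1e+21" -- 22 digits would be needed, so scientific notation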
From this answer
All numbers in JavaScript are 64-bit "double" precision IEEE 754
floating point.
The largest positive whole number that can therefore be accurately
represented is 2^53. The remaining bits are reserved for the exponent.
2^53 = 9007199254740992
