What is the size (in memory) of a number?

What is the size (in memory) of a number? - javascript

What is the size of a number in JavaScript?
For example, I know a single char in C is 1 byte. The size of an int is sizeof(int). The size on an int64_t is 64 bits, and so on.
What (and how to find it) is the size of a number (decimal, float) in JavaScript?

You can't determine memory size of number value size in JS. It is engine-specific and can be different between different values in same engine. ECMAScript standard (e.g. ECMA-262) only defines observable behavior of numbers, but as long as behavior matches specification in the end, different JS VMs use all kinds of different number types under the hood for optimization purposes.
Standard sets no limits on what engines can use and defines no method to retrieve those implementation details. Nor any other part of spec relies on anything except observable behavior again. You can check out engine-specific details in its documentation or try engine-specific internals debugging tools, but you can't get this size data from JS code itself.

Aside from what's mentioned in other answers, the reality is that modern engines use various optimizations, including storing numbers in various different methods (types...) depending on usage. This is one of the main ideas behind things like asm.js, and just to provide a simple example:
var i = 0;
while(i < 5) {
console.log('hello');
i++;
}
The engine can infer that i is an integer and optimize it's usage.

Related

Is an array of ints actually implemented as an array of ints in JavaScript / V8?

There is claim in this article that an array of ints in JavaScript is implemented by a C++ array of ints.
However; According to MDN unless you specifically use BigInts, in JavaScript all numbers are repressed as doubles.
If I do:
cont arr = [0, 1, 2, 3];
What is the actual representation in the V8 engine?
The code for V8 is here on github, but I don't know where to look:

(V8 developer here.)
"C++ array of ints" is a bit of a simplification, but the key idea described in that article is correct, and an array [0, 1, 2, 3] will be stored as an array of "Smis".
What's a "Smi"? While every Number in JavaScript must behave like an IEEE754 double, V8 internally represents numbers as "small integer" (31 bits signed integer value + 1 bit tag) when it can, i.e. when the number has an integral value in the range -2**30 to 2**30-1, to improve efficiency. Engines can generally do whatever they want under the hood, as long as things behave as if the implementation followed the spec to the letter. So when the spec (or MDN documentation) says "all Numbers are doubles", what it really means from the engine's (or an engine developer's) point of view is "all Numbers must behave as if they were doubles".
When an array contains only Smis, then the array itself keeps track of that fact, so that values loaded from such arrays know their type without having to check. This matters e.g. for a[i] + 1, where the implementation of + doesn't have to check whether a[i] is a Smi when it's already known that a is a Smi array.
When the first number that doesn't fit the Smi range is stored in the array, it'll be transitioned to an array of doubles (strictly speaking still not a "C++ array", rather a custom array on the garbage-collected heap, but it's similar to a C++ array, so that's a good way to explain it).
When the first non-Number is stored in an array, what happens depends on what state the array was in before: if it was a "Smi array", then it only needs to forget the fact that it contains only Smis. No rewriting is needed, as Smis are valid object pointers thanks to their tag bit. If the array was a "double array" before, then it does have to be rewritten, so that each element is a valid object pointer. All the doubles will be "boxed" as so-called "heap numbers" (objects on the managed heap that only wrap a double value) at this point.
In summary, I'd like to point out that in the vast majority of cases, there's no need to worry about any of these internal implementation tricks, or even be aware of them. I certainly understand your curiosity though! Also, array representations are one of the more common reasons why microbenchmarks that don't account for implementation details can easily be misleading by suggesting results that won't carry over to a larger app.
Addressing comments:
V8 does sometimes even use int16 or lower.
Nope, it does not. It may or may not start doing so in the future; though if anything does change, I'd guess that untagged int32 is more likely to be introduced than int16; also if anything does change about the implementation then of course the observable behavior would not change.
If you believe that your application would benefit from int16 storage, you can use an Int16Array to enforce that, but be sure to measure whether that actually benefits you, because quite likely it won't, and may even decrease performance depending on what your app does with its arrays.
It may start to be a double when you make it a decimal
Slightly more accurately: there are several reasons why an array of Smis needs to be converted to an array of doubles, such as:
storing a fractional value in it, e.g. 0.5
storing a large value in it, e.g. 2**34
storing NaN or Infinity or -0 in it

How to test an MD5 implementation?

I am considering using a JS MD5 implementation.
But I noticed that there are only a few tests. Is there a good way of verifying that implementation is correct?
I know I can try it with a few different values and see if it works, but that only means it is correct for some inputs. I would like to see if it is correct for all inputs.

The corresponding RFC has a good description of the algorithm, an example implementation in C, and a handful of test values at the end. All three together let you make a good guess about the quality of the examined implementation and that's all you can get: a good guess.
Testing an applications with an infinite or at least a very large input set as a black box is hard, impossible even in most cases. So you have to check if the code implements the algorithm correctly. The algorithm is described in RFC-3121 (linked to above). This description is sufficient for an implementation. The algorithm itself is well known (in the scientific sense, i.e.: many papers have been written about it and many flaws have been found) and simple enough to skip the formal part, just inspect the implementation.
Problems to expect with MD5 in JavaScript: input of one or more zero bytes (you can check the one and two bytes long inputs thoroughly), endianess (should be no problem but easy to check) and the problem of the unsigned integer used for bit-manipulation in JavaScript (">>" vs. ">>>" but also easy to check for). I would also test with a handful of data with all bits set.
The algorithm needs padding, too, you can check it with all possible input of length shorter than the limit.
Oh, and for all of you dismissing the MD5-hash: it still has its uses as a fast non-cryptographic hash with a low collision-rate and a good mixing (some call the effect of the mixing "avalanche", one bit change in the input changes many bits in the output). I still use it for larger, non-cryptographic Bloom-filters. Yes, one should use a special hash fitting the expected input but constructing such a hash function is a pain in the part of the body Nature gave us to sit on.

JavaScript numbers, all the same size in memory?

I'm reading the Number Type section of the book Professional JavaScript for Web Developers. It seems to say that all ECMAScript numbers are binary64 floating point, which is corroborated by this MDN article. But the book author also says:
Because storing floating-point values uses twice as much memory as storing integer values, ECMAScript always looks for ways to convert values into integers.
I expected numbers to each occupy the same amount of memory: 64 bits. And the MDN article says, "There is no specific type for integers". Anyone know what the book author meant? How can integers take up less memory when they're stored as 64-bit floats (if I have that right)? You'll find the whole section at the link above (free sample of the book).

JavaScript doesn't have any other number type than double precision floating point (except in ECMAScript 6 typed arrays), but the underlying implementation may choose to store the numbers in any way it likes as long as the JavaScript code behaves the same.
JavaScript is compiled nowadays, which means that it can be optimised in many ways that are not obvious in the language.
If a local variable in a function only ever takes on an integer value and isn't exposed outside the function in any way, then it could actually be implemented using an integer type when the code is compiled.
The implementation varies in different browsers. Currently it seems to make a huge difference in MS Edge, a big difference in Firefox, and no difference at all in Chrome: http://jsperf.com/int-vs-double-implementation (Note: jsperf thinks that MS Edge is Chrome 42.)
Further research:
The JS engines Spidermonkey (Firefox), V8 (Chrome, Opera), JavaScriptCore (Safari), Chakra (IE) and Rhino (and possibly others, but those are harder to find implementation details about) use different ways of using integer types or storing numbers as integers when possible. Some quotes:
"To have an efficient representation of numbers and JavaScript
objects, V8 represents both of us with a 32 bits value. It uses a bit
to know if it is an object (flag = 1) or an integer (flag = 0) called
here SMall Integer or SMI because of its 31 bits."
http://thibaultlaurens.github.io/javascript/2013/04/29/how-the-v8-engine-works/
"JavaScript does not have a built-in notion of an integer value, but
for efficiency JavaScriptCore will represent most integers as int32
rather than as double."
http://trac.webkit.org/wiki/JavaScriptCore
"[...] non-double values are a 32-bit type tag and a 32-bit payload,
which is normally either a pointer or a signed 32-bit integer."
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals
"In Windows 10 and Microsoft Edge, we’ve started optimizing Chakra’s
parser and the JIT compiler to identify non const variable
declarations of integers that are defined globally and are never
changed during the course of the execution time of the program."
https://blogs.windows.com/msedgedev/2015/05/20/delivering-fast-javascript-performance-in-microsoft-edge/

Because storing floating-point values uses twice as much memory as storing integer values, ECMAScript always looks for ways to convert values into integers.
This paragraph is complete nonsense. Ignore it!
Numbers are numbers. ECMAScript makes no distinction whatsoever between floating-point and integer numeric values.
Even within most JS runtimes, all numeric values are stored as double-precision floating point.

Not sure I fully understood your question, but "There is no specific type for integers" means that JavaScript doesn't recognize separate types for integers and floats, but they are both typed as Numbers. The int/float separation happens "behind the curtains", and that's what they meant by "ECMAScript always looks for ways to convert values into integers".
The bottom line is that you don't have to worry about it, unless you specifically need your variables to mimic integers or floats for use in other languages, in which case it's probably (did I say probably?) the best to pass them as strings (because you'd have trouble passing, say, 5.0 as a float because JS would immediatelly convert it to 5, exactly because of the "ECMAScript always looks for ways to convert values into integers" part).
alert(5.0); // don't expect a float from this

How would you explain Javascript Typed Arrays to someone with no programming experience outside of Javascript?

I have been messing with Canvas a lot lately, developing some ideas I have for a web-based game. As such I've recently run into Javascript Typed Arrays. I've done some reading for example at MDN and I just can't understand anything I'm finding. It seems most often, when someone is explaining Typed Arrays, they use analogies to other languages that are a little beyond my understanding.
My experience with "programming," if you can call it that (and not just front-end scripting), is pretty much limited to Javascript. I do feel as though I understand Javascript pretty well outside of this instance, however. I have deeply investigated and used the Object.prototype structure of Javascript, and more subtle factors such as variable referencing and the value of this, but when I look at any information I've found about Typed Arrays, I'm just lost.
With this frame-of-reference in mind, can you describe Typed Arrays in a simple, usable way? The most effective depicted use-case, for me, would be something to do with Canvas image data. Also, a well-commented Fiddle would be most appreciated.

In typed programming languages (to which JavaScript kinda belongs) we usually have variables of fixed declared type that can be dynamically assigned values.
With Typed Arrays it's quite the opposite.
You have a fixed chunk of data (represented by ArrayBuffer) that you do not access directly. Instead this data is accessed by views. Views are created at run time and they effectively declare some portion of the buffer to be of a certain type. These views are sub-classes of ArrayBufferView. The views define the certain continuous portion of this chunk of data as elements of an array of a certain type. Once the type is declared browser knows the length and content of each element, as well as a number of such elements. With this knowledge browsers can access individual elements much more efficiently.
So we dynamically assigning a type to a portion of what actually is just a buffer. We can assign multiple views to the same buffer.
From the Specs:
Multiple typed array views can refer to the same ArrayBuffer, of different types,
lengths, and offsets.
This allows for complex data structures to be built up in the ArrayBuffer.
As an example, given the following code:
// create an 8-byte ArrayBuffer
var b = new ArrayBuffer(8);
// create a view v1 referring to b, of type Int32, starting at
// the default byte index (0) and extending until the end of the buffer
var v1 = new Int32Array(b);
// create a view v2 referring to b, of type Uint8, starting at
// byte index 2 and extending until the end of the buffer
var v2 = new Uint8Array(b, 2);
// create a view v3 referring to b, of type Int16, starting at
// byte index 2 and having a length of 2
var v3 = new Int16Array(b, 2, 2);
The following buffer and view layout is created:
This defines an 8-byte buffer b, and three views of that buffer, v1,
v2, and v3. Each of the views refers to the same buffer -- so v1[0]
refers to bytes 0..3 as a signed 32-bit integer, v2[0] refers to byte
2 as a unsigned 8-bit integer, and v3[0] refers to bytes 2..3 as a
signed 16-bit integer. Any modification to one view is immediately
visible in the other: for example, after v2[0] = 0xff; v21 = 0xff;
then v3[0] == -1 (where -1 is represented as 0xffff).
So instead of declaring data structures and filling them with data, we take data and overlay it with different data types.

I spend all my time in javascript these days, but I'll take a stab at quick summary, since I've used typed arrays in other languages, like Java.
The closest thing I think you'll find in the way of comparison, when it comes to typed arrays, is a performance comparison. In my head, Typed Arrays enable compilers to make assumptions they can't normally make. If someone is optimizing things at the low level of a javascript engine like V8, those assumptions become valuable. If you can say, "Data will always be of size X," (or something similar), then you can, for instance, allocate memory more efficiently, which lets you (getting more jargon-y, now) reduce how many times you go to access memory and it's not in a CPU cache. Accessing CPU cache is much faster than having to go to RAM, I believe. When doing things at a large scale, those time savings add up quick.
If I were to do up a jsfiddle (no time, sorry), I'd be comparing the time it takes to perform certain operations on typed arrays vs non-typed arrays. For example, I imagine "adding 100,000 items" being a performance benchmark I'd try, to compare how the structures handle things.
What I can do is link you to: http://jsperf.com/typed-arrays-vs-arrays/7
All I did to get that was google "typed arrays javascript performance" and clicked the first item (I'm familiar with jsperf, too, so that helped me decide).

Comparing large strings in JavaScript with a hash

I have a form with a textarea that can contain large amounts of content (say, articles for a blog) edited using one of a number of third party rich text editors. I'm trying to implement something like an autosave feature, which should submit the content through ajax if it's changed. However, I have to work around the fact that some of the editors I have as options don't support an "isdirty" flag, or an "onchange" event which I can use to see if the content has changed since the last save.
So, as a workaround, what I'd like to do is keep a copy of the content in a variable (let's call it lastSaveContent), as of the last save, and compare it with the current text when the "autosave" function fires (on a timer) to see if it's different. However, I'm worried about how much memory that could take up with very large documents.
Would it be more efficient to store some sort of hash in the lastSaveContent variable, instead of the entire string, and then compare the hash values? If so, can you recommend a good javascript library/jquery plugin that implements an appropriate hash for this requirement?

In short, you're better off just storing and comparing the two strings.
Computing a proper hash is not cheap. For example, check out the pseudo code or an actual JavaScript implementation for computing the MD5 hash of a string. Furthermore, all proper hash implementations will require enumerating the characters of the string anyway.
Furthermore, in the context of modern computing, a string has to be really, really long before comparing it against another string is slow. What you're doing here is effectively a micro-optimization. Memory won't be an issue, nor will the CPU cycles to compare the two strings.
As with all cases of optimizing: check that this is actually a problem before you solve it. In a quick test I did, computing and comparing 2 MD5 sums took 382ms. Comparing the two strings directly took 0ms. This was using a string that was 10000 words long. See http://jsfiddle.net/DjM8S.
If you really see this as an issue, I would also strongly consider using a poor-mans comparison; and just comparing the length of the 2 strings, to see if they have changed or not, rather than actual string comparisons.
..

An MD5 hash is often used to verify the integrity of a file or document; it should work for your purposes. Here's a good article on generating an MD5 hash in Javascript.

I made a JSperf rev that might be useful here for performance measuring. Please add different revisions and different types of checks to the ones I made!
http://jsperf.com/long-string-comparison/2
I found two major results
When strings are identical performance is murdered; from ~9000000 ops/s to ~250 ops/sec (chrome)
The 64bit version of IE9 is much slower on my PC, results from the same tests:
+------------+------------+
| IE9 64bit | IE9 32bit |
+------------+------------+
| 4,270,414 | 8,667,472 |
| 2,270,234 | 8,682,461 |
+------------+------------+
Sadly, jsperf logged both results as simply "IE 9".
Even a precursory look at JS MD5 performance tells me that it is very, very slow (at least for large strings, see http://jsperf.com/md5-shootout/18 - peaks at 70 ops/sec). I would want to go as far as to try AJAXing the hash calculation or the comparison to the backend but I don't have time to test, sorry!

Develop Reference

JavaScript is the programming language of the Web.