node.js arrays vs. objects from a memory-usage point of view - javascript

Wanted to share a simple experiment I ran, using node.js v6.11.0 under Win 10.
Goal. Compare arrays vs. objects in terms of memory occupied.
Code. Each of the functions reference, twoArrays, matrix and objects stores the same amount of random numbers (2 × SIZE of them); they just organize the data a bit differently.
reference creates two arrays of fixed size and fills them with numbers.
twoArrays fills two arrays via push (so the interpreter doesn't know the final size).
objects creates one array via push, each element is an object containing two numbers.
matrix creates a two-row matrix, also using push.
const SIZE = 5000000;
let s = [];
let q = [];
function rand () {return Math.floor(Math.random()*10)}
function reference (size = SIZE) {
  s = new Array(size).fill(0).map(a => rand());
  q = new Array(size).fill(0).map(a => rand());
}

function twoArrays (size = SIZE) {
  s = [];
  q = [];
  let i = 0;
  while (i++ < size) {
    s.push(rand());
    q.push(rand());
  }
}

function matrix (size = SIZE) {
  s = [];
  let i = 0;
  while (i++ < size) s.push([rand(), rand()]);
}

function objects (size = SIZE) {
  s = [];
  let i = 0;
  while (i++ < size) s.push({s: rand(), q: rand()});
}
Result. After running each function separately in a fresh environment, and after calling global.gc() a few times, the Node.js process was occupying the following amounts of memory:
reference: 84 MB
twoArrays: 101 MB
objects: 249 MB
matrix: 365 MB
theoretical: assuming that each number takes 8 bytes, the data alone should take 5*10^6 * 2 * 8 bytes ~ 80 MB
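For reference, here is a minimal harness sketch of how such a measurement could be taken (not part of the original post); it assumes Node is started with node --expose-gc so that global.gc() is available, and reads the RSS figure from process.memoryUsage():

// minimal measurement harness sketch (assumes --expose-gc; names are illustrative)
function measure (fn) {
  s = []; q = [];                                          // drop references from any previous run
  global.gc(); global.gc();
  fn();
  global.gc(); global.gc();
  const mb = process.memoryUsage().rss / (1024 * 1024);    // rss here; heapUsed is another option
  console.log(fn.name + ': ' + mb.toFixed(0) + ' MB');
}

measure(reference);                                        // the post ran each function in a fresh process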
We see that reference resulted in the lightest memory structure, which is more or less expected.
twoArrays takes a bit more memory. I think this is because the arrays there are dynamic and the interpreter allocates memory in chunks as soon as the next push operation exceeds the preallocated space, so the final allocation covers more than 5*10^6 numbers.
objects is interesting. Although each object has a fixed shape, the interpreter apparently doesn't treat it that way and allocates much more space for each object than necessary.
matrix is also quite interesting: when lots of small arrays are defined explicitly in the code, the interpreter allocates noticeably more memory than the raw data requires.
Conclusion. If your aim is a high-performance application, try to use arrays. They are also fast and have O(1) random access. If the nature of your project requires objects, you can often simulate them with arrays as well (when the number of properties in each object is fixed).
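For example, here is a hypothetical sketch of that last point: simulating an array of fixed-shape {s, q} objects with two parallel arrays of numbers (the names xs, ys, setPair and getPair are made up for illustration):

const N = 5000000;
const xs = new Array(N).fill(0);
const ys = new Array(N).fill(0);

function setPair (i, sVal, qVal) { xs[i] = sVal; ys[i] = qVal; }
function getPair (i) { return {s: xs[i], q: ys[i]}; }    // materialize an object only when needed

setPair(0, 3, 7);
console.log(getPair(0));                                  // { s: 3, q: 7 }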
Hope this is useful; I would like to hear what people think, or maybe there are links to some more thorough experiments...

Related

Create array with length n, initialize all values with 0 besides one which index matches a certain condition, in O(1)?

I want to create an array of type number with length n. All values inside the array should be 0 except the one which index matches a condition.
That's how I currently do it:
const data: number[] = [];
for (let i = 0; i < n; i++) {
  if (i === someIndex) {
    data.push(someNumber);
  } else {
    data.push(0);
  }
}
So let's say n = 4, someIndex = 2, someNumber = 4; that would result in the array [0, 0, 4, 0].
Is there a way to do it in O(1) instead of O(n)?
Creating an array of size n in O(1) time is theoretically possible depending on implementation details - in principle, if an array is implemented as a hashtable then its length property can be set without allocating or initialising space for all of its elements. The ECMAScript specification for the Array(n) constructor doesn't mandate that Array(n) should do anything which necessarily takes more than O(1) time, although it also doesn't mandate that the time complexity is O(1).
In practice, Array(n)'s time complexity depends on the browser, though verifying this is a bit tricky. The performance.now() function can be used to measure the time elapsed between the start and end of a computation, but the precision of this function is artificially reduced in many browsers to protect against CPU-timing attacks like Spectre. To get around this, we can call the constructor repetitions times, and then divide the time elapsed by repetitions to get a more precise measurement per constructor call.
My timing code is below:
function timeArray(n, repetitions=100000) {
  var startTime = performance.now();
  for(var i = 0; i < repetitions; ++i) {
    var arr = Array(n);
    arr[n-1] = 'foo';
  }
  var endTime = performance.now();
  return (endTime - startTime) / repetitions;
}

for(var n = 10000; n <= 1000000; n += 10000) {
  console.log(n, timeArray(n));
}
Here are my results from Google Chrome (version 74) and Firefox (version 72): on Chrome the performance is clearly O(n), and on Firefox it's clearly O(1), with a quite consistent time of about 0.01 ms on my machine.
I measured using repetitions = 1000 on Chrome, and repetitions = 100000 on Firefox, to get accurate enough results within a reasonable time.
Another option, proposed by @M.Dietz in the comments, is to declare the array like var arr = []; and then assign at some index (e.g. arr[n-1] = 'foo';). This turns out to take O(1) time on both Chrome and Firefox, both consistently under one nanosecond.
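For completeness, here is a sketch of how that variant could be timed with the same approach (timeEmptyArray is a made-up name, not part of the original answer):

function timeEmptyArray(n, repetitions=100000) {
  var startTime = performance.now();
  for(var i = 0; i < repetitions; ++i) {
    var arr = [];
    arr[n-1] = 'foo';
  }
  var endTime = performance.now();
  return (endTime - startTime) / repetitions;
}

for(var n = 10000; n <= 1000000; n += 10000) {
  console.log(n, timeEmptyArray(n));
}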
That suggests the version using [] is better to use than the version using Array(n), but still the specification doesn't mandate that this should take O(1) time, so there may be other browsers where this version takes O(n) time. If anybody gets different results on another browser (or another version of one of these browsers) then please do add a comment.
You need to assign n values, and so there is that amount of work to do. The work increases linearly with increasing n.
Having said that, you can hope to make your code a bit faster by making use of .fill:
const data: number[] = Array(n).fill(0);
data[someIndex] = someNumber;
But don't be mistaken; this is still O(n): .fill may be faster, but it still has to fill the whole array with zeroes, which means a corresponding amount of memory needs to be initialised, so the operation has linear time complexity.
If, however, you drop the requirement that zeroes need to be assigned, then you can store only the someNumber:
const data: number[] = Array(n);
data[someIndex] = someNumber;
This way you actually do not allocate the memory for the whole array, so this code snippet runs in constant time. Any access to an index different from someIndex will give you a value of undefined. You may trap that condition and translate that to a zero on-the-fly:
let value = i in data ? data[i] : 0;
Obviously, if you are going to access all indices of the array like that, you'll have again a linear time complexity.
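As a small illustration, that lookup can be wrapped in a helper (readZero and the sample values are made up, mirroring the n = 4, someIndex = 2, someNumber = 4 example from the question):

function readZero(data, i) {
  return i in data ? data[i] : 0;
}

const n = 4, someIndex = 2, someNumber = 4;
const data = Array(n);            // sparse: no elements are allocated
data[someIndex] = someNumber;
console.log(readZero(data, 2));   // 4
console.log(readZero(data, 0));   // 0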

Memory overhead of typed arrays vs strings

I am trying to reduce the memory usage of a JavaScript web application that stores a lot of information in memory in the form of a large number of small strings. When I changed the code to use Uint8Array instead of String, I noticed that memory usage went up.
For example, consider the following code that creates many small strings:
// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
a.push("a".repeat(10).toUpperCase());
If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:
// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
a.push(new Uint8Array(10));
uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.
So why do typed arrays have such a large memory overhead?
V8 developer here. Your conclusion makes sense: If you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however having a large number of small TypedArrays is not memory efficient.
The difference is in the object header size for strings and typed arrays.
For a string, the object header is:
hidden class pointer
hash
length
payload
where the payload is rounded up to pointer size alignment, so 16 bytes in this case.
For a Uint8Array, you need the following:
hidden class pointer
properties pointer (unused)
elements pointer (see below)
array buffer pointer (see below)
offset into array buffer
byte length
length of view into array buffer
length (user-visible)
embedder field #1
embedder field #2
array buffer: hidden class pointer
array buffer: properties pointer (unused)
array buffer: elements pointer (see below)
array buffer: byte length
array buffer: backing store
array buffer: allocation base
array buffer: allocation length
array buffer: bit field (internal flags)
array buffer: embedder field #1
array buffer: embedder field #2
elements object: hidden class pointer
elements object: length (of the backing store)
elements object: base pointer (of the backing store)
elements object: offset to data start
elements object: payload
where, again, the payload is rounded up to pointer size alignment, so consumes 16 bytes here.
In summary, each string consumes 5*8 = 40 bytes, and each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc.). With a million objects that works out to roughly 40 MB vs. 208 MB, which is broadly consistent with the ~70 MiB and ~233 MiB you measured once the ~20 MiB empty-page baseline and the outer array's element pointers are added.
(It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)
The typed arrays are not supposed to be used that way.
If you want high memory efficiency, use just one typed array to hold all of your integers, instead of a huge number of small arrays; the reasons are low-level.
Those low-level reasons are related to how much overhead is needed to hold one object in memory, and that quantity depends on a few aspects such as immutability and garbage collection. In this case, holding one typed array has higher overhead than holding one simple string. That's why you should pay that price only once.
You should take advantage of:
var a = []; for (let i = 0; i < 1000000; i++) a.push("1");
var b = new Uint8Array(10000000); for (let i = 0; i < 1000000; i++) b[i] = 1;
// 'b' is more memory efficient than 'a': you pay the price of the Uint8Array only once
// and save the memory wasted on per-string allocation overhead

Are there ideal array sizes in JavaScript?

I've seen little utility routines in various languages that, for a desired array capacity, will compute an "ideal size" for the array. These routines are typically used when it's okay for the allocated array to be larger than the capacity. They usually work by computing an array length such that the allocated block size (in bytes) plus a memory allocation overhead is the smallest exact power of 2 needed for a given capacity. Depending on the memory management scheme, this can significantly reduce memory fragmentation as memory blocks are allocated and then freed.
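For concreteness, here is a hypothetical version of such a routine; the function name and the constants (16 bytes of allocator overhead, 8 bytes per element) are illustrative assumptions, not taken from any particular allocator:

function idealCapacity(requested, bytesPerElement = 8, overhead = 16) {
  const neededBytes = requested * bytesPerElement + overhead;
  let blockSize = 1;
  while (blockSize < neededBytes) blockSize *= 2;   // round up to a power of two
  return Math.floor((blockSize - overhead) / bytesPerElement);
}

console.log(idealCapacity(1000));   // 1022, since 1022*8 + 16 fills an 8192-byte block exactly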
JavaScript allows one to construct arrays with predefined length. So does the concept of "ideal size" apply? I can think of four arguments against it (in no particular order):
JS memory management systems work in a way that would not benefit from such a strategy
JS engines already implement such a sizing strategy internally
JS engines don't really keep arrays as contiguous memory blocks, so the whole idea is moot (except for typed arrays)
The idea applies, but memory management is so engine-dependent that no single "ideal size" strategy would be workable
On the other hand, perhaps all of those arguments are wrong and a little utility routine would actually be effective (as in: make a measurable difference in script performance).
So: Can one write an effective "ideal size" routine for JavaScript arrays?
Arrays in JavaScript are, at their core, objects. They merely act like arrays through an API. Initializing an array with an argument simply sets the length property to that value.
If the only argument passed to the Array constructor is an integer between 0 and 2^32 - 1 (inclusive), this returns a new JavaScript array with length set to that number. - Array, MDN
Also, there is no separate array "type". An array is of the Object type; it is thus an Array object (ECMAScript 5.1).
As a result, there will be no difference in memory usage between using
var one = new Array();
var two = new Array(1000);
aside from the length property. When tested in a loop using Chrome's memory timeline, this checks out as well. Creating 1,000 of each of those results in roughly 2.2 MB of allocation on my machine.
You'll have to measure performance, because there are too many moving parts: the VM, the engine and the browser; the virtual memory (the platform, Windows/Linux, the physically available memory and the mass storage devices, HDD/SSD); and, obviously, the current load (presence of other web pages or, if server-side, other applications).
I see little use in such an effort. Any ideal size for performance may just not be ideal anymore when another tab loads in the browser or the page is loaded on another machine.
The best thing I see to improve here is development time: write less and be quicker at deploying your website.
I know this question and the answers were about memory usage, but although there might be no difference in allocated memory between calling the two constructors (with and without the size parameter), there is a difference in performance when filling the array. The Chrome engine obviously performs some pre-allocation, as suggested by this code run in the Chrome profiler:
<html>
<body>
<script>
function preAlloc() {
  var a = new Array(100000);
  for (var i = 0; i < a.length; i++) {
    a[i] = i;
  }
}

function noAlloc() {
  var a = [];
  var length = 100000;
  for (var i = 0; i < length; i++) {
    a[i] = i;
  }
}

function repeat(func, count) {
  var i = 0;
  while (i++ < count) {
    func();
  }
}
</script>
Array performance test
<script>
// 2413 ms scripting
repeat(noAlloc, 10000);
repeat(preAlloc, 10000);
</script>
</body>
</html>
The profiler shows that the function with no size parameter took 28 s to allocate and fill a 100,000-item array 1,000 times, while the function with the size parameter in the Array constructor took under 7 seconds.

Which way of storing and operating on bitfields in javascript is the fastest? (200k+ bits)

I am profiling my javascript code intended to be used on embedded browser on Android (PhoneGap).
Basically I need a very large bitfield (200k+ bits) for my calculations.
I've tried to put them into an array of unsigned integers, with each item storing 32 bits. This indeed reduced memory usage, but made execution drastically slower (over 30 seconds just to iterate over and flip all bits in the bitfield on a modern PC!).
Then I made a good old-fashioned array of booleans. This increased memory usage (but it was still less than 15 MB on Android for the entire PhoneGap framework around my code). Profiling showed me that the initial step in my algorithm, setting all elements of the bitfield to 1 (a simple for loop), takes half of the execution time (~1.5 seconds on a PC, more than a few minutes on Android). I can rewrite my code so the default value is 0 instead of 1 (reversing all conditions), but I still don't know how to set such a large array to zeroes fast.
Edit adding my code, as requested:
var count = 200000;
var myArr = [];
myArr.length = count;
for (var i = 0; i < count; i++)
  myArr[i] = true;
Could someone point me how can I clear very large array, or is there any faster way to store and operate on large bitfields in javascript?
See if this is a faster way to create the array:
var myArray = [true];
var desiredLength = 200000;
while (myArray.length < desiredLength) {
  myArray = myArray.concat(myArray);
}
if (myArray.length > desiredLength) {
  myArray.splice(desiredLength);
}
I've added a few more test cases to the jsperf page that Asad linked in his comment. By far the fastest in my browser (Chrome 23.0.1271.101 on Mac OS X 10.8.2) is this one:
var count = 200000;
var myArr = [];
for (var i = 0; i < count; i++) {
  myArr.push(true);
}
Why pre-fill the array in the first place? Use undefined to your advantage. Remember that undefined acts as a falsy value, so it will behave exactly like 0/false when you do a boolean check.
var myArray = new Array(200000);
if (myArray[1]) {
  // I am a truthy value
} else {
  // I am a falsey value
}
So when you initialize the array this way, there is no reason to pre-fill! That means no extra processing, and you take advantage of the sparse array!
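For reference, here is a hedged sketch of the packed-bitfield approach mentioned in the question: one bit per flag in a Uint32Array, with fill() used to set or clear all 200,000 bits in one pass over 6,250 words. The helper names are illustrative, it assumes TypedArray.prototype.fill is available, and whether it beats a plain array still depends on the engine:

var BITS = 200000;
var words = new Uint32Array(Math.ceil(BITS / 32));   // 6250 words

function setAll ()   { words.fill(0xFFFFFFFF); }
function clearAll () { words.fill(0); }
function getBit (i)  { return (words[i >>> 5] >>> (i & 31)) & 1; }
function setBit (i, v) {
  if (v) words[i >>> 5] |= (1 << (i & 31));
  else   words[i >>> 5] &= ~(1 << (i & 31));
}

setAll();
console.log(getBit(12345));   // 1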

array of arrays or an array of objects

What is the more efficient (in terms of memory consumed) way to store 10,000 elements, each with 4 integer properties:
Option 1: Array of objects
var array = [];
array[0] = {p1:1, p2:1, p3:1, p4:1};
or
Option 2: Four arrays of integers
var p1 = [], p2 = [], p3 = [], p4 = [];
p1[0] = 1;
p2[0] = 1;
p3[0] = 1;
p4[0] = 1;
Option 2. 4 objects (arrays are objects too) vs. 10001 objects.
Four arrays of 10,000 elements is probably better in terms of memory, because you're only storing four complex objects (arrays) and then 40,000 integers -- where the other way you are storing 10,000 arrays and 40,000 integers (4 per array).
My guess would be that, purely from a bits and bytes standpoint, a single multi-dimensional array would have the smallest footprint:
var p = [];
p[0] = [1,1,1,1];
I actually tested both options with the Google Chrome task manager, which shows information about the open tabs (Shift+Esc), and while this might not be 100% accurate, it does show significant differences:
For the first option, creating an array with 10,000 elements, each being an object with 4 properties as you specified, the memory usage jumped by about 10 MB after initializing the array.
The second option, creating 4 arrays with 10,000 elements each, made the memory usage jump by about 5 MB.
Some of that jump might be related to the actual processing of the creation and internal browser bookkeeping, but the point is that, as expected, creating objects adds more overhead for the data you are storing.
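As a related sketch (not one of the options measured above): if the four properties fit in 32-bit integers, they can also be packed into a single flat typed array and addressed with index arithmetic, which avoids per-element object overhead entirely. The names below are illustrative:

var COUNT = 10000;
var flat = new Int32Array(COUNT * 4);   // 4 properties per record, stored interleaved

function setRecord (i, p1, p2, p3, p4) {
  var base = i * 4;
  flat[base] = p1; flat[base + 1] = p2; flat[base + 2] = p3; flat[base + 3] = p4;
}
function getP2 (i) { return flat[i * 4 + 1]; }

setRecord(0, 1, 1, 1, 1);
console.log(getP2(0));   // 1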
