Here are some heapdumps from a Node server. The following screenshot of the Statistics tab in Chrome's Memory DevTool shows a total of 750871 kB of memory usage... however, when adding up the different types of "objects" (code + strings + JS arrays + typed arrays + system objects) it does NOT reach 750871 kB. Where did the other 550000 kB of memory get allocated to?
I'm trying to understand garbage collection behaviour I'm seeing with Chrome / V8 on Windows 10. The scenario is that I have a small program that receives ~1 MiB of image data from a websocket at a rate of about 60 Hz. I'm using Chrome Version 81.0.4044.113 (Official Build) (64-bit) and Windows 10 Pro 1903.
Minimal receiving code looks like this:
var connection = new WebSocket('ws://127.0.0.1:31333');
connection.binaryType = 'arraybuffer'; // needed so message.data is an ArrayBuffer rather than a Blob
connection.onmessage = message => {
    var dataCopy = new Uint8Array(message.data, 0);
};
Profiling in Chrome shows a sawtooth of allocations rising until a major garbage collection occurs, repeating at regular intervals. The allocations are all exactly 176 bytes, which doesn't really match up with the expected 1 MiB.
(Screenshot: heap profile graph showing the sawtooth allocation pattern.)
I found an excellent overview of V8 GC here. If I understand correctly, it seems a little surprising that I'm seeing major GC events when a minor scavenge-type GC could probably pick up those allocations. Additionally, as mentioned above, the allocations seen while profiling don't have the expected size of 1 MiB.
Further research indicates that there's a "large object space" as described in this SO question. Unfortunately the wiki mentioned has moved since the question was asked and I can't find any references to "large object space" at the new location. I suspect the 1MiB allocation is probably big enough to qualify as a large object and if so I would like to confirm what the actual behaviour around those is.
So my questions are:
Why do I see this behaviour, with major GCs happening regularly?
Why are the allocations smaller than expected?
If it's related to large object handling are there any official resources that explain how large objects are handled in Chrome / V8 and what the limits around them are?
In the end I filed a bug for V8 here, and the answer is that major GCs are required because the message object is allocated on Blink's heap, which V8 can only reclaim cooperatively during a major GC. The 176-byte objects are likely pointers to the ArrayBuffer on that heap. There is an ongoing project to make Blink's GC generational, which will eventually change this behavior.
A section of my Node.js application involves receiving a string as input from the user and storing it in a JSON file. JSON itself obviously has no limit on this, but is there any upper bound on the amount of text that Node can process into JSON?
Note that I am not using MongoDB or any other technology for the actual insertion - this is native stringification and saving to a .json file using fs.
V8 (the JavaScript engine node is built upon) until very recently had a hard limit on heap size of about 1.9 GB.
Node v0.10 is stuck on an older version of V8 (3.14) due to breaking V8 API changes around native addons. Node 0.12 will update to the newest V8 (3.26), which will break many native modules, but opens the door for the 1.9 GB heap limit to be raised.
So as it stands, a single node process can keep no more than 1.9 GB of JavaScript code, objects, strings, etc combined. That means the maximum length of a string is under 1.9 GB.
You can get around this by using Buffers, which store data outside of the V8 heap (but still in your process's heap). A 64-bit build of node can pretty much fill all your RAM as long as you never have more than 1.9 GB of data in JavaScript variables.
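On a modern Node build (Buffer.alloc and the external field of process.memoryUsage() did not exist in 0.10), a minimal sketch of the difference looks like this; the 256 MB figure is just an illustrative size:
// Allocate 256 MB of raw bytes: the data lives outside the V8 heap,
// so only a small wrapper object counts against the heap limit.
const big = Buffer.alloc(256 * 1024 * 1024);
const usage = process.memoryUsage();
console.log(usage.heapUsed);  // V8 heap: barely changes
console.log(usage.external);  // off-heap memory: grows by roughly 256 MB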
All that said, you should never come anywhere near this limit. When dealing with this much data, you must deal with it as a stream. You should never have more than a few megabytes (at most) in memory at one time. The good news is node is especially well-suited to dealing with streaming data.
You should ask yourself some questions:
What kind of data are you actually receiving from the user?
Why do you want to store it in JSON format?
Is it really a good idea to stuff gigabytes into JSON? (The answer is no.)
What will happen with the data later, after it is stored? Will your code read it? Something else?
The question you've posted is quite vague in regard to what you're actually trying to accomplish. For more specific advice, update your question with more information.
If you expect the data to never be all that big, just throw a reasonable limit of 10 MB or something on the input, buffer it all, and use JSON.stringify.
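For the "small data" case, a hedged sketch of that approach might look like the following; the stream handling, the 10 MB cap, and the file name input.json are illustrative choices, not part of the original answer:
const fs = require('fs');

// Buffer a readable stream (e.g. an HTTP request body) up to a 10 MB cap,
// then stringify and save it as JSON.
function saveSmallJson(input, done) {
  const LIMIT = 10 * 1024 * 1024; // 10 MB
  const chunks = [];
  let received = 0;

  input.on('data', chunk => {
    received += chunk.length;
    if (received > LIMIT) {
      input.destroy(new Error('input exceeds 10 MB limit'));
      return;
    }
    chunks.push(chunk);
  });
  input.on('error', done);
  input.on('end', () => {
    const text = Buffer.concat(chunks).toString('utf8');
    fs.writeFile('input.json', JSON.stringify({ text }), done);
  });
}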
If you expect to deal with data any larger, you need to stream the input straight to disk. Look into transform streams if you need to process/modify the data before it goes to disk. For example, there are modules that deal with streaming JSON.
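And for the "large data" case, a sketch using stream.pipeline with an optional Transform in the middle; the uppercasing transform and the file name are placeholders for whatever processing you actually need:
const fs = require('fs');
const { Transform, pipeline } = require('stream');

// Example transform: uppercase the data on its way to disk.
// (ASCII-only demo; a real transform should handle multi-byte chunk boundaries.)
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString('utf8').toUpperCase());
  }
});

pipeline(
  process.stdin,                      // or an HTTP request body
  upperCase,                          // any processing you need before disk
  fs.createWriteStream('input.txt'),
  err => console.error(err ? `failed: ${err.message}` : 'saved')
);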
The maximum string size in "vanilla" nodeJS (v0.10.28) is in the ballpark of 1GB.
If you are in a hurry, you can test the maximum supported string size with a self-doubling string. The system tested has 8 GB of RAM, mostly unused.
x = 'x';
while (1) {
    x = '' + x + x; // string context
    console.log(x.length);
}
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
FATAL ERROR: JS Allocation failed - process out of memory
Aborted (core dumped)
In another test I got to 1,000,000,000 characters with a one-character-at-a-time for loop.
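That second loop wasn't shown; it presumably looked something like this (logging every 100 million characters to keep the output manageable):
x = '';
for (let i = 1; ; i++) {
  x += 'x'; // grow one character per iteration
  if (i % 1e8 === 0) console.log(i);
}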
Now a critic might say, "wait, what about JSON? The question is about JSON!" and I would shout: THERE ARE NO JSON OBJECTS IN JAVASCRIPT. The JS types are Object, Array, String, Number, etc., and since JSON is a string representation, this question boils down to: what is the longest allowed string? But just to double-check, let's add a JSON.stringify call to address the JSON conversion.
Code
x = 'x';
while (1) {
    x = '' + x + x; // string context
    console.log(JSON.stringify({a: x}).length);
}
Expectations: the size of the JSON string will start greater than 2, because the first object stringifies to '{"a":"xx"}', which is 10 chars. It won't start to simply double until the x string in property a dominates the length. It will probably fail around 256M, since stringification probably makes a second copy of the string. Recall that the stringified result is independent of the original object.
Result:
10
12
16
24
40
72
136
264
520
1032
2056
4104
8200
16392
32776
65544
131080
262152
524296
1048584
2097160
4194312
8388616
16777224
33554440
67108872
134217736
268435464
Pretty much as expected....
Now these limits are probably related to the C/C++ code that implements JS in the nodeJS project, which at this time I believe is the same V8 code used in Chrome browsers.
There is evidence from blog posts of people recompiling nodeJS to get around memory limits in older versions. There are also a number of nodejs command line switches. I have not tested the effect of any of this.
The maximum length of a string in node.js is defined by the underlying JavaScript engine, V8. In V8 the maximum length is independent of the heap size; it is constrained by the limits of the optimized object layout. See https://chromium-review.googlesource.com/c/v8/v8/+/2030916, a recent (Feb 2020) change to the maximum length of a string in V8. The commit message explains the different limits over time: the limit has gone from about 256 MB to 1 GB and then back to 512 MB (on 64-bit V8 platforms).
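On current Node versions you can read this limit directly instead of probing for it; note that the constant counts UTF-16 code units rather than bytes, and the exact value depends on the bundled V8:
const { constants } = require('buffer');
console.log(constants.MAX_STRING_LENGTH);
// e.g. 536870888 (2^29 - 24) on recent 64-bit builds,
// or 1073741799 (2^30 - 25) on versions where the ~1 GB limit applied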
This is a good question, but I think the upper limit you need to be worried about doesn't involve the max JSON string size.
In my opinion the limit you need to worry about is how long do you wish to block the request thread while it's processing the user's request.
Any string over 1 MB will take the user a few seconds to upload, and tens of megabytes could take minutes. After receiving the request, the server will take a few hundred milliseconds to seconds to parse it into a data structure, leading to a very poor user experience (parsing JSON is very expensive).
The bandwidth and server processing times will overshadow any limit JSON may have on string size.
I'm looking at a node.js library (gen-readlines) that reads large flat files via a generator - i.e. a file is read in 'chunks' of 65 536 bytes at a time via a generator.
Not having a CS background I didn't think much about this until someone mentioned that a disk reads 65 536 bytes of data at a time.
Questions:
Is this true of all disks (both metallic and SSD)?
8 bytes == 64 bits. What is the relationship between a 64-bit processor and a disk read size of 64 * 1024 bytes?
i.e., what is the significance of 64 KB in terms of disk I/O?
Considering how high-level JavaScript is, can I really instruct a generator to yield bytes after exactly one disc read? Or is the number specified as a buffer size in the library I've linked to completely arbitrary when thinking in terms of JavaScript...
Is this true of all disks (both metallic and SSD)?
No, it depends on how the disk is formatted, the cluster size IIRC. It is a fairly common value in today's world, but smaller cluster sizes aren't uncommon. They are typically multiples of 4k (in the last decade or more). When I was young and the world was new, 512 bytes was normal. :-) 64k is likely to be big enough for even a disk formatted with a large cluster size.
But there's a lot more to it than the basic unit of disk allocation. For one thing, there's very likely multiple levels of caching — in the disk drive's built-in controller, in the disk controller on the motherboard, in the OS... Today's disks (or even yesterday's, or the day before's) are not stupid platters we have to try to micro-manage with code.
8 bytes == 64 bits. What is the relationship between a 64-bit processor and a disk read size of 64 * 1024 bytes?
Other than that they're both powers of 2, I don't think there is one.
Considering how high-level JavaScript is, can I really instruct a generator to yield bytes after exactly one disc read?
That's not really the key question. The key question is whether the code in the generator function (or any function) can read exactly 64k at a time.
The answer is yes, and that code does:
let bytesRead = fs.readSync(fd, readChunk, 0, bufferSize, position);
...where bufferSize is 64k. readSync is a low-level call.
In summary: 64k is likely to be large enough to hold even the largest minimum allocation unit of a disk; and if it's too big, no problem, it's still not outrageous and multiple allocation units can be read into it. But I'd want to see well-crafted benchmarks before I believed it made a significant difference. I can see the logic, but with the layers between even Node's C++ code inside readSync and the actual physical reading of the disk...
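For reference, here is a minimal sketch of that generator pattern (not the library's actual source) reading a file in fixed 64 KiB chunks via fs.readSync; the file name big.log is made up:
const fs = require('fs');

// Yield a file in fixed-size reads of bufferSize bytes.
function* readChunks(path, bufferSize = 64 * 1024) {
  const fd = fs.openSync(path, 'r');
  const buffer = Buffer.alloc(bufferSize);
  let position = 0;
  try {
    while (true) {
      const bytesRead = fs.readSync(fd, buffer, 0, bufferSize, position);
      if (bytesRead === 0) break;   // end of file
      position += bytesRead;
      // slice() shares memory with buffer; copy it if you keep it past the next read
      yield buffer.slice(0, bytesRead);
    }
  } finally {
    fs.closeSync(fd);
  }
}

for (const chunk of readChunks('big.log')) {
  // process up to 64 KiB at a time (the final chunk may be shorter)
}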
While disk reads may be aligned, the OS makes that transparent for the most part; since you mentioned you're reading sequentially, it doesn't matter much what buffer size you're using. There is no relationship between 64-bit and 64 KB alignment (I have only heard of 4K alignment anyway).
You may want to create a buffer whose size is a power of 2, just to align better with the memory allocator. JavaScript abstracts most memory allocations, so a 64K or 4K buffer doesn't necessarily improve performance; in the normal sense, it just needs to be sufficiently big to reduce syscall overhead.
Do the I/O in your favorite style, as long as it is buffered. Whether the buffer is 4K or 64K doesn't matter too much (though a buffer that is too small is nearly as bad as unbuffered I/O); whether the I/O is buffered at all matters very much.
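As a concrete (hypothetical) example of that point, the buffer size is just the highWaterMark option of a readable stream, and 4K vs 64K rarely matters as much as being buffered at all; big.log is a placeholder file name:
const fs = require('fs');

const stream = fs.createReadStream('big.log', { highWaterMark: 64 * 1024 });
stream.on('data', chunk => {
  // each 'data' event delivers up to 64 KiB of buffered file data
});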
1- No, it depends on the firmware of the storage device, on the drive controller, and on the operating system. Newer HDDs use 4 KiB sectors, so such a disk reads at least 4 KiB at a time.
2- There is no relation between the processor's register or bus size and the disk I/O chunk size.
3- Data rates depend on both data size and I/O overhead (for instance, system call processing). Bigger chunks mean fewer I/O operations for the same amount of data, which means less overhead.
4- From the point of view of the high-level JavaScript layer, you do not need to worry about these low-level behaviours. Everything will work correctly, since there are caches at several levels.
Leaving a page open for 2 minutes and recording with Chrome dev tools, I get a sawtooth pattern BUT the JS heap does not return to its original level; rather, after each garbage collection it remains a bit higher, until it eventually crashes.
Conventional wisdom suggests taking 2 heap snapshots over a period of time and comparing them to isolate the problem. Before a heap snapshot, a garbage collection automatically takes place. Expected results would be that heap snapshot number 1 shows a baseline of ~19 MB of heap, and snapshot 2 shows at least 22 MB after 2 minutes. Instead, snapshot 2 actually shows less heap.
What should I do now to find the leak?
It might have just been a fluke. Try taking multiple snapshots. Like, one every 10 seconds, ten times.
Try Allocation Timelines and Allocation Profiles, too. Allocation Timelines show you when memory is getting allocated, in a realtime graph. Profiles show you what functions allocate the most memory.