How is Node JS Buffer data stored behind the scenes? - javascript

According to the Node JS Buffer Documentation, "A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap". No further information is given.
The question is how the data is stored in RAM. Does the node JS buffer use a special way of allocating space on the heap? Is that subject to the same garbage collection as V8's heap? Am I safe to assume that any change to the data in a buffer actually changes the data in RAM, and that no residual remnants of the data is left for snoopers?
Sorry about the very broad question, but I can't seem to find any material on how this actually works. The reason I am asking is because I want to make sure that the variables I use in my application don't stick around in memory for longer than they need to.
Docs:
https://nodejs.org/api/buffer.html#buffer_class_buffer
Cheers!

The nodejs source code for implementing buffers is http://github.com/joyent/node/blob/master/lib/buffer.js and http://github.com/joyent/node/blob/master/src/node_buffer.cc. If you really want to know the details of how it works, you can study the actual code.
As for your questions...
The question is how the data is stored in RAM. Does the node JS buffer
use a special way of allocating space on the heap?
Per the source code, there are both heap allocations and a memory pool involved in memory allocated to buffer objects. When each is used depends upon details of how it is used.
Is that subject to the same garbage collection as V8's heap?
Yes, garbage collection works for Buffer objects. It is possible for objects that are implemented with native code to participate in garbage collection if they follow a strict set of rules in their implementation.
Am I safe to assume that any change to the data in a buffer actually
changes the data in RAM, and that no residual remnants of the data is
left for snoopers?
Yes, when you change data in a buffer, it does actually change data in RAM (there is no other place to store the change besides RAM unless it was only stored on disk which is not the case).
As for "no residual remnants of the data left for snoopers", that is hard to say. It is very common in programming that when heap elements or pooled elements grow or shrink that their data may be copied into a different piece of RAM and the old, now freed or recycled block of RAM may still have a copy of some or all of the original data. Unless specifically cleared, newly allocated RAM may very well have been used for something else before and may still contain information from that previous use.
On the other hand, writing to a Buffer object writes directly to the RAM that is allocated/assigned to that Buffer object (as long as you don't cause its size to change) so if you want to decrease the odds that your data would be left around in RAM, you can overwrite the data in your Buffer object when you are done with it.

Related

Will Uint32Array and other typed arrays already claim space when initialised?

I'm trying to come up with an accurate measure of how much memory a particular datastructure will consume. Big part of that is ~ 1Million Uint32Array and 1M BigUint64Array both of max 200 elements.
Is my thinking correct that typed arrays that are initialised to a particular size already consume memory even without elements having been inserted, and that inserts don't change the amount of memory allocated?
If so, I can quickly get a sense of memory needed.
Yes. Constructing a new typed array initialises it with a buffer of the respective size, which allocates the memory.
In theory, an engine could defer the allocation until the buffer is actually used to store values, but I haven't seen such an optimisation anywhere. It probably happens on the OS level anyway when paging the virtual memory.

How do images impact the browser memory relative to JS?

I am curious about how media such as images relate to memory in the browser and the JavaScript heap. Explicitly, how many JS objects would be equivalent to a 2MB image?
Disclaimer - I acknowledge that question is too vague for a precise answer. I will provide some arbitrary constraints but ultimately I’m looking for a general way to look at this problem. Suppose I am building a chat app and considering keeping all the messages ever sent in memory, how might I calculate a tipping point for a reasonable number of messages to keep in memory? How do I factor in that some messages are images?
You may assume
The browser is chrome.
The image is in jpg format and loaded using new Image() in JS and then inserted into DOM.
Each object has about three to five key-value pairs. Key-value pairs are strings or numbers ranging from small ints to strings of about 10 ascii chars.
Suppose the 2MB image is visible on screen.
My intuition tells me that the memory cost of an image of 2MB is way less than the cost of JSObjects * (2MB / size_of_obj_in_bytes), because:
The bytes of the image are not in JS land and therefore somehow compressed and optimised away.
The objects will not exist in silo but references will be created throughout user code creating more memory when passed through functions.
There will be lots of garbage collection overhead.
I don't know for certain that I'm right and I certainly don't know by how much or even how to begin measuring this. And since you can't optimize what you can't measure...
Disclaimer 2 - Premature optimization is the root of all evil etc etc. Just curious about digging deeper into the internals.

Is there a way to get the address of a variable or object in javascript? [duplicate]

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.
If it would be possible at all, it would be very dependent on the javascript engine. The more modern javascript engine compile their code using a just in time compiler and messing with their internal variables would be either bad for performance, or bad for stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?
It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.
You can using a side-channel, but you can't do anything useful with it other than attacking browser security!
The closest to virtual addresses are ArrayBuffers.
If one virtual address within an ArrayBuffer is identified,
the remaining addresses are also known, as both the addresses
of the memory and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate virtual address into a physical memory address.
Browser engines allocate ArrayBuffers always page
aligned. The first byte of the ArrayBuffer is therefore at the
beginning of a new physical page and has the least significant
12 bits set to ‘0’.
If a large chunk of memory is allocated, browser engines typically
use mmap to allocate this memory, which is optimized to
allocate 2 MB transparent huge pages (THP) instead of 4 KB
pages.
As these physical pages are mapped on
demand, i.e., as soon as the first access to the page occurs,
iterating over the array indices results in page faults at the
beginning of a new page. The time to resolve a page fault is
significantly higher than a normal memory access. Thus, you can knows the index at which a new 2 MB page starts. At
this array index, the underlying physical page has the 21 least
significant bits set to ‘0’.
This answer is not trying to provide a proof of concept because I don’t have time for this, but I may be able to do so in the future. This answer is an attempt to point the right direction to the person asking the question.
Sources,
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf
I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

How can I determine what objects are being collected by the garbage collector?

I have significant garbage collection pauses. I'd like to pinpoint the objects most responsible for this collection before I try to fix the problem. I've looked at the heap snapshot on Chrome, but (correct me if I am wrong) I cannot seem to find any indicator of what is being collected, only what is taking up the most memory. Is there a way to answer this empirically, or am I limited to educated guesses?
In chrome profiles takes two heap snapshots, one before doing action you want to check and one after.
Now click on second snapshot.
On the bottom bar you will see select box with option "summary". Change it to "comparision".
Then in select box next to it select snaphot you want to compare against (it should automaticaly select snapshot1).
As the results you will get table with data you need ie. "New" and "Deleted" objects.
With newer Chrome releases there is a new tool available that is handy for this kind of task:
The "Record Heap Allocations" profiling type. The regular "Heap SnapShot" comparison tool (as explained in Rafał Łużyński answers) cannot give you that kind of information because each time you take a heap snapshot, a GC run is performed, so GCed objects are never part of the snapshots.
However with the "Record Heap Allocations" tool constantly all allocations are being recorded (that's why it may slow down your application a lot when it is recording). If you are experiencing frequent GC runs, this tool can help you identify places in your code where lots of memory is allocated.
In conjunction with the Heap SnapShot comparison you will see that most of the time a lot more memory is allocated between two snapshots, than you can see from the comparison. In extreme cases the comparison will yield no difference at all, whereas the allocation tool will show you lots and lots of allocated memory (which obviously had to be garbage collected in the meantime).
Unfortunately the current version of the tool does not show you where the allocation took place, but it will show you what has been allocated and how it is was retained at the time of the allocation. From the data (and possibly the constructors) you will however be able to identify your objects and thus the place where they are being allocated.
If you're trying to choose between a few likely culprits, you could modify the object definition to attach themselves to the global scope (as list under document or something).
Then this will stop them from being collected. Which may make the program faster (they're not being reclaimed) or slower (because they build up and get checked by the mark-and-sweep every time). So if you see a change in performance, you may have found the problem.
One alternative is to look at how many objects are being created of each type (set up a counter in the constructor). If they're getting collected a lot, they're also being created just as frequently.
Take a look at https://developers.google.com/chrome-developer-tools/docs/heap-profiling
especially Containment View
The Containment view is essentially a "bird's eye view" of your
application's objects structure. It allows you to peek inside function
closures, to observe VM internal objects that together make up your
JavaScript objects, and to understand how much memory your application
uses at a very low level.
The view provides several entry points:
DOMWindow objects — these are objects considered as "global" objects
for JavaScript code; GC roots — actual GC roots used by VM's garbage
collector; Native objects — browser objects that are "pushed" inside
the JavaScript virtual machine to allow automation, e.g. DOM nodes,
CSS rules (see the next section for more details.) Below is the
example of what the Containment view looks like:

How can I get the memory address of a JavaScript variable?

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.
If it would be possible at all, it would be very dependent on the javascript engine. The more modern javascript engine compile their code using a just in time compiler and messing with their internal variables would be either bad for performance, or bad for stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?
It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.
You can using a side-channel, but you can't do anything useful with it other than attacking browser security!
The closest to virtual addresses are ArrayBuffers.
If one virtual address within an ArrayBuffer is identified,
the remaining addresses are also known, as both the addresses
of the memory and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate virtual address into a physical memory address.
Browser engines allocate ArrayBuffers always page
aligned. The first byte of the ArrayBuffer is therefore at the
beginning of a new physical page and has the least significant
12 bits set to ‘0’.
If a large chunk of memory is allocated, browser engines typically
use mmap to allocate this memory, which is optimized to
allocate 2 MB transparent huge pages (THP) instead of 4 KB
pages.
As these physical pages are mapped on
demand, i.e., as soon as the first access to the page occurs,
iterating over the array indices results in page faults at the
beginning of a new page. The time to resolve a page fault is
significantly higher than a normal memory access. Thus, you can knows the index at which a new 2 MB page starts. At
this array index, the underlying physical page has the 21 least
significant bits set to ‘0’.
This answer is not trying to provide a proof of concept because I don’t have time for this, but I may be able to do so in the future. This answer is an attempt to point the right direction to the person asking the question.
Sources,
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf
I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

Categories

Resources