Javascript memory representation

Javascript memory representation - javascript

Is there any way to dump the environment records at some point during the execution of a Javascript program ?
I want to detect if two variables, or object properties are pointing to the same address, thus potentially producing "side-effects".
I think one way to do it, is to get the bindings allocation address from an environment record.
Any tools are welcome.
Thanks.

In Firefox/Spidermonkey the thing you're looking for is called GC/CC logs. You can dump it from the browser (e.g. via about:memory) or from the command-line JS shell.
When you do, you'll find that a typical JS program has a rather large and complex graph of objects and properties, so finding the aliasing cases that you're interested in will be hard.
If, on the other hand, you have the list of object references you're interested in, checking with === is enough. (See also Equality comparisons and sameness on MDN.

Related

any way to get javascript variable memory address?

is there any way to get the memory address of the JavaScript variable , cause when comparing things like [1,2,3] === [1,2,3] which is looks like true but it is false , and also this memory address will help me to understand mutable and immutability of strings too

JavaScript itself is meant to be implementation-agnostic, so concepts like memory addresses are intentionally absent from the language itself. Outside of the language, you can use the browser's debugging tools to take a memory snapshot, and that might contain the information. Note, however, that there is no real guarantee that an object will retain its address.

Is there a way to get the address of a variable or object in javascript? [duplicate]

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.

If it would be possible at all, it would be very dependent on the javascript engine. The more modern javascript engine compile their code using a just in time compiler and messing with their internal variables would be either bad for performance, or bad for stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?

It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.

You can using a side-channel, but you can't do anything useful with it other than attacking browser security!
The closest to virtual addresses are ArrayBuffers.
If one virtual address within an ArrayBuffer is identified,
the remaining addresses are also known, as both the addresses
of the memory and the array indices are linear.
Although virtual addresses are not themselves physical memory addresses, there are ways to translate virtual address into a physical memory address.
Browser engines allocate ArrayBuffers always page
aligned. The first byte of the ArrayBuffer is therefore at the
beginning of a new physical page and has the least significant
12 bits set to ‘0’.
If a large chunk of memory is allocated, browser engines typically
use mmap to allocate this memory, which is optimized to
allocate 2 MB transparent huge pages (THP) instead of 4 KB
pages.
As these physical pages are mapped on
demand, i.e., as soon as the first access to the page occurs,
iterating over the array indices results in page faults at the
beginning of a new page. The time to resolve a page fault is
significantly higher than a normal memory access. Thus, you can knows the index at which a new 2 MB page starts. At
this array index, the underlying physical page has the 21 least
significant bits set to ‘0’.
This answer is not trying to provide a proof of concept because I don’t have time for this, but I may be able to do so in the future. This answer is an attempt to point the right direction to the person asking the question.
Sources,
http://www.misc0110.net/files/jszero.pdf
https://download.vusec.net/papers/anc_ndss17.pdf

I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

How can I determine what objects are being collected by the garbage collector?

I have significant garbage collection pauses. I'd like to pinpoint the objects most responsible for this collection before I try to fix the problem. I've looked at the heap snapshot on Chrome, but (correct me if I am wrong) I cannot seem to find any indicator of what is being collected, only what is taking up the most memory. Is there a way to answer this empirically, or am I limited to educated guesses?

In chrome profiles takes two heap snapshots, one before doing action you want to check and one after.
Now click on second snapshot.
On the bottom bar you will see select box with option "summary". Change it to "comparision".
Then in select box next to it select snaphot you want to compare against (it should automaticaly select snapshot1).
As the results you will get table with data you need ie. "New" and "Deleted" objects.

With newer Chrome releases there is a new tool available that is handy for this kind of task:
The "Record Heap Allocations" profiling type. The regular "Heap SnapShot" comparison tool (as explained in Rafał Łużyński answers) cannot give you that kind of information because each time you take a heap snapshot, a GC run is performed, so GCed objects are never part of the snapshots.
However with the "Record Heap Allocations" tool constantly all allocations are being recorded (that's why it may slow down your application a lot when it is recording). If you are experiencing frequent GC runs, this tool can help you identify places in your code where lots of memory is allocated.
In conjunction with the Heap SnapShot comparison you will see that most of the time a lot more memory is allocated between two snapshots, than you can see from the comparison. In extreme cases the comparison will yield no difference at all, whereas the allocation tool will show you lots and lots of allocated memory (which obviously had to be garbage collected in the meantime).
Unfortunately the current version of the tool does not show you where the allocation took place, but it will show you what has been allocated and how it is was retained at the time of the allocation. From the data (and possibly the constructors) you will however be able to identify your objects and thus the place where they are being allocated.

If you're trying to choose between a few likely culprits, you could modify the object definition to attach themselves to the global scope (as list under document or something).
Then this will stop them from being collected. Which may make the program faster (they're not being reclaimed) or slower (because they build up and get checked by the mark-and-sweep every time). So if you see a change in performance, you may have found the problem.
One alternative is to look at how many objects are being created of each type (set up a counter in the constructor). If they're getting collected a lot, they're also being created just as frequently.

Take a look at https://developers.google.com/chrome-developer-tools/docs/heap-profiling
especially Containment View
The Containment view is essentially a "bird's eye view" of your
application's objects structure. It allows you to peek inside function
closures, to observe VM internal objects that together make up your
JavaScript objects, and to understand how much memory your application
uses at a very low level.
The view provides several entry points:
DOMWindow objects — these are objects considered as "global" objects
for JavaScript code; GC roots — actual GC roots used by VM's garbage
collector; Native objects — browser objects that are "pushed" inside
the JavaScript virtual machine to allow automation, e.g. DOM nodes,
CSS rules (see the next section for more details.) Below is the
example of what the Containment view looks like:

lightweight javascript to javascript parser

How would I go about writing a lightweight javascript to javascript parser. Something simple that can convert some snippets of code.
I would like to basically make the internal scope objects in functions public.
So something like this
var outer = 42;
window.addEventListener('load', function() {
var inner = 42;
function magic() {
var in_magic = inner + outer;
console.log(in_magic);
}
magic();
}, false);
Would compile to
__Scope__.set('outer', 42);
__Scope__.set('console', console);
window.addEventListener('load', constructScopeWrapper(__Scope__, function(__Scope__) {
__Scope__.set('inner', 42);
__Scope__.set('magic',constructScopeWrapper(__Scope__, function _magic(__Scope__) {
__Scope__.set('in_magic', __Scope__.get('inner') + __Scope__.get('outer'));
__Scope__.get('console').log(__Scope__.get('in_magic'));
}));
__Scope__.get('magic')();
}), false);
Demonstation Example
Motivation behind this is to serialize the state of functions and closures and keep them synchronized across different machines (client, server, multiple servers). For this I would need a representation of [[Scope]]
Questions:
Can I do this kind of compiler without writing a full JavaScript -> (slightly different) JavaScript compiler?
How would I go about writing such a compiler?
Can I re-use existing js -> js compilers?

I don't think your task is easy or short given that you want to access and restore all the program state. One of the issues is that you might have to capture the program state at any moment during a computation, right? That means the example as shown isn't quite right; that captures state sort of before execution of that code (except that you've precomputed the sum that initializes magic, and that won't happen before the code runs for the original JavaScript). I assume you might want to capture the state at any instant during execution.
The way you've stated your problem, is you want a JavaScript parser in JavaScript.
I assume you are imagining that your existing JavaScript code J, includes such a JavaScript parser and whatever else is necessary to generate your resulting code G, and that when J starts up it feeds copies of itself to G, manufacturing the serialization code S and somehow loading that up.
(I think G is pretty big and hoary if it can handle all of Javascript)
So your JavaScript image contains J, big G, S and does an expensive operation (feed J to G) when it starts up.
What I think might serve you better is a tool G that processes your original JavaScript code J offline, and generates program state/closure serialization code S (to save and restore that state) that can be added to/replace J for execution. J+S are sent to the client, who never sees G or its execution. This decouples the generation of S from the runtime execution of J, saving on client execution time and space.
In this case, you want a tool that will make generation of such code S easiest. A pure JavaScript parser is a start but isn't likely enough; you'll need symbol table support to know which function code is connected a function call F(...), and which variable definition in which scope corresponds to assignments or accesses to a variable V. You may need to actually modify your original code J to insert points of access where the program state can be captured. You may need flow analysis to find out where some values went. Insisting all of this in JavaScript narrows your range of solutions.
For these tasks, you will likely find a program transformation tool useful. Such tools contain parsers for the langauge of interest, build ASTs representing the program, enable the construction of identifier-to-definition maps ("symbol tables"), can carry out modifications to the ASTs representing insertion of access points, or synthesis of ASTs representing your demonstration example, and then regenerate valid JavaScript code containing the modified J and the additions S.
Of all the program transformation systems that I know about (which includes all the ones at the Wikipedia site), none are implemented in JavaScript.
Our DMS Software Reengineering Toolkit is such a program transformation system offering all the features I just described. (Yes, its big and hoary; it has to be to handle the complexities of real computer languages). It has a JavaScript front end that contains a complete JavaScript parser to ASTs, and the machinery to regenerate JavaScript code from modified or synthesized ASTs. (Also big and hoary; good thing that hoary + hoary is still just hoary). Should it be useful, DMS also provides support for building control and dataflow analysis.

If you want something with a simple interface, you could try node-burrito: https://github.com/substack/node-burrito
It generates an AST using the uglify-js parser and then recursively walks the nodes. All you have to do is give a single callback which tests each node. You can alter the ones you need to change, and it outputs the resulting code.

I'd try to look for an existing parser to modify. Perhaps you could adapt JSLint/JSHint?

There is a problem with the rewriting above, you're not hoisting the initialization of magic to the top of the scope.
There's a number of projects out there that parse JavaScript.
Crock's Pratt parser which works well on JavaScript that fits within "The good parts" and less well on other JS.
The es-lab parser based on ometa which handles the full grammar including a lot of corner cases that Crock's parser misses. It may not perform as well as Crock's.
narcissus parser and evaluator. I don't have much experience with this.
There are also a number of high-quality lexers for JavaScript that let you manipulate JS at the token level. This can be tougher than it sounds though since JavaScript is not lexically regular, and predicting semicolon insertion is difficult without a full parse.
My es5-lexer is a carefully constructed and efficient lexer for EcmaScript 5 that provides the ability to tokenize JavaScript. It is heuristic where JavaScript's grammar is not lexically regular but the heuristic is very good and it provides a means to transform a token stream so that an interpreter is guaranteed to interpret it the way the lexer interpreted the tokens so if you don't trust your input, you can still be sure that the interpretation underlying the security transformations is sound even if not correct according to the spec for some bizarre inputs.

Your problem seams to be in same family of problems as what is solved with the JS Opfuscators and JS Compressors -- they as well as you need to be able to parse and reformat the JS to an equivalent script;
There was a good discussion on obfuscators here and the possible solution to your problem could be to leverage the parse and generator part from one of the FOSS versions.
One callout, your example code does not take into account the scopes of the variables you want to set/get and that will eventually become a problem that you will have to solve.
Addition
Given the scope problem for closure defined functions, you are probably unlikely to be able to solve this problem as a static parsing problem, as the scope variables outside the closure will have to be imported/exported to resolve/save and re-instantiate scope. Hence you may need to dig into the evaluation engine itself, and perhaps get the V8 engine and make a hack to the interpreter itself -- that is assuming that you do not need this to be generic cross all script engines and that you can tie it down to a single implementation which you control.

How can I get the memory address of a JavaScript variable?

Is it possible to find the memory address of a JavaScript variable? The JavaScript code is part of (embedded into) a normal application where JavaScript is used as a front end to C++ and does not run on the browser. The JavaScript implementation used is SpiderMonkey.

If it would be possible at all, it would be very dependent on the javascript engine. The more modern javascript engine compile their code using a just in time compiler and messing with their internal variables would be either bad for performance, or bad for stability.
If the engine allows it, why not make a function call interface to some native code to exchange the variable's values?

It's more or less impossible - Javascript's evaluation strategy is to always use call by value, but in the case of Objects (including arrays) the value passed is a reference to the Object, which is not copied or cloned. If you reassign the Object itself in the function, the original won't be changed, but if you reassign one of the Object's properties, that will affect the original Object.
That said, what are you trying to accomplish? If it's just passing complex data between C++ and Javascript, you could use a JSON library to communicate. Send a JSON object to C++ for processing, and get a JSON object to replace the old one.

I think it's possible, but you'd have to:
download the node.js source code.
add in your function manually (like returning the memory address of a pointer, etc.)
compile it and use it as your node executable.

Develop Reference

JavaScript is the programming language of the Web.