How does V8 handle stack-allocated variables in closures? - javascript

I have read a lot of articles that say "V8 uses the stack for allocating primitives like numbers".
I have also read that the GC only works on heap allocations.
But if I combine stack-allocated variables with a closure, who is in charge of freeing the stack-allocated variable?
For example:
function foo() {
  const b = 5;
  return function bar(x) {
    return x * b;
  };
}
// This invocation allocates the variable `b` on the stack
// and the code of `bar` on the heap
const bar = foo();
// here `b` should be freed
// here `b` is used, so it should not be freed
bar();
How does this work?
How can the bar function point to b if b lives on the stack?
How is the [[Environment]] built here?

(V8 developer here.)
I don't know where this myth is coming from that "primitives are allocated on the stack". It's generally false: the regular case in JavaScript is that everything is allocated on the heap, primitive or not.
There may be implementation-specific special cases where some heap allocations can be optimized out and/or replaced by stack allocations, but that's the exception, not the rule; and it's never directly observable (i.e. never changes behavior, only performance), because that's the general rule for all internal optimizations.
To dive deeper, we need to distinguish two concepts: variables themselves, and the things they point at.
A variable can be thought of as a pointer. In other words, it's not in itself the "container" or "space" where an object is allocated; instead it's a reference that points at an object that's allocated elsewhere. All variables have the same size (1 pointer), the things they point at can vary wildly in size. One illustrating consequence is that the same variable can point at different things over time, for example you could have a loop over an array where element = array[i] points at each array element in turn.
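To make the "variable as a reference" idea concrete, here is a minimal sketch in plain JavaScript. The names `array`, `element`, `alias` and `seen` are made up for illustration; the code only shows observable behavior and says nothing about where an engine actually stores anything.

```javascript
const array = [{ id: 1 }, { id: 2 }, { id: 3 }];
const seen = [];

let element;
for (let i = 0; i < array.length; i++) {
  element = array[i]; // same variable, now referring to a different object
  seen.push(element.id);
}

// Two variables can refer to the same object; neither one "contains" it.
const alias = array[0];
alias.id = 100;

console.log(seen);        // [ 1, 2, 3 ]
console.log(array[0].id); // 100
```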
In modern, high-performance JS engines, function-local variables are usually stored on the stack (regardless of what they point at!). That's because this is both fast and convenient. So while this is technically still an implementation-specific exception to the rule that everything is allocated on the heap, it's a fairly common exception.
As you rightly observe, storing variables on the stack doesn't work if they need to survive the function that created them. Therefore, JavaScript engines perform analysis passes to find out which variables are referenced from nested closures, and store these variables on the heap right away, in order to allow them to stay around as long as they are needed.
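The observable behavior that forces this can be sketched as follows. `makeCounter`, `doubled` and `count` are hypothetical names; the comments about stack or heap placement describe what an engine may do, not something the code can detect.

```javascript
function makeCounter(start) {
  // Used only inside makeCounter; no closure references it, so an engine
  // is free to keep it on the stack or in a register.
  const doubled = start * 2;

  // Referenced by the nested function below, so it has to outlive this call.
  let count = doubled;

  return function increment() {
    count += 1;
    return count;
  };
}

const counterA = makeCounter(1);
const counterB = makeCounter(5);

console.log(counterA()); // 3
console.log(counterA()); // 4
console.log(counterB()); // 11
```

Each call to `makeCounter` gets its own `count`, which is why the two counters do not interfere with each other.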
I wouldn't be surprised if an engine that prefers simplicity over performance chose to always store all variables on the heap, so it wouldn't have to distinguish several cases.
Regarding the value that the variable points at: that's always on the heap, regardless of its type or primitive-ness (with exceptions to the rule, see below).
var a = true --> true is on the heap.
var b = "hello" --> "hello" is on the heap.
var c = 42.2 --> 42.2 is on the heap.
var d = 123n --> 123n is on the heap.
var e = new Object(); --> the object is on the heap.
Again, there are engine-specific cases where heap allocations can be optimized out under the right circumstances. For example, V8 (inspired by some other VMs) has a well-known trick where it can store small integers ("Smis") directly in the pointer using a tag bit, so in this case the pointer doesn't actually point at a value; the pointer is the value, so to speak. An alternative trick is called "NaN-boxing". It's used e.g. by SpiderMonkey and has the effect that all JS Numbers can be stored directly in the pointer (or technically the other way round: everything's a Number in this approach, and pointers are stored as special numbers).
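The tag-bit idea can be illustrated with a toy model. This is not V8's code: it assumes 32-bit words where a low bit of 0 marks an inline small integer and a low bit of 1 marks a reference, and every function name here (`fitsInSmi`, `tagSmi`, `isSmi`, `untagSmi`, `tagHeapIndex`) is invented for the sketch.

```javascript
// Toy model of tagged 32-bit words.
// Assumed layout: low bit 0 = small integer stored inline (31-bit payload),
// low bit 1 = a reference to something that lives elsewhere.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

function fitsInSmi(n) {
  // -0 is excluded because an integer payload cannot represent it.
  return Number.isInteger(n) && !Object.is(n, -0) && n >= SMI_MIN && n <= SMI_MAX;
}

function tagSmi(n) {
  if (!fitsInSmi(n)) throw new RangeError("value does not fit in 31 bits");
  return n << 1; // payload in the upper 31 bits, tag bit is 0
}

function isSmi(word) {
  return (word & 1) === 0;
}

function untagSmi(word) {
  return word >> 1; // arithmetic shift restores the sign
}

// Hypothetical "pointer": an index into some heap array, tagged with 1.
function tagHeapIndex(index) {
  return (index << 1) | 1;
}

console.log(tagSmi(5));           // 10
console.log(untagSmi(tagSmi(-3))); // -3
console.log(isSmi(tagHeapIndex(7))); // false
console.log(fitsInSmi(42.2));     // false, would need a heap number in this model
```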
As another example, once a function gets hot enough for optimization, an optimizing compiler may be able to figure out that a given object isn't accessible outside the function and hence doesn't need to be allocated at all; if necessary some of the object's properties will be held in registers or on the stack for the part of the function where they are needed.
So, to summarize the above:
"All primitives are allocated on the stack" is incorrect. Most primitives are allocated on the heap.
Sometimes, an engine can avoid allocations (of both primitives and objects), which may or may not mean that the respective value is briefly held on the stack (it could also be eliminated entirely, or only ever held in registers). Such optimizations never change observable behavior; in cases where doing the optimization would affect behavior, the optimization can't be applied.
Variables, regardless of what they refer to, are stored on the heap or on the stack or not at all, depending on the requirements of the situation.

Related

Does a Javascript closure retain the entire parent lexical environment or only the subset of values the closure references? [duplicate]

Consider the following example:
function makeFunction() {
  let x = 3;
  let s = "giant string, 100 MB in size";
  return () => { console.log(x); };
}
// Are both x and s held in memory here
// or only x, because only x was referred to by the closure returned
// from makeFunction?
let made = makeFunction();
// Suppose there are no further usages of makeFunction after this point
// Let's assume there's a thorough GC run here
// Is s from makeFunction still around here, even though made doesn't use it?
made();
So if I close around just one variable from a parent lexical environment, is that variable kept around or is every sibling variable in its lexical environment also kept around?
Also, what if makeFunction was itself nested inside another outer function, would that outer lexical environment be retained even though neither makeFunction nor makeFunction's return value referred to anything in that outer lexical environment?
I'm asking for performance reasons - do closures keep a bunch of stuff around or only what they directly refer to? This impacts memory usage and also resource usage (e.g. open connections, handles, etc.).
This would be mostly in a NodeJS context, but could also apply in the browser.
V8 developer here. This is a bit complicated ;-)
The short answer is: closures only keep around what they need.
So in your example, after makeFunction has run, the string referred to by s will be eligible for garbage collection. Due to how garbage collection works, it's impossible to predict when exactly it'll be freed; "at the next garbage collection cycle". Whether makeFunction runs again doesn't matter; if it does run again, a new string will be allocated (assuming it was dynamically computed; if it's a literal in the source then it's cached). Whether made has already run or will run again doesn't matter either; what matters is that you have a variable referring to it so you could run it (again). Engines generally can't predict which functions will or won't be executed in the future.
The longer answer is that there are some footnotes. For one thing, as comments already pointed out, if your closure uses eval, then everything has to be kept around, because whatever source snippet is eval'ed could refer to any variable. (What one comment mentioned about global variables that could be referring to eval is not true though; there is a semantic difference for "global eval", a.k.a. "indirect eval": it cannot see local variables. Which is usually considered an advantage for both performance and debuggability -- but even better is to not use eval at all.)
The other footnote is that somewhat unfortunately, the tracking is not as fine-grained as it could be: each closure will keep around what any closure needs. We have tried fixing this, but as it turns out finer-grained tracking causes more memory consumption (for metadata) and CPU consumption (for doing the work) and is therefore usually not worth it for real code (although it can have massive impact on artificial tests stressing precisely this scenario). To give an example:
function makeFunction() {
  let x = 3;
  let s = "giant string, 100 MB in size";
  let short_lived = function() { console.log(s.length); }
  // short_lived(); // Call this or don't, doesn't matter.
  return function long_lived() { console.log(x); };
}
let long_lived = makeFunction();
With this modified example, even though long_lived only uses x, short_lived does use s (even if it's never called!), and there is only one bucket for "local variables from makeFunction that are needed by some closure", so that bucket keeps both x and s alive. But as I said earlier: real code rarely runs into this issue, so this is usually not something you have to worry about.
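If a case like this ever does matter, one possible workaround is to clear the variables that the long-lived closure does not need before returning. This is a sketch under the assumption that neither `s` nor `short_lived` is used after `makeFunction` returns; it is a general JavaScript pattern, not something V8 requires or does for you.

```javascript
function makeFunction() {
  let x = 3;
  let s = "giant string, 100 MB in size";
  let short_lived = function() { return s.length; };
  short_lived(); // use it while it is still needed

  // The shared "bucket" of captured variables still exists, but it no longer
  // references the string or the short-lived closure.
  s = null;
  short_lived = null;

  return function long_lived() { return x; };
}

const long_lived = makeFunction();
console.log(long_lived()); // 3
```

The trade-off is that any copy of `short_lived` that escaped elsewhere would now see `s` as `null`, so this only fits when the short-lived closure really is finished.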
Side note:
and also resource usage (e.g. open connections, handles, etc.)
As a very general statement (i.e., in any language or runtime environment, regardless of closures or whatnot), it's usually advisable not to rely on garbage collection for resource management. I recommend freeing your resources manually and explicitly as soon as it is appropriate to free them.
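One common way to release resources explicitly is `try`/`finally`. The sketch below uses a made-up `openResource` helper standing in for a file handle, socket, or database connection; `withResource` is likewise a hypothetical name.

```javascript
// Hypothetical resource; stands in for a file handle, socket, or connection.
function openResource(name) {
  return {
    name,
    closed: false,
    read() {
      if (this.closed) throw new Error("resource is closed");
      return `data from ${this.name}`;
    },
    close() {
      this.closed = true;
    },
  };
}

function withResource(name, use) {
  const resource = openResource(name);
  try {
    return use(resource);
  } finally {
    resource.close(); // runs on normal return and when `use` throws
  }
}

let leaked;
const result = withResource("config", (r) => {
  leaked = r;
  return r.read();
});

console.log(result);        // data from config
console.log(leaked.closed); // true
```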

Uninitialized variable memory allocation

JavaScript Example:
Suppose I do this:
var i;
And never use i anywhere else in the program. Will it be allocated any memory?
Or if I use, say i=2; after some lines.... will it be allocated memory at this point, or is the memory allocated during the creation of i?
C# example:
Suppose I do this:
dynamic i;
And never use i anywhere else in the program. Will it be allocated any memory (and if it will be, when? During compilation?)?
Or if I use, say i=2; after some lines.... will it be allocated memory at this point, or is the memory allocated during the creation of i, or is it allocated during compilation?
Also, would there be any other differences regarding memory allocation in the two examples above except the differences that arise due to the fact that JavaScript is an interpreted language and C# is a compiled language?
In C#, the expression:
var i;
can't be compiled in the first place; if we consider instead:
int i; // or dynamic i;
then that can be compiled and may or may not be retained, but it depends on whether it is a field (object variable) versus a local (method variable). Fields are not removed; however, the compiler is free to remove local variables as it sees fit. Whether it chooses to do so can depend on a lot of things, but most notably: whether you are doing an optimized release build, versus a debug build. Even if a local variable is clearly both written and read, the compiler can still remove it if it chooses - of course, the value will still exist on the stack, but not in a reserved location.
When the Javascript interpreter parses var i; and then executes the containing scope, it has to store the fact somewhere that the i variable is now defined in the current scope. Future references in this scope will access this particular variable in this scope. Though implementation details are left to the implementor, the variable i is likely added to a particular scope object and thus has to consume some memory.
It is possible that if the variable is not referenced and it is in a contained scope without the use of things like eval() that the JS engine may be able to optimize it away. Whether or not it actually thinks it can do that and actually does so would have to be discovered by testing or studying of the source code.
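The part that can be observed from code is that `var i;` creates a binding for the whole function scope before any assignment runs. A small sketch (the function name `demo` is arbitrary); whether the engine actually reserves memory for `i` cannot be seen from JavaScript itself:

```javascript
function demo() {
  const before = i;   // undefined, not a ReferenceError: the binding already exists
  var i;
  const declared = i; // still undefined, no value has been assigned yet
  i = 2;
  return [before, declared, i];
}

console.log(demo()); // [ undefined, undefined, 2 ]
```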
Individual variables like this would likely consume only very small amounts of memory. For this to be of major consequence, you would likely have to have thousands of these.

What methods are there to make the Javascript GC run as little as possible?

I'm learning Javascript, and in the various texts the authors will speak of javascript using a mark and sweep gc to deallocate objects from memory. They will also speak of how if you set the value a variable references to null it will delete the reference to that value, allowing the allocated space to be set for gc. This SO answer says that you can remove the allocated memory and the reference by setting the value the variable contains to null and then to undefined, effectively removing the allocated space from the heap (if I understood it correctly).
So my question is this: Is it possible to write javascript in such a way that you can eliminate gc?
(If it is implementation specific I would like to know if it is possible on v8, though if this is possible on rhino or other js implementations that would be of immense use too)
Judging by projects like LLJS my request isn't too unreasonable, but I'm not entirely sure how the memory module does it.
I've always found it helpful if I explain why I'm asking so here it goes. I really like compilers, and I wanted to write a compile-to-js language that leveraged a static inferred typing system similar to SML. The reason why I wanted to write my own was because I wanted to utilize region inference to determine exactly when objects and variables come out of scope (as much as possible) and upon leaving scope remove it from the heap, thereby eliminating as much gc as possible. This is mostly a research project (read: because I can) so any resources on memory optimization in javascript would also be greatly appreciated!
EDIT: I guess another way to phrase it would be "Is it possible to write js in such a way that the gc will deterministically never run (as much as possible)? If so what techniques would be involved?"
I'm not looking per se for delete because that marks the element for deletion thereby invoking what I wanted to (try to) avoid, I was curious if the implementation's gc would run if I removed all references (and the value) associated with the variable.
Alternatively, paraphrasing from the referenced SO Answer:
x = foo;
x = null;
x;
Is x still on the heap?
It's not entirely clear what you're looking for.
The standard Javascript implementations have NO way of manually deallocating memory. You can remove a property with the delete operator, but that just removes the property from the object. It doesn't even free any contents that the property points to (that is left for garbage collection if there are no other references to that data).
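A short sketch of what `delete` does and does not do. The object names `payload` and `holder` are invented for the example:

```javascript
const payload = { size: 100 };
const holder = { data: payload, other: 1 };

const removed = delete holder.data; // removes the property from `holder`

console.log(removed);          // true
console.log("data" in holder); // false
console.log(payload.size);     // 100, the object is still reachable via `payload`
```

Only once nothing references the object any more does it become eligible for garbage collection.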
Javascript was designed from the ground up to be a garbage collected language. It frees things from physical memory only when the garbage collector runs and that garbage collector finds objects that are unreachable (e.g. there are no references to those objects still in use). The language does not contain commands to free memory.
It is possible (in some JS implementations) to call the GC yourself rather than wait for the JS engine to run it, but it's still using GC to decide what to free.
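In Node.js, for instance, starting the process with the `--expose-gc` flag makes a global `gc()` function available. A minimal sketch; `requestGc` is a made-up helper name, and without the flag it simply reports that nothing happened:

```javascript
// Run with: node --expose-gc script.js
function requestGc() {
  if (typeof globalThis.gc === "function") {
    globalThis.gc(); // asks the engine to collect now; it still decides what is unreachable
    return true;
  }
  return false; // flag not set, or a runtime that does not expose gc()
}

console.log(requestGc() ? "GC requested" : "gc() is not exposed");
```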
Responding to the additional things you added to your question.
To the best of my knowledge, Javascript only cleans things up when the GC runs. Until then objects are marked such that the GC can see that there are no references to them anywhere, but they aren't actually freed until the GC checks and notices this. Further, local variables in a function scope are themselves a type of object and those are not freed until the GC runs and notices that there are no references to the function scope (in JS, a closure can maintain a reference to a function scope even after the function has completed).
So, in your code example:
x = foo; x = null; x;
x is still alive and occupying some space because it's still in scope and code could still reach it. Its contents will be null, which presumably takes no extra space beyond the variable itself, but the space for the variable itself won't be freed until the function context it is in is found to be reference-free by the garbage collector.
JS is a garbage collected language. That's when things are actually freed from the heap. There are no instructions in the language to free things anytime sooner.
The delete keyword will trigger garbage collection by the browser. Be aware that it deletes entire chains of objects unless you nullify object references.
var o = {...};
delete o;

Variables vs Expressions as Javascript Arguments

Are there any performance issues with passing an argument as an expression, instead of first making it a variable?
someFunction( x+2 );
vs.
var total = x+2;
someFunction( total );
And how about functions?
someFunction( someOtherFunction() );
No. And, more important, this sort of micro-optimization is (almost certainly) meaningless.
Having said that, if you were to use the result of the expression more than once, then there might be some completely imperceptible and totally not-worth-worrying-about benefit to saving the result of the calculation.
Write it to be readable. Don't worry about this stuff.
Just the obvious: Making a variable creates a variable. This costs memory and consumes some time when executing. Afterwards, it either will need time to garbage collect it, or not free the memory if your function leaks.
However, you won't notice any differences. The performance is not measurable at that level. Rule of thumb: Use variables when you really need them or when they improve readability of your code.
Though the difference is minimal, the answer is really implementation-specific; JavaScript engines almost certainly differ in how they allocate things. However, I can tell you that most likely, the differences are similar to what they would be in most other languages of which I can examine the memory and processor registers in the debugger. Let's examine one scenario:
var sum = x+2;
someFunction(sum);
This allocates memory to hold sum, which hangs around as long as the function is in scope. If the function ends up being a closure, this could be forever. In a recursive function this could be significant.
someFunction(x+2);
In most languages, this will compute x+2 on the stack and pass the result to someFunction. No memory is left hanging around.
The answer would be exactly the same for a function return value.
So in summary:
The exact answer depends on the JavaScript engine's implementation.
Most likely you won't notice a performance difference.
You may want to use variables when the result is re-used, or, when you want to examine the result easily in the debugger.
It's mostly a matter of personal preference.
Creating a local variable whose scope does not extend beyond the current function does not incur any cost compared to not creating one and writing the expression directly as the argument to the function. In fact, nothing tells you that writing someFunction(x*2) won't be transformed to code that binds a variable to the result of x*2 internally by the javascript compiler - indeed, many compilers and JITs use SSA as one of their intermediate representations, in which form a variable is always bound to the result of every sub expression. See the relevant Wikipedia entry. Whether in a closure or not makes no difference.
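As a rough illustration of that point, the three functions below compute the same thing. `ssaLike` is a hand-written approximation of what a single-assignment intermediate form might look like; the names `t0` and `t1` and the body of `someFunction` are invented, and real compilers produce their own internal representation rather than JavaScript source.

```javascript
function someFunction(value) {
  return value * 10; // placeholder body for the example
}

// Expression passed directly as the argument.
function direct(x) {
  return someFunction(x + 2);
}

// Result bound to a named local first.
function named(x) {
  const total = x + 2;
  return someFunction(total);
}

// SSA-style view: every intermediate result gets its own name,
// and each name is assigned exactly once.
function ssaLike(x) {
  const t0 = x + 2;
  const t1 = someFunction(t0);
  return t1;
}

console.log(direct(1), named(1), ssaLike(1)); // 30 30 30
```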
The only two relevant questions that should concern you to make a choice between introducing a new variable and writing the expression as an argument directly are:
Readability: does naming the result of the expression make clearer what the expression is computing;
Cost of evaluating the expression: if you will be writing the expression more than once, then binding a variable to the result will allow you to reuse it and avoid recomputing the result every time. This is only relevant if your expression is expected to take a long time to compute.
If you only need to write the expression once inside a function definition then binding a variable to the result may well make the result live in memory longer than is strictly necessary, but this is nearly always completely irrelevant: most function calls are very short lived, in many cases the result does not take up much memory and the memory allocated on the stack will be reclaimed upon function exit and the memory allocated on the heap will be reclaimed by garbage collector soon thereafter.

objects in javascript

Primitive values are stored in a stack in javascript but objects are stored in a heap. I understand why to store primitives in stack but any reason why objects are stored in heaps?
Actually, in JavaScript even primitives are stored in the heap rather than on a stack (see the note below the break, though). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require the stack be used for "local" variable storage, and JavaScript's closures would make using a stack à la C, C++, etc. for that impractical. Details in the spec.
Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)
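The conceptual lookup described above can be modeled as a linked list of records. This is a toy model only: `createScope` and `lookup` are invented helpers, and engines do not resolve names this way at run time in optimized code.

```javascript
// Each "variable object" is a record with a link to its lexically enclosing one.
function createScope(vars, outer = null) {
  return { vars, outer };
}

// Walk outward along the chain until a scope has its own property `name`.
function lookup(scope, name) {
  for (let s = scope; s !== null; s = s.outer) {
    if (Object.prototype.hasOwnProperty.call(s.vars, name)) {
      return s.vars[name];
    }
  }
  throw new ReferenceError(`${name} is not defined`);
}

const globalScope = createScope({ a: 1, b: 2 });
const fooScope = createScope({ b: 20, c: 30 }, globalScope);
const barScope = createScope({ c: 300 }, fooScope);

console.log(lookup(barScope, "c")); // 300, found in the innermost scope
console.log(lookup(barScope, "b")); // 20, shadowed by fooScope
console.log(lookup(barScope, "a")); // 1, found in the global scope
```

The chain follows where the functions were written (lexical nesting), not who called whom.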
Re your question of why objects are stored in the heap: Because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created and destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects and those objects have a lifecycle that's (potentially) unrelated to the function returning...well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.
However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:
function foo() {
  var n;
  n = someFunctionCall();
  return n * 2;
}
function bar() {
  var n;
  n = someFunction();
  setCallback(function() {
    if (n === 2) {
      doThis();
    }
    else {
      doThat();
    }
  });
}
In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.
In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.
(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)
But these are implementation details. Conceptually, things are as described above the break.
The size of objects can grow dynamically, so their memory requirements have to be adjustable. That is why they are stored in the heap.
Both primitive values and objects are always stored in some other object - they are properties of some object.
There is not one primitive value / object that is not a property of another object. (The only exception here is the global object).
