Understanding JavaScript closure variable capture in V8

I understand the semantics: a closure holds a reference to a variable, which lengthens its life cycle, so even primitive variables are no longer bound to the call stack, and variables captured by closures therefore need special treatment.
I also understand that in present-day JavaScript engines, variables in the same scope can be treated differently depending on whether they are captured by a closure. For example,
function foo(){
    var a=2;
    var b=new Array(a_very_big_number).join('+');
    return function(){
        console.log(a);
    };
}
var b=foo();
As no one holds a reference to b in foo, there is no need to keep b in memory, so the memory it uses can be released as soon as foo returns (or it may never even be created under further optimization).
My question is: why does V8 seem to pack all variables referenced by any closure together in each calling context? For example,
function foo(){
    var a=0,b=1,c=2;
    var zig=function(){
        console.log(a);
    };
    var zag=function(){
        console.log(b);
    };
    return [zig,zag];
}
Both zig and zag seem to hold a reference to a and b, even though it is apparent that zig never uses b. This could be awful when b is very big and zig persists for a very long time.
But from the implementation's point of view, I cannot understand why this is a must. As far as I know, without calls to eval, the scope chain can be determined before execution, and so can the reference relationships. The engine should be aware that when zig is no longer reachable, neither is a, so the engine can mark it as garbage.
Both Chrome and Firefox seem to obey this rule. Does the standard say that every implementation must do this? Or is this implementation simply more practical or more efficient? I'm quite puzzled.

The main obstacle is mutability. If two closures share the same var then they must do so in a way that mutating it from one closure is visible in the other. Hence it is not possible to copy the values of referenced variables into each closure environment, like functional languages would do (where bindings are immutable). You need to share a pointer to a common mutable heap location.
Now, you could allocate each captured variable as a separate cell on the heap, instead of one array holding them all. However, that would often be more expensive in space and time, because you'd need multiple allocations and two levels of indirection (each closure points to its own closure environment, which points to each shared mutable variable cell). With the current implementation it's just one allocation per scope and one indirection to access a variable (all closures within a single scope point to the same mutable variable array). The downside is that certain lifetimes are longer than you might expect. It's a trade-off.
Other considerations are implementation complexity and debuggability. With dubious features like eval and expectations that debuggers can inspect the scope chain, the scope-based implementation is more tractable.
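To make the mutability point concrete, here is a minimal sketch. The names makeCounter, increment and read are made up for illustration and do not come from the question. Two closures created in the same call have to observe each other's writes, which is why the value cannot simply be copied into each closure.

```javascript
// Hypothetical example: two closures created by the same call share one variable.
function makeCounter() {
  var count = 0; // captured by both closures below
  return {
    increment: function () { count += 1; },  // writes the shared variable
    read: function () { return count; }      // reads the same variable
  };
}

var counter = makeCounter();
counter.increment();
counter.increment();
console.log(counter.read()); // 2: `read` sees the writes made through `increment`
```

If each closure kept a private copy of count, read would still report 0 here.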

The standard doesn't say anything about garbage collection, but it gives some clues about what should happen.
Reference: Standard
"An outer Lexical Environment may, of course, have its own outer
Lexical Environment. A Lexical Environment may serve as the outer
environment for multiple inner Lexical Environments. For example, if a
Function Declaration contains two nested Function Declarations then
the Lexical Environments of each of the nested functions will have as
their outer Lexical Environment the Lexical Environment of the current
execution of the surrounding function."
Section 13, Function Definition
step 4: "Let closure be the result of creating a new Function object as specified in 13.2"
Section 13.2: "a Lexical Environment specified by Scope" (scope = closure)
Section 10.2, Lexical Environments:
"The outer reference of a (inner) Lexical Environment is a reference to the Lexical Environment that logically surrounds the inner Lexical Environment."
So a function will have access to the environment of its parent.
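The phrase "the current execution of the surrounding function" in the quote can be observed from script code. The sketch below uses hypothetical names (outer, bump, peek). Nested functions created in the same call share one environment, while a second call to the surrounding function gets a fresh one.

```javascript
// Hypothetical example: one environment per execution of `outer`.
function outer() {
  var calls = 0;                                  // lives in this execution's environment
  function bump() { calls += 1; return calls; }   // nested function 1
  function peek() { return calls; }               // nested function 2
  return { bump: bump, peek: peek };
}

var first = outer();   // first execution, first environment
var second = outer();  // second execution, separate environment

first.bump();
first.bump();
console.log(first.peek());  // 2: bump and peek from the same call share `calls`
console.log(second.peek()); // 0: the other execution's `calls` was never touched
```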

Related

Closures and access to parent variables no longer on the execution stack

New to closures and the inner workings of JS. I have a somewhat stable grasp of execution contexts and the associated objects within it. And while I know how to identify a closure and what it may yield, I don't quite see where the parent variables reside, once the parent function is popped from the stack.
I'd assume they become properties of the closure's variable object? But you know where that leads.
"I don't quite see where the parent variables reside"
They reside in scope. How that is expressed in the computer's memory is an implementation detail of the specific JavaScript runtime (which is probably written in C or C++).

How does hoisting work if JavaScript is an interpreted language?

My understanding of an interpreter is that it executes a program line by line and we can see the results instantly, unlike compiled languages, which convert the code first and then execute it.
My question is: in JavaScript, how does the interpreter come to know that a variable is declared somewhere in the program, so that it logs it as undefined?
Consider the program below:
function do_something() {
    console.log(bar); // undefined (but in my understanding about an interpreter, it should be throwing error like variable not declared)
    var bar = 111;
    console.log(bar); // 111
}
Is implicitly understood as:
function do_something() {
    var bar;
    console.log(bar); // undefined
    bar = 111;
    console.log(bar); // 111
}
How does this work?
This concept of 'var hoisting' is quite a confusing one if you think of it on the surface. You have to delve into how the language itself works. JavaScript, which is an implementation of ECMAScript, is an interpreted language, meaning all the code you write is fed into another program that in turn, interprets the code, calling certain functions based on parts of your source code.
For example, if you write:
function foo() {}
The interpreter, once it meets your function declaration, will call a function of its own called FunctionDeclarationInstantiation that creates the function. Instead of compiling JavaScript into native machine code, the interpreter executes C, C++, and machine code of its own 'on demand' as each part of your JavaScript code is read. It does not necessarily mean line by line; all that "interpreted" means is that no compilation into machine code happens. A separate program that executes machine code reads your code and executes that machine code on the fly.
What this has to do with var declaration hoisting, or any declaration for that matter, is that the interpreter first reads through all your code once without executing any actual code. It analyzes the code and separates it into chunks, called lexical environments. Per the ECMAScript 2015 Language Specification:
8.1 Lexical Environments
A Lexical Environment is a specification type used to define the association of Identifiers to specific variables and functions based upon the lexical nesting structure of ECMAScript code. A Lexical Environment consists of an Environment Record and a possibly null reference to an outer Lexical Environment. Usually a Lexical Environment is associated with some specific syntactic structure of ECMAScript code such as a FunctionDeclaration, a BlockStatement, or a Catch clause of a TryStatement and a new Lexical Environment is created each time such code is evaluated.
An Environment Record records the identifier bindings that are created within the scope of its associated Lexical Environment. It is referred to as the Lexical Environment’s EnvironmentRecord.
Before any code is executed, the interpreter goes through your code and for every lexical structure, such as a function declaration, a new block, etc, a new lexical environment is created. And in those lexical environments, an environment record records all the variables declared in that environment, their value, and other information about that environment. That's what allows for JavaScript to manage variable scope, variable lookup chains, this value, etc.
Each lexical environment is associated with a code realm:
8.2 Code Realms
Before it is evaluated, all ECMAScript code must be associated with a Realm. Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript global environment, all of the ECMAScript code that is loaded within the scope of that global environment, and other associated state and resources.
Every section of JavaScript/ECMAScript code you write is associated with a realm before any of the code is actually executed. Each realm consists of the intrinsic values used by the specific section of code associated with the realm, the this object for the realm, a lexical environment for the realm, among other things.
This means each lexical part of your code is analyzed before executing. Then a realm is created that houses all the information on that set of code. The source, what variables are needed to execute it, which variables have been declared, what this is, etc. In the case of var declarations, a realm is created, when you define a function like you did here:
function do_something() {
    console.log(bar); // undefined
    var bar = 111;
    console.log(bar); // 111
}
Here, a FunctionDeclaration creates a new lexical environment, associated with a new realm. When a lexical environment is created, the interpreter analyzes the code and finds all declarations. Those declarations are then first processed at the very beginning of that lexical environment, thus the 'top' of the function:
13.3.2 Variable Statement
A var statement declares variables that are scoped to the running execution context’s VariableEnvironment. Var variables are created when their containing Lexical Environment is instantiated and are initialized to undefined when created.
Thus, whenever a lexical environment is instantiated (created), all the var declarations are created, initialized to undefined. That means they are processed before any code is executed, at the 'top' of the lexical environment:
var bar; //Processed and declared first
console.log(bar);
bar = 111;
console.log(bar);
Then, after all your JavaScript code is analyzed, it is finally executed. Because the declaration was processed first, it is declared (and initialized to undefined) giving you undefined.
Hoist is kind of a misnomer really. Hoist implies that the declarations are moved directly to the top of the current lexical environment, but instead the code is analyzed before execution; nothing is moved.
Note: let and const act in the same way and are also hoisted but this won't work:
function do_something() {
    console.log(bar); //ReferenceError
    let bar = 111;
    console.log(bar);
}
This will give you a ReferenceError for trying to access an uninitialized variable. Even though let and const declarations are hoisted, the specification explicitly states that you cannot access them before they are initialized, unlike var:
13.3.1 Let and Const Declarations
let and const declarations define variables that are scoped to the running execution context’s LexicalEnvironment. The variables are created when their containing Lexical Environment is instantiated but may not be accessed in any way until the variable’s LexicalBinding is evaluated.
Thus, you can't access the variable until it is formally initialized, whether to undefined or any other value. That means you can't seemingly 'access it before it's declared' like you can with var.
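The difference between the two kinds of declaration can be put side by side in one function. This is only a sketch; the names hoistingDemo, hoistedVar and blockScoped are made up for illustration. The var binding is readable (as undefined) before its declaration line, while reading the let binding early throws, and the error is caught here so the function can report it.

```javascript
// Hypothetical example: reading var and let bindings before their declarations run.
function hoistingDemo() {
  var results = [];

  // The var binding already exists and holds undefined at this point.
  results.push(typeof hoistedVar);

  try {
    // The let binding exists too, but is uninitialized, so reading it throws.
    results.push(blockScoped);
  } catch (err) {
    results.push(err.name);
  }

  var hoistedVar = 1;
  let blockScoped = 2;
  return results;
}

console.log(hoistingDemo()); // [ 'undefined', 'ReferenceError' ]
```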
"Interpreted" doesn't mean what you think it does.
Actually, "interpreted" here means something more like "compiled on demand" and, rather than being compiled line by line (as you thought), the code is compiled in units of executable code. Those units are first read into memory and then executed later.
It's during these phases that the scope of the execution context becomes known, declarations are hoisted and identifiers resolved.
The particulars of the implementations of all this are not standardized and each vendor is free to implement them as they like.

What does a closure REALLY refer to?

I know how to use "closures"... but what does the word closure actually refer to?
On MDN the definition is that it is a function. Here is the page about closures. The first sentence is, "Closures are functions that refer to independent (free) variables." and in the first code example there is a comment highlighting the closure function. However, the second sentence seems to suggest that the closure is really the persistent scope that the inner function resides in. That's what this other Stack Overflow answer suggests, too (search for the word "persistent").
So, what is it? The function or the persistent scope?
Technically, a closure is the mechanism behind how persistent scope works in functions - in other words, it's the implementation. That's what it originally meant in terms of Lisp. And in some cases that's still what it means - just look at the explanations of what closures are in various Lisp dialects and they almost all try to explain it in terms of how the compiler/interpreter implements closures. Because that was how people used to explain scope.
The first time I came across a much simpler explanation of closures (that is, explaining the behavior instead of the mechanics) was in JavaScript. Now that more people are used to the idea of closures, the word itself has grown to mean:
1. the captured scope of an inner function allowing the function to refer to the variables of the outer function (this is the closest to the original meaning)
2. the inner function itself (typically in languages that call first-class functions or lambdas "closure")
3. the free variable captured by the closure
I personally prefer the last meaning because it captures the essence of what closures are: a form of variable sharing, kind of like globals. However, if you want to be pedantic, only the first meaning is actually the technical meaning: "closure" refers to the captured scope.
To answer your question as to whether the closure is the function or the environment, it's both - it's the combination of the function and the environment which it makes use of. It's especially important (and may really only exist) in the context of languages where functions can be defined (often anonymously) and passed around as parameters to other functions (witness, for example, the use of blocks in Smalltalk, anonymous functions in the Lisp/Scheme sphere, and so on). The issue is this: if function A is created during the execution of function B, and A references a parameter named P which was passed to B; and if A is subsequently stored in a variable V, and is then invoked through V after B has returned, A still needs to have access to P (which was a parameter to B, which has long since finished by the time A is executed) in order to do its work; thus, A must have a "closure over P" in order to do its job successfully. The "closure" is a combination of function A itself and the environment set up when A is created, which stores a copy or reference or what-have-you of P so that A can do its job successfully.
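The A/B/P/V scenario described above translates almost directly into JavaScript. This is a sketch that reuses the letters from the paragraph; writing A as a nested function and passing 42 as the argument are assumptions made for illustration.

```javascript
// B receives the parameter P and creates A while it is executing.
function B(P) {
  function A() {
    return "P is " + P; // A refers to B's parameter
  }
  return A;
}

var V = B(42);     // A is stored in V; B has already returned at this point
console.log(V());  // "P is 42": A still reaches P through its closure
```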
Note that despite having used languages which make extensive use of closures (particularly Smalltalk and some Lisp variants) this is usually so well buried that it's beyond transparent - you don't even notice it. If you notice that there's a closure, IMO it's not well done.
YMMV.
Best of luck.

let statement application consequences

I found the following explanation of why loops using a variable declared with var run faster in Node than in Chrome:
Within a web browser such as Chrome, declaring the variable i outside
of any function’s scope makes it global and therefore binds it as a
property of the window object. As a result, running this code in a web
browser requires repeatedly resolving the property i within the
heavily populated window namespace in each iteration of the for loop.
In Node.js, however, declaring any variable outside of any function’s
scope binds it only to the module’s own scope (not the window object)
which therefore makes it much easier and faster to resolve.
I'm curious about the let statement in ECMAScript 6: does using more block-scoped variables in loops make calculations faster, or is it just a safety measure against name collisions?
The goal with let was to have a better scoping mechanism in JavaScript (no more wrapping things in anonymous functions just for the sake of scoping). Any performance gains are just the cherry on top.
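One place where the scoping difference shows up is closures created inside a loop. The sketch below uses the hypothetical names withVar and withLet. A var loop variable is shared by every iteration, while a let loop variable gets a fresh binding per iteration, which previously required wrapping the loop body in an anonymous function.

```javascript
// Closures created in a var loop all see the same variable.
var withVar = [];
for (var i = 0; i < 3; i++) {
  withVar.push(function () { return i; });
}

// Closures created in a let loop each capture that iteration's binding.
var withLet = [];
for (let j = 0; j < 3; j++) {
  withLet.push(function () { return j; });
}

console.log(withVar.map(function (f) { return f(); })); // [ 3, 3, 3 ]
console.log(withLet.map(function (f) { return f(); })); // [ 0, 1, 2 ]
```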

objects in javascript

Primitive values are stored on the stack in JavaScript, but objects are stored on the heap. I understand why primitives are stored on the stack, but is there any reason why objects are stored on the heap?
Actually, in JavaScript even primitives are stored in the heap rather than on a stack (see the note below the break, though). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require the stack be used for "local" variable storage, and JavaScript's closures would make using a stack à la C, C++, etc. for that impractical. Details in the spec.
Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)
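The remark that the chain follows lexical order rather than call order can be checked with a small sketch. The names label, show and caller are made up for illustration. The function show looks the identifier up where it was defined, not in the function that happens to call it.

```javascript
var label = "outer";

// `show` is defined next to the outer `label`, so that is where lookup continues.
function show() {
  return label;
}

function caller() {
  var label = "local to caller"; // on the call stack, but not on show's scope chain
  return show();
}

console.log(caller()); // "outer"
```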
Re your question of why objects are stored in the heap: Because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created and destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects and those objects have a lifecycle that's (potentially) unrelated to the function returning...well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.
However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:
function foo() {
    var n;
    n = someFunctionCall();
    return n * 2;
}
function bar() {
    var n;
    n = someFunction();
    setCallback(function() {
        if (n === 2) {
            doThis();
        }
        else {
            doThat();
        }
    });
}
In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.
In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.
(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)
But these are implementation details. Conceptually, things are as described above the break.
The size of objects can grow dynamically, so their memory requirements need to be adjusted over time. That is why they are stored on the heap.
Both primitive values and objects are always stored in some other object - they are properties of some object.
There is not one primitive value / object that is not a property of another object. (The only exception here is the global object).
