New to closures and the inner workings of JS. I have a somewhat stable grasp of execution contexts and the associated objects within it. And while I know how to identify a closure and what it may yield, I don't quite see where the parent variables reside, once the parent function is popped from the stack.
I'd assume they become properties of the closure's variable object? But you know where that leads.
I don't quite see where the parent variables reside
They reside in scope. How that is expressed in the computer's memory is an implementation detail of the specific JavaScript runtime (which is probably written in C or C++).
Related
I understand the semantics that a closure holds a reference to a variable lengthen it's life cycle, makes primitive variables not limited by calling stack, and thus those variables captured by closures should be specially treated.
I also understand variables in same scope could be differently treated depends on whether it was captured by closures in now-days javascript engine. for example,
function foo(){
var a=2;
var b=new Array(a_very_big_number).join('+');
return function(){
console.log(a);
};
}
var b=foo();
as no one hold a reference to b in foo, there's no need to keep b in memory, thus memory used could be released as soon as foo returns(or even never created under furthur optimization).
My question is, why v8 seems to pack all variables referenced by all closures together in each calling context? for example,
function foo(){
var a=0,b=1,c=2;
var zig=function(){
console.log(a);
};
var zag=function(){
console.log(b);
};
return [zig,zag];
}
both zig and zag seems to hold a reference to a and b, even it's apparent that b is not available to zig. This could be awful when b is very big, and zig persists very long.
But stands on the point of view of the implementation, I can not understand why this is a must. Based on my knowledge, without calling eval, the scope chain can be determined before excution, thus the reference relationship can be determined. The engine should aware that when zig is no longer available, nether do a so the engine mark it as garbage.
Both chrome and firefox seems to obey the rule. Does standard say that any implementation must do this? Or this implementation is more practical, more efficient? I'm quite puzzling.
The main obstacle is mutability. If two closures share the same var then they must do so in a way that mutating it from one closure is visible in the other. Hence it is not possible to copy the values of referenced variables into each closure environment, like functional languages would do (where bindings are immutable). You need to share a pointer to a common mutable heap location.
Now, you could allocate each captured variable as a separate cell on the heap, instead of one array holding all. However, that would often be more expensive in space and time because you'd need multiple allocations and two levels of indirection (each closure points to its own closure environment, which points to each shared mutable variable cell). With the current implementation it's just one allocation per scope and one indirection to access a variable (all closures within a single scope point to the same mutable variable array). The downside is that certain life times are longer than you might expect. It's a trade-off.
Other considerations are implementation complexity and debuggability. With dubious features like eval and expectations that debuggers can inspect the scope chain, the scope-based implementation is more tractable.
The standard doesn't say anything about garbage collection, but gives some clues of what should happen.
Reference : Standard
An outer Lexical Environment may, of course, have its own outer
Lexical Environment. A Lexical Environment may serve as the outer
environment for multiple inner Lexical Environments. For example, if a
Function Declaration contains two nested Function Declarations then
the Lexical Environments of each of the nested functions will have as
their outer Lexical Environment the Lexical Environment of the current
execution of the surrounding function."
Section 13 Function definition
step 4: "Let closure be the result of creating a new Function object as specified in 13.2"
Section 13.2 "a Lexical Environment specified by Scope" (scope = closure)
Section 10.2 Lexical Environments:
"The outer reference of a (inner) Lexical Environment is a reference to the Lexical Environment that logically surrounds the inner Lexical Environment.
So, a function will have access to the environment of the parent.
I know how to use "closures"... but what does the word closure actually refer to?
On MDN the definition is that it is a function. Here is the page about closures. The first sentence is, "Closures are functions that refer to independent (free) variables." and in the first code example there is a comment highlighting the closure function. However, the second sentence seems to suggest that the closure is really the persistent scope that the inner function resides in. That's what this other stack overflow answer suggest, too (search word "persistent").
So, what is it? The function or the persistent scope?
Technically, a closure is the mechanism behind how persistent scope works in functions - in other words, it's the implementation. That's what it originally meant in terms of Lisp. And in some cases that's still what it means - just look at the explanations of what closures are in various Lisp dialects and they almost all try to explain it in terms of how the compiler/interpreter implements closures. Because that was how people used to explain scope.
The first time I came across a much simpler explanation of closures (that is, explaining the behavior instead of the mechanics) was in javascript. Now that more people are used to the idea of closures, the word itself has grown to mean:
the captured scope of an inner function allowing the function to refer to the variables of the outer function (this is the closest to the original meaning)
the inner function itself (typically in languages that call first-class functions or lambdas "closure")
the free variable captured by the closure
I personally prefer the last meaning because it captures the essence of what closures are: a form of variable sharing, kind of like globals. However, if you want to be pedantic, only the first meaning is actually the technical meaning: "closure" refers to the captured scope.
To answer your question as to whether the closure is the function or the environment, it's both - it's the combination of the function and the environment which it makes use of. It's especially important (and may really only exist) in the context of languages where functions can be defined (often anonymously) and passed around as parameters to other functions (witness, for example, the use of blocks in Smalltalk, anonymous functions in the Lisp/Scheme sphere, and etc). The issue is this: if global function A is created during the execution of function B and A references a parameter named P which was passed to B; and if A is subsequently stored in a variable V, and is then invoked through V after B has returned, A still needs to have access to P (which was a parameter to B, which has long since finished by the time A is executed) in order to do its work; thus, A must have a "closure over P" in order to do its job successfully. The "closure" is a combination of function A itself and the environment set up when A is compiled which stores a copy or reference or what-have-you of P so that A can do its job successfully.
Note that despite having used languages which make extensive use of closures (particularly Smalltalk and some Lisp variants) this is usually so well buried that it's beyond transparent - you don't even notice it. If you notice that there's a closure, IMO it's not well done.
YMMV.
Best of luck.
I found such an explanation why cycles with variable declared with var in node is faster than in chrome:
Within a web browser such as Chrome, declaring the variable i outside
of any function’s scope makes it global and therefore binds it as a
property of the window object. As a result, running this code in a web
browser requires repeatedly resolving the property i within the
heavily populated window namespace in each iteration of the for loop.
In Node.js, however, declaring any variable outside of any function’s
scope binds it only to the module’s own scope (not the window object)
which therefore makes it much easier and faster to resolve.
Curious about a let statement in Ecmascript6: does it make calculations faster using more block scope declared variables in loops or it is just a safety measure against name collisions?
The goal with let was to have better scoping mechanism in JavaScript (no more wrapping things in anonymous functions just for the sake of scoping). Any performance gains are just cherry on the top.
I watched this: http://www.youtube.com/watch?v=mHtdZgou0qU yesterday, and I've been thinking about how to improve my javascript. I'm trying to keep everything he said in mind when re-writing an animation that looked really choppy in firefox.
One of the things I'm wondering is if for loops add on to the scope chain. Zakas talked a lot about how closures add on to the scope chain, and accessing variables outside of the local scope tends to take longer. With a for loop, since you can declare a variable in the first statement, does this mean that it's adding another scope to the chain? I would assume not, because Zakas also said there is NO difference between do-while, while, and for loops, but it still seems like it would.
Part of the reason I ask is because I often see, in JS libraries, code like:
function foo(){
var x=window.bar,
i=0,
len=x.length;
for(;i<len;i++){
//
}
}
If for loops did add another object onto the chain, then this would be very inefficient code, because all the operations inside the loop (assuming they use i) would be accessing an out-of-scope variable.
Again, if I were asked to bet on it, I would say that they don't, but why then wouldn't the variables used be accessible outside of the loop?
JavaScript does not have block scope and it has variable hoisting, so any variables that appear to be defined in a for loop are not actually defined there.
The reason you see code like provided in your example is because of the hoisting behaviour. The code's author knows about variable hoisting so has declared all the variables for the scope at the start, so it's clear what JavaScript is doing.
Primitive values are stored in a stack in javascript but objects are stored in a heap. I understand why to store primitives in stack but any reason why objects are stored in heaps?
Actually, in JavaScript even primitives are stored in the heap rather than on a stack (see note below the break below, though). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require the stack be used for "local" variable storage, and JavaScript's closures would make using a stack a'la C, C++, etc. for that impractical. Details in the spec.
Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)
Re your question of why objects are stored in the heap: Because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created an destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects and those objects have a lifecycle that's (potentially) unrelated to the function returning...well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.
However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:
function foo() {
var n;
n = someFunctionCall();
return n * 2;
}
function bar() {
var n;
n = someFunction();
setCallback(function() {
if (n === 2) {
doThis();
}
else {
doThat();
}
});
}
In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.
In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.
(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)
But these are implementation details. Conceptually, things are as described above the break.
The size of objects can grow dynamically. Therefore, you would need to adjust their memory requirements. That is why, they are stored in heap.
Both primitive values and objects are always stored in some other object - they are properties of some object.
There is not one primitive value / object that is not a property of another object. (The only exception here is the global object).