I am a JavaScript newbie. I am used to Java, thus I am trying to map two worlds in atrociously incorrect ways.
Questions:
Each Java instance runs on a JVM. What is the JVM equivalent for JavaScript?
In Java, objects occupy memory and that memory is stored on the heap. Where do JavaScript objects get stored? In other words whats the JVM's heap equivalent for JavaScript?
Each function call in Java adds a stack frame. Do JavaScript function calls do the same (i.e. add a stack)?
Java and ECMAScript are not the same language and, past generalities, have different execution models. While the JLS (Java Language Specification) is very technically precise about things like "references" and the "heap", the ECMAScript specification focuses merely on behavior.
A conforming implementation of ECMAScript must provide and support all the types, values, objects, properties, functions, and program syntax and semantics (behavior) described in this specification.
ECMAScript is executed by a "JavaScript Engine". Generally there is one "environment" per browser window; that is, each window object is the global context for at most one concurrent Program execution. This effectively means that each browser window "is a separate VM".
Each mutable object is mutable, and each object is reachable (and thus guaranteed to be alive) only as long as it is strongly referenced. The implementations certainly use "heap" memory and "references", as it is a practical/required design choice; but neither the usage of a "heap" or "references" (in this sense) is discussed in the specification.
The specification discusses the stack in terms of Execution Contexts which is composed of a logic chain.
When control is transferred to ECMAScript executable code, control is entering an execution context. Active execution contexts logically form a stack ..
Related
I have always understood it to be a cost-savings operation in JavaScript to avoid repeatedly referencing a nested property on an object. Instead of writing a.b.c.d over and over, you would favor let x = a.b.c.d; and then use x (what I've often heard colloquially called "caching the reference.")
It recently came up in conversation with a friend that such a thing would be completely unnecessary and foolish in C++.
Is that true? If so, why? I guess it has to do with the difference in the underlying language implementation between a C++ object and a JavaScript object, but what is the difference exactly?
JavaScript objects are closer to C++'s std::map (or std::unordered_map) than they are to C++ classes.
C++ has the advantage of having separate compilation and run steps. The compiler can really take as long as it likes to analyze large chunks of your program and heavily optimize them. When you're writing C++ you aren't really writing a program for the CPU to execute. You're describing the behavior of a program and the compiler will use that description to come up with a program for you. Your browser's JavaScript runtime (likely a JIT compiler) simply doesn't have the time to do the same level of analysis and optimization. It has to compile and run your program quickly enough that users don't perceive any delay. That's not to say a JavaScript runtime won't do any optimization, but it will tend to be more incremental and localized than what a C++ compiler that takes 20 minutes to compile a program can do.
All of the attributes of a C++ class are known at compile time. When you access an attribute of an object in C++, the compiler will resolve that access at compile time; likely to a handful of memory loads or a single function call instruction. Since it's all resolved at compile time, it doesn't matter how deeply an attribute lookup is nested. The runtime performance will be the same. Additionally, the compiler will do that sort of memorization for you, likely by keeping a repeatedly-accessed attribute in a register.
The same is not true of JavaScript. JavaScript objects don't have a defined set of properties. They can be added and removed throughout the lifetime of the object. That means that the JavaScript runtime has to track an object's properties using some sort of associative data structure (likely a hash table). When you request an attribute of an object, the runtime has to look through that data structure to find the value that you want, and it has to do that lookup for every level of nesting.
ES5 changed variable object(VO) to lexical environment. What's the motivation of such change since VO is already very obvious as perception?
I think variable objects are more analogous to environment records.
An Environment Record records the identifier bindings that are created
within the scope of its associated Lexical Environment.
In ES5 there are two different kinds of environment records:
Declarative environment records are used to define the effect of
ECMAScript language syntactic elements such as FunctionDeclarations,
VariableDeclarations, and Catch clauses that directly associate identifier bindings with ECMAScript language values. Object
environment records are used to define the effect of ECMAScript
elements such as Program and WithStatement that associate
identifier bindings with the properties of some object.
So the question would be why declarative environment records were introduced instead of only using object environment records just like ES3 variable objects. The difference is that declarative environment records can have immutable bindings:
In addition to the mutable bindings supported by all Environment
Records, declarative environment records also provide for immutable
bindings. An immutable binding is one where the association between an
identifier and a value may not be modified once it has been
established.
Immutable bindings don't have a direct equivalent in objects. A property can be defined as both non-configurable and non-writable, becoming immutable. However,
Creation and initialisation of immutable binding are distinct steps so
it is possible for such bindings to exist in either an initialised or
uninitialised state.
But you can't have an uninitialized property. If you define a non-configurable non-writable property with value undefined, then you won't be able to initialize it to the desired value.
I don't think it's possible to have uninitialized immutable bindings in ES5. CreateImmutableBinding is only used in Declaration Binding Instantiation and Function Definition, and in both cases it's immediately initialized with InitializeImmutableBinding.
But possibly this was done to allow uninitialized immutable bindings as extensions of the language, like the JavaScript 1.5 const. Or maybe they already had in mind ES6 const.
The same author whose ES3 article you linked wrote also about ES5 (and even linked that section there). I'll quote Mr. Soshnikov from his "Declarative environment record" section in ECMA-262-5 in detail. Chapter 3.2. Lexical environments: ECMAScript implementation:
In general case the bindings of declarative records are assumed to be stored directly at low level of the implementation (for example, in registers of a virtual machine, thus providing fast access). This is the main difference from the old activation object concept used in ES3.
That is, the specification doesn’t require (and even indirectly doesn’t recommend) to implement declarative records as simple objects which are inefficient in this case. The consequence from this fact is that declarative environment records are not assumed to be exposed directly to the user-level, which means we cannot access these bindings as e.g. properties of the record. Actually, we couldn’t also before, even in ES3 — there activation object also was inaccessible directly to a user (except though Rhino implementation which nevertheless exposed it via __parent__ property).
Potentially, declarative records allow to use complete lexical addressing technique, that is to get the direct access to needed variables without any scope chain lookup — regardless the depth of the nested scope (if the storage is fixed and unchangeable, all variable addresses can be known even at compile time). However, ES5 spec doesn’t mention this fact directly.
So once again, the main thing which we should understand why it was needed to replace old activation object concept with the declarative environment record is first of all the efficiency of the implementation.
Thus, as Brendan Eich also mentioned (the last paragraph) — the activation object implementation in ES3 was just “a bug”: “I will note that there are some real improvements in ES5, in particular to Chapter 10 which now uses declarative binding environments. ES1-3’s abuse of objects for scopes (again I’m to blame for doing so in JS in 1995, economizing on objects needed to implement the language in a big hurry) was a bug, not a feature”.
I don't think I could express this any better.
The ECMAScript specification defines an "unique global object that is created before control enters any execution context". This global object is a standard built-in object of ECMAScript, and therefore a native object.
The spec also states:
In addition to the properties defined in this specification the global
object may have additional host defined properties. This may include a
property whose value is the global object itself; for example, in the
HTML document object model the window property of the global object is
the global object itself.
So, in web-browsers, the window object is just a convenient name for the ECMAScript global object, and therefore, the window object is a native ECMAScript object.
Did I get this correctly?
This mostly comes down to a question of what it really means to be a "native object" or a "host object". The ECMAScript specification provides fairly abstract definitions of those terms and there is plenty of room for differing interpretations of the definitions. For example, in the definition of native object, what is the word "semantics" actually talking about. Is it just the primitive object semantics (in ES specified by the [[propName]] internal properties) or does it include application level semantics of the object. The DOM window object certainly has observable application level semantics that are not defined in the ES specification, so if those semantics are considered it can't be a "native object".
The answer is probably much simpler if you look at it as a question of implementation pragmatics. A ES engine implementer probably thinks of any object that is allocated in the ES heap and managed by the ES garbage collector to be a "native ES object". A "host object" would generally be thought of something that exists external to the ES heap and that is accessed using some sort of interoperability layer such as COM, XPCOM, or the V8 embedding API. Depending upon the implementation, the DOM window object might fall into either category. This distinction is probably more relevant to both engine implementers and host providers than any of specification level distinctions.
There will likely be further definitional clarifications in the next edition of the ES specification. There is even a proposal to eliminate the "native" and "host" object terminology: http://wiki.ecmascript.org/doku.php?id=strawman:terminology . However, it isn't clear whether such definitions really have very much practical impact.
I could (and probably will) argue that the specification does not require the global object to be a native object. The spec defines a native object as:
object in an ECMAScript implementation whose semantics are fully defined by this specification rather than by the host environment.
And the definition of a host object:
object supplied by the host environment to complete the execution environment of ECMAScript.
The host object definition could certainly be applied to window, it is an object supplied by the host environment to complete the execution environment of ECMAScript. In addition, Its semantics are not fully defined by the ECMAScript specification.
There's also the case that the ECMAScript engines that run in browsers, such as V8, TraceMonkey, etc, do not create the window object. Rather, it is provided by the DOM (constructed and inheriting from DOMWindow or Window, for example).
Yes, your reasoning sounds about right. As a semi-proof, when a function is executed without explicit this ("in the global context"), its this will evaluate to window inside the function body. But this is really JSVM-specific. For example, have a look at this v8-users message (and the related discussion.) Things are a bit more complicated behind the scenes, but look approximately as you describe to the user.
In case you don't know what I'm talking about, please read John Resig - How JavaScript Timers Work and Is JavaScript guaranteed to be single-threaded?
There are several triggers that enqueue tasks in the JS engines' execution FiFo. This is not part of any standard, so I'm trying to find an exhausive list of those triggers. (I guess it all boils down to internal event handlers, like script load events or timer events, but I'd rather ignore the engine's internals and look at things from the user's point of view.)
So far I have identified
<script> elements in the initial document (including the ones added by document.write)*
<script> elements inserted by JS at runtime*
Event handlers
-- these include a wide variety of cases, such as user interaction, Error events, Web Worker messages or Ajax callbacks ...
window.setTimeout
window.setInterval
*) only in Browser/DOM environments
Any more? Any differences between JS engines?
"JavaScript" as a language name should not really be used it is too broad.
ECMAScript is what you are referring to. You can find information on ECMAScript at http://www.ecmascript.org/
The language standard is called ECMA-262 with the 5.1 Edition supported by most browsers.
setTimeout, setInterval, DOM events, and more are not part of the language. These are provided by the host environment as host objects. Writing ECMAScript for a wide range of host environment should take special care when using host objects.
ECMAScript code is executed within an execution context. This takes the form of a stack and will hold the state of the current execution context at the top.
There are 3 ways to push an execution context. Global code, eval, and function. This is the only way to start code. Host environments will use these methods to execute code.
The host environment can supply a call stack. This is used to stack function calls generated by host objects which may run in independent threads. Typically an event such as setTimeout will add a function to the call stack. The host environment will then wait until the execution context stack is empty then pop the function from the call stack, create a new execution context, execute the code until completed. It will repeat this until the call stack is empty.
Attempting to build a comprehensive list of host object execution context managers is futile.
To answer the questions.
Any more?
Yes there are many more. This is beyond the scope of this answer. Refer to the particular host environment you wish to use.
Any differences between JS engines? (ECMAScript Host environments).
Yes. Again this beyond the scope of this answer and dependent on the host
There are dozens of host environments, with new ones created all the time. What triggers the creation of a new execution context is highly dependent on the host environment.
Primitive values are stored in a stack in javascript but objects are stored in a heap. I understand why to store primitives in stack but any reason why objects are stored in heaps?
Actually, in JavaScript even primitives are stored in the heap rather than on a stack (see note below the break below, though). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require the stack be used for "local" variable storage, and JavaScript's closures would make using a stack a'la C, C++, etc. for that impractical. Details in the spec.
Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)
Re your question of why objects are stored in the heap: Because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created an destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects and those objects have a lifecycle that's (potentially) unrelated to the function returning...well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.
However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:
function foo() {
var n;
n = someFunctionCall();
return n * 2;
}
function bar() {
var n;
n = someFunction();
setCallback(function() {
if (n === 2) {
doThis();
}
else {
doThat();
}
});
}
In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.
In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.
(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)
But these are implementation details. Conceptually, things are as described above the break.
The size of objects can grow dynamically. Therefore, you would need to adjust their memory requirements. That is why, they are stored in heap.
Both primitive values and objects are always stored in some other object - they are properties of some object.
There is not one primitive value / object that is not a property of another object. (The only exception here is the global object).