pre-parsing and execution context

pre-parsing and execution context - javascript

I am trying to better understand what exactly happens when, for example, the V8 engine parses code and defines an execution context for a function.
What most JS courses and books typically discuss is what happens during a "creation phase" and an "execution phase". The language used for JS engines is different: pre-parsing, parsing, AST, baseline and optimized compiling.
My understanding based on my reading so far, although I may be very wrong, is that the "creation phase" is the same as pre-parsing the source code, where the parser does the bare minimum in terms of variable and function declaration so that the code can run. For example, it would not attempt to build an AST for a function before it is invoked in the code. When a function is invoked from the global scope, it will then be pre-parsed (creation phase) and then fully parsed (execution phase). A new execution context with lexical environment, etc. will be created during pre-parsing. This process would then be repeated for each nested function.
Question 1: Are my assumptions correct? Is pre-parsing===creating phase.
Question 2: This is the bit that I am mostly unclear about. Does pre-parsing create a function object for a function that is not yet invoked. Is the function object shown in Chrome dev tools at the debugger below, actually created based on pre-parsing alone. Are the argument and caller properties then just updated during the subsequent execution phase.
let fun1 = function(name){
let newname="Tom";
console.log(name, newname);
};
debugger
fun1("John");
//fun1 function as it appears in the dev tools before it is invoked (and presumably before it is added to the execution stack)
Script
fun1: ƒ (name)
arguments: null
caller: null
length: 1
name: "fun1"
prototype: {constructor: ƒ}
proto: ƒ ()
//fun1 function after it is invoked and added to the execution stack.
Local
name: "John"
newname: undefined
this: Window
Script
fun1: ƒ (name)
arguments: Arguments ["John", callee: ƒ, Symbol(Symbol.iterator): ƒ]
caller: null
length: 1
name: "fun1"
prototype: {constructor: ƒ}
proto: ƒ ()

What most JS courses and books typically discuss is ...
I don't know "most JS courses and books", and that reference is not specific enough to look it up. If you want a specific comparison to a specific usage of terms somewhere, please specify the source you are referring to.
When a function is invoked from the global scope, it will then be pre-parsed (creation phase) and then fully parsed (execution phase).
Pre-parsing is only useful when it happens (long) before the function is invoked or fully parsed -- that's why it's called pre- parsing.
Parsing is not part of execution. Parsing happens as the first step of compiling (or you could say: "right before compiling"; same thing), which must happen before execution can begin. (At least that's the case in modern high-performance engines; you could build an engine that doesn't compile anything but directly interprets source code, which would mean that effectively parsing and executing would be the same thing, but that would be much slower for everything except trivial scripts.)
Question 1: Are my assumptions correct? Is pre-parsing===creating phase.
Generally speaking, there is no close correspondence between high-level conceptual phases of JS execution, and engine-internal implementation details.
Pre-parsing is (part of) an entirely optional performance-enhancing implementation strategy. You can turn it off if you want -- things will be slower, but observable behavior will be the same. The basic idea is that the engine will generate code "lazily", i.e. when a function is invoked the first time (because generating all code "eagerly" up front would make for annoyingly slow startup). In order to be able to do that, it must solve the problem "here's a couple hundred kilobytes of JS code, compile function fun1 please", so it must know where "fun1" is. That's the information that the "pre-parsing" step generates: it basically creates an index mapping function names to the place in the source text where the respective function is defined.
(As another entirely optional performance-enhancing implementation strategy, modern engines make sophisticated choices about internal variable representation/allocation, which in order to produce the correct behavior require pre-parsing to generate additional data about variables referenced from other functions; so if you look at the implementation, you'll see that it is quite a bit more complicated than just identifying function <name> { ... } ranges. You're correct that pre-parsing doesn't build an AST, but it does build scope chains for this reason.)
Question 2: Does pre-parsing create a function object for a function that is not yet invoked.
Pre-parsing, full parsing, and compiling all only affect the code for a function. They are unrelated to the function object itself. When you have a function object, you generally can't tell whether it has been pre-parsed or not, whether its code has been compiled or not, or whether its code has been optimized (=compiled again, later, taking type feedback into account) or not. The function object is created by executing the respective outer function (which may be the code in the global scope). It has internal, hidden references to metadata (code and otherwise) required for its execution.
To emphasize this again: all the details about when-does-the-engine-parse/compile-what are internal implementation details that are different between engines and change over time (as the respective teams improve their engines), so they generally must not affect how your program is executed.

Related

Read codebase of javascript function

I just realize that I can use the function console.log to read a codebase of any javascript function. Anyone how it happens? and what behind the sense of this action (read codebase)
Example:
Open chrome console and paste this code
function superMan(){
var i=i+1;
var j=i+1;
}
then you use the function console.log like console.log(superMan) and you can see the codebase of SuperMan

In javascript, When a program is run a execution context is created. There are 2 phases in which its created namely below.
Creation phase
Execution phase
What you are referring to in the question is mostly related to the creation phase. In this phase all the variables are declared and initialized to undefined ( incase of var) and for variables with let and const they are declared in the Temporal Dead Zone and are unreachable and only reachable when the code is executed in execution context.
For functions(specifically Function statements and not function expressions) these are taken in as it is in memory and stored. So, when you try to log in the function in console.log it shows the variable in memory. This will be also true for function expression once the execution runs over it(since before its treated like a normal variable and not function). They will be showed as it will now be pointing to the function directly assigned to that variable. Also, You need to understand function in Javascript are first-class functions meaning it can be passed and treated like a normal variable (containing a number) and so.

It's part of the spec but a good reason to use it is in edge-cases where the code of the function could be important to its execution (usually for an API, interface or some form of abstraction) where you need to know a bit more about the function before acting upon it. There are many use-cases, but it can be particularly useful to try to detect and handle edge-cases for a transpiler (the code you write isn't verbatim to the code that the browser sees)

what is the lifetime of javascript anonymous function?

If I write this in global scope:
(function(){})();
is the anonymous function created when the statement is executed and destroyed immediately after the statement is executed?
if I write this in a function:
function foo()
{
var a=1;
(function(){})();
a++;
}
Does the anonymous function exist until foo returns, or just exist during the execution of that statement?

In this particular case, most engines will optimize that function entirely away, because it does not do anything.
But let's assume that function contains code and is indeed executed. In this case, the function will exist all the time, either as compiled code, as bytecode or as AST for the interpreter.
The part that won't exist all the time is the scope and the possible created closure. The scope created for that function and the closure will only exist as long as the function is executed or a reference to the function with a specific bound scope/closure exists.
So the combination function reference + scope will be allocated at the time the statement (function(){})(); is executed, and can be released after that statement. But the compiled version of function(){} might still exist in memory for later use.
For engines that do just in time compiling and optimization, a function might even exist in different compiled versions.
The JIT+optimizer part of modern js engines is a complex topic, a rough description of v8 can be found here html5rocks: JavaScript Compilation:
In V8, the Full compiler runs on all code, and starts executing code as soon as possible, quickly generating good but not great code. This compiler assumes almost nothing about types at compilation time - it expects that types of variables can and will change at runtime.
In parallel with the full compiler, V8 re-compiles "hot" functions (that is, functions that are run many times) with an optimizing compiler. [...] In the optimizing compiler, operations get speculatively inlined (directly placed where they are called). This speeds execution (at the cost of memory footprint), but also enables other optimizations.
Therefore it may be that the generated code has hardly any similarities to the original one.
So a immediately-invoked function expression might even be completely optimized away using inlining.

If I write this in global scope:
(function(){})();
is the anonymous function created when the statement is executed and destroyed immediately after the statement is executed?
As t.niese said, odds are the engine will optimize that function away entirely. So let's assume it has some code in it:
// At global scope
(function(){ console.log("Hi there"); })();
The engine can't guarantee that that code won't throw an error (for instance, if you replaced console with something else), so I'm fairly sure it can't just inline that.
Now the answer is: It depends.
From a language/specification level, all code in a compilation unit (roughly: script) is parsed when the compilation unit is first loaded. Then, that function is created when the code reaches it in step-by-step execution, executed after being created (which involves creating an execution context for the call), and immediately eligible for garbage collection when done (along with the execution context) because nothing has a reference to it. But that's just theory/high-level specification.
From a JavaScript engine perspective:
The function is parsed before any code runs. The result of that parsing (bytecode or machine code) will be associated with that function expression. This doesn't wait for execution to reach the function, it's done early (in the background on V8 [Google's engine in Chrome and Node.js]).
Once the function has been executed and nothing else can refer to it:
The function object and the execution context associated with calling it are both eligible for GC. When and how that happens depends on the JavaScript engine.
Which leaves the function's underlying code, either bytecode (modern versions of V8 using Ignition, possibly others) or compiled machine code (somehow the function got used so much it got compiled for TurboFan, or older versions of V8 using Full-codegen, others). Whether the JavaScript engine can then throw away that bytecode or machine code will depend on the engine. I doubt engines throw away byte/machine code they've generated for a function if they may need it again (e.g., for a new anonymous function created by a new call to foo). If foo became unreachable, maybe foo's byte/machine code and the anonymous function's could be tossed as unnecessary. I have no idea whether engines do that. On the one hand, it seems like low-hanging fruit; on the other, it seems like something that would be so uncommon it's not worth bothering with. (Remember, here we're not talking about code attached to a function instance, but the code that has been produced from the source that gets attached to an instance when the instance is created, and may get attached to multiple instances over time.)
Here are a couple of articles on the V8 blog that make interesting reading:
26 March 2018: Background compilation
15 May 2017: Launching Ignition and TurboFan
23 August 2016: Firing up the Ignition interpreter (some information outdated and updated by the May 2017 article)
Ignition (some of the Ignition information appears to be slightly out of date relative to the May 2017 article above)
if I write this in a function:
function foo()
{
var a=1;
(function(){})();
a++;
}
Does the anonymous function exist until foo returns, or just exist during the execution of that statement?
Let's assume again there's a console.log in that function, and that I'm correct (it is an assumption on my part) that the fact it relies on a writable global (console) means it can't just be inlined.
The high-level/specification answer is the same: The function is parsed when the script is loaded, created when it's reached, executed, and eligible for GC when it's done executing. But again, that's just high-level concept.
Things are probably different at the engine level:
The code will be parsed before any code in the script runs.
Bytecode or machine code is likely generated before any code in the script runs, although I seem to recall something from the V8 blog about parsing but not immediately compiling the contents of top-level functions. I can't find that article, though, if it wasn't just in my head.
When execution reaches the function, a function object for it is created along with an execution context (unless the engine is sure it can optimize that away without it being observable in code).
Once execution ends, the function object and that execution context are eligible for GC. (They may well have been on the stack, making GC trivial when foo returns.)
The underlying code, though, sticks around in memory in order to be used again (and if used often enough, optimized).

Does a browser truly read JavaScript line by line OR does it make multiple passes?

I understand that JavaScript is interpreted and not compiled. No problem there. However, I keep reading here that JavaScript is executed "on the fly" and that lines are read one at a time. This idea is confusing me quite a bit when it comes to the following example:
writeToConsole();
function writeToConsole() {
console.log("This line was reached.");
}
For the record, this bit of code will write to the console just fine. Still, how would the browser know of the existence of exampleFunction() if it had not yet reached the function?
In other words, when exactly is this function first interpreted?

First, you make an incorrect assumption: modern JavaScript is compiled. Engines like V8, SpiderMonkey, and Nitro compile JS source into the native machine code of the host platform.
Even in older engines, JavaScript isn't interpreted. They transform source code into bytecode, which the engine's virtual machine executes.
This is actually how things in Java and .NET languages work: When you "compile" your application, you're actually transforming source code into the platform's bytecode, Java bytecode and CIL respectively. Then at runtime, a JIT compiler compiles the bytecode into machine code.
Only very old and simplistic JS engines actually interpret the JavaScript source code, because interpretation is very slow.
So how does JS compilation work? In the first phase, the source text is transformed into an abstract syntax tree (AST), a data structure that represents your code in a format that machines can deal with. Conceptually, this is much like how HTML text is transformed into its DOM representation, which is what your code actually works with.
In order to generate an AST, the engine must deal with an input of raw bytes. This is typically done by a lexical analyzer. The lexer does not really read the file "line-by-line"; rather it reads byte-by-byte, using the rules of the language's syntax to convert the source text into tokens. The lexer then passes the stream of tokens to a parser, which is what actually builds the AST. The parser verifies that the tokens form a valid sequence.
You should now be able to see plainly why a syntax error prevents your code from working at all. If unexpected characters appear in your source text, the engine cannot generate a complete AST, and it cannot move on to the next phase.
Once an engine has an AST:
An interpreter might simply begin executing the instructions directly from the AST. This is very slow.
A JS VM implementation uses the AST to generate bytecode, then begins executing the bytecode.
A compiler uses the AST to generate machine code, which the CPU executes.
So you should now be able to see that at minimum, JS execution happens in two phases.
However, the phases of execution really have no impact on why your example works. It works because of the rules that define how JavaScript programs are to be evaluated and executed. The rules could just as easily be written in a way such that your example would not work, with no impact on how the engine itself actually interprets/compiles source code.
Specifically, JavaScript has a feature commonly known as hoisting. In order to understand hoisting, you must understand the difference between a function declaration and a function expression.
Simply, a function declaration is when you declare a new function that will be called elsewhere:
function foo() {
}
A function expression is when you use the function keyword in any place that expects an expression, such as variable assignment or in an argument:
var foo = function() { };
$.get('/something', function() { /* callback */ });
JavaScript mandates that function declarations (the first type) be assigned to variable names at the beginning of an execution context, regardless of where the declaration appears in source text (of the context). An execution context is roughly equatable to scope – in plain terms, the code inside a function, or the very top of your script if not inside a function.
This can lead to very curious behavior:
var foo = function() { console.log('bar'); };
function foo() { console.log('baz'); }
foo();
What would you expect to be logged to the console? If you simply read the code linearly, you might think baz. However, it will actually log bar, because the declaration of foo is hoisted above the expression that assigns to foo.
So to conclude:
JS source code is never "read" line-by-line.
JS source code is actually compiled (in the true sense of the word) in modern browsers.
Engines compile code in multiple passes.
The behavior is your example is a byproduct of the rules of the JavaScript language, not how it is compiled or interpreted.

All functions will be examined first by the browser before any code is executed.
However,
var foo = function(){};
This will not be examined, and thus the following will throw a TypeError: undefined is not a function
foo();
var foo = function(){};

It does take 2 passes. The first pass parses the syntax tree, a part of which is performing hoisting. That hoisting is what makes your posted code work. Hoisting moves any var or named function declarations function fn(){} (but not function expressions fn = function(){}) to the top of the function they appear in.
The second pass executes the parsed, hoisted, and in some engines compiled, source code tree.
Check out this example. It shows how a syntax error can prevent all execution of your script by throwing a wrench in the first pass, which prevents the second pass (actual code execution) from ever happening.
var validCode = function() {
alert('valid code ran!');
};
validCode();
// on purpose syntax error after valid code that could run
syntax(Error(
http://jsfiddle.net/Z86rj/
No alert() occurs here. The first pass parsing fails, and no code executes.

The script is first parsed, then interpreted, and then executed. When the first statement (writeToConsole();) is executed, the function declaration has already been interpreted.
Since all variable and function declarations are hoisted in the current scope (in your case, the global script scope), you will be able to call a function that was declared below.

JavaScript is actually interpreted line by line. BUT, before it's executed, there is a first pass made by the compiler, reading certain things in (pretty geeky, have a look at this: https://www.youtube.com/watch?v=UJPdhx5zTaw if you're really interested).
The point is, JavaScript will be first "read" by the compiler, who stores functions defined as function foo(){...} already. You can call those ones at any given time in the script, given you are calling them from the same or a subordinate scope. What modern compilers also do is preallocate objects, so as a side effect it does make sense to strongly type your variables for the matter of performance.
var foo = function(){...} will not be stored by the compiler, due to the fact that JavaScript is loosely typed and the type of a variable could change during execution time.

The javascript engine create a execution context before executing your code. The execution context is created mainly in two phases:
Creation phase
Excecution phase
Creation Phase
In the creation phase we have the Global object, this and outer environment reference.In the creation phase as the parser runs through the code and begins to setup what we have written for translation, it recognizes where we have created variables and where we have created function. Basically it setup memory space for variable and functions Hoisting.
Execution Phase
In this the code is run line by line(interpreting, converting, compiling and executing it on the computer).
consider the following example, in this we are able to call the function b even before its declaration. This is because the javascript engine already knows the existence of the function b. And this is also referred to as hoisting in javascript.
b();
function b() {
console.log("I have been hoisted");
}

Why do you need to pass in arguments to a self executing function in javascript if the variable is global?

I was looking at the code for the underscore.js library (jQuery does the same thing) and just wanted some clarification on why the window object is getting passed into the self executing function.
For example:
(function() { //Line 6
var root = this; //Line 12
//Bunch of code
}).call(this); //Very Bottom
Since this is global, why is it being passed into the function? Wouldn't the following work as well? What issues would arrise doing it this way?
(function() {
var root = this;
//Bunch of code
}).call();

I suspect the reason is ECMAScript 5 strict mode.
In non-strict mode, this IIFE
(function() {
console.log(this); // window or global
})();
logs the window object (or the global object, if you're on a node.js server), because the global object is supplied as this to functions that don't run as a method of an object.
Compare that result to
"use strict";
(function() {
console.log(this); // undefined
})();
In strict mode, the this of a bare-invoked function is undefined. Therefore, the Underscore authors use call to supply an explicit this to the anonymous function so that it is standardized across strict and non-strict mode. Simply invoking the function (as I did in my examples) or using .call() leads to an inconsistency that can be solved with .call(this).

jQuery does the same thing, and you can find several answers to your question on SO by searching along those lines.
There are two reasons to have an always-available global variable like window in local scope, both of which are for the sake of fairly minor optimizations:
Loading time. Looking up a local variable takes a tiny bit less time than looking up a global variable, because the interpreter doesn't have to look as far up the context tree. Over the length of a function the size of Underscore or jQuery, that can theoretically add up to a less-trivial amount of time on slower machines.
Filesize. Javascript minification relies on the fact that variables can be named anything as long as they're consistent. myLongVariableName becomes a in the minified version. The catch, of course, is that this can only apply to variables defined inside the script; anything it's pulling from outside has to keep the same name to work, because if you shrink window down to pr the interpreter won't know what the hell you're talking about. By shadowing it with a reference to itself, Minify can do things like:
(function(A){ B(A, {...}); A.doStuff(); //etc });})(window)
and every time you would otherwise have put window (6 characters) you now have A (1 character). Again, not major, but in a large file that needs to explicitly call window on a regular basis for scope management it can be important, and you've already got an advantage by a few characters if you use it twice. And in large, popular libraries that get served by limited servers and to people who might have lousy Internet plans, every byte counts.
Edit after reading comments on question:
If you look at what function.call() actually does, your second snippet wouldn't actually work - the this being passed in isn't an argument, it's an explicit calling context for the function, and you have to provide it to .call(). The line at the beginning of the file serves the two purposes above. The reason for setting the context explicitly with call is futureproofing - right now, anonymous functions get the global context, but theoretically that could change, be discouraged in a future ECMA specification, or behave oddly on a niche browser. For something that half the Internet uses, it's best to be more explicit and avoid the issue entirely. The this/window choice at the top level I think must have something to do with worrying about non-PC devices (i.e. mobile) potentially using a different name for their top-level object.

objects in javascript

Primitive values are stored in a stack in javascript but objects are stored in a heap. I understand why to store primitives in stack but any reason why objects are stored in heaps?

Actually, in JavaScript even primitives are stored in the heap rather than on a stack (see note below the break below, though). When control enters a function, an execution context (an object) for that call to the function is created, which has a variable object. All vars and arguments to the function (plus a couple of other things) are properties of that anonymous variable object, exactly like other properties of named objects. A call stack is used, but the spec doesn't require the stack be used for "local" variable storage, and JavaScript's closures would make using a stack a'la C, C++, etc. for that impractical. Details in the spec.
Instead, a chain (linked list) is used. When you refer to an unqualified symbol, the interpreter checks the variable object for the current execution context to see if it has a property for that name. If so, it gets used; if not, the next variable object in the scope chain is checked (note that this is in the lexical order, not the call order like a call stack), and so on until the global execution context is reached (the global execution context has a variable object just like any other execution context does). The variable object for the global EC is the only one we can directly access in code: this points to it in global scope code (and in any function called without this being explicitly set). (On browsers, we have another way of accessing it directly: The global variable object has a property called window that it uses to point to itself.)
Re your question of why objects are stored in the heap: Because they can be created and released independently of one another. C, C++, and others that use a stack for local variables can do so because variables can (and should) be destroyed when the function returns. A stack is a nice efficient way to do that. But objects aren't created an destroyed in that straightforward a way; three objects created at the same time can have radically different lifecycles, so a stack doesn't make sense for them. And since JavaScript's locals are stored on objects and those objects have a lifecycle that's (potentially) unrelated to the function returning...well, you get the idea. :-) In JavaScript, the stack is pretty much just for return addresses.
However, it's worth noting that just because things are as described above conceptually, that doesn't mean that an engine has to do it that way under the hood. As long as it works externally as described in the spec, implementations (engines) are free to do what they like. I understand that V8 (Google's JavaScript engine, used in Chrome and elsewhere) does some very clever things, like for instance using the stack for local variables (and even local object allocations within the function) and then only copying those out into the heap if necessary (e.g., because the execution context or individual objects on it survive the call). You can see how in the majority of cases, this would minimize heap fragmentation and reclaim memory used for temporaries more aggressively and efficiently than relying on GC, because the execution context associated with most function calls doesn't need to survive the call. Let's look at an example:
function foo() {
var n;
n = someFunctionCall();
return n * 2;
}
function bar() {
var n;
n = someFunction();
setCallback(function() {
if (n === 2) {
doThis();
}
else {
doThat();
}
});
}
In the above, an engine like V8 that aggressively optimizes can detect that the conceptual execution context for a call to foo never needs to survive when foo returns. So V8 would be free to allocate that context on the stack, and use a stack-based mechanism for cleanup.
In contrast, the execution context created for a call to bar has to stick around after bar returns, because there's a closure (the anonymous function we passed into setCallback) relying on it. So when compiling bar (because V8 compiles to machine code on-the-fly), V8 may well use a different strategy, actually allocating the context object in the heap.
(If either of the above had used eval in any way, by the way, it's likely V8 and other engines don't even attempt any form of optimization, because eval introduces too many optimization failure modes. Yet another reason not to use eval if you don't have to, and you almost never have to.)
But these are implementation details. Conceptually, things are as described above the break.

The size of objects can grow dynamically. Therefore, you would need to adjust their memory requirements. That is why, they are stored in heap.

Both primitive values and objects are always stored in some other object - they are properties of some object.
There is not one primitive value / object that is not a property of another object. (The only exception here is the global object).

Develop Reference

JavaScript is the programming language of the Web.