How does Chrome V8 compile and run Javascript?

How does Chrome V8 compile and run Javascript? - javascript

It is said that Chrome compiles Javascript 'on the fly' (Just In Time). I don't understand what the JIT part actually means here? As far as I understand, the browser will take all of the JS code, compile it, and then execute it. It can't really do compilation in steps, as that would be more like interpreting (Does V8 interpret the code too at any point?).
Also, I want to understand why is Javascript called non-blocking? In reality, isn't the run-time environment (V8) the one which actually makes JS 'non-blocking'? Javascript is single-threaded, and as per my understanding, the thread dies as soon as all synchronous code is finished executing. It is the event loop that actually keeps Javascript 'alive' by 'bringing back the dead thread into life'. And the event loop is actually not part of the Javascript specification.
(Or is it that the global execution context is always present in the execution stack queue and whenever a new event handler is to be executed, a new execution stack is created and popped on top of the queue, hence the thread never truly dies)?

Does V8 interpret the code too at any point?).
Yes it does. Turning an AST into bytecode takes time, especially for such a dynamic language as JS. If a function only runs once or twice, it makes no sense to spend a lot of time to generate optimal bytecode, wereas interpreting the AST would be faster. Thats why most engines actually interpret the code, and then start building compiled versions for hot functions.
Also, I want to understand why is Javascript called non-blocking? In reality, isn't the run-time environment (V8) the one which actually makes JS 'non-blocking'?
Yes exactly. JavaScripts execution model is observably synchronous. The asynchrony comes from events coming in from the outside, e.g. from the engine
Javascript is single-threaded, and as per my understanding, the thread dies as soon as all synchronous code is finished executing. It is the event loop that actually keeps Javascript 'alive' by 'bringing back the dead thread into life'.
Yes, exactly.
And the event loop is actually not part of the Javascript specification.
It is. It is just not named event loop. The spec defines that agents work on task queues. Whats inside the task queue depends on the actual usecase of the engine, for browsers some of these queues are defined by the Web Spec.
I highly recommend reading some of the posts at https://v8.dev/blog/, especially this linked article.

Related

what is the lifetime of javascript anonymous function?

If I write this in global scope:
(function(){})();
is the anonymous function created when the statement is executed and destroyed immediately after the statement is executed?
if I write this in a function:
function foo()
{
var a=1;
(function(){})();
a++;
}
Does the anonymous function exist until foo returns, or just exist during the execution of that statement?

In this particular case, most engines will optimize that function entirely away, because it does not do anything.
But let's assume that function contains code and is indeed executed. In this case, the function will exist all the time, either as compiled code, as bytecode or as AST for the interpreter.
The part that won't exist all the time is the scope and the possible created closure. The scope created for that function and the closure will only exist as long as the function is executed or a reference to the function with a specific bound scope/closure exists.
So the combination function reference + scope will be allocated at the time the statement (function(){})(); is executed, and can be released after that statement. But the compiled version of function(){} might still exist in memory for later use.
For engines that do just in time compiling and optimization, a function might even exist in different compiled versions.
The JIT+optimizer part of modern js engines is a complex topic, a rough description of v8 can be found here html5rocks: JavaScript Compilation:
In V8, the Full compiler runs on all code, and starts executing code as soon as possible, quickly generating good but not great code. This compiler assumes almost nothing about types at compilation time - it expects that types of variables can and will change at runtime.
In parallel with the full compiler, V8 re-compiles "hot" functions (that is, functions that are run many times) with an optimizing compiler. [...] In the optimizing compiler, operations get speculatively inlined (directly placed where they are called). This speeds execution (at the cost of memory footprint), but also enables other optimizations.
Therefore it may be that the generated code has hardly any similarities to the original one.
So a immediately-invoked function expression might even be completely optimized away using inlining.

If I write this in global scope:
(function(){})();
is the anonymous function created when the statement is executed and destroyed immediately after the statement is executed?
As t.niese said, odds are the engine will optimize that function away entirely. So let's assume it has some code in it:
// At global scope
(function(){ console.log("Hi there"); })();
The engine can't guarantee that that code won't throw an error (for instance, if you replaced console with something else), so I'm fairly sure it can't just inline that.
Now the answer is: It depends.
From a language/specification level, all code in a compilation unit (roughly: script) is parsed when the compilation unit is first loaded. Then, that function is created when the code reaches it in step-by-step execution, executed after being created (which involves creating an execution context for the call), and immediately eligible for garbage collection when done (along with the execution context) because nothing has a reference to it. But that's just theory/high-level specification.
From a JavaScript engine perspective:
The function is parsed before any code runs. The result of that parsing (bytecode or machine code) will be associated with that function expression. This doesn't wait for execution to reach the function, it's done early (in the background on V8 [Google's engine in Chrome and Node.js]).
Once the function has been executed and nothing else can refer to it:
The function object and the execution context associated with calling it are both eligible for GC. When and how that happens depends on the JavaScript engine.
Which leaves the function's underlying code, either bytecode (modern versions of V8 using Ignition, possibly others) or compiled machine code (somehow the function got used so much it got compiled for TurboFan, or older versions of V8 using Full-codegen, others). Whether the JavaScript engine can then throw away that bytecode or machine code will depend on the engine. I doubt engines throw away byte/machine code they've generated for a function if they may need it again (e.g., for a new anonymous function created by a new call to foo). If foo became unreachable, maybe foo's byte/machine code and the anonymous function's could be tossed as unnecessary. I have no idea whether engines do that. On the one hand, it seems like low-hanging fruit; on the other, it seems like something that would be so uncommon it's not worth bothering with. (Remember, here we're not talking about code attached to a function instance, but the code that has been produced from the source that gets attached to an instance when the instance is created, and may get attached to multiple instances over time.)
Here are a couple of articles on the V8 blog that make interesting reading:
26 March 2018: Background compilation
15 May 2017: Launching Ignition and TurboFan
23 August 2016: Firing up the Ignition interpreter (some information outdated and updated by the May 2017 article)
Ignition (some of the Ignition information appears to be slightly out of date relative to the May 2017 article above)
if I write this in a function:
function foo()
{
var a=1;
(function(){})();
a++;
}
Does the anonymous function exist until foo returns, or just exist during the execution of that statement?
Let's assume again there's a console.log in that function, and that I'm correct (it is an assumption on my part) that the fact it relies on a writable global (console) means it can't just be inlined.
The high-level/specification answer is the same: The function is parsed when the script is loaded, created when it's reached, executed, and eligible for GC when it's done executing. But again, that's just high-level concept.
Things are probably different at the engine level:
The code will be parsed before any code in the script runs.
Bytecode or machine code is likely generated before any code in the script runs, although I seem to recall something from the V8 blog about parsing but not immediately compiling the contents of top-level functions. I can't find that article, though, if it wasn't just in my head.
When execution reaches the function, a function object for it is created along with an execution context (unless the engine is sure it can optimize that away without it being observable in code).
Once execution ends, the function object and that execution context are eligible for GC. (They may well have been on the stack, making GC trivial when foo returns.)
The underlying code, though, sticks around in memory in order to be used again (and if used often enough, optimized).

JavaScript compilation in V8

In the V8 home (the Google's JavaScript engine) we read this:
V8 compiles and executes JavaScript source code
Does it mean that JavaScript is not an interpreted language in V8?
Does V8 use a just-in-time compilation approach for JavaScript?
Edit: There is another existing question which already address my first question, but not the second.

Does it mean that JavaScript is not an interpreted language in V8?
The answer to this is "it depends".
Historically V8 has compiled directly to machine code using its "full-codegen" compiler, which produces unoptimized code which uses inline caching to implement most operations such as arithmetic operations, load and stores of variables and properties, etc.
The code generated by full-codegen keeps track of how "hot" each function is, by adjusting a counter when the function is called and when it jumps back to the top of loops.
It also keeps track of the types of the variables used in each expression.
If it determines that a function (or part of a function) is very hot, and it has collected enough type information, it triggers the "Crankshaft" compiler which generates much better code.
However, the V8 developers are actively working on moving to a different system where they start off with an interpreter called "Ignition" and then use a compiler called "Turbofan" to produce optimized code for hot functions.
Here are a couple of posts from the V8 developers blog describing this:
Firing up the Ignition Interpreter
Help us test the future of V8
Does V8 use a just-in-time compilation approach for JavaScript?
Yes, in a number of ways.
Firstly, it has a lazy parsing and lazy compilation mechanism. This means that when it parses a Javascript source file it parses the outermost scope eagerly, generating the full-codegen code immediately.
However, for functions defined within the file, it skips over them and just records the name of the function and the location of its source code. It generates a dummy function which simply calls into the V8 runtime to trigger the actual compilation of the function.
Secondly, it has a two stage compiler pipeline as described above, using either full-codegen+crankshaft or ignition+turbofan.
When the compilation is triggered it will initially generate unoptimized code or ignition bytecode (which it can do very quickly), and then later if the code is hot it triggers an optimized re-compilation (which is much slower but generates much better code).

How does JavaScript's Single Threaded Model handle time consuming tasks?

This question is regarding the sinlge threaded model of JavaScript. I understand that javascript is non-block in nature cause of its ability to add a callbacks to the async event queue. But if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded? How does nodejs handle such a problem? And is this an unavoidable problem for developers on the front end? I'm asking this question cause I have read that its generally good practice to keep function tasks as small as possible. Is it really because long tasks in javascript will actually block other tasks?

But if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded?
Yes.
How does nodejs handle such a problem?
Node.js handles nothing. How you handle concurrency is up to you and your application. Now, Node.js does have a few tools available to you. The first thing you have to understand is that Node.js is basically V8 (JavaScript engine) with a lightweight library split between JavaScript and native code bolted on. While your JavaScript code is single-threaded by nature, the native code can and does create threads to handle your work.
For example, when you ask Node.js to load a file from disk, your request is passed off to native code where a thread pool is used, and your data is loaded from disk. Once your request is made, your JavaScript code continues on. This is the meaning of "non-blocking" in the context of Node.js. Once that file on disk is loaded, the native code passes it off to the Node.js JavaScript library, which then executes your callback with the appropriate parameters. Your code continued to run while the background work was going on, but when your callback is dealing with that data, other JavaScript code is indeed blocked from running.
This architecture allows you to get much of the benefit of multithreaded code without having to actually write any multithreaded code, keeping your application straightforward.
I'm asking this question cause I have read that its generally good practice to keep function tasks as small as possible. Is it really because long tasks in javascript will actually block other tasks?
My philosophy is always to use what you need. It's true that if a request comes in to your application and you have a lot of JavaScript processing of data that is blocking, other requests will not be processed during this time. Remember though that if you are doing this sort of work, you are likely CPU bound anyway and doing double the work will cause both requests to take longer.
In practice, the majority of web applications are IO bound. They shuffle data from a database, reformat it, and send it out over the network. The part where they handle data is actually not all that time consuming when compared to the amount of time the application is simply waiting to hear back from the upstream data source. It is in these applications where Node.js really shines.
Finally, remember that you can always spawn child processes to better distribute the load. If your application is that rare application where you do 99% of your work load in CPU-bound JavaScript and you have a box with many CPUs and/or cores, split the load across several processes.

Your question is a very large one, so I am just going to focus on one part.
if the callback function does infact take a long time to complete, won't JavaScript then be blocking everything else during that time as it is single threaded? (...) Is it really because long tasks in javascript will actually block other tasks?
Non-blocking is a beautiful thing XD
The best practices include:
Braking every function down into its minimum functional form.
Keep CallBacks asynchronies, THIS is an excellent post on the use of CallBacks
Avoid stacking operations, (Like nested Loops)
Use setTimeout() to brake up potentially blocking code
And many other things, Node.JS is the gold standard of none blocking so its worth a look.
--
--
setTimeout() is one of the most important functions in no-blocking code
So lets say you make a clock function that looks like this:
function setTime() {
var date=new Date();
time = date.getTime()
document.getElementById('id').innerHTML = time;
}
while(true){setTime();}
Its quite problematic, because this code will happily loop its self until the end of time. No other function will ever be called. You want to brake up the operation so other things can run.
function startTime() {
var date=new Date();
time = date.getTime()
document.getElementById('id').innerHTML = time;
setTimeout(startTime(),1000);
}
'setTimeout();' brakes up the loop and executes it every 1-ish seconds. An infinite loop is a bit of an extreme example. The point is 'setTimeout();' is great at braking up large operation chains into smaller ones, making everything more manageable.

Is this recursion or not

function x(){
window.setTimeout(function(){
foo();
if(notDone()){
x();
};
},1000);
}
My concern being unbounded stack growth. I think this is not recursion since the x() call in the timer results in a brand new set of stack frames based on a new dispatch in the JS engine.
But reading the code as an old-fashioned non JS guy it makes me feel uneasy
One extra side question, what happens if I scheduled something (based on math rather than a literal) that resulted in no delay. Would that execute in place or would it be in immediately executed async, or is that implementation defined

It's not - I call it "pseudo-recursion".
The rationale is that it kind of looks like recursion, except that the function always correctly terminates immediately, hence unwinding the stack. It's then the JS event loop that triggers the next invocation.

It is recusive in a sense that it is a function that calls itself but I believe you are right about the stack trace being gone. Under normal execution the stack will just show that it was invoked by setTimeout. The chrome debugger for example will allow you to keep stack traces on async execution, I am not sure how they are doing it but the engine can keep track of the stack somehow.
No matter how the literal is calculated the execution will still be async.
setTimeout(function(){console.log('timeout');}, 0);console.log('executing');
will output:
executing
undefined
timeout

One extra side question, what happens if I scheduled something (based on math rather than a literal) that resulted in no delay. Would that execute in place or would it be in immediately executed async, or is that implementation defined
Still asynchronous. It's just that the timer will be processed immediately once the function returns and the JavaScript engine can process events on the event loop.

Recursion has many different definitions, but if we define it as the willful (as opposed to bug-induced) use of a function that calls itself repeatedly in order to solve a programming problem (which seems to be a common one in a Javascript context), it absolutely is.
The real question is whether or not this could crash the browser. The answer is no in my experience...at least not on Firefox or Chrome. Whether good practice or not, what you've got there is a pretty common Javascript pattern, used in a lot of semi-real-time web applications. Notably, Twitter used to do something very similar to provide users with semi-real-time feed updates (I doubt they still do it now that they're using a Node server).
Also, out of curiously I ran your script with the schedule reset to run every 50ms, and have experienced no slowdowns.

JavaScript and single-threadedness

I always hear that JavaScript is single-threaded; that when JavaScript is executed, it's all run in the same global mosh pit, all in a single thread.
While that may be true, that single execution thread may spawn new threads, asynchronousy reqeiving data back to the main thread, correct? For example, when an XMLHttpRequest is sent, doesn't the browser create a new thread that performs the HTTP transaction, then invoke callbacks back in the main thread when the XMLHttpRequest returns?
What about timers--setTimeout and setInterval? How do those work?
Is this single-threadedness the result of the language? What has stopped JavaScript from having multi-threaded execution before the new Web Workers draft?

XMLHttpRequest, notably, does not block the current thread. However, its specifics within the runtime are not outlined in any specification. It may run in a separate thread or within the current thread, making use of non-blocking I/O.
setTimeout and setInterval set timers that, when run down to zero, add an item for execution, either a line of code of a function/callback, to the execution stack, starting the JavaScript engine if code execution has stopped. In other words, they tell the JavaScript engine to do something after it has finished doing whatever it's doing currently. To see this in action, set multiple setTimeout(s) within one method and call it.

Your JavaScript itself is single-threaded. It may, however, interact with other threads in the browser (which is frequently written with something like C and C++). This is how asynchronous XHR's work. The browser may create a new thread (or it may re-use an existing one with an event loop.)
Timers and intervals will try to make your JavaScript run later, but if you have a while(1){ ; } running don't expect a timer or interval to interrupt it.
(edit: left something out.)
The single-threadedness is largely a result of the ECMA specification. There's really no language constructs for dealing with multiple threads. It wouldn't be impossible to write a JavaScript interpreter with multiple threads and the tools to interact with them, but no one really does that. Certainly no one will do it in a web browser; it would mess everything up. (If you're doing something server-side like Node.js, you'll see that they have eschewed multithreading in the JavaScript proper in favor of a snazzy event loop, and optional multi-processing.)

See this post for a description of how the javascript event queue works, including how it's related to ajax calls.
The browser certainly uses at least one native OS thread/process to handle the actual interface to the OS to retrieve system events (mouse, keyboard, timers, network events, etc...). Whether there is more than one native OS-level thread is dependent upon the browser implementation and isn't really relevant to Javascript behavior. All events from the outside world go through the javascript event queue and no event is processed until a previous javascript thread of execution is completed and the next event is then pulled from the queue given to the javascript engine.

Browser may have other threads to do the job but your Javascript code will still be executed in one thread. Here is how it would work in practice.
In case of time out, browser will create a separate thread to wait for time out to expire or use some other mechanism to implement actual timing logic. Then timeout expires, the message will be placed on main event queue that tells the runtime to execute your handler. and that will happen as soon as message is picked up by main thread.
AJAX request would work similarly. Some browser internal thread may actually connect to server and wait for the response and once response is available place appropriate message on main event queue so main thread executes the handler.
In all cases your code will get executed by main thread. This is not different from most other UI system except that browser hides that logic from you. On other platforms you may need to deal with separate threads and ensure execution of handlers on UI thread.

Putting it more simply than talking in terms of threads, in general (for the browsers I'm aware of) there will only be one block of JavaScript executing at any given time.
When you make an asynchronous Ajax request or call setTimeout or setInterval the browser may manage them in another thread, but the actual JS code in the callbacks will not execute until some point after the currently executing block of code finishes. It just gets queued up.
A simple test to demonstrate this is if you put a piece of relatively long running code after a setTimeout but in the same block:
setTimeout("alert('Timeout!');", 5);
alert("After setTimeout; before loop");
for (var i=0, x=0; i < 2000000; i++) { x += i };
alert("After loop");
If you run the above you'll see the "After setTimeout" alert, then there'll be a pause while the loop runs, then you'll see "After loop", and only after that will you see "Timeout!" - even though clearly much longer than 5ms has passed (especially if you take a while to close the first alert).
An often-quoted reason for the single-thread is that it simplifies the browser's job of rendering the page, because you don't get the situation of lots of different threads of JavaScript all trying to update the DOM at the same time.

Javascript is a language designed to be embedded. It can and has been used in programs that execute javascript concurrently on different operating threads. There isn't much demand for an embedded language to explicitly control the creation of new threads of execution, but it could certainly be done by providing a host object with the required capabilities. The WHATWG actually includes a justification for their decision not to push a standard concurrent execution capability for browsers.

Develop Reference

JavaScript is the programming language of the Web.