On https://v8.dev/docs/ignition we can see that:
Ignition is a fast low-level register-based interpreter written using the backend of TurboFan
on https://docs.google.com/document/d/11T2CRex9hXxoJwbYqVQ32yIPMh0uouUZLdyrtmMoL44/edit?ts=56f27d9d#
The aim of the Ignition project is to build an interpreter for V8 which executes a low-level bytecode, thus enabling run-once or non-hot code to be stored more compactly in bytecode form.
The interpreter itself consists of a set of bytecode handler code snippets, each of which handles a specific bytecode and dispatches to the handler for the next bytecode.
To compile a function to bytecode, the JavaScript code is parsed to generate its AST (Abstract Syntax Tree). The BytecodeGenerator walks this AST and generates bytecode for each of the AST nodes as appropriate.
Once the graph for a bytecode handler is produced it is passed through a simplified version of Turbofan’s pipeline and assigned to the corresponding entry in the interpreter table.
So it seems that Ignition's job is to take the bytecode generated by the BytecodeGenerator, convert it to bytecode handlers, and execute it through TurboFan.
But here:
and here:
you can see that it is Ignition itself that produces the bytecode.
What is more, in this video https://youtu.be/p-iiEDtpy6I?t=722 Ignition is said to be a baseline compiler.
So what is it, then?
A baseline compiler? A bytecode interpreter? An AST-to-bytecode transformer?
This image seems the most accurate:
where Ignition is just an interpreter, and everything before it is an unnamed bytecode generator/optimizer stage.
V8 developer here.
On https://v8.dev/docs/ignition we can see that:
Ignition is a fast low-level register-based interpreter written using the backend of TurboFan
Yes, that sums it up. To add a little more detail:
The name "Ignition" refers to both the Bytecode Generator and the Bytecode Interpreter. Often, the entire thing is also seen as one big black box and casually called "the interpreter", which can sometimes lead to a bit of confusion around the terms.
The Bytecode Generator takes the AST produced by the Parser for a given JavaScript function, and generates bytecode from it.
The Bytecode Interpreter takes the bytecode generated by the Bytecode Generator and executes it, interpreting each bytecode by dispatching to the matching Bytecode Handler.
The Bytecode Handlers that make up the Bytecode Interpreter are generated using parts of the Turbofan pipeline. This happens at V8 compilation time, not at runtime. In other words, you need Turbofan to build (parts of) Ignition, but not to run Ignition.
The Parser (and the AST/Abstract Syntax Tree it produces) is not part of Ignition.
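As a concrete illustration, here is a tiny JavaScript function together with roughly the bytecode the Bytecode Generator emits for it. The listing in the comments is an approximation of typical `d8 --print-bytecode` output; the exact bytecodes and operand formats vary between V8 versions.

```javascript
// A trivial function to feed through the pipeline.
function add(a, b) {
  return a + b;
}
add(1, 2); // call it, so lazy compilation actually generates the bytecode

// The Bytecode Generator emits something along these lines
// (approximate; version-dependent):
//   Ldar a1        ; load the second argument into the accumulator
//   Add a0, [0]    ; add the first argument, using feedback slot 0
//   Return         ; return the accumulator
```

Each of those mnemonics (`Ldar`, `Add`, `Return`) corresponds to one Bytecode Handler in the interpreter table.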
Once the graph for a bytecode handler is produced it is passed through a simplified version of Turbofan’s pipeline and assigned to the corresponding entry in the interpreter table.
So it seems that Ignition's job is to take the bytecode generated by the BytecodeGenerator, convert it to bytecode handlers, and execute it through TurboFan
This section of the design document talks about generating the Bytecode Handlers, which happens "ahead of time" (i.e. when V8 is compiled) using parts of Turbofan. At runtime, bytecode is not converted to handlers, it is "handled" (=run, executed, interpreted) by the existing fixed set of handlers, and Turbofan is not involved.
What is more, in this video https://youtu.be/p-iiEDtpy6I?t=722 Ignition is said to be a baseline compiler.
At that moment, the talk is referring to the general idea that all modern JavaScript engines have a "baseline compiler" (in a very general, conceptual sense -- I agree that the slide could have made that clearer). Note that the slide does not say anything about Ignition. The next slide says that Ignition fills that role in V8. So more accurate would be to say "Ignition takes the place of a baseline compiler" or "Ignition is a baseline execution engine". Or you could redefine your terms slightly and say "Ignition is a compiler that produces bytecode and then interprets it".
ignition is just an interpreter and everything before is no-name bytecode generator/optimizer thing
That slide shows the "Interpreter" box as part of the "Ignition Bytecode Pipeline". The Bytecode Generator/Optimizer are also part of Ignition.
As I mentioned in a comment, sadly some of the docs are out of date, including the one with your first graphic above. Full-codegen and Crankshaft are no longer used at all; it's purely parsing plus Ignition + TurboFan. (You've since removed the image from the outdated docs, which sadly are still linked from some of the V8 docs.)
Ignition is a high-speed bytecode interpreter.
V8's parser produces an AST, from which Ignition generates its bytecode. That bytecode is then executed (interpreted) by Ignition. Code that only runs once (startup code and such) or isn't run often stays at the bytecode level and continues to be executed by Ignition.
"Hot" code goes to the second phase, where TurboFan kicks in: TurboFan's input is the same bytecode that Ignition interprets (rather than source code, as it was with Crankshaft), which it then aggressively compiles to highly-optimized machine code that runs directly (rather than being interpreted).
This article goes into the motivations for moving off Full-codegen and Crankshaft (memory savings in the former case, difficulty implementing and in particular optimizing language features in the second). The design of TurboFan also helps the V8 authors minimize the amount of platform-specific code they have to write (by having an intermediate representation, which amongst other things they can also use to write Ignition's bytecode handlers).
Please correct my misunderstanding of the conversion process:
JS code is parsed into an AST
The AST is sent to Ignition to be converted into bytecode
Machine code is generated
If the bytecode is optimal, it is sent to the CPU for processing
If the bytecode needs optimization, it is sent to TurboFan for optimization
If TurboFan receives bytecode, it optimizes it further, and the result is sent to the CPU for processing
I have learnt, at an abstract level, how Chrome's V8 engine works from this webpage https://blog.bitsrc.io/how-does-javascript-really-work-part-1-7681dd54a36d
the interpreter (Ignition) converts it into bytecode...
But then what converts this bytecode to machine code?
It doesn't turn the bytecode into machine code.
The Ignition interpreter takes the abstract syntax tree and produces bytecode from it.
After the bytecode has been produced, the Ignition interpreter starts executing those bytecodes directly.
I know it's a bit confusing to hear that the interpreter both produces bytecode and also executes it, but that's actually what the Ignition interpreter does: it's not only executing, it's also producing the bytecode.
So don't get confused by the name.
For example, say we have the following bytecode:
LdaSmi 5
When Ignition sees that bytecode, it calls a function in the engine (a bytecode handler) that handles that instruction.
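That dispatch model can be sketched as a toy interpreter in plain JavaScript. This is purely illustrative (the `AddSmi` opcode, the `state` object, and the array-of-pairs bytecode format are made up for this sketch); V8's real handlers are machine-code snippets generated via TurboFan, not JS functions.

```javascript
// Toy model of a register-based interpreter with an accumulator,
// loosely mimicking Ignition's handler-dispatch scheme. Not real V8 code.
const handlers = {
  // LdaSmi: load a small integer into the accumulator.
  LdaSmi(state, operand) { state.acc = operand; },
  // AddSmi: add a small integer to the accumulator (invented for this sketch).
  AddSmi(state, operand) { state.acc += operand; },
  // Return: stop execution and yield the accumulator.
  Return(state) { state.done = true; },
};

function interpret(bytecode) {
  const state = { acc: undefined, done: false };
  for (const [op, operand] of bytecode) {
    handlers[op](state, operand); // dispatch to the handler for this bytecode
    if (state.done) break;
  }
  return state.acc;
}

interpret([["LdaSmi", 5], ["AddSmi", 3], ["Return"]]); // → 8
```

The key point the sketch preserves: there is one fixed handler per bytecode, and "executing" a program means dispatching through that table, one bytecode at a time.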
I read an article about how a JavaScript engine works, but there is something that confuses me. It says the following:
JavaScript code is first parsed
The source code is translated to bytecode
The bytecode gets optimized
The code generator takes the bytecode and translates it into low-level assembly code
Is the last step true?
Is that how JavaScript engines like V8 work?
(V8 developer here.)
Yes, the JavaScript engines used in "modern" (since 2008) browsers have just-in-time compilers that compile JavaScript to machine code. That's probably what that article meant to say. If you want to distinguish the terms "assembly language" and "machine code", then the former would be the human-readable form (such as mov eax, ebx, written by humans and produced by disassemblers) and the latter would be the binary-encoded form that the CPU understands (such as 0x89 0xD8, produced by compilers/assemblers). I'd say that the term "assembly code" is sufficiently ambiguous that it could refer to either, or could imply that you don't want to distinguish.
I find the third step in your description more misleading: byte code is typically not optimized. The bytecode interpreter, if it exists, is usually the engine's first execution tier, and its purpose is to start execution as soon as possible, without first spending any time on optimizations. If a function runs hot enough, the engine will eventually decide to spend the time to optimize it to machine code (depending on the engine, possibly in a succession of several increasingly powerful but costly compilers). These later, optimizing tiers may or may not take the bytecode as input; alternatively they can parse the source again to build an AST (taking V8 as a specific example, it used to do the latter and is currently doing the former).
Side note: that article is pretty silly indeed. Example:
techniques like inlining (removing white space)
That's so wrong that it's outright funny :-D
I'm trying to understand how V8 works, but I'm unable to locate where in the code it actually gets the raw input JS script in order to parse it and compile it into C++.
I've seen api.cc and tried to set up a breakpoint in the compiler function, but with no luck (I'm using Chromium to do so); it never hits this function:
MaybeLocal<Script> ScriptCompiler::Compile(Local<Context> context,
                                           Source* source,
                                           CompileOptions options,
                                           NoCacheReason no_cache_reason)
***** UPDATE *****
After @jmrk's reply I've been trying to figure out where the JS actually starts coming in; what I'm really interested in is understanding how a website renders and then passes the script into V8 for it to compile.
I have found quite a lot of information on the topic, but I'm still unable to understand the whole picture:
Turns out the first step isn't the Parser but the Scanner, which gets a UTF-16 stream as input.
The source code is first broken up in chunks; each chunk may be
associated with a different encoding. A stream then unifies all chunks
under the UTF-16 encoding.
Prior to parsing, the scanner then breaks up the UTF-16 stream into
tokens. A token is the smallest unit of a script that has semantic
meaning. There are several categories of tokens, including whitespace
(used for automatic semicolon insertion), identifiers, keywords, and
surrogate pairs (combined to make identifiers only when the pair is
not recognized as anything else). These tokens are then fed first to
the preparser and then to the parser.
https://blog.logrocket.com/how-javascript-works-optimizing-for-parsing-efficiency/
I have also found out it indeed gets this stream from Blink:
The UTF16CharacterStream provides a (possibly buffered) UTF-16 view over the underlying Latin1, UTF-8, or UTF-16 encoding that V8 receives from Chrome, which Chrome in turn received from the network. In addition to supporting more than one encoding, the separation between scanner and character stream allows V8 to transparently scan as if the entire source is available, even though we may only have received a portion of the data over the network so far.
https://v8.dev/blog/scanner
It also seems like the scanner feeds tokens to the parser:
V8’s parser consumes ‘tokens’ provided by the ‘scanner’. Tokens are
blocks of one or more characters that have a single semantic meaning:
a string, an identifier, an operator like ++. The scanner constructs
these tokens by combining consecutive characters in an underlying
character stream.
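In the same spirit, scanning can be sketched in a few lines of JavaScript. This regex-based toy is purely illustrative: V8's real scanner is a hand-written, character-by-character loop tuned for speed, and the token categories here are a small subset.

```javascript
// Toy scanner: combines consecutive characters into tokens, in the
// spirit of the description above. Not V8's actual scanner.
function scan(source) {
  const tokens = [];
  // Order matters: whitespace, identifiers, numbers, '++', then
  // single-character punctuators.
  const re = /\s+|[A-Za-z_$][A-Za-z0-9_$]*|\d+|\+\+|[+\-*\/=;()]/g;
  let m;
  while ((m = re.exec(source)) !== null) {
    // Whitespace has semantic relevance for automatic semicolon
    // insertion in the real scanner; here we simply drop it.
    if (!/^\s+$/.test(m[0])) tokens.push(m[0]);
  }
  return tokens;
}

scan("count = count + 1;");
// → ["count", "=", "count", "+", "1", ";"]
```

Each element of the result is one token with a single semantic meaning, which a parser would then consume.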
But the question remains: where does the raw JavaScript code come into V8 from Blink?
How can I see what Chrome reads, and where does it initialize V8?
It's complicated :-)
ScriptCompiler::Compile is generally correct as the outermost entrypoint. Note that there are two overloads of it. Additionally, Chrome tries to do streaming compilation when it can, which takes a different path. Also, when working with Chrome/Chromium, note that you have to set the breakpoints in the renderer processes, not the browser process.
It's easier to work with the d8 shell when poking around V8. Look for Shell::ExecuteString (which calls ScriptCompiler::Compile) in d8.cc.
Also, to clarify, V8 does not compile JavaScript to C++. It compiles it first to its own internal bytecode format which is executed by the "Ignition" interpreter; hot functions are then later compiled to machine code by the "Turbofan" optimizing compiler.
Don't be discouraged if you have trouble understanding the whole pipeline. No single person does; V8 is too big and too complicated for that. Focus on what you're interested in (parser? interpreter? optimizing compiler?) and dig into that.
To improve performance JavaScript engines sometimes only fully parse functions when they are actually called.
For example, from the SpiderMonkey source code:
Checking the syntax of a function is several times faster than doing a full parse/emit, and lazy parsing improves both performance and memory usage significantly when pages contain large amounts of code that never executes (which happens often).
What steps can the parser skip while still being able to validate the syntax?
It appears that in SpiderMonkey some of the savings come from not emitting bytecode, as a full parse would. Does a full parse in e.g. V8 also include generating machine code?
First off a clarification: the two steps are called "pre-parsing" and "full parsing". "Lazy parsing" describes the strategy of doing the former first, and then the latter when needed.
The two big reasons why pre-parsing is faster are:
it doesn't build the AST (abstract syntax tree), which is usually the output of the parser
it doesn't do scope resolution for variables.
There are a few other internal steps done by the parser in preparation for code generation that aren't needed when only checking for errors, but the above two are the major points.
(Full) parsing and code generation are separate steps.
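To make the distinction concrete, here is a hedged sketch of how the two parse modes typically apply to real code; the exact heuristics differ between engines and versions:

```javascript
// Typically only PRE-parsed at script load: the syntax is checked, but
// no AST, scope resolution, or bytecode is produced until the first call.
function rarelyCalled() {
  return "built lazily";
}

// Wrapping a function expression in parentheses is a well-known hint
// (exploited by some bundlers/minifiers) that it will be invoked
// immediately, encouraging an eager FULL parse up front:
const result = (function immediatelyCalled() {
  return "built eagerly";
})();
```

If `rarelyCalled` is eventually invoked, the engine goes back and fully parses just that function, paying the AST-building and scope-resolution cost only then.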
I wrote an interpreter in JavaScript for a small language using jison, which is a JS port of bison. The language is used to evaluate expressions and conditions. Right now the evaluation is mixed with the parsing.
I'm trying to optimize it, and the bottleneck is the lexer and the parser. So I decided to parse beforehand and only evaluate at runtime.
The question is: which one is faster and cleaner, generating JS code beforehand and running only that, or generating an AST and walking it at runtime?
In general*, it will always be faster to generate whatever is closest to machine code. In your case, generating JavaScript would be faster.
The generated JavaScript code would be executed directly by the underlying C++ engine (and in some cases JIT-compiled into machine code). In contrast, writing your own VM in JavaScript to execute the AST adds an extra VM layer on top of JavaScript itself.
*note: There are some corner cases where interpreters can execute as fast as native code, Forth being an example, because its interpreter is dead simple: just a table of function pointers.
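To make the comparison concrete, here is a minimal sketch of both options for an expression like `a + b * 2`. The AST shape (`op`/`left`/`right`/`name`/`value` nodes) is invented for this example and is not what jison produces.

```javascript
// Assume the parser already produced this AST for "a + b * 2"
// (node shape is made up for the sketch):
const ast = {
  op: "+",
  left: { name: "a" },
  right: { op: "*", left: { name: "b" }, right: { value: 2 } },
};

// Option 1: walk the AST at runtime — an extra VM layer on top of JS.
function evalAst(node, env) {
  if ("value" in node) return node.value;
  if ("name" in node) return env[node.name];
  const l = evalAst(node.left, env);
  const r = evalAst(node.right, env);
  return node.op === "+" ? l + r : l * r;
}

// Option 2: compile the AST to JavaScript source once, then let the
// engine itself (JIT included) run it on every evaluation.
function compile(node) {
  if ("value" in node) return String(node.value);
  if ("name" in node) return "env." + node.name;
  return `(${compile(node.left)}${node.op}${compile(node.right)})`;
}
const compiled = new Function("env", "return " + compile(ast));

evalAst(ast, { a: 1, b: 3 });  // → 7, tree walk on every call
compiled({ a: 1, b: 3 });      // → 7, repeated calls skip the tree walk
```

The one-time `new Function` cost is amortized when the same expression is evaluated many times, which matches the "closest to machine code wins" rule of thumb above.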