Is there a way to learn how JavaScript is interpreted and executed? In .NET or Java, for instance, you can look at the generated bytecode, and in C you can look at the generated assembly instructions, but from what I gather, JavaScript is interpreted line by line, and then it varies by JS engine in different browsers.
Still, is there a way to learn how JavaScript does this? Does the interpreter in modern browsers tend to look ahead and optimize as a compiler might?
For instance, if I did:
$('#div1').css('background-color','red');
$('#div1').css('color','white');
Could I have a perf gain by doing:
var x = $('#div1');
x.css('background-color','red');
x.css('color','white');
The point of this question is to get some insight into how JavaScript is run in the browser.
The optimization steps taken, as always, depend on the compiler. I know that SpiderMonkey is fairly well documented, open source, and, I believe, does JIT compilation. You can use it outside of the browser to tinker with, so that's one less black box to deal with when experimenting. I'm not sure if there's any way to dump the compiled code as it runs to see how it optimizes your code, but since there's no standard concept of an intermediate representation of Javascript (like there is with .NET/Java bytecode), it would be specific to the engine anyway.
EDIT: With some more research, it does seem that you can get SpiderMonkey to dump its bytecode. Note, however, that optimizations can take place both in the interpreter that generates the bytecode and in the JIT compiler that consumes/compiles/executes the bytecode, so you are only halfway to understanding any optimizations that may occur (as is true with Java and .NET bytecodes).
V8 does some amazing things internally. It compiles to machine code and creates hidden classes for your objects. Mind-blowing details here: https://developers.google.com/v8/design
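As a hedged illustration of the hidden-class idea (the constructor and property names here are invented for the example, not taken from the V8 docs), objects that receive the same properties in the same order can share one hidden class, while adding properties in different orders forces V8 to track separate shapes:

// Both instances share one hidden class: same properties, same order.
function Point(x, y) {
    this.x = x;
    this.y = y;
}
var a = new Point(1, 2);
var b = new Point(3, 4);

// These two objects end up with different hidden-class transitions,
// because their properties are added in a different order.
var c = {};
c.x = 1;
c.y = 2;

var d = {};
d.y = 2;
d.x = 1;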
Of course, when it comes to the DOM, lookups can only go so fast, which is why chaining or temp variables can make a difference, especially in loops and during animations.
Besides that, using the right selectors can be helpful. #id lookups are undoubtedly the fastest. Simple tagname or class lookups are reasonably fast as well. jQuery selectors like :not can't fall back on builtins and are considerably slower.
Besides reducing lookups, another good rule of thumb with the DOM is: If you need to do a lot of DOM manipulation, take out some parent node, do the manipulations, and reinsert it. This could save a whole lot of reflows/repaints.
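A hedged sketch of that rule using jQuery (the #list id and the loop are invented for the example): take the node out, do all the work on it, then put it back, so the browser reflows once instead of once per change:

var $list = $('#list');
var $parent = $list.parent();

// Take the node out of the document so the loop below
// doesn't trigger a reflow/repaint for every append.
$list.detach();

for (var i = 0; i < 1000; i++) {
    $list.append('<li>Item ' + i + '</li>');
}

// Reinsert it: one reflow instead of a thousand.
$parent.append($list);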
Yes, you will get a performance gain. The DOM is not queried twice, and only one jQuery object is instantiated.
Yet you should not do this for performance (the gain will be negligible), but for code quality. Also, you do not need a variable, jQuery supports chaining:
$('#div1').css('background-color','red').css('color','white');
and the .css method also can take an object:
$('#div1').css({'background-color': 'red', 'color': 'white'});
Related
I was thinking about this today and I realized I don't have a clear picture here.
Here are some statements I think to be true (please correct me if I'm wrong):
the DOM is a collection of interfaces specified by W3C.
when parsing HTML source code, the browser creates a DOM tree which has nodes that implement DOM interfaces.
the ECMAScript spec has no reference of browser host objects (DOM, BOM, HTML5 APIs etc.).
how the DOM is actually implemented depends on browser internals and is probably different among most of them.
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
I am curious about what happens behind the scenes when I call document.getElementById('foo'). Does the call get delegated to browser native code by the interpreter or does the browser have JS implementations of all host objects? Do you know about any optimizations they do in regard to this?
I read this overview of browser internals but it didn't mention anything about this. I will look through the Chrome and FF source when I have time, but I thought about asking here first. :)
All of your bullet points are correct, except:
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
should be "...and translate it to native code". SpiderMonkey (the JS engine in Firefox) worked as a bytecode interpreter for a long time before the current JS speed arms race.
On Mozilla's JS-to-DOM bridge:
The host objects are typically implemented in C++, though there is an experiment underway to implement DOM in JS. So when a web page calls document.getElementById('foo'), the actual work of retrieving the element by its ID is done in a C++ method, as hsivonen noted.
The specific way the underlying C++ implementation gets called depends on the API and has also changed over time (note that I'm not involved in the development, so I might be wrong about some details; here's a blog post by jst, who was actually involved in creating much of this code):
At the lowest level every JS engine provides APIs to define host objects. For example, the browser can call JS_DefineFunctions (as demonstrated in the SpiderMonkey User Guide) to let the engine know that whenever script calls a function with the specified name, a provided C callback should be called. Same for other aspects of the host objects (e.g. enumeration, property getters/setters, etc.)
For the core ECMAScript functionality and in some tricky DOM cases the JS engine/the browser uses these APIs directly to define host objects and their behaviors, but it requires a lot of common boilerplate code for e.g. checking parameter types, converting them to the appropriate C++ types, error handling etc.
For reasons I won't go into, let's say historically, Mozilla made heavy use of XPCOM for many of its objects, including much of the DOM. One feature of XPCOM is its binding to JS called XPConnect. Among other things, XPConnect can take an interface definition in IDL (such as nsIDOMDocument; or more precisely its compiled representation), expose an object with the specified properties to the script, and later, when a script calls getElementById, perform the necessary parameter checks/conversions and route the call directly to a C++ method (nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn))
The way XPConnect worked was quite inefficient: it registered generic functions as callbacks to be executed when a script accesses a host object, and these generic functions figured out what they needed to do in every particular case dynamically. This post about quickstubs walks you through one example.
"Quick stubs" mentioned in the previous link is a way to optimize JS->C++ calls time by trading some code size for it: instead of always using generic C++ functions that know how to make any kind of call, the specialized code is automatically generated at the Firefox build time for a pre-defined list of "hot" calls.
Later on, the JIT (TraceMonkey at that time) was taught to generate code that calls C++ methods as part of the native code generated for "hot" paths in JS. I'm not sure how the newer JITs (JägerMonkey) work in this regard.
With "paris bindings" the objects are exposed to webpage JS without any reliance on XPConnect, instead generating all the necessary glue JSClass code based on WebIDL (instead of XPCOM-era IDL). See also posts by developers who worked on this: jst and khuey. Also see How is the web-exposed DOM implemented?
I'm fuzzy on the details of the last three points in particular, so take them with a grain of salt.
The most recent improvements are listed as dependencies of bug 622298, but I don't follow them closely.
JS calls to DOM methods like getElementById cause the JS engine to call into the C++ code that implements the DOM. For example, in Firefox, the call ends up in nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn).
As you can see, Firefox maintains a hashtable that maps ids to elements in C++ as an optimization in this case, so it doesn't walk the whole DOM tree looking for the id.
The DOM is implemented as a language-independent library in pretty much all major browser implementations, which means it lives in a different library from the Javascript engine. For example, in IE the JS engine is implemented in jscript.dll while the DOM is implemented in mshtml.dll. Safari has Nitro (JS) and WebCore (DOM). Chrome has V8 (JS) and WebCore (DOM), and Firefox has SpiderMonkey/TraceMonkey (JS) and Gecko (DOM).
What this means is that any time your JS has to access the DOM, it has to reach over to the DOM library, which is inherently slow because of all the marshaling that has to take place. An analogy that has been used is two pieces of land connected by a toll bridge: any time you touch the DOM, you must cross over the bridge and back, paying a performance toll.
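A hedged example of paying that toll as rarely as possible (the element id and loop counts are invented): cross the bridge once to look the element up and once to write, instead of touching the DOM on every iteration:

// Slow version: crosses the JS/DOM bridge on every iteration.
for (var i = 0; i < 100; i++) {
    document.getElementById('output').innerHTML += i + ' ';
}

// Faster version: one lookup, one write, everything else stays in JS land.
var output = document.getElementById('output');
var html = '';
for (var j = 0; j < 100; j++) {
    html += j + ' ';
}
output.innerHTML = html;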
References
Video: Building High Performance Web Applications and Sites
Book: High Performance Javascript (Chapter 3 on the DOM)
As I have learnt, it's better to cache values in objects that we need repeatedly. For example, doing
var currentObj = myobject.myinnerobj.innermostobj[i]
and using 'currentObj' for further operations is better for performance than just
myobject.myinnerobj.innermostobj[i]
everywhere, like, say, in loops. I am told it saves the script from looking up inside the objects every time.
I have around 1000 lines of code; the only change I made to it with the intention of improving performance was this (at many locations), and the total time taken to execute it increased from 190ms to 230ms. Both times were checked using Firebug 1.7 on Firefox 4.
Is what I learnt true (meaning either I am overusing it or mis-implemented it)? Or are there any other aspects to it that I am unaware of..?
There is an initial cost for creating the variable, so you have to use the variable a few times (depending on the complexity of the lookup, and many other things) before you see any performance gain.
Also, how Javascript is executed has changed quite a bit in only a few years. Nowadays most browsers compile the code in some form, which changes what's performant and what's not. It's likely that the performance gain from caching references is smaller now than when the advice was written.
The example you have given appears to simply be Javascript, not jQuery. Since you are using direct object property references and array indices to existing Javascript objects, there is no lookup involved. So in your case, adding var currentObj... could potentially increase overhead by the small amount needed to instantiate currentObj. Though this is likely very minor, and not uncommon for convenience and readability in code, in a long loop, you could possibly see the difference when timing it.
The caching you are probably thinking of has to do with jQuery objects, e.g.
var currentObj = $('some_selector');
Running a jQuery selector involves a significant amount of processing because it must look through the entire DOM (or some subset of it) to resolve the selector. So doing this, versus running the selector each time you refer to something, can indeed save a lot of overhead. But that's not what you're doing in your example.
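For contrast, here's a hedged sketch of the kind of caching that does pay off (the selector and class name are invented for the example):

// Runs the selector (a DOM traversal) on every iteration.
for (var i = 0; i < 10; i++) {
    $('#results .row').eq(i).addClass('processed');
}

// Runs the selector once and reuses the cached jQuery object.
var $rows = $('#results .row');
for (var j = 0; j < $rows.length; j++) {
    $rows.eq(j).addClass('processed');
}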
See this fiddle:
http://jsfiddle.net/SGqGu/7/
In Firefox and Chrome (I didn't test IE), the time is identical in pretty much any scenario.
Is what I learnt true (meaning either I am overusing it or mis-implemented it)? Or are there any other aspects to it that I am unaware of..?
It's not obvious if either is the case because you didn't post a link to your code.
I think most of your confusion comes from the fact that JavaScript developers are mainly concerned with caching DOM objects. DOM object lookups are substantially more expensive than looking up something like myobj.something.something2. I'd hazard a guess that most of what you've been reading about the importance of caching uses examples like this (since you mentioned jQuery):
var myButton = $('#my_button');
In such cases, caching the DOM references can pay dividends in speed on pages with a complex DOM. With your example, it'd probably just reduce the readability of the code by making you have to remember that currentObj is just an alias to another object. In a loop, that'd make sense, but elsewhere, it wouldn't be worth having another variable to remember.
When I saw this question I thought it would be helpful if a jQuery compiler could be written. Now, by compiler, I mean something that takes in jQuery code and outputs raw javascript code that is ultimately executed.
This is how I envision a block of jQuery code executing:
a jQuery function is called and parameters are passed to it
the function calls a raw javascript function and passes the parameters it received to it
the newly called function performs the intended action
I understand that this is a very simplified model and it could be much more complex, but I think the complexity is reduced to steps 2 and 3 being repeated with different raw js functions being called and each time fed with all or a subset of parameters / previous results.
If we subscribe to that model, then we might come up with methods to make the jQuery functions perform double-duty:
What they already do
Logging what they did in the form of raw_function(passed_params)
Am I making some wrong assumptions that would make this impossible?
Any ideas how Firebug's profiler attempts to get function names? Could it be used here?
Edit
What I was thinking was making a black box with input / output as:
normal jquery code → [BB] → code you'd write if you used no library
I called this a compiler, because you compiled once and then would use the resulting code.
I argued that it could have at least educational use, and probably other uses as well.
People said this would take in a small amount of code and output a huge mass; that does not defeat the intended purpose as far as I can see.
People said I'd be adding an extra, needless step to page rendering, which, given only the resulting code would ultimately be used (and probably be used just for studying), is not correct.
People said there is no one-to-one relation between javascript functions and jquery functions, and implied such a converter would be too complicated and probably not worth the effort. With this I now agree.
Thank you all!
I think what you mean is: if you write
var myId = $("#myId")
it will be converted to
var myId = document.getElementById("myId")
I think it's possible, but the problem is that jQuery functions return jQuery objects, so in the above example the first myId will be a jQuery object and the second will be a DOM node (I think), which will affect other functions that need to use it later in the code after compilation, especially if they are chained.
Secondly, you will have to be sure that the conversion actually has performance benefits.
However, if you are aware of all this and can plan your code accordingly, I think it will be possible.
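To illustrate the wrapper-vs-node issue (a hedged sketch, assuming jQuery is loaded and an element with id myId exists):

var $myId = $('#myId');                      // jQuery object wrapping the node
var myId = document.getElementById('myId');  // the raw DOM node

$myId.hide();      // works: .hide() is a jQuery method
// myId.hide();    // would throw: DOM nodes have no .hide()

$myId[0] === myId; // true: unwrapping the jQuery object gives back the node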
If the purpose of the compiler is to convert Javascript (which may be jQuery or anything) into better Javascript (which is what I understood from your saying "ultimately executed"), then Google has already done that. They made the Closure Compiler, and some have tried it with jQuery in this post. Isn't this what you are suggesting?
jQuery code is "raw JavaScript code" so I'm not sure what a compiler would really buy you. That's like writing a C# compiler which takes C# 4.0 code and emits C# 1.1 code. What's the benefit?
jQuery isn't a different language which replaces or even sits on top of JavaScript. It's a framework of JavaScript code which provides lots of useful helpers for common tasks. In my view, its greatest benefit is that its distinctive structure helps to differentiate it from the more "Java-like" languages.
JavaScript by itself is deceptively similar to other languages, and this tends to be one of its biggest faults as a language. People try to think of it in terms of Java, even though the similarities pretty much stop at the name. Structurally, JavaScript is very different in many ways (typing, scope, concurrency, inheritance, polymorphism, etc.), and I particularly like how jQuery and other modern JavaScript projects have brought the language to stand on its own merits.
I guess to get back to the question... If you're looking to turn jQuery-style JavaScript into Java-style JavaScript, then that's something of a step backwards. Both versions would be interpreted by the browser the same way, but one of the versions is less elegant and represents the language more poorly than the other.
Note that jQuery isn't the only framework that does these things, it's just the most popular. Would such a compiler need to also handle all the other frameworks? They all do different things in different ways to take advantage of the language. I don't think that homogenizing them to a "simpler" form buys us anything.
Edit: (In response to the various comments around this question and its answers, kind of...)
How would you structure this compiler? Given that (as we've tried to point out) jQuery is JavaScript and is just a library of JavaScript code, and given how browsers and JavaScript work, your "compiler" would just have to be another JavaScript library. So essentially, what you want is to go from:
A web page
The jQuery library
Some JavaScript code which uses the jQuery library
to:
A web page
The jQuery library
Some JavaScript code which uses the jQuery library
Your "compiler" library
Some more JavaScript code which sends the previous JavaScript code through your library somehow
Your "jQuery-equivalent" library
Some more JavaScript code which replaces the original JavaScript code with your new version
in order to make things simpler? Or to somehow make debugging tools like Firebug easier to use? What you're proposing is called an "obfuscator", and its sole purpose is to make code more difficult to reverse-engineer. A side effect is that it also makes code more difficult to understand and maintain.
Now, by compiler, I mean something that takes in jQuery code and outputs raw javascript code that is ultimately executed.
I think that statement may indicate what's going wrong for you.
jQuery is a library implemented in the Javascript language. jQuery isn't a language separated from Javascript. jQuery is a collection of Javascript code that you can use to make Javascript development easier. It's all Javascript. A "jQuery compiler" to convert "jQuery code" to "raw Javascript" would be quite useless because jQuery is raw Javascript.
What you probably actually want is a Javascript compiler. In that case, it's certainly possible. In fact, some web browsers nowadays actually "compile" the Javascript code into some kind of bytecode to enhance performance. But development workflows involving Javascript typically don't include a compiler tool.
Apparently what you actually want is to "inline" jQuery code into your code, sort of like this:
var myfoo = $('#foo'); → var myfoo = document.getElementById('foo');
This is actually something a C++ compiler would do to optimize performance, but Javascript is not C++ so it doesn't apply here.
I don't see how this is useful. The point of jQuery is to simplify Javascript development by providing a consistent interface like the $() function. By performing this "inlining" procedure you produce code that is even harder to read and maintain.
And why add an extra step? Why not just deliver the application javascript code and the jQuery library to the browser? Why add an extra step involving an extra tool to convert Javascript to Javascript that doesn't provide any substantial extra benefits?
Are there any good tutorials on how to write fast, efficient code for v8 (specifically, for node.js)?
What structures should I avoid using? What are the idioms that v8 optimises well?
From my experience:
It does inlining
Function call overhead is minimal (inlining)
What is expensive is to pass huge strings to functions, since those need to be copied and from my experience V8 isn't always as smart as it could be in this case
Scope lookup is expensive (surprise)
Don't do tricks. E.g., I have a binary encoder for JS objects, and when I tried cranking out some extra performance with bit shifting (instead of Math.floor), the latest Crankshaft (yes, alpha, but still) ran the code 30% slower
Don't use magic: eval, arguments.callee, etc. Those pretty much kill any optimization, since the code can no longer be inlined (see the sketch after this list)
Some of the new ES5 stuff, e.g. .bind(), is really slow in V8 at the moment
Somehow new Object() and new Array() are a bit faster currently (a MICRO-optimization; unless you're writing some crazy encoder, stick with {} and [])
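A hedged sketch of the "no magic" point (the function is invented for the example): a named function expression gives you recursion without arguments.callee, so the optimizer can still inline the call:

// Blocks optimization: arguments.callee forces the engine to treat
// the function as "magic" and materialize the arguments object.
var countdownBad = function (n) {
    if (n > 0) {
        arguments.callee(n - 1);
    }
};

// Same behaviour, optimizer-friendly: a named function expression.
var countdownGood = function countdown(n) {
    if (n > 0) {
        countdown(n - 1);
    }
};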
My rules:
Write good code
Write working code
Write code that works in strict mode (support still has to land, but when it does further optimization can be applied by V8)
If you're a JS expert and you're already applying all good practices to your code, there's hardly anything you can do to improve performance.
If you encounter performance issues:
Verify them
Change the code / algorithm
And as a last resort: Write a C++ extension (and watch every commit to ry/node on GitHub since nobody cares whether some internal changes break your build)
The docs give a great answer: http://code.google.com/apis/v8/design.html
Understanding V8 is a set of slides from nodecamp.eu and gives some very interesting tips. In particular, I found the notes on avoiding "dictionary mode" useful, i.e. it helps if you keep the "shape" of objects constant and don't add arbitrary properties to them.
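A hedged example of what keeping the shape constant means in practice (the constructor and property names are invented):

// Keeps a constant shape: every instance has the same properties,
// declared up front, even the ones that are only filled in later.
function User(name, age) {
    this.name = name;
    this.age = age;
    this.score = 0;
}

// Risks pushing the object into "dictionary mode":
var u = new User('ada', 36);
delete u.age;                 // deleting properties...
u['prop' + Date.now()] = 1;   // ...and adding arbitrary ones at runtime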
You should also run node with --crankshaft --trace-opt --trace-bailout (the --crankshaft is only needed on 64-bit platforms e.g. OS X) to see whether V8 is "bailing" on JITing certain functions. There is a ton of other trace options including --trace-gc and various other GC tracing, which can be useful for optimisation.
Let me know if you have any specific questions about the slides above as they're a bit concise. :-) They're not mine but I've done some research about the areas they cover.
Javascript is an incredible language and libraries like jQuery make it almost too easy to use.
What should the original designers of Javascript have included in the language, or what should we be pressuring them into adding to future versions?
Things I'd like to see:-
Some kind of compiled version of the language, so we programmers can catch more of our errors earlier, as well as providing a faster solution for browsers to consume.
optional strict types (eg, being able to declare a var as a float and keep it that way).
I am no expert on Javascript, so maybe these already exist, but what else should be there? Are there any killer features of other programming languages that you would love to see?
Read Javascript: The Good Parts from the author of JSLint, Douglas Crockford. It's really impressive, and covers the bad parts too.
One thing I've always longed for and ached for is some support for hashing. Specifically, let me track metadata about an object without needing to add an expando property on that object.
Java provides Object.getHashCode() which, by default, uses the underlying memory address; Python provides id(obj) to get the memory address and hash(obj) to be customizable; etc. Javascript provides nothing for either.
For example, I'm writing a Javascript library that tries to unobtrusively and gracefully enhance some objects you give me (e.g. your <li> elements, or even something unrelated to the DOM). Let's say I need to process each object exactly once. So after I've processed each object, I need a way to "mark it" as seen.
Ideally, I could make my own hashtable or set (either way, implemented as a dictionary) to keep track:
var processed = {};
function process(obj) {
    var key = obj.getHashCode();
    if (processed[key]) {
        return; // already seen
    }
    // process the object...
    processed[key] = true;
}
But since that's not an option, I have to resort to adding a property onto each object:
var SEEN_PROP = "__seen__";
function process(obj) {
if (obj[SEEN_PROP]) { // or simply obj.__seen__
return; // already seen
}
// process the object...
obj[SEEN_PROP] = true; // or obj.__seen__ = true
}
But these objects aren't mine, so this makes my script obtrusive. The technique is effectively a hack to work around the fact that I can't get a reliable hash key for any arbitrary object.
Another workaround is to create wrapper objects for everything, but often you need a way to go from the original object to the wrapper object, which requires an expando property on the original object anyway. Plus, that creates a circular reference which causes memory leaks in IE if the original object is a DOM element, so this isn't a safe cross-browser technique.
For developers of Javascript libraries, this is a recurring issue.
What should the original designers of Javascript have included in the language, or what should we be pressuring them into adding to future versions?
They should have got together and decided what to implement, rather than competing against each other with slightly different implementations of the language (naming no names), to prevent the immense headache that has ensued for every developer over the past 15 years.
The ability to use arrays/objects as keys without string coercion might've been nice.
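A hedged example of the coercion being complained about (the values are invented):

var map = {};
var a = [1, 2];
var b = { id: 1 };

map[a] = 'first';   // the key silently becomes the string "1,2"
map[b] = 'second';  // the key silently becomes "[object Object]"

map[{ id: 2 }];     // 'second' -- every plain object coerces to the same key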
Javascript is missing a name that differentiates it from a language it is nothing like.
There are a few little things it could do better.
Choice of + for string concatenation was a mistake. An & would have been better.
It can be frustrating that for( x in list ) iterates over indices, as it makes it difficult to use a literal array. Newer versions have a solution.
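A hedged sketch of the annoyance being described (the array contents are invented):

var list = ['a', 'b', 'c'];

for (var i in list) {
    console.log(i);        // logs the indices "0", "1", "2", not the values
}

// The usual workaround: a plain counting loop.
for (var j = 0; j < list.length; j++) {
    console.log(list[j]);  // logs 'a', 'b', 'c'
}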
Proper scoping would be nice. v1.7 is adding this, but it looks clunky.
The way to do 'private' and 'protected' variables in an object is a little bit obscure and hard to remember as it takes advantage of closures and how they affect scoping. Some syntactic sugar to hide the mechanics of this would be fabulous.
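A hedged sketch of the closure idiom being referred to (the Counter name is invented):

function Counter() {
    var count = 0;  // effectively "private": only the closures below can see it

    this.increment = function () {
        count += 1;
    };
    this.value = function () {
        return count;
    };
}

var c = new Counter();
c.increment();
c.value();  // 1
c.count;    // undefined -- there is no public property to poke at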
To be honest, many of the problems I routinely trip over are actually DOM quirks, not JavaScript per se. The other big problem, of course, is that recent versions of JavaScript have interesting and useful things, like generators. Unfortunately, most browsers are stuck at 1.5. Apparently only Firefox is forging ahead.
File I/O is missing... though some would say it doesn't really need it...