Efficiently counting the number of keys / properties of an Object in JavaScript - javascript

This question is almost identical to How to efficiently count the number of keys/properties of an object in JavaScript?.
I want to know one extra piece of information: what is a "constant-time" way of determining the number of keys in an Object? I am mostly concerned with doing this in Node.js, since Objects in the browser are rarely large enough for this to matter.
EDIT:
It appears that Object.keys(obj).length runs in linear time O(n) in Google Chrome and in Node.js (i.e. it depends on the number of keys in obj). Is there a better, O(1), method?
I did some testing in Node.JS (source is below)
var tests = [10e3, 10e4, 10e5, 10e6];
for (var j in tests) {
  var obj = {};
  for (var i = 0; i < tests[j]; i++)
    obj[i] = i;
  console.time('test' + tests[j]);
  Object.keys(obj).length;
  console.timeEnd('test' + tests[j]);
}
For n = 10e3, 10e4, 10e5, 10e6... results are:
test10000: 5ms
test100000: 20ms
test1000000: 371ms
test10000000: 4009ms

After a bit of research: there is no way to determine the number of keys in a JavaScript Object in constant time, at least not in Node... and not quite yet. Node internally keeps track of this information, but it does not expose it, since there is no method to do so in ECMA-262 5th Edition.
It's worth noting that Harmony (ECMAScript 6) may natively support Maps and Sets. It is not yet clear what the spec for these will turn out to be.
I am told that we need to bring this up with the TC39 committee.
Bug report for V8: http://code.google.com/p/v8/issues/detail?id=1800

ECMAScript 6 (Harmony) introduces Map and Set classes, which you can probably use (in the future :)
var map = new Map;
map.set('a', 'b');
console.log(map.size); // prints 1
I believe its size lookup should be O(1), though I haven't tried it. You can run it in node 0.11+ via node --harmony script.js.
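If you want to check, here is a rough, untested sketch mirroring the OP's benchmark with a Map instead of a plain Object (it assumes a Map-capable build; in modern Node and browsers it runs as-is). Since size is just a property read, the timings should stay flat rather than grow with the key count:
var tests = [10e3, 10e4, 10e5, 10e6];
for (var j = 0; j < tests.length; j++) {
  var map = new Map();
  for (var i = 0; i < tests[j]; i++) map.set(i, i);
  console.time('map' + tests[j]);
  map.size;                        // a property read, not a walk over the keys
  console.timeEnd('map' + tests[j]);
}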
Another way is to use the Proxy class, which has also been added in Harmony.
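As a rough, untested sketch of that idea (written against the final ES2015 Proxy API; if I recall correctly the early harmony Proxy in node 0.11 used a different API, so treat this as a sketch for a fully ES2015-capable engine), you can keep the count yourself in the set/deleteProperty traps:
function countedObject() {
  var count = 0;
  var proxy = new Proxy({}, {
    set: function(target, key, value) {
      // only count keys that are genuinely new
      if (!Object.prototype.hasOwnProperty.call(target, key)) count++;
      target[key] = value;
      return true;
    },
    deleteProperty: function(target, key) {
      if (Object.prototype.hasOwnProperty.call(target, key)) count--;
      return delete target[key];
    }
  });
  return { obj: proxy, size: function() { return count; } };
}

var counted = countedObject();
counted.obj.a = 1;
counted.obj.b = 2;
delete counted.obj.a;
console.log(counted.size()); // 1, without walking the keys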

See the V8 source, objects.cc, specifically GetLocalElementKeys.

Related

What is the best option to fill array in Javascript?

Recently I have been solving some programming problems where I need to fill an array with a default value. I found some approaches, but I don't know which one is the best option with respect to performance.
For example, say I want to fill an array of size 10 with 0. Here are some options I ended up with.
Option One
let arr = []
arr.length = 10
arr.fill(0)
console.log(arr)
Option Two
let arr = new Array(10).fill(0)
console.log(arr)
Option Three
let arr = Array(10).fill(0)
console.log(arr)
Option Four
function makeNewArray(size, value) {
  let arr = []
  for (let i = 1; i <= size; i++)
    arr.push(value)
  return arr
}
let arr = makeNewArray(10,0)
console.log(arr)
I am confused about which one is standard to use and which takes less time to run. Or is there any other, better approach?
If the performance really matters for your use case, test it. On my browser, the last is (slightly surprisingly to me) apparently the fastest.
However, all of these can execute millions of times per second (in Chrome on my mid-range laptop), so I'm rather skeptical that this is important to the overall performance of your application. I'd recommend avoiding premature optimization of this sort except in code that you know to be performance-critical. Everywhere else I'd prioritize legibility, so I would write something like new Array(10).fill(0) which is concise, likely to be understood by a JavaScript developer, and plenty fast.
The performance difference between that and the last likely has to do with the JS runtime implementation details. (In my case, that's V8, which doesn't optimize Array.prototype.fill using Torque/CSA builtins that can be optimized at the call site, but instead calls out to C++ code. Likely this is because fill performance hasn't often been an issue in the past.)
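If you do want to measure it yourself, a crude sketch along these lines gives a rough idea (the time helper here is just a throwaway name; numbers vary by engine, and micro-benchmarks like this are easily skewed by dead-code elimination):
function time(label, fn) {
  // performance.now() is available in browsers and as a global in recent Node versions
  var start = performance.now();
  for (var i = 0; i < 1e6; i++) fn();
  console.log(label + ': ' + (performance.now() - start).toFixed(1) + 'ms');
}

time('length + fill', function () { var a = []; a.length = 10; return a.fill(0); });
time('new Array(10).fill(0)', function () { return new Array(10).fill(0); });
time('Array(10).fill(0)', function () { return Array(10).fill(0); });
time('push loop', function () { var a = []; for (var i = 0; i < 10; i++) a.push(0); return a; });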

Why is <= slower than < using this code snippet in V8?

I am reading the slides Breaking the Javascript Speed Limit with V8, and there is an example like the code below. I cannot figure out why <= is slower than < in this case; can anybody explain that? Any comments are appreciated.
Slow:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i <= this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
(Hint: primes is an array of length prime_count)
Faster:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i < this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
[More Info] The speed difference is significant; in my local test environment, the results are as follows:
V8 version 7.3.0 (candidate)
Slow:
time d8 prime.js
287107
12.71 user
0.05 system
0:12.84 elapsed
Faster:
time d8 prime.js
287107
1.82 user
0.01 system
0:01.84 elapsed
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would make only a minuscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be a simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let a = [];
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]? It has no value. (It isn't even correct to say it has the value undefined - an array element containing the undefined value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1] contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <= attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i] evaluates to undefined because i is past the array end.
candidate % undefined (for any value of candidate) evaluates to NaN.
NaN == 0 evaluates to false.
Therefore, the return true is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
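You can see that chain of evaluations directly in the console:
var primes = [2, 3, 5];
console.log(primes[3]);       // undefined - index 3 is past the end
console.log(7 % primes[3]);   // NaN
console.log(NaN == 0);        // false, so `return true` is never reached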
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with < reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
I work on V8 at Google, and wanted to provide some additional insight on top of the existing answers and comments.
For reference, here's the full code example from the slides:
var iterations = 25000;

function Primes() {
  this.prime_count = 0;
  this.primes = new Array(iterations);
  this.getPrimeCount = function() { return this.prime_count; }
  this.getPrime = function(i) { return this.primes[i]; }
  this.addPrime = function(i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) return true;
    }
    return false;
  }
};

function main() {
  var p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < iterations) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount() - 1));
}

main();
First and foremost, the performance difference has nothing to do with the < and <= operators directly. So please don't jump through hoops just to avoid <= in your code because you read on Stack Overflow that it's slow --- it isn't!
Second, folks pointed out that the array is "holey". This was not clear from the code snippet in OP's post, but it is clear when you look at the code that initializes this.primes:
this.primes = new Array(iterations);
This results in an array with a HOLEY elements kind in V8, even if the array ends up completely filled/packed/contiguous. In general, operations on holey arrays are slower than operations on packed arrays, but in this case the difference is negligible: it amounts to 1 additional Smi (small integer) check (to guard against holes) each time we hit this.primes[i] in the loop within isPrimeDivisible. No big deal!
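For illustration (the elements-kind names are a V8-internal detail and may change between versions), the holey kind comes purely from how the array is created:
var holey = new Array(3);                  // HOLEY elements kind from the start
holey[0] = 2; holey[1] = 3; holey[2] = 5;  // still HOLEY: kinds never transition back to packed
var packed = [2, 3, 5];                    // PACKED elements kind
// Both behave identically at the language level; only V8's internal storage
// and the fast paths it can use differ.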
TL;DR The array being HOLEY is not the problem here.
Others pointed out that the code reads out of bounds. It's generally recommended to avoid reading beyond the length of arrays, and in this case it would indeed have avoided the massive drop in performance. But why though? V8 can handle some of these out-of-bound scenarios with only a minor performance impact. What's so special about this particular case, then?
The out-of-bounds read results in this.primes[i] being undefined on this line:
if ((candidate % this.primes[i]) == 0) return true;
And that brings us to the real issue: the % operator is now being used with non-integer operands!
integer % someOtherInteger can be computed very efficiently; JavaScript engines can produce highly-optimized machine code for this case.
integer % undefined on the other hand amounts to a way less efficient Float64Mod, since undefined is represented as a double.
The code snippet can indeed be improved by changing the <= into < on this line:
for (var i = 1; i <= this.prime_count; ++i) {
...not because < is somehow a superior operator to <=, but just because this avoids the out-of-bounds read in this particular case.
TL;DR The slower loop is due to accessing the Array 'out-of-bounds', which either forces the engine to recompile the function with fewer or even no optimizations, OR to never compile the function with any of these optimizations to begin with (if the (JIT-)compiler detected/suspected this condition before the first compiled 'version'); read on below for why.
Someone just has to say this (utterly amazed nobody already has):
There used to be a time when the OP's snippet would be a de-facto example in a beginner's programming book, intended to outline/emphasize that 'arrays' in JavaScript are indexed starting at 0, not 1, and as such used as an example of a common 'beginner's mistake' (don't you love how I avoided the phrase 'programming error' ;)): out-of-bounds Array access.
Example 1:
a Dense Array (contiguous, meaning no gaps between indexes, AND with an actual element at each index) of 5 elements, using 0-based indexing (always, in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
Thus we are not really talking about the performance difference between < and <= (or 'one extra iteration'), but rather:
'why does the correct snippet (b) run faster than the erroneous snippet (a)?'
The answer is 2-fold (although from an ES262 language implementer's perspective both are forms of optimization):
Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and even more importantly JIT-RE-Compilation !
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve a defined end-result'). Which, as it turns out, is a very beautiful way to describe a language.
And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results.
A spec conforming engine should give spec conforming results for any defined input.
Now, with JavaScript code/libraries/usage increasing, and remembering how many resources (time/memory/etc.) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
  var r = 0, i = 0;
  for (; i < arr.length;) r += arr[i++];
  return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, right? The return-type is Number, right?
Well.. no, no & no... It depends on what argument you pass to the named function parameter arr...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem? Then consider that this barely scratches the surface of the massive number of possible permutations...
We don't even know what TYPE the function RETURNS until we are done...
Now imagine this same function-code actually being used on different types, or even variations, of input, both completely literally described (in source code) and dynamically in-program generated 'arrays'..
Thus, if you were to compile the function sum JUST ONCE, the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub steps on every call; only that guarantees spec-conforming results (like an unnamed pre-y2k browser).
No optimizations (because no assumptions can be made), and a dead-slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
You come up with checks as simple as possible to detect whether the function might start returning non-spec-conformant results (say, because it receives unexpected input).
Then you toss away the previously compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted, or should it be computed again to be sure), tie the function back into the program and try again. Ultimately you fall back to stepwise script interpretation, as in the spec.
All of this takes time!
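As a purely illustrative (and engine-specific, untested) sketch of that cycle, using the sum example from above: feed the function one consistent shape of input long enough for the optimizer to commit to its assumptions, then break them.
// Hypothetical illustration - actual behaviour depends entirely on the engine's heuristics.
function sumNumbers(arr) {
  var r = 0;
  for (var i = 0; i < arr.length; i++) r += arr[i];
  return r;
}
for (var k = 0; k < 100000; k++) sumNumbers([1, 2, 3]); // warmed up: assumption = "packed array of numbers"
sumNumbers(['a', 'b', 'c']); // assumption broken: engine must deoptimize/recompile or fall back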
All browsers keep working on their engines; for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation); now we use ropes (or similar), which alleviate the problem. Both return spec-conforming results, and that is what matters!
Long story short: just because JavaScript's semantics often have our back (as with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end by pointing to (and quoting) some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
unboxed representations of integers and floating-point numbers;
absence of runtime type checks;
absence of garbage collection; and
efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
where there is a small subsection about the engine's internal performance improvements when removing the bounds-check (whilst just lifting the bounds-check outside the loop already gave an improvement of 40%).
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
Call to isPrimeDivisible
Compile isPrimeDivisible using general assumptions (like no out of bounds access)
Do work
BAM, suddenly array accesses out of bounds (right at the end).
Crap, says the engine, let's recompile that isPrimeDivisible using different (fewer) assumptions, and this example engine doesn't try to figure out whether it can reuse the current partial result, so
Recompute all work using the slower function (hopefully it finishes; otherwise repeat, and this time just interpret the code).
Return result
Hence the time then was:
First run (which failed at the end) + doing all the work all over again using slower machine-code for each iteration + the recompilation etc., which clearly takes >2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based in facts below)
The more I think about it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or the performance-bonus on snippet b, depending on how you think of it), which is precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes is a purely numerical 'dense array' which was either
hard-coded as a literal in source-code (a known excellent candidate to become a 'real' array, as everything is already known to the compiler before compile-time), OR
most likely generated using a numerical function filling a pre-sized (new Array(/*size value*/)) array in ascending sequential order (another long-known candidate to become a 'real' array).
We also know that the primes array's length is cached as prime_count! (indicating its intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed), which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that the Array primes is most likely already an optimized array internally, which doesn't get changed after creation (simple for the compiler to know if there is no code modifying the array after creation) and is therefore already (if applicable to the engine) stored in an optimized way, pretty much as if it were a Typed Array.
As I have tried to make clear with my sum function example, the argument(s) that get passed highly influence what actually needs to happen, and as such how that particular code is compiled to machine-code. Passing a String to the sum function shouldn't change the string, but it does change how the function is JIT-compiled! Passing an Array to sum should compile a different (perhaps even an additional one for this type, or 'shape' as they call it, of object that got passed) version of machine-code.
It seems slightly bonkers to convert the Typed_Array-like primes Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
Compile as number-cruncher assuming no out-of-bounds, run into out-of-bounds problem at the end, recompile and redo work (as outlined in theoretical example in edit 1 above)
The compiler had already detected (or suspected?) the out-of-bounds access up-front, and the function was JIT-compiled as if the argument passed was a sparse object, resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimizations; it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
To add some science to it, here's a jsperf:
https://jsperf.com/ints-values-in-out-of-array-bounds
It tests the control case of an array filled with ints and looping doing modular arithmetic while staying within bounds. It has 5 test cases:
1. Looping out of bounds
2. Holey arrays
3. Modular arithmetic against NaNs
4. Completely undefined values
5. Using a new Array()
It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.

how performance differs in loop declarations

How do these two for loops differ in performance, and why? What is the best way to iterate?
var letters = ['a', 'b', 'c', 'd', 'e', 'f', '', '', '', ''];
var start = new Date();
for (var i = 0, abc = letters.length; i < abc; i++) {
  alert(letters[i]);
}
var end = new Date();
alert(end - start);

var letters1 = ['a', 'b', 'c', 'd', 'e', 'f', '', '', '', ''];
var start1 = new Date();
for (var i = 0; i < letters1.length; i++) {
  alert(letters1[i]);
}
var end1 = new Date();
alert(end1 - start1);
As I mentioned in a comment, testing with alerts is absolute nonsense because it pauses the loop.
You can compare JS performance on, for example, http://jsperf.com/
Here's the result:
UPDATE: made some better tests.
Snippet 1 seems to be better because it caches the result of letters.length.
Here's the test: http://jsperf.com/testing-perf-fo-for-loop
In old browsers, caching the limit of a loop when iterating over objects like arrays was nearly always much faster than getting the limit (e.g. array.length) every time. In modern browsers, not so much.
Caching usually only has a significant performance benefit if the object being iterated is a type that the script engine knows might change during iteration, but doesn't.
For example, if iterating over a live NodeList:
var nodeList = document.getElementsByTagName('*');
for (var i=0, iLen=nodeList.length; i<iLen; i++) {
// do stuff
}
is measurably (if not noticeably) faster in browsers than:
for (var i=0; i<nodeList.length; i++) {
// do stuff
}
However, as others have noted, the work done in the loop may be far more significant, so saving a few cycles in the test is not useful. But for some, their standard loop pattern includes caching the limit whenever it isn't modified in the loop, and they do it always.
BTW, for measuring time in browsers there is the High Resolution Time specification (also see MDN Performance.now).
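For example, a minimal sketch of the timing from the question rewritten with performance.now() (which has sub-millisecond resolution, unlike new Date()):
var letters = ['a', 'b', 'c', 'd', 'e', 'f', '', '', '', ''];
var t0 = performance.now();
for (var i = 0, len = letters.length; i < len; i++) {
  var ch = letters[i]; // do stuff
}
var t1 = performance.now();
console.log('loop took ' + (t1 - t0) + ' ms');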

Does Javascript have an enhanced for loop syntax similar to Java's

I am wondering if JavaScript has an enhanced for loop syntax that allows you to iterate over arrays. For example, in Java, you can simply do the following:
String[] array = "hello there my friend".split(" ");
for (String s : array){
System.out.println(s);
}
output is:
hello
there
my
friend
Is there a way to do this in JavaScript? Or do I have to use array.length and use standard for loop syntax as below?
var array = "hello there my friend".split(" ");
for (i=0;i<array.length;i++){
document.write(array[i]);
}
JavaScript has a foreach-style loop (for (x in a)), but it is extremely bad coding practice to use it on an Array. Basically, the array.length approach is correct. There is also an a.forEach(fn) method in newer JavaScript engines you can use, but it is not guaranteed to be present in all browsers - and it's slower than the array.length way.
EDIT 2017: "We'll see how it goes", indeed. In most engines now, .forEach() is now as fast or faster than for(;;), as long as the function is inline, i.e. arr.forEach(function() { ... }) is fast, foo = function() { ... }; arr.forEach(foo) might not be. One might think that the two should be identical, but the first is easier for the compiler to optimise than the second.
Belated EDIT 2020: There is now for (const item of iterable), which solves the downsides of using for (item in iterable).
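With the OP's example, that looks like:
const array = "hello there my friend".split(" ");
for (const word of array) {
  console.log(word);
}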
Using the latest versions of JavaScript available to most modern browsers, you can do this:
array.forEach(function(x){
document.write(x);
});
Details are at https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/forEach. If you're worried that a browser may not have support for this, you can add it yourself, using a (hopefully minified) version of the implementation that they have listed under "Compatibility".
This is a bit outdated, but this is a minified compatibility version of forEach that I derived from Mozilla's page a few years ago:
if(!Array.prototype.forEach){Array.prototype.forEach=function(b){if(typeof b!="function"){throw new TypeError()}var a=this.length,d=arguments[1],c;for(c=0;c<a;c++){if(c in this){b.call(d,this[c],c,this)}}}};
I've never run into any issues with this, but the implementation on Mozilla's page has since been expanded with some additional checks and code to make it compatible with ECMA-262, Edition 5, 15.4.4.18.
I have a file called common.js that I use and include on all of my pages to include this, as well as all of the other "Array extras" that were introduced with JavaScript 1.6, as listed at https://developer.mozilla.org/en/JavaScript/New_in_JavaScript/1.6#Array_extras. (I've been meaning to get this updated and published for public use.)
This may not be the fastest approach (see http://jsperf.com/for-vs-foreach/15 for some specifics - thanks for the link, Amadan) - but there is something to be said for conciseness and maintainability, etc. Additionally, it'll be very interesting to see how much of this disparity is optimized away by further JavaScript engine improvements over the next few months and years. :-)
In ES2015 (ES6), you can use the for-of loop. It's supported in most browsers, with the exception of IE.
let array = [10, 20, 30];
for (let value of array) {
console.log(value);
}
See the Mozilla explanation here
You can do for (s in array), but be careful: it's not the same as a foreach.
In this case s is the key (index), not the value. You also need to use hasOwnProperty, because for...in also loops through the object's prototype chain.
for (s in array) {
  if (array.hasOwnProperty(s)) {
    console.log(array[s]);
  }
}
EDIT: As #Amadan pointed out, the hasOwnProperty check still lets through properties added directly to the array, like this: array.test = function(){}, since those are own properties. I suggest not using for...in.
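For example, this logs the added property as well, despite the hasOwnProperty guard:
var array = [1, 2, 3];
array.test = function () {};
for (var s in array) {
  if (array.hasOwnProperty(s)) {
    console.log(s); // "0", "1", "2" ... and "test"
  }
}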
EDIT2: If you're using a modern web browser (anything that isn't IE < 9), you can use Array.forEach. #ziesemer points out that Mozilla has a shim for this if you need to support IE < 9.
array.forEach(function(s){
console.log(s);
});
NOTE: Personally I use jQuery for my JavaScript projects, and I use $.each.
$.each(array, function(i,s){
console.log(s);
});
There's the "forEach" method on the Array prototype in newer JavaScript engines. Some libraries extend the prototype themselves with a similar method.
Try this:
var errorList = new Array();
errorList.push("e1");
errorList.push("e2");
for (var indx in errorList) {
alert(errorList[indx]);
}
x = [1,2,3];
for (i in x) {
console.log(i);
}

Is it faster to access a JavaScript array directly?

I was reading an article: Optimizing JavaScript for Execution Speed
And there is a section that says:
Use this code:
for (var i = 0; (p = document.getElementsByTagName("P")[i]); i++)
Instead of:
nl = document.getElementsByTagName("P");
for (var i = 0; i < nl.length; i++)
{
p = nl[i];
}
for performance reasons.
I always used the "wrong" way, according the article, but, am I wrong or is the article wrong?
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
--Donald Knuth
Personally I would use your way because it is more readable and easier to maintain. Then I would use a tool such as YSlow to profile the code and iron out the performance bottlenecks.
If you look at it from a language like C#, you'd expect the second statement to be more efficient; however, C# is not an interpreted language.
As the guide states: your browser is optimized to retrieve the right Nodes from live lists and does this a lot faster than retrieving them from the "cache" you define in your variable. Also, you have to determine the length on each iteration, which might cause a bit of performance loss as well.
Interpreted languages react differently from compiled languages; they're optimized in different ways.
Good question - I would assume not calling the function getElementsByTagName() every loop would save time. I could see this being faster - you aren't checking the length of the array, just that the value got assigned.
var p;
var ps = document.getElementsByTagName("P");
for (var i = 0; (p=ps[i]); i++) {
//...
}
Of course this also assumes that none of the values in your array evaluate to "false". A numeric array that may contain a 0 will break this loop.
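For example (hypothetical values):
var nums = [3, 0, 7];
var n;
for (var i = 0; (n = nums[i]); i++) {
  console.log(n); // logs 3, then stops: nums[1] is 0, which is falsy
}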
Interesting. Other references do seem to back up the idea that NodeLists are comparatively heavyweight.
The question is ... what's the overhead? Is it enough to bother about? I'm not a fan of premature optimisation. However this is an interesting case, because it's not just the cost of iteration that's affected, there's extra overhead as the NodeList must be kept in synch with any changes to the DOM.
With no further evidence I tend to believe the article.
No, it will not be faster. Actually, it is nearly total nonsense, since on every step of the for loop you call getElementsByTagName, which is a time-consuming function.
The ideal loop would be as follows:
nl = document.getElementsByTagName("P");
for (var i = nl.length-1; i >= 0; i--)
{
p = nl[i];
}
EDIT:
I actually tested those two examples you have given in Firebug using console.time, and as everyone has thought, the first one took 1ms whereas the second one took 0ms =)
It makes sense to just assign it directly to the variable. Probably quicker to write as well. I'd say that the article might have some truth to it.
Instead of having to get the length every time, check it against i, and then assign the variable, you simply check whether p was able to be set. This cleans up the code and probably is actually faster.
I typically do this (move the length test outside the for loop)
var nl = document.getElementsByTagName("p");
var nll = nl.length;
for(var i=0;i<nll;i++){
p = nl[i];
}
or for compactness...
var nl = document.getElementsByTagName("p");
for(var i=0,nll=nl.length;i<nll;i++){
p = nl[i];
}
which presumes that the length doesn't change during access (which in my case it doesn't),
but I'd say that running a bunch of performance tests on the article's ideas would be the definitive answer.
The author of the article wrote:
In most cases, this is faster than caching the NodeList. In the second example, the browser doesn't need to create the node list object. It needs only to find the element at index i at that exact moment.
As always, it depends. Maybe it depends on the number of elements in the NodeList.
For me this approach is not safe if the number of elements can change; this could cause an index out of bounds.
From the article you link to:
In the second example, the browser doesn't need to create the node list object. It needs only to find the element at index i at that exact moment.
This is nonsense. In the first example, the node list is created and a reference to it is held in a variable. If something happens which causes the node list to change - say you remove a paragraph - then the browser has to do some work to update the node list. However, if your code doesn't cause the list to change, this isn't an issue.
In the second example, far from not needing to create the node list, the browser has to create the node list every time through the loop, then find the element at index i. The fact that a reference to the node list is never assigned to a variable doesn't mean the list doesn't have to be created, as the author seems to think. Object creation is expensive (no matter what the author says about browsers "being optimized for this"), so this is going to be a big performance hit for many applications.
Optimisation is always dependant on the actual real-world usage your application encounters. Articles such as this shouldn't be seen as saying "Always work this way" but as being collections of techniques, any one of which might, in some specific set of circumstances, be of value. The second example is, in my opinion, less easy to follow, and that alone puts it in the realm of tricksy techniques that should only be used if there is a proven benefit in a specific case.
(Frankly, I also don't trust advice offered by a programmer who uses a variable name like "nl". If he's too lazy to use meaningful names when writing a tutorial, I'm glad I don't have to maintain his production code.)
It is worthless to discuss theory when actual tests will be more accurate, and comparing both shows the second method was clearly faster.
Here is sample code from a benchmark test
var start1 = new Date().getTime();
for (var j = 0; j < 500000; j++) {
  for (var i = 0; (p = document.getElementsByTagName("P")[i]); i++);
}
var end1 = new Date().getTime() - start1;

var start2 = new Date().getTime();
for (var j = 0; j < 500000; j++) {
  nl = document.getElementsByTagName("P");
  for (var i = 0; i < nl.length; i++) {
    p = nl[i];
  }
}
var end2 = new Date().getTime() - start2;

alert("first:" + end1 + "\nSecond:" + end2);
In chrome the first method was taking 2324ms while the second took 1986ms.
But note that for 500,000 iterations the difference was only about 300ms, so I wouldn't bother with this at all.
