JavaScript arrays, sorting, and branch prediction

Edit
After spending several hours on this and working with @pst, it turns out the issue was completely different.
Inside the code, you can see I used a timestamp shortcut of "+new Date()". This returns a timestamp as does the standard "new Date().getTime()".
However, +new Date() performs very, very badly when used with mathematical operations (+, -, /). Although typeof reports the 'start' variable as a 'number', something happens that makes it bloody slow. When using the standard getTime() method, there is no performance penalty when doing the timing subtractions.
Take a look at this jsperf detailing the problem: http://jsperf.com/new-date-timing.
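For reference, a minimal sketch of the two timestamp idioms being compared (the timed work is elided):

// Both produce a millisecond timestamp, but per the jsperf above, the
// unary-plus shortcut benchmarked far slower once the result was used
// in arithmetic; getTime() showed no such penalty.
var startCoerced = +new Date();           // coerces the Date to a number
var startExplicit = new Date().getTime(); // explicit method call

// ... timed work would go here ...

var elapsed = new Date().getTime() - startExplicit;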
Regarding @pst's very detailed answer and the efforts I made to replicate the linked question: use his answer as the canonical answer to that question.
I am going to change the title of this question to accurately reflect @pst's answer and my original intent, but will leave the original title and question below for future reference.
New Question
Do JavaScript arrays utilize branch prediction when iterating over sorted versus unsorted random data?
See @pst's answer below.
Original Title and Question below
Title: Array iterations taking 2x as long on the same data
I was looking at this question earlier, Why is it faster to process a sorted array than an unsorted array?, and wanted to try setting up the same test in JavaScript.
This has led me to something unexpected. In the tests linked in the following fiddle, simply iterating over the same array of randomly generated number values with the same code results in vastly different response times.
I've tested this in Chrome, Safari, Firefox, and Node.js.
In Chrome and Node, the first iteration is faster than the second. In Safari and Firefox, the first iteration is slower than the second.
Here's the fiddle, http://jsfiddle.net/9QbWB/6/
In the linked fiddle, I've disabled sorting (thinking that was originally the issue, but it wasn't). Sorting the data made the looping take even longer.
I've gone through the code pretty thoroughly to make sure I've removed anything that could be affecting the results. I feel like those scientists who announced FTL neutrinos: I can't find a problem in my experiment, and the data is unexpected.
In the code I'm including below, a few things - the initial setTimeout, the jQuery DOM visualizations, etc. - are there to display the data visually on jsfiddle.net. The core functions are the same.
//javascript test of https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array
setTimeout(function(){
    var unsorted_list = $('#unsorted'),
        sorted_list = $('#sorted');

    var length = 32768,
        data = [];
    for(var i=0; i<length; i++){
        data[i] = Math.floor( Math.random()*256);
    }

    var test = function(){
        var sum = 0,
            start = +new Date();
        for(var i=0; i<1000; i++){
            for(var c=0; c<length; c++){
                if( data[c] >= 128 ){
                    //sum += data[c];
                }
            }
        }
        return +new Date() - start;
    }

    //Unsorted
    var results = 0;
    for( var i=0; i<10; i++){
        var x = test();
        console.log(x);
        unsorted_list.append('<div>'+x+'</div>');
        results += x;
    }
    unsorted_list.append('<div>Final:'+(results/10)+'</div>');
    console.log( 'Unsorted: ', results/10 );

    //Sort array
    //data.sort();

    //Sorted
    var results = 0;
    for( var i=0; i<10; i++){
        var x = test();
        console.log(x);
        sorted_list.append('<div>'+x+'</div>');
        results += x;
    }
    sorted_list.append('<div>Final:'+(results/10)+'</div>');
    console.log( 'Sorted: ', results/10 );
}, 5000);

(Apparently this answer misses the question - left here for related reasons.)
Here is my jsperf case - http://jsperf.com/sorted-loop/2 - perhaps something more will be revealed with runs from more browsers. I have also included a test case using only bitwise operations, as taken from the linked post (and I have not verified its validity in JS).
CONCLUSION: the performance appears to be related to branch prediction.
The "+bitwise" test pair, which does not use a condition, is equivalent in speed (for the same browser) across all major browser runs. (Chrome is just faster than FF, which is just faster than IE, at bitwise operations; see other SO posts.)
The "+cond" test pair, which uses the condition, is greatly affected, and the sorted data is highly favored. This is the result that would be expected if branch prediction is a factor.
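For concreteness, here is a sketch of the two loop bodies the test pairs compare. The branchless variant is adapted from the linked question; as noted, its JS validity wasn't independently verified, though the arithmetic holds for these 8-bit values:

var size = 32768, data = [];
for (var i = 0; i < size; i++) {
    data[i] = Math.floor(Math.random() * 256);
}

var sumBranchy = 0, sumBranchless = 0;
for (var c = 0; c < size; c++) {
    // Branchy: the CPU must predict the outcome of this comparison,
    // which is essentially random on unsorted data.
    if (data[c] >= 128) {
        sumBranchy += data[c];
    }

    // Branchless: (data[c] - 128) >> 31 is 0 when data[c] >= 128 and
    // -1 otherwise, so the mask keeps or discards the value without
    // any conditional jump.
    var t = (data[c] - 128) >> 31;
    sumBranchless += ~t & data[c];
}

console.log(sumBranchy === sumBranchless); // true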

Related

Performance of array includes vs mapping to an Object and accessing it in JavaScript

According to the fundamentals of CS, searching an unsorted list has to occur in O(n) time, whereas direct access into a hash map occurs in O(1) time.
So is it more performant to map an array into a dictionary and then access elements directly, or should I just use includes? This question is specifically about JavaScript, because I believe the answer comes down to core implementation details of how includes() and {} are implemented.
let y = [1,2,3,4,5]
y.includes(3)
or...
let y = {
    1: true,
    2: true,
    3: true,
    4: true,
    5: true
}
5 in y
It's true that object lookup occurs in constant time - O(1) - so using object properties instead of an array is one option. But if you're just trying to check whether a value is included in a collection, it would be more appropriate to use a Set, which is a (generally unordered) collection of values that can also be looked up in constant time. (Using a plain object instead would require you to have values in addition to your keys, which you don't care about - so use a Set instead.)
const set = new Set(['foo', 'bar']);
console.log(set.has('foo')); // true
console.log(set.has('baz')); // false
This will be useful when you have to look up multiple values in the same Set. But building the Set (just like adding properties to an object) is O(N), so if you're just going to look up a single value once, there's no benefit to this or the object technique, and you may as well just use an array includes test.
Updated 04/29/2020
As a commenter rightly pointed out, it would seem V8 was optimizing away the Array includes calls. An updated version that assigns the result to a variable and uses it produces more expected results. In that case, Object address is fastest, followed by Set has, and in a distant third is Array includes (on my system/browser).
All the same, I stand by my original point: if you're making micro-optimizations, it's worth testing your assumptions. Just make sure your tests are valid ;)
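As a minimal sketch of the kind of fix described (the updated benchmark itself isn't reproduced here), keep each lookup's result live so the engine can't optimize the call away:

// Hypothetical harness: accumulate into `sink` so includes/has/in each
// have an observable effect and can't be dead-code-eliminated.
const arr = [1, 2, 3, 4, 5];
const set = new Set(arr);
const obj = { 1: true, 2: true, 3: true, 4: true, 5: true };

let sink = 0;
for (let i = 0; i < 1e6; i++) {
    const v = (i % 7) + 1; // sometimes present, sometimes not
    if (arr.includes(v)) sink++;
    if (set.has(v)) sink++;
    if (v in obj) sink++;
}
console.log(sink); // using the result keeps the lookups live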
Original
Well. Despite the obvious expectation that Object address and Set has should outperform Array includes, benchmarks against Chrome indicate that implementation trumps expectation.
In the benches I ran against Chrome Array includes was far and away the best performer.
I also tested locally with Node and got more expected results. There, Object address wins, followed closely by Set has, with Array includes marginally slower than both.
Bottom line: if you're making micro-optimizations (not that I recommend them), it's worth benchmarking rather than assuming what might be best for your particular case. Ultimately it comes down to the implementation, as your question implies, so optimizing for the target platform is key.
Here are the results I got:
Node (12.6.0):
ops for Object address 7804199
ops for Array includes 5200197
ops for Set has 7178483
Chrome (75.0):
https://jsbench.me/myjyq4ixs1/1
This isn't necessarily a direct answer to the question, but here is a related performance test I ran quickly in my Chrome dev tools.
function getRandomInt(max) {
    return Math.floor(Math.random() * max);
}

var arr = [1,2,3];

var t = performance.now();
for (var i = 0; i < 100000; i++) {
    var x = arr.includes(getRandomInt(3));
}
console.log(performance.now() - t);

var t = performance.now();
for (var i = 0; i < 100000; i++) {
    var n = getRandomInt(3);
    var x = n == 1 || n == 2 || n == 3;
}
console.log(performance.now() - t);
VM44:9 9.100000001490116
VM44:16 5.699999995529652
I find the Array includes syntax nice to look at, so I wanted to know whether its performance was likely to be an issue the way I use it - for instance, checking whether a variable is one of a set of enums. It doesn't seem to have much of an impact in situations like this, with a short list. Then I ran this:
function getRandomInt(max) {
    return Math.floor(Math.random() * max);
}

var t = performance.now();
for (var i = 0; i < 100000; i++) {
    var x = [1,2,3].includes(getRandomInt(3));
}
console.log(performance.now() - t);

var t = performance.now();
for (var i = 0; i < 100000; i++) {
    var n = getRandomInt(3);
    var x = n == 1 || n == 2 || n == 3;
}
console.log(performance.now() - t);
VM83:8 12.600000001490116
VM83:15 4.399999998509884
So the way I actually use it (and like looking at it) performs quite a bit worse, though still not significantly unless it's run a few million times. Using it inside an Array.filter that may run a lot - say, as a React/Redux selector - may not be a great idea, which is exactly what I was about to do when I decided to test this.
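One plausible explanation (my assumption - the test above doesn't isolate it) is that the second snippet allocates a fresh [1,2,3] array on every iteration. Hoisting the literal keeps the readable syntax without the per-iteration allocation:

// Hypothetical refactor: build the lookup array once and reuse it.
var ENUM_VALUES = [1, 2, 3];

function isEnumValue(n) {
    return ENUM_VALUES.includes(n); // no array allocation per call
}

console.log(isEnumValue(2)); // true
console.log(isEnumValue(9)); // false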

How do I use a typed array with offset in node?

I am writing a mat file parser using jBinary, which is built on top of jDataView. I have a working parser with lots of tests, but it runs very slowly for moderately sized data sets of around 10 MB. I profiled with look and found that a lot of time is spent in tagData. In the linked tagData code, ints/uints/single/doubles/whatever are read one by one from the file and pushed to an array. Obviously, this isn't super-efficient. I want to replace this code with a typed array view of the underlying bytes to remove all the reading and pushing.
I have started to migrate the code to use typed arrays, as shown below. The new code preserves the old functionality for all types except 'miINT8'. The new functionality tries to view the buffer b starting at offset s and with length l, consistent with the docs. I have confirmed that the s being passed to the Int8Array constructor is non-zero, even going so far as to hard-code it to 5. In all cases, the output of console.log(elems.byteOffset) is 0. In my tests, I can see that the Int8Array is indeed starting from the beginning of the buffer and not at offset s as I intend.
What am I doing wrong? How do I get the typed array to start at position s instead of position 0? I have tested this on node.js version 10.25 as well as 12.0 with the same results in each case. Any guidance appreciated as I'm totally baffled by this one!
tagData: jBinary.Template({
    baseType: ['array', 'type'],
    read: function (ctx) {
        var view = this.binary.view
        var b = view.buffer
        var s = view.tell()
        var l = ctx.tag.numBytes
        var e = s + l
        var elems
        switch (ctx.tag.type) {
            case 'miINT8':
                elems = new Int8Array(b, s, l); view.skip(l); console.log(elems.byteOffset); break;
            default:
                elems = []
                while (view.tell() < e && view.tell() < view.byteLength) {
                    elems.push(this.binary.read(ctx.tag.type))
                }
        }
        return elems
    }
}),
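For reference, here is a minimal, standalone sketch of the Int8Array(buffer, byteOffset, length) behavior the code above relies on. One caveat (an educated guess, not verified against jBinary's internals): this constructor form only applies when the first argument is an actual ArrayBuffer; if view.buffer is a Node Buffer or other array-like, the extra arguments are ignored, the contents are copied, and byteOffset stays 0 - which would match the symptom described.

// A genuine ArrayBuffer honors the byteOffset and length arguments.
var ab = new ArrayBuffer(16);
var view = new Int8Array(ab, 5, 8); // view 8 elements starting at byte 5

console.log(view.byteOffset); // 5
console.log(view.length);     // 8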

How do I measure which part of a loop is slow?

I'm looping through a dataset with a couple of thousand items in it, like this:
users.forEach(function(user){
    //List ALLTHETHINGS!
    listAllEverything(user)

    //Add groupings
    user.groupings = groupings.filter(function(grouping){
        return grouping.conditional(user)
    })

    //Add conversions to user, one per user.
    user.conversions = {}

    //for each conversionList
    conversionLists.forEach(function(conversionList){
        user.conversions[conversionList.name] = [];
        //for each conversion
        for (var i = conversionList.conversions.length - 1; i >= 0; i--) {
            var conversion = conversionList.conversions[i]
            //test against each user's apilog
            var converted = user.apilog.some(function(event){
                return conversion.conditional(event);
            })
            if (converted){
                //Add the conversion and all conversions that come before it.
                for (var i = i; i >= 0; i--){
                    user.conversions[conversionList.name].push(conversionList.conversions[i])
                }
                break;
            }
        }
    })
})
I know this is not the most optimized code, and I have some ideas for how it can be improved. But I'm pretty new to these kinds of problems, so I'm not sure how I should prioritize. I know about console.time, which is useful, but I want something that lets me compound the time spent on each part of the forEach loop - either a tool (I usually use Chrome) or some debugging method. Preferably something that doesn't affect the performance too much.
Since you are using Chrome, you should check out the Timeline tab in your browser's DevTools - just hit the record button before running the loop and stop it once it's done. You will see a nice breakdown of everything that just happened, and you will be mostly interested in the yellow bars - they show JavaScript operations.
Please check out this video presentation by Paul Irish about Performance Tooling
As you know, in Chrome or Firefox you can just wrap a piece of code with console.time (and console.timeEnd) and it will measure the speed of particular operation and print it in the console.
For example: to measure the time it takes for an entire loop to execute use:
console.time('For loop benchmark');
for(i=0; i<1000; i++) {
    // do some operations here
}
console.timeEnd('For loop benchmark');
But, if you want to measure each iteration you can parameterize the name of the log inside the loop so that you can name each specific operation the way you want:
for(i=0; i<1000; i++) {
    var consoleTimeName = 'Measuring iteration no '+i+' which is doing this and that...';
    console.time(consoleTimeName);
    // do some operations here
    console.timeEnd(consoleTimeName);
}
Using it, you can see for yourself how much faster a simple for loop can be in comparison to jQuery's $.each loop.
You can find more about this on developer.mozilla.org and developer.chrome.com. Please note that this is not a standardized, cross-browser-compatible feature, and you should not use it on a production website, since some browsers, like IE, may throw an error when they see it.
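If the goal is to compound the time spent on each section across all iterations (as the question asks), console.time alone won't sum the results, but manual accumulation will. Here is a rough sketch against the question's own loop - names like users and listAllEverything come from the question; the timing scaffolding is the addition:

var timings = { list: 0, groupings: 0, conversions: 0 };

users.forEach(function (user) {
    var t0 = performance.now();
    listAllEverything(user);

    var t1 = performance.now();
    timings.list += t1 - t0; // accumulated across all users

    user.groupings = groupings.filter(function (grouping) {
        return grouping.conditional(user);
    });

    var t2 = performance.now();
    timings.groupings += t2 - t1;

    // ... the conversion logic from the question goes here ...

    timings.conversions += performance.now() - t2;
});

console.log(timings); // total ms spent in each section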

Javascript: TypeError variable is undefined

I am currently building a small web application with similar functionality across all modules. I want to write small, generic functions so that the other programmers on the project can call them and get back the data they need to implement their functionality. In this example, I am dealing with the typical "choose true or false" exercise, so from template.php they call this function:
function checkAnswers(){
    var radiobuttons = document.form1.exer1;
    var correctAnswers = answers(); //this is an array of strings
    var checkedAnswers = checkExerciseRB(radiobuttons, 2, correctAnswers);
    for(i=0; i<checkedAnswers.length; i++){
        alert(checkedAnswers[i]);
    }
}
Function checkExerciseRB is my generic function, it is called from checkAnswers.
function checkExerciseRB(rbuttons, opciones, correct){
    var answers = new Array();
    var control = 0;
    for(i=0; i<rbuttons.length; i++){
        var noPick="true";
        for(j=0; j<opciones; j++){
            if(rbuttons[control+j].checked){
                if(rbuttons[control+j].value==correct[i]){
                    answers[i]= 1;
                    noPick="false";
                    break;
                }
                else{
                    answers[i]=2;
                    noPick="false";
                    break;
                }
            }
        }
        if(noPick=="true")
            answers[i]=0;
        control=control+opciones;
    }
    return answers;
}
It works great, but the error logs in my favorite browsers (Firefox, Chrome) say:
TypeError: rbuttons[control + j] is undefined
Any clue on how to deal with this matter?
This probably means that control + j is greater than or equal to the length of the array rbuttons. There's no such array element as rbuttons[control + j].
You should learn how to use the JavaScript debugger in your favorite browsers! Debuggers are great. They let you watch this code run, line by line, as fast or as slow as you want, and watch how the value of control changes as you go.
You’ll watch it, and you’ll think “Oh! That line of code is wrong!”
You're looping through rbuttons.length times, but in each loop you're adding 2 to control. Using control to index your array, you're going to run past the end.
Does the index specified by control + j exist in the array? That is, if it evaluates to 4, are there at least 5 items in the array?
Also, you should be using var i, var j, etc. inside your for loops. Without it, your variables leak into the scope this code is executed in (most likely the global scope, and that's not good) :)
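Putting both suggestions together, here is one plausible corrected version (a sketch, assuming the outer loop is meant to advance one group of opciones buttons at a time):

function checkExerciseRB(rbuttons, opciones, correct) {
    var answers = [];
    var control = 0;
    // Stop while a full group of `opciones` buttons still fits in the
    // list, so rbuttons[control + j] can never be undefined.
    for (var i = 0; control + opciones <= rbuttons.length; i++) {
        answers[i] = 0; // default: nothing picked
        for (var j = 0; j < opciones; j++) {
            if (rbuttons[control + j].checked) {
                answers[i] = (rbuttons[control + j].value == correct[i]) ? 1 : 2;
                break;
            }
        }
        control += opciones;
    }
    return answers;
}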

Javascript: Is the length method efficient?

I'm doing some JavaScript coding, and I was wondering if the length method is "precomputed", or remembered by the JS engine.
So, the question is:
If I'm checking an array's length really often, and supposing that I'm not changing the array (making it immutable through a closure), should I precompute the length and store it in some variable?
Thanks!
As always, the answer is "it depends".
Let's test native arrays with a million-element array:
var arr = new Array(1000000); // the million-element test array

for (var i = 0; i < arr.length; i++);  // length read on every check

var len = arr.length;                  // length cached once
for (var i = 0; i < len; i++);
(Benchmark results chart: http://josh3736.net/images/arrlen.png)
Chrome and Firefox optimize the property accessor to be as efficient as copying the length to a local variable. IE and Opera do not, and are 50%+ slower.
However, keep in mind that the test results' "ops/second" means number of complete iterations through an array of one million elements per second.
To put this in perspective: even in IE8 (the worst performer in this bunch), which scored .44 and 3.9 ops/second on property access and local variable respectively, the per-iteration penalty was a scant 2 µs (the difference between 1/0.44 s and 1/3.9 s, spread over a million iterations). Iterating over a thousand items, using array.length will only cost you an extra 2 ms. In other words: beware premature optimization.
The length of an actual array is not computed on the fly. It's stored as part of the array data structure so accessing it involves no more work than just fetching the value (there is no computation). As such, it will generally be as fast as retrieving any fixed property of an object. As you can see in this performance test, there is basically no difference between retrieving the length of an array and retrieving a property of an object:
http://jsperf.com/length-comparisons
An exception to this is the nodeList objects that the DOM returns from functions like getElementsByTagName() or getElementsByClassName(). In these, it is often much slower to access the length property. This is probably because these nodeList objects are not true javascript objects and there may be a bridge between Javascript and native code that must be crossed each time something is accessed from these objects. In this case, it would be a LOT faster (10-100x faster) to cache the length into a local variable rather than use it repeatedly in a loop off the nodeList. I've added that to the length-comparison and you can see how much slower it is.
In some browsers, it is meaningfully faster to put the length into a local variable and use it from there if you will be referring to it over and over again (like in a loop).
All major interpreters provide efficient accessors for the lengths of native arrays, but not for array-like objects like NodeLists.
"Efficient looping in Javascript"
    Test / Browser               Firefox 2.0   Opera 9.1   Internet Explorer 6
    Native For-Loop              155 ms        121 ms      160 ms
    ...
    Improved Native While-Loop   120 ms        100 ms      110 ms
"Efficient JavaScript code" suggests
    for( var i = 0; i < document.getElementsByTagName('tr').length; i++ ) {
        document.getElementsByTagName('tr')[i].className = 'newclass';
        document.getElementsByTagName('tr')[i].style.color = 'red';
        ...
    }

and:

    var rows = document.getElementsByTagName('tr');
    for( var i = 0; i < rows.length; i++ ) {
        rows[i].className = 'newclass';
        rows[i].style.color = 'red';
        ...
    }
Neither of these are efficient. getElementsByTagName returns a dynamic object, not a static array. Every time the loop condition is checked, Opera has to reassess the object, and work out how many elements it references, in order to work out the length property. This takes a little more time than checking against a static number.
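For completeness, a sketch of the version the quoted article is driving at: snapshot the collection once and its length once, so the live NodeList is not re-queried on every loop-condition check.

// Cache both the NodeList and its length before looping.
var rows = document.getElementsByTagName('tr');
for (var i = 0, len = rows.length; i < len; i++) {
    rows[i].className = 'newclass';
    rows[i].style.color = 'red';
}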
There's probably a modest speed boost attainable by caching the length in a local variable due to attribute lookup speed. This may or may not be negligible, depending on how the JS engine JITs the code.
See http://jsperf.com/for-loop-caching for a rudimentary JSperf testcase.
For any collection-type object whose length you will not be manipulating (e.g. any immutable collection), it's always a good idea to cache its length for better performance.
var elems = document.getElementsByName("tst");
var elemsLen = elems.length;
var i;
for(i = 0; i < elemsLen; ++i)
{
    // work with elems... example:
    // elems[i].selected = false;
}

elems = [10,20,30,40,50,60,70,80,90,100];
elemsLen = elems.length;
for(i = 0; i < elemsLen; ++i)
{
    // work with elems... example:
    // elems[i] = elems[i] / 10;
}
