I have been playing around with the Riemann zeta function.
I want to optimize execution time as much as possible here, so I put the intermediate results in temporary variables. But testing revealed no noticeable performance boost from this.
function zeta(z, limit){
    var zres = new Complex(0, 0);
    for(var x = 1; x <= limit; x++){
        var ii = z.imaginary * Math.log(1/x);
        var pp = Math.pow(1/x, z.real);
        zres.real += pp * Math.cos(ii);
        zres.imaginary += pp * Math.sin(ii);
    }
    return zres;
}
My question is: Even though I couldn't measure a difference in execution time, what's theoretically faster? Calculating ii and pp once and handing them over as variables, or calculating them twice and not wasting time with the declaration?
Putting things in (local) variables on its own will usually not have a major effect on performance. If anything it could increase the pressure on the register allocator (or equivalent) and slightly reduce performance.
Avoiding calculating an expression multiple times by putting the result into a local variable can improve performance if the just-in-time compiler (or runtime) isn't smart enough to do the equivalent optimization itself (i.e. compute the value only once and reuse the result each time the expression is used).
There's really no universally applicable rule here. You need to benchmark and optimize on the specific system you want best performance on.
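For comparison, here is a minimal sketch of the "calculate them twice" variant the question describes; whether a given JIT folds the duplicated Math.pow and Math.log calls back into single computations is exactly the engine-dependent question discussed above. The name zetaInline is just for illustration, and Complex is the class from the question:

// Hypothetical rewrite of the question's zeta without temporaries.
function zetaInline(z, limit){
    var zres = new Complex(0, 0);
    for(var x = 1; x <= limit; x++){
        // ii and pp are computed twice per iteration instead of being stored once
        zres.real += Math.pow(1/x, z.real) * Math.cos(z.imaginary * Math.log(1/x));
        zres.imaginary += Math.pow(1/x, z.real) * Math.sin(z.imaginary * Math.log(1/x));
    }
    return zres;
}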
Instantiating a variable is surely faster than a math operation (like Math.log or Math.pow), so it is better to store the results in variables. If you want to avoid the very small cost of initializing and collecting the variables in the loop's local scope, you can declare pp and ii outside the loop, although that time is negligible compared to all the other operations.
function zeta(z, limit){
    var zres = new Complex(0, 0);
    var ii, pp;
    for(var x = 1; x <= limit; x++){
        ii = z.imaginary * Math.log(1/x);
        pp = Math.pow(1/x, z.real);
        zres.real += pp * Math.cos(ii);
        zres.imaginary += pp * Math.sin(ii);
    }
    return zres;
}
Related
I'm optimizing the compiler of a language to JavaScript, and found a very interesting, if not frustrating, case:
function add(n,m) {
    return n === 0 ? m : add(n - 1, m) + 1;
};

var s = 0;
for (var i = 0; i < 100000; ++i) {
    s += add(4000, 4000);
}
console.log(s);
It takes 2.3s to complete on my machine[1]. But if I make a very small change:
function add(n,m) {
    return (() => n === 0 ? m : add(n - 1, m) + 1)();
};

var s = 0;
for (var i = 0; i < 100000; ++i) {
    s += add(4000, 4000);
}
console.log(s);
It completes in 1.1s. Notice the only difference is the addition of an immediately invoked lambda, (() => ...)(), around the return of add. Why does this added call make my program two times faster?
[1] MacBook Pro 13" 2020, 2.3 GHz Quad-Core Intel Core i7, Node.js v15.3.0
Interesting! From looking at the code, it seems fairly obvious that the IIFE-wrapped version should be slower, not faster: in every loop iteration, it creates a new function object and calls it (which the optimizing compiler will eventually avoid, but that doesn't kick in right away), so generally just does more work, which should be taking more time.
The explanation in this case is inlining.
A bit of background: inlining one function into another (instead of calling it) is one of the standard tricks that optimizing compilers perform in order to achieve better performance. It's a double-edged sword though: on the plus side, it avoids calling overhead, and can often enable further optimizations, such as constant propagation, or elimination of duplicate computation (see below for an example). On the negative side, it causes compilation to take longer (because the compiler does more work), and it causes more code to be generated and stored in memory (because inlining a function effectively duplicates it), and in a dynamic language like JavaScript where optimized code typically relies on guarded assumptions, it increases the risk of one of these assumptions turning out to be wrong and a large amount of optimized code having to be thrown away as a result.
Generally speaking, making perfect inlining decisions (not too much, not too little) requires predicting the future: knowing in advance how often and with which parameters the code will be executed. That is, of course, impossible, so optimizing compilers use various rules/"heuristics" to make guesses about what might be a reasonably good decision.
One rule that V8 currently has is: don't inline recursive calls.
That's why in the simpler version of your code, add will not get inlined into itself. The IIFE version essentially has two functions calling each other, which is called "mutual recursion" -- and as it turns out, this simple trick is enough to fool V8's optimizing compiler and make it sidestep its "don't inline recursive calls" rule. Instead, it happily inlines the unnamed lambda into add, and add into the unnamed lambda, and so on, until its inlining budget runs out after ~30 rounds. (Side note: "how much gets inlined" is one of the somewhat-complex heuristics and in particular takes function size into account, so whatever specific behavior we see here is indeed specific to this situation.)
In this particular scenario, where the involved functions are very small, inlining helps quite a bit because it avoids call overhead. So in this case, inlining gives better performance, even though it is a (disguised) case of recursive inlining, which in general often is bad for performance. And it does come at a cost: in the simple version, the optimizing compiler spends only 3 milliseconds compiling add, producing 562 bytes of optimized code for it. In the IIFE version, the compiler spends 30 milliseconds and produces 4318 bytes of optimized code for add. That's one reason why it's not as simple as concluding "V8 should always inline more": time and battery consumption for compiling matters, and memory consumption matters too, and what might be acceptable cost (and improve performance significantly) in a simple 10-line demo may well have unacceptable cost (and potentially even cost overall performance) in a 100,000-line app.
Now, having understood what's going on, we can get back to the "IIFEs have overhead" intuition, and craft an even faster version:
function add(n,m) {
    return add_inner(n, m);
};

function add_inner(n, m) {
    return n === 0 ? m : add(n - 1, m) + 1;
}
On my machine, I'm seeing:
simple version: 1650 ms
IIFE version: 720 ms
add_inner version: 460 ms
Of course, if you implement add(n, m) simply as return n + m, then it terminates in 2 ms -- algorithmic optimization beats anything an optimizing compiler could possibly accomplish :-)
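For completeness, that closed-form version is simply:

// No recursion at all: the "+1 per step" definition collapses to plain addition.
function add(n, m) {
    return n + m;
}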
Appendix: an example of the benefits of optimization. Consider these two functions:
function Process(x) {
    return (x ** 2) + InternalDetail(x, 0, 2);
}

function InternalDetail(x, offset, power) {
    return (x + offset) ** power;
}
(Obviously, this is silly code; but let's assume it's a simplified version of something that makes sense in practice.)
When executed naively, the following steps happen:
evaluate temp1 = (x ** 2)
call InternalDetail with parameters x, 0, 2
evaluate temp2 = (x + 0)
evaluate temp3 = temp2 ** 2
return temp3 to the caller
evaluate temp4 = temp1 + temp3
return temp4.
If an optimizing compiler performs inlining, then as a first step it will get:
function Process_after_inlining(x) {
    return (x ** 2) + ( (x + 0) ** 2 );
}
which allows two simplifications: x + 0 can be folded to just x, and then the x ** 2 computation occurs twice, so the second occurrence can be replaced by reusing the result from the first:
function Process_with_optimizations(x) {
    let temp1 = x ** 2;
    return temp1 + temp1;
}
So comparing with the naive execution, we're down to 3 steps from 7:
evaluate temp1 = (x ** 2)
evaluate temp2 = temp1 + temp1
return temp2
I'm not predicting that real-world performance will go from 7 time units to 3 time units; this is just meant to give an intuitive idea of why inlining can help reduce computational load by some amount.
Footnote: to illustrate how tricky all this stuff is, consider that replacing x + 0 with just x is not always possible in JavaScript, even when the compiler knows that x is always a number: if x happens to be -0, then adding 0 to it changes it to +0, which may well be observable program behavior ;-)
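A quick way to see the -0 behaviour for yourself (Object.is distinguishes -0 from +0, whereas === does not):

console.log(Object.is(-0, 0));       // false: -0 and +0 are distinct values
console.log(Object.is(-0 + 0, -0));  // false: adding 0 turned -0 into +0
console.log(-0 + 0 === 0);           // true: === cannot tell them apart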
So I was curious which would be faster for iterating through an array, the normal for loop or forEach, so I executed this code in the console:
var arr = [];
arr.length = 10000000;
//arr.fill(1);
for (var i_1 = 0; i_1 < arr.length; i_1++) {
    arr[i_1] = 1;
}

//////////////////////////////////

var t = new Date();
var sum = 0;
for (var i = 0; i < arr.length; i++) {
    var a = arr[i];
    if (a & 1) {
        sum += a;
    } else {
        sum -= a;
    }
}
console.log(new Date().getTime() - t.getTime());
console.log(sum);

t = new Date();
sum = 0;
arr.forEach(function (value, i, aray) {
    var a = value;
    if (a & 1) {
        sum += a;
    } else {
        sum -= a;
    }
});
console.log(new Date().getTime() - t.getTime());
console.log(sum);
Now the results in Chrome are 49 ms for the for loop and 376 ms for the forEach loop, which is fine, but the results in Firefox and IE (and Edge) are very different.
In both of those browsers the first loop takes ~15 seconds (yes, seconds) while the forEach takes "only" ~4 seconds.
My question is: can someone tell me the exact reason Chrome is so much faster?
I tried all kinds of operations inside the loops; the results were always in favor of Chrome by a mile.
Disclaimer: I do not know the specifics of V8 in Chrome or the interpreter of Firefox / Edge, but there are some very general insights. Since V8 compiles Javascript to native code, let's see what it potentially could do:
Very crudely: variables like your var i can be modelled as a very general Javascript variable, so that it can take any type of value from numbers to objects (modelled as a pointer to a struct Variable for instance), or the compiler can deduce the actual type (say an int in C++ for instance) from your JS and compile it like that. The latter uses less memory, exploits caching, uses less indirection, and can potentially be as fast as a for-loop in C++. V8 probably does this.
The above holds for your array as well: maybe it compiles to a memory efficient array of ints stored contiguously in memory; maybe it is an array of pointers to general objects.
Temporary variables can be removed.
The second loop could be optimized by inlining the function call; maybe this is done, maybe it isn't (see the sketch after this answer).
The point being: all JS interpreters / compilers can potentially exploit these optimizations. This depends on a lot of factors: the trade-off between compilation and execution time, the way JS is written, etc.
V8 seems to optimize a lot, Firefox / Edge maybe don't in this example. Knowing why precisely requires in-depth understanding of the interpreter / compiler.
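To illustrate the inlining point: if the engine does inline the forEach callback, the second loop effectively collapses into the same shape as the first one. A hand-written sketch of that equivalent (illustrative only, not actual engine output), using the arr and sum from the question:

// Hand-inlined equivalent of the forEach version.
var sum = 0;
for (var i = 0; i < arr.length; i++) {
    var a = arr[i];   // the callback's "value" parameter
    if (a & 1) {
        sum += a;
    } else {
        sum -= a;
    }
}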
The for loop is the fastest compared to the other iteration methods in every browser, but when comparing browsers, IE is the slowest at iterating for loops. My best recommendation is to try jsperf.com for optimization questions. The V8 engine implementation is the reason; after Chrome split from WebKit, it stripped out more than 10k lines of code in the first few days.
Out of curiosity I wrote some trivial benchmarks comparing the performance of golang maps to JavaScript (v8/node.js) objects used as maps and am surprised at their relative performance. JavaScript objects appear to perform roughly twice as fast as go maps (even including some minor performance edges for go)!
Here is the go implementation:
// map.go
package main

import "fmt"
import "time"

func elapsedMillis(t0, t1 time.Time) float64 {
    n0, n1 := float64(t0.UnixNano()), float64(t1.UnixNano())
    return (n1 - n0) / 1e6
}

func main() {
    m := make(map[int]int, 1000000)
    t0 := time.Now()
    for i := 0; i < 1000000; i++ {
        m[i] = i     // Put.
        _ = m[i] + 1 // Get, use, discard.
    }
    t1 := time.Now()
    fmt.Printf("go: %fms\n", elapsedMillis(t0, t1))
}
And here is the JavaScript:
#!/usr/bin/env node
// map.js

function elapsedMillis(hrtime0, hrtime1) {
    var n0 = hrtime0[0] * 1e9 + hrtime0[1];
    var n1 = hrtime1[0] * 1e9 + hrtime1[1];
    return (n1 - n0) / 1e6;
}

var m = {};
var t0 = process.hrtime();
for (var i = 0; i < 1000000; i++) {
    m[i] = i;         // Put.
    var _ = m[i] + 1; // Get, use, discard.
}
var t1 = process.hrtime();
console.log('js: ' + elapsedMillis(t0, t1) + 'ms');
Note that the go implementation has a couple of minor potential performance edges in that:
Go is mapping integers to integers directly, whereas JavaScript will convert the integer keys to string property names.
Go makes its map with an initial capacity equal to the benchmark size, whereas JavaScript grows from its default capacity.
However, despite the potential performance edges listed above, the Go map usage seems to perform at about half the rate of the JavaScript object map! For example (representative):
go: 128.318976ms
js: 48.18517ms
Am I doing something obviously wrong with go maps or somehow comparing apples to oranges?
I would have expected go maps to perform at least as well - if not better than JavaScript objects as maps. Is this just a sign of go's immaturity (1.4 on darwin/amd64) or does it represent some fundamental difference between the two language data structures that I'm missing?
[Update]
Note that if you explicitly use string keys (e.g. via s := strconv.Itoa(i) and var s = '' + i in Go and JavaScript, respectively) then their performance is roughly equivalent; a JavaScript sketch of that variant follows below.
My guess is that the very high performance from v8 is related to a specific optimization in that runtime for objects whose keys are consecutive integers (e.g. by substituting an array implementation instead of a hashtable).
I'm voting to close since there is likely nothing to see here...
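For reference, the string-key variant of the JavaScript benchmark mentioned in the update could look like this (a sketch that reuses elapsedMillis from the snippet above; only the loop body changes):

var m = {};
var t0 = process.hrtime();
for (var i = 0; i < 1000000; i++) {
    var s = '' + i;     // force a string property name
    m[s] = i;           // Put.
    var _ = m[s] + 1;   // Get, use, discard.
}
var t1 = process.hrtime();
console.log('js (string keys): ' + elapsedMillis(t0, t1) + 'ms');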
Your benchmark is a bit synthetic, as any benchmark is. Just out of curiosity, try
for i := 0; i < 1000000; i += 9 {
in the Go implementation. You may be surprised.
I've tried to prove that plus (+) conversion is faster than parseInt with the following jsperf, and the results surprised me:
Parse vs Plus
Preparation code
<script>
Benchmark.prototype.setup = function() {
    var x = "5555";
};
</script>
Parse Sample
var y = parseInt(x); //<---80 million loops
Plus Sample
var y = +x; //<--- 33 million loops
The reason seems to be that I'm using "Benchmark.prototype.setup" to declare my variable, but I don't understand why.
See the second example:
Parse vs Plus (local variable)
<script>
Benchmark.prototype.setup = function() {
    x = "5555";
};
</script>
Parse Sample
var y = parseInt(x); //<---89 million loops
Plus Sample
var y = +x; //<--- 633 million loops
Can someone explain the results?
Thanks
In the second case + is faster because V8 actually moves it out of the benchmarking loop, making the benchmarking loop empty.
This happens due to certain peculiarities of the current optimization pipeline. But before we get to the gory details, I would like to recap how Benchmark.js works.
To measure the test case you wrote, it takes the Benchmark.prototype.setup you provided together with the test case itself and dynamically generates a function that looks approximately like this (I am skipping some irrelevant details):
function (n) {
    var start = Date.now();
    /* Benchmark.prototype.setup body here */
    while (n--) {
        /* test body here */
    }
    return Date.now() - start;
}
Once the function is created, Benchmark.js calls it to measure your op for a certain number of iterations n. This process is repeated several times: generate a new function, call it to collect a measurement sample. The number of iterations is adjusted between samples to ensure that the function runs long enough to give a meaningful measurement.
Important things to notice here are that:
both your test case and Benchmark.prototype.setup are textually inlined;
there is a loop around the operation you want to measure.
Essentially we are discussing why the code below, with a local variable x,
function f(n) {
    var start = Date.now();
    var x = "5555";
    while (n--) {
        var y = +x;
    }
    return Date.now() - start;
}
runs slower than the code with a global variable x
function g(n) {
    var start = Date.now();
    x = "5555";
    while (n--) {
        var y = +x;
    }
    return Date.now() - start;
}
(Note: this case is called "local variable" in the question itself, but that is not the case; x is global.)
What happens when you execute these functions with a large enough value of n, for example f(1e6)?
The current optimization pipeline implements OSR (on-stack replacement) in a peculiar fashion. Instead of generating an OSR-specific version of the optimized code and discarding it later, it generates a version that can be used for both OSR and normal entry, and can even be reused if we need to perform OSR at the same loop. This is done by injecting a special OSR entry block into the right spot in the control flow graph.
The OSR entry block is injected while the SSA IR for the function is built, and it eagerly copies all local variables out of the incoming OSR state. As a result V8 fails to see that the local x is actually a constant and even loses any information about its type. For subsequent optimization passes the copied value (call it x2) looks like it can be anything.
As x2 can be anything, the expression +x2 can also have arbitrary side effects (e.g. x2 could be an object with valueOf attached to it). This prevents the loop-invariant code motion pass from moving +x2 out of the loop.
Why is g faster, then? V8 pulls a trick here. It tracks global variables that contain constants: e.g. in this benchmark the global x always contains "5555", so V8 just replaces the x access with its value and marks this optimized code as dependent on the value of x. If somebody replaces the value of x with something different, then all dependent code will be deoptimized. Global variables are also not part of the OSR state and do not participate in SSA renaming, so V8 is not confused by "spurious" φ-functions merging OSR and normal entry states. That's why when V8 optimizes g, the x access inside the loop body is simply replaced by the constant "5555" in the generated IR.
Note: +x is compiled to x * 1, but this is just an implementation detail.
Later LICM would just take this operation and move it out of the loop leaving nothing of interest in the loop itself. This becomes possible because now V8 knows that both operands of the * are primitives - so there can be no side-effects.
And that's why g is faster: an empty loop is quite obviously faster than a non-empty one.
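Putting the pieces together, the optimized g behaves roughly like this hand-written sketch (illustrative only, not actual compiler output; the function name is made up):

function g_after_optimization(n) {
    var start = Date.now();
    // x was recognized as a global constant, so +x folded to 5555 * 1
    // and LICM hoisted it out of the loop.
    var y = 5555 * 1;
    while (n--) {
        // nothing left to do per iteration
    }
    return Date.now() - start;
}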
This also means that the second version of benchmark does not actually measure what you would like it to measure, and while the first version did actually grasp some of the differences between parseInt(x) and +x performance that was more by luck: you hit a limitation in V8's current optimization pipeline (Crankshaft) that prevented it from eating the whole microbenchmark away.
I believe the reason is that parseInt does more than just a conversion to an integer: it also strips any trailing text off the string, as when parsing a pixel value:
var width = parseInt(element.style.width);//return width as integer
whereas the plus sign could not handle this case:
var width = +element.style.width;//returns NaN
The plus sign does an implicit conversion from string to number and only that conversion. parseInt tries to make sense out of the string first (like integers followed by a unit suffix).
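A quick demonstration of that difference:

console.log(parseInt("20px"));              // 20: parses the leading digits, ignores the rest
console.log(+"20px");                       // NaN: the whole string must be numeric
console.log(parseInt("5555") === +"5555");  // true for a purely numeric string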
I want to do some speed tests on very basic stuff like variable declaration.
Now I have a function that executes X times to get a more significant time difference.
http://jsfiddle.net/eTbsv/ (you need to open your console & it takes a few seconds to execute)
this is the code:
var doit = 10000000,
    i = 0,
    i2 = 0;

// testing var with comma
console.time('timer');
function test(){
    var a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;
}
while (i <= doit){
    test();
    i++;
}
console.timeEnd('timer');

// testing individual var declarations
console.time('timer2');
function test2(){
    var a; var b; var c; var d; var e; var f; var g; var h; var i; var j; var k; var l; var m; var n; var o; var p; var q; var r; var s; var t; var u; var v; var w; var x; var y; var z;
}
while (i2 <= doit){
    test2();
    i2++;
}
console.timeEnd('timer2');
Now I have a few questions:
Is this an accurate way of testing the speed of variable declarations?
How could I test more cycles without Firefox crashing? If I set doit to 1000000000, for example, Firefox wants to stop the script.
Why are my results (in my script and in jsperf) so different each time? Sometimes the individual variable declaration is faster than the grouped one :/
Edit: I just made a jsperf test case: http://jsperf.com/testing-js-variable-declaration-speed - it would be nice if some of you with different browsers and configurations could participate. But I'm still interested to know whether this way of testing is accurate.
Is this an accurate way of testing the speed of variable declarations?
It's good enough to get a rough idea, but it's not perfect. Like most things, it relies on the CPU. If the CPU spikes during testing due to another application, such as a virus scanner, or another action from the browser, such as a phishing check, the JavaScript execution can be slowed. Even when the CPU is idle, it's not an exact science and you will have to run it many times to get a good average.
How could I test more cycles without Firefox crashing? If I set doit to 1000000000, for example, Firefox wants to stop the script.
Firefox limits JavaScript execution to a maximum of 10 seconds. I'm not sure if there's a workaround.
Why are my results (in my script and in jsperf) so different each time? Sometimes the individual variable declaration is faster than the grouped one :/
Because there's probably no real difference between the two. All variable declarations are "hoisted", and it's likely that this is done at parse time instead of run time as an optimization, so the internal representations of the functions after they have been parsed would be identical. The only differences are the subtle factors affecting the time it takes to initialize the undefined variables and execute the otherwise empty functions.
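To illustrate the hoisting point, both declaration styles describe the same set of locals, so after parsing the engine ends up with essentially identical function bodies (an illustrative sketch):

// Both of these parse to "a function with locals a, b, c, all undefined";
// there is no per-statement work left over to measure.
function declaredTogether()   { var a, b, c; }
function declaredSeparately() { var a; var b; var c; }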
With regard to (2): interrupting your loop for user input is the only way I can think of to easily avoid the unresponsive-script dialogs.
So display an alert every n iterations (obviously stop your timer for this duration).
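A rough sketch of that idea, pausing the timer while the alert is open (the interval of 100000000 iterations is arbitrary, and test is the function from the question):

var doit = 1000000000, i = 0, elapsed = 0;
var start = Date.now();
while (i <= doit) {
    test();
    i++;
    if (i % 100000000 === 0) {         // every n iterations...
        elapsed += Date.now() - start; // ...stop the clock,
        alert('Progress: ' + i);       // give the browser a breather,
        start = Date.now();            // then restart the clock.
    }
}
elapsed += Date.now() - start;
console.log(elapsed + 'ms');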
Have you considered doing this in SpiderMonkey etc., or are you specifically interested in the browser implementations?