JsPerf: ParseInt vs Plus conversion - javascript

I've tried to prove that plus (+) conversion is faster than parseInt with the following jsPerf, and the results surprised me:
Parse vs Plus
Preparation code
<script>
Benchmark.prototype.setup = function() {
var x = "5555";
};
</script>
Parse Sample
var y = parseInt(x); //<---80 million loops
Plus Sample
var y = +x; //<--- 33 million loops
The reason seems to be that I'm using "Benchmark.prototype.setup" to declare my variable, but I don't understand why.
See the second example:
Parse vs Plus (local variable)
<script>
Benchmark.prototype.setup = function() {
x = "5555";
};
</script>
Parse Sample
var y = parseInt(x); //<---89 million loops
Plus Sample
var y = +x; //<--- 633 million loops
Can someone explain the results?
Thanks

In the second case + is faster because V8 actually moves it out of the benchmarking loop, making the benchmarking loop empty.
This happens due to certain peculiarities of the current optimization pipeline. But before we get to the gory details I would like to recap how Benchmark.js works.
To measure your test case it takes the Benchmark.prototype.setup you provided and the test case itself, and dynamically generates a function that looks approximately like this (I am skipping some irrelevant details):
function (n) {
var start = Date.now();
/* Benchmark.prototype.setup body here */
while (n--) {
/* test body here */
}
return Date.now() - start;
}
Once the function is created, Benchmark.js calls it to measure your op for a certain number of iterations n. This process is repeated several times: generate a new function, call it to collect a measurement sample. The number of iterations is adjusted between samples to ensure that the function runs long enough to give a meaningful measurement.
The important things to notice here are that:
both your test case and Benchmark.prototype.setup are textually inlined;
there is a loop around the operation you want to measure.
Essentially we are discussing why the code below, with a local variable x,
function f(n) {
var start = Date.now();
var x = "5555"
while (n--) {
var y = +x
}
return Date.now() - start;
}
runs slower than the code with a global variable x
function g(n) {
var start = Date.now();
x = "5555"
while (n--) {
var y = +x
}
return Date.now() - start;
}
(Note: this case is called local variable in the question itself, but that is not the case: x is global.)
What happens when you execute these functions with a large enough value of n, for example f(1e6)?
Current optimization pipeline implements OSR in a peculiar fashion. Instead of generating an OSR specific version of the optimized code and discarding it later, it generates a version that can be used for both OSR and normal entry and can even be reused if we need to perform OSR at the same loop. This is done by injecting a special OSR entry block into the right spot in the control flow graph.
The OSR entry block is injected while the SSA IR for the function is built, and it eagerly copies all local variables out of the incoming OSR state. As a result V8 fails to see that the local x is actually a constant and even loses any information about its type. For subsequent optimization passes x2 (the SSA value corresponding to the local x) looks like it can be anything.
Since x2 can be anything, the expression +x2 can also have arbitrary side effects (e.g. x2 could be an object with valueOf attached to it). This prevents the loop-invariant code motion pass from moving +x2 out of the loop.
Why is g faster then? V8 pulls a trick here: it tracks global variables that contain constants. In this benchmark the global x always contains "5555", so V8 simply replaces the access to x with its value and marks the optimized code as dependent on the value of x; if somebody later replaces the value of x with something different, all dependent code will be deoptimized. Global variables are also not part of the OSR state and do not participate in SSA renaming, so V8 is not confused by "spurious" φ-functions merging OSR and normal entry states. That's why, when V8 optimizes g, the loop body ends up containing essentially a single arithmetic operation on the known constant.
Note: +x is compiled to x * 1, but this is just an implementation detail.
Later LICM would just take this operation and move it out of the loop leaving nothing of interest in the loop itself. This becomes possible because now V8 knows that both operands of the * are primitives - so there can be no side-effects.
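For illustration, here is roughly what g behaves like after the constant propagation and LICM described above (a sketch of the effect, not the actual code V8 generates):
function g_after_optimization(n) {
var start = Date.now();
x = "5555";
var y = 5555; // +x replaced by the known constant and hoisted out of the loop
while (n--) {} // nothing left to do per iteration
return Date.now() - start;
}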
And that's why g is faster, because an empty loop is quite obviously faster than a non-empty one.
This also means that the second version of the benchmark does not actually measure what you would like it to measure. The first version did capture some of the difference between parseInt(x) and +x performance, but that was more by luck: you hit a limitation in V8's current optimization pipeline (Crankshaft) that prevented it from eating the whole microbenchmark away.

I believe the reason is that parseInt does more than just convert to an integer. It also strips any remaining text off the string, like when parsing a pixel value:
var width = parseInt(element.style.width); // returns the width as an integer
whereas the plus sign cannot handle this case:
var width = +element.style.width; // returns NaN
The plus sign does an implicit conversion from string to number, and only that conversion. parseInt tries to make sense of the string first (like with integers tagged with a unit of measurement).
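A quick illustration of the difference (these are standard JavaScript results, not specific to any engine):
parseInt("100px"); // 100  - parsing stops at the first non-numeric character
+"100px";          // NaN  - the whole string must be numeric
parseInt("42.9");  // 42   - parseInt only ever returns an integer
+"42.9";           // 42.9 - unary plus does the full numeric conversion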

Related

Why does adding in an immediately invoked lambda make my JavaScript code 2x faster?

I'm optimizing a compiler that compiles a language to JavaScript, and I found a very interesting, if not frustrating, case:
function add(n,m) {
return n === 0 ? m : add(n - 1, m) + 1;
};
var s = 0;
for (var i = 0; i < 100000; ++i) {
s += add(4000, 4000);
}
console.log(s);
It takes 2.3s to complete on my machine[1]. But if I make a very small change:
function add(n,m) {
return (() => n === 0 ? m : add(n - 1, m) + 1)();
};
var s = 0;
for (var i = 0; i < 100000; ++i) {
s += add(4000, 4000);
}
console.log(s);
It completes in 1.1s. Notice the only difference is the addition of an immediately invoked lambda, (() => ...)(), around the return of add. Why does this added call make my program two times faster?
[1] MacBook Pro 13" 2020, 2.3 GHz Quad-Core Intel Core i7, Node.js v15.3.0
Interesting! From looking at the code, it seems fairly obvious that the IIFE-wrapped version should be slower, not faster: in every loop iteration, it creates a new function object and calls it (which the optimizing compiler will eventually avoid, but that doesn't kick in right away), so generally just does more work, which should be taking more time.
The explanation in this case is inlining.
A bit of background: inlining one function into another (instead of calling it) is one of the standard tricks that optimizing compilers perform in order to achieve better performance. It's a double-edged sword though: on the plus side, it avoids calling overhead, and can often enable further optimizations, such as constant propagation, or elimination of duplicate computation (see below for an example). On the negative side, it causes compilation to take longer (because the compiler does more work), and it causes more code to be generated and stored in memory (because inlining a function effectively duplicates it), and in a dynamic language like JavaScript where optimized code typically relies on guarded assumptions, it increases the risk of one of these assumptions turning out to be wrong and a large amount of optimized code having to be thrown away as a result.
Generally speaking, making perfect inlining decisions (not too much, not too little) requires predicting the future: knowing in advance how often and with which parameters the code will be executed. That is, of course, impossible, so optimizing compilers use various rules/"heuristics" to make guesses about what might be a reasonably good decision.
One rule that V8 currently has is: don't inline recursive calls.
That's why in the simpler version of your code, add will not get inlined into itself. The IIFE version essentially has two functions calling each other, which is called "mutual recursion" -- and as it turns out, this simple trick is enough to fool V8's optimizing compiler and make it sidestep its "don't inline recursive calls" rule. Instead, it happily inlines the unnamed lambda into add, and add into the unnamed lambda, and so on, until its inlining budget runs out after ~30 rounds. (Side note: "how much gets inlined" is one of the somewhat-complex heuristics and in particular takes function size into account, so whatever specific behavior we see here is indeed specific to this situation.)
In this particular scenario, where the involved functions are very small, inlining helps quite a bit because it avoids call overhead. So in this case, inlining gives better performance, even though it is a (disguised) case of recursive inlining, which in general often is bad for performance. And it does come at a cost: in the simple version, the optimizing compiler spends only 3 milliseconds compiling add, producing 562 bytes of optimized code for it. In the IIFE version, the compiler spends 30 milliseconds and produces 4318 bytes of optimized code for add. That's one reason why it's not as simple as concluding "V8 should always inline more": time and battery consumption for compiling matters, and memory consumption matters too, and what might be acceptable cost (and improve performance significantly) in a simple 10-line demo may well have unacceptable cost (and potentially even cost overall performance) in a 100,000-line app.
Now, having understood what's going on, we can get back to the "IIFEs have overhead" intuition, and craft an even faster version:
function add(n,m) {
return add_inner(n, m);
};
function add_inner(n, m) {
return n === 0 ? m : add(n - 1, m) + 1;
}
On my machine, I'm seeing:
simple version: 1650 ms
IIFE version: 720 ms
add_inner version: 460 ms
Of course, if you implement add(n, m) simply as return n + m, then it terminates in 2 ms -- algorithmic optimization beats anything an optimizing compiler could possibly accomplish :-)
Appendix: Example for benefits of optimization. Consider these two functions:
function Process(x) {
return (x ** 2) + InternalDetail(x, 0, 2);
}
function InternalDetail(x, offset, power) {
return (x + offset) ** power;
}
(Obviously, this is silly code; but let's assume it's a simplified version of something that makes sense in practice.)
When executed naively, the following steps happen:
evaluate temp1 = (x ** 2)
call InternalDetail with parameters x, 0, 2
evaluate temp2 = (x + 0)
evaluate temp3 = temp2 ** 2
return temp3 to the caller
evaluate temp4 = temp1 + temp3
return temp4.
If an optimizing compiler performs inlining, then as a first step it will get:
function Process_after_inlining(x) {
return (x ** 2) + ( (x + 0) ** 2 );
}
which allows two simplifications: x + 0 can be folded to just x, and then the x ** 2 computation occurs twice, so the second occurrence can be replaced by reusing the result from the first:
function Process_with_optimizations(x) {
let temp1 = x ** 2;
return temp1 + temp1;
}
So comparing with the naive execution, we're down to 3 steps from 7:
evaluate temp1 = (x ** 2)
evaluate temp2 = temp1 + temp1
return temp2
I'm not predicting that real-world performance will go from 7 time units to 3 time units; this is just meant to give an intuitive idea of why inlining can help reduce computational load by some amount.
Footnote: to illustrate how tricky all this stuff is, consider that replacing x + 0 with just x is not always possible in JavaScript, even when the compiler knows that x is always a number: if x happens to be -0, then adding 0 to it changes it to +0, which may well be observable program behavior ;-)
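A quick way to observe this (standard JavaScript semantics, nothing V8-specific):
console.log(Object.is(-0 + 0, -0)); // false: adding 0 turned -0 into +0
console.log(1 / -0);                // -Infinity
console.log(1 / (-0 + 0));          // Infinity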

Why is the execution time of this function call changing?

Preface
This issue seems to only affect Chrome/V8, and may not be reproducible in Firefox or other browsers. In summary, the execution time of a function callback increases by an order of magnitude or more if the function is called with a new callback anywhere else.
Simplified Proof-of-Concept
Calling test(callback) arbitrarily many times works as expected, but once you call test(differentCallback), the execution time of the test function increases dramatically no matter what callback is provided (i.e., another call to test(callback) would suffer as well).
This example was updated to use arguments so as to not be optimized to an empty loop. Callback arguments a and b are summed and added to total, which is logged.
function test(callback) {
let start = performance.now(),
total = 0;
// add callback result to total
for (let i = 0; i < 1e6; i++)
total += callback(i, i + 1);
console.log(`took ${(performance.now() - start).toFixed(2)}ms | total: ${total}`);
}
let callback1 = (a, b) => a + b,
callback2 = (a, b) => a + b;
console.log('FIRST CALLBACK: FASTER');
for (let i = 1; i < 10; i++)
test(callback1);
console.log('\nNEW CALLBACK: SLOWER');
for (let i = 1; i < 10; i++)
test(callback2);
Original post
I am developing a StateMachine class (source) for a library I'm writing and the logic works as expected, but in profiling it, I've run into an issue. I noticed that when I ran the profiling snippet (in global scope), it would only take about 8ms to finish, but if I ran it a second time, it would take up to 50ms and eventually balloon as high as 400ms. Typically, running the same named function over and over will cause its execution time to drop as the V8 engine optimizes it, but the opposite seems to be happening here.
I've been able to get rid of the problem by wrapping it in a closure, but then I noticed another weird side effect: Calling a different function that relies on the StateMachine class would break the performance for all code depending on the class.
The class is pretty simple - you give it an initial state in the constructor or init, and you can update the state with the update method, which you pass a callback that accepts this.state as an argument (and usually modifies it). transition is a method that is used to update the state until the transitionCondition is no longer met.
Two test functions are provided: red and blue, which are identical, and each will generate a StateMachine with an initial state of { test: 0 } and use the transition method to update the state while state.test < 1e6. The end state is { test: 1000000 }.
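The class itself lives in the linked StateMachine.js; as a rough sketch of its shape (my own simplified reconstruction for readability, not the library's exact code), it looks something like this:
class StateMachine {
constructor(initialState) {
this.state = initialState;
}
init(initialState) {
this.state = initialState;
return this;
}
update(callback) {
callback(this.state);
return this;
}
// keeps applying stateTransition while transitionCondition holds
transition(stateTransition, transitionCondition) {
while (transitionCondition(this.state)) stateTransition(this.state);
return this;
}
}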
You can trigger the profile by clicking the red or blue button, which will run StateMachine.transition 50 times and log the average time the call took to complete. If you click the red or blue button repeatedly, you will see that it clocks in at less than 10ms without issue - but, once you click the other button and call the other version of the same function, everything breaks, and the execution time for both functions will increase by about an order of magnitude.
// two identical functions, red() and blue()
function red() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(
state => state.test++,
state => state.test < 1e6
);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
else return performance.now() - start;
}
function blue() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(
state => state.test++,
state => state.test < 1e6
);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
else return performance.now() - start;
}
// display execution time
const display = (time) => document.getElementById('results').textContent = `Avg: ${time.toFixed(2)}ms`;
// handy dandy Array.avg()
Array.prototype.avg = function() {
return this.reduce((a,b) => a+b) / this.length;
}
// bindings
document.getElementById('red').addEventListener('click', () => {
const times = [];
for (var i = 0; i < 50; i++)
times.push(red());
display(times.avg());
}),
document.getElementById('blue').addEventListener('click', () => {
const times = [];
for (var i = 0; i < 50; i++)
times.push(blue());
display(times.avg());
});
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine#bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>
<h2 id="results">Waiting...</h2>
<button id="red">Red Pill</button>
<button id="blue">Blue Pill</button>
<style>
body{box-sizing:border-box;padding:0 4rem;text-align:center}button,h2,p{width:100%;margin:auto;text-align:center;font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol"}button{font-size:1rem;padding:.5rem;width:180px;margin:1rem 0;border-radius:20px;outline:none;}#red{background:rgba(255,0,0,.24)}#blue{background:rgba(0,0,255,.24)}
</style>
Updates
Bug Report "Feature Request" filed (awaiting update) - See #jmrk's answers below for more details.
Ultimately, this behavior is unexpected and, IMO, qualifies as a nontrivial bug. The impact for me is significant - on an Intel i7-4770 (8) @ 3.900GHz, my execution times in the example above go from an average of 2ms to 45ms (a 20x increase).
As for nontriviality, consider that any subsequent calls to StateMachine.transition after the first one will be unnecessarily slow, regardless of scope or location in the code. The fact that SpiderMonkey does not slow down subsequent calls to transition signals to me that there is room for improvement for this specific optimization logic in V8.
See below, where subsequent calls to StateMachine.transition are slowed:
// same source, several times
// 1
(function() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(state => state.test++, state => state.test < 1e6);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
console.log(`took ${performance.now() - start}ms`);
})();
// 2
(function() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(state => state.test++, state => state.test < 1e6);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
console.log(`took ${performance.now() - start}ms`);
})();
// 3
(function() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(state => state.test++, state => state.test < 1e6);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
console.log(`took ${performance.now() - start}ms`);
})();
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine#bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>
This performance decrease can be avoided by wrapping the code in a named closure, where presumably the optimizer knows the callbacks will not change:
var test = (function() {
let start = performance.now(),
stateMachine = new StateMachine({
test: 0
});
stateMachine.transition(state => state.test++, state => state.test < 1e6);
if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
console.log(`took ${performance.now() - start}ms`);
});
test();
test();
test();
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine#bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>
Platform Information
$ uname -a
Linux workspaces 5.4.0-39-generic #43-Ubuntu SMP Fri Jun 19 10:28:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ google-chrome --version
Google Chrome 83.0.4103.116
V8 developer here. It's not a bug, it's just an optimization that V8 doesn't do. It's interesting to see that Firefox seems to do it...
FWIW, I don't see "ballooning to 400ms"; instead (similar to Jon Trent's comment) I see about 2.5ms at first, and then around 11ms.
Here's the explanation:
When you click only one button, then transition only ever sees one callback. (Strictly speaking it's a new instance of the arrow function every time, but since they all stem from the same function in the source, they're "deduped" for type feedback tracking purposes. Also, strictly speaking it's one callback each for stateTransition and transitionCondition, but that just duplicates the situation; either one alone would reproduce it.) When transition gets optimized, the optimizing compiler decides to inline the called function, because having seen only one function there in the past, it can make a high-confidence guess that it's also always going to be that one function in the future. Since the function does extremely little work, avoiding the overhead of calling it provides a huge performance boost.
Once the second button is clicked, transition sees a second function. It must get deoptimized the first time this happens; since it's still hot it'll get reoptimized soon after, but this time the optimizer decides not to inline, because it's seen more than one function before, and inlining can be very expensive. The result is that from this point onwards, you'll see the time it takes to actually perform these calls. (The fact that both functions have identical source doesn't matter; checking that wouldn't be worth it because outside of toy examples that would almost never be the case.)
There's a workaround, but it's something of a hack, and I don't recommend putting hacks into user code to account for engine behavior. V8 does support "polymorphic inlining", but (currently) only if it can deduce the call target from some object's type. So if you construct "config" objects that have the right functions installed as methods on their prototype, you can get V8 to inline them. Like so:
class StateMachine {
...
transition(config, maxCalls = Infinity) {
let i = 0;
while (
config.condition &&
config.condition(this.state) &&
i++ < maxCalls
) config.transition(this.state);
return this;
}
...
}
class RedConfig {
transition(state) { return state.test++ }
condition(state) { return state.test < 1e6 }
}
class BlueConfig {
transition(state) { return state.test++ }
condition(state) { return state.test < 1e6 }
}
function red() {
...
stateMachine.transition(new RedConfig());
...
}
function blue() {
...
stateMachine.transition(new BlueConfig());
...
}
It might be worth filing a bug (crbug.com/v8/new) to ask if the compiler team thinks that this is worth improving. Theoretically it should be possible to inline several functions that are called directly, and branch between the inlined paths based on the value of the function variable that's being called. However I'm not sure there are many cases where the impact is as pronounced as in this simple benchmark, and I know that recently the trend has been towards inlining less rather than more, because on average that tends to be the better tradeoff (there are drawbacks to inlining, and whether it's worth it is necessarily always a guess, because the engine would have to predict the future in order to be sure).
In conclusion, coding with many callbacks is a very flexible and often elegant technique, but it tends to come at an efficiency cost. (There are other varieties of inefficiency: e.g. a call with an inline arrow function like transition(state => state.something) allocates a new function object each time it's executed; that just so happens not to matter much in the example at hand.) Sometimes engines might be able to optimize away the overhead, and sometimes not.
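As a small illustration of that last point, using the names from the question (a sketch only, and as noted it barely matters in this particular example), the callbacks could be allocated once and reused instead of being recreated on every call:
// assuming a stateMachine instance as in the question's red()/blue()
const increment = state => state.test++;
const belowLimit = state => state.test < 1e6;
// reuses the same two function objects for every call
stateMachine.transition(increment, belowLimit);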
Since this is getting so much interest (and updates to the question), I thought I'd provide some additional detail.
The new simplified test case is great: it's very simple, and very clearly shows a problem.
function test(callback) {
let start = performance.now();
for (let i = 0; i < 1e6; i++) callback();
console.log(`${callback.name} took ${(performance.now() - start).toFixed(2)}ms`);
}
var exampleA = (a,b) => 10**10;
var exampleB = (a,b) => 10**10;
// one callback -> fast
for (let i = 0; i < 10; i++) test(exampleA);
// introduce a second callback -> much slower forever
for (let i = 0; i < 10; i++) test(exampleB);
for (let i = 0; i < 10; i++) test(exampleA);
On my machine, I'm seeing times go as low as 0.23 ms for exampleA alone, and then they go up to 7.3ms when exampleB comes along, and stay there. Wow, a 30x slowdown! Clearly that's a bug in V8? Why wouldn't the team jump on fixing this?
Well, the situation is more complicated than it seems at first.
Firstly, the "slow" case is the normal situation. That's what you should expect to see in most code. It's still pretty fast! You can do a million function calls (plus a million exponentiations, plus a million loop iterations) in just 7 milliseconds! That's only 7 nanoseconds per iteration+call+exponentiation+return!
Actually, that analysis was a bit simplified. In reality, an operation on two constants like 10**10 will be constant-folded at compile time, so once exampleA and exampleB get optimized, the optimized code for them will return 1e10 immediately, without doing any multiplications.
On the flip side, the code here contains a small oversight that causes the engine to have to do more work: exampleA and exampleB take two parameters (a, b), but they're called without any arguments simply as callback(). Bridging this difference between expected and actual number of parameters is fast, but on a test like this that doesn't do much else, it amounts to about 40% of the total time spent. So a more accurate statement would be: it takes about 4 nanoseconds to do a loop iteration plus a function call plus a materialization of a number constant plus a function return, or 7 ns if the engine additionally has to adapt the arguments count of the call.
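To illustrate the arity point with the callback shape from the snippet above (these calls are just examples, not part of the original test):
const cb = (a, b) => a + b;
cb(1, 2);    // argument count matches the declared parameters: no adaptation needed
cb();        // missing arguments have to be filled in with undefined
cb(1, 2, 3); // extra arguments are a mismatch as well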
So what about the initial results for just exampleA, how can that case be so much faster? Well, that's the lucky situation that hits various optimizations in V8 and can take several shortcuts -- in fact it can take so many shortcuts that it ends up being a misleading microbenchmark: the results it produces don't reflect real situations, and can easily cause an observer to draw incorrect conclusions. The general effect that "always the same callback" is (typically) faster than "several different callbacks" is certainly real, but this test significantly distorts the magnitude of the difference.
At first, V8 sees that it's always the same function that's getting called, so the optimizing compiler decides to inline the function instead of calling it. That avoids the adaptation of arguments right off the bat. After inlining, the compiler can also see that the result of the exponentiation is never used, so it drops that entirely. The end result is that this test tests an empty loop! See for yourself:
function test_empty(no_callback) {
let start = performance.now();
for (let i = 0; i < 1e6; i++) {}
console.log(`empty loop took ${(performance.now() - start).toFixed(2)}ms`);
}
That gives me the same 0.23ms as calling exampleA. So contrary to what we thought, we didn't measure the time it takes to call and execute exampleA, in reality we measured no calls at all, and no 10**10 exponentiations either. (If you like more direct proof, you can run the original test in d8 or node with --print-opt-code and see the disassembly of the optimized code that V8 generates internally.)
All that lets us conclude a few things:
(1) This is not a case of "OMG there's this horrible slowdown that you must be aware of and avoid in your code". The default performance you get when you don't worry about this is great. Sometimes when the stars align you might see even more impressive optimizations, but… to put it lightly: just because you only get presents on a few occasions per year, doesn't mean that all the other non-gift-bearing days are some horrible bug that must be avoided.
(2) The smaller your test case, the bigger the observed difference between default speed and lucky fast case. If your callbacks are doing actual work that the compiler can't just eliminate, then the difference will be smaller than seen here. If your callbacks are doing more work than a single operation, then the fraction of overall time that's spent on the call itself will be smaller, so replacing the call with inlining will make less of a difference than it does here. If your functions are called with the parameters they need, that will avoid the needless penalization seen here. So while this microbenchmark manages to create the misleading impression that there's a shockingly large 30x difference, in most real applications it will be between maybe 4x in extreme cases and "not even measurable at all" for many other cases.
(3) Function calls do have a cost. It's great that (for many languages, including JavaScript) we have optimizing compilers that can sometimes avoid them via inlining. If you have a case where you really, really care about every last bit of performance, and your compiler happens to not inline what you think it should be inlining (for whatever reason: because it can't, or because it has internal heuristics that decide not to), then it can give significant benefits to redesign your code a bit -- e.g. you could inline by hand, or otherwise restructure your control flow to avoid millions of calls to tiny functions in your hottest loops. (Don't blindly overdo it though: having too few too big functions isn't great for optimization either. Usually it's best to not worry about this. Organize your code into chunks that make sense, let the engine take care of the rest. I'm only saying that sometimes, when you observe specific problems, you can help the engine do its job better.)
If you do need to rely on performance-sensitive function calls, then an easy tuning you can do is to make sure that you're calling your functions with exactly as many arguments as they expect -- which is probably often what you would do anyway. Of course optional arguments have their uses as well; like in so many other cases the extra flexibility comes with a (small) performance cost, which is often negligible, but can be taken into consideration when you feel that you have to.
(4) Observing such performance differences can understandably be surprising and sometimes even frustrating. Unfortunately, the nature of optimizations is such that they can't always be applied: they rely on making simplifying assumptions and not covering every case, otherwise they wouldn't be fast any more. We work very hard to give you reliable, predictable performance, with as many fast cases and as few slow cases as possible, and no steep cliffs between them. But we cannot escape the reality that we can't possibly "just make everything fast". (Which of course isn't to say that there's nothing left to do: every additional year of engineering work brings additional performance gains.) If we wanted to avoid all cases where more-or-less similar code exhibits noticeably different performance, then the only way to accomplish that would be to not do any optimizations at all, and instead leave everything at baseline ("slow") implementations -- and I don't think that would make anyone happy.
EDIT to add:
It seems there are major differences between different CPUs here, which probably explains why previous commenters have reported so wildly differing results. On hardware I can get my hands on, I'm seeing:
i7 6600U: 3.3 ms for inlined case, 28 ms for calling
i7 3635QM: 2.8 ms for inlined case, 10 ms for calling
i7 3635QM, up-to-date microcode: 2.8 ms for inlined case, 26 ms for calling
Ryzen 3900X: 2.5 ms for inlined case, 5 ms for calling
This is all with Chrome 83/84 on Linux; it's very much possible that running on Windows or Mac would yield different results (because CPU/microcode/kernel/sandbox are closely interacting with each other).
If you find these hardware differences shocking, read up on "spectre".

Accurately measure a Javascript Function performance while displaying the output to user

As you can see in the code below, even when I increase the size of the string, it reports a 0 millisecond difference. Moreover, there is an inconsistency as the string length keeps increasing.
Am I doing something wrong here?
let stringIn = document.getElementById('str');
let button = document.querySelector('button');
button.addEventListener('click', () => {
let t1 = performance.now();
functionToTest(stringIn.value);
let t2 = performance.now();
console.log(`time taken is ${t2 - t1}`);
});
function functionToTest(str) {
let total = 0;
for (const i of str) {
total++;
}
return total;
}
<input id="str">
<button type="button">Test string</button>
I tried using await too, but the result is the same (see the code snippet below). The function enclosing the code below is async:
let stringArr = this.inputString.split(' ');
let longest = '';
const t1 = performance.now();
let length = await new Promise(resolve => {
stringArr.map((item, i) => {
longest = longest.length < item.length ? longest : item;
i === stringArr.length - 1 ? resolve(longest) : '';
});
});
const diff = performance.now() - t1;
console.log(diff);
this.result = `The time taken in mili seconds is ${diff}`;
I've tried this answer as well, but it is also inconsistent.
As a workaround I tried using the console.time feature, but it doesn't allow the time to be rendered and isn't accurate either.
Update: I want to build an interface like jsPerf, which will be quite similar to it but for a different purpose. Mostly I would like to compare different functions which will depend on user inputs.
There are 3 things which may help you understand what is happening:
Browsers reduce performance.now() precision to prevent Meltdown and Spectre attacks, so Chrome gives at most 0.1 ms precision, Firefox 1 ms, etc. This makes it impossible to measure small timeframes: if the function is extremely quick, 0 ms is an understandable result. (Thanks #kaiido.)
Any code (including JS) in a multithreaded environment will not execute with constant performance (at least due to OS thread switching), so getting consistent values from several single runs is an unreachable goal. To get a precise number, the function should be executed multiple times and the average taken. This works even with the low-precision performance.now(). (Boring explanation: if the function is much faster than 0.1 ms, the browser will usually report 0 ms, but from time to time a run will win the lottery and the browser will return 0.1 ms; longer functions win this lottery more often.)
Most JS engines have an "optimizing compiler" that optimizes frequently used functions. Optimization is expensive, so engines only apply it to hot functions. This explains the performance increase after several runs: at first the function is executed in the slowest way, and after several executions it is optimized and performance increases. (Should you add warm-up runs? A sketch combining points 2 and 3 follows below.)
I was able to get non-zero numbers from your code snippet by copy-pasting a 70 kB file into the input. After the 3rd run the function was optimized, but even after that the performance is not constant:
time taken is 11.49999990593642
time taken is 5.100000067614019
time taken is 2.3999999975785613
time taken is 2.199999988079071
time taken is 2.199999988079071
time taken is 2.099999925121665
time taken is 2.3999999975785613
time taken is 1.7999999690800905
time taken is 1.3000000035390258
time taken is 2.099999925121665
time taken is 1.9000000320374966
time taken is 2.300000051036477
Explanation of the 1st point
Let's say two events happened, and the goal is to find the time between them.
The 1st event happened at time A and the second event happened at time B. The browser rounds the precise values A and B and returns the rounded values (the example below rounds down to whole milliseconds).
Several cases to look at:
A        B        B - A    floor(A)  floor(B)  floor(B) - floor(A)
12.001   12.003   0.002    12        12        0
11.999   12.001   0.002    11        12        1
Browsers are smarter than we think; there are lots of improvements and caching techniques in place for memory allocation, repeated code execution, on-demand CPU allocation and so on. For instance V8, the JavaScript engine that powers Chrome and Node.js, caches code execution cycles and results. Furthermore, your results may be affected by the resources your browser utilizes during code execution, so even with multiple execution cycles your results may vary.
Since you mentioned that you are trying to create a jsPerf clone, take a look at Benchmark.js; this library is used by the jsPerf development team.
Running performance analysis tests is really hard, and I would suggest running them in a Node.js environment with predefined and preallocated resources in order to obtain your results.
You might want to take a look at https://github.com/anywhichway/benchtest, which just reuses Mocha unit tests. Note that performance testing at the unit level should only be one part of your performance testing; you should also use simulators that emulate real-world conditions and test your code at the application level to assess network impacts, module interactions, etc.

Is there a faster/more efficient way to call different functions, than using if-else statements, depending on a variable's value range?

As part of a little game I am making, I have an enemy object which fires projectiles at the character object, controlled by the player. The enemy has an hp attribute with a value of 10000, and as this value depletes, I want the projectile-firing patterns to change. This is my current situation:
this.fireOnce = function(){ ... }
this.fireRandomly = function(){ ... }
this.fireAtTarget = function(){ ... }
this.fireWave = function(){ ... }
this.beginFire = function(){
if(hp<3000){
this.fireWave();
}
else if(hp<5000){
this.fireAtTarget();
}
else if(hp<9000){
this.fireRandomly();
}
else{
this.fireOnce();
}
setTimeout(beginFire, 500);
}
The main loop already has enough complexity, and things get laggy when many projectiles are on the screen. My concern about if-else statements comes from something my professor said about them being fairly expensive (I can't remember the context, though, so I could be wrong).
During the creation of this little game, I've used the above structure several times for different purposes, and considering the functions get called several times each second, I assume it takes its toll on the game's performance.
One possibility in another situation would be to use an object containing the functions, but since we are talking about integer ranges, I can't think of anything to use as a key.
"fairly expensive" is a relative term. Yes, a conditional branch, if the condition's value is not predicted, can easily cost dozens of clock cycles, meaning that a single CPU can only execute millions of if statements per second.
To verify this, run the following script in the java script runtime you target:
let odd = 0;
for (let i = 0; i < 1000000000; i++) {
if (i % 2) odd++;
}
odd;
This code executes one billion if statements. In Chrome, it takes about 3 seconds on my machine. Firefox is slower, but still executes one million if statements in about 0.2 seconds, and IE one million if statements in 0.1 seconds.
To conclude, there is no modern JavaScript runtime where a few if statements per second would result in a measurable, let alone human-perceptible, degradation of performance. Whatever the source of your performance problem, it's not your use of if statements.
You should always beware of premature optimisation; however, you might want to make your code a bit clearer (though there is nothing wrong with the OP's version). You can create more concise code if you can reduce the logic to generate a single value, then use that value as a key to call a method.
E.g. if the break points were 3000, 6000 and 9000 then the code can be derived from hp / 3000:
function Bot(){
// sorted by value
this.fireWave = function(){ // hp <= 3000
console.log('fireWave');
}
this.fireAtTarget = function(){ // hp <= 6000
console.log('fireAtTarget');
}
this.fireRandomly = function(){ // hp <= 9000
console.log('fireRandomly');
}
this.fireOnce = function(){ // hp > 9000
console.log('fireOnce');
}
// Method names in order
var methods = ['fireOnce','fireWave','fireAtTarget','fireRandomly'];
this.beginFire = function(){
this[methods[Math.ceil(hp / 3000)] || methods[0]]();
// Don't do this for testing
// setTimeout(beginFire, 500);
}
}
var bot = new Bot();
var hp = 0;
for (var i=0; i<10; i++) {
// Randomly set hp to a value between 0 and 12000
hp = Math.random()*12000 | 0;
console.log('hp: ' + hp);
bot.beginFire();
}
So as long as you can determine a simple mathematical expression to compute the key, you can easily determine the method to call.
You could also have a helper method (possibly on the constructor) that does the logic to determine the method to call, like:
function getFireMethod() {
return hp < 3000? 'fireWave' :
hp < 5000? 'fireAtTarget' :
hp < 9000? 'fireRandomly' :
'fireOnce';
}
Which is clear and concise, but again may not have any perceptible impact on performance either way.
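Inside beginFire the helper would then be used along these lines (assuming getFireMethod is defined in the same scope):
this[getFireMethod()]();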
In any case, you will need to do testing across a variety of clients to determine whether there is any useful performance gain. Also, include comments in the code to describe the logic.

Did I get the number of operations per second this way?

Look at this code:
function wait(time) {
let i = 0;
let a = Date.now();
let x = a + (time || 0);
let b;
while ((b = Date.now()) <= x) ++i;
return i;
}
If I run it in a browser (particularly Google Chrome, but I don't think it matters) as wait(1000), the machine will obviously freeze for a second and then return the resulting value of i.
Let it be 10 000 000 (I'm getting values close to this one). This value varies every time, so let's take an average number.
Did I just get the current number of operations per second of the processor in my machine?
Not at all.
What you get is the number of loop cycles completed by the JavaScript process in a certain time. Each loop cycle consists of:
calling Date.now(), which reads the system clock;
comparing the returned timestamp with x;
incrementing the Number i.
Incrementing the variable i is probably the least expensive of these, so the function is not really reporting how much it takes to make the increment.
Aside from that, note that the machine is doing a lot more than running a Javascript process. You will see interference from all sorts of activity going on in the computer at the same time.
When running inside a Javascript process, you're simply too far away from the processor (in terms of software layers) to make that measurement. Beneath Javascript, there's the browser and the operating system, each of which can (and will) make decisions that affect this result.
No. You can get the number of language operations per second, though the actual number of machine operations per second on a whole processor is more complicated.
Firstly, the processor is not wholly dedicated to the browser, so it is actually likely switching back and forth between prioritized processes. On top of that, memory access is obscured and the processor uses extra operations to manage memory (page flushing, etc.), and this is not going to be very transparent to you at any given time. On top of that, physical properties mean that the real clock rate of the processor is dynamic... You can see it's pretty complicated already ;)
To really calculate the number of machine operations per second you would need to measure the clock rate of the processor and multiply it by the number of instructions per cycle the processor can perform. Again this varies, but really the manufacturer specs will likely be a good enough estimate :P.
If you wanted to use a program to measure this, you'd need to somehow dedicate 100% of the processor to your program and have it run a predictable set of instructions with no other hangups (like memory management). Then you need to include the number of instructions it takes to load the program instructions into the code caches. This is not really feasible however.
As others have pointed out, this will not help you determine the number of operations the processor does per second, for the reasons the prior answers describe. I do however think that a similar experiment could be set up to estimate the number of operations executed by your JavaScript interpreter running in your browser. For example, given a function factorial(n), an operation that runs in O(n), you could execute factorial(100) repeatedly over the course of a minute.
// naive O(n) factorial used as the workload (it was not defined in the original snippet)
function factorial(n) {
let result = 1;
for (let k = 2; k <= n; k++) result *= k;
return result;
}
function test() {
let start = Date.now();
let end = start + 60 * 1000;
let numberOfExecutions = 0;
while (Date.now() < end) {
factorial(100);
numberOfExecutions++;
}
// each factorial(100) call is roughly 100 operations, spread over 60 seconds
return numberOfExecutions * 100 / 60;
}
The idea here is that factorial is by far the most time-consuming function in the code, and since factorial runs in O(n), we know factorial(100) is approximately 100 operations. Note that this will not be exact and that larger numbers will make for better approximations. Also remember that this estimates the number of operations executed by your interpreter, not by your processor.
There is a lot of truth in all the previous comments, but I want to invert the reasoning a little bit, because I believe it is easier to understand it like that.
I believe the fairest way to calculate it is with the most basic loop, not relying on any dates or functions, and instead calculating the values afterwards.
You will see that the smaller the run, the bigger the relative impact of the initial overhead: it takes a small amount of time to start and finish each run, but at a certain point they all converge on a number that can reasonably be taken as how many operations per second JavaScript can run.
My example:
const oneMillion = 1_000_000;
const tenMillion = 10_000_000;
const oneHundredMillion = 100_000_000;
const oneBillion = 1_000_000_000;
const tenBillion = 10_000_000_000;
const oneHundredBillion = 100_000_000_000;
const oneTrillion = 1_000_000_000_000;
function runABunchOfTimes(times) {
console.time('timer')
for (let i = 0; i < times; ++i) {}
console.timeEnd('timer')
}
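It is then called with the constants above, for example:
runABunchOfTimes(oneMillion);
runABunchOfTimes(oneBillion);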
I tried this on a 2020 MacBook that already had a lot of load on it, with many processes running; these were my results:
At the very end I take the time the console reports and divide the number of runs by it. The oneTrillion and oneBillion runs are virtually the same, but when it goes down to oneMillion and 1000 you can see that they are not as performant, due to the initial overhead of setting up the for loop in the first place.
We usually try to stay away from O(n^2) and slower functions exactly because we do not want to hit that maximum. If you were to perform a find inside a map over an array of all the cities in the world (around 10_000 according to Google, I haven't counted), we would already reach 100_000_000 iterations, and they would certainly not be as simple as iterating over nothing like in my example. Your code would then take minutes to run, but I am sure you are aware of this, and that is why you posted the question in the first place.
Calculating how long it would take is tricky, not only because of the above, but also because you cannot predict which device will run your function. Nowadays I can open a page on my TV, my watch, or a Raspberry Pi, and none of them would be nearly as fast as the computer I used when writing these functions. But if I were to benchmark a device, I would use something like the function above, since it is the simplest loop operation I could think of.
