Recently I came across the concept of hidden classes and inline caching, which V8 uses to optimise JavaScript code. Cool.
I understand that objects are represented internally by hidden classes, and that two objects may have the same properties but different hidden classes (depending on the order in which the properties are assigned).
I also understand that V8 uses inline caching to remember the offset at which a property lives, so it can access the property directly instead of using the object's hidden class to determine the offset every time.
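For example, here is a minimal sketch of that behaviour (it assumes Node is started with --allow-natives-syntax, which exposes the V8-internal %HaveSameMap helper; that flag and helper are development aids, not standard JavaScript):

// Run with: node --allow-natives-syntax shapes.js  (file name is just an example)
let a = {}; a.x = 1; a.y = 2;    // hidden class transitions: {} -> {x} -> {x, y}
let b = {}; b.y = 2; b.x = 1;    // same properties, assigned in the other order
let c = {}; c.x = 1; c.y = 2;    // same order as `a`
console.log(%HaveSameMap(a, b)); // false: different hidden classes
console.log(%HaveSameMap(a, c)); // true: shared hidden class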
Code -
function Point(x, y) {
  this.x = x;
  this.y = y;
}

function processPoint(point) {
  // console.log(point.x, point.y, point.a, point.b);
  // let x = point;
}

function main() {
  let p1 = new Point(1, 1);
  let p2 = new Point(1, 1);
  let p3 = new Point(1, 1);
  const N = 300000000;

  p1.a = 1;
  p1.b = 1;

  p2.b = 1;
  p2.a = 1;

  p3.a = 1;
  p3.b = 1;

  let start_1 = new Date();
  for (let i = 0; i < N; i++) {
    if (i % 4 != 0) {
      processPoint(p1);
    } else {
      processPoint(p2);
    }
  }
  let end_1 = new Date();
  let t1 = (end_1 - start_1);

  let start_2 = new Date();
  for (let i = 0; i < N; i++) {
    if (i % 4 != 0) {
      processPoint(p1);
    } else {
      processPoint(p1);
    }
  }
  let end_2 = new Date();
  let t2 = (end_2 - start_2);

  let start_3 = new Date();
  for (let i = 0; i < N; i++) {
    if (i % 4 != 0) {
      processPoint(p1);
    } else {
      processPoint(p3);
    }
  }
  let end_3 = new Date();
  let t3 = (end_3 - start_3);

  console.log(t1, t2, t3);
}

(function () {
  main();
})();
I was expecting the results to be roughly t1 > (t2 = t3), because:
First loop: V8 will try to optimise after the function has run twice, but it will soon encounter a different hidden class, so it will deoptimise.
Second loop: the same object is passed all the time, so inline caching can be used.
Third loop: same as the second loop, because the hidden classes are the same.
But the results are not what I expected. I got the following (and similar results when running again and again):
3553 4805 4556
Questions:
Why were the results not as expected? Where did my assumptions go wrong?
How can I change this code to demonstrate the performance improvements from hidden classes and inline caching?
Did I get it all wrong from the start?
Are hidden classes present just for memory efficiency, by letting objects share them?
Any other sites with some simple examples of the performance improvements?
I am using Node 8.9.4 for testing. Thanks in advance.
Sources:
https://blog.sessionstack.com/how-javascript-works-inside-the-v8-engine-5-tips-on-how-to-write-optimized-code-ac089e62b12e
https://draft.li/blog/2016/12/22/javascript-engines-hidden-classes/
https://richardartoul.github.io/jekyll/update/2015/04/26/hidden-classes.html
and many more.
V8 developer here. The summary is: Microbenchmarking is hard, don't do it.
First off, with your code as posted, I'm seeing 380 380 380 as the output, which is expected, because function processPoint is empty, so all loops do the same work (i.e., no work) no matter which point object you select.
Measuring the performance difference between monomorphic and 2-way polymorphic inline caches is difficult, because it is not large, so you have to be very careful about what else your benchmark is doing. console.log, for example, is so slow that it'll shadow everything else.
You'll also have to be careful about the effects of inlining. When your benchmark has many iterations, the code will get optimized (after running waaaay more than twice), and the optimizing compiler will (to some extent) inline functions, which can allow subsequent optimizations (specifically: eliminating various things) and thereby can significantly change what you're measuring. Writing meaningful microbenchmarks is hard; you won't get around inspecting generated assembly and/or knowing quite a bit about the implementation details of the JavaScript engine you're investigating.
Another thing to keep in mind is where inline caches are, and what state they'll have over time. Disregarding inlining, a function like processPoint doesn't know or care where it's called from. Once its inline caches are polymorphic, they'll remain polymorphic, even if later on in your benchmark (in this case, in the second and third loop) the types stabilize.
Yet another thing to keep in mind when trying to isolate effects is that long-running functions will get compiled in the background while they run, and will then at some point be replaced on the stack ("OSR"), which adds all sorts of noise to your measurements. When you invoke them with different loop lengths for warmup, they'll still get compiled in the background however, and there's no way to reliably wait for that background job. You could resort to command-line flags intended for development, but then you wouldn't be measuring regular behavior any more.
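For completeness, here is a sketch of what such development flags look like in practice (it assumes Node is started with --allow-natives-syntax, which exposes V8-internal helpers like %OptimizeFunctionOnNextCall; as said above, you wouldn't be measuring regular behavior any more):

// Run with: node --allow-natives-syntax
function hot(point) { return point.x + point.y; }
const pt = { x: 1, y: 2 };
hot(pt); hot(pt);                  // collect type feedback first
%OptimizeFunctionOnNextCall(hot);  // request optimized compilation explicitly
hot(pt);                           // runs optimized code, no OSR involved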
Anyhow, the following is an attempt to craft a test similar to yours that produces plausible results (about 100 180 280 on my machine):
function Point() {}
// These three functions are identical, but they will be called with different
// inputs and hence collect different type feedback:
function processPointMonomorphic(N, point) {
  let sum = 0;
  for (let i = 0; i < N; i++) {
    sum += point.a;
  }
  return sum;
}

function processPointPolymorphic(N, point) {
  let sum = 0;
  for (let i = 0; i < N; i++) {
    sum += point.a;
  }
  return sum;
}

function processPointGeneric(N, point) {
  let sum = 0;
  for (let i = 0; i < N; i++) {
    sum += point.a;
  }
  return sum;
}
let p1 = new Point();
let p2 = new Point();
let p3 = new Point();
let p4 = new Point();
const warmup = 12000;
const N = 100000000;
let sum = 0;
p1.a = 1;
p2.b = 1;
p2.a = 1;
p3.c = 1;
p3.b = 1;
p3.a = 1;
p4.d = 1;
p4.c = 1;
p4.b = 1;
p4.a = 1;
processPointMonomorphic(warmup, p1);
processPointMonomorphic(1, p1);
let start_1 = Date.now();
sum += processPointMonomorphic(N, p1);
let t1 = Date.now() - start_1;
processPointPolymorphic(2, p1);
processPointPolymorphic(2, p2);
processPointPolymorphic(2, p3);
processPointPolymorphic(warmup, p4);
processPointPolymorphic(1, p4);
let start_2 = Date.now();
sum += processPointPolymorphic(N, p1);
let t2 = Date.now() - start_2;
processPointGeneric(warmup, 1);
processPointGeneric(1, 1);
let start_3 = Date.now();
sum += processPointGeneric(N, p1);
let t3 = Date.now() - start_3;
console.log(t1, t2, t3);
Related
I've spent the last two days watching YouTube videos on neural networks.
In particular, I've been trying to implement a genetic algorithm that will evolve over time; however, most videos seem to be focused on neural networks that are trained and then used for classification.
Being confused, I decided to simply try to implement the basic structure of the network, and have coded this in JS, for convenience.
function sigmoid (x) { return 1 / (1 + Math.E ** -x); }
function Brain(inputs, hiddens, outputs) {
  this.weights = {
    hidden: [],
    output: []
  };
  for (var i = hiddens; i--;) {
    this.weights.hidden[i] = [];
    for (var w = inputs; w--;) this.weights.hidden[i].push(Math.random());
  }
  for (var i = outputs; i--;) {
    this.weights.output[i] = [];
    for (var w = hiddens; w--;) this.weights.output[i].push(Math.random());
  }
}

Brain.prototype.compute = function(inputs) {
  var hiddenInputs = [];
  for (var i = this.weights.hidden.length; i--;) {
    var dot = 0;
    for (var w = inputs.length; w--;) dot += inputs[w] * this.weights.hidden[i][w];
    hiddenInputs[i] = sigmoid(dot);
  }
  var outputs = [];
  for (var i = this.weights.output.length; i--;) {
    var dot = 0;
    for (var w = this.weights.hidden.length; w--;) dot += hiddenInputs[w] * this.weights.output[i][w];
    outputs[i] = sigmoid(dot);
  }
  return outputs;
}
var brain = new Brain(1,2,1);
brain.compute([1]);
I successfully get values between 0 and 1, and when I use specific weights, I get the same value each time for a constant input.
Is the terminology I'm using in the code correct?
I fear I may simply be observing false positives, and am not actually feeding forward.
Is the sigmoid function appropriate? Should I be using it for a genetic / evolving algorithm?
I noticed that I'm getting results only between 0.5 and 1.
To combine a neural network with a genetic algorithm, your best shot is probably NEAT. There is a very good implementation of this algorithm in JS called 'Neataptic'; you should be able to find it on GitHub.
When combining a GA with an ANN, you generally want to adjust not only the weights but the structure as well.
Sigmoid activation is OK for a GA, but in many cases you will also want other activation functions; you can find a small list of activation functions on Wikipedia or create your own.
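For illustration, a minimal sketch of a few common alternatives (the selection and the `activations` name are just examples, not part of the original suggestion):

// Common activation functions: sigmoid squashes to (0, 1), tanh to (-1, 1),
// and ReLU passes positive values through and zeroes out negative ones.
var activations = {
  sigmoid: function (x) { return 1 / (1 + Math.exp(-x)); },
  tanh: function (x) { return Math.tanh(x); },
  relu: function (x) { return Math.max(0, x); }
};
console.log(activations.sigmoid(0), activations.tanh(0.5), activations.relu(-2)); // 0.5 ~0.462 0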
Disclaimer: I'm aware that this might look like premature optimization; however, the question is interesting in itself, so thank you for your patience.
I have two jsfiddles that measure the performance of different kinds of scenarios, and I've stumbled upon one very interesting thing:
https://jsfiddle.net/lu4x/k74oq0de/51/
function timeout() {
  var x = 0; // <------------------------------ Using this variable
  for (let iteration = 0; iteration < 1; iteration++) {
    let count = 0;
    let timeA = performance.now();
    for (let i = 0; i < map.length; i++) {
      x = map[i]; // <------------------------------ and this assign
      count += x;
    }
    let timeB = performance.now();
    let delta = timeB - timeA;
    average = avg(average, run, delta);
    minimum = Math.min(minimum, delta);
    if (++run % 100 === 0) log([run, minimum, average]);
  }
  if (iterate) setTimeout(timeout, 0);
}
Yields this output on my machine:
1000,19.204999999999927,22.00017499999995
900,19.204999999999927,22.074283333333327
800,19.204999999999927,22.13556875000003
700,19.204999999999927,22.222578571428635
600,19.204999999999927,22.271575000000077
500,19.24000000000069,22.486130000000085
400,19.24000000000069,22.41451250000006
300,19.24000000000069,22.209600000000002
200,19.280000000000655,22.322875000000046
100,19.38000000000011,22.719250000000034
While the following fiddle:
https://jsfiddle.net/lu4x/k74oq0de/52/
function timeout() {
  for (let iteration = 0; iteration < 1; iteration++) {
    let count = 0;
    let timeA = performance.now();
    for (let i = 0; i < map.length; i++) {
      count += map[i]; // <--------------------- Not using anything
    }
    let timeB = performance.now();
    let delta = timeB - timeA;
    average = avg(average, run, delta);
    minimum = Math.min(minimum, delta);
    if (++run % 100 === 0) log([run, minimum, average]);
  }
  if (iterate) setTimeout(timeout, 0);
}
Yields the following output:
1000,21.125,24.086139999999997
900,21.125,24.086711111111097
800,21.125,24.068375000000003
700,21.125,24.05558571428569
600,21.165000000000873,24.102566666666636
500,21.165000000000873,24.129959999999983
400,21.165000000000873,24.10507499999996
300,21.170000000000073,24.021016666666657
200,21.170000000000073,24.03779999999997
100,21.300000000000182,24.037549999999992
I've done some other testing, and the difference is 21.125 - 19.204999999999927 = 1.92 milliseconds. For this kind of test that is a dramatic change; the penalty is close to the cost of one dereferencing (i.e. x.y) operation.
Why do things happen this way?
P.S.
If you put var x inside the loop, it is as slow as not using var x at all. I'm baffled...
==============
Edit
Here is the same performance test using benchmark.js
https://jsfiddle.net/lu4x/k74oq0de/60/
Benchmark.js confirms what was shown in the previous tests; the difference is significant.
I am working on an implementation of the A* algorithm in JavaScript. It works, but it takes a very long time to create a path between two points that are very close together: from (1,1) to (6,6) it takes several seconds. I would like to know what mistakes I have made in the algorithm and how to resolve them.
My code:
Node.prototype.genNeighbours = function() {
  var right = new Node(this.x + 1, this.y);
  var left = new Node(this.x - 1, this.y);
  var top = new Node(this.x, this.y + 1);
  var bottom = new Node(this.x, this.y - 1);
  this.neighbours = [right, left, top, bottom];
}

AStar.prototype.getSmallestNode = function(openarr) {
  var comp = 0;
  for (var i = 0; i < openarr.length; i++) {
    if (openarr[i].f < openarr[comp].f) comp = i;
  }
  return comp;
}

AStar.prototype.calculateRoute = function(start, dest, arr) {
  var open = new Array();
  var closed = new Array();
  start.g = 0;
  start.h = this.manhattanDistance(start.x, dest.x, start.y, dest.y);
  start.f = start.h;
  start.genNeighbours();
  open.push(start);
  while (open.length > 0) {
    var currentNode = null;
    this.getSmallestNode(open);
    currentNode = open[0];
    if (this.equals(currentNode, dest)) return currentNode;
    currentNode.genNeighbours();
    var iOfCurr = open.indexOf(currentNode);
    open.splice(iOfCurr, 1);
    closed.push(currentNode);
    for (var i = 0; i < currentNode.neighbours.length; i++) {
      var neighbour = currentNode.neighbours[i];
      if (neighbour == null) continue;
      var newG = currentNode.g + 1;
      if (newG < neighbour.g) {
        var iOfNeigh = open.indexOf(neighbour);
        var iiOfNeigh = closed.indexOf(neighbour);
        open.splice(iOfNeigh, 1);
        closed.splice(iiOfNeigh, 1);
      }
      if (open.indexOf(neighbour) == -1 && closed.indexOf(neighbour) == -1) {
        neighbour.g = newG;
        neighbour.h = this.manhattanDistance(neighbour.x, dest.x, neighbour.y, dest.y);
        neighbour.f = neighbour.g + neighbour.h;
        neighbour.parent = currentNode;
        open.push(neighbour);
      }
    }
  }
}
Edit: I've now resolved the problem. It was due to the fact that I was just calling open.sort(), which wasn't sorting the nodes by their 'f' value. I wrote a custom function and now the algorithm runs quickly.
A few mistakes I've spotted:
Your set of open nodes is not structured in any way that makes retrieving the one with the minimal distance easy. The usual choice for this is a priority queue, but inserting new nodes in sorted order (instead of open.push(neighbour)) should suffice at first; see the sketch after this list.
In your getSmallestNode function, you can start the loop at index 1.
You are calling getSmallestNode(), but not using its result at all. You're only taking currentNode = open[0]; every time (and then even searching for its position in order to splice it, even though it's 0!). With a queue, it's just currentNode = open.shift().
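A minimal sketch of such a sorted insertion (the insertSorted name is mine, not from the question):

// Keeps `open` ordered by ascending f, so the cheapest node is always open[0].
function insertSorted(open, node) {
  var i = 0;
  while (i < open.length && open[i].f <= node.f) i++;
  open.splice(i, 0, node);
}
// Use insertSorted(open, neighbour) instead of open.push(neighbour),
// and currentNode = open.shift() instead of the getSmallestNode/splice dance.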
However, the most important thing (the one that has gone most wrong) is your genNeighbours() function. It creates entirely new node objects every time it is called - objects that were unheard of before and are not known to your algorithm (or its closed set). They may be at the same position in your grid as other nodes, but they're different objects (which are compared by reference, not by similarity). This means that indexOf will never find those new neighbours in the closed array, and they will get processed over and over (and over). I won't attempt to calculate the complexity of this implementation, but I'd guess it's even worse than exponential.
Typically, the A* algorithm is executed on an already existing graph. An OOP genNeighbours function would return references to the existing node objects instead of creating new ones with the same coordinates. If you need to generate the graph dynamically, you'll need a lookup structure (a two-dimensional array?) to store and retrieve already-generated nodes, as sketched below.
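A minimal sketch of such a lookup, reusing the Node(x, y) constructor from the question (the nodeCache and getNode names are illustrative):

// Cache nodes by coordinate so repeated lookups return the same object.
var nodeCache = {};
function getNode(x, y) {
  var key = x + ',' + y;
  if (!nodeCache[key]) nodeCache[key] = new Node(x, y);
  return nodeCache[key];
}

Node.prototype.genNeighbours = function() {
  // Neighbours at the same coordinates are now the same objects on every call,
  // so indexOf can find them again in the open and closed arrays.
  this.neighbours = [
    getNode(this.x + 1, this.y),
    getNode(this.x - 1, this.y),
    getNode(this.x, this.y + 1),
    getNode(this.x, this.y - 1)
  ];
};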
I have a for-loop from 0 to 8,019,000,000 that is extremely slow.
var totalCalcs = 0;
for (var i = 0; i < 8019000000; i++)
  totalCalcs++;
window.alert(totalCalcs);
In Chrome this takes between 30 and 60 seconds.
I also already tried variations like:
var totalCalcs = 0;
for (var i = 8019000000; i--; )
  totalCalcs++;
window.alert(totalCalcs);
That didn't make much of a difference, unfortunately.
Is there anything I can do to speed this up?
Your example is rather trivial, and any answer may not be suitable for whatever code you're actually putting inside of a loop with that many iterations.
If your work can be done in parallel, then we can divide the work between several web workers.
You can read a nice introduction to web workers, and learn how to use them, here:
http://www.html5rocks.com/en/tutorials/workers/basics/
Figuring out how to divide the work is a challenge that depends entirely on what that work is. Because your example is so small, it's easy to divide the work among inline web workers; here is a function to create a worker that will invoke a function asynchronously:
var makeWorker = function (fn, args, callback) {
  var fnString = 'self.addEventListener("message", function (e) {self.postMessage((' + fn.toString() + ').apply(this, e.data))});',
      blob = new Blob([fnString], { type: 'text/javascript' }),
      url = URL.createObjectURL(blob),
      worker = new Worker(url);
  worker.postMessage(args);
  worker.addEventListener('message', function (e) {
    URL.revokeObjectURL(url);
    callback(e.data);
  });
  return worker;
};
The work that we want done is adding numbers, so here is a function to do that:
var calculateSubTotal = function (count) {
  var sum = 0;
  for (var i = 0; i < count; ++i) {
    sum++;
  }
  return sum;
};
And when a worker finishes, we want to add its sum to the total AND log the result when all workers are finished, so here is our callback:
var total = 0, count = 0, numWorkers = 1,
    workerFinished = function (subTotal) {
      total += subTotal;
      count++;
      if (count == numWorkers) {
        console.log(total);
      }
    };
And finally we can create a worker:
makeWorker(calculateSubTotal, [10], workerFinished); // logs `10` to console
When put together, these pieces can calculate your large sum quickly (depending on how many CPUs your computer has, of course).
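For example, here is a hedged sketch of splitting the full count across several workers using the pieces above (the chunking logic and the choice of four workers are mine; pick the worker count based on your core count):

var iterations = 8019000000, numChunks = 4,
    chunkSize = Math.floor(iterations / numChunks),
    grandTotal = 0, finished = 0;

for (var w = 0; w < numChunks; w++) {
  // Give the last worker the remainder so the chunks add up to `iterations` exactly.
  var chunk = (w === numChunks - 1) ? iterations - chunkSize * (numChunks - 1) : chunkSize;
  makeWorker(calculateSubTotal, [chunk], function (subTotal) {
    grandTotal += subTotal;
    if (++finished === numChunks) console.log(grandTotal); // 8019000000
  });
}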
I have a complete example on jsfiddle.
Treating your question as a more generic question about speeding up loops with many iterations: you could try Duff's device.
In a test using Node.js, the following code decreased the loop time from 108 seconds for your second loop (i--) to 27 seconds:
var testVal = 0, iterations = 8019000000;
var n = iterations % 8;
while (n--) {
  testVal++;
}
n = parseInt(iterations / 8);
while (n--) {
  testVal++;
  testVal++;
  testVal++;
  testVal++;
  testVal++;
  testVal++;
  testVal++;
  testVal++;
}
I've got the following code inside a <script> tag on a webpage with nothing else on it. I'm afraid I do not presently have it online. As you can see, it adds up all primes under two million, in two different ways, and calculates how long it took on average. The variable howOften is used to do this a number of times so you can average it out. What puzzles me is, for howOften == 1, method 2 is faster, but for howOften == 10, method 1 is. The difference is significant and holds even if you hit F5 a couple of times.
My question is simply: how come?
(This post has been edited to incorporate alf's suggestion. But that made no difference! I'm very much puzzled now.)
(Edited again: with howOften at or over 1000, the times seem stable. Alf's answer seems correct.)
function methodOne(maxN) {
  var sum, primes_inv, i, j;
  sum = 0;
  primes_inv = [];
  for (var i = 2; i < maxN; ++i) {
    if (primes_inv[i] == undefined) {
      sum += i;
      for (var j = i; j < maxN; j += i) {
        primes_inv[j] = true;
      }
    }
  }
  return sum;
}

function methodTwo(maxN) {
  var i, j, p, sum, ps, n;
  n = ((maxN - 2) / 2);
  sum = n * (n + 2);
  ps = [];
  for (i = 1; i <= n; i++) {
    for (j = i; j <= n; j++) {
      p = i + j + 2 * i * j;
      if (p <= n) {
        if (ps[p] == undefined) {
          sum -= p * 2 + 1;
          ps[p] = true;
        }
      }
      else {
        break;
      }
    }
  }
  return sum + 2;
}
// ---------- parameters
var howOften = 10;
var maxN = 10000;
console.log('iterations: ', howOften);
console.log('maxN: ', maxN);
// ---------- dry runs for warm-up
for (i = 0; i < 1000; i++) {
  sum = methodOne(maxN);
  sum = methodTwo(maxN);
}

// ---------- method one
var start = (new Date).getTime();
for (i = 0; i < howOften; i++)
  sum = methodOne(maxN);
var stop = (new Date).getTime();
console.log('methodOne: ', (stop - start) / howOften);

// ---------- method two
for (i = 0; i < howOften; i++)
  sum = methodTwo(maxN);
var stop2 = (new Date).getTime();
console.log('methodTwo: ', (stop2 - stop) / howOften);
Well, the JS runtime is an optimizing JIT compiler. That means that for a while your code is interpreted (t_int); after that, it gets compiled (t_jit); and finally you run the compiled code (t_run).
Now what you calculate is most probably (t_int + t_jit + t_run) / N. Given that the only part depending almost linearly on N is t_run, this comparison does not make much sense, unfortunately.
So the answer is, I don't know. To have a proper answer (see the sketch after this list), you should:
Extract the code you are trying to profile into functions
Run warm-up cycles on these functions, and do not use timing from the warm-up cycles
Run much more than 1..10 times, both for warm-up and measurement
Try swapping the order in which you measure time for algorithms
Get into the JS interpreter internals if you can, and make sure you understand what happens: do you really measure what you think you do? Is JIT run during the warm-up cycles and not while you measure? Etc., etc.
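A minimal sketch of the first three points applied to the question's code (the benchmark helper and the run counts are illustrative, not prescriptive):

function benchmark(fn, maxN, warmupRuns, measuredRuns) {
  for (var i = 0; i < warmupRuns; i++) fn(maxN);    // warm-up, timing discarded
  var start = (new Date).getTime();
  for (var j = 0; j < measuredRuns; j++) fn(maxN);  // only these runs are timed
  return ((new Date).getTime() - start) / measuredRuns;
}

console.log('methodOne: ', benchmark(methodOne, 10000, 2000, 5000));
console.log('methodTwo: ', benchmark(methodTwo, 10000, 2000, 5000));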
Update: note also that for 1 cycle, you get a run time below the resolution of the system timer, which means the error is probably bigger than the actual values you compare.
methodTwo simply requires that the processor execute fewer calculations. In methodOne your initial for loop is executing maxN times. In methodTwo your initial for loop is executing (maxN -2)/2 times. So in the second method the processor is doing less than half the number of calculations that the first method is doing. This is compounded by the fact that each method contains a nested for loop. So big-O of methodOne is maxN^2. Whereas big-O of methodTwo is ((maxN -2)/2)^2.