I've recently read the original paper about NeuroEvolution
of Augmenting Topologies by Kenneth O. Stanley and am now trying to prototype it myself in JavaScript. I stumbled across a few questions I can't answer.
My questions:
What is the definition of "structural innovation", and how do I store these so I can check if an innovation has already happened before?
"However, by keeping a list of the innovations that occurred in the current generation, it is possible to ensure that when the same structure arises more than once through independent mutations in the same generation, each identical mutation is assigned the same innovation number."
Is there a reason for storing the type of a node (input, hidden, output)?
In the original paper, only connections have an innovation number, but in other sources, nodes do as well. Is this necessary for crossover? (This has already been asked here.)
How could I limit the mutation functions to not add recurrent connections?
I think that's it for now. All help is appreciated.
The relevant parts of my code:
Genome
class Genome {
constructor(inputs, outputs) {
this.inputs = inputs;
this.outputs = outputs;
this.nodes = [];
this.connections = [];
for (let i = 0; i < inputs + outputs; i++) {
this.nodes.push(new Node());
}
for (let i = 0; i < inputs; i++) {
for (let o = 0; o < outputs; o++) {
let c = new Connection(this.nodes[i], this.nodes[inputs + o], outputs * i + o);
this.connections.push(c);
}
}
innovation = inputs * outputs; // relies on a shared (global) innovation counter, also used by the mutation methods below
}
weightMutatePerturb() {
// pick a random connection and nudge its weight in place
let c = this.connections[Math.floor(random(this.connections.length))];
c.weight += random(-0.5, 0.5);
}
weightMutateCreate() {
this.connections[Math.floor(random(this.connections.length))].weight = random(-2, 2);
}
connectionMutate() {
let i = this.nodes[Math.floor(random(this.nodes.length))];
let o = this.nodes[Math.floor(random(this.inputs, this.nodes.length))];
let c = Connection.exists(this.connections, i, o);
if (c) {
c.enabled = true;
} else {
this.connections.push(new Connection(i, o, innovation));
innovation++;
}
}
nodeMutate() {
let oldCon = this.connections[Math.floor(random(this.connections.length))]; // use the same random() helper as the other mutations; Math.random() ignores its argument
oldCon.enabled = false;
let newNode = new Node();
this.nodes.push(newNode);
this.connections.push(new Connection(oldCon.input, newNode, innovation, 1));
innovation++;
this.connections.push(new Connection(newNode, oldCon.output, innovation, oldCon.weight));
innovation++;
}
}
Node
class Node {
constructor() {
this.value = 0;
this.previousValue = 0;
}
}
Connection
class Connection {
constructor(input, output, innov, weight) {
this.input = input;
this.output = output;
this.innov = innov;
this.weight = weight !== undefined ? weight : random(-2, 2); // don't treat an explicit weight of 0 as "missing"
this.enabled = true;
}
static exists(connections, i, o) {
for (let c = 0; c < connections.length; c++) {
if (connections[c].input === i && connections[c].output === o) {
return connections[c];
}
}
return false;
}
}
All answers and sources are welcome. (You are an awesome person!)
First, I would very strongly advise against implementing NEAT yourself. If you take a look at the (many) available implementations, it is quite a large project!
A structural innovation is any new node or connection that is added to a genome and that has not been seen before. Imagine you have input nodes 1, 2, 3 and output nodes 4, 5. If only connection 2-4 is available, introducing connection 3-4 would be a structural innovation. To check for novelty you need to store all seen structures (i.e., a list of all connections and nodes) with a unique ID for each (this is the core idea behind NEAT, actually!). In our example, connection 2-4 may take ID=1, and connection 3-4 would take ID=2. You can see the connection is new in that no other connection in the list connects 3 and 4. Nodes are normally introduced by creating "a stop" in a connection and simply take the next available ID. For example, connection 2-4 would be disabled and you would get connections 2-6 and 6-4, where node ID=6 is created in the process (as well as two new connections). Note the IDs for nodes and connections may be independent (that is: if you use IDs for connections at all).
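As a rough illustration (my own sketch, not code from the paper), the per-generation list can be a small registry keyed by the endpoints of the proposed connection:
class InnovationTracker {
constructor(startingInnovation) {
this.nextInnovation = startingInnovation;
this.seenThisGeneration = new Map(); // "inId->outId" => innovation number
}
// called whenever a mutation wants to add a connection from inId to outId
getInnovation(inId, outId) {
const key = inId + "->" + outId;
if (!this.seenThisGeneration.has(key)) {
this.seenThisGeneration.set(key, this.nextInnovation++);
}
return this.seenThisGeneration.get(key); // identical mutations get the same number
}
// per the quoted passage, the list only needs to cover the current generation
newGeneration() {
this.seenThisGeneration.clear();
}
}
Calling getInnovation twice with the same endpoints in the same generation returns the same number, which is exactly what the quoted passage asks for.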
I'm struggling to think of a hard requirement for this. In principle you could simply store nodes in a fixed order (inputs first, outputs next, then hidden) and infer their type from their index, which is how you normally do it anyway for performance reasons (imagine trying to remove a node: you would only want to select a hidden node, so you would restrict the search to those indices). Some tasks may be more efficient with that info at hand, though, for example checking for recurrent connections (see 4).
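As a small sketch of what I mean (assuming the fixed ordering your Genome constructor already produces: inputs first, then outputs, with hidden nodes appended later by nodeMutate):
function nodeType(genome, index) {
if (index < genome.inputs) return "input";
if (index < genome.inputs + genome.outputs) return "output";
return "hidden"; // everything appended after construction
}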
IDs are useful in crossover, as they let you quickly determine which elements are common between two genomes. Whether to have IDs for nodes as well as connections is an open implementation decision. Having no IDs for connections makes for simpler code (connections are identified by the IDs of the nodes they connect), but you lose the ability to tell apart two connections that connect the same nodes. There is an argument that a connection between two given nodes does not necessarily mean the same thing at different times in evolution (see how your quote mentions "in the same generation"). This is probably not a relevant factor, though! As I said, the convenience of IDs for both nodes and connections is still debated in the NEAT community.
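For illustration only, here is a sketch of how IDs make gene alignment cheap during crossover (assuming each connection carries the innov field from your Connection class; the function name is my own):
function alignGenes(parentA, parentB) {
const byInnov = new Map();
for (const c of parentB.connections) byInnov.set(c.innov, c);
const matching = []; // genes present in both parents
const disjointOrExcess = []; // genes only present in parentA
for (const c of parentA.connections) {
if (byInnov.has(c.innov)) matching.push([c, byInnov.get(c.innov)]);
else disjointOrExcess.push(c);
}
return { matching, disjointOrExcess };
}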
In many cases you do not want to allow recurrent connections. The standard way to do this is to check for recurrence every time you try to add a connection. This is a costly step, yes!
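One way to do that check (a sketch based on the connection objects in your code, where input/output are node references and enabled is a boolean):
function wouldBeRecurrent(connections, input, output) {
// walk forward from the proposed output; if we can reach the proposed
// input again, adding input -> output would close a cycle
const stack = [output];
const visited = new Set();
while (stack.length > 0) {
const node = stack.pop();
if (node === input) return true;
if (visited.has(node)) continue;
visited.add(node);
for (const c of connections) {
if (c.enabled && c.input === node) stack.push(c.output);
}
}
return false;
}
If this returns true for a proposed input/output pair, the mutation would create a recurrent path and can simply be skipped.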
If you have more doubts, I recommend you take a look at this implementation by Colin Green for reference. If he is not the person who knows the most about NEAT implementation, he comes close.
This is not the average JS question! Thanks for the links, it's a really interesting paper. I can't claim to be an expert (I have only done toy GA problems), but I did read this paper and related ones. Here is what I understand:
I think all you need to worry about is whether a parent, by mutation, produces the same novel gene more than once in a generation; that is, two children whose genes with the newest innovation number are identical. You can cull those right away. I think they say that it is possible for the same gene to appear in two species at the same time, and they basically say that's fine; it's rare enough not to worry about.
I can find at least one reason: "In NEAT, a bias is a node that can connect to any node other than inputs."
I believe your question is "must nodes have an innovation number to do crossover?" The answer is no. In the original paper (e.g. Figure 4) they show crossover implemented in a way where only connections have innovation numbers.
If you want to change the mutation function to be architecture aware, rather than avoiding recurrent structure, you might want to explicitly add structures you do want. Suppose you want to avoid recurrent connections because you are evolving an image classifier, and you know that convolutions are more suited to the task. In this case, you want your mutation function to be able to add/remove layers (and the needed connections). This was explored in detail last year by Google Brain:
Some of the mutations acting on this DNA are reminiscent of NEAT. However, instead of single nodes, one mutation can insert whole layers—i.e. tens to hundreds of nodes at a time. We also allow for these layers to be removed, so that the evolutionary process can simplify an architecture in addition to complexifying it.
Based on your comment about your motivation for question 4, I think you are mistaken. In the XOR example in the original paper, Figure 5, they show a starting phenotype that has no hidden layer. This starting phenotype is not a solution to the XOR problem, but it provides a good starting point: "NEAT is very consistent in finding a solution. It did not fail once in 100 simulations." That is without any penalization for recurrence.
Related
I'm doing a project on natural selection for cells for fun. Each cell has "dna", which is just a set of instructions. The dna can have REMOVE WASTE, DIGEST FOOD, or REPAIR WALL. I won't really go into detail about what they do, because that would take too long. But the only reason evolution really happens is through genetic mutations. I'm wondering if this is possible in JavaScript, and how to do it. For example, the starting cell has 5 dna strands. But if it reproduces, the child can have 4, or 6. And some of the dna strands can be altered. This is my code so far:
var strands = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];
var dna = [];
for (let i = 0; i < 5; i++) {
if (parent) {
// something about the parents dna, and the mutation chance
}
else {
dna.push(strands[Math.floor(Math.random() * 3)]); // if cell doesn't have parent
}
}
I'm just wondering if this is possible in JavaScript, and how to successfully do it. Sorry if the question isn't too clear.
Edit: Let me rephrase a little. What I'm trying to achieve is a genetic mutation in the new cell. Like:
if (parent) {
dna.push(parent);
if (Math.random() < 0.5) {
changeStrand(num);
}
if (Math.random() < 0.5) {
addStrand(num);
}
if (Math.random() < 0.5) {
removeStrand(num);
}
}
function changeStrand(num) {
// change the strand
}
function addStrand(num) {
// add random strands
}
function removeStrand(num) {
// remove random strands
}
or something like that
For a genetic algorithm, you basically want to take two slices from each parent and stitch them together, whilst ensuring the end result is still a valid dna strand.
For a fixed-size DNA sequence (such as N-queens positions), the technique would be to pick a random slice point (1-3 | 4-8) and then combine these slices from the parents to create a child.
For your use case, you need two random slices whose sizes sum to 4-6, so possibly two slices of size 2-3 each. You could take one from the front of one parent and the other from the back of the other. Alternatively, you could first pick a random output size and then fill it with two random sequences, one from each parent.
Array.slice() and Array.splice() are probably the functions you want to use.
You can also add in a random mutation to the end result. Viruses at the speed limit of viable genetic evolution have an average of 1 mutation per transcription. Which means some transcriptions won’t have mutations, which is equivalent to allowing some of the parents in the parent generation to survive.
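For example, a minimal sketch (the function and parameter names are mine) of slice-based crossover with an optional point mutation, reusing the strands array from your snippet:
function crossover(parentA, parentB, mutationRate) {
// cut point somewhere inside the shorter parent
const cut = Math.floor(Math.random() * Math.min(parentA.length, parentB.length));
// front slice of one parent stitched to the back slice of the other
const child = parentA.slice(0, cut).concat(parentB.slice(cut));
// occasionally mutate one strand in place
if (Math.random() < mutationRate && child.length > 0) {
const i = Math.floor(Math.random() * child.length);
child[i] = strands[Math.floor(Math.random() * strands.length)];
}
return child;
}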
You can also experiment with different variations. Implement these as feature flags, and see what works best in practice.
Also compare with Beam Search, which essentially keeps a copy of the N best results from each generation. You may or may not want to keep the best from the parent generation to survive unmutated.
Another idea is to compute a distance metric between individuals, and add a cost for being too close to an existing member of the population, and this will select for genetic diversity.
In the standard model, variation occurs both by point mutations in the letter sequence and by “crossover” (in which the DNA of an offspring is generated by combining long sections of DNA from each parent).
The analogy to local search algorithms has already been described; the principal difference between stochastic beam search and evolution is the use of sexual reproduction, wherein successors are generated from multiple organisms rather than just one. The actual mechanisms of evolution are, however, far richer than most genetic algorithms allow. For example, mutations can involve reversals, duplications, and movement of large chunks of DNA; some viruses borrow DNA from one organism and insert it in another, and there are transposable genes that do nothing but copy themselves many thousands of times within the genome. There are even genes that poison cells from potential mates that do not carry the gene, thereby increasing their own chances of replication. Most important is the fact that the genes themselves encode the mechanisms whereby the genome is reproduced and translated into an organism. In genetic algorithms, those mechanisms are a separate program that is not represented within the strings being manipulated.
Artificial Intelligence: A Modern Approach. (Third edition) by Stuart Russell and Peter Norvig.
If you want random numbers you can use Math.random(). On the linked page there are also some examples of getting values between x and y, for example:
// Source https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
function getRandomInt(min, max) {
min = Math.ceil(min);
max = Math.floor(max);
return Math.floor(Math.random() * (max - min)) + min;
}
I am not sure this is what you are trying to achieve, since you already make use of the Math.random() function.
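If it is, a small example of using it to pick a random strand (reusing the strands array from your snippet) would be:
dna.push(strands[getRandomInt(0, strands.length)]);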
I apologise in advance if I'm too bad at using the search engine and this has already been answered. Please point me in the right direction in that case.
I've recently begun to use the arguments variable in functions, and now I need to slice it. Everywhere I look people are doing things like:
function getArguments(args, start) {
return Array.prototype.slice.call(args, start);
}
And according to MDN this is bad for performance:
You should not slice on arguments because it prevents optimizations in JavaScript engines (V8 for example).
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/arguments
Is there a reason why I don't see anyone doing things like this:
function getArguments(args, start) {
var i, p = 0;
var len = args.length;
var params = [];
for (i = start; i < len; ++i) {
params[p] = args[i];
p += 1;
}
return params;
}
You get the arguments you want, and no slicing is done. So from my point of view, you don't lose anything with this; well, maybe it uses a little extra memory and is slightly slower, but not to the point where it really makes a difference, right?
Just wanted to know if my logic here is flawed.
Here is a discussion,
and here is an introduction;
e.g. here is one that uses the inline slice approach.
It appears from the discussion that @Eason posted (here) that the debate is in the "micro-optimization" category, i.e.: most of us will never hit those performance bumps because our code isn't being run through the kind of iterations needed to even appear on the radar.
Here's a good quote that sums it up:
Micro-optimizations like this are always going to be a trade-off between the code's complexity/readability and its performance. In many cases, the complexity/readability is more important. In this case, the very slowest method that was tested netted a runtime of 4.3 microseconds. If you're writing a webservice and you're slicing args two times per request and then doing 100 ms worth of other work, an extra 0.0086 ms will not be noticeable and it's not worth the time or the code pollution to optimize.
These optimizations are most helpful in really hot loops that you're hitting a gajillionty times. Use a profiler to find your hot code, and optimize your hottest code first, until the performance you've achieved is satisfactory.
I'm satisfied, and will use Array.prototype.slice.call() unless I detect a performance blip that points to that particular piece of code not hitting the V8 optimizer.
I have created a chess game with Angular and chess.js and am trying to improve its rudimentary AI. The un-improved code currently lives at: https://gist.github.com/dexygen/8a19eba3c58fa6a9d0ff (or https://gist.githubusercontent.com/dexygen/8a19eba3c58fa6a9d0ff/raw/d8ee960cde7d30850c0f00f511619651396f5215/ng-chess)
What the AI currently consists of is checking whether the computer (Black) has a move that checkmates (using chess.js' in_checkmate() method) and, if so, mating the human (White); otherwise it makes a random move. To improve this I thought that instead of merely making a random move, I would have the AI check White's counters to Black's responses. Then, if a counter gives White checkmate, that Black response would not be included in the moves to randomly select from.
I would like to improve the AI within makeMove() (which currently merely delegates to makeRandomMove()) but I am finding this to be harder than expected. What I expected to be able to do was, not unlike mateNextMove() (refer to lines 155-168 of the gist), to check for in_checkmate() within a loop, except the loop will be nested to account for black responses and white counters to those responses.
Here is my first attempt at what I expected would work but it does not avoid checkmate when possible.
function makeMove(responses) {
var evaluator = new Chess();
var response;
var allowsMate;
var counters = [];
var candidates = [];
for (var i=0, n=responses.length; i<n; i++) {
response = responses[i];
allowsMate = false;
evaluator.load(chess.fen());
evaluator.move(response);
counters = evaluator.moves();
//console.log(evaluator.ascii());
//console.log(counters);
for (var j=0, k=counters.length; j<k; j++) {
evaluator.move(counters[j]);
if (evaluator.in_checkmate()) {
//console.log('in_checkmate');
allowsMate = true;
break;
}
}
if (!allowsMate) {
candidates.push(response);
}
}
return makeRandomMove(candidates);
}
In order to debug/test, taking advantage of a little chess knowledge helps: specifically, attempting an early "Scholar's Mate"; see: http://en.wikipedia.org/wiki/Scholar%27s_mate. If Black's random moves make this impractical, just start over; the opportunity presents itself as often as not. Qxf7# is the notation for the mating move of Scholar's Mate, both in the Wikipedia article and as returned by chess.moves(). So I've tried to modify the inner for loop as follows:
for (var j=0, k=counters.length; j<k; j++) {
evaluator.move(counters[j]);
if (counters[j] == 'Qxf7#') {
console.log(evaluator.in_checkmate());
}
}
But I've had this return false and allow me to deliver the mate. What am I doing wrong (and who possibly wants to help me on this project)?
It seems to me from the code you posted that you are not undoing the moves you make. When you loop through all possible moves, you make each move, then check for a threat. You should then unmake the move. That is probably also why your last test didn't work.
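As a sketch, the inner loop of makeMove could use chess.js' undo(), which takes back the last move, so each counter is tried from the same position:
for (var j = 0, k = counters.length; j < k; j++) {
evaluator.move(counters[j]); // try White's counter
if (evaluator.in_checkmate()) {
allowsMate = true;
}
evaluator.undo(); // take the counter back before trying the next one
if (allowsMate) break;
}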
I am implementing a bottom-up tree transformer in JavaScript. It will be used for an interpreter for a supercombinator reducer, so this algorithm has to be as fast as possible, as it affects every program built on top of it. This is my current implementation:
function transform(tree,fn){
var root = tree,
node = tree,
child,
parent,
changed, // declared here so it is not created as an implicit global below
is_node,
dir;
root.dir = 0;
while(true) {
is_node = typeof(node)==="object";
dir = is_node ? node.dir : 2;
if (dir < 2)
child = node[dir],
node.dir++,
child.parent = parent = node,
child.dir = 0,
node = child;
else if ((changed = fn(node))!==undefined)
changed.parent = parent,
changed.dir = 0,
node = changed;
else
if (!parent)
return node;
else
parent[parent.dir-1] = node,
node = parent,
parent = node.parent;
};
};
// TEST
var tree = [[1,2],[[3,4],[5,6]]];
console.log(
JSON.stringify(transform(tree,function(a){
if (a[0]===1) return [3,[5,5]];
if (a[0]===5) return 77;
})) === "[[3,77],[[3,4],77]]");
This is obviously far from optimal. How do I make the transformer the fastest possible? Maybe instead of looking for a faster algorithm, I could manage the memory by myself, and use asm.js.
You have a couple of options, going from easiest-but-slowest to fastest-but-trickiest.
Using regular JavaScript
This is pretty much what you are doing at this point. Looking at your algorithm, I don't see anything that can really be suggested that would show anything more than an insignificant increase in speed.
Using asm.js
Using asm.js might be an option for you. This would offer a speed increase. You don't go into a lot of detail about where this system will be used, but if it works, it shouldn't be terribly difficult to implement something like this. You would likely see performance increases, but depending on how you are planning to use this, it might not be as substantial as you would like (for something like this, you'd probably see somewhere between a 50%-500% increase in speed, depending on how efficient the code is).
Build it in a different, compiled, typed language.
If speed is really at a premium, depending on your use case, it might be best to write this program (or at least this function) in a different language which is compiled. You could then run this compiled script on the server and communicate with it via web services.
If the number of times you need to transform the tree in a short amount of time is huge, it won't be much of a boost, because of the time it would take to send and receive the data. However, if you are doing relatively few but long-running tree transformations, you could see a huge benefit in performance. A compiled, typed language (C++, Java, etc.) will always have better performance than an interpreted, dynamically typed language like JavaScript.
The other benefit of running it on a server is you can generally throw a lot more horsepower at it, since you could write it to be multi-threaded and even run on a cluster of machines instead of just one (for a high-end build). With JavaScript, you are limited to generally one thread and also by the end-users computer.
Let me start with the questions, and then fill in the reasons/background.
Question: Are there any memory profiling tools for JavaScript?
Question: Has anybody tested performance memory management in JavaScript already?
I would like to experiment with performance memory management in JavaScript. In C/C++/Assembly I was able to allocate a region of memory in one giant block, then map my data structures to that area. This had several performance advantages, especially for math heavy applications.
I know I cannot allocate memory and map my own data structures in JavaScript (or Java for that matter). However, I can create a stack/queue/heap with some predetermined number of objects, for example Vector objects. When crunching numbers I often need just a few such objects at any one time, but generate a large number over time. By reusing the old vector objects I can avoid the create/delete time, unnecessary garbage collection time, and potentially large memory footprint while waiting for garbage collection. I also hypothesize that they will all stay fairly close in memory because they were created at the same time and are being accessed frequently.
I would like to test this, but I am coming up short for memory profiling tools. I tried FireBug, but it does not tell you how much memory the JavaScript engine is currently allocating.
I was able to code a simple test for CPU performance (see below). I compared a queue with 10 "Vector" objects to using new/delete each time. To make certain I wasn't just using empty data, I assigned the Vector 6 floating point properties, a three value array (floats), and an 18 character string. Each time I created a vector, using either method, I would set all the values to 0.0.
The results were encouraging. The explicit management method was initially faster, but the JavaScript engine had some caching and it caught up after running the test a couple of times. The most interesting part was that FireBug crashed when I tried to run standard new/delete on 10 million objects, but it worked just fine for my queue method.
If I can find memory profiling tools, I would like to test this on different structures (array, heap, queue, stack). I would also like to test it on a real application, perhaps a super simple ray tracer (quick to code, can test very large data sets with lots of math for nice profiling).
And yes, I did search before creating this question. Everything I found was either a discussion of memory leaks in JavaScript or a discussion of GC vs. Explicit Management.
Thanks,
JB
Standard Method
function setBaseVectorValues(vector) {
vector.x = 0.0;
vector.y = 0.0;
vector.z = 0.0;
vector.theta = 0.0;
vector.phi = 0.0;
vector.magnitude = 0.0;
vector.color = [0.0, 0.0, 0.0];
vector.description = "a blank base vector";
}
function standardCreateObject() {
var vector = new Object();
setBaseVectorValues(vector);
return vector;
}
function standardDeleteObject(obj) {
delete obj; // note: delete only removes object properties; on a plain variable it is a no-op, so the object is simply left for the garbage collector
}
function testStandardMM(count) {
var start = new Date().getTime();
for(var i=0; i<count; i++) {
var obj = standardCreateObject();
standardDeleteObject(obj);
}
var end = new Date().getTime();
return "Time: " + (end - start);
}
Managed Method
I used the JavaScript queue from http://code.stephenmorley.org/javascript/queues/
function newCreateObject() {
var vector = allocateVector();
setBaseVectorValues(vector);
return vector;
}
function newDeleteObject(obj) {
queue.enqueue(obj);
}
function newInitObjects(bufferSize) {
queue = new Queue(); // intentionally assigned without var so the other functions can share the pool
for(var i=0; i<bufferSize; i++) {
queue.enqueue(standardCreateObject());
}
}
function allocateVector() {
var vector
if(queue.isEmpty()) {
vector = new Object();
}else {
vector = queue.dequeue();
}
return vector;
}
function testNewMM(count) {
var start = new Date().getTime();
newInitObjects(10);
for(var i=0; i<count; i++) {
var obj = newCreateObject();
newDeleteObject(obj);
obj = null;
}
var end = new Date().getTime();
return "Time: " + (end - start) + " Vectors Available: " + queue.getLength();
}
The Chrome inspector has a decent JavaScript profiling tool. I'd try that...
I have never seen such a tool but, in actuality, JavaScript [almost] never runs independently; it is [almost] always hosted within another application (e.g. your browser). It does not really matter how much memory is associated with your specific data structures; what matters is how the overall memory consumption of the host application is affected by your scripts.
I would recommend finding a generic memory profiling tool for your OS and pointing it at your browser. Run a single page and profile the browser's change in memory consumption before and after triggering your code.
The only exception to what I said above that I can think of right now is node.js... If you are using node then you can use process.memoryUsage().
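For example (Node.js only, and only a rough measure since heapUsed fluctuates with garbage collection), reusing testNewMM from the question:
const before = process.memoryUsage().heapUsed;
console.log(testNewMM(1000000)); // or testStandardMM(1000000)
const after = process.memoryUsage().heapUsed;
console.log("heap delta (bytes):", after - before);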
Edit: Oooo... After some searching, it appears that Chrome has some sweet tools as well. (+1 for Michael Berkompas). I still stand by my original statement, that it is actually more important to see how the memory usage of the browser process itself is affected, but the elegance of the Chrome tools is impressive.