How to make random mutations - javascript

I'm doing a project on natural selection for cells, for fun. Each cell has "dna", which is just a set of instructions. The dna can have REMOVE WASTE, DIGEST FOOD, or REPAIR WALL in it. I won't go into detail about what they do, because that would take too long. But the only reason evolution really happens is through genetic mutations. I'm wondering if this is possible in javascript, and how to do it. For example, the starting cell has 5 dna strands, but if it reproduces, the child can have 4, or 6. And some of the dna strands can be altered. This is my code so far:
var strands = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];
var dna = [];
for (let i = 0; i < 5; i++) {
    if (parent) {
        // something about the parent's dna, and the mutation chance
    } else {
        dna.push(strands[Math.floor(Math.random() * 3)]); // if cell doesn't have parent
    }
}
I'm just wondering if this is possible in javascript, and how to successfully do it. Sorry if the question isn't too clear.
Edit: Let me rephrase a little. What I'm trying to achieve is a genetic mutation in the new cell. Like:
if (parent) {
    dna = parent.dna.slice(); // copy the parent's dna (not the parent object itself)
    if (Math.random() < 0.5) {
        changeStrand(num);
    }
    if (Math.random() < 0.5) {
        addStrand(num);
    }
    if (Math.random() < 0.5) {
        removeStrand(num);
    }
}

function changeStrand(num) {
    // change the strand at index num
}

function addStrand(num) {
    // add a random strand
}

function removeStrand(num) {
    // remove a random strand
}
or something like that

For a genetic algorithm, you basically want to take a slice from each parent and stitch them together, whilst ensuring the end result is still a valid dna strand.
For a fixed-size DNA sequence (such as N-queens positions), the technique would be to pick a random split point (e.g. positions 1-3 from one parent and 4-8 from the other) and combine those slices to create a child.
For your use case, you need two random slices whose sizes sum to 4-6, so possibly two slices of size 2-3 each. You could take one from the front of one parent and the other from the back of the other. Alternatively, you could first pick a random output size, and then fill it with two random slices, one from each parent.
Array.slice() and Array.splice() are probably the functions you want to use.
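For example, a minimal crossover sketch along those lines (the helper name and the 2-3 split sizes are my choices, not from the question):
function crossover(parentA, parentB) {
    var frontLen = 2 + Math.floor(Math.random() * 2); // 2 or 3 strands from the front of A
    var backLen = 2 + Math.floor(Math.random() * 2);  // 2 or 3 strands from the back of B
    return parentA.slice(0, frontLen)
        .concat(parentB.slice(parentB.length - backLen)); // child ends up with 4-6 strands
}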
You can also add in a random mutation to the end result. Viruses at the speed limit of viable genetic evolution average one mutation per transcription, which means some transcriptions won't have mutations; that is equivalent to allowing some of the parents in the parent generation to survive.
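A sketch of that idea, with the rate set so a dna of length n averages about one mutation per reproduction (the helper name is mine):
function mutate(dna, strands) {
    var rate = 1 / dna.length; // expected mutations per copy: about 1
    return dna.map(function (strand) {
        return Math.random() < rate
            ? strands[Math.floor(Math.random() * strands.length)] // point mutation
            : strand;
    });
}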
You can also experiment with different variations. Implement these as feature flags, and see what works best in practice.
Also compare with Beam Search, which essentially keeps a copy of the N best results from each generation. You may or may not want to keep the best from the parent generation to survive unmutated.
Another idea is to compute a distance metric between individuals, and add a cost for being too close to an existing member of the population, and this will select for genetic diversity.
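One possible such metric (a sketch, not from the answer) is a Hamming-style count of differing strand positions; a fitness penalty could then be applied to individuals within a small distance of an existing member:
function dnaDistance(a, b) {
    var len = Math.max(a.length, b.length);
    var diff = 0;
    for (var i = 0; i < len; i += 1) {
        if (a[i] !== b[i]) diff += 1; // missing positions count as mismatches
    }
    return diff;
}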
In the standard model, variation occurs both by point mutations in the letter sequence and by “crossover” (in which the DNA of an offspring is generated by combining long sections of DNA from each parent).
The analogy to local search algorithms has already been described; the principal difference between stochastic beam search and evolution is the use of sexual reproduction, wherein successors are generated from multiple organisms rather than just one. The actual mechanisms of evolution are, however, far richer than most genetic algorithms allow. For example, mutations can involve reversals, duplications, and movement of large chunks of DNA; some viruses borrow DNA from one organism and insert it in another, and there are transposable genes that do nothing but copy themselves many thousands of times within the genome. There are even genes that poison cells from potential mates that do not carry the gene, thereby increasing their own chances of replication. Most important is the fact that the genes themselves encode the mechanisms whereby the genome is reproduced and translated into an organism. In genetic algorithms, those mechanisms are a separate program that is not represented within the strings being manipulated.
Artificial Intelligence: A Modern Approach. (Third edition) by Stuart Russell and Peter Norvig.

If you want random numbers you can use Math.random() for that. The linked MDN page also has some examples for getting values between x and y, for example:
// Source https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
// Returns a random integer between min (inclusive) and max (exclusive)
function getRandomInt(min, max) {
    min = Math.ceil(min);
    max = Math.floor(max);
    return Math.floor(Math.random() * (max - min)) + min;
}
I am not sure whether this is what you are trying to achieve, since you already make use of the Math.random() function.
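If it helps, this is how it could plug into the question's setup (a sketch reusing the strands array from the question; note the exclusive max):
var strands = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];
var randomStrand = strands[getRandomInt(0, strands.length)]; // any one strand
var childLength = getRandomInt(4, 7); // 4, 5 or 6 dna strands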

Related

Random IDs in JavaScript

I'm generating random IDs in javascript which serve as unique message identifiers for an analytics suite.
When checking the data (more than 10MM records), there are some minor collisions for some IDs for various reasons (network retries, robots faking data etc), but there is one in particular which has an intriguing number of collisions: akizow-dsrmr3-wicjw1-3jseuy.
The collision rate for the above id is at around 0.0037% while the rate for the other id collisions is under 0.00035% (10 times less) out of a sample of 111MM records from the same day. While the other ids are varying from day to day, this one remains the same, so for a longer period the difference is likely larger than 10x.
This is what the distribution of the top ID collisions looks like: [chart omitted]
This is the algorithm used to generate the random IDs:
function generateUUID() {
    return [
        generateUUID4(), generateUUID4(), generateUUID4(), generateUUID4()
    ].join("-");
}

function generateUUID4() {
    return Math.abs(Math.random() * 0xFFFFFFFF | 0).toString(36);
}
I reversed the algorithm and it seems like for akizow-dsrmr3-wicjw1-3jseuy the browser's Math.random() is returning the following four numbers in this order: 0.1488114111471948, 0.19426893796638328, 0.45768366415465334, 0.0499740378116197, but I don't see anything special about them. Also, from the other data I collected it seems to appear especially after a redirect/preload (e.g. google results, ad clicks etc).
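If the reversal is right, feeding those four values through the same formula should reproduce the colliding ID; a quick sanity check:
var values = [0.1488114111471948, 0.19426893796638328,
              0.45768366415465334, 0.0499740378116197];
var id = values.map(function (r) {
    return Math.abs(r * 0xFFFFFFFF | 0).toString(36);
}).join("-");
console.log(id); // expected: akizow-dsrmr3-wicjw1-3jseuy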
So I have 3 hypotheses:
1. There's a statistical problem with the algorithm that causes this specific collision.
2. Redirects/preloads are somehow messing with the seed of the pseudo-random generator.
3. A robot is smart enough that it fakes all the other data but for some reason keeps the random id the same. The data comes from different user agents, IPs, countries etc.
Any idea what could cause this collision?

3D Grid for multiple shapes

A few months ago I made a small terrain generator, like Minecraft, for a school project.
The way I did this was by using multiple chunks. Each chunk contained a 3-dimensional array that stored the blocks.
Every position in this array corresponded with the position of the block it contained.
blocks[x][y][z] = new Block();
Now I would like to add different sizes of blocks. However, I can't do that with the way I am storing the blocks right now, because bigger blocks would have to be spread over multiple positions in the 3-dimensional array.
An example of a game with different sizes of blocks (and different shapes) is LEGO Worlds. How does a game like this store all these little blocks?
I hope someone can help me with this.
The language I am using is Javascript in combination with WebGL.
Thanks in advance!
In my experience there are a few different ways of tackling an issue like this, but the one I'd recommend depends on the amount of time you have to work on this and the scope (how big) you want to make this game.
Your Current Approach
At the moment I think you're using what most people would consider the most straightforward approach: storing the voxels in a 3D grid.
But two problems you seem to be having are that there isn't an obvious way to create blocks bigger than 1x1, and that a 3D grid for a world space is fairly inefficient in terms of memory usage, since an array has memory allocated for every cell, including empty space (JavaScript is no different).
An Alternative Approach
An alternative to using a 3D array would be to use a different data structure, the full name being a sparse voxel octree.
This, to put it simply, is a tree data structure that works by subdividing a region of space until everything has been stored.
The 2D form of this, where a square subdivides into four smaller quadrants, is called a quadtree; likewise the 3D equivalent, which divides into eight octants, is called an octree. This approach is generally preferable when possible, as it's much more efficient: the tree only occupies more memory when it's absolutely essential, and it can also be packed into a 1D array (technically a 3D array can be too).
A common tactic used with quad/octrees in some block-based games is, when a region of the same kind of voxel fills one larger quadrant of the tree, to simply stop subdividing there, as there's no reason to go deeper if all the data is the same.
The other optimisation, the "sparse" part of the name, is that regions of empty space (air) are simply deleted, since empty space doesn't do anything special and its location can be inferred.
[Figures: a sparse voxel octree, and the Z-order curve used to pack the tree into a 1D array]
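A very rough sketch of what an octree node could look like (field names are my assumptions; size must be a power of two, and the uniform-region collapsing described above is left out for brevity):
class OctreeNode {
    constructor(size) {
        this.size = size;      // edge length of this cubic region
        this.blockType = null; // block type once the region is a single voxel
        this.children = null;  // 8 child nodes once subdivided
    }

    set(x, y, z, type) {
        if (this.size === 1) {        // leaf: a single voxel
            this.blockType = type;
            return;
        }
        if (this.children === null) { // subdivide lazily, only when needed
            this.children = [];
            for (let i = 0; i < 8; i++) {
                this.children.push(new OctreeNode(this.size / 2));
            }
        }
        const half = this.size / 2;
        const index = (x >= half ? 1 : 0) + (y >= half ? 2 : 0) + (z >= half ? 4 : 0);
        this.children[index].set(x % half, y % half, z % half, type);
    }
}

const root = new OctreeNode(16);
root.set(3, 7, 12, 1); // place block type 1 at (3, 7, 12)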
Recommended Approach
Unless you have a few months to complete your game and you're at university, I seriously wouldn't recommend an SVO (though reading up about it could impress any teachers you have). Instead I'd recommend taking the same approach that Minecraft visibly appears to take: if a door is 1x2 but blocks can only be 1x1, just make it two blocks.
In the example of a door you would have four unique blocks in total, two for the upper and lower half, and two variations of each being opened or closed.
E.g.:
var cubeProgram; // shader program
var cubeVBO;     // vertex buffer (I recommend combining vertex & UV coords)
var gl;          // rendering context

// Preset list of block IDs
var BLOCK_TYPES = {
    DOOR_LOWER_OPEN: 0,
    DOOR_UPPER_OPEN: 1,
    DOOR_LOWER_CLOSED: 2,
    DOOR_UPPER_CLOSED: 3
};

var BLOCK_MESHES = {
    GENERIC_VBO: null,
    DOOR_UPPER_VBO: null,
    DOOR_LOWER_VBO: null
};

// Declare a Door class using ES6 syntax
class Door {
    // Assume x & y are the lower half of the door
    constructor(x, y, map) {
        if (y - 1 < 0) { // the upper half would sit at y - 1
            console.error("Error: top half of the door goes outside the map");
            return;
        }
        this.x = x;
        this.y = y;
        map[x][y]     = BLOCK_TYPES.DOOR_LOWER_OPEN;
        map[x][y - 1] = BLOCK_TYPES.DOOR_UPPER_OPEN;
    }
}
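Hypothetical usage (the 16x16 map and the null "empty" marker are my assumptions, not from the answer):
var map = [];
for (var i = 0; i < 16; i++) {
    map.push(new Array(16).fill(null)); // null = empty cell
}
var door = new Door(4, 10, map); // lower half at (4, 10), upper half at (4, 9)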

I need to make my function return a more organic collection of results

Whatever it is I'm doing, I don't know what it's called, but I need help because I know it can be done with math. This is for a simulation I'm building, and the role it plays is very difficult to explain, but it has something to do with defining the properties of an object.
Here is my JavaScript: https://jsfiddle.net/vdocnmzu/
DM.prototype.get = function (coords) {
    var dist, val = 0; // declare val locally instead of leaking a global
    for (var j, i = 0; i < this.distortions.length; i += 1) {
        dist = 0;
        for (j = 0; j < coords.length; j += 1) {
            dist += Math.pow(coords[j] - this.distortions[i].coords[j], 2);
        }
        dist = Math.sqrt(dist); // Euclidean distance to this distortion point
        if (dist <= this.distortions[i].range) {
            val += Math.cos((dist / this.distortions[i].range) * Math.PI / 2) * this.distortions[i].amp;
        }
    }
    return val;
};
What's happening is this: I have this 3D cube, where I can pick x & y, and get Z(the grayscale pixel color). In this sample code, I'm picking a grid of points across the entire x,y plane of the cube. The "bubbles" you see (you may need to refresh a few times) are multiple points being picked and creating that image.
What I'm trying to do is not have bubbles, but rather, organic flows between bubbles.
Right now, the z value comes from these "distortion points" that each of these 3DCubes have. It can have any amount of these points.
These "distortion points" don't have to be points. They can be sets of points, or lines, or any type of base geometry to define the skeleton of some type of distance function.
I think that distance function is what I'm struggling with, because I only know how to do it with points. I feel like lines would still be too rigid. What's the math associated with doing this with curves? Distance to a curve? Are there more approaches to this? If there isn't a single good one to pick, it's okay to have a collection as well.
Your question is very complicated to understand. The overall feeling is that your expectations are too high. Some advanced math might help (feel free to google the buzzwords):
Defining a curve is a very hard problem that challenged the brightest mathematicians in history. From the naive approach of the Greeks, through the calculus of Newton and Leibniz, passing by Euler and Gauss, to the mathematical analysis of Weierstrass, the word curve changed meaning several times. The accepted definition nowadays says that curves are continuous functions into the plane, where continuous is a very special word that has an exact meaning coined in the 19th century (naively, a function without jumps from one value to another). Together with the notion of continuity came the notions of connected, compact, differentiable (and so on) curves, which defined new conditions for special curves. The subject developed into what is now known as topology and mathematical analysis.
Mathematicians usually use definitions to capture a class of ideas that can be brought and thought together. To their surprise, the definition of continuity included some really weird functions as curves: space-filling curves, fractals!!! They called them monsters at the time.
After this introduction, let's go back to your question. You need a geometrical object to calculate distances from a point. Let's avoid weird curves and go from continuous to differentiable. Now it's better. A (connected, compact) differentiable function can be expanded in a Taylor series, for example, which means that all functions of this class can be written as an infinite sum of polynomial functions. In two dimensions, you need to calculate the matrices involved in this expansion (calculus in several variables is a prerequisite). Another step further is truncating this expansion at some degree, let's say 3. Then the general curve in this case is: ax + by + cx^2 + dy^2 + ex^3 + fy^3 + gx^2y + hxy^2 + ixy + j = 0 (a, b, ..., j are free parameters). Oh! This is reasonable, you might think. Well, actually there is a name for this kind of curve: an algebraic curve of degree 3. This is an active research theme in algebraic geometry, which is a very hard field even among mathematicians. Generally speaking, there are milestone theorems about the general behavior of those curves, which involve the singularities and intersection points that are allowed in the general case.
In essence, what you are looking for does not exist, and this is a very hard subject. Your algorithm works with points (really cool pictures, by the way) and you should baby-step it to a straight line. This step already requires you to think about how to calculate the distance between a point and a straight line. This is another subject that was developed in the 19th century, together with mathematical analysis: metric spaces. The straightforward answer is to define the distance between a point and a line as the smallest distance from the point to any point of the line. In this case, it can be shown that the distance is the modulus of the vector that connects the point to the line at a 90 degree angle. But this is just one definition among infinitely many possible ones. To be considered a distance (like the one I just described and the Euclidean distance), a set of axioms needs to be verified. You can have hyperbolic metrics, discrete metrics, metrics that count words and letters, LotsOfFamousPeople metric spaces... the possibilities are infinite.
So, baby steps. Do it with straight lines and the Euclidean minimum-distance metric. Play around with other metrics you find on Google. Understand the axioms and make your own!!! Going to second-degree polynomials is already a big challenge, as you have to understand everything those curves can do (they can really do weird, unexpected stuff) and define a distance to them (metric space).
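To make that first baby step concrete, here is a sketch (the function name is mine; coordinates are arrays, as in the question's code) of the Euclidean distance from a point to a line segment, which could replace the point-distance term in DM.prototype.get:
function distanceToSegment(p, a, b) {
    var abLenSq = 0, dot = 0, i;
    for (i = 0; i < p.length; i += 1) {
        var d = b[i] - a[i];
        abLenSq += d * d;
        dot += (p[i] - a[i]) * d;
    }
    // Project p onto the line through a and b, clamped to the segment
    var t = abLenSq === 0 ? 0 : Math.max(0, Math.min(1, dot / abLenSq));
    var distSq = 0;
    for (i = 0; i < p.length; i += 1) {
        var closest = a[i] + t * (b[i] - a[i]);
        distSq += (p[i] - closest) * (p[i] - closest);
    }
    return Math.sqrt(distSq);
}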
Well, that's it! Good luck with your project. Looks really cool!

Calculating bytes per second (the smooth way)

I am looking for a solution to calculate the transmitted bytes per second of a repeatedly invoked function (below). I do not want to simply divide the transmitted bytes by the overall elapsed time: that is too inaccurate, and after running for a few minutes it becomes unable to display rapid speed changes.
The preset (invoked approximately every 50ms):
function uploadProgress(loaded, total) {
    var bps = ?;
    $('#elem').html(bps + ' bytes per second');
};
How do I obtain the average bytes per second for (only) the last n seconds, and is it a good idea?
What other practices for calculating a non-flickering but precise bps value are available?
Your first idea is not bad: it's called a moving average, and provided you call your update function at regular intervals you only need to keep a queue (a FIFO buffer) of a constant length:
var WINDOW_SIZE = 10;
var queue = [];

function updateQueue(newValue) {
    // FIFO with a fixed length
    queue.push(newValue);
    if (queue.length > WINDOW_SIZE)
        queue.shift();
}

function getAverageValue() {
    // if the queue has less than 10 items, decide if you want to calculate
    // the average anyway, or return an invalid value to indicate "insufficient data"
    if (queue.length < WINDOW_SIZE) {
        // you probably don't want to throw if the queue is empty,
        // but at least consider returning an 'invalid' value in order to
        // display something like "calculating..."
        return null;
    }
    // calculate the average value
    var sum = 0;
    for (var i = 0; i < queue.length; i++) {
        sum += queue[i];
    }
    return sum / queue.length;
}
// calculate the speed and call `updateQueue` every second or so
var updateTimer = setInterval(..., 1000);
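One way the question's uploadProgress could feed that queue (a sketch; the delta bookkeeping is my addition, and it samples on every call where the answer suggests sampling about once per second):
var lastLoaded = 0;
var lastTime = Date.now();

function uploadProgress(loaded, total) {
    var now = Date.now();
    var seconds = (now - lastTime) / 1000; // time since the previous call
    if (seconds > 0) {
        updateQueue((loaded - lastLoaded) / seconds); // bytes/sec sample
        lastLoaded = loaded;
        lastTime = now;
    }
    var bps = getAverageValue();
    $('#elem').html(bps === null ? 'calculating...' : Math.round(bps) + ' bytes per second');
}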
An even simpler way to avoid sudden changes in the calculated speed would be to use a low-pass filter. A simple discrete approximation of the PT1 filter would be:
y[k] = y[k-1] + (u[k] - y[k-1]) / T
where u[k] is the input (or actual value) at sample k, y[k] is the output (or filtered value) at sample k, and T is the time constant (larger T means that y will follow u more slowly).
That would be translated to something like:
var speed = null;
var TIME_CONSTANT = 5;

function updateSpeed(newValue) {
    if (speed === null) {
        speed = newValue;
    } else {
        speed += (newValue - speed) / TIME_CONSTANT;
    }
}

function getFilteredValue() {
    return speed;
}
Both solutions will give similar results (for your purpose at least), and the latter one seems a bit simpler (and needs less memory).
Also, I wouldn't update the value that fast. Filtering will only turn "flickering" into "swinging" at a refresh rate of 50ms. I don't think anybody expects to have an upload speed shown at a refresh rate of more than once per second (or even a couple of seconds).
A simple low-pass filter is ok for just making sure that inaccuracies don't build up. But if you think a little deeper about measuring transfer rates, you get into maintaining separate integer counters to do it right.
If you want it to be an exact count, note that there is a simplification available. First, when dealing with rates, the arithmetic mean is the wrong thing to apply to bytes/sec (sec/byte is more correct, which leads to the harmonic mean). The other problem is that the samples should be weighted. Because of this, simply keeping int64 running totals of bytes versus observation time actually does the right thing, as stupid as it sounds. Normally you would weight each sample by w = 1/n. Look at the neat simplification that happens when you weight by time instead:
(w0*b0/t0 + w1*b1/t1 + w2*b2/t2 + ...) / (w0 + w1 + w2 + ...)
With the weights chosen as wi = ti, the rate terms cancel and the whole thing collapses to totalBytes/totalWeight:
(b0 + b1 + b2 + ...) / (t0 + t1 + t2 + ...) = totalBytes / totalTime
So just keep separate (int64!) totals of bytes and milliseconds. And only divide them as a rendering step to visualize the rate. Note that if you instead used the harmonic mean (which you should do for rates - because you are really averaging sec/byte), then that's the same as the time it takes to send a byte, weighted by how many bytes there were.
1 / ((w0*t0/b0 + w1*t1/b1 + ...) / (w0 + w1 + w2 + ...))
With the weights chosen as wi = bi, this also collapses to
(b0 + b1 + ...) / (t0 + t1 + ...) = totalBytes / totalTime
So the arithmetic mean weighted by time is the same as the harmonic mean weighted by bytes. Just keep a running total of bytes in one var, and time in another. There is a deeper reason that this simplistic count is actually the right one. Think of integrals. Assuming no concurrency, this is literally just total bytes transferred divided by total observation time. Assume that the computer actually takes 1 step per millisecond, and only sends whole bytes, and that you observe the entire time interval without gaps. There are no approximations.
Notice that if you think about an integral with (msec, byte/msec) as the units for (x, y), the area under the curve is the bytes sent during the observation period (exactly). You will get the same answer no matter how the observations get cut up (i.e. reported twice as often).
So by simply reporting (size_byte, start_ms, stop_ms), you just accumulate (stop_ms - start_ms) into the time total and size_byte into the byte total per observation. If you want to partition these rates to graph in minute buckets, then just maintain a (byte, ms) pair per minute (of observation).
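In code, the exact bookkeeping is tiny (a sketch; the names are mine):
var totalBytes = 0;
var totalMs = 0;

function recordTransfer(sizeByte, startMs, stopMs) {
    totalBytes += sizeByte;      // accumulate bytes per observation
    totalMs += stopMs - startMs; // accumulate observed milliseconds
}

function bytesPerSecond() {      // divide only as a rendering step
    return totalMs === 0 ? 0 : (totalBytes / totalMs) * 1000;
}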
Note that these are rates experienced for individual transfers. The individual transfers may experience 1MB/s (user point of view). These are the rates that you guarantee to end users.
You can leave it here for simple cases. But doing this counting right, allows for more interesting things.
From the server point of view, load matters. Presume that there were two users experiencing 1MB/s simultaneously. For that statistic, you need to subtract out the double-counted time. If 2 users do 1MB/s simultaneously for 1s, then that's 2MB/s for 1s. You need to effectively reconstruct time overlaps, and subtract out the double-counting of time periods. Explicitly logging at the end of a transfer (size_byte,start_ms,stop_ms) allows you to measure interesting things:
- The number of outstanding transfers at any given time (queue length distribution, i.e. "am I going to run out of memory?")
- The throughput as a function of the number of transfers (throughput for a given queue length, i.e. "does the website collapse when our ad shows on TV?")
- Utilization, i.e. "are we overpaying our cloud provider?"
In this situation, all of the accumulated counters are exact integer arithmetic. Subtracting out the double-counted time suddenly gets you into more complicated algorithms (when computed efficiently and in real-time).
Use a decaying average, then you won't have to keep the old values around.
UPDATE: Basically it's a formula like this, where factor is between 0 and 1:
average = new_value * factor + average * (1 - factor);
You don't have to keep any old values around; they're all in there at smaller and smaller proportions. You have to choose a value for factor that is appropriate to the mix of new and old values you want, and to how often the average gets updated.
This is how the Unix "load average" is calculated, I believe.
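In JavaScript that could look like this (a sketch; the 0.2 factor is an arbitrary starting point):
var average = 0;  // dominated by real samples after a few updates
var factor = 0.2; // between 0 and 1; larger reacts faster to new values

function updateAverage(newValue) {
    average = newValue * factor + average * (1 - factor);
}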

Connecting Rooms

I've created a simple algorithm for a game I'm working on that creates a cave-like structure. The algorithm outputs a 2-dimensional array of bits that represent the open areas. Example:
000000000000000000000000
010010000000000111100000
011110000000011111111000
011111110000011111111100
011111111001111111111110
011000000000001111000000
000000000000000000000000
(0's represent wall, 1's represent open areas)
The problem is that the algorithm can sometimes create a cave that has 2 non-connected sections (as in the above example). I've written a function that gives me an array of arrays containing the x, y positions of the open spots for each area.
My question is: given a number of lists that contain all of the x, y coordinates for each open area, what is the fastest way to "connect" these areas by a corridor that is a minimum of 2 cells wide?
(I'm writing this in javascript but even just pseudo code will help me out)
I've tried comparing the distances from every point in one area to every point in another area, finding the two points with the closest distance, then cutting out a path between those two points, but this approach is way too slow; I'm hoping there is another way.
Given two caves A and B, choose a point x in A and y in B (chosen at random will do; the two closest or locally closest points are better). Drill a corridor of thickness 2 between A and B (use Bresenham's algorithm). If you have multiple disconnected caves, do the above for each edge (A, B) of the minimal spanning tree of the graph of all the caves (the edge weight is the length of the corridor you'll drill if you choose this edge).
Edit for the edit: to approximate the distance between two caves, you can use hill climbing. It will return the global minimum for convex caves in O(n) rather than the naive O(n^2). For non-convex caves, do multiple iterations of hill climbing with the initial guess chosen at random.
If you need the exactly minimal solution, you can consider first building the frontiers of your caves and then applying an O(nm) algorithm. This eliminates the need to compare distances between interior points of your caves. Then, as soon as you know the distances between each pair of caves, you build the minimal spanning tree and drill your tunnels.
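A rough sketch of the drilling step (the names are mine; the grid is the question's 2D bit array, indexed grid[row][column], and carve opens a second row to get thickness 2):
function drillCorridor(grid, x0, y0, x1, y1) {
    // Bresenham's line algorithm between the two chosen points
    var dx = Math.abs(x1 - x0), dy = Math.abs(y1 - y0);
    var sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1;
    var err = dx - dy;
    while (true) {
        carve(grid, x0, y0);
        if (x0 === x1 && y0 === y1) break;
        var e2 = 2 * err;
        if (e2 > -dy) { err -= dy; x0 += sx; }
        if (e2 < dx) { err += dx; y0 += sy; }
    }
}

function carve(grid, x, y) {
    grid[y][x] = 1;                              // open this cell
    if (y + 1 < grid.length) grid[y + 1][x] = 1; // and the one below, for thickness 2
}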
Since I don't know too much from your description, here are some hints I would consider:
How do you look for the pair of nearest points? Do you use a naive brute-force approach and thus obtain a run time of O(n*n)? Or are you using a more efficient variant taking O(n log n) time?
If you have obtained the closest points, I'd use a simple line-drawing algorithm.
Another approach might be to generate a structure that definitely has only one single connected area. You could do the following: first take a random cell (x, y) and set it to 1. Then traverse all its neighbours, and for each of them randomly set it to 1 or leave it at 0. For each cell set to 1, do the same, i.e. traverse its neighbours and set them randomly to 1 or 0. This guarantees that you won't have two separate areas.
An algorithm to ensure this could be the following (in Python):
import queue
from random import randint

def setCell(x, y, A):
    if x >= len(A) or y >= len(A[0]) or x < 0 or y < 0:
        return
    A[x][y] = 1

def getCell(x, y, A):
    if x >= len(A) or y >= len(A[0]) or x < 0 or y < 0:
        return 1
    return A[x][y]

def generate(height, width):
    A = [[0 for _ in range(width)] for _ in range(height)]
    (x, y) = (randint(0, height - 1), randint(0, width - 1))
    setCell(x, y, A)
    q = queue.Queue()
    q.put((x, y))
    while not q.empty():
        (x, y) = q.get()
        for (nx, ny) in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if randint(0, 8) <= 6:
                if getCell(nx, ny, A) == 0:
                    setCell(nx, ny, A)
                    if randint(0, 2) <= 1:
                        q.put((nx, ny))
    return A

def printField(A):
    for row in A:
        for c in row:
            print(" " if c == 1 else "X", end="")
        print("")
Then printField(generate(20, 30)) does the job. You'll probably have to adjust the random thresholds so it fits your needs.
