Calculating bytes per second (the smooth way) - javascript

I am looking for a solution to calculate the transmitted bytes per second of a repeatedly invoked function (below). Due to its inaccuracy, I do not want to simply divide the transmitted bytes by the elapsed overall time: it resulted in the inability to display rapid speed changes after running for a few minutes.
The preset (invoked approximately every 50ms):
function uploadProgress(loaded, total){
var bps = ?;
$('#elem').html(bps+' bytes per second');
};
How to obtain the average bytes per second for (only) the last n seconds and is it a good idea?
What other practices for calculating a non-flickering but precise bps value are available?

Your first idea is not bad, it's called a moving average, and providing you call your update function in regular intervals you only need to keep a queue (a FIFO buffer) of a constant length:
var WINDOW_SIZE = 10;
var queue = [];
function updateQueue(newValue) {
// fifo with a fixed length
queue.push(newValue);
if (queue.length > WINDOW_SIZE)
queue.shift();
}
function getAverageValue() {
// if the queue has less than 10 items, decide if you want to calculate
// the average anyway, or return an invalid value to indicate "insufficient data"
if (queue.length < WINDOW_SIZE) {
// you probably don't want to throw if the queue is empty,
// but at least consider returning an 'invalid' value in order to
// display something like "calculating..."
return null;
}
// calculate the average value
var sum = 0;
for (var i = 0; i < queue.length; i++) {
sum += queue[i];
}
return sum / queue.length;
}
// calculate the speed and call `updateQueue` every second or so
var updateTimer = setInterval(..., 1000);
An even simpler way to avoid sudden changes in calculated speed would be to use a low-pass filter. A simple discrete approximation of the PT1 filter would be:
Where u[k] is the input (or actual value) at sample k, y[k] is the output (or filtered value) at sample k, and T is the time constant (larger T means that y will follow u more slowly).
That would be translated to something like:
var speed = null;
var TIME_CONSTANT = 5;
function updateSpeed(newValue) {
if (speed === null) {
speed = newValue;
} else {
speed += (newValue - speed) / TIME_CONSTANT;
}
}
function getFilteredValue() {
return speed;
}
Both solutions will give similar results (for your purpose at least), and the latter one seems a bit simpler (and needs less memory).
Also, I wouldn't update the value that fast. Filtering will only turn "flickering" into "swinging" at a refresh rate of 50ms. I don't think anybody expects to have an upload speed shown at a refresh rate of more than once per second (or even a couple of seconds).

A simple low-pass filter is ok for just making sure that inaccuracies don't build up. But if you think a little deeper about measuring transfer rates, you get into maintaining separate integer counters to do it right.
If you want it to be an exact count, note that there is a simplification available. First, when dealing with rates, arithmetic mean of them is the wrong thing to apply to bytes/sec (sec/byte is more correct - which leads to harmonic mean). The other problem is that they should be weighted. Because of this, simply keeping int64 running totals of bytes versus observation time actually does the right thing - as stupid as it sounds. Normally, you are weighting by 1/n for each w. Look at a neat simplification that happens when you weigh by time:
(w0*b0/t0 + w1*b1/t1 + w2*b2/t2 + ...)/(w0+w1+w2+...)
totalBytes/totalWeight
(b0+b1+b2+...)/(w0+w1+w2+...)
So just keep separate (int64!) totals of bytes and milliseconds. And only divide them as a rendering step to visualize the rate. Note that if you instead used the harmonic mean (which you should do for rates - because you are really averaging sec/byte), then that's the same as the time it takes to send a byte, weighted by how many bytes there were.
1 / (( w0*t0/b0 + w1*t1/b0 + ... )/(w0+w1+w2+...)) =
totalBytes/totalTime
So arithmetic mean weighted by time is same as harmonic mean weighted by bytes. Just keep a running total of bytes in one var, and time in another. There is a deeper reason that this simplistic count actually the right one. Think of integrals. Assuming no concurrency, this is literally just total bytes transferred divided by total observation time. Assume that the computer actually takes 1 step per millisecond, and only sends whole bytes - and that you observe the entire time interval without gaps. There are no approximations.
Notice that if you think about an integral with (msec, byte/msec) as the units for (x,y), the area under the curve is the bytes sent during the observation period (exactly). You will get the same answer no matter how the observations got cut up. (ie: reported 2x as often).
So by simply reporting (size_byte, start_ms,stop_ms), you just accumulate (stop_ms-start_ms) into time and accumulate size_byte per observation. If you want to partition these rates to graph in minute buckets, then just maintain the (byte,ms) pair per minute (of observation).
Note that these are rates experienced for individual transfers. The individual transfers may experience 1MB/s (user point of view). These are the rates that you guarantee to end users.
You can leave it here for simple cases. But doing this counting right, allows for more interesting things.
From the server point of view, load matters. Presume that there were two users experiencing 1MB/s simultaneously. For that statistic, you need to subtract out the double-counted time. If 2 users do 1MB/s simultaneously for 1s, then that's 2MB/s for 1s. You need to effectively reconstruct time overlaps, and subtract out the double-counting of time periods. Explicitly logging at the end of a transfer (size_byte,start_ms,stop_ms) allows you to measure interesting things:
The number of outstanding transfers at any given time (queue length distribution - ie: "am I going to run out of memory?")
The throughput as a function of the number of transfers (throughput for a queue length - ie: "does the website collapse when our ad shows on TV?")
Utilization - ie: "are we overpaying our cloud provider?"
In this situation, all of the accumulated counters are exact integer arithmetic. Subtracting out the double-counted time suddenly gets you into more complicated algorithms (when computed efficiently and in real-time).

Use a decaying average, then you won't have to keep the old values around.
UPDATE: Basically it's a formula like this:
average = new_value * factor + average_old * (100 - factor);
You don't have to keep any old values around, they're all in the there at smaller and smaller proportions. You have to choose a value for factor that are appropriate to the mix of new and old values you want, and how often the average gets updated.
This is how the Unix "load average" is calculated I believe.

Related

Is there a reliable way to measure performance of code in JavaScript?

If you look a below code you will see measuring of performance of very simple for loop.
var res = 0;
function runs(){
var a1 = performance.now();
var x = 0;
for(var i=0;i<10**9;i++) {
x++;
}
var a2 = performance.now();
res += (a2-a1);
}
for(var j=0;j<10;j++){
runs();
}
console.log(`=${res/10}`);
Additionally, just for a good measure, this will run 10 times and average results. Now, issue with this is that it is not reliable, it highly depends on your CPU, memory and other programs running on your device.
First time, may run 9s and second 23s, then subsequent call can result 8s.
Is there a way to measure performance regardless of CPU, memory and everything else?
I am after something that will give relative number of FLOPS or any other measure that when you compare two codes you will exactly know that one code executes faster than the other.
For instance for loop with 1005 will always show slower than one with 1000 iterations.
Note: Saying FLOPS is wrong in this context as it means - floating point operations per second.
I would like to exclude time completely, I only need FPOs. Meaning I do not care about seconds, just about reliably knowing that regardless of device if you have same code it will always take let say 2000 FPOs to execute.

Time complexity of splitting strings and sorting [duplicate]

I have gone through Google and Stack Overflow search, but nowhere I was able to find a clear and straightforward explanation for how to calculate time complexity.
What do I know already?
Say for code as simple as the one below:
char h = 'y'; // This will be executed 1 time
int abc = 0; // This will be executed 1 time
Say for a loop like the one below:
for (int i = 0; i < N; i++) {
Console.Write('Hello, World!!');
}
int i=0; This will be executed only once.
The time is actually calculated to i=0 and not the declaration.
i < N; This will be executed N+1 times
i++ This will be executed N times
So the number of operations required by this loop are {1+(N+1)+N} = 2N+2. (But this still may be wrong, as I am not confident about my understanding.)
OK, so these small basic calculations I think I know, but in most cases I have seen the time complexity as O(N), O(n^2), O(log n), O(n!), and many others.
How to find time complexity of an algorithm
You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest (when N is very large) term and can include any simplifying constant factor.
For example, lets see how we simplify 2N + 2 machine instructions to describe this as just O(N).
Why do we remove the two 2s ?
We are interested in the performance of the algorithm as N becomes large.
Consider the two terms 2N and 2.
What is the relative influence of these two terms as N becomes large? Suppose N is a million.
Then the first term is 2 million and the second term is only 2.
For this reason, we drop all but the largest terms for large N.
So, now we have gone from 2N + 2 to 2N.
Traditionally, we are only interested in performance up to constant factors.
This means that we don't really care if there is some constant multiple of difference in performance when N is large. The unit of 2N is not well-defined in the first place anyway. So we can multiply or divide by a constant factor to get to the simplest expression.
So 2N becomes just N.
This is an excellent article: Time complexity of algorithm
The below answer is copied from above (in case the excellent link goes bust)
The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
statement;
Is constant. The running time of the statement will not change in relation to N.
for ( i = 0; i < N; i++ )
statement;
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by N * N.
while ( low <= high ) {
mid = ( low + high ) / 2;
if ( target < list[mid] )
high = mid - 1;
else if ( target > list[mid] )
low = mid + 1;
else break;
}
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
void quicksort (int list[], int left, int right)
{
int pivot = partition (list, left, right);
quicksort(list, left, pivot - 1);
quicksort(list, pivot + 1, right);
}
Is N * log (N). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is described as O ( <type> ) where <type> is the measure. The quicksort algorithm would be described as O (N * log(N )).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex that I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course. ;)
Taken from here - Introduction to Time Complexity of an Algorithm
1. Introduction
In computer science, the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the string representing the input.
2. Big O notation
The time complexity of an algorithm is commonly expressed using big O notation, which excludes coefficients and lower order terms. When expressed this way, the time complexity is said to be described asymptotically, i.e., as the input size goes to infinity.
For example, if the time required by an algorithm on all inputs of size n is at most 5n3 + 3n, the asymptotic time complexity is O(n3). More on that later.
A few more examples:
1 = O(n)
n = O(n2)
log(n) = O(n)
2 n + 1 = O(n)
3. O(1) constant time:
An algorithm is said to run in constant time if it requires the same amount of time regardless of the input size.
Examples:
array: accessing any element
fixed-size stack: push and pop methods
fixed-size queue: enqueue and dequeue methods
4. O(n) linear time
An algorithm is said to run in linear time if its time execution is directly proportional to the input size, i.e. time grows linearly as input size increases.
Consider the following examples. Below I am linearly searching for an element, and this has a time complexity of O(n).
int find = 66;
var numbers = new int[] { 33, 435, 36, 37, 43, 45, 66, 656, 2232 };
for (int i = 0; i < numbers.Length - 1; i++)
{
if(find == numbers[i])
{
return;
}
}
More Examples:
Array: Linear Search, Traversing, Find minimum etc
ArrayList: contains method
Queue: contains method
5. O(log n) logarithmic time:
An algorithm is said to run in logarithmic time if its time execution is proportional to the logarithm of the input size.
Example: Binary Search
Recall the "twenty questions" game - the task is to guess the value of a hidden number in an interval. Each time you make a guess, you are told whether your guess is too high or too low. Twenty questions game implies a strategy that uses your guess number to halve the interval size. This is an example of the general problem-solving method known as binary search.
6. O(n2) quadratic time
An algorithm is said to run in quadratic time if its time execution is proportional to the square of the input size.
Examples:
Bubble Sort
Selection Sort
Insertion Sort
7. Some useful links
Big-O Misconceptions
Determining The Complexity Of Algorithm
Big O Cheat Sheet
Several examples of loop.
O(n) time complexity of a loop is considered as O(n) if the loop variables is incremented / decremented by a constant amount. For example following functions have O(n) time complexity.
// Here c is a positive integer constant
for (int i = 1; i <= n; i += c) {
// some O(1) expressions
}
for (int i = n; i > 0; i -= c) {
// some O(1) expressions
}
O(nc) time complexity of nested loops is equal to the number of times the innermost statement is executed. For example, the following sample loops have O(n2) time complexity
for (int i = 1; i <=n; i += c) {
for (int j = 1; j <=n; j += c) {
// some O(1) expressions
}
}
for (int i = n; i > 0; i += c) {
for (int j = i+1; j <=n; j += c) {
// some O(1) expressions
}
For example, selection sort and insertion sort have O(n2) time complexity.
O(log n) time complexity of a loop is considered as O(log n) if the loop variables is divided / multiplied by a constant amount.
for (int i = 1; i <=n; i *= c) {
// some O(1) expressions
}
for (int i = n; i > 0; i /= c) {
// some O(1) expressions
}
For example, [binary search][3] has _O(log n)_ time complexity.
O(log log n) time complexity of a loop is considered as O(log log n) if the loop variables is reduced / increased exponentially by a constant amount.
// Here c is a constant greater than 1
for (int i = 2; i <=n; i = pow(i, c)) {
// some O(1) expressions
}
//Here fun is sqrt or cuberoot or any other constant root
for (int i = n; i > 0; i = fun(i)) {
// some O(1) expressions
}
One example of time complexity analysis
int fun(int n)
{
for (int i = 1; i <= n; i++)
{
for (int j = 1; j < n; j += i)
{
// Some O(1) task
}
}
}
Analysis:
For i = 1, the inner loop is executed n times.
For i = 2, the inner loop is executed approximately n/2 times.
For i = 3, the inner loop is executed approximately n/3 times.
For i = 4, the inner loop is executed approximately n/4 times.
…………………………………………………….
For i = n, the inner loop is executed approximately n/n times.
So the total time complexity of the above algorithm is (n + n/2 + n/3 + … + n/n), which becomes n * (1/1 + 1/2 + 1/3 + … + 1/n)
The important thing about series (1/1 + 1/2 + 1/3 + … + 1/n) is around to O(log n). So the time complexity of the above code is O(n·log n).
References:
1
2
3
Time complexity with examples
1 - Basic operations (arithmetic, comparisons, accessing array’s elements, assignment): The running time is always constant O(1)
Example:
read(x) // O(1)
a = 10; // O(1)
a = 1,000,000,000,000,000,000 // O(1)
2 - If then else statement: Only taking the maximum running time from two or more possible statements.
Example:
age = read(x) // (1+1) = 2
if age < 17 then begin // 1
status = "Not allowed!"; // 1
end else begin
status = "Welcome! Please come in"; // 1
visitors = visitors + 1; // 1+1 = 2
end;
So, the complexity of the above pseudo code is T(n) = 2 + 1 + max(1, 1+2) = 6. Thus, its big oh is still constant T(n) = O(1).
3 - Looping (for, while, repeat): Running time for this statement is the number of loops multiplied by the number of operations inside that looping.
Example:
total = 0; // 1
for i = 1 to n do begin // (1+1)*n = 2n
total = total + i; // (1+1)*n = 2n
end;
writeln(total); // 1
So, its complexity is T(n) = 1+4n+1 = 4n + 2. Thus, T(n) = O(n).
4 - Nested loop (looping inside looping): Since there is at least one looping inside the main looping, running time of this statement used O(n^2) or O(n^3).
Example:
for i = 1 to n do begin // (1+1)*n = 2n
for j = 1 to n do begin // (1+1)n*n = 2n^2
x = x + 1; // (1+1)n*n = 2n^2
print(x); // (n*n) = n^2
end;
end;
Common running time
There are some common running times when analyzing an algorithm:
O(1) – Constant time
Constant time means the running time is constant, it’s not affected by the input size.
O(n) – Linear time
When an algorithm accepts n input size, it would perform n operations as well.
O(log n) – Logarithmic time
Algorithm that has running time O(log n) is slight faster than O(n). Commonly, algorithm divides the problem into sub problems with the same size. Example: binary search algorithm, binary conversion algorithm.
O(n log n) – Linearithmic time
This running time is often found in "divide & conquer algorithms" which divide the problem into sub problems recursively and then merge them in n time. Example: Merge Sort algorithm.
O(n2) – Quadratic time
Look Bubble Sort algorithm!
O(n3) – Cubic time
It has the same principle with O(n2).
O(2n) – Exponential time
It is very slow as input get larger, if n = 1,000,000, T(n) would be 21,000,000. Brute Force algorithm has this running time.
O(n!) – Factorial time
The slowest!!! Example: Travelling salesman problem (TSP)
It is taken from this article. It is very well explained and you should give it a read.
When you're analyzing code, you have to analyse it line by line, counting every operation/recognizing time complexity. In the end, you have to sum it to get whole picture.
For example, you can have one simple loop with linear complexity, but later in that same program you can have a triple loop that has cubic complexity, so your program will have cubic complexity. Function order of growth comes into play right here.
Let's look at what are possibilities for time complexity of an algorithm, you can see order of growth I mentioned above:
Constant time has an order of growth 1, for example: a = b + c.
Logarithmic time has an order of growth log N. It usually occurs when you're dividing something in half (binary search, trees, and even loops), or multiplying something in same way.
Linear. The order of growth is N, for example
int p = 0;
for (int i = 1; i < N; i++)
p = p + 2;
Linearithmic. The order of growth is n·log N. It usually occurs in divide-and-conquer algorithms.
Cubic. The order of growth is N3. A classic example is a triple loop where you check all triplets:
int x = 0;
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
x = x + 2
Exponential. The order of growth is 2N. It usually occurs when you do exhaustive search, for example, check subsets of some set.
Loosely speaking, time complexity is a way of summarising how the number of operations or run-time of an algorithm grows as the input size increases.
Like most things in life, a cocktail party can help us understand.
O(N)
When you arrive at the party, you have to shake everyone's hand (do an operation on every item). As the number of attendees N increases, the time/work it will take you to shake everyone's hand increases as O(N).
Why O(N) and not cN?
There's variation in the amount of time it takes to shake hands with people. You could average this out and capture it in a constant c. But the fundamental operation here --- shaking hands with everyone --- would always be proportional to O(N), no matter what c was. When debating whether we should go to a cocktail party, we're often more interested in the fact that we'll have to meet everyone than in the minute details of what those meetings look like.
O(N^2)
The host of the cocktail party wants you to play a silly game where everyone meets everyone else. Therefore, you must meet N-1 other people and, because the next person has already met you, they must meet N-2 people, and so on. The sum of this series is x^2/2+x/2. As the number of attendees grows, the x^2 term gets big fast, so we just drop everything else.
O(N^3)
You have to meet everyone else and, during each meeting, you must talk about everyone else in the room.
O(1)
The host wants to announce something. They ding a wineglass and speak loudly. Everyone hears them. It turns out it doesn't matter how many attendees there are, this operation always takes the same amount of time.
O(log N)
The host has laid everyone out at the table in alphabetical order. Where is Dan? You reason that he must be somewhere between Adam and Mandy (certainly not between Mandy and Zach!). Given that, is he between George and Mandy? No. He must be between Adam and Fred, and between Cindy and Fred. And so on... we can efficiently locate Dan by looking at half the set and then half of that set. Ultimately, we look at O(log_2 N) individuals.
O(N log N)
You could find where to sit down at the table using the algorithm above. If a large number of people came to the table, one at a time, and all did this, that would take O(N log N) time. This turns out to be how long it takes to sort any collection of items when they must be compared.
Best/Worst Case
You arrive at the party and need to find Inigo - how long will it take? It depends on when you arrive. If everyone is milling around you've hit the worst-case: it will take O(N) time. However, if everyone is sitting down at the table, it will take only O(log N) time. Or maybe you can leverage the host's wineglass-shouting power and it will take only O(1) time.
Assuming the host is unavailable, we can say that the Inigo-finding algorithm has a lower-bound of O(log N) and an upper-bound of O(N), depending on the state of the party when you arrive.
Space & Communication
The same ideas can be applied to understanding how algorithms use space or communication.
Knuth has written a nice paper about the former entitled "The Complexity of Songs".
Theorem 2: There exist arbitrarily long songs of complexity O(1).
PROOF: (due to Casey and the Sunshine Band). Consider the songs Sk defined by (15), but with
V_k = 'That's the way,' U 'I like it, ' U
U = 'uh huh,' 'uh huh'
for all k.
For the mathematically-minded people: The master theorem is another useful thing to know when studying complexity.
O(n) is big O notation used for writing time complexity of an algorithm. When you add up the number of executions in an algorithm, you'll get an expression in result like 2N+2. In this expression, N is the dominating term (the term having largest effect on expression if its value increases or decreases). Now O(N) is the time complexity while N is dominating term.
Example
For i = 1 to n;
j = 0;
while(j <= n);
j = j + 1;
Here the total number of executions for the inner loop are n+1 and the total number of executions for the outer loop are n(n+1)/2, so the total number of executions for the whole algorithm are n + 1 + n(n+1/2) = (n2 + 3n)/2.
Here n^2 is the dominating term so the time complexity for this algorithm is O(n2).
Other answers concentrate on the big-O-notation and practical examples. I want to answer the question by emphasizing the theoretical view. The explanation below is necessarily lacking in details; an excellent source to learn computational complexity theory is Introduction to the Theory of Computation by Michael Sipser.
Turing Machines
The most widespread model to investigate any question about computation is a Turing machine. A Turing machine has a one dimensional tape consisting of symbols which is used as a memory device. It has a tapehead which is used to write and read from the tape. It has a transition table determining the machine's behaviour, which is a fixed hardware component that is decided when the machine is created. A Turing machine works at discrete time steps doing the following:
It reads the symbol under the tapehead.
Depending on the symbol and its internal state, which can only take finitely many values, it reads three values s, σ, and X from its transition table, where s is an internal state, σ is a symbol, and X is either Right or Left.
It changes its internal state to s.
It changes the symbol it has read to σ.
It moves the tapehead one step according to the direction in X.
Turing machines are powerful models of computation. They can do everything that your digital computer can do. They were introduced before the advent of digital modern computers by the father of theoretical computer science and mathematician: Alan Turing.
Time Complexity
It is hard to define the time complexity of a single problem like "Does white have a winning strategy in chess?" because there is a machine which runs for a single step giving the correct answer: Either the machine which says directly 'No' or directly 'Yes'. To make it work we instead define the time complexity of a family of problems L each of which has a size, usually the length of the problem description. Then we take a Turing machine M which correctly solves every problem in that family. When M is given a problem of this family of size n, it solves it in finitely many steps. Let us call f(n) the longest possible time it takes M to solve problems of size n. Then we say that the time complexity of L is O(f(n)), which means that there is a Turing machine which will solve an instance of it of size n in at most C.f(n) time where C is a constant independent of n.
Isn't it dependent on the machines? Can digital computers do it faster?
Yes! Some problems can be solved faster by other models of computation, for example two tape Turing machines solve some problems faster than those with a single tape. This is why theoreticians prefer to use robust complexity classes such as NL, P, NP, PSPACE, EXPTIME, etc. For example, P is the class of decision problems whose time complexity is O(p(n)) where p is a polynomial. The class P do not change even if you add ten thousand tapes to your Turing machine, or use other types of theoretical models such as random access machines.
A Difference in Theory and Practice
It is usually assumed that the time complexity of integer addition is O(1). This assumption makes sense in practice because computers use a fixed number of bits to store numbers for many applications. There is no reason to assume such a thing in theory, so time complexity of addition is O(k) where k is the number of bits needed to express the integer.
Finding The Time Complexity of a Class of Problems
The straightforward way to show the time complexity of a problem is O(f(n)) is to construct a Turing machine which solves it in O(f(n)) time. Creating Turing machines for complex problems is not trivial; one needs some familiarity with them. A transition table for a Turing machine is rarely given, and it is described in high level. It becomes easier to see how long it will take a machine to halt as one gets themselves familiar with them.
Showing that a problem is not O(f(n)) time complexity is another story... Even though there are some results like the time hierarchy theorem, there are many open problems here. For example whether problems in NP are in P, i.e. solvable in polynomial time, is one of the seven millennium prize problems in mathematics, whose solver will be awarded 1 million dollars.

How to make random mutations

I'm doing a project on natural selection for cells for fun. Each code has "dna" which is just a set of instructions. The dna can either have REMOVE WASTE, DIGEST FOOD, or REPAIR WALL. I won't really go into detail what they do, because that would take too long. But the only reason evolution really happens is through genetic mutations. I'm wondering if this is possible in javascript, and how to do it. For example, the starting cell has 5 dna strands. But if it reproduces, the child can have 4, or 6. And some of the dna strands can be altered. This is my code so far:
var strands = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];
var dna = [];
for (let i = 0; i < 5; i++) {
if (parent) {
// something about the parents dna, and the mutation chance
}
else {
dna.push(strands[Math.floor(Math.random() * 3)]); // if cell doesn't have parent
}
}
I'm just wondering if this is possible in javascript, and how to succesfully do it. Sorry if the question isn't too clear.
Edit: Let me rephrase a little. What I'm trying to achieve is a genetic mutation in the new cell. Like:
if (parent) {
dna.push(parent);
if (Math.random() < 0.5) {
changeStrand(num);
}
if (Math.random() < 0.5) {
addStrand(num);
}
if (Math.random() < 0.5) {
removeStrand(num);
}
}
function changeStrand() {
// change the strand
}
function newStrand(num) {
// add random strands
}
function removeStrand(num) {
// remove random strands
}
or something like that
For a genetic algorithm, you basically want to take two slices from each parent and stitch them together, whilst ensuring the end result is still a valid dna strand.
For a fixed sized DNA sequence (such a N queens positions), the technique would be to pick a random slice point (1-3 | 4-8) and then combine these slices from the parents to create a child.
For your usecase, you need two random slices who sum of sizes adds upto 4-6. So possibly two slices of size 2-3. You could potentially take one from the front, and the other from the back. Else you could first pick a random output size, and then fill it will two random sequences for either parent.
Array.slice() and Array.splice() are probably the functions you want to use.
You can also add in a random mutation to the end result. Viruses at the speed limit of viable genetic evolution have an average of 1 mutation per transcription. Which means some transcriptions won’t have mutations, which is equivalent to allowing some of the parents in the parent generation to survive.
You can also experiment with different variations. Implement these as feature flags, and see what works best in practice.
Also compare with Beam Search, which essentially keeps a copy of the N best results from each generation. You may or may not want to keep the best from the parent generation to survive unmutated.
Another idea is to compute a distance metric between individuals, and add a cost for being too close to an existing member of the population, and this will select for genetic diversity.
In the standard model, variation occurs both by point mutations in the letter sequence and by “crossover” (in which the DNA of an offspring is generated by combining long sections of DNA from each parent).
The analogy to local search algorithms has already been described; the principal difference between stochastic beam search and evolution is the use of sexual reproduction, wherein successors are generated from multiple organisms rather than just one. The actual mechanisms of evolution are, however, far richer than most genetic algorithms allow. For example, mutations can involve reversals, duplications, and movement of large chunks of DNA; some viruses borrow DNA from one organism and insert it in another, and there are transposable genes that do nothing but copy themselves many thousands of times within the genome. There are even genes that poison cells from potential mates that do not carry the gene, thereby increasing their own chances of replication. Most important is the fact that the genes themselves encode the mechanisms whereby the genome is reproduced and translated into an organism. In genetic algorithms, those mechanisms are a separate program that is not represented within the strings being manipulated.
Artificial Intelligence: A Modern Approach. (Third edition) by Stuart Russell and Peter Norvig.
If you want to have random numbers you can use Math.random() for that. In the linked page there are also some examples for getting values between x and y for example:
// Source https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
function getRandomInt(min, max) {
min = Math.ceil(min);
max = Math.floor(max);
return Math.floor(Math.random() * (max - min)) + min;
}
I am not sure this is what you try to achieve - since you already make use of the Math.random() function.

A puzzle about time complexity calculation and real time consumption

While learning the basic knowledge of algorithm, I find puzzle about the time complexity calculation and the real time consumption when run the codes.
The demo codes specify the problem.
function calcDemo1(){
var init = 0;
for(var i=0;i<40;i++){
init += 0;
}
return init;
}
function calcDemo2(){
var init = 0;
init += (0+1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16+17+18+19+20+21+22+23+24+25+26+27+28+29+30+31+32+33+34+35+36+37+38+39);
return init;
}
Does calcDemo1's time complexity is O(1) even if it's a "for loop"?
In case their time complexity were both O(1), do they take the same amount of time in the worst-case scenario when run the code?
The relative question is here
Both of them have constant time complexity. O(1) time complexity.
For case-1 there is an for loop but it runs 40 times. So it will be of constant time complexity.
In the second case, no for loop is there, but it is still contant time addition. So it is O(1) again.
It doesn't mean that if there is for loop it's complexity can't be constant.
As reply to the comment, yes even if we increase the hardcoded value then the time complexity won't change. It will be still O(1).
Constant time complexity O(1) means that the execution takes the same amount of time, no matter how big the input.
Linear time complexity O(n) means that execution takes longer to the same degree that the input size increases.
Your examples
It depends on what you define as your input. In my analysis below I am assuming that your input is the number of times to loop/add a number (40). If there is no input at all, then any code will just be of constant time complexity O(1).
calcDemo1 most likely has linear complexity because the javascript compiler isn't smart enough to figure out that it could just skip the loop. It will actually increase i 40 times and then add 0 to init 40 times (or at least check if it actually has to add anything). Therefore, each rotation of the loop actually takes some time and, say, looping 4000 times would 100 times longer than 40 times.
calcDemo2 also has linear complexity O(n): Adding 1 million numbers should take roughly 1000 times as long as just adding 1000 numbers.

Do I got number of operations per second in this way?

Look at this code:
function wait(time) {
let i = 0;
let a = Date.now();
let x = a + (time || 0);
let b;
while ((b = Date.now()) <= x) ++i;
return i;
}
If I run it in browser (particularly Google Chrome, but I don't think it matters) in the way like wait(1000), the machine will obviously freeze for a second and then return recalculated value of i.
Let it be 10 000 000 (I'm getting values close to this one). This value varies every time, so lets take an average number.
Did I just got current number of operations per second of the processor in my machine?
Not at all.
What you get is the number of loop cycles completed by the Javascript process in a certain time. Each loop cycle consists of:
Creating a new Date object
Comparing two Date objects
Incrementing a Number
Incrementing the Number variable i is probably the least expensive of these, so the function is not really reporting how much it takes to make the increment.
Aside from that, note that the machine is doing a lot more than running a Javascript process. You will see interference from all sorts of activity going on in the computer at the same time.
When running inside a Javascript process, you're simply too far away from the processor (in terms of software layers) to make that measurement. Beneath Javascript, there's the browser and the operating system, each of which can (and will) make decisions that affect this result.
No. You can get the number of language operations per second, though the actual number of machine operations per second on a whole processor is more complicated.
Firstly the processor is not wholly dedicated to the browser, so it is actually likely switching back and forth between prioritized processes. On top of that memory access is obscured and the processor uses extra operations to manage memory (page flushing, etc.) and this is not gonna be very transparent to you at a given time. On top of that physical properties means that the real clock rate of the processor is dynamic... You can see it's pretty complicated already ;)
To really calculate the number of machine operations per second you need to measure the clock rate of the processor and multiply it by the number of instructions per cycle the processor can perform. Again this varies, but really the manufacturer specs will likely be good enough of an estimate :P.
If you wanted to use a program to measure this, you'd need to somehow dedicate 100% of the processor to your program and have it run a predictable set of instructions with no other hangups (like memory management). Then you need to include the number of instructions it takes to load the program instructions into the code caches. This is not really feasible however.
As others have pointed out, this will not help you determine the number of operations the processor does per second due to the factors that prior answers have pointed out. I do however think that a similar experiment could be set up to estimate the number of operations to be executed by your JavaScript interpreter running on your browser. For example given a function: factorial(n) an operation that runs in O(n). You could execute an operation such as factorial(100) repeatedly over the course of a minute.
function test(){
let start = Date.now();
let end = start + 60 * 1000;
let numberOfExecutions = 0;
while(Date.now() < end){
factorial(100);
numberOfExecutions++;
}
return numberOfExecutions/(60 * 100);
}
The idea here is that factorial is by far the most time consuming function in the code. And since factorial runs in O(n) we know factorial(100) is approximately 100 operations. Note that this will not be exact and that larger numbers will make for better approximations. Also remember that this will estimate the number of operations executed by your interpreter and not your processor.
There is a lot of truth to all previous comments, but I want to invert the reasoning a little bit because I do believe it is easier to understand it like that.
I believe that the fairest way to calculate it is with the most basic loop, and not relying on any dates or functions, and instead calculate the values later.
You will see that the smaller the function, the bigger the initial overload is. That means it takes a small amount of time to start and finish each function, but at a certain point they all start reaching a number that can clearly be seen as close-enough to be considered how many operations per second can JavaScript run.
My example:
const oneMillion = 1_000_000;
const tenMillion = 10_000_000;
const oneHundredMillion = 100_000_000;
const oneBillion = 1_000_000_000;
const tenBillion = 10_000_000_000;
const oneHundredBillion = 100_000_000_000;
const oneTrillion = 1_000_000_000_000;
function runABunchOfTimes(times) {
console.time('timer')
for (let i = 0; i < times; ++i) {}
console.timeEnd('timer')
}
I've tried on a machine that has a lot of load already on it with many processes running, 2020 macbook, these were my results:
at the very end I am taking the time the console showed me it took to run, and I divided the number of runs by it. The oneTrillion and oneBillion runs are virtually the same, however when it goes to oneMillion and 1000 you can see that they are not as performant due to the initial load of creating the for loop in the first place.
We usually try to sway away from O(n^2) and slower functions exactly because we do not want to reach for that maximum. If you were to perform a find inside of a map for an array with all cities in the world (around 10_000 according to google, I haven't counted) we would already each 100_000_000 iterations, and they would certainly not be as simple as just iterating through nothing like in my example. Your code then would take minutes to run, but I am sure you are aware of this and that is why you posted the question in the first place.
Calculating how long it would take is tricky not only because of the above, but also because you cannot predict which device will run your function. Nowadays I can open in my TV, my watch, a raspberry py and none of them would be nearly as fast as the computer I am running from when creating these functions. Sure. But if I were to try to benchmark a device I would use something like the function above since it is the simplest loop operation I could think of.

Categories

Resources