JS - How to check if 2 images (their hash) are similar - javascript

GOAL
Find a good way to check whether 2 images are similar by comparing their hash profiles. The hash is a simple array containing 0 and 1 values.
INTRO
I have 2 images. They are the same image but with some little differences: one has a different brightness, rotation and shot.
What I want to do is create a Javascript method to compare the 2 images and calculate a percentage value that tells how much they are similar.
WHAT I'VE DONE
After uploading the 2 images into an HTML5 canvas to get their image data, I've used the pHash algorithm (www.phash.org) to obtain their hash representation.
The hash is an array containing 0 and 1 values that recreates the image in a "simplified" form.
I've also created a JS script that generates an HTML table with black cells where the array contains 1. The result is the following screenshot (the image is a Van Gogh picture):
Screenshot
Now I need to compare the 2 arrays to obtain a percentage value that tells "how much" they are similar.
Most of the JavaScript hash algorithms I've found by googling already include a comparison algorithm: the Hamming distance. It's very simple and fast, but not very precise. In fact, the Hamming distance says that the 2 images in my screenshot are 67% similar.
THE QUESTION
Starting with 2 simple arrays, with the same length, filled with 0 and 1 values: what could be a good algorithm to determine similarity more precisely?
NOTES
- Pure Javascript development, no third party plugins or framework.
- No need for a complex algorithm that finds the right similarity when the 2 images show the same subject but look very different (strong rotation, totally different colors, etc.).
Thanx
PHASH CODE
// Size is the image size (for example 128px)
var pixels = [];
for (var i=0;i<imgData.data.length;i+=4){
var j = (i==0) ? 0 : i/4;
var y = Math.floor(j/size);
var x = j-(y*size);
var pixelPos = x + (y*size);
var r = imgData.data[i];
var g = imgData.data[i+1];
var b = imgData.data[i+2];
var gs = Math.floor((r*0.299)+(g*0.587)+(b*0.114));
pixels[pixelPos] = gs;
}
var avg = Math.floor( array_sum(pixels) / pixels.length );
var hash = [];
array.forEach(pixels, function(px,i){
if(px > avg){
hash[i] = 1;
} else{
hash[i] = 0;
}
});
return hash;
HAMMING DISTANCE CODE
// hash1 and hash2 are the arrays of the "coded" images.
var similarity = hash1.length;
array.forEach(hash1, function(val,key){
if(hash1[key] != hash2[key]){
similarity--;
}
});
var percentage = (similarity/hash1.length*100).toFixed(2);
NOTE: array.forEach is not pure javascript. Consider it as a replace of: for (var i = 0; i < array.length; i++).

I'm using blockhash, it seems pretty good so far, only false positives I get are when half the pictures are of the same background color, which is to be expected =/
http://blockhash.io/
BlockHash may be slower than yours but it should be more accurate.
What you do is calculate the greyscale value of each pixel and compare it to the average to create your hash.
What BlockHash does is split the picture into small rectangles of equal size, average the sum of the RGB values of the pixels inside each of them, and compare those values to 4 horizontal medians.
So it is normal that it takes longer, but it is still pretty efficient and accurate.
I'm doing it with pictures of a good resolution, at minimum 1000x800, and use 16 bits. This gives a 64 character long hexadecimal hash. When using the Hamming distance provided by the same library, I see good results with a similarity threshold of 10.
Your idea of using greyscale isn't bad at all. But you should average out portions of the image instead of comparing each pixel. That way you can compare a thumbnail version to its original, and get pretty much the same phash!
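To illustrate the "average out portions of the image" idea, here is a rough sketch. It is not the actual blockhash algorithm: the name blockMeanHash, the blocks parameter and the single global median threshold are my own assumptions, and it expects a square greyscale array like the pixels array from the question.

```javascript
// Hypothetical helper, not part of blockhash.
// gray: array of size*size greyscale values, blocks: number of blocks per side.
function blockMeanHash(gray, size, blocks) {
  var blockSize = Math.floor(size / blocks);
  var means = [];
  for (var by = 0; by < blocks; by++) {
    for (var bx = 0; bx < blocks; bx++) {
      var sum = 0;
      for (var y = by * blockSize; y < (by + 1) * blockSize; y++) {
        for (var x = bx * blockSize; x < (bx + 1) * blockSize; x++) {
          sum += gray[y * size + x];
        }
      }
      means.push(sum / (blockSize * blockSize));
    }
  }
  // Threshold every block mean against the median of all block means.
  var sorted = means.slice().sort(function (a, b) { return a - b; });
  var mid = sorted.length / 2;
  var median = sorted.length % 2
    ? sorted[Math.floor(mid)]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return means.map(function (m) { return m > median ? 1 : 0; });
}
```

Because each bit depends on a block average rather than on a single pixel, a 4x4 and an 8x8 version of the same half-dark, half-bright picture give the same 2x2 hash.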

I don't know if this can do the trick, but you can just compare the 0 and 1 similarities between arrays :
const arr1 = [1,1,1,1,1,1,1,1,1,1],
arr2 = [0,0,0,0,0,0,0,0,0,0],
arr3 = [0,1,0,1,0,1,0,1,0,1],
arr4 = [1,1,1,0,1,1,1,0,1,1]
const howSimilar = (a1,a2) => {
let similarity = 0
a1.forEach( (elem,index) => {
if(a2[index]==elem) similarity++
})
let percentage = parseInt(similarity/a1.length*100) + "%"
console.log(percentage)
}
howSimilar(arr1,arr2) // 0%
howSimilar(arr1,arr3) // 50%
howSimilar(arr1,arr4) // 80%

Related

How to generate a random distribution to make characters (in a word/phrase) appear more naturally

This is the effect I am trying to achieve:
http://codepen.io/anon/pen/ENzQem
I am trying to generate an effect where the letters of a string get revealed gradually and randomly. See the codepen link above for a demonstration of this.
However, I'm finding it difficult to show each character naturally if I just simply randomly generate numbers for each delay separately.
If I simply do a Math.random() to generate a number independently for each character, sometimes adjacent letters will have similar delay numbers, and as such the effect will look chunky, with two letters side-by-side appearing at the same rate.
This is the naive solution with separate random number generators:
renderSpans(text) {
const textArray = text.split('');
return textArray.map((letter, index) => {
const transitionTime = 2000;
const delay = parseInt(Math.random() * transitionTime, 10);
const styles = {
opacity: this.props.show ? '1' : '0',
transition: `opacity ${transitionTime}ms`,
transitionDelay: `${delay}ms`,
};
return <span style={styles}>{letter}</span>;
});
}
I need an algorithm to generate an array of numbers that I can use as the delay for each of the characters, regardless of the length of the input string.
My first thought is to use a sinusoidal wave of some sort, with a bunch of randomness put in, but I'm not sure about the specifics on this. I am sure there's a much more well-accepted way to generate natural-looking noise in mathematics.
Can someone point me to some well-known algorithms for my use case? Maybe something to do with Perlin noise or the like?
Thanks in advance.
It turns out I was overthinking the problem.
I ended up creating the following function:
const getRandoms = (length, threshold) => {
const tooClose = (a, b) => Math.abs(a - b) < threshold;
const result = [];
let random;
for (let i = 0; i < length; i += 1) {
random = Math.random();
if (i !== 0) {
const prev = result[i - 1];
while (tooClose(random, prev)) {
random = Math.random();
}
}
result.push(random);
}
return result;
};
I originally wrote a version of this function that uses reduce rather than a for-loop, but ultimately decided that a for-loop would be clearer and simpler.
How it works
This function procedurally builds up an array of random numbers, given the required length and a "closeness" threshold. Each of the resulting random numbers will differ from the previous number in the array by at least the threshold.
For the first item in the array, we simply generate a random number and push it in.
For each of the subsequent items in the array, we:
Generate a new random number,
Compare this number to the previous number in the array,
If they are too close (i.e. difference < a threshold value), then generate a new random number and compare again,
If they are not too close, then push it into the results array and continue.
I have found that a threshold of 0.2 seems to work pretty well for my use case.
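For completeness, a sketch of how the random values could be turned into per-character delays. The names getSpacedRandoms and getDelays are made up for this example, and the range check is my own addition: with a threshold of 0.5 or more, a previous value near 0.5 can leave no acceptable candidate, so the retry loop might never finish.

```javascript
// Same retry idea as above, plus a guard against thresholds that can hang.
const getSpacedRandoms = (length, threshold) => {
  if (threshold < 0 || threshold >= 0.5) {
    throw new RangeError('threshold must be in the range [0, 0.5)');
  }
  const result = [];
  for (let i = 0; i < length; i += 1) {
    let random = Math.random();
    // Retry while the candidate is too close to the previous value.
    while (i !== 0 && Math.abs(random - result[i - 1]) < threshold) {
      random = Math.random();
    }
    result.push(random);
  }
  return result;
};

// Scale the [0, 1) values to millisecond delays, one per character.
const getDelays = (text, transitionTime, threshold) =>
  getSpacedRandoms(text.length, threshold).map((r) => Math.round(r * transitionTime));
```

In renderSpans the delay for a letter would then be something like delays[index] instead of a fresh Math.random() call.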

Memory-efficient downsampling (charting) of a growing array

A node process of mine receives a sample point every half a second, and I want to update the history chart of all the sample points I receive.
The chart should be an array which contains the downsampled history of all points from 0 to the current point.
In other words, the maximum length of the array should be l. If I received more sample points than l, I want the chart array to be a downsampled-to-l version of the whole history.
To express it with code:
const CHART_LENGTH = 2048
createChart(CHART_LENGTH)
onReceivePoint = function(p) {
// p can be considered a number
const chart = addPointToChart(p)
// chart is an array representing all the samples received, from 0 to now
console.assert(chart.length <= CHART_LENGTH)
}
I already have a working downsampling function with number arrays:
function downsample (arr, density) {
let i, j, p, _i, _len
const downsampled = []
for (i = _i = 0, _len = arr.length; _i < _len; i = ++_i) {
p = arr[i]
j = ~~(i / arr.length * density)
if (downsampled[j] == null) downsampled[j] = 0
downsampled[j] += Math.abs(arr[i] * density / arr.length)
}
return downsampled
}
One trivial way of doing this would obviously be saving all the points I receive into an array, and apply the downsample function whenever the array grows. This would work, but, since this piece of code would run in a server, possibly for months and months in a row, it would eventually make the supporting array grow so much that the process would go out of memory.
The question is: Is there a way to construct the chart array re-using the previous contents of the chart itself, to avoid maintaining a growing data structure? In other words, is there a constant memory complexity solution to this problem?
Please note that the chart must contain the whole history since sample point #0 at any moment, so charting the last n points would not be acceptable.
The only operation that does not distort the data and that can be used several times is aggregation of an integer number of adjacent samples. You probably want 2.
More specifically: If you find that adding a new sample will exceed the array bounds, do the following: Start at the beginning of the array and average every two subsequent samples. This will halve the array size and you have space to add new samples. Doing so, you should keep track of the current cluster size c (the number of samples that constitute one entry in the array). You start with one. Every reduction multiplies the cluster size by two.
Now the problem is that you cannot add new samples directly to the array any more because they have a completely different scale. Instead, you should average the next c samples to a new entry. It turns out that it is sufficient to store the number of samples n in the current cluster to do this. So if you add a new sample s, you would do the following.
n++
if n = 1
    append s to array
else
    //update the average
    last array element += (s - last array element) / n
if n = c
    n = 0 //start a new cluster
So the memory that you actually need is the following:
the history array with predefined length
the number of elements in the history array
the current cluster size c
the number of elements in the current cluster n
The size of the additional memory does not depend on the total number of samples, hence O(1).
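A possible JavaScript version of this scheme, as a sketch only. I assume the chart length is even (2048 in the question), and I let createChart return the addPointToChart function, which is my own choice and not something the question specifies.

```javascript
// Hypothetical factory: keeps at most maxLength entries, whatever the history length.
function createChart(maxLength) {
  const chart = [];     // the history array
  let clusterSize = 1;  // c: samples represented by one entry
  let inCluster = 0;    // n: samples already merged into the last entry

  return function addPointToChart(s) {
    if (inCluster === 0 && chart.length === maxLength) {
      // Array is full: average adjacent pairs in place, which halves it.
      for (let i = 0; i < maxLength / 2; i++) {
        chart[i] = (chart[2 * i] + chart[2 * i + 1]) / 2;
      }
      chart.length = maxLength / 2;
      clusterSize *= 2;
    }
    inCluster++;
    if (inCluster === 1) {
      chart.push(s);
    } else {
      const last = chart.length - 1;
      chart[last] += (s - chart[last]) / inCluster; // running mean of the cluster
    }
    if (inCluster === clusterSize) inCluster = 0;   // start a new cluster
    return chart;
  };
}
```

The returned array is the same object on every call, so copy it if the caller needs a snapshot.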

Find a specific string in a cdata string using javascript

I've been reading up a bit on using data types in javascript, specifically CData. I have a specific use case with a numeric string I'm running a regex pattern on. It's already fairly performant for what I'm doing, but I'm interested in possibly making it more performant for larger applications.
I am representing multi-dimensional models as a single string of integers (doesn't have to be integers, but that's worked for me so far). I represent empty space with 0, occupied space as 1, and each successive dimensional divide with an integer, beginning with 2 for 2 dimensions.
3 1D:
000
3x3 2D:
00020002000
3x3x3 3D:
00020002000300020002000300020002000
There's a bunch of stuff involved with making the regex pattern, but essentially it looks like this for 2D (this is a super-dumbed-down version for ease):
var gridWidth = //total width of our grid
var columns = //width of our object to place in grid
var rows = //height of our object to place in grid
var grid = "00020002000"; // must be a string, otherwise grid.replace() does not exist
// (0{number of columns})+(([0-2]{difference in width of grid and object})(0{number of columns again)).repeat(number of rows)
var reg = RegExp(("(0{" + columns + "})" + ("([0-2]{" + (gridWidth + 1 - columns) + "})(0{"+columns+"})").repeat(rows-1)) + "");
grid = grid.replace(reg, function(){
//the last 2 arguments (match offset and input string) aren't part of our grouping
var l = arguments.length - 2;
var r = "";
for (var i = 1; i<l; ++i){
if (i%2){
r+= "1".repeat(columns); //repeat prototyped, just repeats string x times
} else {
r+= arguments[i];
}
}
return r;
});
CData integers seem to be somehow more performant than javascript strings from what I'm reading, though I'm not experienced with C or the finer points of higher-level programming. I'm a javascript code monkey - feel free to tell me I'm WAY off base with my train of thought.
So my question is, is it possible to take my grid (which is essentially an integer), turn it into/store it as CData, and run my regex pattern against it somehow in an effort to increase performance processing large numbers of object in a very large grid space?
(side note: I have been able to place 10000 objects of random sizes between 1x1 and 4x4 in a grid using divs in an average of about 14000ms in chrome, so it's performant for basic grid layouts [registers as 0ms sometimes with only a few dozen objects on a small grid.] Handling placing objects more efficiently may inspire greater uses)
"More performant" is all relative to what you're trying to accomplish.
The CData spec you were reading from is a draft. In the meantime, keep it simple - try arrays. The regex seems like a novel idea, but also seems to me that it would be quite difficult to maintain.
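As a rough idea of what "try arrays" could look like for the 2D case: the sketch below stores the grid in a flat Uint8Array (0 = empty, 1 = occupied) and scans for the first free rectangle. The function names and the choice of a typed array are assumptions of mine, not something from the question.

```javascript
// Hypothetical sketch: place a columns x rows object in the first free spot.
function placeObject(grid, gridWidth, gridHeight, columns, rows) {
  for (let y = 0; y + rows <= gridHeight; y++) {
    for (let x = 0; x + columns <= gridWidth; x++) {
      if (isFree(grid, gridWidth, x, y, columns, rows)) {
        for (let dy = 0; dy < rows; dy++) {
          const start = (y + dy) * gridWidth + x;
          grid.fill(1, start, start + columns); // mark one row of the object
        }
        return { x: x, y: y };
      }
    }
  }
  return null; // no room left for this object
}

// True if every cell of the rectangle at (x, y) is still empty.
function isFree(grid, gridWidth, x, y, columns, rows) {
  for (let dy = 0; dy < rows; dy++) {
    for (let dx = 0; dx < columns; dx++) {
      if (grid[(y + dy) * gridWidth + x + dx] !== 0) return false;
    }
  }
  return true;
}
```

A 3x3 grid would be created with new Uint8Array(9). Whether this beats the regex for large grids would have to be measured.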

Better random function in JavaScript

I'm currently making a Conway's Game of Life reproduction in JavaScript and I've noticed that the function Math.random() is always returning a certain pattern. Here's a sample of a randomized result in a 100x100 grid:
Does anyone know how to get better randomized numbers?
ApplyRandom: function() {
var $this = Evolution;
var total = $this.Settings.grid_x * $this.Settings.grid_y;
var range = parseInt(total * ($this.Settings.randomPercentage / 100));
for(var i = 0; i < total; i++) {
$this.Infos.grid[i] = false;
}
for(var i = 0; i < range; i++) {
var random = Math.floor((Math.random() * total) + 1);
$this.Infos.grid[random] = true;
}
$this.PrintGrid();
},
[UPDATE]
I've created a jsFiddle here: http://jsfiddle.net/5Xrs7/1/
[UPDATE]
It seems that Math.random() was OK after all (thanks raina77ow). Sorry folks! :(. If you are interested by the result, here's an updated version of the game: http://jsfiddle.net/sAKFQ/
(But I think there's some bugs left...)
This line in your code...
var position = (y * 10) + x;
... is what's causing this 'non-randomness'. It really should be...
var position = (y * $this.Settings.grid_x) + x;
I suppose 10 was the original size of this grid, that's why it's here. But that's clearly wrong: you should choose your position based on the current size of the grid.
As a sidenote, no offence, but I still consider the algorithm given in @JayC's answer to be superior to yours. And it's quite easy to implement, just change the two loops in the ApplyRandom function to a single one:
var bias = $this.Settings.randomPercentage / 100;
for (var i = 0; i < total; i++) {
$this.Infos.grid[i] = Math.random() < bias;
}
With this change, you will no longer suffer from the side effect of reusing the same numbers in var random = Math.floor((Math.random() * total) + 1); line, which lowered the actual cell fillrate in your original code.
Math.random is a pseudo-random method, that's why you're getting those results. A workaround I often use is to capture the mouse cursor position in order to add some salt to the Math.random results:
Math.random=(function(rand) {
var salt=0;
document.addEventListener('mousemove',function(event) {
salt=event.pageX*event.pageY;
});
return function() { return (rand()+(1/(1+salt)))%1; };
})(Math.random);
It's not completely random, but a bit more ;)
A better solution is probably not to randomly pick points and paint them black, but to go through each and every point, decide what the odds are that it should be filled, and then fill accordingly. (That is, if you want on average a 20 percent chance of a point being filled, generate your random number r and fill when r < 0.2.) I've seen a Life simulator in WebGL and that's kinda what it does to initialize...IIRC.
Edit: Here's another reason to consider alternate methods of painting. Randomly selecting pixels might end up in less work and fewer invocations of your random number generator, which might be a good thing, depending upon what you want. As it is, you seem to have selected a way in which at most some percentage of your pixels will be filled. If you had kept track of the pixels being filled, and chose to fill another pixel if one was already filled, essentially all you'd be doing is shuffling an exact percentage of black pixels among your white pixels. Do it my way, and the number of pixels selected will follow a binomial distribution. Sometimes the percentage filled will be a little more, sometimes a little less. The set of all shufflings is a strict subset of the possibilities generated by this kind of picking (which, also strictly speaking, contains all possibilities for painting the board, just with astronomically low odds of getting most of them). Simply put, randomly choosing for every pixel would allow more variance.
Then again, I could modify the shuffle algorithm to pick a percentage of pixels based upon numbers generated from a binomial probability distribution function with a defined expected/mean value instead of the expected/mean value itself, and I honestly don't know that it'd be any different--at least theoretically--than running the odds for every pixel with the expected/mean value itself. There's a lot that could be done.
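To make the difference concrete, here is a small sketch of both ways of filling. The function names are invented for the example, and the exact-count version uses a Fisher-Yates shuffle, which is my pick for "shuffling" and not something taken from the question's code.

```javascript
// Per-cell fill: every cell is alive with probability `bias`,
// so the number of live cells varies from run to run (binomial).
function fillPerCell(total, bias) {
  var grid = [];
  for (var i = 0; i < total; i++) grid[i] = Math.random() < bias;
  return grid;
}

// Exact-count fill: mark the first `count` cells, then Fisher-Yates shuffle,
// so exactly `count` cells are alive on every run.
function fillExact(total, count) {
  var grid = [];
  for (var i = 0; i < total; i++) grid[i] = i < count;
  for (var j = total - 1; j > 0; j--) {
    var k = Math.floor(Math.random() * (j + 1));
    var tmp = grid[j];
    grid[j] = grid[k];
    grid[k] = tmp;
  }
  return grid;
}
```

For a 100x100 grid at 20 percent, fillExact(10000, 2000) always yields 2000 live cells, while fillPerCell(10000, 0.2) lands somewhere around that number.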
console.log(window.crypto.getRandomValues(new Uint8Array(32))); //returns 32 random bytes
This returns cryptographically strong random bytes: https://developer.mozilla.org/en/docs/Web/API/Crypto/getRandomValues
You can try
JavaScript Crypto Library (BSD license). It is supposed to have a good random number generator. See here an example of usage.
Stanford JavaScript Crypto Library (BSD or GPL license). See documentation for random numbers.
For a discussion of strength of Math.random(), see this question.
The implementation of Math.random probably is based on a linear congruential generator, one weakness of which is that a random number depends on the earlier value, producing predictable patterns like this, depending on the choice of the constants in the algorithm. A famous example of the effect of poor choice of constants can be seen in RANDU.
The Mersenne Twister random number generator does not have this weakness. You can find an implementation of MT in JavaScript for example here: https://gist.github.com/banksean/300494
Update: Seeing your code, you have a problem in the code that renders the grid. This line:
var position = (y * 10) + x;
Should be:
var position = (y * grid_x) + x;
With this fix there is no discernible pattern.
You can use part of the SHA-256 hash of a high-resolution timestamp:
console.log(window.performance.now()); //milliseconds, with a sub-millisecond fractional part
This can be encoded as a string,
then you can get its hash using this: http://geraintluff.github.io/sha256/
salt = parseInt(sha256(previous_salt_string).substring(0, 12), 16);
//48-bit number < 2^53-1
Then, using the function from @nfroidure,
write a gen_salt function first, use the sha256 hash there,
and call gen_salt from the event listener.
You can use sha256(previous_salt) + the mouse coordinates, as a string, to get a randomized hash.

board game win situation - searching algorithm

I'm looking for possibly efficient algorithm to detect "win" situation in a gomoku (five-in-a-row) game, played on a 19x19 board. Win situation happens when one of the players manages to get five and NO MORE than five "stones" in a row (horizontal, diagonal or vertical).
I have the following data easily accessible:
previous moves ("stones") of both players stored in a 2d array (can be also json notation object), with variables "B" and "W" to distinguish the players from each other,
"coordinates" of the incoming move (move.x, move.y),
number of moves each player did
I'm doing it in javascript, but any solution that doesn't use low-level stuff like memory allocation nor higher-level (python) array operations would be good.
I've found a similar question ( Detect winning game in nought and crosses ), but solutions given there only refer to small boards (5x5 etc).
A simple to understand solution without excessive loops (only pseudocode provided, let me know if you need more explanation):
I assume your 2-d array runs like this:
board = [
[...],
[...],
[...],
...
];
I.e. the inner arrays represent the horizontal rows of the board.
I also assume that the array is populated by "b", "w", and "x", representing black pieces, white pieces, and empty squares, respectively.
My solution is somewhat divide-and-conquer, so I've divided it into the 3 cases below. Bear with me, it may seem more complex than simply running multiple nested loops at first, but the concept is easy to understand, read, and with the right approach, quite simple to code.
Horizontal lines
Let's first consider the case of detecting a win situation ONLY if the line is horizontal - this is the easiest. First, join a row into a single string, using something like board[0].join(""). Do this for each row. You end up with an array like this:
rows = [
"bxwwwbx...",
"xxxwbxx...",
"wwbbbbx...",
...
]
Now join THIS array, but inserting an "x" between elements to separate each row: rows.join("x").
Now you have one long string representing your board, and it's simply a matter of applying a regexp to find consecutive "w" or "b" of exactly 5 length: /(^|[^b])b{5}(?!b)|(^|[^w])w{5}(?!w)/.test(superString). (A plain b{5} would also match inside a run of six or more, which does not count as a win here.) If the test returns true you have a win situation. If not, let's move on to vertical lines.
Vertical lines
You want to reuse the above code, so create a function testRows for it. Testing for vertical lines is exactly the same process, but you want to transpose the board, so that rows become columns and columns become rows. Then you apply the same testRows function. Transposing can be done by copying values into a new 2-d array, or by writing a simple getCol function and using that within testRows.
Diagonal lines
Again, we want to reuse the testRows function. A diagonal such as this:
b x x x x
x b x x x
x x b x x
x x x b x
x x x x b
Can be converted to a vertical such as this:
b x x x x
b x x x
b x x
b x
b
By shifting row i by i positions. Now it's a matter of transposing and we are back at testing for horizontals. You'll need to do the same for diagonals that go the other way, but this time shift row i by length - 1 - i positions, or in your case, 18 - i positions.
Functional javascript
As a side note, my solution fits nicely with functional programming, which means that it can be quite easily coded if you have functional programming tools with you, though it's not necessary. I recommend using underscore.js as it's quite likely you'll need basic tools like map, reduce and filter in many different game algorithms. For example, my section on testing horizontal lines can be written in one line of javascript with the use of map:
/(^|[^b])b{5}(?!b)|(^|[^w])w{5}(?!w)/.test(_(board).map(function (row) {return row.join("")}).join("x"));
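Putting the three cases together without any library could look roughly like this. It is a sketch under the same assumptions as above (a square board of "b", "w" and "x"). The helper names transpose, shear and hasWin are mine, and rows are shifted to the right and padded with "x" so that they stay the same length.

```javascript
// Exactly five: the run must not be preceded or followed by the same stone.
var FIVE = /(^|[^b])b{5}(?!b)|(^|[^w])w{5}(?!w)/;

// rows: array of arrays of "b", "w" or "x"
function testRows(rows) {
  return FIVE.test(rows.map(function (row) { return row.join(""); }).join("x"));
}

// Rows become columns and columns become rows.
function transpose(rows) {
  return rows[0].map(function (_, col) {
    return rows.map(function (row) { return row[col]; });
  });
}

// Shift row i to the right by shiftOf(i) cells, padding both sides with "x".
function shear(rows, shiftOf) {
  var n = rows.length;
  return rows.map(function (row, i) {
    var pad = shiftOf(i);
    return new Array(pad).fill("x").concat(row, new Array(n - 1 - pad).fill("x"));
  });
}

function hasWin(board) {
  var n = board.length;
  return testRows(board) ||                                                  // horizontal
    testRows(transpose(board)) ||                                            // vertical
    testRows(transpose(shear(board, function (i) { return n - 1 - i; }))) || // "\" diagonals
    testRows(transpose(shear(board, function (i) { return i; })));           // "/" diagonals
}
```

This scans the whole board on every call. For a 19x19 board that is cheap, but it ignores the fact that only lines through the last move can have changed.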
Even though this is a really old question I want to provide my answer because I took a deeper look into this problem today and solved it in a much (much) more efficient way.
I'm using a bit board, which is used in most of the board games and engines (chess engines) due to the efficiency, to represent my field.
You can do everything you need in this game with bitwise operations.
A bit can just have 2 states (0 and 1) however what we need are 3 states e.g. p1, p2 or empty.
To solve this problem we're going to have 2 boards instead, one for each player.
Another problem is that Gomoku has a lot of fields (19x19) and there is no number type that has that many bits to represent the field.
We will use an array of numbers to represent each line and just use the 19 least significant bits of each number.
Vertical rows
A simplified board of player 1 could look like this
000000
101100
001000
011000
000000
Let's say we want to detect 3 in a row. We take the first 3 rows (0-2) and look at them.
000000
101100
001000
With the & (AND) operator you can check if there is a 1 in every row.
var result = field[player][0] & field[player][1] & field[player][2];
In this case the result will be 0 which means no winner. Lets continue... The next step is to take rows 1-3
101100
001000
011000
Apply the AND again and what we will get is 001000. We don't have to care what number this is, just if it's 0 or not. (result != 0)
Horizontal rows
Ok now we can detect vertical rows. To detect the horizontal rows we need to save another 2 boards, again one for each player. But we need to invert x and y axis. Then we can do the same check again to detect horizontal lines. Your array would then be:
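//[player][hORv][rows], each row stored as one number
var field = [0, 1].map(function () {
    return [0, 1].map(function () { return new Array(19).fill(0); });
});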
Diagonals :)
The trickiest part are the diagonals of course but with a simple trick you can do the same check as above. A simple board:
000000
010000
001000
000100
000000
Basically we do the same as above but before we do that we need to shift the rows. Lets say we're at row 1-3.
010000
001000
000100
The first row stays as it is. Then you shift the second row one to the left and the third 2 to the left.
var r0 = field[0][0][i];
var r1 = field[0][0][i+1] << 1;
var r2 = field[0][0][i+2] << 2;
What you will get is:
010000
010000
010000
Apply AND and you have your win detection. To get the other diagonal direction just do it again, but instead of shifting to the left <<, shift to the right >>
I hope this helps someone.
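The row-by-row checks described above can be folded into one loop. The following is only a sketch: hasLine is a name I made up, each row is assumed to be one integer as described, and it reports len or more stones in a line, so the "no more than five" rule would need an extra check. Horizontal lines are found by calling it on the transposed board, where the diagonal checks are redundant but harmless.

```javascript
// rows: array of integers, one per board row (one bit per column).
// Returns true if `len` stones line up vertically or on either diagonal.
function hasLine(rows, len) {
  for (var i = 0; i + len <= rows.length; i++) {
    var vertical = rows[i];
    var diagLeft = rows[i];
    var diagRight = rows[i];
    for (var k = 1; k < len; k++) {
      vertical &= rows[i + k];          // same column in every row
      diagLeft &= rows[i + k] << k;     // row i+k shifted k to the left
      diagRight &= rows[i + k] >>> k;   // row i+k shifted k to the right
    }
    // Any surviving bit means all `len` rows had a stone on that line.
    if (vertical !== 0 || diagLeft !== 0 || diagRight !== 0) return true;
  }
  return false;
}
```

With 19 columns and shifts of at most 4, everything stays inside the 32 bits that JavaScript's bitwise operators work on.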
untested:
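// search window around the last move; near a board edge, left/top (and right/bottom) must be clamped by the same amount, otherwise the walk leaves the move's diagonal
int left = max(0, move.x-5), right = min(width-1, move.x+5), top = max(0, move.y-5), bottom = min(width-1, move.y+5);
// check the primary diagonal (top-left to bottom-right)
for (int x = left, y = top; x <= right && y <= bottom; x++, y++) {
    // count starts at 1 so that the fifth consecutive stone triggers the win
    for (int count = 1; x <= right && y <= bottom && stones[x][y] == lastPlayer; x++, y++, count++) {
        if (count >= 5) return true;
    }
}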
// check the secondary diagonal (top-right to bottom-left)
// ...
// check the horizontal
// ...
// check the vertical
// ...
return false;
alternatively, if you don't like the nested loops (untested):
// check the primary diagonal (top-left to bottom-right)
int count = 0;
for (int x = left, y = top; x <= right && y <= bottom; x++, y++) {
    // extend the current run or reset it, then test right away so a run ending on the last cell is not missed
    count = stones[x][y] == lastPlayer ? count + 1 : 0;
    if (count >= 5) return true;
}
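Since the question is about JavaScript and asks for exactly five, here is a sketch of a different variant that only looks at the lines through the last move. It counts outward from the move instead of scanning a window, so it is not a translation of the loops above. The name isWinningMove is made up, and stones is assumed to be a square stones[x][y] array holding "B" or "W" for occupied cells.

```javascript
// Returns true if the stone just placed at (x, y) completes exactly five in a row.
function isWinningMove(stones, x, y) {
  var player = stones[x][y];
  var size = stones.length;
  // horizontal, vertical and the two diagonals
  var directions = [[1, 0], [0, 1], [1, 1], [1, -1]];

  // Number of consecutive stones of `player` next to (x, y) in direction (dx, dy).
  function countFrom(dx, dy) {
    var n = 0;
    var cx = x + dx;
    var cy = y + dy;
    while (cx >= 0 && cx < size && cy >= 0 && cy < size && stones[cx][cy] === player) {
      n++;
      cx += dx;
      cy += dy;
    }
    return n;
  }

  return directions.some(function (d) {
    // the placed stone plus the runs on both sides of it
    return 1 + countFrom(d[0], d[1]) + countFrom(-d[0], -d[1]) === 5;
  });
}
```

It would be called once per move, e.g. isWinningMove(stones, move.x, move.y) right after the stone has been stored.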
