Find a specific string in a CData string using JavaScript

I've been reading up a bit on using data types in JavaScript, specifically CData. I have a specific use case: a numeric string I'm running a regex pattern on. It's already fairly performant for what I'm doing, but I'm interested in possibly making it faster for larger applications.
I am representing multi-dimensional models as a single string of integers (it doesn't have to be integers, but that's worked for me so far). I represent empty space with 0, occupied space with 1, and each successive dimensional divide with an integer, beginning with 2 for 2 dimensions.
3 cells, 1D:
000
3x3 2D:
00020002000
3x3x3 3D:
00020002000300020002000300020002000
There's a bunch of stuff involved with making the regex pattern, but essentially it looks like this for 2D (this is a super-dumbed-down version for ease):
var gridWidth = 3; // total width of our grid (example value)
var columns = 2;   // width of our object to place in the grid (example value)
var rows = 2;      // height of our object to place in the grid (example value)
var grid = "00020002000"; // the grid must be a string, so quote it
// (0{number of columns})(([0-2]{difference in grid and object width})(0{number of columns})).repeat(number of rows - 1)
var reg = RegExp("(0{" + columns + "})" + ("([0-2]{" + (gridWidth + 1 - columns) + "})(0{" + columns + "})").repeat(rows - 1));
grid = grid.replace(reg, function(){
    // the last 2 arguments (offset and full string) aren't part of our grouping
    var l = arguments.length - 2;
    var r = "";
    for (var i = 1; i < l; ++i){
        if (i % 2){
            r += "1".repeat(columns); // String.prototype.repeat: repeats the string x times
        } else {
            r += arguments[i];
        }
    }
    return r;
});
CData integers seem to be somehow more performant than JavaScript strings from what I'm reading, though I'm not experienced with C or the finer points of lower-level programming. I'm a JavaScript code monkey - feel free to tell me I'm WAY off base with my train of thought.
So my question is: is it possible to take my grid (which is essentially an integer), turn it into/store it as CData, and run my regex pattern against it somehow, in an effort to increase performance when processing large numbers of objects in a very large grid space?
(Side note: I have been able to place 10000 objects of random sizes between 1x1 and 4x4 in a grid using divs in an average of about 14000ms in Chrome, so it's performant for basic grid layouts [it sometimes registers as 0ms with only a few dozen objects on a small grid]. Handling object placement more efficiently may inspire greater uses.)

"More performant" is all relative to what you're trying to accomplish.
The CData spec you were reading is a draft. In the meantime, keep it simple - try arrays. The regex seems like a novel idea, but it also seems like it would be quite difficult to maintain.
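To make the "try arrays" suggestion concrete, here is a minimal sketch of the same grid held in a flat typed array and scanned with plain index arithmetic instead of a regex (the helper names canPlace/place and the sample sizes are assumptions, not part of the original code):

// Sketch: grid as a flat Uint8Array (0 = empty, 1 = occupied).
// Dimension separators become unnecessary: width and height are plain numbers.
var gridWidth = 3, gridHeight = 3;
var grid = new Uint8Array(gridWidth * gridHeight); // all zeroes = all empty

// Hypothetical helper: can an object of columns x rows cells sit at (x, y)?
function canPlace(grid, gridWidth, x, y, columns, rows) {
    for (var r = 0; r < rows; ++r) {
        for (var c = 0; c < columns; ++c) {
            if (grid[(y + r) * gridWidth + (x + c)]) return false; // occupied
        }
    }
    return true;
}

// Hypothetical helper: mark those cells as occupied.
function place(grid, gridWidth, x, y, columns, rows) {
    for (var r = 0; r < rows; ++r) {
        for (var c = 0; c < columns; ++c) {
            grid[(y + r) * gridWidth + (x + c)] = 1;
        }
    }
}

if (canPlace(grid, gridWidth, 0, 0, 2, 2)) place(grid, gridWidth, 0, 0, 2, 2);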

Related

Efficient way to compute the median of an array of canvases in JavaScript

I have an array of N HTMLCanvasElements that come from N frames of a video, and I want to compute the "median canvas" in the sense that every component (r, g, b, opacity) of every pixel is the median of the corresponding component in all the canvases.
The video frames are 1280x720, so the pixel data for every canvas (obtained with canvas.getContext('2d').getImageData(0, 0, canvas.width, canvas.height).data) is a Uint8ClampedArray of length 3,686,400.
The naive way to compute the median is to:
1. prepare a result Uint8ClampedArray of length 3,686,400
2. prepare a temporary Uint8ClampedArray of length N
3. loop from 0 to 3,686,399:
   a) loop over the N canvases to fill the temporary array
   b) compute the median of the array
   c) store the median in the result array
But it's very slow, even for 4 canvases.
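In code, the naive approach looks roughly like this (a sketch: canvases is assumed to be the array of N HTMLCanvasElements, and median() is whatever median routine you plug in):

var width = 1280, height = 720;
var length = width * height * 4; // 3,686,400 components

// Grab each canvas's pixel data once, up front.
var buffers = canvases.map(function(canvas) {
    return canvas.getContext('2d').getImageData(0, 0, width, height).data;
});

var result = new Uint8ClampedArray(length);          // 1. result array
var temp = new Uint8ClampedArray(buffers.length);    // 2. temporary array, one slot per canvas

for (var i = 0; i < length; ++i) {                   // 3. loop over every component
    for (var j = 0; j < buffers.length; ++j) {
        temp[j] = buffers[j][i];                     //    a) fill the temporary array
    }
    result[i] = median(temp);                        //    b) + c) compute and store the median
}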
Is there an efficient way (or existing code) to do that? My question is very similar to Find median of list of images, but I need to do this in JavaScript, not Python.
Note: for b), I use d3.median(), which doesn't work on typed arrays as far as I understand, so it implies converting to plain numbers, then back to Uint8ClampedArray values.
Note 2: I don't know much about GLSL shaders, but maybe using the GPU would be a way to get faster results. It would require passing data from the CPU to the GPU, though, which takes time if done repeatedly.
Note 3: the naive solution is there: https://observablehq.com/#severo/compute-the-approximate-median-image-of-a-video
You wrote
I use d3.median() which doesn't work on typed arrays…
Although that is not exactly true, it points in the right direction. Internally d3.median() uses the d3.quantile() method, which starts off like this:
export default function quantile(values, p, valueof) {
  values = Float64Array.from(numbers(values, valueof));
As you can see, this does in fact make use of typed arrays, just not your Uint8ClampedArray but a Float64Array instead. Because floating-point arithmetic is much more computation-intensive than its integer counterpart (including the conversion itself), this has a dramatic effect on the performance of your code. Doing this some 3 million times in a tight loop kills the efficiency of your solution.
Since you are retrieving all your pixel values from a Uint8ClampedArray you can be sure that you are always dealing with integers, though. That said, it is fairly easy to build a custom function median(values) derived from d3.median() and d3.quantile():
function median(values) {
  // No conversion to floating-point values needed.
  var n = values.length;
  if (!n) return;
  if (n < 2) return d3.min(values);
  var i = (n - 1) * 0.5,
      i0 = Math.floor(i),
      value0 = d3.max(d3.quickselect(values, i0).subarray(0, i0 + 1)),
      value1 = d3.min(values.subarray(i0 + 1));
  return value0 + (value1 - value0) * (i - i0);
}
On top of getting rid of the problematic conversion on the first line, this implementation applies some further micro-optimizations, because in your case you are always looking for the 2-quantile (i.e. the median). That might not seem like much at first, but doing this multiple million times in a loop, it does make a difference.
With minimal changes to your own code you can call it like this:
// medianImageData.data[i] = d3.median(arr); // instead of this...
medianImageData.data[i] = median(arr);       // ...use the custom median()
Have a look at my working fork of your Observable notebook.

JS - How to check if 2 images (their hash) are similar

GOAL
Finding a good way to check if 2 images are similar by comparing their hash profiles. The hash is a simple array containing 0 and 1 values.
INTRO
I have 2 images. They are the same image but with some small differences: one has a different brightness, rotation, and shot.
What I want to do is create a Javascript method to compare the 2 images and calculate a percentage value that tells how much they are similar.
WHAT I'VE DONE
After uploading the 2 images into an HTML5 canvas to get their image data, I've used the pHash algorithm (www.phash.org) to obtain their hash representation.
The hash is an array containing 0 and 1 values that recreates the image in a "simplified" form.
I've also created a JS script that generates an HTML table with black cells where the array contains 1. The result is the following screenshot (the image is a Van Gogh picture):
Screenshot
Now, what I should do is to compare the 2 arrays for obtaining a percentage value to know "how much" they are similar.
Most of the JavaScript hashing algorithms I've found by googling already include a comparison algorithm: the Hamming distance. It's very simple and fast, but not very precise. In fact, the Hamming distance algorithm says that the 2 images in my screenshot are 67% similar.
THE QUESTION
Starting with 2 simple arrays of the same length, filled with 0 and 1 values: what would be a good algorithm to determine similarity more precisely?
NOTES
- Pure JavaScript development, no third-party plugins or frameworks.
- No need for a complex algorithm to find the right similarity when the 2 images are the same but heavily transformed (strong rotation, totally different colors, etc.).
Thanks
PHASH CODE
// size is the image size in pixels (for example 128)
var pixels = [];
for (var i = 0; i < imgData.data.length; i += 4) {
    var j = i / 4;                  // pixel index (i steps over r, g, b, a)
    var y = Math.floor(j / size);
    var x = j - (y * size);
    var pixelPos = x + (y * size);  // equals j, kept for clarity
    var r = imgData.data[i];
    var g = imgData.data[i + 1];
    var b = imgData.data[i + 2];
    // standard luma weights for the greyscale conversion
    var gs = Math.floor((r * 0.299) + (g * 0.587) + (b * 0.114));
    pixels[pixelPos] = gs;
}
var avg = Math.floor(pixels.reduce(function(a, b) { return a + b; }, 0) / pixels.length);
var hash = [];
pixels.forEach(function(px, i) {
    if (px > avg) {
        hash[i] = 1;
    } else {
        hash[i] = 0;
    }
});
return hash;
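For context, the snippet above assumes imgData and size already exist; here is a minimal sketch of obtaining them from an <img> element (getImageDataFor is an illustrative name, not part of the original code):

// Downscale an image to size x size and return its ImageData.
function getImageDataFor(img, size) {
    var canvas = document.createElement('canvas');
    canvas.width = canvas.height = size;
    var ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0, size, size);
    return ctx.getImageData(0, 0, size, size);
}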
HAMMING DISTANCE CODE
// hash1 and hash2 are the arrays of the "coded" images.
var similarity = hash1.length;
hash1.forEach(function(val, key) {
    if (hash1[key] != hash2[key]) {
        similarity--;
    }
});
var percentage = (similarity / hash1.length * 100).toFixed(2);
NOTE: Array.prototype.forEach is used above; a plain for (var i = 0; i < array.length; i++) loop works just as well.
I'm using blockhash, and it seems pretty good so far; the only false positives I get are when half of the pictures are the same background color, which is to be expected =/
http://blockhash.io/
BlockHash may be slower than yours but it should be more accurate.
What you do is just calculate the greyscale of EACH pixel and compare it to the average to create your hash.
What BlockHash does is split the picture into small rectangles of equal size, average the sum of the RGB values of the pixels inside them, and compare the results to 4 horizontal medians.
So it is normal that it takes longer, but it is still pretty efficient and accurate.
I'm doing it with pictures of a good resolution, at minimum 1000x800, and use 16 bits. This gives a 64-character hexadecimal hash. When using the Hamming distance provided by the same library, I see good results with a similarity threshold of 10.
Your idea of using greyscale isn't bad at all. But you should average out portions of the image instead of comparing each pixel. That way you can compare a thumbnail version to its original and get pretty much the same pHash!
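To illustrate the block-averaging idea, here is a sketch (not the actual blockhash.io implementation; it assumes blocks divides size evenly and reuses the imgData/size variables from the pHash snippet above):

// Sketch: block-mean hash. Split the image into blocks x blocks cells,
// average the greyscale inside each cell, then threshold against the
// overall average - like the pixel-level hash above, but per block.
function blockMeanHash(imgData, size, blocks) {
    var cell = size / blocks; // cell edge in pixels (assumed to be integral)
    var means = [];
    for (var by = 0; by < blocks; ++by) {
        for (var bx = 0; bx < blocks; ++bx) {
            var sum = 0;
            for (var y = by * cell; y < (by + 1) * cell; ++y) {
                for (var x = bx * cell; x < (bx + 1) * cell; ++x) {
                    var i = (y * size + x) * 4;
                    sum += imgData.data[i] * 0.299
                         + imgData.data[i + 1] * 0.587
                         + imgData.data[i + 2] * 0.114;
                }
            }
            means.push(sum / (cell * cell));
        }
    }
    var avg = means.reduce(function(a, b) { return a + b; }, 0) / means.length;
    return means.map(function(m) { return m > avg ? 1 : 0; });
}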
I don't know if this will do the trick, but you can just compare the 0 and 1 similarities between the arrays:
const arr1 = [1,1,1,1,1,1,1,1,1,1],
      arr2 = [0,0,0,0,0,0,0,0,0,0],
      arr3 = [0,1,0,1,0,1,0,1,0,1],
      arr4 = [1,1,1,0,1,1,1,0,1,1]

const howSimilar = (a1, a2) => {
    let similarity = 0
    a1.forEach((elem, index) => {
        if (a2[index] == elem) similarity++
    })
    // divide by a1.length (not arr1.length) so the function works for any input pair
    let percentage = Math.round(similarity / a1.length * 100) + "%"
    console.log(percentage)
}

howSimilar(arr1, arr2) // 0%
howSimilar(arr1, arr3) // 50%
howSimilar(arr1, arr4) // 80%

2 dimensional array vs simple array

I'd like to know which would be faster at execution time and which would cost less memory.
I asked myself this question while writing a sudoku solver. As you know, sudoku is a 9 x 9 grid, and generally all sudoku solvers implement array[9][9]. I presume that's because it looks like the grid you're used to playing on.
My question is simple: as the grid is always a square (ex: 9x9), which is faster and which has the lowest memory consumption:
- Two dimensions: Array[9][9]
- Single dimension: Array[81]
In both cases the access index has to be calculated (if the array starts at index 0 and you need the 5th column and 6th row on a 9x9 grid):
- A pair of coordinates for the 2D array (ex: Array[5-1][6-1])
- A single calculated position (Array[((6-1)*9) + (5-1)])
Is there any way to test this?
As stated in the comments, the one-array approach is the cheapest (memory-wise).
As to speed, timeit is your friend:
import timeit

one_array = timeit.timeit(setup="a = [0]*81; s=3; x=2; y=1;", stmt='a[s*9+y*3+x]')
multi_array = timeit.timeit(setup="a = [[[0]*3]*3]*9; s=3; x=2; y=1;", stmt='a[s][x][y]')
print(one_array)
print(multi_array)
if one_array < multi_array:
    print('one_array is faster')
else:
    print("multi_array is faster!")
0.21741794539802967
0.13626013606615175
multi_array is faster!
at least in Python...
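Since the question is actually about JavaScript, here is a comparable micro-benchmark sketch for the browser or Node (a rough sketch only, with the usual caveats about JIT warm-up and dead-code elimination):

// Build both representations of a 9x9 grid.
var flat = new Array(81).fill(0);
var nested = Array.from({ length: 9 }, function() { return new Array(9).fill(0); });

var N = 1e7, sum = 0, t, x, y, i;

t = performance.now();
for (i = 0; i < N; ++i) {
    x = i % 9; y = ((i / 9) | 0) % 9;
    sum += flat[y * 9 + x];        // single calculated position
}
console.log('flat:  ', (performance.now() - t).toFixed(1), 'ms');

t = performance.now();
for (i = 0; i < N; ++i) {
    x = i % 9; y = ((i / 9) | 0) % 9;
    sum += nested[y][x];           // pair of coordinates
}
console.log('nested:', (performance.now() - t).toFixed(1), 'ms');

console.log(sum); // keep the loops from being optimized away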

Better random function in JavaScript

I'm currently making a Conway's Game of Life reproduction in JavaScript and I've noticed that the function Math.random() always seems to return a certain pattern. Here's a sample of a randomized result in a 100x100 grid:
Does anyone know how to get better randomized numbers?
ApplyRandom: function() {
    var $this = Evolution;
    var total = $this.Settings.grid_x * $this.Settings.grid_y;
    var range = parseInt(total * ($this.Settings.randomPercentage / 100));

    for (var i = 0; i < total; i++) {
        $this.Infos.grid[i] = false;
    }

    for (var i = 0; i < range; i++) {
        var random = Math.floor((Math.random() * total) + 1);
        $this.Infos.grid[random] = true;
    }

    $this.PrintGrid();
},
[UPDATE]
I've created a jsFiddle here: http://jsfiddle.net/5Xrs7/1/
[UPDATE]
It seems that Math.random() was OK after all (thanks raina77ow). Sorry folks! :( If you are interested in the result, here's an updated version of the game: http://jsfiddle.net/sAKFQ/
(But I think there are some bugs left...)
This line in your code...
var position = (y * 10) + x;
... is what's causing this 'non-randomness'. It really should be...
var position = (y * $this.Settings.grid_x) + x;
I suppose 10 was the original size of this grid; that's why it's there. But that's clearly wrong: you should choose your position based on the current size of the grid.
As a sidenote, no offence, but I still consider the algorithm given in JayC's answer to be superior to yours. And it's quite easy to implement: just change the two loops in the ApplyRandom function to a single one:
var bias = $this.Settings.randomPercentage / 100;
for (var i = 0; i < total; i++) {
    $this.Infos.grid[i] = Math.random() < bias;
}
With this change you will no longer suffer from the side effect of the same index being drawn more than once by the var random = Math.floor((Math.random() * total) + 1); line, which lowered the actual cell fill rate in your original code.
Math.random is a pseudo-random method; that's why you're getting those results. A bypass I often use is to capture the mouse cursor position in order to add some salt to the Math.random results:
Math.random = (function(rand) {
    var salt = 0;
    document.addEventListener('mousemove', function(event) {
        salt = event.pageX * event.pageY;
    });
    return function() { return (rand() + (1 / (1 + salt))) % 1; };
})(Math.random);
It's not completely random, but a bit more ;)
A better solution is probably not to randomly pick points and paint them black, but to go through every point, decide what the odds are that it should be filled, and then fill accordingly. (That is, if you want on average a 20% chance of a point being filled, generate your random number r and fill when r < 0.2.) I've seen a Life simulator in WebGL and that's kinda what it does to initialize... IIRC.
Edit: Here's another reason to consider alternate methods of painting. Randomly selecting pixels might end up being less work and fewer invocations of your random number generator, which might be a good thing, depending on what you want. As it is, you've chosen a method where at most some percentage of your pixels will be filled. If you kept track of the pixels being filled, and chose to fill another pixel whenever one was already filled, all you'd essentially be doing is shuffling an exact percentage of black pixels among your white pixels.
Do it my way, and the percentage of pixels selected will follow a binomial distribution: sometimes the percentage filled will be a little more, sometimes a little less. The set of all shufflings is a strict subset of the possibilities generated by this kind of picking (which, strictly speaking, contains all possibilities for painting the board, just with astronomically low odds of getting most of them). Simply put, randomly choosing for every pixel allows more variance.
Then again, I could modify the shuffle algorithm to pick a percentage of pixels drawn from a binomial probability distribution with a defined expected/mean value, instead of using the expected/mean value itself, and I honestly don't know whether that would be any different, at least theoretically, from running the odds for every pixel with the expected/mean value. There's a lot that could be done.
console.log(window.crypto.getRandomValues(new Uint8Array(32))); // returns 32 random bytes
This returns random bytes with crypto strength: https://developer.mozilla.org/en/docs/Web/API/Crypto/getRandomValues
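If you want a drop-in replacement for Math.random() built on top of it, a minimal sketch could look like this (cryptoRandom is an illustrative name):

// Sketch: derive a float in [0, 1) from one crypto-strength 32-bit value.
function cryptoRandom() {
    var buf = new Uint32Array(1);
    window.crypto.getRandomValues(buf);
    return buf[0] / 4294967296; // divide by 2^32
}

console.log(cryptoRandom()); // e.g. 0.37240...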
You can try
JavaScript Crypto Library (BSD license). It is supposed to have a good random number generator. See here for an example of usage.
Stanford JavaScript Crypto Library (BSD or GPL license). See documentation for random numbers.
For a discussion of strength of Math.random(), see this question.
The implementation of Math.random is probably based on a linear congruential generator, one weakness of which is that each random number depends on the previous value, producing predictable patterns like this, depending on the choice of the constants in the algorithm. A famous example of the effect of a poor choice of constants can be seen in RANDU.
The Mersenne Twister random number generator does not have this weakness. You can find an implementation of MT in JavaScript for example here: https://gist.github.com/banksean/300494
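If a full Mersenne Twister port feels heavy, a tiny seedable PRNG such as mulberry32 (a widely circulated public-domain generator) also avoids this kind of visible patterning; a sketch:

// mulberry32: 32-bit seeded PRNG; produces the same sequence for the same seed.
function mulberry32(seed) {
    return function() {
        var t = seed += 0x6D2B79F5;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
    };
}

var rand = mulberry32(12345);
console.log(rand(), rand()); // deterministic for a given seed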
Update: Seeing your code, you have a problem in the code that renders the grid. This line:
var position = (y * 10) + x;
Should be:
var position = (y * grid_x) + x;
With this fix there is no discernible pattern.
You can use part of a SHA-256 hash of the current timestamp, which includes sub-millisecond precision:
console.log(window.performance.now()); // high-resolution timestamp in milliseconds, with fractional microseconds
This can be encoded as a string and then hashed, using this implementation: http://geraintluff.github.io/sha256/
salt = parseInt(sha256(previous_salt_string).substring(0, 12), 16);
// a 48-bit number < 2^53-1
Then, using the function from nfroidure's answer, write a gen_salt function that uses the sha256 hash, and call gen_salt inside the event listener. You can use sha256(previous_salt) + mouse coordinates as the string to hash for a new salt.

JavaScript bitmap for simple collision detection

I need help/advice on improving/commenting my current design, please :)
This relates to collision detection in a simple game: dynamic bodies (moving ones) might collide with static bodies (i.e. ground, walls). I'm porting my Obj-C model to JavaScript and am facing memory/performance questions about my way of implementing this.
I'm using a very basic approach: an array of arrays represents my level in terms of physical opacity.
bit set to 0: Transparent area, bodies can go through
bit set to 1: Opaque area, bodies collide
Testing the transparency/opacity of a pixel simply goes as follows:
if (grid[x][y]) {
    // collide!
}
My knowledge of JS is pretty limited in terms of performance/memory, so I can't evaluate how good this approach is :) I have no idea how efficient arrays are here, that being said.
Just imagine a 1000-pixel-wide level that's 600px high. It's a small level, but this already means an array containing 1000 arrays, each containing up to 600 entries. Besides, I've not found a way to ensure I create a 1-bit-sized element like low-level languages have.
Using the following, can I be sure an entry isn't stored as anything "bigger" than a bit?
grid[x][y] = true;
grid[x][y] = false;
Thanks for your time and comments/advice!
J.
If you have a 1000x600 grid, you are guaranteed to have at least 601 arrays in memory (1001 if you do it the other way round).
Rather than doing this, I would consider using either one array, or (preferably) one object with a mapping scheme.
var map = {};
map["1x1"] = 1;
map["1x3"] = 1;
// assume no-hits are empty and free to move through
function canGoIn(x, y) {
    return !map.hasOwnProperty(x + "x" + y); // a present key marks an opaque cell
};
Alternatively:
var map = [];
var width = 600;
map.push(0);
map.push(1);
// etc
function canGoIn(x, y) {
    return map[(x * width) + y] == 0; // 0 = transparent, free to move through
}
A boolean value won't be stored as just one bit, and that is also true for any other language I know (C included).
If you are having memory issues, you should consider implementing a bit array like this one: https://github.com/bramstein/bit-array/blob/master/lib/bit-array.js
You will have to turn your 2D array into a simple vector and convert your x, y coordinates like this: offset = x + (y * width);
Indexing a 2D array also involves a multiplication internally to evaluate the offset, so using a flat vector is equivalent to nested arrays.
But I suspect that calling a function (in case you're using a bit array) and doing some evaluations inside will lead to poorer performance.
I don't think you can gain performance and save memory at the same time.
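To make the trade-off concrete, here is a minimal sketch of a hand-rolled bit grid over a Uint32Array, following the 0 = transparent / 1 = opaque convention from the question (BitGrid is an illustrative name, not an existing library):

// Pack a width x height grid of 1-bit cells into a Uint32Array.
// Memory drops from one array element per cell to one bit per cell,
// at the cost of a little index arithmetic per access.
function BitGrid(width, height) {
    this.width = width;
    this.bits = new Uint32Array(Math.ceil(width * height / 32));
}
BitGrid.prototype.set = function(x, y, opaque) {
    var offset = x + (y * this.width);
    if (opaque) this.bits[offset >>> 5] |= 1 << (offset & 31);
    else        this.bits[offset >>> 5] &= ~(1 << (offset & 31));
};
BitGrid.prototype.isOpaque = function(x, y) {
    var offset = x + (y * this.width);
    return (this.bits[offset >>> 5] & (1 << (offset & 31))) !== 0;
};

var grid = new BitGrid(1000, 600); // ~75 KB instead of 600,000 array slots
grid.set(10, 20, true);
if (grid.isOpaque(10, 20)) {
    // collide!
}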
