Random IDs in JavaScript

I'm generating random IDs in JavaScript to serve as unique message identifiers for an analytics suite.
When checking the data (more than 10MM records), there are some minor collisions for some IDs for various reasons (network retries, robots faking data, etc.), but one in particular has an intriguing number of collisions: akizow-dsrmr3-wicjw1-3jseuy.
The collision rate for the ID above is around 0.0037%, while the rate for the other colliding IDs is under 0.00035% (ten times less), out of a sample of 111MM records from the same day. While the other IDs vary from day to day, this one remains the same, so over a longer period the difference is likely larger than 10x.
(Chart omitted: distribution of the top ID collisions.)
This is the algorithm used to generate the random IDs:
function generateUUID() {
  return [
    generateUUID4(), generateUUID4(), generateUUID4(), generateUUID4()
  ].join("-");
}

function generateUUID4() {
  return Math.abs(Math.random() * 0xFFFFFFFF | 0).toString(36);
}
I reversed the algorithm, and it seems that for akizow-dsrmr3-wicjw1-3jseuy the browser's Math.random() returned the following four numbers in this order: 0.1488114111471948, 0.19426893796638328, 0.45768366415465334, 0.0499740378116197, but I don't see anything special about them. Also, from the other data I collected, this ID seems to appear especially after a redirect/preload (e.g. Google results, ad clicks, etc.).
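For reference, the reversal above can be reproduced: each segment is the base-36 rendering of a 32-bit value, so parsing a segment back and dividing by the multiplier approximately recovers the Math.random() output (a quick sketch, using the first segment):

```javascript
// "akizow" is the first segment of the colliding ID.
// Undo toString(36), then divide by the multiplier used in generateUUID4
// to approximate the original Math.random() output.
const segmentValue = parseInt("akizow", 36);
const recovered = segmentValue / 0xFFFFFFFF;
console.log(segmentValue); // 639140144
console.log(recovered);    // ~0.1488114111..., matching the first number above
```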
So I have 3 hypotheses:
There's a statistical problem with the algorithm that causes this specific collision
Redirects/preloads are somehow messing with the seed of the pseudo-random generator
A robot is smart enough to fake all the other data but for some reason keeps the random ID the same. The data comes from different user agents, IPs, countries, etc.
Any idea what could cause this collision?

Related

Randomize numbers in a random order script

I have searched and although I see this addressed in Python and PHP, I don't see it using JS. This script randomizes order on refresh and I would like the numbers to be randomly generated as well as the order. The order part is already working, but I don't want it limited to numbers I input (150, 375 and 900 in this example). I'm going to have about 50 variables but I've just listed three to keep it short here.
I would also like to understand how to control the place values permitted, so I could have xx, xx, xx, etc. or xxx, xxx, xxx, etc., and also how to state a range, so it could include xx, xxx, xxxx.
Lastly, I would be very keen to understand how to limit the generated numbers to meet certain profiles, like all tens (670, 320, 980, 410, 650) or all twenty-fives (375, 925, 400, 850, 175, etc.). This would be extremely helpful.
I'm sorry this is just beyond my ability, and I have tried and tried. I operate a school in India where I teach math, and we have had great success teaching number recognition and counting in English and Hindi using printed charts of numbers that fit different criteria like this, and I would like to adapt it to a mobile app, just to let you know why I'm asking. Many thanks.
var contents = new Array();
contents[0] = '150';
contents[1] = '375';
contents[2] = '900';

var spacing = "<br />";
var the_one;
var z = 0;
while (z < contents.length) {
  the_one = Math.floor(Math.random() * contents.length);
  if (contents[the_one] != "_selected!") {
    document.write(contents[the_one] + spacing);
    contents[the_one] = "_selected!";
    z++;
  }
}
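The constraints asked about (digit counts, ranges, multiples of 10 or 25) can be generated rather than hard-coded. A rough sketch, with helper names that are purely illustrative:

```javascript
// Return a random multiple of `step` between min and max (inclusive),
// e.g. step 25 gives 175, 375, 925, ...; step 10 gives 650, 980, ...
function randomMultiple(min, max, step) {
  var lo = Math.ceil(min / step);
  var hi = Math.floor(max / step);
  return (lo + Math.floor(Math.random() * (hi - lo + 1))) * step;
}

// Return a random number with exactly `digits` digits (e.g. 3 -> 100..999).
function randomWithDigits(digits) {
  var min = Math.pow(10, digits - 1);
  var max = Math.pow(10, digits) - 1;
  return min + Math.floor(Math.random() * (max - min + 1));
}
```

You could then fill `contents` with, say, 50 calls to `randomMultiple(100, 1000, 25)` instead of typing the values by hand.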

How to make random mutations

I'm doing a project on natural selection for cells, for fun. Each cell has "dna", which is just a set of instructions. The dna can contain REMOVE WASTE, DIGEST FOOD, or REPAIR WALL. I won't really go into detail about what they do, because that would take too long. But the only reason evolution really happens is through genetic mutations. I'm wondering if this is possible in JavaScript, and how to do it. For example, the starting cell has 5 dna strands, but if it reproduces, the child can have 4 or 6, and some of the dna strands can be altered. This is my code so far:
var strands = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];
var dna = [];
for (let i = 0; i < 5; i++) {
  if (parent) {
    // something about the parent's dna, and the mutation chance
  } else {
    dna.push(strands[Math.floor(Math.random() * 3)]); // if cell doesn't have parent
  }
}
I'm just wondering if this is possible in JavaScript, and how to successfully do it. Sorry if the question isn't too clear.
Edit: Let me rephrase a little. What I'm trying to achieve is a genetic mutation in the new cell. Like:
if (parent) {
  dna.push(parent);
  if (Math.random() < 0.5) {
    changeStrand(num);
  }
  if (Math.random() < 0.5) {
    addStrand(num);
  }
  if (Math.random() < 0.5) {
    removeStrand(num);
  }
}

function changeStrand(num) {
  // change the strand
}

function addStrand(num) {
  // add random strands
}

function removeStrand(num) {
  // remove random strands
}
or something like that
For a genetic algorithm, you basically want to take two slices from each parent and stitch them together, whilst ensuring the end result is still a valid dna strand.
For a fixed-size DNA sequence (such as N-queens positions), the technique would be to pick a random slice point (1-3 | 4-8) and then combine these slices from the parents to create a child.
For your use case, you need two random slices whose sizes sum to 4-6, so possibly two slices of size 2-3. You could take one from the front and the other from the back. Alternatively, you could first pick a random output size, and then fill it with two random sequences, one from each parent.
Array.slice() and Array.splice() are probably the functions you want to use.
You can also add a random mutation to the end result. Viruses at the speed limit of viable genetic evolution average 1 mutation per transcription, which means some transcriptions won't have mutations; this is equivalent to allowing some of the parents in the parent generation to survive.
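A minimal sketch of that crossover-plus-mutation idea using Array.slice(), with the strand names from the question (the function names and the rate parameter are illustrative):

```javascript
var STRANDS = ["DIGEST FOOD", "REPAIR WALL", "REMOVE WASTE"];

function randomStrand() {
  return STRANDS[Math.floor(Math.random() * STRANDS.length)];
}

// Crossover: a front slice from one parent joined to a back slice from the other.
function crossover(parentA, parentB) {
  var cut = 1 + Math.floor(Math.random() * (parentA.length - 1));
  return parentA.slice(0, cut).concat(parentB.slice(cut));
}

// Mutation: each strand has a small chance of being replaced,
// and the child may gain or lose one strand.
function mutate(dna, rate) {
  var child = dna.map(function (s) {
    return Math.random() < rate ? randomStrand() : s;
  });
  if (Math.random() < rate) child.push(randomStrand());      // insertion
  if (Math.random() < rate && child.length > 1) child.pop(); // deletion
  return child;
}
```

A child would then be produced as `mutate(crossover(parentA.dna, parentB.dna), 0.1)` or similar.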
You can also experiment with different variations. Implement these as feature flags, and see what works best in practice.
Also compare with Beam Search, which essentially keeps a copy of the N best results from each generation. You may or may not want to keep the best from the parent generation to survive unmutated.
Another idea is to compute a distance metric between individuals, and add a cost for being too close to an existing member of the population, and this will select for genetic diversity.
In the standard model, variation occurs both by point mutations in the letter sequence and by “crossover” (in which the DNA of an offspring is generated by combining long sections of DNA from each parent).
The analogy to local search algorithms has already been described; the principal difference between stochastic beam search and evolution is the use of sexual reproduction, wherein successors are generated from multiple organisms rather than just one. The actual mechanisms of evolution are, however, far richer than most genetic algorithms allow. For example, mutations can involve reversals, duplications, and movement of large chunks of DNA; some viruses borrow DNA from one organism and insert it in another, and there are transposable genes that do nothing but copy themselves many thousands of times within the genome. There are even genes that poison cells from potential mates that do not carry the gene, thereby increasing their own chances of replication. Most important is the fact that the genes themselves encode the mechanisms whereby the genome is reproduced and translated into an organism. In genetic algorithms, those mechanisms are a separate program that is not represented within the strings being manipulated.
Artificial Intelligence: A Modern Approach. (Third edition) by Stuart Russell and Peter Norvig.
If you want random numbers, you can use Math.random(). The linked page also contains examples for getting values between x and y, for example:
// Source https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/random
function getRandomInt(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min)) + min;
}
I am not sure whether this is what you are trying to achieve, since you already make use of the Math.random() function.

Leaderboard ranking with Firebase

I have a project where I need to display a leaderboard of the top 20, and if the user is not in the leaderboard they will appear in 21st place with their current ranking.
Is there an efficient way to do this?
I am using Cloud Firestore as the database. I believe it was a mistake to choose it instead of MongoDB, but I am in the middle of the project, so I must do it with Cloud Firestore.
The app will be used by 30K users. Is there any way to do it without fetching all 30K users?
this.authProvider.afs.collection('profiles', ref => ref
  .where('status', '==', 1)
  .where('point', '>', 0)
  .orderBy('point', 'desc')
  .limit(20))
This is the code I used to get the top 20, but what would be the best practice for getting the current logged-in user's rank when they are not in the top 20?
Finding an arbitrary player's rank in a leaderboard, in a manner that scales, is a common hard problem with databases.
There are a few factors that will drive the solution you'll need to pick, such as:
Total number of players
Rate at which individual players add scores
Rate at which new scores are added (concurrent players × the above)
Score range: bounded or unbounded
Score distribution (uniform, or are there 'hot scores'?)
Simplistic approach
The typical simplistic approach is to count all players with a higher score, e.g. SELECT count(id) FROM players WHERE score > {playerScore}.
This method works at low scale, but as your player base grows, it quickly becomes both slow and resource expensive (both in MongoDB and Cloud Firestore).
Cloud Firestore doesn't natively support count as it's a non-scalable operation. You'll need to implement it on the client-side by simply counting the returned documents. Alternatively, you could use Cloud Functions for Firebase to do the aggregation on the server-side to avoid the extra bandwidth of returning documents.
Periodic Update
Rather than giving them a live ranking, change it to only update every so often, such as every hour. For example, Stack Overflow's rankings are only updated daily.
For this approach, you could schedule a function, or schedule App Engine if it takes longer than 540 seconds to run. The function would write out the player list to a ladder collection with a new rank field populated with each player's rank. When a player views the ladder, you can easily get the top X plus the player's own rank in O(X) time.
Better yet, you could further optimize and explicitly write out the top X as a single document as well, so to retrieve the ladder you only need to read 2 documents, top-X and player, saving money and making it faster.
This approach would work for any number of players and any write rate since it's done out of band. You might need to adjust the frequency as you grow, depending on your willingness to pay. 30K players each hour would be $0.072 per hour ($1.73 per day) unless you did optimizations (e.g., ignore all 0-score players, since you know they are tied for last).
Inverted Index
In this method, we'll create somewhat of an inverted index. This method works if there is a bounded score range that is significantly smaller than the number of players (e.g., 0-999 scores vs 30K players). It could also work for an unbounded score range where the number of unique scores is still significantly smaller than the number of players.
Using a separate collection called 'scores', you have a document for each individual score (non-existent if no-one has that score) with a field called player_count.
When a player gets a new total score, you'll do 1-2 writes in the scores collection: one write to add 1 to player_count for their new score, and, if it isn't their first time, one to subtract 1 from their old score. This approach works for both "Your latest score is your current score" and "Your highest score is your current score" style ladders.
Finding out a player's exact rank is as easy as something like SELECT sum(player_count)+1 FROM scores WHERE score > {playerScore}.
Since Cloud Firestore doesn't support sum(), you'd do the above but sum on the client side. The +1 is because the sum is the number of players above you, so adding 1 gives you that player's rank.
Using this approach, you'll need to read a maximum of 999 documents, averaging around 500, to get a player's rank, although in practice this will be less if you delete scores that have zero players.
Write rate of new scores is important to understand as you'll only be able to update an individual score once every 2 seconds* on average, which for a perfectly distributed score range from 0-999 would mean 500 new scores/second**. You can increase this by using distributed counters for each score.
* Only 1 new score per 2 seconds since each score generates 2 writes
** Assuming an average game time of 2 minutes, 500 new scores/second could support 60,000 concurrent players without distributed counters. If you're using a "Highest score is your current score" ladder, this will be much higher in practice.
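The client-side summing described for this approach can be expressed as a plain function over the fetched scores documents. A sketch, assuming each document has the shape { score, player_count } (the function name is illustrative):

```javascript
// Given the documents from the 'scores' collection and a player's score,
// rank = (number of players with a strictly higher score) + 1.
function rankFromScores(scoreDocs, playerScore) {
  return scoreDocs
    .filter(function (d) { return d.score > playerScore; })
    .reduce(function (sum, d) { return sum + d.player_count; }, 0) + 1;
}
```

For example, with counts of 3 players at score 10 and 2 at score 5, a player on score 5 ranks 4th.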
Sharded N-ary Tree
This is by far the hardest approach, but it could give you both faster and real-time ranking positions for all players. It can be thought of as a read-optimized version of the Inverted Index approach above, whereas the Inverted Index is a write-optimized version of this.
You can follow the related article 'Fast and Reliable Ranking in Datastore' for a generally applicable approach. For this approach, you'll want a bounded score (it's possible with unbounded, but it will require changes to the below).
I wouldn't recommend this approach as you'll need to do distributed counters for the top level nodes for any ladder with semi-frequent updates, which would likely negate the read-time benefits.
Final thoughts
Depending on how often you display the leaderboard for players, you could combine approaches to optimize this a lot more.
Combining 'Inverted Index' with 'Periodic Update' at a shorter time frame can give you O(1) ranking access for all players.
As long as the leaderboard is viewed more than 4 times by all players combined over the duration of the 'Periodic Update', you'll save money and have a faster leaderboard.
Essentially, each period (say 5-15 minutes) you read all documents from scores in descending order. Using this, keep a running total of player_count. Re-write each score into a new collection called scores_ranking with a new field players_above. This new field contains the running total excluding the current score's player_count.
To get a player's rank, all you need to do now is read the document for the player's score from scores_ranking -> their rank is players_above + 1.
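That periodic rewrite boils down to a running total. A sketch of computing players_above for each score document, assuming the input is already sorted descending (the function name is illustrative):

```javascript
// scores: array of { score, player_count }, sorted by score descending.
// Returns the scores_ranking documents with players_above filled in.
function buildScoresRanking(scores) {
  var runningTotal = 0;
  return scores.map(function (d) {
    var doc = { score: d.score, player_count: d.player_count, players_above: runningTotal };
    runningTotal += d.player_count; // exclude the current score's own count
    return doc;
  });
}
```

A player's rank is then simply their document's players_above + 1.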
One solution not mentioned here, which I'm about to implement in my online game and which may be usable in your use case, is to estimate the user's rank when they're not on any visible leaderboard, because frankly the user isn't going to know (or care?) whether they're ranked 22,882nd or 22,838th.
If 20th place has a score of 250 points and there are 32,000 players total, then each point below 250 is worth on average about 127 places. You may want to use some sort of curve, though, so that as they move up a point toward the bottom of the visible leaderboard they don't jump exactly 127 places each time; most of the jumps in rank should happen closer to zero points.
It's up to you whether you want to label this estimated ranking as an estimate, and you could add a random salt to the number so it looks authentic:
// Real rank: 22,838
// Display to user:
player rank: ~22.8k   // rounded
player rank: 22,882nd // rounded with random salt of 44
I'll be doing the latter.
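The estimate above can be written as a small function, using the numbers from the example. This is a sketch with a purely linear spread (no curve applied), and the function name is illustrative:

```javascript
// Estimate a rank for a score below the visible leaderboard.
// lowestVisibleScore / lowestVisibleRank: e.g. 250 points at rank 20.
function estimateRank(score, lowestVisibleScore, lowestVisibleRank, totalPlayers) {
  var placesPerPoint = (totalPlayers - lowestVisibleRank) / lowestVisibleScore;
  return Math.round(lowestVisibleRank + (lowestVisibleScore - score) * placesPerPoint);
}
```

With 32,000 players and rank 20 at 250 points, placesPerPoint comes out to about 127.9, matching the ~127 places per point in the example; a score of 0 maps to last place.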
Alternative perspective: NoSQL document stores make this type of task overly complex. If you used Postgres, this would be pretty simple with a count function. Firebase is tempting because it's easy to get going with, but use cases like this are where relational databases shine. Supabase (https://supabase.io/) is worth a look; it's similar to Firebase, so you can get going quickly with a backend, but it's open source and built on Postgres, so you get a relational database.
A solution that hasn't been mentioned by Dan is the use of security rules combined with Google Cloud Functions.
Create the highscore's map. Example:
highScores (top20)
Then:
Give the users write/read access to highScores.
Give the document/map highScores the smallest score in a property.
Let users write to highScores only if their score > the smallest score.
Create a write trigger in Google Cloud Functions that will activate when a new highScore is written. In that function, delete the smallest score.
This looks to me the easiest option. It is realtime as well.
You could do something with Cloud Storage: manually maintain a file that has all the users' scores (in order), then read that file and find the position of the score within it.
Then, to write to the file, you could set up a cron job that periodically takes all documents with an isWrittenToFile flag of false, adds them to the file, and marks them as true. That way you won't eat up your writes, and reading a file every time the user wants to view their position is probably not that intensive. It could be done from a Cloud Function.
2022 Updated and Working Answer
To solve the problem of having a leaderboard with users and points, and knowing your position in that leaderboard in a less problematic way, I have this solution:
1) Structure your Firestore document like this
In my case, I have a document perMission that has one field per user, with the userId as the property name and the user's leaderboard points as the value.
It will be easier to update the values inside my JavaScript code.
For example, whenever a user completes a mission (updating their points):
import { doc, setDoc, increment } from "firebase/firestore";

const docRef = doc(db, 'leaderboards', 'perMission');
setDoc(docRef, { [userId]: increment(1) }, { merge: true });
The increment value can be as you want. In my case I run this code every time the user completes a mission, increasing the value.
2) To get the position inside the leaderboard
On the client side, to get your position, you order the values and then loop through them to find your position in the leaderboard.
You can also use the object to get all the users and their respective points, ordered. But here I am not doing that; I am only interested in my position.
The code is commented explaining each block.
// Values coming from the database.
const leaderboards = {
  userId1: 1,
  userId2: 20,
  userId3: 30,
  userId4: 12,
  userId5: 40,
  userId6: 2
};

// Values coming from your user.
const myUser = "userId4";
const myPoints = leaderboards[myUser];

// Sort the values in descending order.
const sortedLeaderboardsPoints = Object.values(leaderboards).sort(
  (a, b) => b - a
);

// To get your specific position
const myPosition = sortedLeaderboardsPoints.reduce(
  (previous, current, index) => {
    if (myPoints === current) {
      return index + 1 + previous;
    }
    return previous;
  },
  0
);

// Will print: [40, 30, 20, 12, 2, 1]
console.log(sortedLeaderboardsPoints);
// Will print: 4
console.log(myPosition);
You can now use your position. Even if the array is super big, the logic runs on the client side, so be careful with that. You can also improve the client-side code to reduce the array, limit it, etc.
But be aware that you should do the rest of the work on the client side, not on the Firebase side.
This answer is mainly to show you how to store and use the database in a "good way".
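One caveat with the reduce above: if several users share the same score, it adds an index for each match, so tied scores inflate the result. A simpler tie-safe variant on the same data (indexOf returns the first, i.e. highest, position of the score):

```javascript
const leaderboards = { userId1: 1, userId2: 20, userId3: 30, userId4: 12, userId5: 40, userId6: 2 };
const myPoints = leaderboards["userId4"];

const sorted = Object.values(leaderboards).sort((a, b) => b - a);
// indexOf finds the first (highest) position with that score,
// so tied users share the same rank.
const myPosition = sorted.indexOf(myPoints) + 1;
console.log(myPosition); // 4
```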

Next steps with the MFCCs in web-based voice recognition

I am working on Urdu (a language spoken in Pakistan, India, and Bangladesh) voice recognition to translate Urdu speech into Urdu words. So far I have only found the Meyda JavaScript library for extracting MFCCs from data frames. Some documents say that ASR needs the first 12 or 13 MFCCs out of 26. For testing, I have 46 separate phonemes (/b/, /g/, /d/, ...) in a folder as wav files. Running Meyda on one of the phonemes creates 4 to 5 frames per phoneme, where each frame contains the first 12 MFCC values. Since I have less than 10 reputation, posting images is disabled, but you can see the image at the following link. The image contains 7 frames of the phoneme /b/; each frame includes 13 MFCCs. The value at the long red vertical line is 438; the others are 48, 38, etc.
http://realnfo.com/images/b.png
My question is whether I should save these frames (MFCCs) in the database as the predefined phoneme for /b/, do the same for all the other phonemes, and then attach the microphone so that Meyda extracts the MFCCs per frame, with the JavaScript programmed to match each extracted frame's MFCCs against the predefined frames' MFCCs using Dynamic Time Warping, finally taking the smallest distance as the specific phoneme.
The professional way after MFCCs is HMMs and GMMs, but I don't know how to deal with them. I have studied many documents about HMMs and GMMs, but to no avail.
co-author of Meyda here.
That seems like a pretty difficult use case. If you already know how to split the buffers up into phonemes, you can run the MFCC extraction on those buffers and use k-nearest neighbour (or some better classification algorithm), for what I would imagine would be a reasonable success rate.
A rough sketch:
const Meyda = require('meyda');
// I can't find a real KNN library because npm is down.
// I'm just using this as a placeholder for a real one.
const knn = require('knn');

// dataset should be a collection of labelled mfcc sets
const nearestPhoneme = knn(dataset);

const buffer = [...]; // a buffer containing a phoneme
let nearestPhonemes = []; // an array to store your phoneme matches

for (let i = 0; i < buffer.length; i += Meyda.bufferSize) {
  // extract the mfccs for the current frame only, not the whole buffer
  nearestPhonemes.push(
    nearestPhoneme(Meyda.extract('mfcc', buffer.slice(i, i + Meyda.bufferSize)))
  );
}
After this for loop, nearestPhonemes contains an array of the best guesses for the phoneme in each frame of the audio. You could then pick the most commonly occurring phoneme in that array (the mode). I would also imagine that averaging the MFCCs across the whole frame may yield a more robust result. It's certainly something you'll have to play around and experiment with to find the optimal solution.
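That averaging step is straightforward to sketch; here each frame is assumed to be an equal-length array of MFCC values (the function name is illustrative):

```javascript
// Average a list of MFCC frames element-wise into a single vector,
// which can then be classified once instead of per frame.
function averageMfccs(frames) {
  var sums = new Array(frames[0].length).fill(0);
  frames.forEach(function (frame) {
    frame.forEach(function (value, i) { sums[i] += value; });
  });
  return sums.map(function (s) { return s / frames.length; });
}
```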
Hope that helps! If you open source your code, I would love to see it.

Find in Multidimensional Array

I have a multidimensional array such as:
[
  {"EventDate": "20110421221932", "LONGITUDE": "-75.61481666666670", "LATITUDE": "38.35916666666670", "BothConnectionsDown": false},
  {"EventDate": "20110421222228", "LONGITUDE": "-75.61456666666670", "LATITUDE": "38.35946666666670", "BothConnectionsDown": false}
]
Is there any plugin available to search for a combination of LONGITUDE and LATITUDE?
Thanks in advance
for (var i in VehCommLost) {
  var item = VehCommLost[i];
  if (item.LONGITUDE == 1 && item.LATITUDE == 2) {
    // gotcha
    break;
  }
}
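With newer JavaScript, the same lookup can be written with Array.prototype.find. Note that the coordinates in the data are strings, so compare against strings (the sample array below just mirrors the question's data):

```javascript
var VehCommLost = [
  { EventDate: "20110421221932", LONGITUDE: "-75.61481666666670", LATITUDE: "38.35916666666670" },
  { EventDate: "20110421222228", LONGITUDE: "-75.61456666666670", LATITUDE: "38.35946666666670" }
];

// find returns the first matching element, or undefined if none match.
var match = VehCommLost.find(function (item) {
  return item.LONGITUDE === "-75.61481666666670" &&
         item.LATITUDE === "38.35916666666670";
});
```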
This is a JSON string; which programming language are you using with the JS? By the way, try parseJSON.
Are the latitudes and longitudes completely random? or are they points along a path, so there is some notion of sequence?
If there is some ordering of the points in the array, perhaps a search algorithm could be faster.
For example:
if the inner array has up to 10,000 elements, test item 5000;
if that value is too high, focus on 1-4999;
if too low, focus on 5001-10000; otherwise 5000 is the right answer;
repeat until the range shrinks to the vicinity, at which point a straight loop through the remaining values is quick enough.
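If the array were sorted by one of the coordinates, that halving search is a classic binary search; a sketch over a plain sorted numeric array:

```javascript
// Binary search over a sorted numeric array; returns the index of
// target, or -1 when it's absent. Each step halves the search range.
function binarySearch(sorted, target) {
  var lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}
```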
After sleeping on it, it seems to me most likely that the solution to your problem lies in recasting the problem.
Since it is a requirement of the system that you are able to find a point quickly, I'd suggest that a large array is the wrong data structure to support that requirement. It may be necessary to have an array, but perhaps there could also be another mechanism to make the search rapid.
As I understand it you are trying to locate points near a known lat-long.
What if, in addition to the array, you had a hash keyed on lat-long, with the value being an array of indexes into the huge array?
Latitude and Longitude can be expressed at different degrees of precision, such as 141.438754 or 141.4
The precision relates to the size of the grid square.
With some knowledge of the business domain, it should be possible to select a reasonably-sized grid such that several points fit inside but not too many to search.
So the hash is keyed on lat-long coords such as '141.4#23.2', with the value being a smaller array of indexes, e.g. [3456, 3478, 4579, 6344]; using these indexes we can easily access the items in the large array.
Suppose we need to find 141.438754#23.217643: we can reduce the precision to '141.4#23.2' and see if there is an array for that grid square.
If not, widen the search to the (3*3-1=) 8 adjacent grid squares (plus or minus one unit).
If still not, widen to the (5*5-9=) 16 grid squares two units away. And so on...
Depending on how the original data is stored and processed, it may be possible to generate the hash server-side, which would be preferable. If you needed to generate the hash client-side, it might be worth doing if you reuse it for many searches, but it would be kind of pointless if you use the data only once.
Could you comment on the possibility of recasting the problem in a different way, perhaps along these lines?
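A minimal sketch of the grid hash described above, assuming one decimal place of precision and the '#' key separator from the answer (the function names are illustrative):

```javascript
// Key a point by rounding lat/long to one decimal place.
function gridKey(lat, lng) {
  return lat.toFixed(1) + "#" + lng.toFixed(1);
}

// Build the hash: grid key -> array of indexes into the big points array.
function buildGridHash(points) {
  var hash = {};
  points.forEach(function (p, i) {
    var key = gridKey(p.lat, p.lng);
    (hash[key] = hash[key] || []).push(i);
  });
  return hash;
}

// Look up the indexes stored for a point's grid square.
function lookup(hash, lat, lng) {
  return hash[gridKey(lat, lng)] || [];
}
```

Widening to adjacent squares would mean probing gridKey with the coordinate components shifted by ±0.1 before falling back to a full scan.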
