In my application I've got a list of words that are shown to the user one by one. The exact number may differ, but it can be as low as about 10. It's OK if the words repeat. To select a word I currently use Math.random:
const select = (arr) => arr[Math.floor(Math.random() * arr.length)]
The problem is that Math.random generates, well, random numbers, which may sometimes form clusters, so the user may see the same words several times in a row. I have a simple check that the new word is different from the previous one, but I want to improve on it. The question is: how can one make random numbers from that range (0 to arr.length - 1) more spread out, as if a human were choosing random words (i.e. they would repeat less often)? Or, for this particular case, how can I make the user see as many different words as possible in a given time?
Related
I'm working on a big machine learning/nlp project and I'm stuck at a small part of it. (PM me, if you want to know what I'm working on exactly.)
I'm trying to write a program in JavaScript that learns to generate valid words, using only the letters of the alphabet.
What I have is a database of 500K different words. It's a big JS object, structured like this (the words are German):
database = {
"um": {id: 1, word: "um", freq: 10938},
"oder": {id: 2, word: "oder", freq: 10257},
"Er": {id: 3, word: "Er", freq: 9323},
...
}
"freq" means frequency obviously. (Maybe this value sometimes gets important but I currently don't use it, so just ignore it.)
The way my program currently works is:
In the first iteration, it generates a completely random word between 2 and 13 letters long and searches for it in the database. If it's there, every letter in the word gets a good rating, if it's not there, they get a bad rating. Also the word length gets rated. If the word is valid, its word length gets a good rating, if it's not, its word length gets a bad rating.
In the iterations after that first one, it doesn't generate a word with random letters and a random word length. It uses probabilities based on the ratings of the letters and the word length.
For example, let's say it found the words "the", "so" and "if" after the first 100 iterations. So the letters "t", "h", "e" and the letters "s", "o", and the letters "i", "f" are good rated, and the word length of 2 and 3 is also good rated. So the word generated in the next iteration will more likely contain these good rated letters than bad rated letters.
Of course, the program also checks if the currently generated word already was generated and if so, then this word doesn't get rated again and it generates a new one.
In theory it should learn the optimal letter frequency and the optimal word-length frequency on its own and, at some point, generate only valid words.
Yeah. Of course this doesn't work. It gets better for the first few iterations, but as soon as it has found all the two-letter words it gets worse. I think my whole approach is wrong. I've actually tried it out and have a (not so beautiful) graph after 5000 iterations for you:
Red line: wrong words generated
Green line: right words generated
Yeah. What is the problem here? Am I doing machine learning wrong? And do you have a solution? Some algorithm or trie system?
PS: I'm aware of this, but it's not in JS, I don't understand it and I can't comment on it.
An alternative method would be to use a Markov Model.
Start by counting up the letter frequencies and also word length frequencies in your dictionary. Then, to create a word:
Pick a weighted random number (see below) between 1 and the maximum existing word length. That's how many letters you're going to generate.
For each letter in the word, pick a weighted random letter and add it to the word.
That's an order-0 Markov model. It's based on the frequency of letters that occur in the corpus. It will probably give you results that are similar to the system you have.
You'll get better results from an order-1 Markov model, where instead of computing letter frequencies, you compute bigram (two-letter sequence) frequencies. So to pick the first letter, you choose only from the bigrams that are used to begin words. For subsequent letters, you choose a letter that follows the previously generated letter, weighted by how often that bigram occurs. That's going to give you somewhat better results than an order-0 model.
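A minimal sketch of such an order-1 model, under a few assumptions that are not in the answer: the function names are made up for illustration, "^" and "$" are used as start-of-word and end-of-word markers (and are assumed not to occur in the words), and the word ends when the end marker is drawn rather than at a length picked up front.

```javascript
// Count how often each character follows another. "^" marks the start of a
// word and "$" the end (both are assumptions of this sketch).
function buildBigramModel(words) {
  const model = {}; // model[prev][next] = number of occurrences
  for (const word of words) {
    const chars = ["^", ...word.toLowerCase(), "$"];
    for (let i = 0; i < chars.length - 1; i++) {
      const prev = chars[i];
      const next = chars[i + 1];
      model[prev] = model[prev] || {};
      model[prev][next] = (model[prev][next] || 0) + 1;
    }
  }
  return model;
}

// Pick a key of `counts` with probability proportional to its count.
function weightedPick(counts, rng = Math.random) {
  const entries = Object.entries(counts);
  const total = entries.reduce((sum, [, count]) => sum + count, 0);
  let r = rng() * total;
  for (const [key, count] of entries) {
    r -= count;
    if (r < 0) return key;
  }
  return entries[entries.length - 1][0]; // guard against rounding errors
}

// Walk the model from the start marker until the end marker is drawn
// or the (hypothetical) maxLength cap is reached.
function generateWord(model, maxLength = 13, rng = Math.random) {
  let word = "";
  let prev = "^";
  while (word.length < maxLength) {
    const next = weightedPick(model[prev], rng);
    if (next === "$") break;
    word += next;
    prev = next;
  }
  return word;
}
```

With the database from the question, the model could be built from Object.keys(database). Extending the key from one previous letter to two would give the order-2 variant.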
An order-2 model is surprisingly effective. See my blog post, Shakespeare vs. Markov, for an example.
A weighted random number is a number selected "at random," but skewed to reflect some distribution. In the English language, for example, the letter 'e' occurs approximately 12.7% of the time. 't' occurs 9.06% of the time, etc. See https://en.wikipedia.org/wiki/Letter_frequency. So you'd want your weighted random number generator's output to approximate that distribution. Or, in your case, you'd want it to approximate the distribution in your corpus. See Weighted random numbers for an example of how that's done.
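One common way to do weighted random selection is to accumulate the weights and map a uniform random number onto the running totals. The sketch below assumes the weights arrive as a plain object (item to weight); makeWeightedPicker and generateOrder0Word are hypothetical names, not taken from the linked article.

```javascript
// Build the cumulative weights once, then reuse the returned picker.
function makeWeightedPicker(weights) {
  const items = Object.keys(weights);
  const cumulative = [];
  let total = 0;
  for (const item of items) {
    total += weights[item];
    cumulative.push(total);
  }
  return function pick(rng = Math.random) {
    const r = rng() * total; // uniform in [0, total)
    const index = cumulative.findIndex((limit) => r < limit);
    // findIndex returns -1 only if rounding pushes r up to total
    return items[index === -1 ? items.length - 1 : index];
  };
}

// Order-0 word: pick a length, then pick each letter independently.
function generateOrder0Word(pickLength, pickLetter) {
  const length = Number(pickLength()); // object keys come back as strings
  let word = "";
  for (let i = 0; i < length; i++) {
    word += pickLetter();
  }
  return word;
}
```

For example, makeWeightedPicker({ e: 12.7, t: 9.06 }) returns "e" about 58% of the time, since only the ratio between the weights matters. In practice both the letter weights and the length weights would come from counts over your own corpus.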
First of all, I know regular expressions aren't the best tool to achieve what I want here. I have done enough research to know that much. Still, the problem I am stuck on requires me to build a regex that finds the values between some lower and upper bounds.
So here is the problem: I have a large set of data, let's say ranging between 1 and 1000000. That data is not under my direct control, and I cannot manipulate it directly. The only way of searching for values in that data is regex. Now, the user can give two values, a minimum and a maximum, and I need to construct a regex based on these two values and then query the large data set using the regex to get all the values lying within that range. So, if my data contains [1,5,7,9,15,30,45,87] and the user sets the range min:10, max:40, the regex should match the values 15 and 30.
From whatever I have searched, I know it is very much possible to build a regex for finding out values between fixed values (if we know them beforehand) for example, values between 1 to 100 can be found by:
^(100|[1-9][0-9]?)$
But what gets so tricky about my problem is that the input range can be anything from pretty much 1 digit values to up to 10 digit values. 10000-550000 can be an example user input for a large data set.
I know this will require some complex logic and loops involved on the basis of number of digits in the lower bound and number of digits in the upper bound of the range and then some recursive or other magical logic to build a regex that covers all the number lying in that range.
I've been filling up pages trying to come up with the logic, but I'm afraid it surpasses my knowledge of regex. If anyone has done something like this before, or can point me in the right direction or attempt it themselves, it would be quite helpful. Thanks.
The language I will be using this in is JavaScript, and I read somewhere that JS doesn't support conditional regex. Keeping that in mind, the solution doesn't have to be specific to a language.
If your task is to get the numbers between a min and a max value from the dataset, you can try the filter method.
var filteredResults = dataset.filter(function (item) {
    // keep items strictly between min and max; use >= and <= to include the bounds
    return item > min && item < max;
});
I'm running a bit out of ideas how to realize a small project.
What I have:
- a list of users including their ID and name
What I want to achieve:
- I want to combine each user on this list with another user such that no user is assigned to more than one user and no user is assigned to herself.
- The combination has to be random and has to take past combinations into account
My idea so far:
- I have this information:
User (A,B,C,D) (the actual number of users ranges between 50 and 400)
Possible combinations: (A-B,A-C,A-D,B-C,B-D,C-D)
Random draw(1): (A-B, C-D)
Random draw(2): (A-D, B-C)
Random draw(3): (A-C, B-D)
I was able to get all possible combinations using a join of the user table with itself.
I guess I can take previous draws into account by storing the draws in a separate table and limit the possible combinations to those that are not in this special table.
What I can't do:
- I don't know how to randomly draw from the list of possible combinations such that every user is part of only one combination per draw (e.g. A-B,A-D in the same draw is not allowed)
- I'm trying to use SQL or a bit of PHP for this (maybe JavaScript)
Thanks for any help.
An easy solution:
Create a temporary table with a row for each pairing. Loop over the list of users; for each user, skip a random number of empty rows (from 1 to the number of empty rows) and insert the user there.
Easy solution number 2:
Given N users, assign each user a unique random number from 1 to N (draw the numbers randomly, without replacement, from the set of all numbers from 1 to N). Pair each user numbered 1 to N/2 with a user numbered N/2+1 to N.
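A minimal JavaScript sketch of this second solution, assuming the users are held in an array. Shuffling the array plays the role of assigning unique random numbers; the function names are hypothetical, and this version does not yet take past draws into account.

```javascript
// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function fisherYatesShuffle(items, rng = Math.random) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Pair position i in the first half with position i in the second half.
function drawPairs(users, rng = Math.random) {
  const shuffled = fisherYatesShuffle(users, rng);
  const half = Math.floor(shuffled.length / 2);
  const pairs = [];
  for (let i = 0; i < half; i++) {
    pairs.push([shuffled[i], shuffled[i + half]]);
  }
  // With an odd number of users, the last shuffled user stays unpaired.
  const unpaired =
    shuffled.length % 2 === 1 ? shuffled[shuffled.length - 1] : null;
  return { pairs, unpaired };
}
```

One simple way to respect earlier draws would be to redraw whenever a pair already appears in the stored history, though that gets slow once most combinations have been used.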
The solution to this problem is the "Berger table"; see Wikipedia: http://en.wikipedia.org/wiki/Round-robin_tournament.
The latest (best) solution is from Professor Dalibor Froncek of the University of Minnesota in the US.
For custom solutions, look in the table of solutions n ^ 2.
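As a rough illustration of the round-robin idea, here is a sketch of the circle method described on that Wikipedia page: one user stays fixed while the others rotate, which yields N-1 draws in which every pair occurs exactly once. The pair ordering may differ from a printed Berger table, and roundRobinRounds is a hypothetical name.

```javascript
// Returns an array of rounds; each round is an array of [a, b] pairs.
function roundRobinRounds(users) {
  const list = users.slice();
  if (list.length % 2 === 1) list.push(null); // null means "sits out this round"
  const n = list.length;
  const rounds = [];
  for (let r = 0; r < n - 1; r++) {
    const pairs = [];
    for (let i = 0; i < n / 2; i++) {
      const a = list[i];
      const b = list[n - 1 - i];
      if (a !== null && b !== null) pairs.push([a, b]);
    }
    rounds.push(pairs);
    // Keep the first entry fixed and rotate the rest by one position.
    list.splice(1, 0, list.pop());
  }
  return rounds;
}
```

Shuffling the user list once before calling this and then using one round per draw gives a random-looking sequence in which no pairing repeats until all rounds are used up. Note that the schedule is computed for a fixed set of users.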
I have the letters a to z, all lowercase (no capitals), and the digits 1 to 9 (plus all the special characters that can be used in an email address, just for a test).
How can I generate 1 lakh (100,000) unique strings that never repeat? Can anyone solve this puzzle?
One more thing: each string should be no more than 10 characters and no fewer than 6 characters long.
Put the characters in an array. Copy the array as the source for each new string. Randomly splice characters out of the copy and append them to the string (use Math.random() * array.length | 0 for the index). Keep going for the required number of strings.
You can also just use a string and charAt(index) if you only want single characters, but you have to keep cutting out the character that you select, which is likely less efficient than using array.splice.
Whatever suits though, since performance is likely irrelevant.
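For the uniqueness requirement, one possible sketch is to collect the generated strings in a Set until it holds the required number. This is a different technique from the array-copy approach above: characters may repeat within a string, and the function name and default lengths (6 to 10, from the question) are assumptions.

```javascript
// Generates `count` distinct random strings built from `alphabet`.
// Assumes `count` is far smaller than the number of possible strings;
// otherwise the loop would run for a very long time or never finish.
function generateUniqueStrings(count, alphabet, minLen = 6, maxLen = 10, rng = Math.random) {
  const seen = new Set(); // a Set silently ignores duplicates
  while (seen.size < count) {
    const len = minLen + Math.floor(rng() * (maxLen - minLen + 1));
    let s = "";
    for (let i = 0; i < len; i++) {
      s += alphabet[Math.floor(rng() * alphabet.length)];
    }
    seen.add(s);
  }
  return Array.from(seen);
}
```

Calling generateUniqueStrings(100000, "abcdefghijklmnopqrstuvwxyz123456789") would cover the letters and digits from the question; any special characters can simply be appended to the alphabet string.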
Implement the “Word Decoder” game. This game will present the player with a series of scrambled words (up to 20 words) and challenge him/her to attempt to unscramble them. Each time a new word is displayed, and a text input is provided for the user to write the unscrambled word.
Once the player thinks the word has been properly decoded, he clicks on the “Check answer” button. If the player’s answer is correct, his score is increased by one. If his answer is not correct, he is notified and he is then given a different word.
I understood the question, but I don't know how to build it, or even how to start!
Any help, please?
To start, try breaking down the problem into things you'll need; think nouns and verbs. This is simply rewriting the problem in new terms. You need:
word: just a string, but it's a noun you'll need, so list it.
dictionary: a collection of words to choose from (during testing, you don't need many)
display: these become HTML elements, since you're working with JS
scrambled word
text input
submit button to check answer
score
"wrong answer" notifier
to scramble a word
to compare words: how can you compare two words to see if one is a permutation of the other? Do it right and anagrams aren't a problem.
to check an answer
to increment score
to notify user of incorrect answer
to present a new scrambled word
Any item beginning with "to" is a verb; anything else is a noun. Nouns become objects, verbs become methods/functions.
The above is mostly a top-down approach, in contrast with bottom-up (note that top-down vs bottom-up isn't an either-or proposition). Other approaches that might help with not knowing where to start are test driven development or its offshoot, behavior driven development. With these you start by defining, in code, what the program should do, then fill in the details to make it do that.
A hint on comparing words: the problem is basically defining an equivalence class—two strings are equivalent if one is a permutation of the other. The permutations of a string, taken together, form the equivalence class for that string; two strings are in the same equivalence class if the strings are equivalent. As the linked document points out, equivalence classes are well represented by picking a single element of the class as the class representative. Lastly, you can turn the equivalence class definition around: two strings are permutations of each other if they are in the same equivalence class.
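One way to turn the hint into code, assuming the class representative is chosen as the word's letters in sorted order (the hint does not prescribe a particular representative). The function names, the lowercasing, and the whitespace stripping are choices made for this sketch.

```javascript
// Canonical representative of a word's equivalence class:
// its letters, lowercased, in sorted order.
function canonicalForm(word) {
  return word.toLowerCase().replace(/\s+/g, "").split("").sort().join("");
}

// Two strings are permutations of each other exactly when
// they share the same canonical representative.
function isPermutation(a, b) {
  return canonicalForm(a) === canonicalForm(b);
}
```

For instance, "listen" and "silent" both reduce to "eilnst", so they compare as equivalent.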
Look into loading a dictionary via XHR.
There are tons of those available online [http://www.mieliestronk.com/wordlist.html NOTE: it contains some swear words; if you're going to be doing this for academic purposes, since it's your homework, you should look for a "clean" list]...
For scrambling the word: make your string into a char array, then find an array shuffle function [they are simple to write, I wrote one for my implementation of Bogosort]...
function shuffle(b)
{
    var a = b.concat([]); // makes a copy of b, so b won't be changed
    var shuffled = [];
    while (a.length != 0)
    {
        // random index from 0 to a.length - 1, each equally likely
        var targIndex = Math.floor(a.length * Math.random());
        // splice removes the element at targIndex and returns it in an array
        var value = a.splice(targIndex, 1)[0];
        shuffled.push(value);
    }
    return shuffled;
}
When the user is done inputting, simply compare the input with the answer [case insensitive, ignoring spaces]. As stated in the comments, there is also the possibility of anagrams, so be sure to check for those; perhaps you could simply verify that the word exists in the dictionary.
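A small sketch of that check, assuming the dictionary has been loaded into a Set of lowercase words. The function names are hypothetical. An answer is accepted when it is a dictionary word that uses exactly the letters of the scrambled word, so valid anagrams of the intended answer also count.

```javascript
// Lowercase the text and drop all whitespace.
function normalizeAnswer(text) {
  return text.toLowerCase().replace(/\s+/g, "");
}

// Letters of the normalized text in sorted order.
function sortedLetters(text) {
  return normalizeAnswer(text).split("").sort().join("");
}

// `dictionary` is assumed to be a Set of lowercase words.
function checkAnswer(input, scrambled, dictionary) {
  const guess = normalizeAnswer(input);
  return dictionary.has(guess) && sortedLetters(guess) === sortedLetters(scrambled);
}
```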