Can I generate 1 lakh (100,000) unique strings in JavaScript without using any framework? - javascript

I have the letters a to z (lowercase only, no capitals) and the digits 1 to 9, plus all the special characters that can be used in an email address (just for a test).
How can I generate 1 lakh (100,000) unique strings that never repeat? Can anyone solve this puzzle?
One more requirement: every string must be at least 6 and at most 10 characters long.

Put the characters in an array. For each new string, copy the array to use as its source. Pick characters from the copy at random with array.splice (use Math.random() * array.length | 0 as the index) and append them to the string. Keep going until the string has the required number of characters.
You can also just use a string and charAt(index) if you only want single characters, but you have to keep cutting out the character you select, which is likely less efficient than using array.splice.
Whatever suits, though, since performance is likely irrelevant.
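A minimal sketch of that approach, under a few assumptions: the character set below adds a hypothetical handful of email-safe specials ("._-+") to a-z and 1-9, and the function names are made up for illustration. Random picks alone do not guarantee that 100,000 strings are all different, so this sketch also puts each string into a Set, which silently drops repeats.

```javascript
// Assumption: a-z and 1-9 from the question, plus a hypothetical set of
// email-safe special characters; adjust to whatever you actually allow.
const CHARS = [..."abcdefghijklmnopqrstuvwxyz123456789._-+"];

// Random integer in [0, n), using the Math.random() * n | 0 idiom.
function randIndex(n) {
  return Math.random() * n | 0;
}

// Build one string by splicing random characters out of a fresh copy of
// the array, so no character repeats within a single string.
function randomLine(minLen, maxLen) {
  const source = CHARS.slice();
  const len = minLen + randIndex(maxLen - minLen + 1); // minLen..maxLen
  let line = "";
  for (let i = 0; i < len; i++) {
    line += source.splice(randIndex(source.length), 1)[0];
  }
  return line;
}

// Random draws can collide, so a Set keeps only distinct strings.
function uniqueLines(count, minLen = 6, maxLen = 10) {
  const seen = new Set();
  while (seen.size < count) seen.add(randomLine(minLen, maxLen));
  return [...seen];
}

const lines = uniqueLines(100000); // 1 lakh
console.log(lines.length); // 100000
```

With 39 characters and lengths of 6 to 10, the space of possible strings is vastly larger than 100,000, so the Set loop only rarely has to retry.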

Related

How do I combine 2 regex patterns into 1 and use it within a function

I have a regex for checking that a number has no more than 15 significant figures, borrowed from this SO answer:
/^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=.{1,16}0*$)(?:\d+[.,]\d+))).+$/
The other one checks that the same number has up to 2 decimal places (truncating):
/^-?(\d*\.?\d{0,2}).*/
I have almost zero regex skill.
Question: How do I combine the two regexes so that a number must satisfy both (AND), not just either one (OR, which is what the | character does; I'm not sure it achieves the same thing as combining both)?
Something like:
/^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=.{1,16}0*$)(?:\d+[.,]\d+))).+$ <AND&&NOTOR>(\d*\.?\d{0,2}).*/
Thanks in advance.
EDIT: edit moved to a separate SO question.
If you only need to add a maximum-of-2-decimal-places condition to the first regex, try this:
^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).+$
Demo: compared with the original, the final \d+ became \d{1,2}$, and the lookahead's .{1,16} became [,.\d]{1,16}.
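A small sketch of using that combined pattern inside a function; isValidNumber is a hypothetical name. Both conditions sit in the leading lookahead, which is how the pattern requires them to hold together rather than as alternatives.

```javascript
// Combined pattern from the answer: meant to match only when the number has
// at most 15 significant figures AND at most 2 decimal places.
const COMBINED =
  /^-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).+$/;

// Hypothetical helper: true when the string satisfies both rules.
function isValidNumber(str) {
  return COMBINED.test(str);
}

console.log(isValidNumber("123.45"));  // true
console.log(isValidNumber("123.456")); // false (3 decimal places)
console.log(isValidNumber("-42"));     // true
```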
Edit: in response to the request to extract the 15 significant figures into capture group 1 ($1), try this version, which wraps the match in capture group 1 and limits its length so the 15 significant figures can be extracted easily.
^(-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).{1,16}).*$
Demo: here .+$ has been changed to .{1,16}.
If the number matches, it can be replaced with $1; if not, nothing is replaced, so the original unmatched number remains.
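A sketch of that replacement behaviour with String.prototype.replace; extractNumber is a hypothetical name. A string that satisfies the conditions is replaced by its capture group, and one that does not match is returned unchanged.

```javascript
// Pattern from the edit: group 1 captures an optional minus sign plus up to
// 16 characters, but only when the lookahead conditions hold.
const EXTRACT =
  /^(-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).{1,16}).*$/;

// Hypothetical helper: replace the whole match with $1; a non-matching
// string comes back untouched because replace() finds nothing to replace.
function extractNumber(str) {
  return str.replace(EXTRACT, "$1");
}

console.log(extractNumber("-123.45")); // -123.45
console.log(extractNumber("1.234"));   // 1.234 (no match, left as-is)
```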
Therefore, if you want to extract the 15 significant figures by replacing with $1 only when your condition is satisfied, try this regex in your function:
^(-?(?=\d{1,15}(?:[.,]0+)?0*$|(?:(?=[,.\d]{1,16}0*$)(?:\d+[.,]\d{1,2}$))).{1,16}).*$|^.*$
Demo: every number matches, but only the numbers that satisfy your condition are captured in $1, in 15-significant-figure form.

Given a dictionary and a list of letters, make a program learn to generate valid words | Javascript

I'm working on a big machine learning/NLP project and I'm stuck on a small part of it. (PM me if you want to know what exactly I'm working on.)
I'm trying to write a JavaScript program that learns to generate valid words using only the letters of the alphabet.
What I have is a database of 500K different words. It's a big JS object structured like this (the words are German):
database = {
"um": {id: 1, word: "um", freq: 10938},
"oder": {id: 2, word: "oder", freq: 10257},
"Er": {id: 3, word: "Er", freq: 9323},
...
}
"freq" means frequency. (This value may become important at some point, but I don't currently use it, so just ignore it.)
The way my program currently works is:
In the first iteration, it generates a completely random word between 2 and 13 letters long and searches for it in the database. If it's there, every letter in the word gets a good rating, if it's not there, they get a bad rating. Also the word length gets rated. If the word is valid, its word length gets a good rating, if it's not, its word length gets a bad rating.
In the iterations after that first one, it doesn't generate a word with random letters and a random word length. It uses probabilities based on the ratings of the letters and the word length.
For example, say it found the words "the", "so" and "if" after the first 100 iterations. Then the letters "t", "h", "e", "s", "o", "i" and "f" are rated good, and so are the word lengths 2 and 3. So the word generated in the next iteration will more likely contain these well-rated letters than badly rated ones.
Of course, the program also checks if the currently generated word already was generated and if so, then this word doesn't get rated again and it generates a new one.
In theory it should learn the optimal letter frequency and the optimal word-length frequency on its own and eventually generate only valid words.
Yeah. Of course this doesn't work. It gets better for the first few iterations, but as soon as it has found all the 2-letter words it gets worse. I think my whole approach is wrong. I've tried it out, and here is a (not so beautiful) graph after 5000 iterations:
[Graph after 5000 iterations. Red line: wrong words generated. Green line: right words generated.]
Yeah. What is the problem here? Am I doing machine learning wrong? And do you have a solution? Some algorithm or trie system?
PS: I'm aware of this, but it's not in JS, I don't understand it and I can't comment on it.
An alternative method would be to use a Markov Model.
Start by counting up the letter frequencies and also word length frequencies in your dictionary. Then, to create a word:
Pick a weighted random number (see below) between 1 and the maximum existing word length. That's how many letters you're going to generate.
For each letter in the word, pick a weighted random letter and add it to the word.
That's an order-0 Markov model. It's based on the frequency of letters that occur in the corpus. It will probably give you results that are similar to the system you have.
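A minimal sketch of the order-0 model described above, assuming the dictionary is available as a plain array of words (for the question's data, something like Object.keys(database)). The names weightedPick, buildOrder0 and generateOrder0 are hypothetical, and the five-word corpus is only a stand-in.

```javascript
// Pick a key from a {key: weight} table with probability proportional to its weight.
function weightedPick(weights) {
  const entries = Object.entries(weights);
  let total = 0;
  for (const [, w] of entries) total += w;
  let r = Math.random() * total;
  for (const [key, w] of entries) {
    r -= w;
    if (r < 0) return key;
  }
  return entries[entries.length - 1][0]; // fallback for floating-point rounding
}

// Count letter frequencies and word-length frequencies in the corpus.
function buildOrder0(words) {
  const letters = {};
  const lengths = {};
  for (const word of words) {
    lengths[word.length] = (lengths[word.length] || 0) + 1;
    for (const ch of word) letters[ch] = (letters[ch] || 0) + 1;
  }
  return { letters, lengths };
}

// Generate one word: a weighted length first, then that many weighted letters.
function generateOrder0(model) {
  const len = Number(weightedPick(model.lengths)); // object keys are strings
  let word = "";
  for (let i = 0; i < len; i++) word += weightedPick(model.letters);
  return word;
}

const model0 = buildOrder0(["um", "oder", "er", "und", "sie"]);
console.log(generateOrder0(model0));
```

Because every letter is drawn independently, the output only mirrors overall letter frequencies, which is why the answer expects results similar to the question's rating system.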
You'll get better results from an order-1 Markov model, where instead of computing letter frequencies, you compute bigram (two-letter sequence) frequencies. To pick the first letter, you choose only from the bigrams that begin words. For each subsequent letter, you choose a letter that follows the previously generated letter. That gives somewhat better results than an order-0 model.
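A sketch of the order-1 (bigram) version, again with hypothetical names and a stand-in corpus. One design choice here is an assumption, not part of the answer: instead of picking a length up front, the model records "$" as an end-of-word follower, so word length falls out of the bigram counts; "^" marks the start of a word. weightedPick is repeated so the block runs on its own.

```javascript
// Pick a key from a {key: weight} table with probability proportional to its weight.
function weightedPick(weights) {
  const entries = Object.entries(weights);
  let total = 0;
  for (const [, w] of entries) total += w;
  let r = Math.random() * total;
  for (const [key, w] of entries) {
    r -= w;
    if (r < 0) return key;
  }
  return entries[entries.length - 1][0];
}

// For each letter (and the start marker "^"), count the letters that follow it.
function buildOrder1(words) {
  const next = {};
  for (const word of words) {
    let prev = "^";
    for (const ch of word + "$") {
      next[prev] = next[prev] || {};
      next[prev][ch] = (next[prev][ch] || 0) + 1;
      prev = ch;
    }
  }
  return next;
}

// Walk the bigram table from "^" until "$" is drawn or maxLen is reached.
function generateOrder1(model, maxLen = 20) {
  let word = "";
  let prev = "^";
  while (word.length < maxLen) {
    const ch = weightedPick(model[prev]);
    if (ch === "$") break;
    word += ch;
    prev = ch;
  }
  return word;
}

const model1 = buildOrder1(["um", "oder", "er", "und", "sie"]);
console.log(generateOrder1(model1));
```

Extending the key from one previous letter to the previous two letters gives the order-2 model mentioned next.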
An order-2 model is surprisingly effective. See my blog post, Shakespeare vs. Markov, for an example.
A weighted random number is a number selected "at random," but skewed to reflect some distribution. In the English language, for example, the letter 'e' occurs approximately 12.7% of the time. 't' occurs 9.06% of the time, etc. See https://en.wikipedia.org/wiki/Letter_frequency. So you'd want your weighted random number generator's output to approximate that distribution. Or, in your case, you'd want it to approximate the distribution in your corpus. See Weighted random numbers for an example of how that's done.

Grab multiple numbers 1-10 from string

I am parsing a string of multiple numbers between 1 and 10 with the eventual goal of adding them to a set.
There will be multiple concatenated numbers after a text identifier such as {text}12345678910.
I am currently using match(/\d/g) to grab the numbers but it separates 1 and 0 in 10. I then look for 0 in my String Array, see if there's a 1 in the element before it, turn it into a 10 and delete the other entry. Not very elegant.
How can I clean up my matching code? I definitely don't need to use regex for this, but it makes grabbing the numbers fairly easy.
You could just match with this regex:
/10|\d/g
(instead of the one you use currently, not additionally)
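Alternatives are tried left to right at each position in the string, so wherever 10 appears the engine takes 10 before falling back to a single digit. That is also why the order matters: /\d|10/g or even /\d|(10)/g won't work, because \d matches the 1 on its own first.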
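A short sketch of that fix, collecting the numbers into a Set as the question intends; parseNumbers is a hypothetical name. The last line shows the reversed alternation splitting 10 into two tokens.

```javascript
// At each position try "10" first, then fall back to a single digit.
function parseNumbers(str) {
  const tokens = str.match(/10|\d/g) || []; // match() returns null when nothing matches
  return new Set(tokens.map(Number));
}

const numbers = parseNumbers("{text}12345678910");
console.log(numbers.size);    // 10
console.log(numbers.has(10)); // true

// With \d first, "10" is split into "1" and "0": 11 tokens instead of 10.
console.log("{text}12345678910".match(/\d|10/g).length); // 11
```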

Can this numeric range regex be refactored?

I need to match a number range:
-9223372036854775808 to 9223372036854775807
^(?:922337203685477580[0-7]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}|-(?:922337203685477580[0-8]|9223372036854775[0-7]\d{2}|922337203685477[0-4]\d{3}|92233720368547[0-6]\d{4}|9223372036854[0-6]\d{5}|922337203685[0-3]\d{6}|92233720368[0-4]\d{7}|9223372036[0-7]\d{8}|922337203[0-5]\d{9}|92233720[0-2]\d{10}|922337[0-1]\d{12}|92233[0-6]\d{13}|9223[0-2]\d{14}|922[0-2]\d{15}|92[0-1]\d{16}|9[01]\d{17}|[1-8]\d{18}|\d{0,18}))?$
Yes, I know it sounds crazy, but there's a long story behind this. I can't figure out how to do this in JavaScript by just checking a range, because numbers this large are beyond what a JavaScript number can represent exactly (Number.MAX_SAFE_INTEGER is 2^53 - 1), and this must be accurate.
Here's the thought process in breaking this thing down. I just started with the max number and worked my way down, then worked on the negative by just adding the - in the regex. You'll obviously have to copy and paste this thing somewhere to see it all. Also, could be mistakes. Made my head nearly explode.
9,223,372,036,854,775,807
922337203685477580[0-7]
9223372036854775[0-7][0-9]{2}
922337203685477[0-4][0-9]{3}
92233720368547[0-6][0-9]{4}
9223372036854[0-6][0-9]{5}
922337203685[0-3][0-9]{6}
92233720368[0-4][0-9]{7}
9223372036[0-7][0-9]{8}
922337203[0-5][0-9]{9}
92233720[0-2][0-9]{10}
922337[0-1][0-9]{12}
92233[0-6][0-9]{13}
9223[0-2][0-9]{14}
922[0-2][0-9]{15}
92[0-1][0-9]{16}
9[01][0-9]{17}
[1-8][0-9]{18}
[0-9]{0,18}
The negative range differs from the positive one by a single digit (the minimum ends in 8, the maximum in 7), so you'll see that I basically had to duplicate most of this.
So, a few questions:
Did I do this right?
If not, what's a better way?
Can this be done without regular expressions considering the size of the number? I need to validate client-side.
Can it be refactored and still retain strict rules?
Suggestions appreciated :)
Can this be done without regular expressions considering the size of the number?
It can be done in a series of if statements using only string operations (no need to convert to numbers).
all strings that don't match [0-9]{1,19} are out
all candidates that are of length 18 or less are good
for length 19 you can work with string comparison to see if they are numerically less than your upper limit
tweak the above to take care of negative numbers
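A minimal sketch of those checks using only string operations; isInt64 and the two constants are hypothetical names. Like the regexes in the question, it accepts leading zeros.

```javascript
// Upper bound for positive values and the absolute value of the lower bound.
const INT64_MAX = "9223372036854775807";
const INT64_MIN_ABS = "9223372036854775808";

// True if str is a signed 64-bit integer, checked without converting to a number.
function isInt64(str) {
  const negative = str.startsWith("-");
  const digits = negative ? str.slice(1) : str;
  // All strings that don't match [0-9]{1,19} are out.
  if (!/^[0-9]{1,19}$/.test(digits)) return false;
  // Anything with 18 digits or fewer is in range.
  if (digits.length <= 18) return true;
  // For equal-length digit strings, string order equals numeric order.
  return digits <= (negative ? INT64_MIN_ABS : INT64_MAX);
}

console.log(isInt64("9223372036854775807"));  // true
console.log(isInt64("9223372036854775808"));  // false
console.log(isInt64("-9223372036854775808")); // true
```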
Your regex is correct.
This is a shorter version
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
Regex demo
How to generate that regex without mistake:
Input max number:
9223372036854775807
Output:
9223372036854775807
922337203685477580
92233720368547758
9223372036854775
922337203685477
92233720368547
9223372036854
922337203685
92233720368
9223372036
922337203
92233720
9223372
922337
92233
9223
922
92
9
Replace the last digit of each line:
9->remove the whole line
8->9
7->[8-9]
6->[7-9]
5->[6-9]
4->[5-9]
3->[4-9]
2->[3-9]
1->[2-9]
0->[1-9]
Output:
922337203685477580[8-9]
92233720368547758[1-9]
92233720368547759
922337203685477[6-9]
92233720368547[8-9]
9223372036854[8-9]
922337203685[5-9]
92233720368[6-9]
92233720369
922337203[7-9]
92233720[4-9]
9223372[1-9]
922337[3-9]
92233[8-9]
9223[4-9]
922[4-9]
92[3-9]
9[3-9]
Joined into a regex alternation [output]:
922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9]
Put [output] into a negative lookahead before \d{19}:
(?!output)\d{19}
which becomes [output2]:
(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}
Matches \d{19} <= 9223372036854775807
Finally, add the minimum value and the shorter numbers:
^(?:-9223372036854775808|-?(?:\d{0,18}|[output2]))$
^(?:-9223372036854775808|-?(?:\d{0,18}|(?!922337203685477580[8-9]|92233720368547758[1-9]|92233720368547759|922337203685477[6-9]|92233720368547[8-9]|9223372036854[8-9]|922337203685[5-9]|92233720368[6-9]|92233720369|922337203[7-9]|92233720[4-9]|9223372[1-9]|922337[3-9]|92233[8-9]|9223[4-9]|922[4-9]|92[3-9]|9[3-9])\d{19}))$
Will match
-9223372036854775808 or
+/- \d{0,18} or
+/- \d{19} <= 9223372036854775807
Demo
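The manual steps above can be scripted. A sketch, assuming the maximum and the absolute value of the minimum are passed as digit strings of the same length; buildRangeRegex is a hypothetical name. For the int64 bounds it should produce the same pattern as the final regex above.

```javascript
// Hypothetical helper following the "replace the last digit" steps above.
function buildRangeRegex(max, minAbs) {
  const tooBig = [];
  for (let len = max.length; len >= 1; len--) {
    const prefix = max.slice(0, len);
    const last = Number(prefix[len - 1]);
    if (last === 9) continue;                           // 9 -> remove the whole line
    const cls = last === 8 ? "9" : `[${last + 1}-9]`;   // 8 -> 9, d -> [d+1-9]
    tooBig.push(prefix.slice(0, -1) + cls);
  }
  const n = max.length;
  // Exact minimum, or optional sign with up to n-1 digits, or n digits that
  // do not start with any "too big" prefix.
  return new RegExp(
    `^(?:-${minAbs}|-?(?:\\d{0,${n - 1}}|(?!${tooBig.join("|")})\\d{${n}}))$`
  );
}

const int64Re = buildRangeRegex("9223372036854775807", "9223372036854775808");
console.log(int64Re.test("9223372036854775807"));  // true
console.log(int64Re.test("9223372036854775808"));  // false
console.log(int64Re.test("-9223372036854775808")); // true
```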

Performance of regex within jQuery data selector: dependence on certain string length

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731
$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
.prop('checked',true).button('refresh');
This works great: it finds the appropriate elements, even with floating-point values.
Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.
Using Chrome DevTools, I logged the following:
> key='fill-opacity';
"fill-opacity"
> value=.2*2;
0.4
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined
> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined
The difference in speed is a factor of >200!
Next, I tried typing the number in manually (i.e. a decimal point, a six, 14 zeros and a one): same speed. All numbers with the same number of digits took the same time. I then progressively reduced the number of digits:
Digits   Time (ms)
16       10300
15       5185
14       2665
13       1314
12       673
11       359
10       202
9        116
8        77
7        60
6        50
5        41
4        39
I quickly ruled out the numeric-vs-string equality check: its speed does not depend on string length.
The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:
var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.
Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in value seems excessive. Is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.
Basic test code for regex matching speeds:
matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms
//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms
//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms
A characteristic of regexp matching is that quantifiers are greedy. If you try to match the expression a.*b against the string abcd, it happens in these steps:
the first "a" will match
the .* will match the second char, then the third, till the end of the string
reaching the end of the string, there is still a "b" to be matched, so the match fails
the regexp processing starts to backtrack
the last char will be "unmatched" and it will try to match "b" to "d". Fails again. More backtracking
tries to match "b" to "c". Fail. Backtrack.
match "b" to "b". Success. Matching ends.
Although you matched just a few characters, you iterated over the whole string. If you have more than one greedy operator, you can easily hit an input string that takes exponential time to match.
Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match the whole string 'abbbbbbb' instead of just the first 'ab'.
The easiest way to prevent these kinds of errors in modern regexp engines is to use the non-greedy (lazy) versions of the * and + operators. They are usually written as the same operator followed by a question mark: *? and +?.
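A quick way to see the difference in JavaScript, using the 'abbbbbbb' example from the previous paragraph:

```javascript
// Greedy: .* runs to the end of the string, then backtracks to the last "b".
console.log("abbbbbbb".match(/a.*b/)[0]);  // abbbbbbb

// Lazy: .*? takes as little as possible, so the match stops at the first "b".
console.log("abbbbbbb".match(/a.*?b/)[0]); // ab
```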
I confess that I didn't stop to debug the complicated regexp you posted, but I believe the problem occurs before the '=' symbol is matched. The greedy operator is in this subexpression:
(?:(?:\\\.|[^.,])+\.?)+
I'd try changing it to a non-greedy version:
(?:(?:\\\.|[^.,])+\.?)+?
but this is really just a wild guess; I'm using pattern recognition to solve the problem :-) It makes sense, though, because the engine backtracks over each character of the value while trying to match the operator, whereas the name is matched linearly.
This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
