In JavaScript, checking items in a string variable against an English dictionary

I have the function below, which randomly selects letters from an array of all English letters (plus a space and a <br/>) and stores them in a string variable.
The function has a loop that generates 2,500 random characters. I would like to check which of the letter sequences between two delimiters (two spaces, a space and a <br/>, or two <br/> tags) are legitimate English words.
How do I do this? In particular:
do I need to download an English dictionary?
how do I compare all of my strings of characters against it?
how do I retain the words that are legitimate?
Here is the code of the function:
JS
function statement() {
  var letters = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", " ", "<br/>"];
  var result = ""; // must be declared and start empty before appending
  for (var i = 0; i < 2500; i++) { // < rather than <= gives exactly 2,500 characters
    var random_letter = Math.floor(Math.random() * letters.length);
    result += letters[random_letter];
  }
  document.getElementById("random1").innerHTML += result;
}

First, you need a word list. If you work on a Unix-like OS you most probably have a medium-sized word list in /usr/share/dict/words, but you'll find many on the net; one short search gave http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt with about 100k words. These are already sorted, so you can just put them into an array, one word per element. Because the array is sorted, you can search it with a fast binary search.
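A minimal sketch of that binary search, assuming the word list has already been loaded into an array sorted in the same order that JavaScript's `<` compares strings (if the file uses a different order, call `.sort()` on the array first). The function name `binarySearch` and the tiny `words` array are illustrative stand-ins, not part of the original answer.

```javascript
// Look up a word in an array sorted with JavaScript's default string order.
// Returns true if the word is present, false otherwise.
function binarySearch(sortedWords, word) {
  let lo = 0;
  let hi = sortedWords.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1; // middle index, rounded down
    const candidate = sortedWords[mid];
    if (candidate === word) return true;
    if (candidate < word) lo = mid + 1; // target can only be in the upper half
    else hi = mid - 1;                  // target can only be in the lower half
  }
  return false;
}

// Hypothetical tiny word list standing in for the downloaded file.
const words = ["a", "cat", "dog", "goad", "zebra"];
console.log(binarySearch(words, "goad")); // true
console.log(binarySearch(words, "xyz"));  // false
```

Each lookup halves the search range, so about 17 comparisons suffice for a list of 100k words.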
You look for words between spaces and line breaks, but it is a bit ambiguous whether e.g. word<br/>word2 word3<br/> counts as word, word2, word3 or as word and "word2 word3", where word2 word3 would be a kind of sentence. From now on I assume the first variant for simplicity: single words, no sentences.
In the loop that produces the string, I would check every generated character: if it is a space or a line break, take the letters generated since the previous space or line break and check them against the dictionary.
If you insist on working on the final generated string only, you have to split it into single words, e.g. with a regex, because it is simple here:
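A sketch of that check-while-generating idea. The helper name `generateAndCollectWords`, its `isWord` callback and the optional `random` parameter are assumptions for illustration; `isWord` can be any lookup (array, object, binary search), and `random` only exists so a run can be made repeatable.

```javascript
// Build the random string and check each completed letter sequence as soon
// as a delimiter (space or <br/>) is generated.
function generateAndCollectWords(length, isWord, random = Math.random) {
  const letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("").concat([" ", "<br/>"]);
  let result = "";
  let current = ""; // letters generated since the last space or <br/>
  const found = [];
  for (let i = 0; i < length; i++) {
    const ch = letters[Math.floor(random() * letters.length)];
    result += ch;
    if (ch === " " || ch === "<br/>") {
      // A delimiter closes the current sequence: check it, then start over.
      if (current !== "" && isWord(current)) found.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  // The last sequence has no closing delimiter, so check it here.
  if (current !== "" && isWord(current)) found.push(current);
  return { result, found };
}

// Placeholder dictionary; a real one would come from a downloaded word list.
const demo = generateAndCollectWords(2500, w => ["cat", "dog"].includes(w.toLowerCase()));
console.log(demo.found);
```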
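// replace each <br/> with a space so that word<br/>word2 does not merge into one word
var s = ("word1 word2<br/> <br/><br/> word3 <br/>").replace(/<br\/>/g, " ").split(/\s+/);
s.join(",");
// word1,word2,word3,
// note the empty last entry after the final comma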
Then loop over the array and check every word you found against the dictionary. Because the last entry in my simple version of splitting is empty, and others might be empty too (depending on the input), you need to check that each entry is non-empty.
Instead of doing the dictionary search with a plain array and a binary search, you can leave that hassle to the JavaScript engine by (ab)using a {key: value} object. (This example is sorted only because I copied it from an already sorted word list; it does not need to be sorted.)
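A sketch of working on the finished string, assuming any `isWord` lookup function is available; the helper name `findWords` and the `knownWords` list are hypothetical placeholders.

```javascript
// Split the finished string into candidates and keep those accepted by isWord.
function findWords(generated, isWord) {
  return generated
    .replace(/<br\/>/g, " ")      // treat every <br/> like a space
    .split(/\s+/)                 // split on runs of whitespace
    .filter(word => word !== "")  // skip the empty entries
    .filter(word => isWord(word));
}

// Placeholder dictionary standing in for the real word list.
const knownWords = ["cat", "word"];
console.log(findWords("XQ CAT<br/>WORD <br/><br/> ZZT<br/>", w => knownWords.includes(w.toLowerCase())));
// logs [ 'CAT', 'WORD' ]
```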
var dict = {Banach:0,Bancroft:0,Bandung:0,Bangalore:0,Bangkok:0,Bangladesh:0,
Bangladeshi:0,Bangor:0,Bangui:0,Banjarmasin:0,Banjul:0,Banks:0,
Banneker:0,Deleon:0,Delgado:0,Delhi:0,Delia:0,Delibes:0,Delicious:0,
Delilah:0,Delius:0,Dell:0,Della:0,Delmar:0,Delmarva:0,Delmer:0,
Delmonico:0,Delores:0,Deloris:0,Hanukkahs:0,Hapsburg:0,Harare:0,
Harbin:0,Hardin:0,Harding:0,Hardy:0,Hargreaves:0,Harlan:0,Harlem:0,
Harlequin:0,Harley:0,Harlow:0,Harmon:0};
dict.hasOwnProperty("Delores"); // true
dict.hasOwnProperty("foo"); // false
Only the keys matter here; the values are wasted in this case, but you can use them for further refinement, e.g. to mark nouns, verbs, adjectives, etc. You are not restricted to numbers, of course; you can use anything, even the complete dictionary entry for that word (with pictures and music even, but that is another story).
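The same key/value idea can be sketched with a Map, which has no inherited keys: with a plain object, checking via `in` or `dict[word]` instead of `hasOwnProperty` would treat names like "constructor" as words. The part-of-speech tags below are made-up illustration data.

```javascript
// Word -> extra information (here a hypothetical part-of-speech tag).
const dict = new Map([
  ["goad", "verb"],
  ["harbor", "noun"],
  ["hardy", "adjective"],
]);

console.log(dict.has("goad"));        // true
console.log(dict.get("hardy"));       // adjective
console.log(dict.has("constructor")); // false, no inherited keys
```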

This is working code implemented with Node.js.
1) Yes, you do have to download the words; I get them from a URL using the request module.
You basically have to turn the list of words from that website into an array by splitting it with a regular expression, and also split your randomly generated string with another regex. Then you loop over both arrays, comparing each real word to each generated word, like this:
var request = require("request"); // note: the request package has been deprecated since 2020
request("http://www.math.sjsu.edu/~foster/dictionary.txt", function (err, response, body) {
  if (err) {
    return console.error(err);
  }
  var letters = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", " ", "<br/>"];
  letters = letters.map(function (letter) { return letter.toLowerCase(); });
  var result = "";
  // 10000 characters instead of 2500, to get more matches
  for (var i = 0; i < 10000; i++) {
    var random_letter = Math.floor(Math.random() * letters.length);
    result += letters[random_letter];
  }
  // Split on every <br/> or whitespace character (empty entries are skipped in getWordMatches)
  var randomResultArr = result.split(/<br\/>|\s/);
  console.log(randomResultArr);
  // One dictionary word per line; \r?\n also handles Windows line endings
  var wordsArr = body.split(/\r?\n/);
  var matchesArr = getWordMatches(wordsArr, randomResultArr);
  console.log("MATCHES: " + matchesArr);
});
function getWordMatches(wordsArr, resultArr){
var matchesArr = [];
console.log("WORDSARR ", wordsArr.length);
console.log("RES ARR ", resultArr.length);
for(var i = 0; i < wordsArr.length; i++ ){
for(var x = 0; x < resultArr.length; x++){
if( (wordsArr[i] === resultArr[x]) && wordsArr[i] !== "" ){
matchesArr.push(wordsArr[i]);
console.log("WORD MATCH : " + wordsArr[i]);
}
}
}
return matchesArr;
}
Here is a sample output; I had to raise the 2500 to 10000 to get more matches:
WORDSARR 349901
RES ARR 747
WORD MATCH : giros
WORD MATCH : goad
WORD MATCH : kaqa
WORD MATCH : lome
WORD MATCH : mibs
MATCHES: giros,goad,kaqa,lome,mibs
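The nested loop in getWordMatches compares every dictionary word with every generated word; with the sample sizes above that is roughly 261 million comparisons. A sketch of a faster variant, assuming the download and splitting stay as in the answer: build a Set from the dictionary once, then do one lookup per candidate. The name `getWordMatchesFast` is hypothetical.

```javascript
// Return the distinct candidates that appear in the dictionary.
function getWordMatchesFast(wordsArr, resultArr) {
  const dictionary = new Set(
    wordsArr.map(w => w.trim()).filter(w => w !== "") // strip "\r" and blank lines
  );
  const matches = new Set(); // a Set also removes duplicate hits
  for (const candidate of resultArr) {
    if (candidate !== "" && dictionary.has(candidate)) matches.add(candidate);
  }
  return Array.from(matches);
}

console.log(getWordMatchesFast(["goad\r", "lome", "mibs", ""], ["xq", "goad", "", "goad", "mibs"]));
// logs [ 'goad', 'mibs' ]
```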

Related

What am I missing? Lambda Challenge

I'm trying to do the Lambda challenge, and the output is what it's asking for, but it's still not marked correct.
The problem is:
"Modify the function to transform the given string into an array where each character in the array is in the same index as it was in the string, and return it."
function convertStringToArray(s) {
var output = Array.from("hello");
return output;
}
/* Do not modify code below this line */
const exampleString = 'hello';
const stringAsArray = convertStringToArray(exampleString);
console.log(stringAsArray, '<-- should be ["h", "e", "l", "l", "o"]');
Output
["h", "e", "l", "l", "o"] <-- should be ["h", "e", "l", "l", "o"]
I did exactly what it wanted so why am I stuck?

Your code takes an argument s, which it ignores.
I strongly suspect that the Lambda challenge passes different values to your function (using "hello" only as an example), and your code fails because it always returns ["h", "e", "l", "l", "o"], even when s is "world".
Try using s instead of "hello" on the first line of your function.
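A corrected sketch, assuming (as suspected above) that the grader calls the function with strings other than "hello":

```javascript
// Use the parameter s instead of a hard-coded "hello", so any input works.
function convertStringToArray(s) {
  return Array.from(s); // s.split("") also works for plain ASCII strings
}

console.log(convertStringToArray("hello")); // [ 'h', 'e', 'l', 'l', 'o' ]
console.log(convertStringToArray("world")); // [ 'w', 'o', 'r', 'l', 'd' ]
```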

Picking a random index from an array is always returning me the same index?

var letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
var letter = letter[Math.round(Math.random()*(quotes.length))]
Every time it just returns the last letter, g, not a random one from the array.
What am I doing wrong?

(quotes.length)
What is quotes? You want letters.
You're doing var letter = letter, but letter hasn't been defined yet. Also, when choosing a random element from an array, use Math.floor instead of Math.round:
const letters = ["a", "b", "c", "d", "e", "f", "g", "h"];
const letter = letters[Math.floor(Math.random()*letters.length)];
console.log(letter);

You probably need to fix a few typos in your code:
var letters = ["a", "b", "c", "d", "e", "f", "g", "h"]
var letter = letters[Math.round(Math.random()*(letters.length))]
Also, you may not want to use Math.round, since it can produce an index equal to letters.length, which is out of bounds (letters[letters.length] is undefined). Use Math.floor instead.
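A small demonstration of the difference, using a fixed value in place of Math.random() so the result is repeatable; the helper names are illustrative. With Math.round, index 0 also comes up only half as often as the others, and about 1 time in 16 the index equals letters.length.

```javascript
// Two ways of turning a number r in [0, 1) into an array index.
function indexWithRound(r, length) { return Math.round(r * length); }
function indexWithFloor(r, length) { return Math.floor(r * length); }

const letters = ["a", "b", "c", "d", "e", "f", "g", "h"];
const r = 0.99; // a value Math.random() can return

console.log(indexWithRound(r, letters.length), letters[indexWithRound(r, letters.length)]); // 8 undefined
console.log(indexWithFloor(r, letters.length), letters[indexWithFloor(r, letters.length)]); // 7 h
```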

Fix the typos (letter and quotes should both be letters), switch to Math.floor, and you should be fine:
var letter = letters[Math.floor(Math.random() * letters.length)];

Split all characters using JavaScript except '\t', '\b', '\n'

I want to split this string into an array of characters:
var str = "\tToday we are going to\b a farm to pick\r";
var result = str.split('');
console.log(result);
But in the result, '\t', '\r' and '\b' are shown as symbols; for example, '\t' takes up a big space.
How can I get an array like this:
[ "\t", "T", "o", "d", "a", "y", " ", "w", "e", " ", "a", "r", "e", " ", "g", "o", "i", "n", "g", " ", "t", "o", "\b", " ", "a", " ", "f", "a", "r", "m", " ", "t", "o", " ", "p", "i", "c", "k", "\r"]
Thanks

But in result '\t', '\r','\b' replacing by symbol, like for '\t' its taking big space
If you're outputting this to a terminal console, that's just because \t is a tab, and the console output is outputting a tab, which the terminal is interpreting as a number of spaces. Similar (but different, and possibly more surprising) things will happen with \r and \b.
Help to get as array like...
That's what you're getting, it's just that when you output to the console, the output is interpreted by the terminal.
If you want to see actual backslashes, you could run the entries through JSON.stringify, but I suspect you don't. I think you probably want the array you already have.
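A quick way to confirm this in Node.js or a browser console: compare an element directly, and print the array with JSON.stringify, which writes the control characters back as escape sequences.

```javascript
const str = "\tToday we are going to\b a farm to pick\r";
const chars = str.split("");

// The array really contains the control characters themselves:
console.log(chars[0] === "\t", chars.length); // true 39
// JSON.stringify shows them as escapes instead of letting the terminal interpret them:
console.log(JSON.stringify(chars.slice(0, 3))); // ["\t","T","o"]
```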

Javascript Split difference

Javascript
var sitename="Welcome to JavaScript Kit"
var words=sitename.split(" ") //split using blank space as delimiter
for (var i=0; i<words.length; i++)
alert(words[i])
//4 alerts: "Welcome", "to", "JavaScript", and "Kit"
And
var sitename="Welcome to JavaScript Kit"
var words=sitename.split("") //split into individual characters
for (var i=0; i<words.length; i++)
alert(words[i])
//6 alerts: "W", "e", "l", "c","o","m"
What is the difference between
var words=sitename.split(" ");
And
var words=sitename.split("");
In short, what is the difference between the two splits?

var sitename="Welcome to JavaScript Kit"
var words=sitename.split("") //split into individual characters
for (var i=0; i<words.length; i++)
alert(words[i])
//6 alerts: "W", "e", "l", "c","o","m"
It won't stop at just "m"; there will be many more alerts after that.
Every character will be alerted, up to "K", "i", "t": http://jsfiddle.net/zwJJN/
var words=sitename.split("") //split into individual characters
var words=sitename.split(" ") //split using the space character as delimiter
When you use split, the whole string is searched for the given delimiter and is split wherever it occurs.
var words=sitename.split("") // every character becomes a separate element
var words=sitename.split(" ") // the string is cut at every space, giving the words

var words=sitename.split(" ");
This splits the string at each space.
var words=sitename.split("");
Here the delimiter is an empty string, so the string is split into its individual characters.

var words=sitename.split(" ");
this will split around space character
var words=sitename.split("");
this will split around each character
I ran the script, and in my browser it works fine: I get all the alerts up to the final "t". Maybe your browser is not allowing the page to create any more dialogs.

I'm guessing your browser is blocking the alerts to stop the page from spamming you.
Don't use alert to inspect the .split result; use something like console.log to get a better look:
console.log("Welcome to JavaScript Kit".split(""));
// ["W", "e", "l", "c", "o", "m", "e", " ", "t", "o", " ", "J", "a", "v", "a", "S", "c", "r", "i", "p", "t", " ", "K", "i", "t"]
And
console.log("Welcome to JavaScript Kit".split(" "));
// ["Welcome", "to", "JavaScript", "Kit"]

var words=sitename.split(" ");
This one splits the string into words at each space: "Welcome", "to", "JavaScript", "Kit".
var words=sitename.split("");
This one splits the string into characters, i.e. it separates every character, including whitespace.
Ref: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
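One more difference worth knowing: split(" ") cuts at every single space, so two spaces in a row produce an empty string. A regular expression separator such as /\s+/ treats a run of whitespace as one delimiter. The double-spaced `text` below is an illustrative input, not from the question.

```javascript
const text = "Welcome  to JavaScript Kit"; // note the double space

console.log(text.split(" "));       // [ 'Welcome', '', 'to', 'JavaScript', 'Kit' ]
console.log(text.split(/\s+/));     // [ 'Welcome', 'to', 'JavaScript', 'Kit' ]
console.log(text.split("").length); // 26: every character, spaces included
```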

How to split words using javascript

This might be a simple question, but how do I split words? For example:
a = "even, test"
I have used .split to separate the text at the space,
so the result is:
a = "even,"
b = "test"
But how do I remove the comma here?
In some cases I might get "even test" and in others "even, test". The input is dynamic, so how do I handle both?
Thanks

Firstly, the split() function is not jQuery - it is pure Javascript.
Did you try doing split with a comma and a space? That would have worked just fine in your case:
var result = input.split(', ');
For more complex splits, you can use regular expression pattern matching to allow multiple commas or spaces between the two fields:
var result = input.split(/[, ]+/);
but you probably don't need to go that far in your case.
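Since the question mentions both "even test" and "even, test", the regex form is the one that covers both shapes. A sketch (the helper name `splitWords` is illustrative) that also drops the empty strings produced by leading or trailing separators:

```javascript
// Split on any run of commas and/or spaces.
function splitWords(input) {
  return input.split(/[, ]+/).filter(w => w !== "");
}

console.log(splitWords("even, test"));   // [ 'even', 'test' ]
console.log(splitWords("even test"));    // [ 'even', 'test' ]
console.log(splitWords(" even,test "));  // [ 'even', 'test' ]
```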

I think it is better to use something like this:
text.match(/[a-z'\-]+/gi);
Example:
var e=function()
{
var r=document.getElementById('r');
var x=document.getElementById('t').value.match(/[a-z'\-]+/gi) || []; // match returns null when nothing matches
for(var i=0;i<x.length;i++)
{
var li=document.createElement('li');
li.innerText=x[i];
r.appendChild(li);
}
}
<div style="float:right;width:18%">
<ol id="r" style="display:block;width:auto;border:1px inset;overflow:scroll;height:8em;max-height:10em;"></ol>
<button onclick="e()">Extract words!</button>
</div>
<textarea id="t" style="width:70%;height:12em">even, test; spider-man
But saying o'er what I have said before:
My child is yet a stranger in the world;
She hath not seen the change of fourteen years,
Let two more summers wither in their pride,
Ere we may think her ripe to be a bride.
—Shakespeare, William. The Tragedy of Romeo and Juliet</textarea>
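Outside the browser, the same regex can be wrapped in a small function; the `|| []` guard matters because String.prototype.match returns null when nothing matches. The name `extractWords` is illustrative.

```javascript
// Extract runs of letters, apostrophes and hyphens, case-insensitively.
function extractWords(text) {
  return text.match(/[a-z'\-]+/gi) || [];
}

console.log(extractWords("even, test; spider-man")); // [ 'even', 'test', 'spider-man' ]
console.log(extractWords("But saying o'er what"));   // [ 'But', 'saying', "o'er", 'what' ]
console.log(extractWords("123 456"));                // []
```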

I found a list of word separators in Sublime Text default settings.
Here's how to split with it, with some Unicode support (the defined separators are not Unicode though):
{ // word_separators: ./\()"'-,;<>~!@#$%^&*|+=[]{}`~?: (32)
function splitByWords(str = '', limit = undefined) {
return str.split(/[-./\\()"',;<>~!@#$%^&*|+=[\]{}`~?:]/u, limit)
}
function reverseString(str) {
let newString = ''
for (let i = str.length - 1; i >= 0; i--)
newString += str[i]
return newString
}
const str = '123.x/x\\x(x)x"x\'x-x:x,789;x<x>x~x!x@x#x$x%x^x&x*x|x+x=x[x]x{x}x`x~x?456'
console.log(splitByWords(str)) // (33) ["123", "x", "x", "x", "x", "x", "x", "x", "x", "x", "789", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "456"]
console.log(splitByWords(str, 1)) // ["123"]
console.log(splitByWords(reverseString(str), 1)) // ["654"]
}
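The - has to be at the beginning of the character class: anywhere else it is read as a range operator (e.g. '-: would mean the range from ' to :, which includes the digits 0-9). Once the - is first, putting the : at the end is not strictly required.
Edit: you might want to add \s (after the -) to also count whitespace as a separator.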

Just use this code:
var a = "even, test";
var words = a.split(", ");

a.split(',')
or
var re = /\s*,\s*/
var newA = a.split(re);

I think you could do it like this:
var a= 'even,'
var newA = a.slice(0, -1)
This will remove the last char from a given string.
And to check if the string contains a comma, I would do the following:
if (a.indexOf(",") >= 0){
//contains a comma
} else {
//no comma
}
I am a javascript beginner, so this probably is not the best way to do it, but nevertheless it works.

Hi Harry,
if the comma is the separator, you can call split with the comma.
E.g.:
var newColls = myString.split(",");
instead of splitting on the space.
Good luck
