How to compare strings in javascript ignoring special characters

I'm developing an app and I've been asked to compare strings, but the text in the strings contains special characters (Spanish accents like "á", "é", "í", "ó" and "ú").
I already handle capitalization with toUpperCase(), but I also want to be sure that accents cause no problems.
What I have to do is compare some words already saved in the system and check whether the user typed any of them.
I store the typed words in an array and then analyze them in another function (yet to be implemented).
This is the function where I store the words the user types (it may change to become more complete):
function clickNewWord(){
var theWord = textField.value.toUpperCase();
ArrayWrittenWords.push(theWord);
textField.value = "";
}
PS: I'll take the opportunity to ask: what would be the correct encoding to work with accents? UTF-8?

Although this is an old question, for the sake of future googlers, here is the best way to remove accents from a string:
var string = 'á é í ó ú';
string.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
>a e i o u
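Building on that, here is a minimal sketch of a comparison helper for the original question. The names stripAccents and sameIgnoringAccents are made up for illustration. The last line shows an alternative that needs no replacing at all: the built-in localeCompare with sensitivity: 'base', which treats accented and unaccented letters (and upper and lower case) as equal.

```javascript
// Hypothetical helper: decompose accented letters (NFD) and drop the combining marks.
function stripAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: compare two strings ignoring accents and capitalization.
function sameIgnoringAccents(a, b) {
  return stripAccents(a).toUpperCase() === stripAccents(b).toUpperCase();
}

console.log(sameIgnoringAccents('Canción', 'CANCION')); // true
console.log(sameIgnoringAccents('canción', 'camión'));  // false

// Alternative: let the built-in collation ignore accents and case.
var viaCollation = 'canción'.localeCompare('CANCION', 'es', { sensitivity: 'base' }) === 0;
console.log(viaCollation); // true
```

Note that the normalize approach also turns "ñ" into "n", because "ñ" decomposes into "n" plus a combining tilde. Whether that is wanted depends on the word list.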

You can convert them and then match them. Let me know if my example is clear :)
var stringInTheSystem = ['aaaa','bbbb']; // array of strings in your system
var term = 'áaaa'; // the word you want to compare
term = term.replace(/á/g, "a");
term = term.replace(/é/g, "e");
term = term.replace(/í/g, "i");
term = term.replace(/ó/g, "o");
term = term.replace(/ú/g, "u");
var matcher = new RegExp( term, "i" );
$.grep( stringInTheSystem, function( value ) {
value = value.test || value.value || value;
console.log(matcher.test( value ));
});
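The same idea can be sketched without jQuery, using one lookup table instead of five separate replace calls. This assumes an exact match is wanted rather than the substring match that RegExp.test gives; ACCENT_MAP and normalizeWord are illustrative names, and the table covers only the five accented vowels from the question.

```javascript
// Assumed lookup table: only the five accented vowels mentioned in the question.
var ACCENT_MAP = { 'á': 'a', 'é': 'e', 'í': 'i', 'ó': 'o', 'ú': 'u' };

// Hypothetical helper: lower-case first so that "Á" is handled too, then map the vowels.
function normalizeWord(text) {
  return text.toLowerCase().replace(/[áéíóú]/g, function (ch) {
    return ACCENT_MAP[ch];
  });
}

var stringInTheSystem = ['aaaa', 'bbbb'];
var term = normalizeWord('áaaa');

// Keep only the stored words that equal the typed term after normalization.
var found = stringInTheSystem.filter(function (value) {
  return normalizeWord(value) === term;
});
console.log(found); // ['aaaa']
```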

Related

Using Variable in Regex Character Set

I'm trying to use a variable (save) as a regex character set, but I keep getting null.
function mutation(arr) {
var save = arr[1];
var rgx = /[save]/gi;
return arr[0].match(rgx).join('') == arr[0];
}
mutation(["Mary", "Army"]);
The goal of the function is to check whether all the letters of arr[1] are contained in arr[0], returning true or false. The function works as I want when I manually put arr[1] into the character set (it returns true in this situation); I just can't get it to work with the variable.

Your current approach won't work because a regex literal written in /.../ notation cannot interpolate a variable: /[save]/ matches the literal letters s, a, v and e. But we can still use the RegExp constructor to build the pattern from a string. For the sample data you showed us, here is a regex pattern which would work:
^(?!.*[^Mary]).*$
In other words, tested against the second string Army, this pattern asserts that all of its characters can be found in the first string Mary. The function below builds the character class from arr[1] and tests arr[0], following the comparison in your original code.
function mutation(arr) {
var save = arr[1];
var rgx = "^(?!.*[^" + save + "]).*$";
var re = new RegExp(rgx, "gi");
return re.test(arr[0]);
}
console.log(mutation(["Mary", "Army"]));
console.log(mutation(["Jon Skeet", "Tim Biegeleisen"]));
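One caveat when a variable is placed inside a character class: characters such as ], ^, - and \ have a special meaning there and need escaping. A minimal sketch follows, where escapeForCharClass is a hypothetical helper name. It keeps the direction of the code above (every character of arr[0] must appear in arr[1]) and simply looks for one offending character.

```javascript
// Hypothetical helper: escape the characters that are special inside [...].
function escapeForCharClass(text) {
  return text.replace(/[\\\]\^-]/g, '\\$&');
}

function mutation(arr) {
  // Negated class: matches any character of arr[0] that is NOT in arr[1].
  var notInSecond = new RegExp('[^' + escapeForCharClass(arr[1]) + ']', 'i');
  return !notInSecond.test(arr[0]);
}

console.log(mutation(['Mary', 'Army']));                 // true
console.log(mutation(['Jon Skeet', 'Tim Biegeleisen'])); // false
console.log(mutation([']', 'a]']));                      // true, thanks to the escaping
```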

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??

var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
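As an alternative to one large regex, the cookie string can be split into name=value pairs first. This is a sketch under the assumption that pairs are separated by ";" as in the sample; getCookieValue is a made-up helper name, and in a browser you would pass document.cookie as the first argument.

```javascript
// Hypothetical helper: find "name=value" in a cookie-style string and decode the value.
function getCookieValue(cookieString, name) {
  var pairs = cookieString.split(';');
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].trim();
    var eq = pair.indexOf('=');
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return null; // no cookie with that name
}

var sample = 'store=874635%7Csomethingelse; facility=34222%7CConner-Department-Store';
var raw = getCookieValue(sample, 'facility'); // "34222|Conner-Department-Store"

// Take the part after "|" and replace every hyphen (note the g flag).
var facilityName = raw.split('|')[1].replace(/-/g, ' ');
console.log(facilityName); // "Conner Department Store"
```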

If I understood it correctly, you want to build a phrase from all the words between the hyphens, where each word is a capital letter followed by lowercase letters (so two successive uppercase letters are not allowed). I'd prefer using a regex in that case.
This is a regex solution that works with any cookie in the same format and extracts the wanted phrase from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match the words between the hyphens.
Remove any - occurrence from the matched words.
Then join the array of matches with a space.

OK, first you should decode the string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // the g flag replaces every hyphen, not just the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2.
If you want to cut the trailing ; just pass decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all of the above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you still want to use a regular expression to retrieve (perhaps more) data from the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any word of any length built from alphabetical characters, lowercase and uppercase (no other chars)
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
Edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/

You can use cookies.js, a small cookie framework published on MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and read the cookie by its name:
docCookies.getItem("facility");

Correct sentence structure via javascript regular expressions

Below I have a sentence and the desiredResult for it. Using the pattern below I can find the "t T" that needs to be changed to "t, t", but I don't know where to go from there.
var sentence = "Over the candidate behaves the patent Then the doctor.";
var desiredResult = "Over the candidate behaves the patent, then the doctor.";
var pattern = /[a-z]\s[A-Z]/g;
I want to correct the sentence by adding a comma and a space before a capital letter other than 'I' (and lowercasing that letter) if the preceding letter is lowercase.

Use .replace() on your sentence and pass a replacer function as the second parameter:
var corrected = sentence.replace(
/([a-z])\s([A-Z])/g,
function(m,s1,s2){ //arguments: whole match (t T), subgroup1 (t), subgroup2 (T)
return s1+', '+s2.toLowerCase();
}
);
As for preserving an uppercase I, there are many ways; here is one of them:
var corrected = sentence.replace(
/([a-z])\s([A-Z])(.)/g,
function(m,s1,s2,s3){
return s1+((s2=='I' && /[^a-z]/i.test(s3))?(' '+s2):(', '+s2.toLowerCase()))+s3;
}
);
But there are more cases where it will fail, such as "His name is Joe." or "WTF is an acronym for What a Terrible Failure.", and many others.
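To make both the working case and the failing case concrete, here is the second variant wrapped in a function and run on three inputs. The function name addCommas and the parameter names are illustrative only; the regex is the one from the answer.

```javascript
// Hypothetical wrapper around the replacer from the answer above.
function addCommas(sentence) {
  return sentence.replace(/([a-z])\s([A-Z])(.)/g, function (m, before, capital, next) {
    // A lone "I" followed by a non-letter is treated as the pronoun and kept as-is.
    var isPronounI = capital === 'I' && !/[a-z]/i.test(next);
    if (isPronounI) {
      return before + ' ' + capital + next;
    }
    return before + ', ' + capital.toLowerCase() + next;
  });
}

console.log(addCommas('Over the candidate behaves the patent Then the doctor.'));
// "Over the candidate behaves the patent, then the doctor."
console.log(addCommas('He said I am late'));
// "He said I am late"
console.log(addCommas('His name is Joe.'));
// "His name is, joe."  (a proper noun, so the rule misfires)
```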

Fastest / most efficient way to compare two string arrays Javascript

Hi, I was wondering whether anyone could offer some advice on the fastest / most efficient way to compare two arrays of strings in JavaScript.
I am developing a kind of tag cloud based on a user's input, the input being a written piece of text such as a blog article or the like.
I therefore keep an array of words not to include: is, a, the, etc.
At the moment I am doing the following:
Remove all punctuation from the input string, tokenize it, compare each word to the exclude array and then remove any duplicates.
The comparisons are performed by looping over each item in the exclude array for every word in the input text. This seems rather brute force and is crashing Internet Explorer on arrays of more than a few hundred words.
I should also mention my exclude list has around 300 items.
Any help would really be appreciated.
Thanks

I'm not sure about the whole approach, but rather than building a huge array then iterating over it, why not put the "keys" into a map-"like" object for easier comparison?
e.g.
var excludes = {};//object
//set keys into the "map"
excludes['bad'] = true;
excludes['words'] = true;
excludes['exclude'] = true;
excludes['all'] = true;
excludes['these'] = true;
Then when you want to compare... just do
var wordsToTest = ['these','are','all','my','words','to','check','for'];
var checkWord;
for(var i=0;i<wordsToTest.length;i++){
checkWord = wordsToTest[i];
if(excludes[checkWord]){
//bad word, ignore...
} else {
//good word... do something with it
}
}
This allows these words through: ['are','my','to','check','for']
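In an ES2015 environment (an assumption; older Internet Explorer versions would need the plain-object version above), a Set gives the same constant-time lookup. A minimal sketch, which also shows one gotcha of the plain-object map: inherited keys such as "constructor" look like excluded words.

```javascript
var excludes = new Set(['bad', 'words', 'exclude', 'all', 'these']);
var wordsToTest = ['these', 'are', 'all', 'my', 'words', 'to', 'check', 'for'];

// Keep only the words that are not in the exclude set.
var kept = wordsToTest.filter(function (word) {
  return !excludes.has(word);
});
console.log(kept); // ['are', 'my', 'to', 'check', 'for']

// Gotcha with a plain object used as a map: inherited properties are truthy.
var plain = {};
console.log(Boolean(plain['constructor'])); // true, although nothing was added
console.log(new Set().has('constructor'));  // false
```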

It would be worth a try to combine the words into a single regex, and then compare with that. The regex engine's optimizations might allow the search to skip forward through the search text a lot more efficiently than you could do by iterating yourself over separate strings.

You could use a hashing function for strings (I don't know if JS has one, but I'm sure uncle Google can help ;] ). Then you would calculate hashes for all the words in your exclude list and create an array of booleans indexed by those hashes. Then just iterate through the text and check the word hashes against that array.

I have taken scunliffe's answer and modified it as follows:
var excludes = ['bad','words','exclude','all','these']; //array
Now let's add a prototype function that checks whether a value is inside an Array:
Array.prototype.hasValue= function(value) {
for (var i=0; i<this.length; i++)
if (this[i] === value) return true;
return false;
}
Let's test some words:
var wordsToTest = ['these','are','all','my','words','to','check','for'];
var checkWord;
for(var i=0; i< wordsToTest.length; i++){
checkWord = wordsToTest[i];
if( excludes.hasValue(checkWord) ){
//is bad word
} else {
//is good word
console.log( checkWord );
}
}
output:
['are','my','to','check','for']

I'd opt for the regex version
text = 'This is a text that contains the words to delete. It has some <b>HTML</b> code in it, and punctuation!';
deleteWords = ['is', 'a', 'that', 'the', 'to', 'this', 'it', 'in', 'and', 'has'];
// clear punctuation and HTML code
onlyWordsReg = /\<[^>]*\>|\W/g;
onlyWordsText = text.replace(onlyWordsReg, ' ');
reg = new RegExp('\\b' + deleteWords.join('\\b|\\b') + '\\b', 'ig');
cleanText = onlyWordsText.replace(reg, '');
// tokenize after this
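The "tokenize after this" step could look roughly like the sketch below. It repeats the cleaning so the block runs on its own, escapes the words before joining them into a regex (an addition, in case the exclude list ever contains regex metacharacters), and removes duplicates. escapeRegExp is a commonly used helper pattern, not a built-in.

```javascript
var text = 'This is a text that contains the words to delete. It has some <b>HTML</b> code in it, and punctuation!';
var deleteWords = ['is', 'a', 'that', 'the', 'to', 'this', 'it', 'in', 'and', 'has'];

// Helper: escape regex metacharacters so each word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One alternation for all excluded words, matched as whole words, case-insensitively.
var stopWords = new RegExp('\\b(?:' + deleteWords.map(escapeRegExp).join('|') + ')\\b', 'gi');

// Strip HTML tags and punctuation, then delete the excluded words.
var cleanText = text.replace(/<[^>]*>|\W/g, ' ').replace(stopWords, '');

// Tokenize on whitespace and drop the empty strings left behind.
var tokens = cleanText.split(/\s+/).filter(Boolean);

// Remove duplicates while keeping the first occurrence of each word.
var unique = tokens.filter(function (word, index) {
  return tokens.indexOf(word) === index;
});
console.log(unique);
// ['text', 'contains', 'words', 'delete', 'some', 'HTML', 'code', 'punctuation']
```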

How do I split a string that contains different signs?

I want to split a string that can look like this: word1;word2;word3,word4,word5,word6.word7. etc.
The string comes from the output of a PHP page that collects data from a database, so it may look different, but the first words are always separated with ; then with , and the last words with . (dot).
I want to be able to fetch all the words that end with, for example, ; , or . into an array. Does someone know how to do this?
I would also like to know how to fetch only the words that end with ;
The function ends_with(string, character) below works, but it does not handle whitespace. For example, if the word is Jonas Sand, it only prints Sand. Does anybody know how to fix this?

Probably
var string = "word1;word2;word3,word4,word5,word6.word7";
var array = string.split(/[;,.]/);
// array = ["word1", "word2", "word3", "word4", "word5", "word6", "word7"]
The key is in the regular expression passed to the String#split method. The character class operator [] allows the regular expression to select between the characters contained within it.
If you need to split on delimiters that are longer than one character, you can use | to separate the alternatives. Outside a character class the dot must be escaped, otherwise it matches any character.
var array = string.split(/;|,|\./) // same as above
Edit: I didn't read the question thoroughly. Something like this:
var string = "word1;word2;word3,word4,word5,word6.word7";
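function ends_with(string, character) {
// escape the delimiter so that characters like "." are matched literally
var escaped = character.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
var regexp = new RegExp('\\w+' + escaped, 'g');
var matches = string.match(regexp) || []; // match() returns null when nothing is found
var replacer = new RegExp(escaped + '$');
return matches.map(function(ee) {
return ee.replace(replacer, '');
});
}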
// ends_with(string, ';') => ["word1", "word2"]
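For the whitespace problem mentioned in the question ("Jonas Sand" coming back as "Sand"), one option is to match a run of non-delimiter characters instead of \w+. This sketch assumes the only delimiters are ; , and . as described in the question; wordsEndingWith and escapeRegExp are illustrative names.

```javascript
// Helper: escape regex metacharacters so the delimiter is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of ends_with that keeps spaces inside a word.
function wordsEndingWith(input, character) {
  // A run of characters that are not delimiters, followed by the wanted delimiter.
  var pattern = new RegExp('([^;,.]+)' + escapeRegExp(character), 'g');
  var results = [];
  var match;
  while ((match = pattern.exec(input)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

var sample = 'Jonas Sand;word2;word3,word4,word5,word6.word7.';
console.log(wordsEndingWith(sample, ';')); // ['Jonas Sand', 'word2']
console.log(wordsEndingWith(sample, ',')); // ['word3', 'word4', 'word5']
console.log(wordsEndingWith(sample, '.')); // ['word6', 'word7']
```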

var myString = "word1;word2;word3,word4,word5,word6.word7";
var result = myString.split(/;|,|\./);
