How to match one string by ignoring special character - javascript

I am implementing search suggestions with autosuggest and Solr.
I am getting a bunch of matching strings with special characters. For example, if I search for
jp
The returned strings include:
jp,
j p ,
j.p.,
j.p. nag,
j-p-naga
I need to highlight every string that contains "jp", ignoring special characters.
Expected output:
"j-p-naga": "j-p" should be highlighted, i.e. {span}j-p{/span}-naga

You want to eliminate special characters? Use regular expressions:
function removeSpecials(str) {
return str.replace(/[^a-zA-Z0-9]/g, '');
}
console.log('jp' === removeSpecials('j .$##p##'));
console.log('jp' === removeSpecials('j.p.,'));
// Both true
Or maybe you want to check if a string contains the character j followed by p in any location?
function strHasJP(str) { return /[^j]*j[^p]*p.*/.test(str); }
I'm not sure what you are trying to do. For help on RegExp, go here https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions .
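If the goal is the highlighting described in the question, one possible approach is to build a pattern that allows any run of special characters between the letters of the query. The sketch below is only an illustration: the function name highlightIgnoringSpecials and the choice of <span> as the wrapper tag are assumptions, not something given in the question.

```javascript
// Hypothetical helper: wraps every match of `query` in <span>...</span>,
// allowing any non-alphanumeric characters between the query's characters.
function highlightIgnoringSpecials(text, query) {
  // Keep only letters and digits from the query, so no regex escaping is needed.
  const chars = Array.from(query.replace(/[^a-zA-Z0-9]/g, ''));
  if (chars.length === 0) {
    return text;
  }
  // "jp" becomes j[^a-zA-Z0-9]*p
  const pattern = chars.join('[^a-zA-Z0-9]*');
  const rx = new RegExp(pattern, 'gi');
  return text.replace(rx, (match) => `<span>${match}</span>`);
}

console.log(highlightIgnoringSpecials('j-p-naga', 'jp')); // <span>j-p</span>-naga
console.log(highlightIgnoringSpecials('j.p. nag', 'jp')); // <span>j.p</span>. nag
```

Trailing special characters after the last matched letter are left outside the highlight, which matches the expected output in the question.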

Related

javascript search string for sequential periods [duplicate]

How do I search a string for multiple periods in a row, for example "...".
string.search(/.../);
string.search("[...]");
The above will return true even if a single period is detected.
string.search("...");
This is even worse as it always returns true, even if there is no period.
var banlist = ["item1", "item2", "[...]"];
I am searching each string in the list, and I need to be able to search for "...".
Edit:
Now I have a new problem. Exactly three periods ("...") come out as Unicode character 8230 (the single ellipsis character), and it is not found by the search. Any other sequence of periods can be found, except for three ("...").
You need to escape the . character with a backslash (\):
let str = "som.e ra...ndom s.tring..";
console.log(str.match(/\.{2,}/g));
where {2,} indicates that there has to be at least 2 . characters in a row and /g indicates that the whole string should be evaluated. Notice that the . is escaped.
When the pattern is passed as a string, use \\ to escape the dot in your search,
for example '\\.\\.' or '\\.{2}'.
string.search() is useful for getting the index of the first occurrence of a pattern. It returns -1 if pattern isn't found.
string.search('\\.{2}')
With string.search going through the list would be something like
for (let i = 0; i < banlist.length; i++) {
if (banlist[i].search('\\.{2}') != -1) {
// Do something
}
}
If you don't need the index and just want to know if there are 2 or more dots in the string, string.match might be useful since it returns null when the pattern doesn't match.
string.match('\\.{2}')
With string.match going through the list would be something like
for (let i = 0; i < banlist.length; i++) {
if (banlist[i].match('\\.{2}')) {
// Do something
}
}
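The question's edit mentions that three periods sometimes arrive as the single character with code 8230 (U+2026). A minimal sketch that treats that character the same as a run of ASCII periods could look like this; the function name hasConsecutivePeriods and the sample list entries are made up for illustration.

```javascript
// Two or more ASCII periods in a row, or one U+2026 ellipsis character.
// No "g" flag, so test() keeps no state between calls.
const CONSECUTIVE_PERIODS = /\.{2,}|\u2026/;

function hasConsecutivePeriods(str) {
  return CONSECUTIVE_PERIODS.test(str);
}

// Sample data (hypothetical): only the third and fourth entries should be flagged.
const sampleList = ['item1', 'a.b', 'a...b', 'a\u2026b'];
const flagged = sampleList.filter(hasConsecutivePeriods);
console.log(flagged.length); // 2
```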

Formatting in Javascript

I have a question about validating the format of a string.
The user should pass a string in the format XX:XX.
If the string passed by the user is in the format XX:XX, I need to return true,
otherwise false:
app.post('/test', (req, res) => {
if (req.body.time is in the format of XX:XX) {
return true
} else {
return false
}
});
You can use the RegExp.test function for this kind of thing.
Here is an example:
var condition = /^[a-zA-Z]{2}:[a-zA-Z]{2}$/.test("XX:XX");
console.log("Condition: ", condition);
The regex that I've used in this case checks whether the string is composed of two upper- or lower-case letters followed by a colon and another two such letters.
Based on your edits, it seems that you're trying to check whether a string represents an hour and minute value. If that is the case, a regex like /^\d{2}:\d{2}$/ is more appropriate. This regex checks whether the string is composed of 2 digits followed by a colon and another 2 digits.
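As a sketch of how this check could be wrapped for reuse (for example inside the route handler from the question), the two helpers below are hypothetical names. The second one goes beyond what was asked and assumes a 24-hour clock, so that values such as 99:99 are rejected.

```javascript
// Shape check only: two digits, a colon, two digits.
function isTimeFormat(value) {
  return typeof value === 'string' && /^\d{2}:\d{2}$/.test(value);
}

// Stricter variant (assumption: 24-hour clock, 00:00 to 23:59).
function isValidClockTime(value) {
  return typeof value === 'string' && /^(?:[01]\d|2[0-3]):[0-5]\d$/.test(value);
}

console.log(isTimeFormat('12:30'));     // true
console.log(isTimeFormat('99:99'));     // true (shape only)
console.log(isValidClockTime('99:99')); // false
```

The typeof guard matters when the value comes from a request body, because test() would otherwise coerce non-string input to a string.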
The tool you're looking for is called Regular Expressions.
It is globally supported in almost every development platform, which makes it extremely convenient to use.
I would recommend this website for working out your regular expressions.
/^[a-zA-Z]{2}:[a-zA-Z]{2}$/g is an example of a Regular Expression that matches the following pattern:
[a-zA-Z]{2} - two characters from the sets a-z and A-Z.
Followed by :
Followed by the same two-character pattern again. Essentially, it validates the pattern XX:XX. Of course, you can adjust it according to what you want to allow for X.
^ marks the beginning of a string and $ marks the end of it, so ASD:AS would not work even though it contains the described pattern.
Try using a regex:
var str = "12:aa";
var patt = new RegExp("^([a-zA-Z]|[0-9]){2}:([a-zA-Z]|[0-9]){2}$");
var res = patt.test(str);
if(res){ //if true
//do something
}
else{}

How to allow only certain words consecutively with Regex in javascript

I'm trying to write a regex that will return true if it matches the format below, otherwise, it should return false. It should only allow words as below:
Positive match (return true)
UA-1234-1,UA-12345-2,UA-34578-2
Negative match (return false or null)
Note: A is missing after U
UA-1234-1,U-12345-2
It should always give me true when the string passed to regex is
UA-1234-1,UA-12345-2,UA-34578-2,...........
Below is what I am trying to do but it is matching only the first element and not returning null.
var pattern=/^UA-[0-9]+(-[0-9]+)?/g;
"UA-1234-1,UA-12345-2,UA-34578-2".match(pattern);
pattern.exec("UA-1234-1,UA-12345-2,UA-34578-2");
Thanks in advance. Help is greatly appreciated.
The pattern you need is enclosed in anchors (^ for start of string and $ for end of string). It first matches your pattern once (the initial "block") and then matches 0 or more occurrences of a comma followed by the block pattern.
It looks like /^BLOCK(?:,BLOCK)*$/. You may introduce optional whitespaces in between, e.g. /^BLOCK(?:,\s*BLOCK)*$/.
In the end, the pattern looks like ^UA-[0-9]+(?:-[0-9]+)?(?:,UA-[0-9]+(?:-[0-9]+)?)*$. It is best to build it dynamically to keep it readable and easy to maintain:
const block = "UA-[0-9]+(?:-[0-9]+)?";
let rx = new RegExp(`^${block}(?:,${block})*$`); // RegExp("^" + block + "(?:," + block + ")*$") // for non-ES6
let tests = ['UA-1234-1,UA-12345-2,UA-34578-2', 'UA-1234-1,U-12345-2'];
for (var s of tests) {
console.log(s, "=>", rx.test(s));
}
Alternatively, split the string by commas and test each element instead.
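A minimal sketch of the split-based approach, assuming the same block pattern as in the answer above; the function name isValidIdList is hypothetical.

```javascript
// One block, anchored on both sides so that partial matches are rejected.
const BLOCK = /^UA-[0-9]+(?:-[0-9]+)?$/;

function isValidIdList(list) {
  // every() returns false as soon as one element fails the pattern.
  return list.split(',').every((item) => BLOCK.test(item));
}

console.log(isValidIdList('UA-1234-1,UA-12345-2,UA-34578-2')); // true
console.log(isValidIdList('UA-1234-1,U-12345-2'));             // false
```

An empty string or a trailing comma produces an empty element, which fails the block pattern, so both are rejected.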

Test if a sentence is matching a text declaration using regex

I want to test if a sentence like type var1,var2,var3 is matching a text declaration or not.
So, I used the following code :
var text = "int a1,a2,a3",
reg = /int ((([a-z_A-Z]+[0-9]*),)+)$/g;
if (reg.test(text)) console.log(true);
else console.log(false)
The problem is that this regular expression returns false on text that is supposed to be true.
Could someone help me find a good regular expression matching expressions as in the example above?
You have a couple of mistakes.
As written, your pattern requires a comma after every variable, including the last one at the end of the line.
I suppose you also want to match int abc123 as a correct string, so letters need to be allowed together with the other characters after the first one.
Avoid using capturing groups when you are only testing strings.
const str = 'int a1,a2,a3';
const regex = /int (?:[a-zA-Z_](?:[a-zA-Z0-9_])*(?:\,|$))+/g
console.log(regex.test(str));
You will need to add ? after the comma ,.
This token ? matches between zero and one.
Notice that the last number in your text a3 does not have , afterward.
int ((([a-z_A-Z]+[0-9]*),?)+)$
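Both patterns above still accept some input that is probably unintended, such as a trailing comma. A stricter sketch, under the assumption that a declaration is int followed by one or more comma-separated identifiers and nothing else, might look like this (the names IDENT, DECLARATION, and isIntDeclaration are made up for the example):

```javascript
// An identifier: a letter or underscore, then letters, digits, or underscores.
const IDENT = '[a-zA-Z_][a-zA-Z0-9_]*';

// First identifier, then zero or more ",identifier" groups, anchored at both ends.
// No "g" flag: with "g", test() remembers lastIndex between calls.
const DECLARATION = new RegExp(`^int ${IDENT}(?:,${IDENT})*$`);

function isIntDeclaration(text) {
  return DECLARATION.test(text);
}

console.log(isIntDeclaration('int a1,a2,a3')); // true
console.log(isIntDeclaration('int a1,a2,'));   // false
```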

Remove all special characters with RegExp

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn’t work in IE7, though it works in Firefox.
var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";
for (var i = 0; i < specialChars.length; i++) {
stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}
A detailed description of the RegExp would be helpful as well.
var desired = stringToReplace.replace(/[^\w\s]/gi, '')
As was mentioned in the comments it's easier to do this as a whitelist - replace the characters which aren't in your safelist.
The caret (^) character negates the set [...], gi means global and case-insensitive (the latter is a bit redundant but I wanted to mention it), and the safelist in this example is word characters (\w: letters, digits, and underscores) and whitespace (\s).
Note that if you still want to exclude a set, including things like slashes and special characters you can do the following:
var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Take special note that in order to also include the "minus" character, you need to escape it with a backslash, as in the set above. If you don't, the sequence +-= is read as a range that also selects 0-9, which is probably undesired.
Without the u flag and Unicode property escapes, a plain JavaScript regex does not handle Unicode letters: \w covers only ASCII letters, digits, and the underscore.
Do not use [^\w\s]: it removes letters with accents (like àèéìòù), and letters from Cyrillic or Chinese text are removed completely.
You really don't want to remove these letters together with all the special characters. You have two options:
Add to your regex all the special characters you don't want to remove, for example: [^èéòàùì\w\s].
Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.
var str = "Їжак::: résd,$%& adùf"
var search = XRegExp('([^?<first>\\pL ]+)');
var res = XRegExp.replace(str, search, '',"all");
console.log(res); // returns "Їжак résd adùf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
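In engines that support ES2018 Unicode property escapes, something similar can be done without a library. This is a sketch under that assumption, not part of the original answer; the function name removeSpecialsUnicode is hypothetical.

```javascript
// Keeps letters (\p{L}), combining marks (\p{M}), digits (\p{N}) and whitespace;
// removes everything else. Requires the "u" flag for \p{...} to work.
function removeSpecialsUnicode(str) {
  return str.replace(/[^\p{L}\p{M}\p{N}\s]/gu, '');
}

console.log(removeSpecialsUnicode('Їжак::: résd,$%& adùf')); // Їжак résd adùf
```

Including \p{M} keeps accents that are stored as separate combining characters, so the result does not depend on how the input was normalized.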
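Using \W or [a-z0-9] in a regex won't work for non-English languages such as Chinese.
It's better to list all the special characters in the regex and remove them from the given string. Note that the hyphen must be escaped; otherwise ?-_ is read as a range that also removes @ and the uppercase letters A-Z:
str.replace(/[~`!@#$%^&*()+={}\[\];:\'\"<>.,\/\\\?\-_]/g, '');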
The first solution does not work for non-Latin alphabets (it will cut text such as Їжак). I have managed to create a function which does not use RegExp and relies on the good Unicode support in the JavaScript engine. The idea is simple: if a symbol is the same in uppercase and lowercase, it is a special character. The only exception is made for whitespace.
function removeSpecials(str) {
var lower = str.toLowerCase();
var upper = str.toUpperCase();
var res = "";
for(var i=0; i<lower.length; ++i) {
if(lower[i] != upper[i] || lower[i].trim() === '')
res += str[i];
}
return res;
}
Update: Please note that this solution works only for languages that have lowercase and uppercase letters. In languages like Chinese, this won't work.
Update 2: I arrived at the original solution while working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach. Use a transliteration library that produces a string of Latin characters only, and then a simple RegExp will do all the magic of removing special characters. (This works for Chinese too, and as a side benefit it makes Tromsø == Tromso.)
I use RegexBuddy for debugging my regexes; it supports almost all languages and is very useful. I then copy/paste the result for the target language.
Terrific tool and not very expensive.
So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /!@#$^&%*()+=-[\x5B\x5D]\/{}|:<>?,./im
str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "") I did something like this.
But some people did it much more simply, like str.replace(/[\W_]/g,"");
@Seagull's answer (https://stackoverflow.com/a/26482552/4556619)
looks good, but you get the string "undefined" in the result when there are some special (Turkish) characters. See the example below.
let str="bənövşəyi 😟пурпурный İdÖĞ";
I slightly improved it and patched it with an undefined check.
function removeSpecials(str) {
let lower = str.toLowerCase();
let upper = str.toUpperCase();
let res = "",i=0,n=lower.length,t;
for(i; i<n; ++i) {
if(lower[i] !== upper[i] || lower[i].trim() === ''){
t=str[i];
if(t!==undefined){
res +=t;
}
}
}
return res;
}
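The undefined values appear because lowercasing some characters changes the string's length (for example, İ lowercases to two code units), so the indexes of lower, upper, and str stop lining up. A sketch of an alternative that compares each character on its own, and therefore never indexes past the end, is shown below. The function name removeSpecialsByCase is made up; like the original, it also drops digits, because they have no case.

```javascript
// Keeps characters that have distinct lower/upper case forms, plus whitespace.
// for...of iterates by code point, so emoji are handled as a single character.
function removeSpecialsByCase(str) {
  let res = '';
  for (const ch of str) {
    const isLetter = ch.toLowerCase() !== ch.toUpperCase();
    const isWhitespace = ch.trim() === '';
    if (isLetter || isWhitespace) {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsByCase('j-p-naga')); // jpnaga
```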
text.replace(/[`~!@#$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');
Why don't you do something like:
var re = /^[a-z0-9 ]*$/i;
var isValid = re.test(yourInput);
to check whether your input contains any special characters (isValid is false if it does)?
