Regular expression for validating with specific characters through javascript

I want to validate a string so that it does not allow the following characters:
<, >, :, ", /, \, |, ?, *, #
I want to do this validation in JavaScript.
I tried the following code:
var reg = /[^a-zA-Z0-9 \-_]+/;
reg.test(filename[0])
But this failed to catch the symbol #.
Please help.

The problem is a hyphen in the middle of the character class that is not escaped. An unescaped hyphen between two characters tells the engine that you are expecting a range, in this case space through underscore, and # falls inside that range. It's easier (in my opinion) to place the hyphen as either the first or last character in the class, at which point you don't have to escape it. (It would be the second character if you are using a negated character class.)
e.g.
var reg = /[^a-zA-Z0-9 \-_]+/;
--OR--
var reg = /[^a-zA-Z0-9 _-]+/;
--OR--
var reg = /[^-a-zA-Z0-9 _]+/;
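A small check makes the range behaviour visible. This is only a sketch and the variable names are made up for illustration. With an unescaped hyphen between the space and the underscore, the class covers every character from U+0020 to U+005F, so # counts as allowed:

```javascript
// Unescaped hyphen between " " and "_": a range from U+0020 to U+005F.
const withRange = /[^a-zA-Z0-9 -_]/;
// Hyphen escaped, or moved to the end of the class: a literal hyphen.
const escaped = /[^a-zA-Z0-9 \-_]/;
const hyphenLast = /[^a-zA-Z0-9 _-]/;

console.log(withRange.test("#"));          // false: "#" (U+0023) is inside the range
console.log(escaped.test("#"));            // true: "#" is not an allowed character
console.log(hyphenLast.test("#"));         // true
console.log(hyphenLast.test("my-file_1")); // false: nothing disallowed found
```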

Do you only want to allow English letters a-z (and A-Z), numbers, the space, '_', and '-'? If so, that is different than disallowing the characters you specified since '☃' doesn't have the characters you provided but may not be a valid string in your use case.
In the case you just want the English alphabet, numbers, space, '_', and '-', you can use the following RegExp and conditional:
var reg = /^[a-zA-Z0-9 \-_]+$/;
if (reg.test(filename[0])) {
// String is ok
}
This says everything in the string between beginning (^) and end ($) must be one or more of the allowed characters.
If you want to disallow the characters you provided in your question, you can use:
var reg = /[\<\>\:\,\/\\\|\?\*\#]/;
if (!reg.test(filename[0])) {
// String is ok
}
This says to search for any of the characters you've listed (they are all escaped with a \ before them) and if you find any, the string is invalid. So only if the test fails is the string a valid string - that's why there's a ! before the test.
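For comparison, here is a minimal sketch of the same deny-list check that escapes only what a character class actually requires (the backslash, plus the slash because of the literal delimiters). It assumes the list in the question means the ten characters < > : " / \ | ? * #, and the helper name isAllowedName is made up for illustration:

```javascript
// Characters the question wants to reject: < > : " / \ | ? * #
const FORBIDDEN = /[<>:"\/\\|?*#]/;

// Hypothetical helper name; returns true when none of the characters occur.
function isAllowedName(name) {
  return !FORBIDDEN.test(name);
}

console.log(isAllowedName("report_2024.txt")); // true
console.log(isAllowedName("what?.txt"));       // false
console.log(isAllowedName("☃"));               // true: not on the deny list
```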

var sourceString = "something";
var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

Related

regex custom length but no whitespace allowed [duplicate]

I have a username field in my form. I want to not allow spaces anywhere in the string. I have used this regex:
var regexp = /^\S/;
This works for me if there are spaces between the characters. That is if username is ABC DEF. It doesn't work if a space is in the beginning, e.g. <space><space>ABC. What should the regex be?

While you have specified the start anchor and the first letter, you have not done anything for the rest of the string. You seem to want repetition of that character class until the end of the string:
var regexp = /^\S*$/; // a string consisting only of non-whitespaces

Use + plus sign (Match one or more of the previous items),
var regexp = /^\S+$/

If you're using a plugin that takes a string and builds the regex object with new RegExp(), then the string below will work:
'^\\S*$'
It's the same regex @Bergi mentioned, just in the string form that the RegExp constructor expects.
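To compare the forms discussed in the answers above, here is a short sketch (variable names are illustrative only). It shows the literal and string versions producing the same pattern, and the difference between * and +:

```javascript
// Literal and constructor forms of the same pattern.
const fromLiteral = /^\S*$/;
const fromString = new RegExp("^\\S*$");       // backslash doubled in the string
const fromRaw = new RegExp(String.raw`^\S*$`); // String.raw keeps backslashes as typed

console.log(fromLiteral.source === fromString.source); // true
console.log(fromRaw.test("ABCDEF"));  // true
console.log(fromRaw.test("  ABC"));   // false: leading spaces
console.log(fromRaw.test("ABC DEF")); // false: space in the middle

// * accepts the empty string, + requires at least one character.
console.log(/^\S*$/.test("")); // true
console.log(/^\S+$/.test("")); // false
```

For a username field, the + variant is probably the one you want, since it also rejects an empty value.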

This will help to find the spaces in the beginning, middle and ending:
var regexp = /\s/g

This one matches the input field or string only if it contains no spaces. If there are any spaces, it will not match at all.
/^([A-z0-9!@#$%^&*().,<>{}[\]<>?_=+\-|;:\'\"\/])*[^\s]\1*$/
It matches from the beginning of the line to the end and accepts letters, digits, and most special characters. Note that the range A-z also covers the punctuation that sits between Z and a in ASCII ([, \, ], ^, _ and `).
If you want to restrict it to letters (with the same caveat about A-z), change what is in the [] like so; the [^\s] part still admits one arbitrary non-whitespace character:
/^([A-z])*[^\s]\1*$/

\b regex special character seems not working for Cyrillic in javascript [duplicate]

I am building a search feature and I am going to use JavaScript autocomplete with it. I am from Finland (Finnish language), so I have to deal with some special characters like ä, ö and å.
When the user types text into the search input field, I try to match the text to data.
Here is a simple example that does not work correctly if the user types, for example, "ää". The same happens with "äl".
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";
// does not work
//var searchterm = "ää";
// Works
//var searchterm = "wi";
if ( new RegExp("\\b"+searchterm, "gi").test(title) ) {
$("#result").html("Match: ("+searchterm+"): "+title);
} else {
$("#result").html("nothing found with term: "+searchterm);
}
http://jsfiddle.net/7TsxB/
So how can I get those ä,ö and å characters to work with javascript regex?
I think I should use unicode codes but how should I do that? Codes for those characters are:
[\u00C4,\u00E4,\u00C5,\u00E5,\u00D6,\u00F6]
=> äÄåÅöÖ

There appears to be a problem with the word boundary \b when a word starts with a character outside the ASCII word-character set [A-Za-z0-9_]: \b does not match in front of such a character.
Instead of using \b, try using (?:^|\\s)
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Did not work with \b
var searchterm = "äl";
// Did not work with \b
//var searchterm = "ää";
// Worked with \b as well
//var searchterm = "wi";
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
Breakdown:
(?: parentheses () form a capture group in a regex. Parentheses that start with a question mark and colon ?: form a non-capturing group. They just group the terms together
^ the caret symbol matches the beginning of a string
| the bar is the "or" operator.
\s matches whitespace (appears as \\s in the string because we have to escape the backslash)
) closes the group
So instead of using \b, which matches word boundaries but only recognizes ASCII word characters, we use a non-capturing group which matches the beginning of a string OR whitespace.
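Stripped of the jQuery output, the idea can be tried out like this. It is a sketch only: wordStartsWith is a made-up helper name, and the search term is assumed to contain no regex metacharacters.

```javascript
const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

// Hypothetical helper: true when some word in `text` starts with `term`.
// `term` is assumed to contain no regex metacharacters.
function wordStartsWith(text, term) {
  return new RegExp("(?:^|\\s)" + term, "i").test(text);
}

console.log(new RegExp("\\bäl", "i").test(title)); // false: \b does not see "ä" as a word character
console.log(wordStartsWith(title, "äl")); // true
console.log(wordStartsWith(title, "ää")); // true
console.log(wordStartsWith(title, "wi")); // true
console.log(wordStartsWith(title, "mä")); // false: "mä" only occurs inside "tämä"
```

The g flag is left out on purpose: a global regex keeps state in lastIndex between test() calls, which a one-off check does not need.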

The \b assertion in JavaScript regular expressions is really only useful with ASCII text. \b is shorthand for the boundary between the \w and \W sets, or between \w and the beginning or end of the string. These character sets only take into account ASCII "word" characters, where \w is equal to [a-zA-Z0-9_] and \W is the negation of that class.
This makes the RegEx character classes largely useless for dealing with any real language.
\s should work for what you want to do, provided that search terms are only delimited by whitespace.

This question is old, but I think I found a better solution for word boundaries in regular expressions with Unicode letters.
Using the XRegExp library you can implement a valid \b boundary by expanding this:
XRegExp('(?=^|$|[^\\p{L}])')
The resulting expression is more than 4000 characters long, but it seems to perform quite well.
Some explanation: (?= ) is a zero-length lookahead that looks for the beginning or end of the string or a non-letter Unicode character. The most important thing is the lookahead, because \b doesn't capture anything: it is simply true or false.

\b is a shortcut for the transition between a letter and a non-letter character, or vice versa.
Updating and improving on max_masseti's answer:
With the Unicode property escapes introduced in ES2018 (they require the u flag), you can now use \p{L} to represent any Unicode letter, and \P{L} (notice the uppercase P) to represent anything but a letter.
EDIT: Previous version was incomplete.
As such:
const text = 'A Fé, o Império, e as terras viciosas';
text.split(/(?<=\p{L})(?=\P{L})|(?<=\P{L})(?=\p{L})/u);
// ['A', ' ', 'Fé', ', ', 'o', ' ', 'Império', ', ', 'e', ' ', 'as', ' ', 'terras', ' ', 'viciosas']
We're using a lookbehind (?<=...) to find a letter and a lookahead (?=...) to find a non-letter, or vice versa.

I would recommend using XRegExp when you have to work with a specific set of Unicode characters. The author of this library mapped all kinds of regional character sets, which makes working with different languages easier.

Despite the fact that the issue seems to be 8 years old, I ran into a similar problem (I had to match Cyrillic letters) not long ago. I spent a whole day on this and could not find an appropriate answer here on StackOverflow. So, to save others the effort, I'd like to share my solution.
Yes, the \b word boundary works only with Latin letters (Word boundary: \b):
Word boundary \b doesn’t work for non-Latin alphabets
The word boundary test \b checks that there should be \w on the one side from the position and "not \w" – on the other side.
But \w means a Latin letter a-z (or a digit or an underscore), so the test doesn’t work for other characters, e.g. Cyrillic letters or hieroglyphs.
Yes, the JavaScript RegExp implementation offers little help with non-ASCII text here.
So, I tried implementing my own word boundary feature with support for non-Latin characters. To make the word boundary work just with Cyrillic characters I created this regular expression:
new RegExp(`(?<![\u0400-\u04ff])${cyrillicSearchValue}(?![\u0400-\u04ff])`,'gi')
Where \u0400-\u04ff is a range of Cyrillic characters provided in the table of codes. It is not an ideal solution, however, it works properly in most cases.
To make it work in your case, you just have to pick up an appropriate range of codes from the list of Unicode characters.
To try out my example run the code snippet below.
function getMatchExpression(cyrillicSearchValue) {
    return new RegExp(
        `(?<![\u0400-\u04ff])${cyrillicSearchValue}(?![\u0400-\u04ff])`,
        'gi',
    );
}
const sentence = 'Будь-який текст кирилицею, де необхідно знайти слово з контексту';
console.log(sentence.match(getMatchExpression('текст')));
// expected output: ["текст"]
console.log(sentence.match(getMatchExpression('но')));
// expected output: null

I noticed something really weird with \b when using Unicode:
/\bo/.test("pop"); // false (obviously)
/\bä/.test("päp"); // true (what..?)
/\Bo/.test("pop"); // true
/\Bä/.test("päp"); // false (what..?)
It appears that meaning of \b and \B are reversed, but only when used with non-ASCII Unicode? There might be something deeper going on here, but I'm not sure what it is.
In any case, it seems that the word boundary is the issue, not the Unicode characters themselves. Perhaps you should just replace \b with (^|[\s\\/_&-]), as that seems to work correctly. The hyphen goes last in the class so that it is a literal hyphen and not the start of a range. (Make your list of symbols more comprehensive than mine, though.)
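The observation above is what you would expect if \w is ASCII-only: the boundaries land on either side of the ä instead of around the whole word. A quick sketch to see where \b actually matches (the string uses an escape so that ä is a single precomposed character):

```javascript
const word = "p\u00E4p"; // "päp" with a precomposed ä

// \w is ASCII-only: [A-Za-z0-9_]
console.log(/\w/.test("o"));      // true
console.log(/\w/.test("\u00E4")); // false

// Mark every position where \b matches.
console.log(word.replace(/\b/g, "|"));  // "|p|ä|p|"
console.log("pop".replace(/\b/g, "|")); // "|pop|"
```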

My idea is to search with codes representing the Finnish letters
new RegExp("\\b"+asciiOnly(searchterm), "gi").test(asciiOnly(title))
My original idea was to use plain encodeURI but the % sign seemed to interfere with the regexp.
http://jsfiddle.net/7TsxB/5/
I wrote a crude function using encodeURI to encode every character with code over 128 but removing its % and adding 'QQ' in the beginning. It is not the best marker but I couldn't get non alphanumeric to work.

What you are looking for is the Unicode word boundaries standard:
http://unicode.org/reports/tr29/tr29-9.html#Word_Boundaries
There is a JavaScript implementation here (unicodejs.wordbreak.js)
https://github.com/wikimedia/unicodejs

I had a similar problem, where I was trying to replace all of a particular unicode word with a different unicode word, and I cannot use lookbehind because it's not supported in the JS engine this code will be used in. I ultimately resolved it like this:
const needle = "КАРТОПЛЯ";
const replace = "БАРАБОЛЯ";
const regex = new RegExp(
    String.raw`(^|[^\n\p{L}])`
    + needle
    + String.raw`(?=$|\P{L})`,
    "gimu",
);
const result = (
    'КАРТОПЛЯ сдффКАРТОПЛЯдадф КАРТОПЛЯ КАРТОПЛЯ КАРТОПЛЯ??? !!!КАРТОПЛЯ ;!;!КАРТОПЛЯ/#?#?'
    + '\n\nКАРТОПЛЯ КАРТОПЛЯ - - -КАРТОПЛЯ--'
)
.replace(regex, function (match, ...args) {
    return args[0] + replace;
});
console.log(result)
output:
БАРАБОЛЯ сдффКАРТОПЛЯдадф БАРАБОЛЯ БАРАБОЛЯ БАРАБОЛЯ??? !!!БАРАБОЛЯ ;!;!БАРАБОЛЯ/#?#?
БАРАБОЛЯ БАРАБОЛЯ - - -БАРАБОЛЯ--
Breaking it apart
The first regex: (^|[^\n\p{L}])
^| = Start of the line or
[^\n\p{L}] = Any character which is not a letter or a newline
The second regex: (?=$|\P{L})
?= = Lookahead
$| = End of the line or
\P{L} = Any character which is not a letter
The first regex captures the group and is then used via args[0] to put it back into the string during replacement, thereby avoiding a lookbehind. The second regex utilized lookahead.
Note that the second one MUST be a lookahead because if we capture it then overlapping regex matches will not trigger (e.g. КАРТОПЛЯ КАРТОПЛЯ КАРТОПЛЯ would only match on the 1st and 3rd ones).

Trying to find text "myTest":
/(?<![\p{L}\p{N}_])myTest(?![\p{L}\p{N}_])/gu
This is similar to the whole-word search in NetBeans or Notepad++. It finds the expression only when it is not preceded or followed by any Unicode letter, number, or underscore (the Unicode counterpart of the \w characters that the word boundary \b relies on).
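Applied to a few sample strings (the inputs are invented for illustration, and the g flag is dropped so that repeated test() calls do not depend on lastIndex), the pattern behaves like this:

```javascript
// Same pattern as above, applied to a few sample strings.
const wholeWord = /(?<![\p{L}\p{N}_])myTest(?![\p{L}\p{N}_])/u;

console.log(wholeWord.test("call myTest now")); // true
console.log(wholeWord.test("(myTest)"));        // true
console.log(wholeWord.test("myTest2"));         // false: digit follows
console.log(wholeWord.test("émyTest"));         // false: letter precedes
console.log(/\bmyTest\b/.test("émyTest"));      // true: \b treats "é" as a non-word character
```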

I have had a similar problem, but I had to replace an array of terms. None of the solutions I found worked if two terms were next to each other in the text (because their boundaries overlapped). So I had to use a slightly modified approach:
var text = "Ještě. že; \"už\" à. Fürs, 'anlässlich' že že že.";
var terms = ["à","anlässlich","Fürs","už","Ještě", "že"];
var replaced = [];
var order = 0;
for (var i = 0; i < terms.length; i++) {
terms[i] = "(^\|[ \n\r\t.,;'\"\+!?-])(" + terms[i] + ")([ \n\r\t.,;'\"\+!?-]+\|$)";
}
var re = new RegExp(terms.join("|"), "");
while (true) {
var replacedString = "";
text = text.replace(re, function replacer(match){
var beginning = match.match("^[ \n\r\t.,;'\"\+!?-]+");
if (beginning == null) beginning = "";
var ending = match.match("[ \n\r\t.,;'\"\+!?-]+$");
if (ending == null) ending = "";
replacedString = match.replace(beginning,"");
replacedString = replacedString.replace(ending,"");
replaced.push(replacedString);
return beginning+"{{"+order+"}}"+ending;
});
if (replacedString == "") break;
order += 1;
}
See the code in a fiddle: http://jsfiddle.net/antoninslejska/bvbLpdos/1/
The regular expression is inspired by: http://breakthebit.org/post/3446894238/word-boundaries-in-javascripts-regular
I can't say that I find the solution elegant...

The correct answer to the question is given by andrefs.
I will only rewrite it more clearly, after putting all required things together.
For ASCII text, you can use \b to match a word boundary both at the start and at the end of a pattern. For Unicode text, you need two different patterns to do the same:
Use (?<=^|\P{L}) to match the start of the text or a word boundary before the main pattern.
Use (?=\P{L}|$) to match the end of the text or a word boundary after the main pattern.
Additionally, make the match case-insensitive. Many regex flavors accept (?i) at the beginning of the pattern for this; JavaScript does not, so pass the i flag instead, together with the u flag that \P{L} requires.
So the resulting answer is: /(?<=^|\P{L})xxx(?=\P{L}|$)/iu, where xxx is your main pattern. This would be the equivalent of /\bxxx\b/i for ASCII text.
For your code to work, you now need to do the following:
Assign to your variable "searchterm" the pattern or words you want to find.
Escape the variable's contents. For example, replace '\' with '\\' and also do the same for any reserved special character of regex, like '\^', '\$', '\/', etc. There are existing questions on how to do this.
Insert the variable's contents into the pattern above, in the place of "xxx", for example with the string.replace() method.
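Put together, the three steps might look like the sketch below. The helper names escapeRegExp and unicodeWordRegex are made up for illustration, the term is inserted by plain concatenation (string.replace() on a template would work as well), and it needs an engine with lookbehind and Unicode property escape support:

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: whole-word, case-insensitive search for a literal term.
function unicodeWordRegex(searchterm) {
  return new RegExp(
    "(?<=^|\\P{L})" + escapeRegExp(searchterm) + "(?=\\P{L}|$)",
    "iu"
  );
}

console.log(unicodeWordRegex("älkää").test("tämä on älkää ihmetelkö")); // true
console.log(unicodeWordRegex("kää").test("tämä on älkää ihmetelkö"));   // false
console.log(unicodeWordRegex("c++").test("I like C++ a lot"));          // true
```

Without the escaping step, a term such as "c++" would throw a SyntaxError or silently change the meaning of the pattern.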

Bad but working:
var text = " аб аб АБ абвг ";
var ttt = "(аб)"
var p = "(^|$|[^A-Za-zА-Яа-я0-9()])"; // add other word boundary symbols here
var exp = new RegExp(p+ttt+p,"gi");
// Replace twice: adjacent matches share a boundary character, so one pass skips every second match
text = text.replace(exp, "$1($2)$3").replace(exp, "$1($2)$3");
console.log(text);
Result (without quotes):
" (аб) (аб) (АБ) абвг "

I struggled with this while working with French accented characters, and I managed to find this solution:
const myString = "MyString";
const regex = new RegExp(
"(?:[^À-ú]|^)\\b(" + myString + ")\\b(?:[^À-ú]|$)",
"ig"
);
What it does:
It keeps checking word boundaries with \b before and after "MyString".
In addition to that, (?:[^À-ú]|^) and (?:[^À-ú]|$) check that MyString is not surrounded by any accented characters.
It will not work with Cyrillic, but it may be possible to find the range of Cyrillic characters and edit [^À-ú] accordingly.
Warning: only MyString is captured as a group, but the total match also contains the previous and next characters.
See example : https://regex101.com/r/5P0ZIe/1
Match examples :
MyString
match : "MyString"
group 1 : "MyString"
Lorem ipsum. MyString dolor sit amet
match : " MyString "
group 1 : "MyString"
(MyString)
match : "(MyString)"
group 1 : "MyString"
BetweenCharactersMyStringIsNotFound
match : Nothing
group 1 : Nothing
éMyStringé
match : Nothing
group 1 : Nothing
ùMyString
match : Nothing
group 1 : Nothing
MyStringÖ
match : Nothing
group 1 : Nothing

Add limited spaces depend on name in regular expression using JavaScript

I have a regular expression for full name validation, but I want the name to start with alphabetic characters, allow spaces (depending on the name), and be limited to 50 characters (not more than 50):
^[a-zA-Z ]*$
This works, but there is no limit. How do I add the limit and the spaces (depending on the name)?

To force the first symbol to be an English letter or a space, and let up to 49 following characters be anything (except a newline), you can use the following regex:
^[a-zA-Z ].{0,49}$
If you want to just limit the input to English letters and spaces, you just need to add the limiting quantifier {1,50} meaning from 1 up to 50 occurrences of the preceding subpattern:
^[a-zA-Z ]{1,50}$
Adapting to your code and coding style, here is how you can use the second regex:
if($(this).attr('id') === "FullName") {
var re = new RegExp("^[a-zA-Z ]{1,50}$");
if(!re.test($(this).val())) {
res = "FullName is Not Valid"; alertDispaly(res);
}
}
To apply further restrictions, e.g. do not end in a space, you can use ^[a-zA-Z ]{0,49}[a-zA-Z]$. Or, no double space allowed: ^(?!.* {2})[a-zA-Z ]{0,49}[a-zA-Z]$.
EDIT: To allow tabs, newline characters, and other whitespace, you can add \\s to your pattern, e.g.:
var re = new RegExp("^[a-zA-Z\\s]{1,50}$");
You need to use \s in literal regex notation, and \\s in a RegExp constructor.
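The difference between the two notations is easy to check in isolation. The following is only a sketch with made-up sample names; it leaves out the jQuery wiring from the snippet above:

```javascript
const literal = /^[a-zA-Z ]{1,50}$/;
const built = new RegExp("^[a-zA-Z\\s]{1,50}$"); // "\\s" in the string becomes \s in the pattern

console.log(literal.test("Ada Lovelace"));  // true
console.log(literal.test(""));              // false: at least one character
console.log(literal.test("a".repeat(50)));  // true
console.log(literal.test("a".repeat(51)));  // false: over the limit
console.log(literal.test("Ada\tLovelace")); // false: a tab is not a plain space
console.log(built.test("Ada\tLovelace"));   // true: \s also covers tabs

// With a single backslash the string literal collapses "\s" to "s".
console.log(new RegExp("^[a-zA-Z\s]{1,50}$").test("Ada Lovelace")); // false
```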

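If you want to limit the name to a maximum of 50 characters:
^[a-zA-Z\s]{0,50}$
Here {0,50} matches at most 50 of the preceding characters. (JavaScript does not accept the {,50} shorthand as a quantifier; it treats it as literal text.)
\s matches any whitespace character.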
EDIT
var reg = /^[a-zA-Z\s]{1,50}$/;
....
if (reg.test($(this).val())) {
// Valid
} else {
// Invalid
}
Demo: http://jsfiddle.net/tusharj/xcr2hgk9/

You can try:
^[a-zA-Z\s]{0,50}$
http://regexr.com/3b3o2

Regex to check if string contains Alphanumeric Characters and Spaces only - javascript

This is what I have so far:
function checkTitle(){
reg = /^[\w ]+$/;
a = reg.test($("#title").val());
console.log(a);
}
So far in my tests it catches all special characters except _.
How do I catch all special characters including _ in the current function?
I need the string to only have Alphanumeric Characters and Spaces. Appreciate the help cause I am having a hard time understanding regex patterns.
Thanks!

Your problem is that \w matches all alphanumeric values and underscore.
Rather than parsing the entire string, I'd just look for any unwanted characters. For example
var reg = /[^A-Za-z0-9 ]/;
If the result of reg.test is true, then the string fails validation.

Since you are stating you are new to RegExp, I might as well include some tips with the answer. I suggest the following regexp:
/^[a-z\d ]+$/i
Here:
There is no need for the upper case A-Z because of the i flag in the end, which matches in a case-insensitive manner
\d special character represents digits
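A quick way to see the difference from \w is to run both patterns against a string containing an underscore. The sample strings are invented for illustration:

```javascript
const alnumAndSpaces = /^[a-z\d ]+$/i;

console.log(alnumAndSpaces.test("Chapter 12")); // true
console.log(alnumAndSpaces.test("snake_case")); // false: underscore rejected
console.log(/^[\w ]+$/.test("snake_case"));     // true: \w includes the underscore
console.log(alnumAndSpaces.test(""));           // false: + needs at least one character
```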

Finding Plus Sign in Regular Expression

var string = 'abcd+1';
var pattern = 'd+1'
var reg = new RegExp(pattern,'');
alert(string.search(reg));
I found out last night that if you try and find a plus sign in a string of text with a Javascript regular expression, it fails. It will not find that pattern, even though it exists in that string. This has to be because of a special character. What's the best way to find a plus sign in a piece of text? Also, what other characters will this fail on?

Plus is a special character in regular expressions, so to express the character as data you must escape it by prefixing it with \.
var reg = /d\+1/;

\-\.\/\[\]\\ **always** need escaping
\*\+\?\)\{\}\| need escaping when **not** in a character class- [a-z*+{}()?]
But if you are unsure, it does no harm to include the escape before a non-word character you are trying to match.
A digit or letter is a word character. Escaping a digit creates a backreference to an earlier capture group, and escaping a letter can produce an unprintable character such as a newline (\n) or tab (\t), an assertion such as a word boundary (\b), or a set of characters, like any word character (\w) or any non-word character (\W).
Don't escape a letter or digit unless you mean it.

Just a note:
In a RegExp pattern string, \ has to be written as \\. RegExp("d\+1") will not work, because the string literal drops the single backslash and leaves d+1; the equivalent regex literal is /d\+1/.
var string = 'abcd+1';
var pattern = 'd\\+1'
var reg = new RegExp(pattern,'');
alert(string.search(reg));
//3
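Side by side, the variants from this thread behave as follows. The last line is an addition of mine rather than something from the answers: when the text to find is fixed, a plain indexOf avoids the escaping question altogether.

```javascript
const text = "abcd+1";

console.log(text.search(new RegExp("d+1")));   // -1: "d+" means one or more "d"
console.log(text.search(/d\+1/));               // 3
console.log(text.search(new RegExp("d\\+1"))); // 3
console.log(text.indexOf("d+1"));               // 3: plain substring search, no regex involved
```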

You should use the escape character \ in front of the + in your pattern. eg. \+

You probably need to escape the plus sign:
var pattern = /d\+1/
The plus sign is used in regular expressions to indicate one or more repetitions of the preceding item.

It should be var pattern = 'd\\+1' (without surrounding slashes, which belong only to regex literals).
The string literal turns '\\' into '\' ('\\+' --> '\+'), so the regex object is initialized as /d\+1/.

If you want to match + (plus sign) or $ (dollar sign), use \ (backslash) as a prefix, like this:
\$ or \+
