Why does this regex match incorrect characters? - javascript

I need to match these characters. This quote is from an API documentation (external to our company):
Valid characters: 0-9 A-Z a-z & # - . , ( ) / : ; ' # "
I used this Regex to match characters:
^[0-9a-z&#-\.,()/:;'""#]*$
However, this wrongly matches characters like %, $, and many other characters. What's wrong?
You can test this regular expression online using http://regexhero.net/tester/, and this regular expression is meant to work in both .NET and JavaScript.

You are not escaping the dash -, which is a reserved character inside a character class. If you replace the dash with \- then the regex no longer matches the extra characters in the range from # to . (such as % and $)

Move the literal - to the front of the character set:
^[-0-9a-z&#\.,()/:;'""#]*$
otherwise it is taken as specifying a range like when you use it in 0-9.

The - sign, when not escaped, has special meaning in square brackets. #-\. is interpreted as #-. (BTW, the backslash before the dot is not necessary in square brackets), which means "any character between # (ASCII 0x23) and . (ASCII 0x2E)". The correct notation is
^[0-9a-z&#\-.,()/:;'"#]*$
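To see the accidental range concretely, here is a small sketch in plain JavaScript (the variable names are only illustrative, they are not from the question). It lists the code points from # to . and compares the question's pattern with the escaped-hyphen version; A-Z is added on the assumption that the quoted character list is the real requirement.

```javascript
// Characters covered by the accidental range "#-." (0x23 through 0x2E)
let rangeChars = "";
for (let code = 0x23; code <= 0x2e; code++) {
  rangeChars += String.fromCharCode(code);
}
console.log(rangeChars); // #$%&'()*+,-.

// Pattern from the question: the unescaped hyphen creates the range above
const original = /^[0-9a-z&#-\.,()/:;'""#]*$/;
// Hyphen escaped so it is a literal; A-Z added per the documented list
const fixed = /^[0-9A-Za-z&#\-.,()/:;'"#]*$/;

console.log(original.test("%$")); // true
console.log(fixed.test("%$")); // false
console.log(fixed.test("A-1")); // true
```

The same check should behave identically in .NET for these inputs, but only the JavaScript side is shown here.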

The special characters in a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).
As such, you should either escape them with a backslash (\), or put them in a position where there is no ambiguity and they do not need escaping. In the case of a hyphen, this would be the first or last position.
You also do not need to escape the dot (.).
Your regex thus becomes:
^[-0-9a-z&#.,()/:;'"#]*$
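As a quick check of the positions described above, this sketch compares a hyphen placed first, placed last, and escaped in the middle of a deliberately small class. The three regexes are made-up examples, not the full pattern from the question.

```javascript
// Three ways to make the hyphen a literal inside a character class
const hyphenFirst = /^[-a-z#]+$/;
const hyphenLast = /^[a-z#-]+$/;
const hyphenEscaped = /^[a-z#\-]+$/;

for (const re of [hyphenFirst, hyphenLast, hyphenEscaped]) {
  // Each pattern accepts a literal hyphen and rejects "%", which is not listed
  console.log(re.source, re.test("a-#"), re.test("a%"));
}
```

All three print true for "a-#" and false for "a%", so the choice between them is mostly a matter of readability.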
As a side note, there are many regex evaluators available that provide code hinting. With these, you can simply hover your mouse over part of your regular expression and have it explained in plain English.
One such free one is RegExr.
Typing your original regex in it and hovering over the hyphen shows:
Matches characters in the range '#-\'

Try that
^[0-9a-zA-Z\&\#\-\.\,\(\)\/\:\;\'\"\#]*$

Related

How to prevent regex from validating double dots after # character

I am using the following regex in a js
^[a-zA-Z0-9._+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$
This validates email in subdomain (ex: myname#google.co.in)
Unfortunately a double dot is also validated as true, such as
myname#..in
myname#domain..in
I understand that the part #[a-zA-Z0-9.-] needs to be modified, but I'm kind of stuck. What is the best way to proceed?
TIA
Try using:
^([\w+-]+\.)*[\w+-]+#([\w+-]+\.)*[\w+-]+\.[a-zA-Z]{2,4}$
I've replaced the [a-zA-Z0-9_] with the exact equivalent \w in the char group.
Note that in the regex language the dot . is a special char that matches everything (but newlines). So to match a literal dot you need to escape it: \.
Legend:
^ start of the string
([\w+-]+\.)* zero or more regex words (in addition to plus + and minus -), each composed of 1 or more chars and followed by a literal dot \.
[\w+-]+ regex words (plus [+-]) of 1 or more chars
# literal char
([\w+-]+\.)*[\w+-]+ same sequence as above
\.[a-zA-Z]{2,4} literal dot followed by a sequence of lowercase or uppercase char with a length between 2 and 4 chars.
$ end of the string
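The pattern explained in the legend above can be tried against the inputs from the question with a short sketch like the following. It keeps # as the separator exactly as written in the question; the sample list and variable names are assumptions added for illustration.

```javascript
// Pattern from the answer above, with "#" as the separator as in the question
const address = /^([\w+-]+\.)*[\w+-]+#([\w+-]+\.)*[\w+-]+\.[a-zA-Z]{2,4}$/;

const samples = [
  "myname#google.co.in", // subdomain form that should pass
  "myname#..in",         // double dot right after the separator
  "myname#domain..in",   // double dot inside the domain
  "my..name#domain.in",  // double dot in the local part
];

// Map each sample to its test result
const results = samples.map((s) => address.test(s));
console.log(results); // [ true, false, false, false ]
```

Because every dot must be preceded by at least one word character, two dots can never be adjacent on either side of the separator.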
Try this:
^([a-zA-Z0-9._+-]+)(#[a-zA-Z0-9-]+)(\.[a-zA-Z]{2,4}){2,}$
You can test it here - https://regex101.com/r/Ihj8sd/1

Crockford - Chapter 7 - parse_url

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
Why is the dot . in this part
[0-9.\-A-Za-z]+
not escaped by a backslash?
Brackets ([]) specify a character class: matching a single character in the string between [].
While inside a character class, the main characters with special meaning (metacharacters) are \ and -:
backslash \: general escape character.
hyphen -: character range.
Notice, though, it must be between chars to have special meaning:
[0-9] means any digit between 0 and 9, while in [09-] the - is an ordinary -, not a range.
That's why, inside [], a . is just (will only match) a dot.
Note: It is also worth noticing that the char ] must be escaped to be used inside a character class, such as [a-z\]], otherwise it will close it as usual. Finally, using ^, as in [^a-z], designates a negated character class, that means any char that is not one of those (in the example, any char that is not a...z).
So it matches a dot.
Except under some circumstances (e.g., escaping the range hyphen when it's not the first character in the character class brackets) you don't need to escape special characters in a class.
You may escape the normal metacharacters inside character classes, but it's noisy and redundant.
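A small sketch can confirm the behaviour described in these answers. It reuses the host character class from parse_url, anchored here with ^ and $ purely for testing; the sample strings are assumptions.

```javascript
// Host character class from parse_url, anchored so the whole string must match
const hostChars = /^[0-9.\-A-Za-z]+$/;
const hostOk = hostChars.test("www.example.com");
const hostBad = hostChars.test("www_example_com");
console.log(hostOk, hostBad); // true false

// Outside a class the dot matches any character except line terminators;
// inside a class it matches only a literal dot
const dotOutside = /^.$/.test("a");
const dotInside = /^[.]$/.test("a");
console.log(dotOutside, dotInside); // true false

// A hyphen at the end of the class is an ordinary character, not a range
const hyphenLiteral = /^[09-]+$/.test("0-9");
const noRange = /^[09-]+$/.test("5");
console.log(hyphenLiteral, noRange); // true false
```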

Match special characters including square braces

I want to have a regex for text field in ExtJs(maskRe) which matches all java code pattern
I've used
maskRe:/^[A-Za-z0-9 _=//~'"|{}();*:?+,.]*$/
I also want to include [ and ], but /[, /], //[ and //] do not seem to work.
Any inputs please
The problem is you need to escape your forward slash. Change // to \/:
/^[A-Za-z0-9 _=\/~'"|{}();*:?+,.]*$/
However, this regular expression does not match arbitrary Java code. Java code can contain almost any Unicode character; for example, int møøse = 42; is valid Java.
To strip special characters of their magic powers you have to escape them by putting a backslash \ in front of the character. I.e. to match [ you type \[.
And since the backslash acts as a special character as well, to match it literally you escape it the same way: \\.
And since you used / as the pattern delimiter, you need to escape its occurrences within the pattern:
/^[A-Za-z0-9 _=\/~'"|{}();*:?+,.]*$/
The way to escape regex meta-characters is using a backslash (\), not a forwards slash (/).
[,] should be \[,\]
// should be \/
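Putting these answers together, a class that also accepts square brackets might look like the sketch below. This is plain JavaScript rather than an ExtJS config, and the variable name and test strings are assumptions; how maskRe applies the pattern is not shown here.

```javascript
// The question's class with "/" escaped and "[" and "]" added as \[ and \]
const allowed = /^[A-Za-z0-9 _=\/~'"|{}();*:?+,.\[\]]*$/;

const arrayDecl = allowed.test("int[] a = {1, 2};");
const division = allowed.test("a[0] / b");
const lessThan = allowed.test("a < b");

console.log(arrayDecl); // true
console.log(division); // true
console.log(lessThan); // false, "<" is not in the class
```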

javascript replace() function strange behaviour with regexp

Am I doing something wrong, or is there a problem with JS replace?
<input type="text" id="a" value="(55) 55-55-55" />
document.write($("#a").val().replace(/()-/g,''));
prints (55) 555555
http://jsfiddle.net/Yb2yV/
How can I replace ( ) and spaces too?
In a JavaScript regular expression, the ( and ) characters have special meaning. If you want to list them literally, put a backslash (\) in front of them.
If your goal is to get rid of all the (, ), -, and space characters, you could do it with a character class combined with an alternation (i.e., either-or) on \s, which stands for "whitespace":
document.write($("#a").val().replace(/[()\-]|\s/g,''));
(I didn't put backslashes in front of the () because you don't need to within a character class. I did put one in front of the - because within a character class, - has special meaning.)
Alternately, if you want to get rid of anything that isn't a digit, you can use \D:
document.write($("#a").val().replace(/\D/g,''));
\D means "not a digit" (note that it's a capital, \d in lower case is the opposite [any digit]).
More info on the MDN page on regular expressions.
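The difference between the original attempt and the two suggestions above can be checked without jQuery or a page. In this sketch the input value is hard-coded as a string, which is an assumption standing in for $("#a").val().

```javascript
// Stand-in for the value of the input field in the question
const value = "(55) 55-55-55";

// Original attempt: "()" is an empty group, so only "-" is actually matched
const originalResult = value.replace(/()-/g, "");
// Character class for ( ) - combined with \s for whitespace
const classResult = value.replace(/[()\-]|\s/g, "");
// Remove everything that is not a digit
const digitsOnly = value.replace(/\D/g, "");

console.log(originalResult); // (55) 555555
console.log(classResult); // 55555555
console.log(digitsOnly); // 55555555
```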
You need to use a character class
/[-() ]/
Using "-" as the first character solves the ambiguity because a dash is normally used for ranges (e.g. [a-zA-Z0-9]).
document.write($("#a").val().replace(/[\s()-]/g,''));
That will remove all whitespace (\s), parens, and dashes
Use this
.replace(/\(|\)|-| /g,'')
You have to escape the parentheses (i.e. \( instead of (). In your regexp, you want to list the four items: \(, \), '-' and (space), and as you want to replace any one of them, not just a string of all four together, you have to use OR | between them.
This may be far from ideal, but a very basic approach would be:
document.write($("#a").val().replace(/(\()|(\))|-| /g,''));
| means OR,
\ is used for escaping reserved symbols
You want to match any character in the set, so you should use square brackets to make a character set:
document.write($("#a").val().replace(/[()\- ]/g,''));
Normally, parentheses have a special meaning in regular expressions, so they were being ignored in your regex, leaving just the dash. Normally, to get literal parentheses, you need to escape them with \ (but in a square bracket block, as above, you don't).
The dash above is escaped because it normally indicates a range in a character set, e.g., [a-z].
The parentheses indicate a capturing group in the regexp. You'd need to escape them (/\(\)-/) to match the sequence "()-". Yet I guess you want to use a character class, i.e. an expression that matches "(", ")" or "-"; for whitespace include the \s shorthand, keeping the hyphen last so it cannot be read as a range:
value.replace(/[()\s-]/g, "");
You might want to read some documentation or tutorial.
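One detail worth checking when a hyphen sits between a character and a shorthand such as \s: without the u flag, JavaScript engines that implement the web-compatibility grammar treat that hyphen as a literal, but with the u flag the class is rejected. The sketch below illustrates this under that assumption; the variable names are made up.

```javascript
const input = "(55) 55-55-55";

// Without the u flag, the hyphen in ")-\s" is treated as a literal hyphen
const legacy = /[()-\s]/g;
const legacyResult = input.replace(legacy, "");
console.log(legacyResult); // 55555555

// With the u flag the same class is rejected when the pattern is compiled
let unicodeError = null;
try {
  new RegExp("[()-\\s]", "u");
} catch (err) {
  unicodeError = err;
}
console.log(unicodeError instanceof SyntaxError); // true

// Putting the hyphen last works in both modes
const portable = /[()\s-]/gu;
const portableResult = input.replace(portable, "");
console.log(portableResult); // 55555555
```

Keeping the hyphen first, last, or escaped avoids depending on which mode the pattern is compiled in.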

Regex not working as expected

Whats wrong with this regular expression?
/^[a-zA-Z\d\s&#-\('"]{1,7}$/;
when I enter the following valid input, it fails:
a&'-#"2
The regex should also reject input that contains 2 consecutive spaces.
The dash needs to be either escaped (\-) or placed at the end of the character class, or it will signify a range (as in A-Z), not a literal dash:
/^[A-Z\d\s&#('"-]{1,7}$/i
would be a better regex.
N. B: [#-\(] would have matched #, $, %, &, ' or (.
To address the added requirement of not allowing two consecutive spaces, use a lookahead assertion:
/^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i
(?!.*\s{2}) means "Assert that it's impossible to match (from the current position) any string followed by two whitespace characters". One caveat: The dot doesn't match newline characters.
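Here is a short sketch that runs the question's pattern and the lookahead version against the input from the question plus a few extra strings. The extra strings and the variable names are assumptions chosen to exercise the length limit and the double-space rule.

```javascript
// Pattern from the question: "#-\(" is a range, so "-" itself is not allowed
const original = /^[a-zA-Z\d\s&#-\('"]{1,7}$/;
// Hyphen moved to the end, plus a lookahead rejecting two whitespace chars in a row
const fixed = /^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i;

// Input from the question, written as a template literal to avoid escaping quotes
const input = `a&'-#"2`;

const checks = [
  original.test(input),   // false: the hyphen is rejected
  fixed.test(input),      // true
  fixed.test("a b c"),    // true: single spaces are fine
  fixed.test("a  b"),     // false: two consecutive spaces
  fixed.test("abcdefgh"), // false: 8 characters exceeds the limit of 7
];
console.log(checks); // [ false, true, true, false, false ]
```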
The - (hyphen) has a special meaning inside a character class, used for specifying ranges. Did you mean to escape it?:
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/;
This RegExp matches your input.
You have an unescaped - in the middle of your character class. This means that you're actually searching for all characters between and including # and ( (which are #, $, %, &, ', and (). Either move it to the end or escape it with a backslash. Your regex should read:
/^[a-zA-Z\d\s&#\('"-]{1,7}$/
or
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/
Remove the ; at the end and use:
^[a-zA-Z\d\s\&\#\-\(\'\"]+$
Your input does not match the regular expression. The problem here is the hyphen in your regexp. If you move it from its position after the '#' character to the start of the character class, like so:
/^[-a-zA-Z\d\s&#\('"]{1,7}$/;
everything is fine and dandy.
You can always use Rubular for checking your regular expressions. I use it on a regular (no pun intended) basis.
