javascript replace() function strange behaviour with regexp - javascript

Am I doing something wrong, or is there a problem with JS replace?
<input type="text" id="a" value="(55) 55-55-55" />
document.write($("#a").val().replace(/()-/g,''));
prints (55) 555555
http://jsfiddle.net/Yb2yV/
How can I replace ( and ) and spaces too?

In a JavaScript regular expression, the ( and ) characters have special meaning. If you want to list them literally, put a backslash (\) in front of them.
If your goal is to get rid of all the (, ), -, and space characters, you could do it with a character class combined with an alternation (i.e., either-or) on \s, which stands for "whitespace":
document.write($("#a").val().replace(/[()\-]|\s/g,''));
(I didn't put backslashes in front of the () because you don't need to within a character class. I did put one in front of the - because within a character class, - has special meaning.)
Alternately, if you want to get rid of anything that isn't a digit, you can use \D:
document.write($("#a").val().replace(/\D/g,''));
\D means "not a digit" (note that it's a capital, \d in lower case is the opposite [any digit]).
More info on the MDN page on regular expressions.
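As a quick check of the two patterns above, here is a minimal sketch that runs them on a plain string instead of a jQuery field. The variable names are only for illustration, and the input is the value from the question.

```javascript
// Hypothetical stand-in for $("#a").val(), copied from the question's <input>.
const raw = "(55) 55-55-55";

// Character class plus \s: removes "(", ")", "-" and any whitespace.
const stripped = raw.replace(/[()\-]|\s/g, "");

// \D: removes every character that is not a digit.
const digitsOnly = raw.replace(/\D/g, "");

console.log(stripped);   // 55555555
console.log(digitsOnly); // 55555555
```

For this input both give the same result; they differ as soon as the string contains anything else that is not a digit (letters, a leading +, and so on), which \D removes and the character class keeps.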

You need to use a character class
/[-() ]/
Using "-" as the first character solves the ambiguity because a dash is normally used for ranges (e.g. [a-zA-Z0-9]).

document.write($("#a").val().replace(/[\s()-]/g,''));
That will remove all whitespace (\s), parens, and dashes

Use this
.replace(/\(|\)|-| /g,'')
You have to escape the parenthesis (i.e. \( instead of (). In your regexp, you want to list the four items: \(, \), '-' and (space) and as you want to replace any of them, not just a string of them four together, you have to use OR | between them.

This may be crude, but a very basic approach would be:
document.write($("#a").val().replace(/(\()|(\))|-| /g,''));
| means OR,
\ is used for escaping reserved symbols

You want to match any character in the set, so you should use square brackets to make a character set:
document.write($("#a").val().replace(/[()\- ]/g,''));
Normally, parentheses have a special meaning in regular expressions, so in your regex they formed an empty group instead of matching literal parentheses, leaving just the dash. To get literal parentheses, you normally need to escape them with \ (but in a square bracket block, as above, you don't).
The dash above is escaped because it normally indicates a range in a character set, e.g., [a-z].

The parentheses indicate a capturing group in the regexp. You'd need to escape them (/\(\)-/) to match the sequence "()-". Yet I guess you want to use a character class, i.e. an expression that matches "(", ")" or "-"; for whitespace, include the \s shorthand and keep the dash last so it is not read as a range:
value.replace(/[()\s-]/g, "");
You might want to read some documentation or tutorial.
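To see what the original pattern from the question actually matches, here is a small sketch (variable names are illustrative only). It suggests that the unescaped () contributes an empty capture, so only the dash is consumed.

```javascript
const input = "(55) 55-55-55";

// The unescaped () is an empty capturing group, so the pattern only needs a "-".
const match = /()-/.exec(input);
console.log(match[0]); // -
console.log(match[1] === ""); // true: the group captured an empty string

// That is why only the dashes disappear.
const onlyDashesRemoved = input.replace(/()-/g, "");
console.log(onlyDashesRemoved); // (55) 555555

// Escaping the parentheses looks for the literal sequence "()-",
// which never occurs in this input.
const hasLiteralSequence = /\(\)-/.test(input);
console.log(hasLiteralSequence); // false
```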

Related

RegEx does not match my example

This RegEx could not find example string.
RegEx:
^ALTER\\sTABLE\\sADMIN_\\sADD CONSTRAINT \\s(.*)\\sPRIMARY KEY \\s(\(.*\))\\.([a-zA-Z0-9_]+)
Example:
ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ PRIMARY KEY (RECNOADM);
I am new to regex and tried to complete my RegEx at REGEX101.COM but with no success. What am I missing?
Djorjde
^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(.+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$
This expression will match the SQL statement you used as an example, capturing PK_ADMIN_ in the first group and RECNOADM in the second.
My suggestion is to use always \s+ to match the spaces (\s* when they are optional, like the leading or trailing spaces), unless they have to be exactly a single space.
So let's break the regex down:
^ Marks the beginning of the line. You don't want the line to match if there's anything else before.
\s* Optional leading spaces.
ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT This will match ALTER TABLE ADMIN_ ADD CONSTRAINT, regardless of the spacing used.
\s+(.+)\s+ Then, the next space-bound word(s)** will be captured into the first group. You're accepting any character here! You might want to restrict that to \w+ or the like. Unless you accept an empty group, use the + closure (i.e., one or more), not the * one (i.e., zero or more).
PRIMARY\s+KEY Matches the sequence PRIMARY KEY, again, regardless of the spacing.
\s*\((.+)\) This will capture anything inside the parentheses as the PK in the second capture group.
\s* Means that the opening parenthesis can optionally be preceded by an arbitrary number of spaces (they are optional in SQL, if I recall correctly).
\(...\) You have to escape the parentheses because they are characters to match, not special characters of the regex.
(.+) Here you capture (between unescaped parentheses) everything between the (escaped) parentheses into a capture group. The second one in this case.
\s*;\s* The sentence has to end with a semicolon, optionally preceded and/or succeeded by any spaces.
$ Marks the end of the line.
In case you want to accept more than one sentence in the same line, you'd remove the ^ and $ zero-width delimiters.
About the escaping, the easiest way here is to simply double every backslash in the expression you built in the editor: ^\\s*ALTER\\s+TABLE\\s+ADMIN_\\s+ADD\\s+CONSTRAINT\\s+(.+)\\s+PRIMARY\\s+KEY\\s*\\((.+)\\)\\s*;\\s*$ However, there are contexts and/or languages where more complex escaping may be needed (e.g., the Linux shell).
** Note that in the fourth item above (\s+(.+)\s+), the inner expression .+ will take as many characters as possible, as long as the remaining parts also match the string. This is because the closures are greedy by default, meaning that the engine will try to match the longest string possible. That means that, for instance, this entry will match: ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ OR *WHATEVER* YOU "WANT" TO PUT HERE! PRIMARY KEY (RECNOADM);, capturing PK_ADMIN_ OR *WHATEVER* YOU "WANT" TO PUT HERE! in the first group. Hence the importance of restricting the set of accepted characters ;)
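The breakdown and the greediness note can be tried out with a short sketch. It is written in JavaScript only for illustration (the question does not say which language hosts the regex), and the stricter \w+ variant is just one possible restriction, not necessarily what the asker needs.

```javascript
// The expression from the answer, as a regex literal (no doubled backslashes needed).
const greedy = /^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(.+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$/;

// Assumed variant: the constraint name is limited to word characters.
const strict = /^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(\w+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$/;

const statement = "ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ PRIMARY KEY (RECNOADM);";
const noisy = 'ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ OR *WHATEVER* YOU "WANT" TO PUT HERE! PRIMARY KEY (RECNOADM);';

const groups = greedy.exec(statement);
console.log(groups[1]); // PK_ADMIN_
console.log(groups[2]); // RECNOADM

// The greedy (.+) happily swallows the extra text; the \w+ version rejects it.
console.log(greedy.test(noisy)); // true
console.log(strict.test(noisy)); // false
console.log(strict.test(statement)); // true
```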
Have you tried the following?
^ALTER\sTABLE\sADMIN_\sADD\sCONSTRAINT\s((.*))\sPRIMARY\sKEY\s\((.*)\);
I am wrapping two separate blocks through () in order to identify from the Regex the values inserted if you need to access them too.
In your regex there are a few issues with whitespace (literal spaces are mixed with \s, and the whitespace token should be \s, not \\s).
In JavaScript you only need to escape backslashes that are part of escape sequences when you're composing a regexp from a string, e.g.:
var r = new RegExp('\\d');
console.log(r.test('2'));
But the additional \ is not part of the regexp and you don't need it when using the literal syntax (or regex101):
var r = /\d/;
console.log(r.test('2'));

Regex to allow special characters

I need a regex that will allow letters, hyphen (-), quote ('), dot (.), comma (,) and space. This is what I have now:
^[A-Za-z\s\-]$
Thanks
I removed \s from your regex since you said space, and not white space. Feel free to put it back by replacing the space at the end with \s. Otherwise it's pretty simple:
^[A-Za-z\-'., ]+$
It matches start of the string. Any character in the set 1 or more times, and end of the string. You don't have to escape . in a set, in case you were wondering.
You probably tried new RegExp("^[A-Za-z\s\-\.\'\"\,]$"). Yet, you have a string literal there, and the backslashes just escape the following characters - necessary only for the delimiting quote (and for backslashes).
"^[A-Za-z\s\-\.\'\"\,]$" === "^[A-Za-zs-.'\",]$" === '^[A-Za-zs-.\'",]$'
Yet, the range s-. is invalid. So you would need to escape the backslash to pass a string with a backslash in the RegExp constructor:
new RegExp("^[A-Za-z\\s\\-\\.\\'\\\"\\,]$")
Instead, regex literals are easier to read and write, as you do not need to string-escape regex escape characters. Also, they are parsed only once during script "compilation", so nothing needs to be executed each time the line is evaluated. The RegExp constructor only needs to be used if you want to build regexes dynamically. So use
/^[A-Za-z\s\-\.\'\"\,]$/
and it will work. Also, you don't need to escape any of these chars in a character class - so it's just
/^[A-Za-z\s\-.'",]$/
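The string-escaping point can be checked with a small sketch. It uses the asker's shorter pattern for the first two checks and the longer string from this answer for the last one; the variable names are illustrative.

```javascript
// Inside a string literal, "\s" is just "s" and "\-" is just "-".
const unescaped = "^[A-Za-z\s\-]$";
console.log(unescaped); // ^[A-Za-zs-]$

// Doubling the backslashes keeps them in the string, so the constructor
// receives the same pattern text as the regex literal.
const escaped = "^[A-Za-z\\s\\-]$";
const sameSource = new RegExp(escaped).source === /^[A-Za-z\s\-]$/.source;
console.log(sameSource); // true

// With the longer list, the lost backslashes leave the range s-. behind,
// which is out of order, so the constructor throws.
let rangeError = null;
try {
  new RegExp("^[A-Za-z\s\-\.\'\"\,]$");
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true
```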
You are pretty close, try the following:
^[A-Za-z\s\-'.,]+$
Note that I assumed that you want to match strings that contain one or more of any of these characters, so I added + after the character class which mean "repeat the previous element one or more times".
Note that this will currently also allow tabs and line breaks in addition to spaces because \s will match any whitespace character. If you only want to allow spaces, change it to ^[A-Za-z \-'.,]+$ (just replaced \s with a space).
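A minimal sketch comparing the two variants from this answer. The sample names are made up for illustration.

```javascript
// \s version: accepts any whitespace, including tabs and line breaks.
const withWhitespace = /^[A-Za-z\s\-'.,]+$/;

// Literal-space version: accepts only the space character as whitespace.
const spacesOnly = /^[A-Za-z \-'.,]+$/;

console.log(withWhitespace.test("O'Neil, Jr."));  // true
console.log(spacesOnly.test("Anne-Marie Smith")); // true

// A tab separates the two words here.
console.log(withWhitespace.test("Anne\tMarie")); // true
console.log(spacesOnly.test("Anne\tMarie"));     // false

// Digits and the empty string are rejected by both.
console.log(spacesOnly.test("R2-D2")); // false
console.log(withWhitespace.test("")); // false
```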

Can it be done with regex?

Having the following regex: ([a-zA-Z0-9//._-]{3,12}[^//._-]) used like pattern="([a-zA-Z0-9/._-]{3,12}[^/._-])" to validate an HTML text input for username, I wonder if there is any way of telling it to check that the string has only one of the following: ., -, _
By that I mean, that I'm in need of regex that would accomplish the following (if possible)
alex-how => Valid
alex-how. => Not valid, because finishing in .
alex.how => Valid
alex.how-ha => Not valid, contains already a .
alex-how_da => Not valid, contains already a -
The problem with my current regex is that, for some reason, it accepts any character at the end of the string that is not ., _ or -, and I can't figure out why.
The other problem is that it doesn't check that the string contains only one of the allowed special characters.
Any ideas?
Try this one out:
^(?!(.*[._-].*){2})(?!.*[._-]$)[a-zA-Z0-9/._-]{3,12}$
Regexpal link. The regex above allows at most one of ., _ or -. (Inside a character class, | is a literal character rather than "or", so it is left out here.)
What you want is one or more alphanumeric characters (upper case, lower case, or digits),
followed by at most one of the characters "-", ".", or "_", followed by at least one more alphanumeric character:
^[a-zA-Z0-9]+[-_.]?[a-zA-Z0-9]+$
Hope this will work for you:
It says the string starts with word characters, may continue with any of -, ., _ or word characters, and must end with a word character:
^[\w\d]*[-_\.\w\d]*[\w\d]$
Seems to me you want:
^[A-Za-z0-9]+(?:[\._-][A-Za-z0-9]+)?$
Breaking it down:
^: beginning of line
[A-Za-z0-9]+: one or more alphanumeric characters
(?:[\._-][A-Za-z0-9]+)?: (optional, non-captured) one of your allowed special characters followed by one or more alphanumeric characters
$: end of line
It's unclear from your question if you wanted one of your special characters (., -, and _) to be optional or required (i.e., zero-or-one versus exactly-one). If you actually wanted to require one such special character, you would just get rid of the ? at the very end.
Here's a demonstration of this regular expression on your example inputs:
http://rubular.com/r/SQ4aKTIEF6
As for the length requirement (between 3 and 12 characters): This might be a cop-out, but personally I would argue that it would make more sense to validate this by just checking the length property directly in JavaScript, rather than over-complicating the regular expression.
^(?=[a-zA-Z0-9/._-]{3,12}$)[a-zA-Z0-9]+(?:[/._-][a-zA-Z0-9]+)?$
or, as a JavaScript regex literal:
/^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/
The lookahead, (?=[a-zA-Z0-9/._-]{3,12}$), does the overall-length validation.
Then [a-zA-Z0-9]+ ensures that the name starts with at least one non-separator character.
If there is a separator, (?:[/._-][a-zA-Z0-9]+)? ensures that there's at least one non-separator following it.
Note that / has no special meaning in a regex. You only have to escape it if you're using a regex literal (because / is the regex delimiter), and you escape it by prefixing with a backslash, not another forward-slash. And inside a character class, you don't need to escape the dot (.) to make it match a literal dot.
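Here is a small sketch that runs the lookahead pattern against the sample usernames from the question. The function name isValidUsername is made up for this example, and the two length checks at the end are extra cases not listed in the question.

```javascript
// Lookahead enforces the overall length (3 to 12 allowed characters, up to the end);
// the rest allows at most one separator, never at the start or the end.
const usernamePattern = /^(?=[a-zA-Z0-9\/._-]{3,12}$)[a-zA-Z0-9]+(?:[\/._-][a-zA-Z0-9]+)?$/;

// Hypothetical helper, named only for this example.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

console.log(isValidUsername("alex-how"));    // true
console.log(isValidUsername("alex-how."));   // false (ends with a separator)
console.log(isValidUsername("alex.how"));    // true
console.log(isValidUsername("alex.how-ha")); // false (second separator)
console.log(isValidUsername("alex-how_da")); // false (second separator)

// Length limits come from the lookahead.
console.log(isValidUsername("ab"));            // false (too short)
console.log(isValidUsername("abcdefghijklm")); // false (13 characters)
```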
The dot in regex has a special meaning: "any character here".
If you mean a literal dot, you should escape it to tell the regex parser so.
Escape dot in a regex range

Regex not working as expected

What's wrong with this regular expression?
/^[a-zA-Z\d\s&#-\('"]{1,7}$/;
when I enter the following valid input, it fails:
a&'-#"2
Also check for 2 consecutive spaces within the input.
The dash needs to be either escaped (\-) or placed at the end of the character class, or it will signify a range (as in A-Z), not a literal dash:
/^[A-Z\d\s&#('"-]{1,7}$/i
would be a better regex.
N.B.: [#-\(] would have matched #, $, %, &, ' or (.
To address the added requirement of not allowing two consecutive spaces, use a lookahead assertion:
/^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i
(?!.*\s{2}) means "Assert that it's impossible to match (from the current position) any string followed by two whitespace characters". One caveat: The dot doesn't match newline characters.
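A short sketch of the lookahead version, run on the input from the question plus a few made-up strings.

```javascript
// Dash moved to the end of the class; the lookahead rejects two whitespace
// characters in a row anywhere in the input.
const pattern = /^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i;

console.log(pattern.test("a&'-#\"2")); // true  (the input from the question)
console.log(pattern.test("a b c"));    // true  (single spaces are fine)
console.log(pattern.test("a  b"));     // false (two consecutive spaces)
console.log(pattern.test("abcdefgh")); // false (8 characters, limit is 7)
console.log(pattern.test("a$b"));      // false ($ is no longer inside a range)
```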
The - (hyphen) has a special meaning inside a character class, used for specifying ranges. Did you mean to escape it?:
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/;
This RegExp matches your input.
You have an unescaped - in the middle of your character class. This means that you're actually searching for all characters between and including # and ( (which are #, $, %, &, ', and (). Either move it to the end or escape it with a backslash. Your regex should read:
/^[a-zA-Z\d\s&#\('"-]{1,7}$/
or
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/
remove the ; at the end and
^[a-zA-Z\d\s\&\#\-\(\'\"]+$
Your input does not match the regular expression. The problem here is the hyphen in your regexp. If you move it from its position after the '#' character to the start of the character class, like so:
/^[-a-zA-Z\d\s&#\('"]{1,7}$/;
everything is fine and dandy.
You can always use Rubular for checking your regular expressions. I use it on a regular (no pun intended) basis.

Why this Regex, matches incorrect characters?

I need to match these characters. This quote is from an API documentation (external to our company):
Valid characters: 0-9 A-Z a-z & # - . , ( ) / : ; ' # "
I used this Regex to match characters:
^[0-9a-z&#-\.,()/:;'""#]*$
However, this wrongly matches characters like %, $, and many other characters. What's wrong?
You can test this regular expression online using http://regexhero.net/tester/, and this regular expression is meant to work in both .NET and JavaScript.
You are not escaping the dash -, which is a reserved character inside a character class. If you replace the dash with \- then the regex no longer matches the characters between # and . (the range that #-\. describes).
Move the literal - to the front of the character set:
^[-0-9a-z&#\.,()/:;'""#]*$
otherwise it is taken as specifying a range like when you use it in 0-9.
The - sign, when not escaped, has special meaning in square brackets. #-\. is transformed into #-. (BTW, the backslash before the dot is not necessary in square brackets), which means "any character between # (ASCII 0x23) and . (ASCII 0x2E)". The correct notation is
^[0-9a-z&#\-.,()/:;'"#]*$
The special characters in a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-).
As such, you should either escape them with a backslash (\), or put them in a position where there is no ambiguity and they do not need escaping. In the case of a hyphen, this would be the first or last position.
You also do not need to escape the dot (.).
Your regex thus becomes:
^[-0-9a-z&#.,()/:;'"#]*$
As a side note, there are many available regex evaluators which provide code hinting. This way, you can simply hover your mouse over your regular expression and it can be explained in English words.
One such free one is RegExr.
Typing your original regex in it and hovering over the hyphen shows:
Matches characters in the range '#-\'
Try that
^[0-9a-zA-Z\&\#\-\.\,\(\)\/\:\;\'\"\#]*$
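To see why the pattern in the question let % and $ through, the range can simply be enumerated. This is a rough sketch; charsMatching is a helper invented for this example, and it only walks printable ASCII.

```javascript
// Hypothetical helper: lists the printable ASCII characters a
// single-character regex accepts.
function charsMatching(regex) {
  const out = [];
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (regex.test(ch)) out.push(ch);
  }
  return out.join("");
}

// Unescaped dash between # and \. : a range from 0x23 to 0x2E.
const rangeChars = charsMatching(/^[#-\.]$/);
console.log(rangeChars); // #$%&'()*+,-.

// Escaped dash: just the three literal characters.
const literalChars = charsMatching(/^[#\-.]$/);
console.log(literalChars); // #-.
```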
