What does `escape a string` mean in Regex? (Javascript)

I'm trying to understand the backslash and how to use escaping like: \ in regular expressions.
I've read that when this is applied to strings, it's called escaping a string.
But what does that actually mean?

Many characters in regular expressions have special meanings. For instance, the dot character '.' means "any one character". There are a great many of these specially defined characters, and sometimes you want to search for one literally rather than use its special meaning.
See this example to search for any filename that contains a '.':
/^[^.]+\..+/
In the example, there are 3 dots, but our description says that we're only looking for one. Let's break it down by the dots:
Dot #1 is used inside a "character class" (the characters inside the square brackets), which tells the regex engine to search for "any one character" that is not a '.', and the "+" says to keep going until there are no more characters or the next character is the '.' that we're looking for.
Dot #2 is preceded by a backslash, which says that we're looking for a literal '.' in the string (without the backslash, it would be using its special meaning, which is looking for "any one character"). This dot is said to be "escaped", because its special meaning is not being used in this context - the backslash immediately before it made that happen.
Dot #3 is simply looking for "any one character" again, and the '+' following it says to keep doing that until it runs out of characters.
So, the backslash is used to "escape" the character immediately following it; as such, it's called the "escape character". That just means that the character's special meaning is taken away in that one place.
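To see the three dots at work, the pattern can be run against a few sample names. This is only a sketch; the file names and the variable names `hasDot` and `unescaped` are made up for illustration.

```javascript
// The pattern from the example above: some non-dot characters,
// then a literal dot, then at least one more character.
const hasDot = /^[^.]+\..+/;

console.log(hasDot.test("report.txt")); // true
console.log(hasDot.test("README"));     // false: no literal dot
console.log(hasDot.test(".bashrc"));    // false: nothing before the dot

// Without the backslash, the middle dot matches any character at all.
const unescaped = /^[^.]+..+/;
console.log(unescaped.test("README"));  // true
```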
Now, escaping a string (in regex terms) means finding all of the characters with special meaning and putting a backslash in front of them, including in front of other backslash characters. When you've done this one time on the string, you have officially "escaped the string".
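JavaScript has no long-standing built-in for this, so a small helper is commonly written by hand. The sketch below assumes a hypothetical function name, `escapeRegExp`, and a typical list of metacharacters; adjust the list if your use case differs.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter,
// including the backslash itself.
function escapeRegExp(text) {
  // "$&" in the replacement stands for the matched character.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2 (a.b)");
console.log(escaped); // 1\+1=2 \(a\.b\)

// The escaped text can now be used as a literal search pattern.
const literal = new RegExp(escapeRegExp("a.b"));
console.log(literal.test("a.b")); // true
console.log(literal.test("axb")); // false
```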

Say you try to print out a string, let's say "this\that".
That \ character is recognized as a special character. I'm not sure about regex, but say in Java or C, \t will tab the rest of the string over, so it would print as
this    hat
But a \ also "escapes" the character after it, depriving that character of its special meaning, so using "this\\that" instead would result in
this\that
I hope this helped.
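The same behaviour can be checked in JavaScript string literals. A minimal sketch, with variable names chosen only for illustration:

```javascript
const withTab = "this\that";        // \t is read as a single tab character
const withBackslash = "this\\that"; // \\ is read as one literal backslash

console.log(withTab.length);         // 8
console.log(withBackslash.length);   // 9
console.log(withTab.includes("\t")); // true
console.log(withBackslash);          // this\that
```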

Quoting from MSDN:
The backslash (\) in a regular expression indicates one of the following:
The character that follows it is a special character, as shown in the table in the following section. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, \t represents a tab, and \x020 represents a space.
A character that otherwise would be interpreted as an unescaped language construct should be interpreted literally. For example, a brace ({) begins the definition of a quantifier, but a backslash followed by a brace (\{) indicates that the regular expression engine should match the brace. Similarly, a single backslash marks the beginning of an escaped language construct, but two backslashes (\\) indicate that the regular expression engine should match the backslash.

Related

How to improve this Regular expression validation?

I'm trying to write form validation for a description <textarea> in which users describe themselves, for example their education or experience.
I wrote this regex for the textarea, but it rejects apostrophes: if a user writes "House's", the ' is not allowed.
Which symbols are users likely to need when describing themselves?
I used this Regex:
$descriptionValidation = "/^[a-zA-Z0-9\.\-\,\"\(\) ]+[a-zA-Z0-9\.\-\,\"\(\) ]*$/";

To match a whole string and require that the string only consist of alphanumeric characters and: dots, commas, single-quotes (also called apostrophes, but not "above commas"), double-quotes, left parentheses, right parentheses, spaces, and hyphens, use the following expression.
The ^ and $ metacharacters ensure that the match spans the entire length of the string. + means one or more of any of the characters in the list. The "list" is technically called a "character class". a-z is the full range of lowercase letters (the i flag makes it case-insensitive) and \d is the full range of digits. - does have special meaning inside of a character class, but only if it has a non-ranged expression on both sides of it. If you wish to prevent mistakes with hyphens inside of a character class, you can escape the hyphen with a backslash, write it at the start or end of the character class, or write it immediately after a character range.
/^[a-z\d.,'"() -]+$/i
When declaring this pattern in PHP using single quotes, you will need to escape the single quote in the character class.
$descriptionValidation = '/^[a-z\d.,\'"() -]+$/i';
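The pattern itself can be tried out quickly in JavaScript, where the regex literal needs no extra quoting. This is a sketch rather than the PHP code from the answer; the sample inputs and the name `descriptionPattern` are invented for illustration.

```javascript
// Same character class as above, written as a JavaScript regex literal.
const descriptionPattern = /^[a-z\d.,'"() -]+$/i;

console.log(descriptionPattern.test("House's"));   // true: apostrophe allowed
console.log(descriptionPattern.test('Studied "CS" (2010-2014), then worked.')); // true
console.log(descriptionPattern.test("50% done"));  // false: % is not in the class
console.log(descriptionPattern.test(""));          // false: + needs at least one character
```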

javascript replace() function strange behaviour with regexp

Am I doing something wrong, or is there a problem with JS replace?
<input type="text" id="a" value="(55) 55-55-55" />
document.write($("#a").val().replace(/()-/g,''));
prints (55) 555555
http://jsfiddle.net/Yb2yV/
How can I replace ( ) and spaces too?

In a JavaScript regular expression, the ( and ) characters have special meaning. If you want to list them literally, put a backslash (\) in front of them.
If your goal is to get rid of all the (, ), -, and space characters, you could do it with a character class combined with an alternation (i.e., either-or) on \s, which stands for "whitespace":
document.write($("#a").val().replace(/[()\-]|\s/g,''));
(I didn't put backslashes in front of the () because you don't need to within a character class. I did put one in front of the - because within a character class, - has special meaning.)
Alternately, if you want to get rid of anything that isn't a digit, you can use \D:
document.write($("#a").val().replace(/\D/g,''));
\D means "not a digit" (note that it's a capital, \d in lower case is the opposite [any digit]).
More info on the MDN page on regular expressions.
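Both replacements can be checked without jQuery or a page by applying them to a plain string. A minimal sketch, with `phone`, `stripped`, and `digitsOnly` as illustrative names:

```javascript
const phone = "(55) 55-55-55";

// Character class for ( ) - combined with \s for whitespace.
const stripped = phone.replace(/[()\-]|\s/g, "");
console.log(stripped);   // 55555555

// \D removes everything that is not a digit.
const digitsOnly = phone.replace(/\D/g, "");
console.log(digitsOnly); // 55555555
```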

You need to use a character class
/[-() ]/
Using "-" as the first character solves the ambiguity because a dash is normally used for ranges (e.g. [a-zA-Z0-9]).

document.write($("#a").val().replace(/[\s()-]/g,''));
That will remove all whitespace (\s), parens, and dashes

Use this
.replace(/\(|\)|-| /g,'')
You have to escape the parentheses (i.e. \( instead of (). In your regexp, you want to list four items: \(, \), - and a space. Since you want to replace any one of them, not the four of them in sequence, you have to put the OR operator | between them.

This may be crude, but a very basic approach would be:
document.write($("#a").val().replace(/(\()|(\))|-| /g,''));
| means OR, and
\ is used for escaping reserved symbols.

You want to match any character in the set, so you should use square brackets to make a character set:
document.write($("#a").val().replace(/[()\- ]/g,''));
Normally, parentheses have a special meaning in regular expressions, so in your regex they formed an empty group that matches nothing, leaving just the dash to match. Normally, to get literal parentheses, you need to escape them with \ (but in a square bracket block, as above, you don't).
The dash above is escaped because it normally indicates a range in a character set, e.g., [a-z].

The parentheses indicate a capturing group in the regexp. You'd need to escape them (/\(\)-/) to match the sequence "()-". Yet I guess you want to use a character class, i.e. an expression that matches "(", ")" or "-"; for whitespace, include the \s shorthand and keep the dash last so it cannot be read as a range:
value.replace(/[()\s-]/g, "");
You might want to read some documentation or tutorial.
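The difference between the unescaped group, the escaped sequence, and the character class can be seen side by side. A sketch using the value from the question; the names `input` and `groupMatch` are illustrative.

```javascript
const input = "(55) 55-55-55";

// Unescaped: () is an empty capturing group, so only "-" is matched.
const groupMatch = input.match(/()-/);
console.log(groupMatch[0]);        // -
console.log(groupMatch[1] === ""); // true: the group captured an empty string

// Escaped: matches the literal three-character sequence "()-".
console.log(/\(\)-/.test("()-"));  // true
console.log(/\(\)-/.test(input));  // false

// Character class: any one of "(", ")", whitespace, or "-".
console.log(input.replace(/[()\s-]/g, "")); // 55555555
```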

Strange javascript regular expressions

I have found the following regular expression
new RegExp("(^|\\s)hello(\\s|$)");
I referred to http://www.javascriptkit.com/jsref/escapesequence.shtml for regular expressions,
but I cannot see the \s escape sequence there. I know \s indicates a whitespace character,
but what does the preceding \ do? Which character is escaped?
I found similar regular expression in the Treewalker code in the following document http://ejohn.org/blog/getelementsbyclassname-speed-comparison/

The double \\ is to escape the backslash inside the string. In other words, \\ will be interpreted as \ for the regular expression.

The extra \ in this case is to escape the \ in the \s. Because we are inside a string declaration, you have to double up the \ to escape it. Once the string is processed and saved, it is reduced down to (^|\s)hello(\s|$)

The character immediately following the first \ is escaped. Normally \s escapes the s to mean "whitespace". In your example, the character which is escaped is \.
What you have is an expression which builds a regex (presumably to pass elsewhere) of (^|\s)hello(\s|$) — the word "hello" preceded either by whitespace or the start of the string, and followed by whitespace or the end of the string.
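The reduction from the string form to the actual pattern can be inspected through the `source` property of the resulting RegExp. A minimal sketch; the variable names and test strings are chosen for illustration.

```javascript
const fromString = new RegExp("(^|\\s)hello(\\s|$)");
const fromLiteral = /(^|\s)hello(\s|$)/;

console.log(fromString.source);                        // (^|\s)hello(\s|$)
console.log(fromString.source === fromLiteral.source); // true
console.log(fromString.test("say hello world"));       // true
console.log(fromString.test("othello"));               // false

// With a single backslash, the string literal drops it: "\s" is just "s".
const broken = new RegExp("(^|\s)hello(\s|$)");
console.log(broken.source);                            // (^|s)hello(s|$)
```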

Essentially, what the regex does is look for the word hello together with whatever delimits it on each side.
In layman's terms, it matches hello when it is preceded by whitespace or the start of the string, and followed by whitespace or the end of the string:
(^|\s)hello(\s|$)
As others have said, the double \ is there to escape the single \ inside the string literal, so the regex engine receives \s and treats it as whitespace, not as the literal characters '\s'.
The ^ means start of input, the $ means end of input, and each | is an alternation ("or") rather than an actual character to look for.
Lastly, your start and end markers are parenthesized (), which means they will be captured and placed in the match result. In JavaScript you can get at them with the following, where result is the array returned by myRegex.exec(text):
result[1]
result[2]
1 being the beginning group, and 2 being the end.

Regex not working as expected

What's wrong with this regular expression?
/^[a-zA-Z\d\s&#-\('"]{1,7}$/;
when I enter the following valid input, it fails:
a&'-#"2
It should also reject input that contains 2 consecutive spaces.

The dash needs to be either escaped (\-) or placed at the end of the character class, or it will signify a range (as in A-Z), not a literal dash:
/^[A-Z\d\s&#('"-]{1,7}$/i
would be a better regex.
N.B.: [#-\(] would have matched #, $, %, &, ' or (.
To address the added requirement of not allowing two consecutive spaces, use a lookahead assertion:
/^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i
(?!.*\s{2}) means "Assert that it's impossible to match (from the current position) any string followed by two whitespace characters". One caveat: The dot doesn't match newline characters.
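The original and corrected patterns can be compared directly on the input from the question. A sketch with illustrative variable names; the extra test strings are made up.

```javascript
const input = "a&'-#\"2";

const original = /^[a-zA-Z\d\s&#-\('"]{1,7}$/;          // "#-\(" is read as a range
const fixed = /^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i;     // dash moved to the end

console.log(original.test(input)); // false: "-" is not in the range # to (
console.log(fixed.test(input));    // true
console.log(fixed.test("a b"));    // true
console.log(fixed.test("a  b"));   // false: two consecutive spaces

// The accidental range covers exactly # $ % & ' (
console.log(/^[#-\(]+$/.test("#$%&'(")); // true
console.log(/[#-\(]/.test("-"));         // false
```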

The - (hyphen) has a special meaning inside a character class, used for specifying ranges. Did you mean to escape it?:
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/;
This RegExp matches your input.

You have an unescaped - in the middle of your character class. This means that you're actually searching for all characters between and including # and ( (which are #, $, %, &, ', and (). Either move it to the end or escape it with a backslash. Your regex should read:
/^[a-zA-Z\d\s&#\('"-]{1,7}$/
or
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/

Remove the ; at the end and use:
^[a-zA-Z\d\s\&\#\-\(\'\"]+$

Your input does not match the regular expression. The problem here is the hyphen in your regexp. If you move it from its position after the '#' character to the start of the character class, like so:
/^[-a-zA-Z\d\s&#\('"]{1,7}$/;
everything is fine and dandy.
You can always use Rubular for checking your regular expressions. I use it on a regular (no pun intended) basis.

understanding regular expression for detecting string

I encountered this regular expression that detects string literal of Unicode characters in JavaScript.
'"'("\\x"[a-fA-F0-9]{2}|"\\u"[a-fA-F0-9]{4}|"\\"[^xu]|[^"\n\\])*'"'
but I couldn't understand the role and need of
1) "\\x"[a-fA-F0-9]{2}
2) "\\"[^xu]|[^"\n\\]
My guess about 1) is that it is detecting control characters.

"\\x"[a-fA-F0-9]{2}
This is a literal \x followed by two characters from the hex-digit group.
This matches the shorter-form character escapes for the code points 0–255, \x00–\xFF. These are valid in JavaScript string literals but they aren't in JSON, where you have to use \u0000–\u00FF instead.
"\\"[^xu]|[^"{esc}\n]
This matches one of:
backslash followed by one more character, except for x or u. The valid cases for \xNN and \uNNNN were picked up in the previous |-separated clauses, so what this does is avoid matching invalid syntax like \uqX.
anything else, except for the " or newline. It is probably also supposed to be excluding other escape characters, which I'm guessing is what {esc} means. That isn't part of the normal regex syntax, but it may be some extended syntax or templating over the top of regex. Otherwise, [^"{esc}\n] would mean just any character except ", {, e, s, c, } or newline, which would be wrong.
Notably, the last clause, that picks up ‘anything else’, doesn't exclude \ itself, so you can still have \uqX in your string and get a match even though that is invalid in both JSON and JavaScript.
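For experimenting, the lexer-style pattern can be translated into a JavaScript regex. This is only a sketch under two assumptions: the quoted pieces such as "\\x" stand for the literal text \x, and the final clause is taken as written in the question, [^"\n\\], which does exclude the backslash (unlike the {esc} form). The name `stringLiteral` and the sample inputs are hypothetical.

```javascript
// Anchored translation: opening quote, any number of valid pieces, closing quote.
const stringLiteral =
  /^"(\\x[a-fA-F0-9]{2}|\\u[a-fA-F0-9]{4}|\\[^xu]|[^"\n\\])*"$/;

// Each test string is the source text of a literal, so "\\" below is one backslash.
console.log(stringLiteral.test('"plain"'));      // true
console.log(stringLiteral.test('"\\x41"'));      // true: \x plus two hex digits
console.log(stringLiteral.test('"\\u00e9"'));    // true: \u plus four hex digits
console.log(stringLiteral.test('"tab\\there"')); // true: backslash plus a non-x, non-u character
console.log(stringLiteral.test('"\\uqX"'));      // false: \u without four hex digits
console.log(stringLiteral.test('"\\x4"'));       // false: \x with only one hex digit
```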
