Strange javascript regular expressions

I found the following regular expression:
new RegExp("(^|\\s)hello(\\s|$)");
I referred to http://www.javascriptkit.com/jsref/escapesequence.shtml for regular expressions, but I cannot see the \s escape sequence there. I know \s indicates a whitespace character.
But what does the preceding \ do? Which character is escaped?
I found a similar regular expression in the Treewalker code in the following document: http://ejohn.org/blog/getelementsbyclassname-speed-comparison/

The double \\ is there to escape the backslash inside the string. In other words, \\ in the string literal reaches the regular expression as a single \.

The extra \ in this case is to escape the \ in the \s. Because we are inside a string declaration, you have to double up the \ to escape it. Once the string is processed and saved, it is reduced down to (^|\s)hello(\s|$)

The character immediately following the first \ is escaped. Normally \s escapes the s to mean "whitespace". In your example, the character which is escaped is \.
What you have is an expression which builds a regex (presumably to pass elsewhere) of (^|\s)hello(\s|$) — the word "hello" preceded either by whitespace or the start of the string, and followed by whitespace or the end of the string.
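
As a quick check of the explanations above, here is a minimal sketch comparing the string-built pattern with the equivalent regex literal. The variable names and sample inputs are illustrative and not part of the original code.

```javascript
// Inside a string literal, "\\" produces one backslash at runtime.
const pattern = "(^|\\s)hello(\\s|$)";
const fromString = new RegExp(pattern);
const fromLiteral = /(^|\s)hello(\s|$)/;

console.log(pattern);                                   // (^|\s)hello(\s|$)
console.log(fromString.source === fromLiteral.source);  // true
console.log(fromString.test("say hello world"));        // true
console.log(fromString.test("othello"));                // false
```

The last line is false because "hello" inside "othello" is preceded by "t", which is neither whitespace nor the start of the string.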

Essentially, the regex looks for the word hello delimited on each side by whitespace or by the start or end of the string.
As others have said, the double \ only escapes the single \ inside the string literal. The regex engine then receives \s and treats it as whitespace, not as the literal text '\s'.
The ^ means start of input, the $ means end of input, and each | is an alternation ("or") inside its group rather than a literal character to look for.
Lastly, your start and end markers are in parentheses (), which makes them capturing groups, so they are extracted and placed in the match result. In JavaScript you can get at them by using:
var m = myRegex.exec(str);
m[1]
m[2]
1 being the leading group, and 2 being the trailing group.
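
For JavaScript specifically, a small sketch of reading the two groups from the array returned by exec may help. The sample strings are made up for illustration.

```javascript
const re = new RegExp("(^|\\s)hello(\\s|$)");

// "hello" surrounded by spaces: both groups capture a space.
const inside = re.exec("say hello world");
console.log(JSON.stringify(inside[0])); // " hello "
console.log(JSON.stringify(inside[1])); // " "
console.log(JSON.stringify(inside[2])); // " "

// "hello" alone: ^ and $ match without consuming any characters,
// so both groups are empty strings.
const alone = re.exec("hello");
console.log(JSON.stringify(alone[1])); // ""
console.log(JSON.stringify(alone[2])); // ""
```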

Related

Can Someone explain escaping a string in JavaScript

I know this line escapes a JavaScript string by adding \ before special characters. But can anyone explain how it does that? Also, can it cause any problems in further string manipulations?
str = str.replace(/[-\\/\\^$*+?.()|[\]{}]/g, '\\$&');

The regular expression replaces anything it matches with a backslash followed by the matched substring. It matches any single character from the set between [ and ]. That is:
- simple dash
\\ single backslash (preceded by another backslash, because it serves as the escape character)
/ slash
\\ single backslash again (it is redundant here)
^, $, *, +, ?, ., (, ), |, [ single characters
\] closing bracket is preceded by escape character, otherwise it would close set of characters to match
{, } single characters
The replacement string \\$& simply says:
\\ add single backslash
$& add the whole substring matched by the expression
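
To see the effect, here is a hedged sketch that wraps the line from the question in a helper and feeds the result to new RegExp. The name escapeRegExp and the sample text are assumptions made for illustration.

```javascript
// Hypothetical helper wrapping the exact replace call from the question.
function escapeRegExp(str) {
  return str.replace(/[-\\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

const text = "1+1=2 (approx.)";
const escaped = escapeRegExp(text);
console.log(escaped); // 1\+1=2 \(approx\.\)

// Escaped: every special character is matched literally.
console.log(new RegExp(escaped).test(text)); // true

// Unescaped: + is a quantifier, () is a group and . is a wildcard,
// so the pattern does not even match its own source text.
console.log(new RegExp(text).test(text));    // false
```

The escaped string is only meant to be used as a regex pattern. Escaping it a second time would double the backslashes, so the helper should be applied exactly once.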

What does `escape a string` mean in Regex? (Javascript)

I'm trying to understand the backslash (\) and how to use it for escaping in regular expressions.
I've read that, when it is applied to strings, this is called escaping a string.
But what does that actually mean?

Many characters in regular expressions have special meanings. For instance, the dot character '.' means "any one character". There are a great many of these specially-defined characters, and sometimes you want to search for one rather than use its special meaning.
See this example to search for any filename that contains a '.':
/^[^.]+\..+/
In the example, there are 3 dots, but our description says that we're only looking for one. Let's break it down by the dots:
Dot #1 is used inside a "character class" (the characters inside the square brackets), which tells the regex engine to search for "any one character" that is not a '.', and the "+" says to keep going until there are no more characters or the next character is the '.' that we're looking for.
Dot #2 is preceded by a backslash, which says that we're looking for a literal '.' in the string (without the backslash, it would be using its special meaning, which is looking for "any one character"). This dot is said to be "escaped", because its special meaning is not being used in this context; the backslash immediately before it made that happen.
Dot #3 is simply looking for "any one character" again, and the '+' following it says to keep doing that until it runs out of characters.
So, the backslash is used to "escape" the character immediately following it; as such, it's called the "escape character". That just means that the character's special meaning is taken away in that one place.
Now, escaping a string (in regex terms) means finding all of the characters with special meaning and putting a backslash in front of them, including in front of other backslash characters. When you've done this one time on the string, you have officially "escaped the string".
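
The breakdown above can be checked with a short sketch. The file names are invented for illustration, and the second pattern is the same regex with the escape removed.

```javascript
const hasDot = /^[^.]+\..+/;   // dot #2 is escaped: a literal '.'
const unescaped = /^[^.]+..+/; // same pattern without the backslash

console.log(hasDot.test("report.txt")); // true
console.log(hasDot.test("README"));     // false, no '.' anywhere
console.log(hasDot.test(".bashrc"));    // false, nothing before the '.'

// Without the escape, the middle dot matches any character,
// so a name with no '.' at all is accepted.
console.log(unescaped.test("README"));  // true
```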

Say you try to print out a string, let's say "this\that".
That \ character is recognized as a special character. I'm not sure about regex, but in Java or C, for example, \t inserts a tab, so it would print as
this hat
But a \ also "escapes" a character in the string, depriving it of its special meaning, so using "this\\that" instead would result in
this\that
I hope this helped.
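
JavaScript string literals treat \t the same way, so the point can be sketched as follows. The variable names are illustrative.

```javascript
const withTab = "this\that";  // \t is read as one tab character
const literal = "this\\that"; // \\ is read as one backslash

console.log(withTab.length);         // 8: "this" + tab + "hat"
console.log(literal.length);         // 9: "this" + "\" + "that"
console.log(withTab.includes("\t")); // true
console.log(literal);                // this\that
```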

Quoting from MSDN:
The backslash (\) in a regular expression indicates one of the following:
The character that follows it is a special character, as shown in the table in the following section. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, \t represents a tab, and \x020 represents a space.
A character that otherwise would be interpreted as an unescaped language construct should be interpreted literally. For example, a brace ({) begins the definition of a quantifier, but a backslash followed by a brace (\{) indicates that the regular expression engine should match the brace. Similarly, a single backslash marks the beginning of an escaped language construct, but two backslashes (\\) indicate that the regular expression engine should match the backslash.
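
The quote describes .NET regular expressions, but the two roles of the backslash carry over to JavaScript for the cases below. This is a small sketch with invented sample strings.

```javascript
// Role 1: the backslash gives the next character a special meaning.
const wordCat = /\bcat\b/;                 // \b is a word boundary
console.log(wordCat.test("a cat sat"));    // true
console.log(wordCat.test("concat"));       // false

// Role 2: the backslash removes a special meaning.
const brace = /\{/;                        // literal {
const backslash = /\\/;                    // literal \
console.log(brace.test("a{b"));            // true
console.log(backslash.test("C:\\temp"));   // true
console.log(backslash.test("C:/temp"));    // false
```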

Regexp adds invisible dot character replacing \b

I want this to be my regex: /^word\b/ (word is dynamic)
When I set it up to be dynamic I have to use this:
var word='spoon';
'spoon .table .chair'.match(new RegExp('^'+word+'\b'));
However, this finds null, while this:
var word='spoon';
'spoon .table .chair'.match(/^spoon\b/);
finds ["spoon"].
The interesting part is when I examine the difference between the regex I wrote and the regex RegExp built:
console.log(/^spoon\b/,new RegExp('^'+word+'\b'))
It shows this:
/^spoon\b/ /^spoon/
If I then copy the second part of the log output (/^spoon/) into my code editor I see this character:
What is that? How do I build a "word ending with" RegExp, given that I am not always guaranteed to have a space at the end, since the string might be a single word (spoon or another word)?
I'd rather just do this without the invisible thing.

You've got to escape the \ of \b in the regex string by adding an extra backslash:
var regex = new RegExp('^' + word + '\\b')
This is because RegExp expects to see the two characters \ and b, but the string '\b' is a single character, ASCII 8, the backspace character (in the same way that '\n' is a single newline character).

In a JavaScript string, \b doesn't mean a \ followed by a b. It means the backspace character (ASCII code 8). To get a \ followed by a b, you need to escape the backslash so that JavaScript doesn't parse it as a backspace:
'^' + word + '\\b'
The same thing applies if you want to use \d or \s or anything else: You need to escape the \ with another one so that Javascript doesn't think it's a Javascript escape code and the RegExp can parse it as what you expect.
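
A minimal sketch of the difference, using the word and text from the question. The names broken and fixed are illustrative.

```javascript
const word = "spoon";
const text = "spoon .table .chair";

// '\b' in a string literal is one character: backspace (code 8).
console.log("\b".length);        // 1
console.log("\b".charCodeAt(0)); // 8

// The pattern ends with a backspace character, which the text lacks.
const broken = new RegExp("^" + word + "\b");
console.log(text.match(broken)); // null

// '\\b' reaches RegExp as \b, the word-boundary assertion.
const fixed = new RegExp("^" + word + "\\b");
console.log(text.match(fixed)[0]);    // spoon
console.log("spoon".match(fixed)[0]); // spoon
```

The last line shows that \b also matches at the end of a one-word string, so no trailing space is required.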

Match special characters including square braces

I want to have a regex for a text field in ExtJs (maskRe) that matches any Java code pattern.
I've used
maskRe:/^[A-Za-z0-9 _=//~'"|{}();*:?+,.]*$/
I also want to include [ and ], but it seems that /[, /], //[, //] do not work.
Any input, please?

The problem is you need to escape your forward slash. Change // to \/:
/^[A-Za-z0-9 _=\/~'"|{}();*:?+,.]*$/
However this regular expression does not match any Java code. Java code can contain almost any Unicode character. int møøse = 42; is valid Java.

To strip special characters of their magic powers you have to escape them, by putting a backslash \ in front of the character. For example, to match [ you type \[.
And since the backslash acts as a special character as well, to match it literally you escape it the same way: \\.
And since you used / as the pattern delimiter, you need to escape its occurrences within the pattern:
/^[A-Za-z0-9 _=\/~'"|{}();*:?+,.]*$/

The way to escape regex meta-characters is with a backslash (\), not a forward slash (/).
[ and ] should be \[ and \]
// should be \/
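
Putting these answers together, one possible version of the pattern with both square brackets added might look like the sketch below. The exact set of allowed characters is the asker's choice, and the sample inputs are invented.

```javascript
// The question's character class with \/ , \[ and \] escaped.
const maskRe = /^[A-Za-z0-9 _=\/~'"|{}();*:?+,.\[\]]*$/;

console.log(maskRe.test("int[] a = {1, 2};")); // true
console.log(maskRe.test("a / b"));             // true
console.log(maskRe.test("a < b"));             // false, < is not in the set
```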

javascript replace() function strange behaviour with regexp

Am I doing something wrong, or is there a problem with JS replace?
<input type="text" id="a" value="(55) 55-55-55" />
document.write($("#a").val().replace(/()-/g,''));
prints (55) 555555
http://jsfiddle.net/Yb2yV/
How can I replace () and spaces too?

In a JavaScript regular expression, the ( and ) characters have special meaning. If you want to list them literally, put a backslash (\) in front of them.
If your goal is to get rid of all the (, ), -, and space characters, you could do it with a character class combined with an alternation (i.e., either-or) on \s, which stands for "whitespace":
document.write($("#a").val().replace(/[()\-]|\s/g,''));
(I didn't put backslashes in front of the () because you don't need to within a character class. I did put one in front of the - because within a character class, - has special meaning.)
Alternately, if you want to get rid of anything that isn't a digit, you can use \D:
document.write($("#a").val().replace(/\D/g,''));
\D means "not a digit" (note that it's a capital, \d in lower case is the opposite [any digit]).
More info on the MDN page on regular expressions.

You need to use a character class
/[-() ]/
Using "-" as the first character solves the ambiguity because a dash is normally used for ranges (e.g. [a-zA-Z0-9]).

document.write($("#a").val().replace(/[\s()-]/g,''));
That will remove all whitespace (\s), parens, and dashes

Use this
.replace(/\(|\)|-| /g,'')
You have to escape the parentheses (i.e. \( instead of (). In your regexp, you want to list the four items: \(, \), - and a space. Since you want to replace any one of them, not just a sequence of all four together, you have to put the OR operator | between them.

It may be very crude, but a very basic approach would be:
document.write($("#a").val().replace(/(\()|(\))|-| |/g,''));
| means OR,
\ is used for escaping reserved symbols

You want to match any character in the set, so you should use square brackets to make a character set:
document.write($("#a").val().replace(/[()\- ]/g,''));
Normally, parentheses have a special meaning in regular expressions, so in your regex they were treated as an empty group rather than as literal characters, leaving just the dash to match. Normally, to get literal parentheses, you need to escape them with \ (but in a square bracket block, as above, you don't).
The dash above is escaped because it normally indicates a range in a character set, e.g., [a-z].

The parentheses indicate a capturing group in the regexp. You'd need to escape them (/\(\)-/) to match the sequence "()-". Yet I guess you want to use a character class, i.e. an expression that matches "(", ")" or "-"; for whitespace, include the \s shorthand, keeping the dash last so that it cannot be read as a range:
value.replace(/[()\s-]/g, "");
You might want to read some documentation or tutorial.
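
For comparison, here is a sketch that runs the original regex and several of the suggested ones against the value from the question, without jQuery or the DOM. The variable names are illustrative.

```javascript
const value = "(55) 55-55-55";

// Original: () is an empty group, so only the dashes are removed.
const original = value.replace(/()-/g, "");
console.log(original); // (55) 555555

// Character-class variants: parentheses, dashes and whitespace removed.
const classAndAlt = value.replace(/[()\-]|\s/g, "");
const dashFirst = value.replace(/[-() ]/g, "");
const dashLast = value.replace(/[()\s-]/g, "");

// Anything that is not a digit removed.
const digitsOnly = value.replace(/\D/g, "");

console.log(classAndAlt); // 55555555
console.log(dashFirst);   // 55555555
console.log(dashLast);    // 55555555
console.log(digitsOnly);  // 55555555
```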
