Backslash Discrepancy between Regex Constructor and Literals [duplicate] - javascript

This question already has an answer here:
javascript why double escape dot/character [closed]
(1 answer)
Closed 7 years ago.
The title sums it up. I came across an odd discrepancy in backslash escaping between regular expression literals and constructor functions with new RegExp(), and I was curious about what's behind it.
I was trying to escape a parenthesis ( inside a constructor, like so:
var search = new RegExp('\(', 'g');
var result = "(test)".match(search);
But this kept throwing an error. The match worked fine with a literal, /\(/g, but with the constructor I ended up having to do something like this:
search = new RegExp('\\(', 'g');
Can someone please explain to me why an escaping backslash requires an escaping backslash itself in a constructor, but not a literal?

Because the backslash is a special character in both the context of a regexp, and the context of a string literal. You have to get past the string literal's special usage before the regexp parser can see it and apply its own special rules.

NOTE If pattern is a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognised by RegExp, any backslash \ characters must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.
http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.4.1
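The two parsing stages described above can be sketched as follows. This is a minimal illustration, assuming any standard JavaScript engine; the variable names are made up for the example.

```javascript
// Stage 1: the string literal parser consumes one level of backslashes.
const oneBackslash = '\(';    // "\(" is not a string escape sequence, so the value is just "("
const twoBackslashes = '\\('; // "\\" is an escaped backslash, so the value is "\("

console.log(oneBackslash.length);   // 1
console.log(twoBackslashes.length); // 2

// Stage 2: RegExp only ever sees the resulting string value.
let constructorError = null;
try {
  new RegExp(oneBackslash, 'g');    // the pattern "(" is an unterminated group
} catch (e) {
  constructorError = e;             // a SyntaxError
}

const fromConstructor = new RegExp(twoBackslashes, 'g');
const fromLiteral = /\(/g;          // a regex literal never goes through the string parser

console.log('(test)'.match(fromConstructor)); // [ '(' ]
console.log('(test)'.match(fromLiteral));     // [ '(' ]
```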

Related

why does this js RegExp test return true? [duplicate]

The regex allows chars that are: alphanumeric, space, '-', '_', '&', '()' and '/'
this is the expression
[\s\/\)\(\w&-]
I have tested this in various online testers and know it works; I just can't get it to work correctly in code. I get syntax errors with anything I try. Any suggestions?
var programProductRegex = new RegExp([\s\/\)\(\w&-]);
You can use the regular expression syntax:
var programProductRegex = /[\s\/\)\(\w&-]/;
You use forward slashes to delimit the regex pattern.
If you use the RegExp object constructor you need to pass in a string. Because backslashes are special escape characters inside JavaScript strings and they're also escape characters in regular expressions, you need to use two backslashes to do a regex escape inside a string. The equivalent code using a string would then be:
var programProductRegex = new RegExp("[\\s\\/\\)\\(\\w&-]");
All the backslashes that were in the original regular expression need to be escaped in the string to be correctly interpreted as backslashes.
Of course the first option is better. The constructor is helpful when you obtain a string from somewhere and want to make a regular expression out of it.
var programProductRegex = new RegExp(userInput);
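A short sketch of two related points, under stated assumptions. First, String.raw (a standard ES2015 tag) lets you write the pattern string without doubling backslashes. Second, if a runtime string should be matched as literal text rather than as a pattern, its special characters need escaping first; escapeRegExp below is a hypothetical helper written for this example, not a built-in, and userInput is a stand-in value.

```javascript
// Doubled backslashes and String.raw produce the same pattern string.
const doubled = new RegExp("[\\s\\/\\)\\(\\w&-]");
const viaRaw = new RegExp(String.raw`[\s\/\)\(\w&-]`);
console.log(doubled.source === viaRaw.source); // true

// Hypothetical helper: backslash-escape every character that is
// special in a regular expression, so the text matches itself only.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b(c)'; // stand-in for a string obtained at runtime

const asPattern = new RegExp(userInput);                   // "." and "(...)" keep their regex meaning
const asLiteralText = new RegExp(escapeRegExp(userInput)); // matches the exact characters only

console.log(asPattern.test('axbc'));       // true
console.log(asLiteralText.test('axbc'));   // false
console.log(asLiteralText.test('a.b(c)')); // true
```

Which of the two you want depends on whether the incoming string is meant to be a pattern or plain text.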
If you are passing a string to the RegExp constructor and want to escape characters like (, you need to write \\( (a backslash to escape the backslash, then the opening parenthesis).
If you are using a regex literal, you only need one backslash for each character (like \().
Enclose your regex with delimiters:
var programProductRegex = /[\s\/)(\w&-]/;

RegExp constructor escapes slash character but not dot

Say I have this:
console.log(new RegExp('.git'));
console.log(new RegExp('scripts/npm'));
the results are:
/.git/
/scripts\/npm/
my question is - why does it escape the slash in scripts/npm, but it does not escape the . in .git? What is the rhyme and reason to that?
Note, in this case, the regex strings are being passed from the command line, so I need to convert them to regex using RegExp.
An unescaped / denotes the beginning and end of a regular expression. When you pass in a string containing / into the constructor, of course that / is part of the regular expression, not a symbol denoting the beginning or end.
The . is something else entirely, and has nothing to do with regex delimiters, so it's left as-is.
Note that if you want the regular expression to match a literal dot (rather than any character), you need to double-escape it when using the constructor:
console.log(new RegExp('\\.git'));
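A small sketch of the difference, assuming the pattern strings from the question; the variable names are illustrative. It also shows that the backslash added before / only affects how the regex is displayed, not what it matches.

```javascript
const anyCharGit = new RegExp('.git');      // "." matches any single character
const literalDotGit = new RegExp('\\.git'); // string value is "\.git": a literal dot
const withSlash = new RegExp('scripts/npm');

console.log(anyCharGit.test('xgit'));    // true
console.log(literalDotGit.test('xgit')); // false
console.log(literalDotGit.test('.git')); // true

// The pattern matches a plain slash; a backslash is not required in the input.
console.log(withSlash.test('scripts/npm'));   // true
console.log(withSlash.test('scripts\\/npm')); // false (the input here contains a backslash)

// Printing the regex shows its literal form, where "/" has to be escaped.
console.log(String(withSlash));
```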
In JavaScript you can write a regular expression between two / delimiters. This is called a regular expression literal.
For instance
let re = /(\w+)\s(\w+)/;
Now, as for why it adds a \ before the /: RegExp reports the pattern in a form that is still valid when written as a literal, and an unescaped / would end the literal early. What the pattern matches is unaffected.
Furthermore, if you examine the object returned by RegExp, its source property is scripts\/npm (shown as "scripts\\/npm" when quoted as a string literal). In the quoted form the first \ escapes the second. From the regular expression's perspective, the single \ simply escapes the following / so that the pattern is valid in literal notation.

Javascript regex ignoring backslash period requirement [duplicate]

This question already has answers here:
Why do regex constructors need to be double escaped?
(5 answers)
Closed 2 years ago.
For my regex:
^(http(s)?\://)?(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$
I am wondering as to why "john" passes? This regex is only supposed to pass for URLs
> var j = new RegExp("^(http(s)?\://)?(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$");
> j.test("john")
> true
There is a required \. within the regex line
You are constructing your regular expression by passing a string to the RegExp constructor function.
While \ has special meaning as an escape character in regular expressions, it also has a similar meaning in string literals.
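In your string literal, \. is parsed to a plain . before RegExp ever sees it, so the regular expression treats it as "any character" rather than a literal dot. Likewise, a \\ in a string literal is parsed into a single \ in the string, which the regular expression then treats as its own escape character.
To match a literal backslash you therefore need to provide an escaped \ and an escaped escape character.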
So for a regular expression that matches a single \ you need:
var myRegEx = new RegExp("\\\\")
I suggest avoiding the constructor function and using a regex literal instead.
var myRegEx = /\\/;
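To illustrate why "john" passes, here is a minimal sketch that uses a much shorter pattern as a stand-in for the URL regex in the question (the simplified pattern and variable names are assumptions made for the example). The only difference between the first two lines is one backslash.

```javascript
// "\." in a string literal collapses to ".", which the regex reads as "any character".
const collapsed = new RegExp("^[a-z]+\.[a-z]{2,6}$"); // pattern value: ^[a-z]+.[a-z]{2,6}$
const escaped = new RegExp("^[a-z]+\\.[a-z]{2,6}$");  // pattern value: ^[a-z]+\.[a-z]{2,6}$
const literal = /^[a-z]+\.[a-z]{2,6}$/;               // same pattern, written as a literal

console.log(collapsed.test("john"));   // true:  "j" + any char "o" + "hn"
console.log(escaped.test("john"));     // false: no literal dot in the input
console.log(escaped.test("john.com")); // true
console.log(literal.test("john"));     // false
```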

How do I include an inline comment in a regular expression in JavaScript [duplicate]

This question already has answers here:
Commenting Regular Expressions
(7 answers)
Closed 3 years ago.
Inline comments work when a string is passed to the RegExp constructor:
RegExp("foo"/*bar*/).test("foo")
but not with an expression. Is there any equivalent or alternative in JavaScript to emulate x-mode for the RegExp object?
JavaScript supports neither the x modifier nor inline comments of the form (?#comment).
I guess the best you can do is to use the RegExp constructor, write every part in a separate string, and concatenate them (with comments between the strings):
RegExp(
"foo" + // match a foo
"bar" + // followed by a bar
"$" // at the end of the string
).test("somefoobar");
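A variant of the same idea, sketched with an array of pieces joined together instead of + concatenation. The date pattern and the name datePattern are illustrative only. Because each piece is a string, any regex backslash has to be doubled.

```javascript
const datePattern = new RegExp([
  '^',
  '(0[1-9]|1[0-2])', // month, 01 to 12
  '/',               // separator
  '\\d{4}',          // four-digit year (backslash doubled because this is a string)
  '$',
].join(''));

console.log(datePattern.test('02/2010')); // true
console.log(datePattern.test('13/2010')); // false
```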
Other than using a zero-length sub-expression, it's not possible. Examples of "comments":
/[a-z](?!<-- Any letter)/
(?!..) is a negated look-ahead. It matches if the preceding token is not followed by the pattern within the parentheses. Since the thing between (?! and ) is a real regular (sub)expression, you cannot use arbitrary characters unless they are escaped with a backslash.
An alternative is to use the positive look-ahead:
/[a-z](?=|<-- Any letter)/
This look-ahead will always match, because the first alternative inside it is empty and every position is followed by an empty string.

Javascript RegEx Not Working [duplicate]

This question already has answers here:
Why do regex constructors need to be double escaped?
(5 answers)
Closed 2 years ago.
I have the following javascript code:
function checkLegalYear() {
var val = "02/2010";
if (val != '') {
var regEx = new RegExp("^(0[1-9]|1[0-2])/\d{4}$", "g");
if (regEx.test(val)) {
//do something
}
else {
//do something
}
}
}
However, my regEx test always returns false for any value I pass (02/2010). Is there something wrong in my code? I've tried this code on various javascript editors online and it works fine.
Because you're creating your regular expression from a string, you have to double-up your backslashes:
var regEx = new RegExp("^(0[1-9]|1[0-2])/\\d{4}$", "g");
When you start with a string, you have to account for the fact that the pattern will first be parsed as a JavaScript string constant. The syntax for string constants doesn't know anything about regular expressions, and it has its own uses for backslash characters. Thus by the time the parser is done with your regular expression string, it can look quite different from how it looks in your source code. Your source string looks like
"^(0[1-9]|1[0-2])/\d{4}$"
but after the string parse it's
^(0[1-9]|1[0-2])/d{4}$
Note that \d is now just d.
By doubling the backslash characters, you're telling the string parser that you want single actual backslashes in the string value.
There's really no reason here not to use regular expression syntax instead:
var regEx = /^(0[1-9]|1[0-2])\/\d{4}$/g;
Edit: note that the embedded "/" character has to be escaped (as \/) if you use regex literal syntax.
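The three forms discussed in this answer can be compared side by side. This is a minimal sketch using the pattern from the question; the variable names are illustrative, and the "g" flag is left out because it is not needed for a single test.

```javascript
// The string parser drops the backslash, so these two strings are identical.
console.log("^(0[1-9]|1[0-2])/\d{4}$" === "^(0[1-9]|1[0-2])/d{4}$"); // true

const broken = new RegExp("^(0[1-9]|1[0-2])/\d{4}$");     // really means: four letter "d"s
const fixed = new RegExp("^(0[1-9]|1[0-2])/\\d{4}$");     // "\\d" survives as "\d"
const fromRegexLiteral = /^(0[1-9]|1[0-2])\/\d{4}$/;      // "/" must be escaped in a literal

console.log(broken.test("02/2010"));           // false
console.log(broken.test("02/dddd"));           // true
console.log(fixed.test("02/2010"));            // true
console.log(fromRegexLiteral.test("02/2010")); // true
```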
