Why does new RegExp() string not require escaping a forward slash?

Reading this document https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
"When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary."
So why is there no difference between these two uses of RegExp()?
"a/b/c".match(new RegExp("\/", "g")) // (2) [ "/", "/" ]
"a/b/c".match(new RegExp("/", "g")) // (2) [ "/", "/" ]
Is the document incorrect or am I missing something?
The possible duplicate question indicates that forward slashes must be escaped as literals. Using a string, the example above shows this is not the case given the use case of a string with the constructor function. This question is specifically about using a string with the constructor function.
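So, based on the answer below, the difference appears to be this: in a regex literal the / must be escaped because it delimits the pattern, whereas in a string passed to the constructor only the string's own special characters (the " delimiter and the backslash) need escaping.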

Forward slashes are only special characters in that they mark the syntactic beginning and end of a regular expression literal in JavaScript. When you use the constructor, that's not needed, because the first parameter provided to new RegExp is the (entire) string to construct the regular expression from.
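A quick way to check this, as a minimal sketch (the variable names are illustrative only): inside a string literal, \/ is just an unnecessary escape, so both strings are the same single character before RegExp ever sees them.

```javascript
// In a string literal, "\/" is an unnecessary escape: it yields the one-character string "/".
const escapedSlash = "\/";
const plainSlash = "/";

console.log(escapedSlash === plainSlash); // true
console.log(escapedSlash.length);         // 1

// Both constructor calls therefore receive exactly the same pattern text.
const fromEscaped = new RegExp(escapedSlash, "g");
const fromPlain = new RegExp(plainSlash, "g");
console.log(fromEscaped.source === fromPlain.source); // true
console.log("a/b/c".match(fromPlain).length);         // 2
```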

Passing a variable to javascript regex [duplicate]

I am studying RegExp, but everywhere I see two syntaxes:
new RegExp("[abc]")
And
/[abc]/
And, with modifiers, what is the use of the additional backslash (\)?
/\[abc]/g
I am not getting any bugs with either of these, but I wonder whether there is any difference between the two. If so, what is it, and which is best to use?
I referred to "Differences between Javascript regexp literal and constructor", but there I didn't find an explanation of which is best and what the difference is.
The key difference is that a regex literal can't accept dynamic input, i.e. from variables, whereas the constructor can, because the pattern is specified as a string.
Say you wanted to match one or more words from an array in a string:
var words = ['foo', 'bar', 'orange', 'platypus'];
var str = "I say, foo, what a lovely platypus!";
str.match(new RegExp('\\b('+words.join('|')+')\\b', 'g')); //["foo", "platypus"]
This would not be possible with a literal /pattern/, as anything between the two forward slashes is interpreted literally; we'd have to specify the allowed words in the pattern itself, rather than reading them in from a dynamic source (the array).
Note also the need to double-escape (i.e. \\) special characters when specifying patterns in this way, because we're doing so in a string - the first backslash must be escaped by the second so one of them makes it into the pattern. If there were only one, it would be interpreted by JS's string parser as an escaping character, and removed.
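To make the double-escaping point concrete, here is a small sketch that reuses the array example and compares it with a single-backslash version. The variable names are only for illustration.

```javascript
const words = ['foo', 'bar', 'orange', 'platypus'];
const str = 'I say, foo, what a lovely platypus!';

// "\\b" in the string becomes \b in the pattern: a word boundary.
const doubled = new RegExp('\\b(' + words.join('|') + ')\\b', 'g');
console.log(doubled.source);     // \b(foo|bar|orange|platypus)\b
console.log(str.match(doubled)); // [ 'foo', 'platypus' ]

// "\b" in a string is the backspace character (U+0008), so this pattern
// looks for literal backspaces around the words and finds nothing.
const single = new RegExp('\b(' + words.join('|') + ')\b', 'g');
console.log(str.match(single));  // null
```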
As you can see, the RegExp constructor syntax requires a string to be passed. In a string, \ is used to escape the following character. Thus,
new RegExp("\s") // This gives the regex `/s/` since s is escaped.
will produce the regex s.
Note: to add modifiers/flags, pass the flags as second parameter to the constructor function.
By contrast, /\s/ (the literal syntax) produces exactly the regex that is written, which is predictable.
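The difference can be observed directly through the source property. This is a minimal sketch; the names are illustrative.

```javascript
const fromSingle = new RegExp("\s");  // the string is just "s"
const fromDouble = new RegExp("\\s"); // the string is \s
const literal = /\s/;

console.log(fromSingle.source); // s
console.log(fromDouble.source); // \s
console.log(literal.source);    // \s

console.log(fromSingle.test(" ")); // false: it only matches the letter s
console.log(fromDouble.test(" ")); // true: it matches whitespace
console.log(literal.test(" "));    // true
```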
The RegExp constructor syntax allows a regular expression to be created dynamically.
So, when the regex needs to be crafted dynamically, use the RegExp constructor syntax; otherwise use the regex literal syntax.
They are kind of the same but "Regular expression literals should be used when possible" because it is easier to read and does not require escaping like a string literal does.
Escaping example:
new RegExp("\\d+");
/\d+/;
Using the RegExp constructor is suitable when the pattern is computed dynamically, e.g. when it is provided by the user.
Source SonarLint Rule.
There are two ways of defining regular expressions:
1. Through an object constructor. Can be changed at runtime.
2. Through a literal. Compiled when the script loads, which gives better performance.
The literal is the best to use with known regular expressions, while the constructor is better for dynamically constructed regular expressions such as those from user input.
You can use either of the two, and the resulting regular expressions are handled in exactly the same way.

RegExp constructor escapes slash character but not dot

Say I have this:
console.log(new RegExp('.git'));
console.log(new RegExp('scripts/npm'));
the results are:
/.git/
/scripts\/npm/
my question is - why does it escape the slash in scripts/npm, but it does not escape the . in .git? What is the rhyme and reason to that?
Note, in this case, the regex strings are being passed from the command line, so I need to convert them to regex using RegExp.
An unescaped / denotes the beginning and end of a regular expression. When you pass in a string containing / into the constructor, of course that / is part of the regular expression, not a symbol denoting the beginning or end.
The . is something else entirely and has nothing to do with regex delimiters, so it's left as-is.
Note that if you want the regular expression to match a literal dot (rather than any character), you need to double-escape it when using the constructor:
console.log(new RegExp('\\.git'));
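As a hedged illustration of both points (variable names are hypothetical, and the printed form of the slash pattern is what Node.js shows, matching the output in the question):

```javascript
const anyCharGit = new RegExp('.git');      // "." matches any character
const literalDotGit = new RegExp('\\.git'); // the string is \.git, so the dot is literal
const slashPath = new RegExp('scripts/npm');

console.log(anyCharGit.test('xgit'));    // true
console.log(literalDotGit.test('xgit')); // false
console.log(literalDotGit.test('.git')); // true

// The slash is only escaped in the printed, literal-style form;
// the pattern still matches a plain "/".
console.log(String(slashPath));             // /scripts\/npm/
console.log(slashPath.test('scripts/npm')); // true
```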
When you write regex in JS you can initialize regex strings using two /. This is called regular expression literal initialization. More about it here.
For instance
let re = /(\w+)\s(\w+)/;
Now, as for why it adds a \ before the /: that is simply how RegExp represents the pattern it was given. The escaped form guarantees that the pattern can be written back as a regular expression literal without the / being mistaken for the closing delimiter.
Furthermore, if you examine the object returned by RegExp, you can see that its source property is the text scripts\/npm (a console that prints it as a quoted string shows scripts\\/npm, where the first \ escapes the second). From the regular expression's perspective, the \ simply escapes the following / so that the pattern is valid in regular expression literal notation.

JSHint "Bad or unnecessary escaping." Do double slashes beginning/end matter?

I'm storing some RegExps in an Object as strings, but getting the above error message.
I believe this is because they aren't prefixed with / or suffixed with / - as I'm running them into a new RegExp() constructor, as the script allows users to define RegExps, so I want them all to be dynamic.
var patterns = {
email: '^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$',
url: '[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)',
number: '^[-+]?[0-9]*\.?[0-9]+$',
empty: '^\\s*$'
};
There's the above strings.
To fix them I can do this and / / them:
var patterns = {
email: '/^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/',
url: '/[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)/',
number: '/^[-+]?[0-9]*\.?[0-9]+$/',
empty: '/^\\s*$/'
};
But when called via new RegExp() surely they'll do this (for example):
var reg = new RegExp(patterns.empty);
/**
* reg = //^\\s*$//
*/
With double slashes. My question, as a bit of a RegExp beginner, is: do these double slashes matter? Can it be fixed another way? JSHint is complaining because it's not a "real" RegExp.
I can also remove them from strings and store as true RegExps, but again I need them to be dynamic. Any help appreciated.
The problem is that the backslash character (\) is used both for escaping special characters in string literals (e.g. \n is interpreted as a single newline character, and \\ as a single backslash character), and it's used in escaping special characters in regular expressions.
So when a string literal is used for a regexp, and you need the regexp to see \, you need to escape the backslash and include \\ in the string literal. Specifically, in email you need \\. rather than \.. E.g.
email: '^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$'
Alternatively, you could put the regular expressions in /.../ rather than '...' (or '/.../'). Then string literal escaping doesn't apply, and you don't need to double the backslashes. E.g.
email: /^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/
In the latter case, you also don't need new RegExp(patterns.email), since patterns.email is already a RegExp object.
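A short sketch of what goes wrong in practice, using the number and empty patterns from the question (the variable names are illustrative): a single backslash silently disappears from the string, and wrapping the string in slashes adds literal slashes to the pattern.

```javascript
// '\.' in a string literal is just '.', so the dot matches any character.
const looseNumber = new RegExp('^[-+]?[0-9]*\.?[0-9]+$');
// '\\.' in a string literal is \. in the pattern: a literal dot.
const strictNumber = new RegExp('^[-+]?[0-9]*\\.?[0-9]+$');

console.log(looseNumber.test('1x5'));  // true  (unintended)
console.log(strictNumber.test('1x5')); // false
console.log(strictNumber.test('1.5')); // true

// Slashes inside the string are part of the pattern, not delimiters,
// so this regex requires a "/" before the start anchor and never matches.
const wrappedEmpty = new RegExp('/^\\s*$/');
const plainEmpty = new RegExp('^\\s*$');
console.log(wrappedEmpty.test('   ')); // false
console.log(plainEmpty.test('   '));   // true
```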

Regex is not working as expected?

I have this regex :
new RegExp("^[a-z 0-9\"\-\`]+$", "ig")
and I'm testing a string which is not supposed to pass: '#vc'
But it does pass the test (and it shouldn't, because of the #):
new RegExp("^[a-z 0-9\"\-\`]+$", "ig").test('#vc') //true
But if I remove either \" or \- or \`, it does work (and the test fails as it should).
What am I doing wrong?
My regex simply searches for English letters, numbers, space, ["], [-] and [`].
If you use RegExp constructor, you need to double up the escaping, since there are 2 layers of escaping: escaping in JavaScript string literal and escaping in the regex syntax.
However, in your case, it is possible to write a regex without escaping at all.
In your case, you can just use a RegExp literal, in which you only have to care about escaping in regex syntax, and escaping any / that appears in the regex (since / is used as the delimiter for a RegExp literal):
/^[a-z 0-9"\-`]+$/gi
Another way is:
/^[a-z 0-9"`-]+$/gi
You don't need to escape dash - if it is the last in a character class. This way, you don't need to confuse yourself with all the escaping.
Or if you still want to use RegExp constructor, you need to double up the escape to specify \ in the string:
new RegExp('^[a-z 0-9"\\-`]+$', "ig")
Or just use the other version where - is specified last in the character class:
new RegExp('^[a-z 0-9"`-]+$', "ig")
Note that I changed the string quote from " to ' to avoid having to escape " in the string. If you for some reason prefer ", escape " at the string literal level:
new RegExp("^[a-z 0-9\"`-]+$", "ig")
As for your current regex
new RegExp("^[a-z 0-9\"\-\`]+$", "ig")
is equivalent to
/^[a-z 0-9"-`]+$/gi
As you can see a character range from " to ` is included, which means all characters with ASCII code from 0x22 to 0x60 are included, and # happens to be in the range.
To check whether the pattern is what you want, you can always read the source property of the regex to obtain its source string.
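For instance, inspecting source on the original regex reveals the accidental range. This is a small sketch with illustrative variable names.

```javascript
// The question's regex: \" \- and \` are consumed by the string parser.
const accidental = new RegExp("^[a-z 0-9\"\-\`]+$", "i");
console.log(accidental.source);      // ^[a-z 0-9"-`]+$  (note the range "-`)
console.log(accidental.test('#vc')); // true, because # (0x23) lies between " (0x22) and ` (0x60)

// With the backslash doubled, the dash stays escaped in the pattern.
const intended = new RegExp('^[a-z 0-9"\\-`]+$', 'i');
console.log(intended.source);        // ^[a-z 0-9"\-`]+$
console.log(intended.test('#vc'));   // false
console.log(intended.test('a-b "x"`')); // true
```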

Building regexp from JS variables not working

I am trying to build a regexp from static text plus a variable in javascript. Obviously I am missing something very basic, see comments in code below. Help is very much appreciated:
var test_string = "goodweather";
// One regexp we just set:
var regexp1 = /goodweather/;
// The other regexp we built from a variable + static text:
var regexp_part = "good";
var regexp2 = "\/" + regexp_part + "weather\/";
// These alerts now show the 2 regexp are completely identical:
alert (regexp1);
alert (regexp2);
// But one works, the other doesn't ??
if (test_string.match(regexp1))
    alert ("This is displayed.");
if (test_string.match(regexp2))
    alert ("This is not displayed.");
First, the answer to the question:
The other answers are nearly correct, but fail to consider what happens when the text to be matched contains a literal backslash, (i.e. when: regexp_part contains a literal backslash). For example, what happens when regexp_part equals: "C:\Windows"? In this case the suggested methods do not work as expected (The resulting regex becomes: /C:\Windows/ where the \W is erroneously interpreted as a non-word character class). The correct solution is to first escape any backslashes in regexp_part (the needed regex is actually: /C:\\Windows/).
To illustrate the correct way of handling this, here is a function which takes a passed phrase and creates a regex with the phrase wrapped in \b word boundaries:
// Given a phrase, create a RegExp object with word boundaries.
function makeRegExp(phrase) {
    // First escape any backslashes in the phrase string.
    // i.e. replace each backslash with two backslashes.
    phrase = phrase.replace(/\\/g, "\\\\");
    // Wrap the escaped phrase with \b word boundaries.
    var re_str = "\\b" + phrase + "\\b";
    // Create a new regex object with "g" and "i" flags set.
    var re = new RegExp(re_str, "gi");
    return re;
}
// Here is a condensed version of same function.
function makeRegExpShort(phrase) {
    return new RegExp("\\b" + phrase.replace(/\\/g, "\\\\") + "\\b", "gi");
}
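As a usage sketch, here is the C:\Windows case run through the condensed function and through plain concatenation. The function is repeated so the snippet runs on its own; the other names are illustrative. Note that this approach only escapes backslashes, so other metacharacters in the phrase (such as . or +) keep their regex meaning.

```javascript
// Same logic as the condensed function, repeated so this snippet is self-contained.
function makeRegExpShort(phrase) {
  return new RegExp("\\b" + phrase.replace(/\\/g, "\\\\") + "\\b", "gi");
}

const path = "C:\\Windows"; // the 10-character text C:\Windows

// Naive concatenation: the \W in the pattern is read as "non-word character".
const naive = new RegExp("\\b" + path + "\\b", "i");
const escaped = makeRegExpShort(path);

console.log(naive.source);   // \bC:\Windows\b
console.log(escaped.source); // \bC:\\Windows\b

console.log(naive.test("dir C:\\Windows"));   // false
console.log(escaped.test("dir C:\\Windows")); // true
```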
To understand this in more depth, a discussion follows...
In-depth discussion, or "What's up with all these backslashes!?"
JavaScript has two ways to create a RegExp object:
/pattern/flags - You can specify a RegExp literal expression directly, where the pattern is delimited by a pair of forward slashes followed by any combination of pattern modifier flags. The three used here are 'g' global, 'i' ignore-case, and 'm' multi-line (newer engines also define others, such as 's', 'u' and 'y'). This type of regex cannot be created dynamically.
new RegExp("pattern", "flags") - You can create a RegExp object by calling the RegExp() constructor function and pass the pattern as a string (without forward slash delimiters) as the first parameter and the optional pattern modifier flags (also as a string) as the second (optional) parameter. This type of regex can be created dynamically.
The following example demonstrates creating a simple RegExp object using both of these methods. Let's say we wish to match the word "apple". The regex pattern we need is simply: apple. Additionally, we wish to set all three modifier flags.
Example 1: Simple pattern having no special characters: apple
// A RegExp literal to match "apple" with all three flags set:
var re1 = /apple/gim;
// Create the same object using RegExp() constructor:
var re2 = new RegExp("apple", "gim");
Simple enough. However, there are significant differences between these two methods with regard to the handling of escaped characters. The regex literal syntax is quite handy because you only need to escape forward slashes - all other characters are passed directly to the regex engine unaltered. However, when using the RegExp constructor method, you pass the pattern as a string, and there are two levels of escaping to be considered; first is the interpretation of the string and the second is the interpretation of the regex engine. Several examples will illustrate these differences.
First, let's consider a pattern which contains a single literal forward slash. Let's say we wish to match the text sequence: "and/or" in a case-insensitive manner. The needed pattern is: and/or.
Example 2: Pattern having one forward slash: and/or
// A RegExp literal to match "and/or":
var re3 = /and\/or/i;
// Create the same object using RegExp() :
var re4 = new RegExp("and/or", "i");
Note that with the regex literal syntax, the forward slash must be escaped (preceded with a single backslash) because with a regex literal, the forward slash has special meaning (it is a special metacharacter which is used to delimit the pattern). On the other hand, with the RegExp constructor syntax (which uses a string to store the pattern), the forward slash does NOT have any special meaning and does NOT need to be escaped.
Next, let's consider a pattern which includes a special regex metasequence: the \b word boundary. Say we wish to create a regex to match the word "apple" as a whole word only (so that it won't match "pineapple"). The pattern (as seen by the regex engine) needs to be: \bapple\b:
Example 3: Pattern having \b word boundaries: \bapple\b
// A RegExp literal to match the whole word "apple":
var re5 = /\bapple\b/;
// Create the same object using RegExp() constructor:
var re6 = new RegExp("\\bapple\\b");
In this case the backslash must be escaped when using the RegExp constructor method, because the pattern is stored in a string, and to get a literal backslash into a string, it must be escaped with another backslash. However, with a regex literal, there is no need to escape the backslash. (Remember that with a regex literal, the only special metacharacter is the forward slash.)
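One way to confirm that each literal/constructor pair from Examples 2 and 3 really describes the same regex is to compare their source and flags properties. This is a minimal sketch; the pairs array is just for illustration.

```javascript
// Each entry pairs a regex literal with its constructor equivalent.
const pairs = [
  [/and\/or/i, new RegExp("and/or", "i")],     // Example 2
  [/\bapple\b/, new RegExp("\\bapple\\b")],    // Example 3
];

for (const [literal, built] of pairs) {
  console.log(literal.source === built.source && literal.flags === built.flags); // true
}

console.log(pairs[1][1].test("pineapple"));      // false: "apple" is not a whole word here
console.log(pairs[1][1].test("an apple a day")); // true
```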
Backslash SOUP!
Things get even more interesting when we need to match a literal backslash. Let's say we want to match the text sequence: "C:\Program Files\JGsoft\RegexBuddy3\RegexBuddy.exe". The pattern to be processed by the regex engine needs to be: C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe. (Note that the regex pattern to match a single backslash is \\ i.e. each must be escaped.) Here is how you create the needed RegExp object using the two JavaScript syntaxes:
Example 4: Pattern to match literal backslashes:
// A RegExp literal to match the ultimate Windows regex debugger app:
var re7 = /C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/;
// Create the same object using RegExp() constructor:
var re8 = new RegExp(
"C:\\\\Program Files\\\\JGsoft\\\\RegexBuddy3\\\\RegexBuddy\\.exe");
This is why the /regex literal/ syntax is generally preferred over the new RegExp("pattern", "flags") method - it completely avoids the backslash soup that can frequently arise. However, when you need to dynamically create a regex, as the OP needs to here, you are forced to use the new RegExp() syntax and deal with the backslash soup. (It's really not that bad once you get your head wrapped 'round it.)
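One option not mentioned in the answer, offered here only as a sketch: in ES2015 and later, the String.raw template tag leaves backslashes untouched, so the string passed to the constructor can be written with the same number of backslashes as the literal. The variable names are illustrative.

```javascript
// Regex literal from Example 4.
const fromLiteral = /C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/;

// String.raw keeps every backslash as typed, so no extra doubling is needed.
const fromRaw = new RegExp(
  String.raw`C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe`
);

console.log(fromLiteral.source === fromRaw.source); // true

const target = "C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy.exe";
console.log(fromRaw.test(target)); // true
```

This only helps when the pattern text is fixed in the source code; pattern fragments that arrive at runtime (for example, from user input) still need their backslashes escaped as described above.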
RegexBuddy to the rescue!
RegexBuddy is a Windows app that can help with this backslash soup problem - it understands the regex syntaxes and escaping requirements of many languages and will automatically add and remove backslashes as required when pasting to and from the application. Inside the application you compose and debug the regex in native regex format. Once the regex works correctly, you export it using one of the many "copy as..." options to get the needed syntax. Very handy!
You should use the RegExp constructor to accomplish this:
var regexp2 = new RegExp(regexp_part + "weather");
Here's a related question that might help.
The forward slashes are just JavaScript syntax to enclose regular expressions in. If you use a normal string as a regex, you shouldn't include them, as they will be matched literally. Therefore you should just build the regex like this:
var regexp2 = regexp_part + "weather";
I would use:
var regexp2 = new RegExp(regexp_part+"weather");
What you have done amounts to:
var regexp2 = "/goodweather/";
And after that there is:
test_string.match("/goodweather/")
Which calls match with a string and not with the regex you wanted:
test_string.match(/goodweather/)
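The behaviour can be reproduced without alerts. When match() receives a string, it converts it to a regex as if by new RegExp(string), so slashes in the string become part of the pattern. This is a minimal sketch with illustrative names.

```javascript
const testString = "goodweather";
const part = "good";

// The slashes here are ordinary characters that the pattern must find in the text.
const withSlashes = "/" + part + "weather/";
console.log(testString.match(withSlashes));         // null
console.log("/goodweather/".match(withSlashes)[0]); // /goodweather/

// Without the slashes, both the explicit constructor and a plain string work.
const built = new RegExp(part + "weather");
console.log(testString.match(built)[0]);            // goodweather
console.log(testString.match(part + "weather")[0]); // goodweather
```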
While this solution may be overkill for this specific question, if you want to build RegExps programmatically, compose-regexp can come in handy.
This specific problem would be solved by using
import {sequence} from 'compose-regexp'
const weatherify = x => sequence(x, /weather/)
Strings are escaped, so
weatherify('.')
returns
/\.weather/
But it can also accept RegExps
weatherify(/./u)
returns
/.weather/u
compose-regexp supports the whole range of RegExp features and lets you build RegExps from sub-parts, which helps with code reuse and testability.
