JavaScript: RegExp constructor vs RegEx literal


I am studying RegExp, and everywhere I see two syntaxes:
new RegExp("[abc]")
And
/[abc]/
And with modifiers, what is the use of the additional backslash (\)?
/\[abc]/g
I don't get any bugs with either, but I wonder whether there is any difference between them. If so, what is it, and which is better to use?
I read Differences between Javascript regexp literal and constructor, but it didn't explain which is better or what the difference is.

The key difference is that regex literals can't accept dynamic input, i.e. from variables, whereas the constructor can, because the pattern is specified as a string.
Say you wanted to match one or more words from an array in a string:
var words = ['foo', 'bar', 'orange', 'platypus'];
var str = "I say, foo, what a lovely platypus!";
str.match(new RegExp('\\b('+words.join('|')+')\\b', 'g')); //["foo", "platypus"]
This would not be possible with a literal /pattern/, as anything between the two forward slashes is interpreted literally; we'd have to specify the allowed words in the pattern itself, rather than reading them in from a dynamic source (the array).
Note also the need to double-escape (i.e. \\) special characters when specifying patterns this way, because the pattern is written as a string: the first backslash escapes the second, so that a single backslash makes it into the pattern. With only one backslash, JavaScript's string parser would treat it as an escape character and remove it.
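A sketch of the word-list example that also escapes each word before joining it into the pattern. The escapeRegExp helper is hypothetical (it is not part of the answer); it backslash-escapes regex metacharacters so that a word like "a.b" matches only that literal text. Without such a step, any metacharacter in the dynamic input would be interpreted as pattern syntax.

```javascript
// Hypothetical helper (not from the original answer): escapes characters that
// have a special meaning in a regex pattern, so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a global regex matching any of the given words as whole words.
function wordsPattern(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  // "\\b" in the string becomes \b (word boundary) in the pattern.
  return new RegExp("\\b(" + alternatives + ")\\b", "g");
}

const words = ["foo", "bar", "orange", "platypus"];
const str = "I say, foo, what a lovely platypus!";
console.log(str.match(wordsPattern(words))); // [ 'foo', 'platypus' ]
```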

As you can see, the RegExp constructor requires a string to be passed, and \ in a string literal escapes the following character. Thus
new RegExp("\s") // "\s" is just "s" in a string, so this gives the regex /s/
produces a regex that matches the letter s, not whitespace.
Note: to add modifiers/flags, pass them as the second argument to the constructor, e.g. new RegExp("\\s", "g").
The literal syntax /\s/, by contrast, produces exactly the regex you wrote.
The RegExp constructor syntax lets you create regular expressions dynamically.
So when the regex needs to be built dynamically, use the RegExp constructor; otherwise use the regex literal syntax.
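A minimal check of the "\s" pitfall and of passing flags, assuming a modern engine such as Node.js:

```javascript
// In a string literal, "\s" is not a recognised escape, so the backslash is dropped.
console.log("\s" === "s");             // true
console.log(new RegExp("\s").source);  // s   (matches the letter "s")
console.log(new RegExp("\\s").source); // \s  (matches whitespace)

// Flags go in the second argument; a literal puts them after the closing slash.
const viaConstructor = new RegExp("\\s+", "g");
const viaLiteral = /\s+/g;
console.log(viaConstructor.flags, viaLiteral.flags); // g g
console.log("a b  c".replace(viaConstructor, "-"));  // a-b-c
```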

They are largely the same, but "Regular expression literals should be used when possible" because they are easier to read and do not require the extra escaping a string literal does.
Escaping example:
new RegExp("\\d+");
/\d+/;
Using the RegExp constructor is suitable when the pattern is computed dynamically, e.g. when it is provided by the user.
Source: SonarLint rule.
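When a string is still needed, String.raw (an ES2015 template tag, not mentioned in the answer) is one way to avoid doubling backslashes. A small sketch comparing three spellings of the example above:

```javascript
// Double-escaped string: "\\d+" is the characters \d followed by +.
const fromString = new RegExp("\\d+");
// String.raw keeps backslashes as typed, so no doubling is needed.
const fromRaw = new RegExp(String.raw`\d+`);
// Literal: written exactly as the regex engine should see it.
const fromLiteral = /\d+/;

console.log(fromString.source, fromRaw.source, fromLiteral.source); // \d+ \d+ \d+
console.log("order 66".match(fromRaw)[0]); // 66
```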

There are two ways of defining regular expressions:
Through the RegExp constructor: the pattern is compiled at runtime and can be changed at runtime.
Through a literal: compiled when the script is loaded, which can improve performance if the expression stays constant.
The literal is best for known regular expressions, while the constructor is better for dynamically constructed ones, such as those built from user input.
Either way you get a RegExp object, and it is handled exactly the same way once created.
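To illustrate that both forms yield equivalent RegExp objects, and to address the extra backslash asked about in the question: in /\[abc]/g the backslash makes [ a literal bracket, so the pattern matches the text "[abc]" instead of one of a, b or c. The g flag is separate and only enables global matching. A sketch:

```javascript
// The same character class, written both ways.
const literalClass = /[abc]/;
const constructedClass = new RegExp("[abc]");
console.log(literalClass.source === constructedClass.source);        // true
console.log(literalClass.test("cab"), constructedClass.test("cab")); // true true

// A backslash before "[" makes the bracket literal, so this matches the
// text "[abc]" rather than one of a, b, c.
const literalBrackets = /\[abc]/g;
const constructedBrackets = new RegExp("\\[abc]", "g");
console.log("x [abc] y".match(literalBrackets));     // [ '[abc]' ]
console.log("x [abc] y".match(constructedBrackets)); // [ '[abc]' ]
console.log(/\[abc]/.test("b"));                     // false
```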

Related

Why are regular expression strings not encapsulated in quotes in Javascript?

Aside from JavaScript, every regex implementation I've seen uses something like (for finding a number in brackets) "\\[[0-9]+\\]" or r"\[[0-9]+\]". That string is then passed to a function such as Contains("\\[[0-9]+\\]", "[1009] is a number."). In JavaScript, regex patterns are not enclosed in quotes at all, so I see things like var patt = /w3schools/i. Why is this? How does JavaScript tell the difference between this and other content? Why not just use normal strings?

Why is this?
That's just how regex literals work. Regular expressions are objects in JS, not plain strings.
How does Javascript tell the difference between this and other content?
That's just how the language grammar is defined. In fact it makes it much easier to tell the difference between a string and a regex than in other languages.
Why not just use normal strings?
Because escaping works differently. Other languages use "raw" strings for this, which JavaScript doesn't (didn't) have. Instead, they introduced a literal notation for regular expressions, using / as a delimiter (borrowed from Perl).
Of course, you still can use normal strings, and create a regex object using the RegExp constructor, but for static expressions the literal syntax is much simpler.
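The bracketed-number pattern from the question, written both ways, as a small sketch:

```javascript
const text = "[1009] is a number.";

// String version: each backslash is doubled so that \[ and \] reach the regex engine.
const fromString = new RegExp("\\[[0-9]+\\]");
// Literal version: written exactly as the regex engine should see it.
const fromLiteral = /\[[0-9]+\]/;

console.log(fromString.source === fromLiteral.source); // true
console.log(text.match(fromLiteral)[0]);               // [1009]
console.log(fromString.test(text));                    // true
```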

Well, they are not strings to begin with. They are regex literals.
How does Javascript tell the difference between this and other content?
Just like the " are used to delimit string literals, or [...] are used to delimit array literals, / are used to delimit regular expression literals.
Why not just use normal strings?
Regular expressions have different special characters and different escaping rules. That's why you have to use double escapes if you use a string with RegExp (e.g. "\\[[0-9]+\\]"). Many people get that wrong and it's a bit confusing.
So it makes sense to have a representation of regular expression that is not "inside" of another abstraction (strings).

Regular expressions in JavaScript are objects not strings.
var regex = /[0-9]/;
console.log(typeof regex); // "object"
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.
(Source: MDN, Regular Expressions guide.)
The opening and closing / are not part of the expression; they just mark a regex literal, just as {} marks an object literal.
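A quick check, runnable in Node.js or a browser console, that a regex literal is a RegExp object and that its source omits the delimiting slashes:

```javascript
const regex = /[0-9]/;
console.log(typeof regex);            // object
console.log(regex instanceof RegExp); // true
// The delimiting slashes are not part of the pattern itself.
console.log(regex.source);            // [0-9]
console.log(regex.test("room 7"));    // true
```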

Issue with custom javascript regex

I have a custom regular expression which I use to detect whole numbers, fractions and floats.
var regEx = new RegExp("^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(/[0-9])?)?)$");
var quantity = 'd';
var matched = quantity.match(regEx);
alert(matched);

(The code is also found here: http://jsfiddle.net/aNb3L/ .)
The problem is that it matches a single letter, and I can't figure out why. With more letters it fails (which is good).
Disclaimer: I am new to regular expressions. In http://gskinner.com/RegExr/ it doesn't match a single letter.

It's easier to use straight regular expression syntax:
var regEx = /^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(\/[0-9])?)?)$/;
When you use the RegExp constructor, you have to double up on the backslashes. Your code only has single backslashes, so the string parser turns \. into . (any character) and \s into a plain s. That's how single non-digit characters slip through.
Thus yours would also work this way:
var regEx = new RegExp("^((^[1-9]|(0\\.)|(\\.))([0-9]+)?((\\s|\\.)[0-9]+(/[0-9])?)?)$");
This happens because the string syntax also uses backslash as a quoting mechanism. When your regular expression is first parsed as a string constant, those backslashes are stripped out if you don't double them. When the string is then passed to the regular expression parser, they're gone.
The only time you really need to use the RegExp constructor is when you're building up the regular expression dynamically or when it's delivered to your code via JSON or something.
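A sketch comparing the question's pattern with single backslashes, the doubled-backslash fix, and the literal form, on a few sample inputs chosen for illustration:

```javascript
// Single backslashes: the string parser drops them, so \. becomes . and \s becomes s.
const buggy = new RegExp("^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(/[0-9])?)?)$");
// Doubled backslashes, as in the answer.
const fixed = new RegExp("^((^[1-9]|(0\\.)|(\\.))([0-9]+)?((\\s|\\.)[0-9]+(/[0-9])?)?)$");
// Literal form; the / inside the pattern has to be escaped here.
const literal = /^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(\/[0-9])?)?)$/;

for (const input of ["d", "12", "0.5", "3 1/2"]) {
  // Prints the input followed by the result of each of the three regexes.
  console.log(input, buggy.test(input), fixed.test(input), literal.test(input));
}
```

Only "d" separates them: the single-backslash version accepts it because (\.) has become (.), which matches any one character.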

Well, for a whole number this would be your regex:
/^(0|[1-9]\d*)$/
Then you have to account for the possibility of a float:
/^(0|[1-9]\d*)(\.\d+)?$/
Then you have to account for the possibility of a fraction:
/^(0|[1-9]\d*)((\.\d+)|(\/[1-9]\d*))?$/
To me this regex is much easier to read than your original, but it's up to you of course.
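A sketch checking the final pattern, written with the decimal point escaped (\.) and the fraction group closed. With an unescaped ., as in /^(0|[1-9]\d*)(.\d+)?$/, an input such as "1x5" would also pass, since . matches any character. The sample inputs are illustrative only; note that, unlike the question's pattern, this one does not accept mixed numbers such as "3 1/2".

```javascript
// Whole number, optionally followed by a decimal part or a fraction denominator.
const number = /^(0|[1-9]\d*)((\.\d+)|(\/[1-9]\d*))?$/;

const accepted = ["0", "42", "3.14", "3/4"];
const rejected = ["d", "01", "1.", "3/0", "1x5", "3 1/2"];

console.log(accepted.every((s) => number.test(s))); // true
console.log(rejected.some((s) => number.test(s)));  // false
```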

Quoting regex literals in javascript? Why not?

In this answer to a question, and lots of other places, I see unquoted strings in javascript.
For example:
var re = /\[media id="?(\d+)"?\]/gi;
Why shouldn't it instead be:
var re = '/\[media id="?(\d+)"?\]/gi';
Is it some kind of special handling of regular expressions, or can any string be declared like that?

var re = /\[media id="?(\d+)"?\]/gi;
is a regex literal, not a string.

It's only for regular expressions, not for strings.

Because, in JavaScript, a regex is a built-in type, not a string pattern passed to some parser as in C# or Java.
That means that when you write var regex = /pattern/, JavaScript automatically uses that literal as a regular expression pattern, making regex an object of the RegExp type.
See: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions

Is it some kind of special handling of regular expressions?
Yes, regular expressions get special handling. As MDN points out, there is a built-in JavaScript regular expression type, with its own syntax for literals.
or can any string be declared like that?
No. Since regular expressions are objects and are not strings, if you tried to write a string with a regular expression literal you would get a regular expression object, not a string.
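A small sketch showing what the quoted version actually is; the input 'See [media id="42"] here' is made up for illustration:

```javascript
const literal = /\[media id="?(\d+)"?\]/gi;
// Quoting it produces an ordinary string; the string parser also drops the
// backslashes in \[, \d and \], so the text no longer even spells the pattern.
const quoted = '/\[media id="?(\d+)"?\]/gi';

console.log(typeof literal, literal instanceof RegExp); // object true
console.log(typeof quoted);                             // string
console.log(quoted);                                    // /[media id="?(d+)"?]/gi

// Capture group 1 holds the id.
const match = literal.exec('See [media id="42"] here');
console.log(match[1]); // 42
```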

Differences between Javascript regexp literal and constructor

We recently had a bug, after a fellow developer changed a RegExp literal into a constructor call, and I was wondering why there is any difference at all.
The exact code was
var parts = new RegExp("/rt:([^#]+)#(\d+)/").exec(tag);
vs the original of
var parts = /rt:([^#]+)#(\d+)/.exec(tag);
When tag is, for example, rt:60C1C036-42FA-4073-B10B-1969BD2358FB#00000000077, the first (buggy) call returns null, while the second one returns ["rt:60C1C036-42FA-4073-B10B-1969BD2358FB#00000000077", "60C1C036-42FA-4073-B10B-1969BD2358FB", "00000000077"]
Needless to say, I reverted the change, but I'd like to know why there is such a difference in the first place.

There are two problems:
The / are not part of the expression. They are delimiters, marking a regex literal. They have to be removed if you use RegExp, otherwise they match a slash literally.
Secondly, the backslash is the escape character in string literals. To create a literal \ for the expression, you have to escape it in the string.
Thus, the equivalent would be:
new RegExp("rt:([^#]+)#(\\d+)")
The escaping in particular makes an expression a bit more difficult to write with RegExp. The constructor is only really needed if you want to create an expression dynamically, for example to include text stored in a variable. For a fixed expression, a literal /.../ is easier to write and more concise.

\d needs to be escaped (written \\d) when passed to the RegExp constructor. So it needs to be:
var parts = new RegExp("rt:([^#]+)#(\\d+)").exec(tag);
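A sketch reproducing the bug with the tag value from the question:

```javascript
const tag = "rt:60C1C036-42FA-4073-B10B-1969BD2358FB#00000000077";

// Buggy: the slashes become part of the pattern and "\d" collapses to "d".
const buggy = new RegExp("/rt:([^#]+)#(\d+)/").exec(tag);
// Fixed: no delimiters, backslash doubled.
const fixed = new RegExp("rt:([^#]+)#(\\d+)").exec(tag);
// Original literal.
const literal = /rt:([^#]+)#(\d+)/.exec(tag);

console.log(buggy);                   // null
console.log(fixed[1], fixed[2]);      // 60C1C036-42FA-4073-B10B-1969BD2358FB 00000000077
console.log(literal[2] === fixed[2]); // true
```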
