Differences between Javascript regexp literal and constructor - javascript

We recently had a bug, after a fellow developer changed a RegExp literal into a constructor call, and I was wondering why there is any difference at all.
The exact code was
var parts = new RegExp("/rt:([^#]+)#(\d+)/").exec(tag);
vs the original of
var parts = /rt:([^#]+)#(\d+)/.exec(tag);
When tag is, for example rt:60C1C036-42FA-4073-B10B-1969BD2358FB#00000000077, the first (buggy) call returns null, while the second one returns["rt:60C1C036-42FA-4073-B10B-1969BD2358FB#00000000077", "60C1C036-42FA-4073-B10B-1969BD2358FB", "00000000077"]
Needless to say, I reverted the change, but I'd like to know why is there such a difference in the first place.

There are two problems:
The / are not part of the expression. They are delimiters, marking a regex literal. They have to be removed if you use RegExp, otherwise they match a slash literally.
Secondly, the backslash is the escape character in string literals. To create a literal \ for the expression, you have to escape it in the string.
Thus, the equivalent would be:
new RegExp("rt:([^#]+)#(\\d+)")
Especially the escaping makes expression a bit more difficult to write if you want to use RegExp. It is actually only needed if you want to create expression dynamically, that is, if you want to include text stored in a variable for example. If you have a fixed expression, a literal /.../ is easier to write and more concise.

\d needs to be escaped when passesd to new RegExp constructor. So, it needs to be
var parts = new RegExp("rt:([^#]+)#(\\d+)").exec(tag);

Related

Passing a variable to javascript regex [duplicate]

I am studying about RegExp but everywhere I can see two syntax
new RegExp("[abc]")
And
/[abc]/
And if with modifiers then what is the use of additional backslash (\)
/\[abc]/g
I am not getting any bug with these two but I wonder is there any difference between these two. If yes then what is it and which is best to use?
I referred Differences between Javascript regexp literal and constructor but there I didn't find an explanation of which is best and what the difference is.
The key difference is that literal REGEX can't accept dynamic input, i.e. from variables, whereas the constructor can, because the pattern is specified as a string.
Say you wanted to match one or more words from an array in a string:
var words = ['foo', 'bar', 'orange', 'platypus'];
var str = "I say, foo, what a lovely platypus!";
str.match(new RegExp('\\b('+words.join('|')+')\\b', 'g')); //["foo", "platypus"]
This would not be possible with a literal /pattern/, as anything between the two forward slashes is interpreted literally; we'd have to specify the allowed words in the pattern itself, rather than reading them in from a dynamic source (the array).
Note also the need to double-escape (i.e. \\) special characters when specifying patterns in this way, because we're doing so in a string - the first backslash must be escaped by the second so one of them makes it into the pattern. If there were only one, it would be interpreted by JS's string parser as an escaping character, and removed.
As you can see, the RegExp constructor syntax requires string to be passed. \ in the string is used to escape the following character. Thus,
new RegExp("\s") // This gives the regex `/s/` since s is escaped.
will produce the regex s.
Note: to add modifiers/flags, pass the flags as second parameter to the constructor function.
While, /\s/ - the literal syntax, will produce the regex which is predictable.
The RegExp constructor syntax allows to create regular expression from the dynamically.
So, when the regex need to be crafted dynamically, use RegExp constructor syntax otherwise use regex literal syntax.
They are kind of the same but "Regular expression literals should be used when possible" because it is easier to read and does not require escaping like a string literal does.
Escaping example:
new RegExp("\\d+");
/\d+/;
Using the RegExp constructor is suitable when the pattern is computed dynamically, e.g. when it is provided by the user.
Source SonarLint Rule.
There are 2 ways of defining regular expressions.
Through an object constructor
Can be changed at runtime.
Through a literal.
Compiled at load of the script
Better performance
The literal is the best to use with known regular expressions, while the constructor is better for dynamically constructed regular expressions such as those from user input.
You could use any of the two and they will be handled in exactly the same way..

JavaScript: RegExp constructor vs RegEx literal

I am studying about RegExp but everywhere I can see two syntax
new RegExp("[abc]")
And
/[abc]/
And if with modifiers then what is the use of additional backslash (\)
/\[abc]/g
I am not getting any bug with these two but I wonder is there any difference between these two. If yes then what is it and which is best to use?
I referred Differences between Javascript regexp literal and constructor but there I didn't find an explanation of which is best and what the difference is.
The key difference is that literal REGEX can't accept dynamic input, i.e. from variables, whereas the constructor can, because the pattern is specified as a string.
Say you wanted to match one or more words from an array in a string:
var words = ['foo', 'bar', 'orange', 'platypus'];
var str = "I say, foo, what a lovely platypus!";
str.match(new RegExp('\\b('+words.join('|')+')\\b', 'g')); //["foo", "platypus"]
This would not be possible with a literal /pattern/, as anything between the two forward slashes is interpreted literally; we'd have to specify the allowed words in the pattern itself, rather than reading them in from a dynamic source (the array).
Note also the need to double-escape (i.e. \\) special characters when specifying patterns in this way, because we're doing so in a string - the first backslash must be escaped by the second so one of them makes it into the pattern. If there were only one, it would be interpreted by JS's string parser as an escaping character, and removed.
As you can see, the RegExp constructor syntax requires string to be passed. \ in the string is used to escape the following character. Thus,
new RegExp("\s") // This gives the regex `/s/` since s is escaped.
will produce the regex s.
Note: to add modifiers/flags, pass the flags as second parameter to the constructor function.
While, /\s/ - the literal syntax, will produce the regex which is predictable.
The RegExp constructor syntax allows to create regular expression from the dynamically.
So, when the regex need to be crafted dynamically, use RegExp constructor syntax otherwise use regex literal syntax.
They are kind of the same but "Regular expression literals should be used when possible" because it is easier to read and does not require escaping like a string literal does.
Escaping example:
new RegExp("\\d+");
/\d+/;
Using the RegExp constructor is suitable when the pattern is computed dynamically, e.g. when it is provided by the user.
Source SonarLint Rule.
There are 2 ways of defining regular expressions.
Through an object constructor
Can be changed at runtime.
Through a literal.
Compiled at load of the script
Better performance
The literal is the best to use with known regular expressions, while the constructor is better for dynamically constructed regular expressions such as those from user input.
You could use any of the two and they will be handled in exactly the same way..

Regex - Invalid Quantifier

I am using this regex in JavaScript which is to evaluate if a given string matches some german phone number patterns.
var reg = new RegExp("(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[/–-]\s*)?)?\d+(?:\s*(?:[\s/–-]\s*)?\d+)*");
When using it I get this error:
SyntaxError: invalid quantifier
...eg = new RegExp("(?:\+\d+)\s*(?:\(\d+\)\s*(?:[/–-]\s*))\d+(?:\s*(?:[\s/–-]\s*)\d...
I'm trying hard to learn reading regular expressions, but can not understand them full yet. I did not write this expression by myself and I am struggling to understand it.
Why I am getting this error?
Because you're using a string litteral, you need to escape each backslash:
(?:\\+\\d+)?\\s*(?:\\(\\d+\\)\\s*(?:[/–-]\\s*)?)?\\d+(?:\\s*(?:[\\s/–-]\\s*)?\\d+)*
The other solution would be to use the regex litteral:
var reg = /(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[\/–-]\s*)?)?\d+(?:\s*(?:[\s\/–-]\s*)?\d+)*/;
new RegExp requires a string, and since backslashes already have meaning inside strings, they need to be escaped again.
In your case, though, you're using a static pattern, so you'd be better off with a literal:
var reg = /(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[\/–-]\s*)?)?\d+(?:\s*(?:[\s\/–-]\s*)?\d+)*/;
Just be aware that you need to escape / here ;)
As an additional tip, you can simplify the need to escape things to some extent by doing stuff like [+] for a literal +. I think it looks nicer then \+, but that's just my opinion.

Issue with custom javascript regex

I have a custom regular expression which I use to detect whole numbers, fractions and floats.
var regEx = new RegExp("^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(/[0-9])?)?)$");
var quantity = 'd';
var matched = quantity.match(regEx);
alert(matched);
​
(The code is also found here: http://jsfiddle.net/aNb3L/ .)
The problem is that for a single letter it matches, and I can't figure out why. But for more letters it fails(which is good).
Disclaimer: I am new to regular expressions, although in http://gskinner.com/RegExr/ it doesn't match a single letter
It's easier to use straight regular expression syntax:
var regEx = /^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(\/[0-9])?)?)$/;
When you use the RegExp constructor, you have to double-up on the backslashes. As it is, your code only has single backslashes, so the \. subexpressions are being treated as . — and that's how single non-digit characters are slipping through.
Thus yours would also work this way:
var regEx = new RegExp("^((^[1-9]|(0\\.)|(\\.))([0-9]+)?((\\s|\\.)[0-9]+(/[0-9])?)?)$");
This happens because the string syntax also uses backslash as a quoting mechanism. When your regular expression is first parsed as a string constant, those backslashes are stripped out if you don't double them. When the string is then passed to the regular expression parser, they're gone.
The only time you really need to use the RegExp constructor is when you're building up the regular expression dynamically or when it's delivered to your code via JSON or something.
Well, for a whole number this would be your regex:
/^(0|[1-9]\d*)$/
Then you have to account for the possibility of a float:
/^(0|[1-9]\d*)(.\d+)?$/
Then you have to account for the possibility of a fraction:
/^(0|[1-9]\d*)((.\d+)|(\/[1-9]\d*)?$/
To me this regex is much easier to read than your original, but it's up to you of course.

RegExp in JavaScript, when a quantifier is part of the pattern

I have been trying to use a regexp that matches any text that is between a caret, less than and a greater than, caret.
So it would look like: ^< THE TEXT I WANT SELECTED >^
I have tried something like this, but it isn't working: ^<(.*?)>^
I'm assuming this is possible, right? I think the reason I have been having such a tough time is because the caret serves as a quantifier. Thanks for any help I get!
Update
Just so everyone knows, they following from am not i am worked
/\^<(.*?)>\^/
But, it turned out that I was getting html entities since I was getting my string by using the .innerHTML property. In other words,
> ... >
< ... <
To solve this, my regexp actually looks like this:
\^<(.*?)((.|\n)*)>\^
This includes the fact that the string in between should be any character or new line. Thanks!
You need to escape the ^ symbol since it has special meaning in a JavaScript regex.
/\^<(.*?)>\^/
In a JavaScript regex, the ^ means beginning of the string, unless the m modifier was used, in which case it means beginning of the line.
This should work:
\^<(.*?)>\^
In a regex, if you want to use a character that has a special meaning (caret, brackets, pipe, ...), you have to escape it using a backslash. For example, (\w\b)*\w\. will select a sequence of words terminated by a dot.
Careful!
If you have to pass the regex pattern as a string, i.e. there's no regex literal like in javascript or perl, you may have to use a double backslash, which the programming language will escape to a single one, which will then be processed by the regex engine.
Same regex in multiple languages:
Python:
import re
myRegex=re.compile(r"\^<(.*?)>\^") # The r before the string prevents backslash escaping
PHP:
$result=preg_match("/\\^<(.*?)>\\^/",$subject); // Notice the double backslashes here?
JavaScript:
var myRegex=/\^<(.*?)>\^/,
subject="^<blah example>^";
subject.match(myRegex);
If you tell us what programming language you're writing in, we'll be able to give you some finished code to work with.
Edit: Whoops, didn't even notice this was tagged as javascript. Then, you don't have to worry about double backslash at all.
Edit 2: \b represent a word boundary. Though I agree yours is what I would have used myself.

Categories

Resources