Why are regular expression strings not encapsulated in quotes in JavaScript?

Aside from JavaScript, all instances of regular expressions that I have seen use something like (for finding a number in brackets) "\\[[0-9]+\\]" or r"\[[0-9]+\]". That string is then used in a function like Contains("\\[[0-9]+\\]", "[1009] is a number."). Regex strings in JavaScript are not encapsulated at all, so I see things like var patt = /w3schools/i. Why is this? How does JavaScript tell the difference between this and other content? Why not just use normal strings?

Why is this?
That's just how regex literals work. Regular expressions are objects in JS, not plain strings.
How does Javascript tell the difference between this and other content?
That's just how the language grammar is defined. In fact it makes it much easier to tell the difference between a string and a regex than in other languages.
Why not just use normal strings?
Because escaping works differently. Other languages use "raw" strings for this, which JavaScript doesn't (didn't) have. Instead, the language designers introduced a literal notation for regular expressions, using / as a delimiter (borrowed from Perl).
Of course, you still can use normal strings, and create a regex object using the RegExp constructor, but for static expressions the literal syntax is much simpler.
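To make the comparison concrete, here is a minimal sketch of the bracketed-number pattern from the question written three ways. The variable names are made up for illustration, and the String.raw form assumes an ES2015 or later engine.

```javascript
// The same pattern (a number in square brackets) written three ways.
const fromLiteral = /\[[0-9]+\]/;
// In a normal string every backslash must be doubled for the string parser.
const fromString = new RegExp("\\[[0-9]+\\]");
// A raw template literal keeps backslashes as written, so no doubling is needed.
const fromRawString = new RegExp(String.raw`\[[0-9]+\]`);

console.log(fromLiteral.source === fromString.source);    // true
console.log(fromLiteral.source === fromRawString.source); // true
console.log(fromLiteral.test("[1009] is a number."));     // true
```

All three produce an equivalent RegExp object; the literal simply needs the least ceremony when the pattern is fixed.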

Well, they are not strings to begin with. They are regex literals.
How does Javascript tell the difference between this and other content?
Just as " is used to delimit string literals and [...] is used to delimit array literals, / is used to delimit regular expression literals.
Why not just use normal strings?
Regular expressions have different special characters and different escaping rules. That's why you have to use double escapes if you use a string with RegExp (e.g. "\\[[0-9]+\\]"). Many people get that wrong, and it's a bit confusing.
So it makes sense to have a representation of regular expression that is not "inside" of another abstraction (strings).
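A small sketch of how the double-escape mistake plays out in practice. The variable names are illustrative only; the test input "5]" is chosen to expose the difference.

```javascript
// Correctly double-escaped: the pattern text is \[[0-9]+\]
const intended = new RegExp("\\[[0-9]+\\]");
// Single backslashes are consumed by the string parser: the pattern text is [[0-9]+]
const mistaken = new RegExp("\[[0-9]+\]");

console.log(intended.source); // \[[0-9]+\]
console.log(mistaken.source); // [[0-9]+]

// The mistaken pattern is a character class ("[" or a digit), repeated, then "]".
console.log(intended.test("5]")); // false
console.log(mistaken.test("5]")); // true
```

No error is raised in either case, which is part of why the mistake is easy to miss.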

Regular expressions in JavaScript are objects, not strings.
var regex = /[0-9]/;
console.log(typeof regex); // "object"
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String. This chapter describes JavaScript regular expressions.
Regular Expressions
The opening and closing / are not part of the expression; they just mark a regex literal, just as {} marks an object literal.
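As a quick illustration of the points above, the following sketch inspects a regex literal and runs it through the RegExp and String methods named in the quoted passage. The sample text is invented for the example.

```javascript
const regex = /[0-9]+/;

console.log(typeof regex);            // "object"
console.log(regex instanceof RegExp); // true
console.log(regex.source);            // "[0-9]+" (the delimiting slashes are not included)

const text = "room 101, floor 3";
console.log(regex.test(text));              // true
console.log(regex.exec(text)[0]);           // "101"
console.log(text.match(/[0-9]+/g));         // ["101", "3"]
console.log(text.replace(/[0-9]+/g, "#"));  // "room #, floor #"
console.log(text.search(regex));            // 5
console.log(text.split(/,\s*/));            // ["room 101", "floor 3"]
```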

Related

Passing a variable to javascript regex [duplicate]

I am studying RegExp, and everywhere I see two syntaxes:
new RegExp("[abc]")
And
/[abc]/
And when modifiers are used, what is the purpose of the additional backslash (\)?
/\[abc]/g
I am not getting any bugs with these two, but I wonder whether there is any difference between them. If so, what is it, and which is best to use?
I referred to Differences between Javascript regexp literal and constructor, but there I didn't find an explanation of which is best and what the difference is.
The key difference is that a regex literal can't accept dynamic input, i.e. from variables, whereas the constructor can, because the pattern is specified as a string.
Say you wanted to match one or more words from an array in a string:
var words = ['foo', 'bar', 'orange', 'platypus'];
var str = "I say, foo, what a lovely platypus!";
str.match(new RegExp('\\b('+words.join('|')+')\\b', 'g')); //["foo", "platypus"]
This would not be possible with a literal /pattern/, as anything between the two forward slashes is interpreted literally; we'd have to specify the allowed words in the pattern itself, rather than reading them in from a dynamic source (the array).
Note also the need to double-escape (i.e. \\) special characters when specifying patterns in this way, because we're doing so in a string - the first backslash must be escaped by the second so one of them makes it into the pattern. If there were only one, it would be interpreted by JS's string parser as an escaping character, and removed.
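One caveat worth sketching when the words come from a dynamic source: any regex metacharacters they contain are interpreted as pattern syntax. The escapeRegExp helper below is a hypothetical name, not part of the original answer, and the character list is an assumption about which characters need escaping in a pattern without the u or v flag.

```javascript
// Hypothetical helper: backslash-escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const words = ["foo", "platypus", "a.b"];
const str = "I say, foo, what a lovely platypus! Not axb, but a.b.";

// Build the alternation from the escaped words, as in the answer above.
const pattern = new RegExp("\\b(" + words.map(escapeRegExp).join("|") + ")\\b", "g");
const found = str.match(pattern);

console.log(pattern.source); // \b(foo|platypus|a\.b)\b
console.log(found);          // ["foo", "platypus", "a.b"]
```

Without the escaping step, the "." in "a.b" would match any character, so "axb" would be reported as well.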
As you can see, the RegExp constructor syntax requires a string to be passed. A \ in the string escapes the following character. Thus,
new RegExp("\s") // This gives the regex `/s/` since s is escaped.
will produce the regex s.
Note: to add modifiers/flags, pass the flags as the second parameter to the constructor function.
By contrast, /\s/ (the literal syntax) will produce the regex you would expect.
The RegExp constructor syntax allows you to create a regular expression dynamically.
So, when the regex needs to be crafted dynamically, use the RegExp constructor syntax; otherwise, use the regex literal syntax.
They are kind of the same, but "Regular expression literals should be used when possible" because they are easier to read and do not require escaping like a string literal does.
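The behaviour described in this answer can be checked with a short sketch. The variable names and the sample input are illustrative.

```javascript
// "\s" in a normal string is just "s", so the backslash never reaches the regex engine.
const lost = new RegExp("\s");
// Doubling the backslash keeps it; flags go in the second argument.
const kept = new RegExp("\\s", "g");

console.log(lost.source); // s
console.log(kept.source); // \s
console.log(kept.flags);  // g

// The constructor form and the literal form now behave the same way.
console.log("a b\tc".replace(kept, "_"));  // "a_b_c"
console.log("a b\tc".replace(/\s/g, "_")); // "a_b_c"
```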
Escaping example:
new RegExp("\\d+");
/\d+/;
Using the RegExp constructor is suitable when the pattern is computed dynamically, e.g. when it is provided by the user.
Source: SonarLint rule.
There are two ways of defining regular expressions:
Through an object constructor: the pattern can be changed at runtime.
Through a literal: the pattern is compiled when the script loads, which gives better performance.
The literal is best for known regular expressions, while the constructor is better for dynamically constructed regular expressions, such as those built from user input.
You could use either of the two, and they will be handled in exactly the same way.

What Regular Expression Can I Use To Find Simple Regular Expressions

If I have a string, which is the source of a regular expression:
"For example, I have (.*) string with (\.d+) special bits (but this is just an aside)."
Is there a way to extract the special parts of the regular expression?
In particular, I'm interested in the parts that will give back values when I call string.match(expr);
Regex can be complicated, but if you do a global regex with ([\.\\]([*a-z])\+?), it will capture your individual fields without including the parentheses, per your request. Demo code is below.
var testString = 'For example, I have (.*) string with (.d+) special bits (but this is just an aside). (\\w+)';
var regex = /([\.\\]([*a-z])\+?)/gi;
var matches_array = testString.match(regex);
//Outputs the following: [".*", ".d+", "\w+"]
Regular expressions are not powerful enough to recognize the language of matching parentheses. (The formal proof uses the equivalence of regular expressions and finite state machines and the fact that there are infinitely many levels of nesting possible.) Thus, matching the first ) after each ( would make (\d+(\.d+)?) return (\d+(\.d+) and matching the last ) after each ( would make (\w+) (\w+) match the entire string.
The correct way to do this is with recursion (which mathematical regular expressions do not allow, but actual implementations such as PCRE do). You can also get a simple expression for non-nested parentheses. Just be careful to parse escape characters: to be fully robust, \( and \\\( must be treated as escaped literal parentheses, while \\( is an escaped backslash followed by a real group opener.
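For the non-nested case, one possible sketch is to tokenize the pattern source so that escaped pairs are consumed before they can be mistaken for group delimiters. The extractGroups function and the sample source string are hypothetical, and the approach assumes groups are never nested (for nested input it would report only the innermost groups).

```javascript
// Hypothetical helper: return the contents of non-nested (...) groups in a regex source string.
function extractGroups(source) {
  // Alternative 1 consumes an escaped pair such as \( or \\ so it cannot open a group.
  // Alternative 2 matches an unescaped ( ... ) whose body contains no bare parentheses.
  const token = /\\.|\(((?:\\.|[^\\()])*)\)/g;
  const groups = [];
  for (const match of source.matchAll(token)) {
    // match[1] is undefined when only the escaped-pair alternative matched.
    if (match[1] !== undefined) groups.push(match[1]);
  }
  return groups;
}

// String.raw keeps the backslashes exactly as typed.
const source = String.raw`I have (.*) string with (\d+) bits, a literal \( and \\(x+)`;
console.log(extractGroups(source)); // [".*", "\\d+", "x+"]
```

Here \( is skipped as a literal parenthesis, while \\(x+) is read as an escaped backslash followed by a real group.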

Regex To Sort A String Containing Digits

I have a string which contains digits. I need to sort this string using a regular expression.
var myString = "85762034834126745305743";
I'm looking for a complete solution which uses only regular expressions. I just need your thoughts on whether this can be achieved or not.
Regular expressions are not suited for this kind of task. Plain old JavaScript is a lot simpler and easier:
"85762034834126745305743".split("").sort().join("") // "00122333344445556677788"
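If a regular expression is wanted at all here, a reasonable place for it is validating the input before sorting. The sortDigits helper below is a hypothetical wrapper around the one-liner above, shown only as a sketch.

```javascript
// Hypothetical helper: validate with a regex, then sort with plain JavaScript.
function sortDigits(digits) {
  if (!/^[0-9]*$/.test(digits)) {
    throw new TypeError("expected a string containing only digits");
  }
  // The default sort compares strings, which orders single digit characters correctly.
  return [...digits].sort().join("");
}

const myString = "85762034834126745305743";
console.log(sortDigits(myString)); // "00122333344445556677788"
```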

Quoting regex literals in javascript? Why not?

In this answer to a question, and in lots of other places, I see unquoted strings in JavaScript.
For example:
var re = /\[media id="?(\d+)"?\]/gi;
Why shouldn't it instead be:
var re = '/\[media id="?(\d+)"?\]/gi';
Is it some kind of special handling of regular expressions, or can any string be declared like that?
var re = /\[media id="?(\d+)"?\]/gi;
is a regex literal, not a string.
This syntax is only for regular expressions, not for strings.
Because, in JavaScript, Regex is a built-in type, not a string pattern that is passed to some parser as in, e.g., C# or Java.
That means that when you write var regex = /pattern/, JavaScript automatically uses that literal as a regular expression pattern, making regex an object of the RegExp type.
See: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
Is it some kind of special handling of regular expressions?
Yes, regular expressions get special handling. As MDN points out, there is a built-in JavaScript regular expression type, with its own syntax for literals.
or can any string be declared like that?
No. Since regular expressions are objects and are not strings, if you tried to write a string with a regular expression literal you would get a regular expression object, not a string.
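To see the difference directly, here is a short sketch comparing the literal from the question with its quoted counterpart. The variable names and the sample input are illustrative.

```javascript
const re = /\[media id="?(\d+)"?\]/gi;
const quoted = '/\[media id="?(\d+)"?\]/gi';

console.log(typeof re);     // "object"
console.log(typeof quoted); // "string"

// Inside the string, the parser silently drops the backslashes before [, d and ].
console.log(quoted); // /[media id="?(d+)"?]/gi

// Only the literal behaves as a pattern.
const match = re.exec('[media id="42"]');
console.log(match[1]); // "42"
```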
