Why do these two JavaScript regular expressions produce different results? [duplicate] - javascript

In the regex below, \s denotes a whitespace character. I imagine the regex parser goes through the string, sees \, and knows that the next character is special.
But this is not the case, as double escapes are required.
Why is this?
var res = new RegExp('(\\s|^)' + foo).test(moo);
Is there a concrete example of how a single escape could be misinterpreted as something else?

You are constructing the regular expression by passing a string to the RegExp constructor.
\ is an escape character in string literals.
The \ is consumed by the string literal parsing…
const foo = "foo";
const string = '(\s|^)' + foo;
console.log(string);
… so the data you pass to the RegExp compiler is a plain s and not \s.
You need to escape the \ to express the \ as data instead of being an escape character itself.
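As a concrete illustration of the misinterpretation asked about in the question, here is a minimal sketch. The variable names and test strings are made up for the example. It compares the single- and double-escaped forms, and also shows \b, which a string literal turns into a backspace character rather than a word boundary.

```javascript
// In a string literal, '\s' is not a recognised escape, so it collapses to 's'.
const single = new RegExp('(\s|^)foo');   // string value: (s|^)foo
const double = new RegExp('(\\s|^)foo');  // string value: (\s|^)foo

console.log(single.source); // (s|^)foo
console.log(double.source); // (\s|^)foo

console.log(single.test('bar foo')); // false: wants a literal "s" (or start of input) before foo
console.log(double.test('bar foo')); // true: whitespace before foo
console.log(single.test('sfoo'));    // true: the literal "s" matches

// '\b' IS a recognised string escape: it becomes a backspace character (U+0008).
const backspacePattern = new RegExp('\bfoo\b');
const boundaryPattern = new RegExp('\\bfoo\\b');

console.log(backspacePattern.test('foo')); // false: looks for backspace, "foo", backspace
console.log(boundaryPattern.test('foo'));  // true: \b reaches the regex as a word boundary
```

The \b case is the more dangerous one, because the pattern silently changes meaning instead of merely losing a backslash.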

Inside the code where you're creating a string, the backslash is a JavaScript escape character first, which means escape sequences like \t, \n, \", etc. are translated into their JavaScript counterparts (tab, newline, quote, etc.), and those become part of the string. A double backslash represents a single backslash in the actual string, so if you want a backslash in the string, you have to escape it first.
So when you generate a string by saying var someString = '(\\s|^)', what you're really doing is creating an actual string with the value (\s|^).
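One way to check this is to inspect the string before handing it to the constructor. The sketch below reuses someString from the sentence above; fromString and literal are names chosen for the example. It confirms that the double-escaped string produces the same pattern as the equivalent regex literal.

```javascript
// The source text has two backslashes, but the resulting string has only one.
const someString = '(\\s|^)';

console.log(someString);        // (\s|^)
console.log(someString.length); // 6 characters: ( \ s | ^ )

// Building a RegExp from that string gives the same pattern as a regex literal.
const fromString = new RegExp(someString + 'foo');
const literal = /(\s|^)foo/;

console.log(fromString.source);                    // (\s|^)foo
console.log(fromString.source === literal.source); // true
```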

The RegExp constructor needs a string containing \s, which in JavaScript can be produced using the literal "\\s".
Here's a live example to illustrate why "\s" is not enough:
alert("One backslash: \s\nDouble backslashes: \\s");
Note how an extra \ before \s changes the output.

As has been said, inside a string literal a backslash introduces an escape sequence rather than a literal backslash character. The RegExp constructor, however, often needs literal backslash characters in the string passed to it, so in most cases the code should have \\s to represent a literal backslash followed by s.
A problem is that double-escaping metacharacters is tedious. There is one way to pass a string to new RegExp without having to double-escape them: use the String.raw template tag, an ES6 feature, which gives you the raw text of the template, with no processing of escape sequences (${...} substitutions are still evaluated). For example:
console.log('\\'.length); // length 1: an escaped backslash
console.log(`\\`.length); // length 1: an escaped backslash
console.log(String.raw`\\`.length); // length 2: no escaping in String.raw!
So, if you want to keep your code readable and the pattern has many backslashes, you can use String.raw and type only one backslash wherever the pattern requires one:
const sentence = 'foo bar baz';
const regex = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);
console.log(regex.test(sentence));
But there's a better option. Generally, there's not much good reason to use new RegExp unless you need to dynamically create a regular expression from existing variables. Otherwise, you should use regex literals instead, which do not require double-escaping of metacharacters, and do not require writing out String.raw to keep the pattern readable:
const sentence = 'foo bar baz';
const regex = /\bfoo\sbar\sbaz\b/;
console.log(regex.test(sentence));
It is best to use new RegExp only when the pattern must be created on the fly, as in the following snippet:
const sentence = 'foo bar baz';
const wordToFind = 'foo'; // from user input
const regex = new RegExp(String.raw`\b${wordToFind}\b`);
console.log(regex.test(sentence));
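One caveat with patterns built from user input: any regex metacharacters in the input are interpreted as pattern syntax. The sketch below is not part of the original answer. It assumes a small helper, here called escapeRegExp (a hypothetical name, not a built-in used by the snippet above), that backslash-escapes those characters before the string reaches new RegExp. The sample sentence and input are also made up.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
// so the input is matched as plain text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const sentence = 'price is $5 (approx.)';
const wordToFind = '(approx.)'; // from user input, contains metacharacters

// Unescaped: the parentheses form a group and the dot matches any character.
const unsafe = new RegExp(wordToFind);
// Escaped: every character is matched literally.
const safe = new RegExp(escapeRegExp(wordToFind));

console.log(escapeRegExp(wordToFind)); // \(approx\.\)
console.log(unsafe.test('approxx'));   // true: an unintended match
console.log(safe.test('approxx'));     // false
console.log(safe.test(sentence));      // true
```

Note that the replacement string '\\$&' is itself a string literal, so the double backslash is needed there for the same reason discussed above.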

\ is used in strings to escape special characters. If you want a backslash in your string (e.g. for the \ in \s), you have to escape it with another backslash. So \ becomes \\ .
EDIT: Even had to do it here, because \\ in my answer turned to \.

Related

`"...".test("...")` matches but `RegExp("...").test("...")` does not [duplicate]


Javascript how to replace $index [duplicate]


Regex shows match but doesn't capture [duplicate]


Javascript RegExp not escaping parenthesis properly. Why? [duplicate]


JavaScript search and RegExp not working for me [duplicate]

