Why 'ABC'.replace('B', '$`') gives AAC - javascript

Why does this code print AAC instead of the expected A$`C?
console.log('ABC'.replace('B', '$`'));
==>
AAC
And how to make it give the expected result?

To insert a literal $ you have to pass $$, because $`:
Inserts the portion of the string that precedes the matched substring.
console.log('ABC'.replace('B', "$$`"));
See the documentation.
Other patterns:
Pattern    Inserts
$$         Inserts a $.
$&         Inserts the matched substring.
$`         Inserts the portion of the string that precedes the matched substring.
$'         Inserts the portion of the string that follows the matched substring.
$n         Where n is a positive integer less than 100, inserts the nth parenthesized submatch string, provided the first argument was a RegExp object. Note that this is 1-indexed. If group n does not exist (e.g., $3 when the regex has fewer than three groups), it is inserted as a literal (e.g., $3).
$<Name>    Where Name is a capturing group name. If the group is not in the match, or not in the regular expression, or if a string was passed as the first argument to replace instead of a regular expression, this resolves to a literal (e.g., $<Name>). Only available in browser versions supporting named capturing groups.
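As a quick check of the patterns listed above, here is a minimal sketch that applies each one to the string from the question. The group name mid and the surrounding brackets are only illustrative choices, not part of the original answer.

```javascript
// Each replacement pattern applied to 'ABC', matching the 'B' in the middle.
const s = 'ABC';

const results = {
  dollar: s.replace('B', '$$'),               // literal $          -> 'A$C'
  matched: s.replace('B', '[$&]'),            // matched substring  -> 'A[B]C'
  before: s.replace('B', '$`'),               // text before match  -> 'AAC'
  after: s.replace('B', "$'"),                // text after match   -> 'ACC'
  group: s.replace(/(B)/, '<$1>'),            // first capture group -> 'A<B>C'
  missingGroup: s.replace(/(B)/, '$3'),       // no group 3: stays literal -> 'A$3C'
  stringPattern: s.replace('B', '$1'),        // string pattern has no groups -> 'A$1C'
  named: s.replace(/(?<mid>B)/, '{$<mid>}'),  // named group -> 'A{B}C'
  literalTick: s.replace('B', '$$`'),         // what the question wanted -> 'A$`C'
};

console.log(results);
```

The last entry is the fix from the answer: $$ produces one literal $, and the backtick that follows is then just an ordinary character.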
JSFiddle
The reference linked above covers more cases. If you still have doubts you can probably find an answer there; the table above was taken from the documentation linked at the beginning of this answer.
It is worth noting that any $ sequence not listed above does not need to be escaped: a lone $ is inserted as is, and so is something like $AAA.
In the comments above, a user asked why $ has to be "escaped" with another $. I'm not entirely sure, but since any invalid pattern is left uninterpreted, I suspect that $$ exists as a special case to cover replacements that need a dollar sign followed by a "pattern-locked" character, such as the backtick (`) or &.
In any other case the dollar sign doesn't need to be escaped, so such a narrow rule makes sense; otherwise you would have to escape $ everywhere (and I think this could have affected every string, because even in var a = "hello, $ hey this one is a dollar"; you would have needed to escape the $).
If you’re still interested and want to read more, please also check regular-expressions.info and this JSFiddle with more cases.

In the replacement the $ dollar sign has a special meaning and is used when data from the match should be used in the replacement.
MDN: String.prototype.replace(): Specifying a string as a parameter
$$ Inserts a "$".
$` Inserts the portion of the string that precedes the matched substring.
As long as the $ does not result in a combination that has a special meaning, then it will be just handled as a regular char. But you should still always write it as a $$ in the replacement because otherwise, it might fail in future if a new $x combination is added.
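If the replacement text comes from elsewhere (user input, for instance), the advice above can be applied mechanically. The sketch below shows two ways to do that; escapeReplacement is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: doubles every $ so that replace() inserts the text verbatim.
// In the replacement string '$$$$', each '$$' pair yields one literal '$'.
function escapeReplacement(text) {
  return text.replace(/\$/g, '$$$$');
}

const raw = '$`'; // text we want inserted exactly as written

// Option 1: escape the replacement string first.
const viaEscape = 'ABC'.replace('B', escapeReplacement(raw));

// Option 2: pass a function; its return value is inserted without
// any $-pattern interpretation.
const viaFunction = 'ABC'.replace('B', () => raw);

console.log(viaEscape);   // A$`C
console.log(viaFunction); // A$`C
```

The function form avoids the escaping question entirely, which may be the simpler choice when the replacement is not a fixed literal.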

Related

JS RegEx replacement of a non-captured group?

I'm currently going through the book "Eloquent JavaScript". There's an exercise at the end of Chapter 9 on Regular Expressions whose solution I couldn't understand very well. A description of the exercise can be found here.
TL;DR: The objective is to replace single quotes (') with double quotes (") in a given string while keeping single quotes in contractions, using the replace method with a RegEx of course.
Now, after solving this exercise using my own method, I checked the proposed solution, which looks like this:
console.log(text.replace(/(^|\W)'|'(\W|$)/g, '$1"$2'));
The RegEx looks fine and it's quite understandable, but what I fail to understand is the usage of replacements, mainly why using $2 works ? As far as I know this regular expression will only take one path of two, either (^|\W)' or '(\W|$) each of these paths will only result in a single captured group, so we will only have $1 available. And yet $2 is capturing what comes after the single quote without having an explicit second capture group that does this in the regular expression. One can argue that there are two groups, but then again $2 is capturing a different string than the one intended by the second group.
My questions :
Why is $2 actually a valid string and not undefined, and what is it referring to precisely?
Is this one of JavaScript RegEx quirks ?
Does this mean $1, $2... don't always refer to explicit groups ?
The backreferences are initialized with an empty string upon each match, so there will be no issues if a group is not matched. And it is no quirk, it is in compliance with the ES5 standard.
Here is a quote from Backreferences to Failed Groups:
According to the official ECMA standard, a backreference to a non-participating capturing group must successfully match nothing just like a backreference to a participating group that captured nothing does.
So, once a backreference is not participating in the match, it refers to an empty string, not undefined. And it is not a quirk, just a "feature". That is not quite expected sometimes, but it is just how it works.
In your scenario, one of the two backreferences is always empty upon a match, since there are two alternative branches and only one of them matches each time. The point is to restore the character matched by whichever group participated. Both backreferences are used because one of them contains the text to restore while the other contains only empty text.
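To see this in action, here is a small sketch. The sample sentence and the [$1|$2] template are just illustrative choices. Note one nuance: in the array returned by exec() the non-participating group shows up as undefined, while the replacement template substitutes an empty string for it.

```javascript
const quoteRe = /(^|\W)'|'(\W|$)/g;
const text = "'I'm the cook,' he said, 'it's my job.'";

// Only one of $1 / $2 is non-empty per match; the other contributes nothing.
const fixed = text.replace(quoteRe, '$1"$2');
console.log(fixed); // "I'm the cook," he said, "it's my job."

// A single match where only the second branch participates.
const single = /(^|\W)'|'(\W|$)/;
const m = single.exec("a' b");
console.log(m[1], JSON.stringify(m[2])); // undefined " "

// In the template, the missing group 1 becomes an empty string.
const marked = "a' b".replace(single, '[$1|$2]');
console.log(marked); // a[| ]b
```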

Unable to find a string matching a regex pattern

While trying to submit a form a javascript regex validation always proves to be false for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?
This regex won't match against anything because of that $? in the middle of the string.
Apparently using the optional modifier ? on the end string symbol $ is not correct (if you paste it on https://regex101.com/ it will give you an error indeed). If the javascript parser ignores the error and keeps the regex as it is this still means you are going to match an end string in the middle of a string which is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost, you can probably use Firebug or a similar developer tool and edit the string inside the JavaScript code (this assumes there is no server-side check as well, and that such a check is not wrong too). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped, even \\\\w\\\\ww%jpg is a match).
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg
If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.[jpe?gJPE?G]+,)
That should match exactly what you are looking for. Remove the optional comma at the end if you feel like it, of course.
If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After ^start the first pipe (([a-zA-Z]:)|(\\{2}\w+)\$?) which matches an alpha followed by a colon or two backslashes followed by one or more word characters, followed by an optional literal $. There is some needless parenthesis used inside.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w] which looks weird because it's equivalent to \w\w (don't need a character class for second \w). Followed by any amount of any character. This whole thing one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) one probably forgot to escape the dot for matching a literal. \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
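A quick way to compare the unescaped original with the tidied version is to run both against the strings mentioned in this thread. This is only a sketch: the sample list is assembled from the question and the answers, and the regexes are written as JavaScript literals (so each backslash in a test string is doubled in the source).

```javascript
// The pattern with one escape level removed, and the simplified rewrite.
const originalRe = /^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$/;
const tidyRe = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

const samples = [
  'A:\\12anything.JPEG',   // drive letter, backslash, two word chars, anything
  '\\\\1$\\anything.jpg',  // two backslashes, share name, optional $, backslash
  'c:\\zz.jpg',
  'abc.jpg',               // no drive or share prefix
  'a:asdas.jpg',           // no backslash after the colon
];

const report = samples.map((sample) => ({
  sample,
  original: originalRe.test(sample),
  tidy: tidyRe.test(sample),
}));

console.table(report);
```

The first three strings pass both patterns and the last two fail both, which is why none of the inputs tried in the question could match.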

JavaScript regular expression unexpected behavior

Let's have the following (a bit complex) regular expression in JavaScript:
\{\{\s*(?:(?:\:)([\w\$]+))?\#(?:([\w\$\/]+#?)?([\s\S]*?))?(\.([\w\$\/]*))?\s*\}\}
I am wondering why it matches the whole string here:
{{:control#}}x{{*>*}}
but not in the following case (where a space is added after #):
{{:control# }}x{{*>*}}
In PHP or Python, it matches in both cases just the first part {{: ... }}.
I want JavaScript to match only the first part as well. Is it possible without hacking (?!}}) before [\s\S]?
Moreover, is performance the reason for this different behavior in JavaScript, or is it just a bug in specification?
You can use a lazy ?? quantifier to achieve the same behavior in JavaScript:
\{\{\s*(?:(?::)([\w$]+))?#(?:([\w$\/]+#?)?([\s\S]*?))??(\.([\w$\/]*))?\s*}}
^^
See demo
From rexegg.com:
A?? Zero or one A, zero if that still allows the overall pattern to match (lazy)
This is no bug, and is right according to the ECMA standard specifications JavaScript abides by.
Here, in (?:([\w$\/]+#?)?([\s\S]*?))?, we have an optional non-capturing group that can match an empty text. JavaScript regex engine "consumes" empty texts in optional groups for them to be later accessible via backreferences. This problem is closely connected with the Backreferences to Failed Groups. E.g. ((q)?b\2) will match b in JavaScript, but it won't match in Python and PCRE.
According to the official ECMA standard, a backreference to a non-participating capturing group must successfully match nothing just like a backreference to a participating group that captured nothing does.
This subpattern is responsible for the behaviour:
([\w\$\/]+#?)? // P1
as it matches greedily, your whole test string (without the space) gets consumed.
As @stribizhev suggests, making the designated part of your regex non-greedy results in a conservative match.
Both versions will match up to and including #, since both patterns contain this character without any occurrence restrictions.
The second test string (including the space after #) matches non-greedily, since P1 does not match whitespace. Instead this whitespace gets matched by the subsequent subexpression ( [\s\S]*? ), thus finishing the match.
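The difference between the greedy and the lazy optional group can be reproduced directly. The sketch below uses the two test strings from the question; the variable names are illustrative, and the patterns are the ones quoted above with redundant escapes dropped.

```javascript
// Greedy optional group "(?:...)?" versus lazy optional group "(?:...)??".
const greedyRe = /\{\{\s*(?::([\w$]+))?#(?:([\w$\/]+#?)?([\s\S]*?))?(\.([\w$\/]*))?\s*\}\}/;
const lazyRe = /\{\{\s*(?::([\w$]+))?#(?:([\w$\/]+#?)?([\s\S]*?))??(\.([\w$\/]*))?\s*\}\}/;

const noSpace = '{{:control#}}x{{*>*}}';
const withSpace = '{{:control# }}x{{*>*}}';

const greedyNoSpace = greedyRe.exec(noSpace)[0];     // the whole string
const greedyWithSpace = greedyRe.exec(withSpace)[0]; // only the first {{ ... }}
const lazyNoSpace = lazyRe.exec(noSpace)[0];         // only the first {{ ... }}
const lazyWithSpace = lazyRe.exec(withSpace)[0];     // only the first {{ ... }}

console.log({ greedyNoSpace, greedyWithSpace, lazyNoSpace, lazyWithSpace });

// The related backreference behaviour mentioned above:
// \2 refers to a group that did not participate, and still matches (nothing).
const failedGroupMatches = /((q)?b\2)/.test('b');
console.log(failedGroupMatches); // true
```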

How to explain "$1,$2" in JavaScript when using regular expression?

A piece of JavaScript code is as follows:
num = "11222333";
re = /(\d+)(\d{3})/;
re.test(num);
num.replace(re, "$1,$2");
I could not understand the grammar of "$1,$2". The book from which this code comes says $1 means RegExp.$1, $2 means RegExp.$2. But these explanations lead to more questions:
It is known that in JavaScript, the name of variables should begin with letter or _, how can $1 be a valid name of member variable of RegExp here?
If I input $1, the command line says it is not defined; if I input "$1", the command line only echoes $1, not 11222. So, how does the replace method know what "$1,$2" mean?
Thank you.
It's not a "variable" - it's a placeholder that is used in the .replace() call. $n represents the nth capture group of the regular expression.
var num = "11222333";
// This regex captures the last 3 digits as capture group #2
// and all preceding digits as capture group #1
var re = /(\d+)(\d{3})/;
console.log(re.test(num));
// This replace call replaces the match of the regex (which happens
// to match everything) with the first capture group ($1) followed by
// a comma, followed by the second capture group ($2)
console.log(num.replace(re, "$1,$2"));
$1 is the first group from your regular expression, $2 is the second. Groups are defined by brackets, so your first group ($1) is whatever is matched by (\d+). You'll need to do some reading up on regular expressions to understand what that matches.
It is known that in Javascript, the name of variables should begin with letter or _, how can $1 be a valid name of member variable of RegExp here?
This isn't true. $ is a valid variable name as is $1. You can find this out just by trying it. See jQuery and numerous other frameworks.
You are misinterpreting that line of code. You should consider the string "$1,$2" a format specifier that is used internally by the replace function to know what to do. It uses the regular expression passed as its first argument, which has two capture groups (two parenthesized blocks), and reformats what they captured. $1 refers to the first group, $2 to the second one. The value returned by replace is thus 11222,333 (num itself is not modified, since replace returns a new string).
It is known that in Javascript, the name of variables should begin with letter or _,
No, it's not. $1 is a perfectly valid variable. You have to assign to it first though:
$variable = "this is a test"
This is how jQuery uses a variable called $ as an alias for the jQuery object.
The book from which this code comes says $1 means RegExp.$1, $2 means RegExp.$2.
This book is made of paper, and paper cannot resist whatever is written on it :-) . But perhaps you only misinterpreted what is actually written in this book.
Actually, it is depending on the context.
1. In the context of the replace() method of String, $1, $2, ... $99 (1 through 99) are placeholders. They are handled internally by the replace() method (and they have nothing to do with RegExp.$1, RegExp.$2, etc., which are probably not even defined (see point 2)). See String.prototype.replace() #Specifying_a_string_as_a_parameter. Compare this with the return value of the match() method of String when the g flag is not used, which is similar to the return value of the exec() method of RegExp. Compare also with the arguments passed implicitly to an (optional) function specified as the second argument of replace().
2. RegExp.$1, RegExp.$2, ... RegExp.$9 (1 through 9 only) are non-standard properties of RegExp, as you may see at RegExp.$1-$9 and Deprecated and obsolete features. They seem to be implemented in your browser, but for somebody else they may not be defined. To use them, you always need to prefix $1, $2, etc. with RegExp.. These properties are static, read-only and stored on the global RegExp object, not on an individual regular expression object. But, anyway, you should not use them. The $1 through $99 used internally by the replace() method of String are stored elsewhere.
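The comparison suggested in point 1 can be sketched as follows, using the code from the question. The variable names are illustrative only.

```javascript
const num = '11222333';
const re = /(\d+)(\d{3})/;

// 1. Placeholders in a replacement string, resolved by replace() itself.
const viaTemplate = num.replace(re, '$1,$2');

// 2. The same groups, received as arguments by a replacement function.
const viaFunction = num.replace(re, (match, g1, g2) => `${g1},${g2}`);

// 3. The same groups, exposed by match() when the g flag is not used.
const m = num.match(re); // [whole match, group 1, group 2]

console.log(viaTemplate); // 11222,333
console.log(viaFunction); // 11222,333
console.log(m[1], m[2]);  // 11222 333
console.log(num);         // 11222333 (the original string is unchanged)
```

None of the three relies on the non-standard RegExp.$1 properties.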
Have a nice day!

Regex format from PHP to Javascript

Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).
Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript does not support lookbehinds ((?<=...)); they were only added to the language in ES2018, so older engines reject them. So you have to find a workaround for that. You won't get around including that character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
Note that I simplified your lookahead a bit. That is actually the beauty of lookarounds that you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead, and assure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string, $1 will be empty, and if there was a space before :d, then $1 will contain that space character.
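Here is a small sketch of that workaround on a sample string. The input and the [grin] stand-in for emoticonCode are made up for illustration.

```javascript
// Replace :d only when it is delimited by whitespace or the string boundaries.
const emoticonRe = /(^|\s):d(?!\S)/g;

const input = ':d hello :d world:d :dd :d';

// $1 writes back the whitespace (or nothing, at the start) consumed before :d.
const output = input.replace(emoticonRe, '$1[grin]');

console.log(output); // [grin] hello [grin] world:d :dd [grin]

// Because the lookahead does not consume the following space,
// back-to-back emoticons are both replaced.
const adjacent = ':d :d'.replace(emoticonRe, '$1[grin]');
console.log(adjacent); // [grin] [grin]
```

world:d and :dd are left alone because the first has no whitespace before the colon and the second has a non-space character right after :d.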
JavaScript doesn't support lookbehind, i.e. (?<=...), in engines older than ES2018.
It does support lookahead.
Better use
/(?:^|\s)(:d)(?=$|\s)/g
Group 1 captures the required match.
