How to explain "$1,$2" in JavaScript when using regular expression? - javascript

A piece of JavaScript code is as follows:
num = "11222333";
re = /(\d+)(\d{3})/;
re.test(num);
num.replace(re, "$1,$2");
I could not understand the grammar of "$1,$2". The book from which this code comes says $1 means RegExp.$1, $2 means RegExp.$2. But these explanations lead to more questions:
It is known that in JavaScript, the name of variables should begin with letter or _, how can $1 be a valid name of member variable of RegExp here?
If I input $1, the command line says it is not defined; if I input "$1", the command line only echoes $1, not 11222. So, how does the replace method know what "$1,$2" mean?
Thank you.

It's not a "variable" - it's a placeholder that is used in the .replace() call. $n represents the nth capture group of the regular expression.
var num = "11222333";
// This regex captures the last 3 digits as capture group #2
// and all preceding digits as capture group #1
var re = /(\d+)(\d{3})/;
console.log(re.test(num));
// This replace call replaces the match of the regex (which happens
// to match everything) with the first capture group ($1) followed by
// a comma, followed by the second capture group ($2)
console.log(num.replace(re, "$1,$2"));
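As a rough sketch of what the placeholder string is doing, the same replacement can be written with a replacer function, which receives the whole match followed by each capture group. The parameter names whole, head and tail are illustrative only. The sketch also shows that no earlier re.test() call is required and that replace() returns a new string.

```javascript
const num = "11222333";
const re = /(\d+)(\d{3})/;

// Placeholder form: $1 and $2 are expanded by replace() itself.
const withPlaceholders = num.replace(re, "$1,$2");

// Function form: the callback receives the full match, then each capture group.
const withFunction = num.replace(re, (whole, head, tail) => head + "," + tail);

console.log(withPlaceholders); // 11222,333
console.log(withFunction);     // 11222,333
console.log(num);              // 11222333 (the original string is not modified)
```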

$1 is the first group from your regular expression, $2 is the second. Groups are defined by parentheses, so your first group ($1) is whatever is matched by (\d+). You'll need to do some reading up on regular expressions to understand what that matches.
It is known that in Javascript, the name of variables should begin with letter or _, how can $1 be a valid name of member variable of RegExp here?
This isn't true. $ is a valid variable name as is $1. You can find this out just by trying it. See jQuery and numerous other frameworks.

You are misinterpreting that line of code. Think of the string "$1,$2" as a format specifier that the replace function interprets internally. replace() runs the regular expression it is given against the string (the earlier re.test(num) call is not needed for this), which yields two captured groups (the two parenthesized blocks), and reformats them: $1 refers to the first group, $2 to the second one. The replace() call therefore returns 11222,333. The num string itself is not modified, so assign the result if you want to keep it.

It is known that in Javascript, the name of variables should begin with letter or _,
No, it's not. $1 is a perfectly valid variable. You have to assign to it first though:
$variable = "this is a test"
This is how jQuery uses a variable called $ as an alias for the jQuery object.
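A small sketch of the point about identifiers: the $ function below is a hypothetical stand-in for a library alias, not jQuery itself, and the other names are made up for illustration.

```javascript
// Identifiers may start with a letter, _ or $, so all of these are legal names.
const $ = (selector) => "selected " + selector; // hypothetical stand-in for a library alias
const $1 = "just an ordinary variable";
const _counter = 42;

console.log($("#id")); // selected #id
console.log($1);       // just an ordinary variable
console.log(_counter); // 42

// Inside a plain string, "$1" is only two characters of text;
// it gains a meaning only when passed to replace() as the replacement.
console.log("$1".length); // 2
```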

The book from which this code comes says $1 means RegExp.$1, $2 means RegExp.$2.
A book is only paper, and paper cannot resist whatever is written on it :-). But perhaps you simply misread what the book actually says.
Actually, it depends on the context.
1. In the context of the replace() method of String, $1, $2, ... $99 (1 through 99) are placeholders. They are handled internally by the replace() method (and they have nothing to do with RegExp.$1, RegExp.$2, etc., which may not even be defined (see point 2)). See String.prototype.replace() #Specifying_a_string_as_a_parameter. Compare this with the return value of the match() method of String when the g flag is not used, which is similar to the return value of the exec() method of RegExp. Compare also with the arguments passed implicitly to the (optional) function given as the second argument of replace().
2. RegExp.$1, RegExp.$2, ... RegExp.$9 (1 through 9 only) are non-standard properties of RegExp, as you can see at RegExp.$1-$9 and Deprecated and obsolete features. They seem to be implemented in your browser, but for somebody else they may not be defined. To use them, you must always prefix $1, $2, etc. with RegExp. (including the dot). These properties are static, read-only, and stored on the global RegExp object, not on an individual regular expression object. In any case, you should not use them. The $1 through $99 used internally by the replace() method of String are stored elsewhere.
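The comparison suggested in point 1 can be sketched as follows. It uses the regex and input from the question; the array name seen is only for illustration.

```javascript
const re = /(\d+)(\d{3})/;
const num = "11222333";

// exec() and non-global match() return the full match at index 0
// and the capture groups at indexes 1, 2, ...
const viaExec = re.exec(num);
const viaMatch = num.match(re);
console.log(viaExec[1], viaExec[2]); // 11222 333
console.log(viaMatch[0]);            // 11222333

// A replacer function receives the same values as arguments,
// followed by the offset of the match.
const seen = [];
num.replace(re, (match, g1, g2, offset) => {
  seen.push(match, g1, g2, offset);
  return match;
});
console.log(seen); // [ '11222333', '11222', '333', 0 ]
```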
Have a nice day!

Related

JS RegEx replacement of a non-captured group?

I'm currently going through the book "Eloquent JavaScript". There's an exercise at the end of Chapter 9 on Regular Expressions whose solution I couldn't understand very well. A description of the exercise can be found here.
TL;DR : The objective is to replace single quotes (') with double quotes (") in a given string while keeping single quotes in contractions, using the replace method with a RegEx of course.
Now, after actually solving this exercise using my own method, I checked the proposed solution, which looks like this :
console.log(text.replace(/(^|\W)'|'(\W|$)/g, '$1"$2'));
The RegEx looks fine and it's quite understandable, but what I fail to understand is the usage of replacements, mainly why using $2 works ? As far as I know this regular expression will only take one path of two, either (^|\W)' or '(\W|$) each of these paths will only result in a single captured group, so we will only have $1 available. And yet $2 is capturing what comes after the single quote without having an explicit second capture group that does this in the regular expression. One can argue that there are two groups, but then again $2 is capturing a different string than the one intended by the second group.
My questions :
Why is $2 actually a valid string rather than undefined, and what precisely does it refer to?
Is this one of JavaScript's RegEx quirks?
Does this mean $1, $2... don't always refer to explicit groups?
The backreferences are initialized with an empty string upon each match, so there will be no issues if a group is not matched. And it is no quirk, it is in compliance with the ES5 standard.
Here is a quote from Backreferences to Failed Groups:
According to the official ECMA standard, a backreference to a non-participating capturing group must successfully match nothing just like a backreference to a participating group that captured nothing does.
So when a group does not participate in the match, a reference to it yields an empty string, not undefined. It is not a quirk, just a "feature". It is not always what you expect, but that is how it works.
In your scenario, one of the two groups is always empty on a match, since there are two alternative branches and only one of them matches each time. The point is to restore the character matched by whichever group took part. Both $1 and $2 are used because one of them contains the text to restore while the other contributes only an empty string.
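A short sketch of this behaviour, using the regex from the question. The second half uses a replacer function to show which group took part in each match; the names before and after are illustrative.

```javascript
const re = /(^|\W)'|'(\W|$)/g;
const text = "'I'm the cook,' he said, 'it's my job.'";

// In a replacement string, a group that did not take part in the match expands to "".
const fixed = text.replace(re, '$1"$2');
console.log(fixed); // "I'm the cook," he said, "it's my job."

// A replacer function sees such a group as undefined instead.
const groups = [];
"'hi'".replace(re, (match, before, after) => {
  groups.push([before, after]);
  return match;
});
console.log(groups); // [ [ '', undefined ], [ undefined, '' ] ]
```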

Deciphering contents of a .replace() parameter

In an example piece of code, I stumbled upon this line:
// Change the string into lower case and remove all non-alphanumeric characters
var cstr = str_entry.toLowerCase().replace(/[^a-zA-Z0-9]+/g,'');
I think I understand that the /g inside the parameter makes everything in between the // become empty strings (''). Am I correct?
What does the ^ part of the parameter do? What does everything inside the [ ] brackets mean?
The first parameter of the replace function is a regular expression, which is a way of determining if a string matches a complex pattern.
The g flag after the closing / means 'global', so if several parts of the str_entry string match, they will all be replaced with an empty string, instead of just the first one.
The ^ at the start of [] means 'not', so [^a-zA-Z0-9] matches any single character that is not a letter or a digit, and the + after it matches one or more such characters in a row.
More simply, the regular expression is identifying any non-alphanumeric characters in your string. Using it with replace(..., '') will remove those characters.
Take a look at Regex101 for more information about how regular expressions work. You can punch in your regular expression and it will tell you what each part of it does.
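To make the effect of the g flag and the negated character class concrete, here is a minimal sketch. The sample input string is made up for illustration.

```javascript
const strEntry = "Hello, World! 123";

// With the g flag, every run of non-alphanumeric characters is removed.
const allRemoved = strEntry.toLowerCase().replace(/[^a-zA-Z0-9]+/g, "");
console.log(allRemoved); // helloworld123

// Without g, only the first run (", ") is removed.
const firstRemoved = strEntry.toLowerCase().replace(/[^a-zA-Z0-9]+/, "");
console.log(firstRemoved); // helloworld! 123

// Outside brackets, ^ means something different: the start of the string.
const startsWithHello = /^hello/.test("hello world");
console.log(startsWithHello); // true
```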

Why 'ABC'.replace('B', '$`') gives AAC

Why does this code print AAC instead of the expected A$`C?
console.log('ABC'.replace('B', '$`'));
==>
AAC
And how to make it give the expected result?
To insert a literal $ you have to pass $$, because $`:
Inserts the portion of the string that precedes the matched substring.
console.log('ABC'.replace('B', "$$`"));
See the documentation.
Other patterns:
Pattern | Inserts
$$ | Inserts a $.
$& | Inserts the matched substring.
$` | Inserts the portion of the string that precedes the matched substring.
$' | Inserts the portion of the string that follows the matched substring.
$n | Where n is a positive integer less than 100, inserts the nth parenthesized submatch string, provided the first argument was a RegExp object. Note that this is 1-indexed. If a group n is not present (e.g., if group is 3), it will be replaced as a literal (e.g., $3).
$<Name> | Where Name is a capturing group name. If the group is not in the match, or not in the regular expression, or if a string was passed as the first argument to replace instead of a regular expression, this resolves to a literal (e.g., $<Name>). Only available in browser versions supporting named capturing groups.
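Each pattern listed above can be tried on a short input. This is only an illustrative sketch; the group names d, m and y are made up for the example.

```javascript
const results = {
  dollar: "ABC".replace("B", "$$"),     // A$C   (literal dollar sign)
  matched: "ABC".replace("B", "[$&]"),  // A[B]C (the matched substring)
  before: "ABC".replace("B", "$`"),     // AAC   (text before the match)
  after: "ABC".replace("B", "$'"),      // ACC   (text after the match)
  // $n needs a RegExp with capture groups.
  numbered: "ABC".replace(/(A)(B)/, "$2$1"), // BAC
  // A named group can be referenced as $<name>.
  named: "21-11-2012".replace(/(?<d>\d\d)-(?<m>\d\d)-(?<y>\d{4})/, "$<y>-$<m>-$<d>"), // 2012-11-21
  // With a string pattern there are no groups, so $1 stays a literal.
  noGroup: "ABC".replace("B", "$1"),    // A$1C
};

console.log(results);
```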
The reference linked above covers more than this. If anything is still unclear you can probably find the answer there; the table above was taken from the page linked at the beginning of this answer.
It is worth saying, in my opinion, that any $ sequence that is not one of the patterns above does not need to be escaped: a lone $ needs no escaping, and neither does something like $AAA.
In the comments a user asked why you need to "escape" $ with another $. I am not entirely sure, but since sequences that are not valid patterns are not interpreted, I suspect $$ exists for one specific case: when the replacement must contain a dollar sign followed by a character that would otherwise form a pattern, such as the backtick (`) or &.
In every other case the dollar sign does not need to be escaped, so a narrow rule like this probably made sense to the designers; otherwise you would have to escape $ everywhere (and I think that could have affected every string, so that even in var a = "hello, $ hey this one is a dollar"; you would have needed to escape the $).
If you’re still interested and want to read more, please also check regular-expressions.info and this JSFiddle with more cases.
In the replacement the $ dollar sign has a special meaning and is used when data from the match should be used in the replacement.
MDN: String.prototype.replace(): Specifying a string as a parameter
$$ Inserts a "$".
$` Inserts the portion of the string that precedes the matched substring.
As long as the $ does not result in a combination that has a special meaning, then it will be just handled as a regular char. But you should still always write it as a $$ in the replacement because otherwise, it might fail in future if a new $x combination is added.
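When the replacement text comes from elsewhere and should be inserted verbatim, one option is to double every $ before passing it to replace(). The helper name escapeReplacement below is hypothetical, not a built-in. A replacer function is an alternative, because its return value is inserted without interpreting $ patterns.

```javascript
// Hypothetical helper: double every $ so replace() treats the text literally.
// In the replacement "$$$$", each "$$" pair yields one "$", so "$" becomes "$$".
function escapeReplacement(text) {
  return text.replace(/\$/g, "$$$$");
}

const literal = "$`";

const viaEscape = "ABC".replace("B", escapeReplacement(literal));
console.log(viaEscape); // A$`C

// Alternative: a replacer function's return value is inserted as-is.
const viaFunction = "ABC".replace("B", () => literal);
console.log(viaFunction); // A$`C
```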

Change date format in javascript (jquery)

I have two dates:
var first = '21-11-2012';
var second = '03-11-2012';
What is the best way to format it like this:
var first = '2012-11-21';
var second = '2012-11-03';
Should I use jQuery or simply JavaScript?
You don't need jQuery for this, simply use JavaScript like so:
function formatDate(d) {
  return d.split('-').reverse().join('-');
}
Although if you want more reusable code consider using the JavaScript Date Object.
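A minimal sketch of the Date-based route, assuming the input is always in dd-mm-yyyy form. The function name toIsoDate is made up for this example, and Date.UTC is used so the result does not shift with the local time zone.

```javascript
// Hypothetical helper: parse "dd-mm-yyyy" into a Date, then format as "yyyy-mm-dd".
function toIsoDate(input) {
  const [day, month, year] = input.split("-").map(Number);
  // Months are zero-based in Date; Date.UTC avoids local time zone shifts.
  const date = new Date(Date.UTC(year, month - 1, day));
  // toISOString() yields e.g. "2012-11-21T00:00:00.000Z"; keep the date part only.
  return date.toISOString().slice(0, 10);
}

console.log(toIsoDate("21-11-2012")); // 2012-11-21
console.log(toIsoDate("03-11-2012")); // 2012-11-03
```

Note that Date silently rolls over out-of-range values (for example day 31 in a 30-day month), so this sketch does not validate its input.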
No need to be thinking of jQuery for basic string manipulation: the standard JS String methods are more than adequate, which (I assume) is why jQuery doesn't actually have equivalent methods.
A regex .replace() or the split/reverse/join concept in the other answers can both do it in one line. I'd recommend getting familiar with the methods at the MDN page I linked to, but meanwhile:
first = first.replace(/^(\d\d)-(\d\d)-(\d\d\d\d)$/,"$3-$2-$1");
(Same for second - encapsulate in a function if desired.)
This uses a regular expression to match the different parts of the date and reverse them.
UPDATE - As requested, an explanation of the regex I used:
^            // match beginning of string
(\d\d)       // match two digits, and capture them for use
             // in replacement expression as $1
-            // match the literal hyphen character
(\d\d)       // match two digits, and capture for use as $2
-            // match the literal hyphen character
(\d\d\d\d)   // match four digits, and capture for use as $3
$            // match end of string
Because the pattern matches from beginning to end of string the whole string will be replaced. The parts of the expression in parentheses are "captured" and can be referred to as $1, $2, etc. (numbered in the order they appear in the expression) if used in the replacement string, so "$3-$2-$1" reverses the order of these captured pieces and puts hyphens between them. (If the input string didn't meet that format the regex would not match and no replacement would be made.)
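Wrapped in a function, the regex approach above might look like the sketch below. The name reverseDate is illustrative. The last call shows the no-match case described above, where the input comes back unchanged.

```javascript
function reverseDate(value) {
  // $1 = day, $2 = month, $3 = year, as captured by the three groups.
  return value.replace(/^(\d\d)-(\d\d)-(\d\d\d\d)$/, "$3-$2-$1");
}

console.log(reverseDate("21-11-2012")); // 2012-11-21
console.log(reverseDate("03-11-2012")); // 2012-11-03
console.log(reverseDate("2012/11/21")); // 2012/11/21 (no match, returned unchanged)
```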
first = first.split("-").reverse().join("-");
second = second.split("-").reverse().join("-");

What does $1, $2, etc. mean in Regular Expressions?

Time and time again I see $1 and $2 being used in code. What does it mean? Can you please include examples?
When you create a regular expression you have the option of capturing portions of the match. The captured portions can then be referenced by number, starting at $1.
For instance:
/A(\d+)B(\d+)C/
This will capture from A90B3C the values 90 and 3. If you need to group things but don't want to capture them, use the (?:...) version instead of (...).
Groups are numbered from left to right, in the order in which their opening parentheses appear. That means:
/A((\d+)B)(\d+)C/
Matching against the same string will capture 90B, 90 and 3.
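In JavaScript, the numbering described above can be checked with match(), which returns the full match at index 0 and the groups after it. A small sketch using the same input:

```javascript
const input = "A90B3C";

// Two capturing groups.
const flat = input.match(/A(\d+)B(\d+)C/).slice(1);
console.log(flat); // [ '90', '3' ]

// Nested groups are numbered by the position of their opening parenthesis.
const nested = input.match(/A((\d+)B)(\d+)C/).slice(1);
console.log(nested); // [ '90B', '90', '3' ]

// (?:...) groups without capturing, so it does not take a number.
const nonCapturing = input.match(/A(?:(\d+)B)(\d+)C/).slice(1);
console.log(nonCapturing); // [ '90', '3' ]

// The same numbers are used as $1, $2 in a replacement string.
const swapped = input.replace(/A(\d+)B(\d+)C/, "$2-$1");
console.log(swapped); // 3-90
```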
This is especially useful in replacement string syntax (format strings), for example for case folding in find-and-replace operations. To reference a capture, use $n, where n is the capture group number. In that syntax $0 means the entire match (JavaScript's replace() uses $& for this instead).
Example:
Find: (<a.*?>)(.*?)(</a>)
Replace: $1\u$2\e$3
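JavaScript replacement strings have no case-folding escapes, so a comparable effect needs a replacer function. The sketch below assumes that \u in the example is meant to uppercase the next character, as in Perl-style format strings; the sample HTML and the parameter names are made up for illustration.

```javascript
const html = '<a href="#">link text</a>';

// Capitalize the first character of the link text (group 2), keep the tags as they are.
const capitalized = html.replace(/(<a.*?>)(.*?)(<\/a>)/, (match, open, text, close) =>
  open + text.charAt(0).toUpperCase() + text.slice(1) + close
);
console.log(capitalized); // <a href="#">Link text</a>

// In JavaScript the whole match is referenced with $&, not $0.
const wrapped = html.replace(/link/, "[$&]");
console.log(wrapped); // <a href="#">[link] text</a>
```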
