js differentiate regexp capture groups


Is there a way in JavaScript/Node.js to tell apart the capture groups of a regexp? For example, this regexp: /([A-Z]+\[[^\]]+])|(\d+)/g applied to the text:
9 and 4687 matches but not NUMBER[9] or NUMBER[9568] However, [401] should match...
will produce matches for two capture groups:
group 1: NUMBER[9] & NUMBER[9568]
group 2: 9, 4687 & 401
What I want is to wrap the numbers in my text with the tag NUMBER[]. Is there a way to do something like text.replace(regexp, 'NUMBER[$&]', SECOND_GROUP)?
EDIT: The output would be NUMBER[9] and NUMBER[4687] matches but not NUMBER[9] or NUMBER[9568] However, [NUMBER[401]] should match...

You can pass a callback function as the replacement argument of String#replace(), check inside it which group matched, and choose the replacement accordingly:
var regex = /([A-Z]+\[[^\]]+])|(\d+)/g;
var str = `9 and 4687 matches but not NUMBER[9] or NUMBER[9568] However, [401] should match...`;
var res = str.replace(regex, function($0, $1, $2) {
  return $2 ? "NUMBER[" + $2 + "]" : $0; // If Group 2 matched, use special value, else paste back the match
});
console.log(res);
Here, $0 is the whole match, $1 holds the Group 1 contents, and $2 holds the Group 2 contents (undefined if Group 2 did not take part in the match). These are just parameter names; any names work.
You can add more logic if Group 1 matches need special treatment as well.
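If your runtime supports named capture groups (ES2018 and later, e.g. current Node.js), the same replacement can label the two alternatives instead of relying on their positions. This is a sketch: the group names `tagged` and `num` are illustrative choices. When a pattern contains named groups, the replacer function receives an object with them as its last argument.

```javascript
// Sketch: same replacement as above, using named capture groups (ES2018+).
// The names `tagged` and `num` are illustrative, not required.
const regex = /(?<tagged>[A-Z]+\[[^\]]+\])|(?<num>\d+)/g;
const str = "9 and 4687 matches but not NUMBER[9] or NUMBER[9568] However, [401] should match...";

const res = str.replace(regex, (...args) => {
  // With named groups present, the last replacer argument is the groups object.
  const groups = args[args.length - 1];
  // An unmatched named group is undefined, so this tells the alternatives apart.
  return groups.num !== undefined ? `NUMBER[${groups.num}]` : groups.tagged;
});

console.log(res);
// NUMBER[9] and NUMBER[4687] matches but not NUMBER[9] or NUMBER[9568] However, [NUMBER[401]] should match...
```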

Related

How can I access the expression that caused a match in a conditional match group Javascript regex?

I have a conditional match grouped regex like /(sun|\bmoon)/. When I access the matches in a string, I want to be able to see the expression that caused my match.
let regex = /(sun|\bmoon)/
let match = regex.exec('moon')
// return '\bmoon' ??
Is this possible?

JavaScript's RegExp does not currently have a method to show which part of the regex pattern matched. I don't believe this is something that will be implemented any time soon (or even ever), but that's my own opinion. You can, instead, use two separate patterns as I show in the snippet below.
let regexes = [/sun/, /\bmoon/]
let str = 'moon'
regexes.forEach(function(regex) {
  let m = regex.exec(str)
  if (m == null) return // this pattern did not match
  console.log(`${regex}: ${m}`)
})
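To get back something like the '\bmoon' the question asks for, a small variation of this approach can return the `source` of the pattern that matched. This is a sketch; `matchingSource` is a hypothetical helper name.

```javascript
// Sketch: test each alternative as its own RegExp and return the
// `source` text of the first one that matches (hypothetical helper).
const alternatives = [/sun/, /\bmoon/];

function matchingSource(input) {
  const hit = alternatives.find((re) => re.test(input));
  return hit ? hit.source : null;
}

console.log(matchingSource("moon"));     // \bmoon
console.log(matchingSource("moon sun")); // sun (array order wins)
```

One design difference: find() picks the first pattern in array order that matches anywhere in the input, whereas the single regex /(sun|\bmoon)/ reports whichever alternative matches earliest in the string. For 'moon sun', this helper returns 'sun' while the combined regex matches 'moon'.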

The reason why capturing groups exist is to identify the part of the input string that matches a subexpression. In your example, the subexpression that matches is sun|\bmoon (the content of the capturing group).
If you want to know which of the two sub-expressions actually matches the input string, all you have to do is wrap each of them in its own capturing group:
let regex = /((sun)|(\bmoon))/
let match = regex.exec('moon')
// → Array [ "moon", "moon", undefined, "moon" ]
The returned array contains the string that matched the entire regex (at position 0) and the substrings that matched each capturing group at the other positions.
The capturing groups are numbered starting from 1, in the order of their opening parentheses.
In the example above, "moon", undefined and "moon" correspond to the capturing groups (in order) ((sun)|(\bmoon)), (sun) and (\bmoon).
By checking the values of match[2] and match[3] you can find if the input string matched sun (no, it is undefined) or \bmoon (yes).
You can use non-capturing groups for groups you don't need to capture but cannot be removed because they are needed for the grouping purposes.
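Combining this idea with named groups (ES2018+) gives a compact way to report which alternative matched. A sketch, assuming a runtime with named-group support; `whichAlternative` and the group names are hypothetical. The outer group is dropped because only the inner ones are needed.

```javascript
// Sketch: one named group per alternative; the group that is defined
// after a successful exec() tells you which alternative matched.
const regex = /(?<sun>sun)|(?<moon>\bmoon)/;

function whichAlternative(input) {
  const m = regex.exec(input);
  if (m === null) return null;
  // Only one alternative can participate, so exactly one group is defined.
  return Object.keys(m.groups).find((name) => m.groups[name] !== undefined);
}

console.log(whichAlternative("moon"));      // moon
console.log(whichAlternative("sunshine"));  // sun
console.log(whichAlternative("honeymoon")); // null (\b fails inside a word)
```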

You can't see the sub-expression as it is written in the pattern, but the array returned by exec shows what has been matched.

Do you mean this?
console.log(match[0]);
Or do you want the sub-expression that matched, like \bmoon? If so, you can't get it from the match result.

JavaScript RegExp all characters except dynamic series

So, I'm working on an open-source project as a way to expand my knowledge of JavaScript, and I created a utility that processes strings dynamically and replaces specific occurrences with other strings.
An example of this would be the following:
jdhfkjhs${c1}kdfjh$%^%$S654sgdsjh${c20}SUYTDRF^%$&*#(Y
Assuming I select the character '#', the RegExp should turn it into:
########${c1}####################${c20}###############
The problem I am facing is that my RegExp /[^\$\{c\d\}]/g is a negated character class: it checks each character on its own, so every $, {, c, digit and } is left untouched, not just complete ${cN} sequences. A string such as _,met$$$$$1234{}cccgg. is therefore returned as #####$$$$$1234{}ccc###
Is there a way I can catch such a dynamic group with JavaScript, or should I find an alternative way to achieve what I am doing?
For some context, the project code can be found here.

You can match the ${cN} sequence and capture it so you can put it back, and otherwise match any single char (with . if no line breaks are expected, or with [^] / [\s\S] otherwise):
var rx = /(\$\{c\d+\})|./g;
var str = 'jdhfkjhs${c1}kdfjh$%^%$S654sgdsjh${c20}SUYTDRF^%$&*#(Y';
var result = str.replace(rx, function ($0, $1) {
  // Re-insert a captured ${cN} token; replace any other single char with '#'
  return $1 ? $1 : '#';
});
console.log(result);
Details:
(\$\{c\d+\}) - Group 1: a literal ${c substring, then 1+ digits and a literal }
| - or
. - any char but a line break char (or any char if you use [^] or [\s\S], or add the s flag).
In the replacement callback, $0 stands for the whole match and $1 for the contents of the first capturing group. If $1 is set, it is re-inserted into the result; otherwise, the char is replaced with #.
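The same idea can be wrapped as a reusable helper, assuming the masking character is chosen at call time as in the question. This is a sketch: `maskExcept` is a hypothetical name, [\s\S] is used so line breaks are masked too, and because the replacement comes from a function, a mask character such as $ is inserted literally.

```javascript
// Sketch: `maskExcept` is a hypothetical helper name.
function maskExcept(input, maskChar) {
  // Group 1 keeps ${cN} tokens; [\s\S] matches any other single char,
  // including line breaks.
  const rx = /(\$\{c\d+\})|[\s\S]/g;
  // A replacer function's return value is inserted literally, so special
  // replacement patterns like "$&" are not expanded even if maskChar is "$".
  return input.replace(rx, (whole, token) => (token !== undefined ? token : maskChar));
}

console.log(maskExcept("_,met$$$$$1234{}cccgg.", "#")); // 22 '#' characters
console.log(maskExcept("ab${c3}\ncd", "*"));             // **${c3}***
```

The problem string from the question contains no complete ${cN} sequence, so every character in it is masked.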

Replace multiple characters by one character with regex

I have this string:
var str = '#this #is____#a test###__'
I want to replace every run of the characters # and _ with a single #, so the expected output is:
'#this #is#a test#'
Note: I don't know how many # or _ characters appear in a row in the string.
What I tried:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]/g,'#')
alert(str)
But the output was:
#this #is## ###a test#####
my try online
I tried to use * to match a sequence, but it did not work:
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]*/g,'#')
alert(str)
So how can I get my expected output?

A well written RegEx can handle your problem rather easily.
Quoting Mohit's answer as a starting point:
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_]+/g, '#');
console.log( formattedStr );
Line 2 stores in formattedStr the result of calling the replace method on str.
How does replace work? The first parameter is a string or a RegEx.
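Note: RegExps in JavaScript are objects of type RegExp, not strings. So writing
/yourRegex/
or
new RegExp('yourRegex')
is equivalent: the constructor's string has no surrounding slashes, and flags go in a second argument, e.g. new RegExp('yourRegex', 'g') for /yourRegex/g.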
Now let's discuss this particular RegEx itself.
The leading and trailing slashes are used to surround the pattern, and the g at the end means "globally" - do not stop after the first match.
The square brackets define a character class: any single one of the listed characters matches, while the + sign means "one or more of the preceding item".
Basically, ### will match, but also # or #####_# will, because _ and # belong to the same set.
If you want to collapse only runs of the same character instead, use a backreference: ([#_])\1*
This means "capture one # or _, then match any further copies of that same character".
So ___ would match, as well as #### would, but __## would be 2 distinct matches (the former being __, the latter ##). Note that (#|_)+ would not give this behavior: each repetition may choose either alternative, so __## would still be one match.
Another problem is not knowing whether to replace the pattern found with a _ or a #.
Luckily, using parentheses allows us to use a thing called capturing groups. You can store any part of the match in temporary variables that can be used in the replacement string.
Invoking them is easy: prepend $ to the position of the (parenthesized pattern).
/(foo)textnotgetting(bar)captured(baz)/ for example would fill the capturing groups "variables" this way:
$1 = foo
$2 = bar
$3 = baz
In our case, we want to replace each run with its single captured character, and the \1* repetition is not included in the parentheses, so $1 holds exactly one character.
So we can simply write
str = str.replace(/([#_])\1*/g, "$1");
to make it work. Note that the first argument must be a RegExp, not a string: passing "/([#_])\1*/g" in quotes would search for that exact text instead of applying a pattern.
Have a nice day!
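A side-by-side sketch on the question's test string makes the difference between these patterns concrete. The variable names are illustrative; the values follow from the rules described above (for a quantified group, $1 holds the last repetition).

```javascript
// Sketch comparing the patterns discussed in this answer.
const str = "#this #is__ __#a test###__";

// Collapse every run of # and _ (mixed or not) into a single '#'.
const allToHash = str.replace(/[#_]+/g, "#");

// Collapse only runs of the SAME character, keeping that character:
// ([#_]) captures one # or _, and \1* matches more copies of it.
const keepChar = str.replace(/([#_])\1*/g, "$1");

// (#|_)+ treats "__#" as one run, and $1 keeps only the LAST repetition.
const lastCapture = str.replace(/(#|_)+/g, "$1");

console.log(allToHash);   // #this #is# #a test#
console.log(keepChar);    // #this #is_ _#a test#_
console.log(lastCapture); // #this #is_ #a test_
```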

Your regex replaces each single matched character with the character you specified, i.e. #. You need to add the + quantifier so that a whole run of consecutive matching characters (_ or #) is replaced at once instead of each character individually: + means that one or more occurrences of the preceding pattern are matched in one go. You can read more about quantifiers on this page:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
var str = '#this #is__ __#a test###__';
var formattedStr = str.replace(/[#_]+/g, '#');
console.log( formattedStr );

You should use + to match one or more occurrences of the preceding character class.
var str = '#this #is__ __#a test###__'
str = str.replace(/[#_]+/g,'#')
alert(str)
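Why the * attempt from the question misbehaves can be shown on a tiny input. A sketch: * also accepts zero characters, so the global replace finds an empty match at every position where there is no # or _ and inserts a # there.

```javascript
// Sketch: * versus + in a global replace.
// [#_]* can match zero characters, so it also matches the empty string
// before "a", before "b", and at the end of the input.
const star = "ab".replace(/[#_]*/g, "#");
// [#_]+ needs at least one # or _, so there is nothing to replace here.
const plus = "ab".replace(/[#_]+/g, "#");

// On the first string from the question, + collapses each run to one '#'.
const fixed = "#this #is____#a test###__".replace(/[#_]+/g, "#");

console.log(star);  // #a#b#
console.log(plus);  // ab
console.log(fixed); // #this #is#a test#
```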

JS Regex: Remove anything (ONLY) after a word

I want to remove all occurrences of a symbol (which symbol depends on what I select at the time) directly after each word, without knowing what the word could be, but leave them in place before each word.
A couple of examples:
!!hello! my! !!name!!! is !!bob!! should return...
!!hello my !!name is !!bob ; for !
and
$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$ should return...
$remove the targetted# $$symbol# only $after a $word ; for $

You need to use capture groups and replace:
"!!hello! my! !!name!!! is !!bob!!".replace(/([a-zA-Z]+)(!+)/g, '$1');
This works for your test string. To make it work for any character or group of characters:
var stripTrailing = trail => {
  // (?:...) makes + repeat the whole trail, even when it is several characters long
  let regex = new RegExp(`([a-zA-Z0-9]+)((?:${trail})+)`, 'g');
  return str => str.replace(regex, '$1');
};
Note that this fails on any characters that have meaning in a regular expression: []{}+*^$. etc. Escaping those programmatically is left as an exercise for the reader.
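One way to do that exercise, as a sketch: `escapeRegExp` is a hypothetical helper that backslash-escapes every character with a special meaning in a pattern. The trail is wrapped in (?:...) so + repeats it as a unit even when it is more than one character.

```javascript
// Sketch: stripTrailing with the escaping step filled in.
// escapeRegExp is a hypothetical helper: it backslash-escapes every
// character that has a special meaning in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const stripTrailing = (trail) => {
  // (?:...)+ repeats the whole (escaped) trail, even if it is several chars long
  const regex = new RegExp(`([a-zA-Z0-9]+)(?:${escapeRegExp(trail)})+`, "g");
  return (str) => str.replace(regex, "$1");
};

console.log(stripTrailing("!")("!!hello! my! !!name!!! is !!bob!!"));
// !!hello my !!name is !!bob
console.log(stripTrailing("$")("$remove$ the$ targetted$# $$symbol$$# only $after$ a $word$"));
// $remove the targetted# $$symbol# only $after a $word
```

Without the escaping, a trail of "$" would become the end-of-input anchor, so the second call would leave the string unchanged.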
UPDATE
Per your comment I thought an explanation might help you, so:
First, there's no way in this case to replace only part of a match, you have to replace the entire match. So we need to find a pattern that matches, split it into the part we want to keep and the part we don't, and replace the whole match with the part of it we want to keep. So let's break up my regex above into multiple lines to see what's going on:
First we want to match any number of sequential alphanumeric characters, that would be the 'word' to strip the trailing symbol from:
( // denotes capturing group for the 'word'
[ // [] means 'match any character listed inside brackets'
a-z // list of alpha character a-z
A-Z // same as above but capitalized
0-9 // list of digits 0 to 9
]+ // plus means one or more times
)
The capturing group means we want to have access to just that part of the match.
Then we have another group
(
! // I used ES6's string interpolation to insert the arg here
+ // match that exclamation (or whatever) one or more times
)
Then we add the g flag so the replacement happens for every match in the target string; without the flag, only the first match is replaced. JavaScript provides a convenient shorthand for accessing the capturing groups in the replacement string: the '$1' above means 'insert the contents of the first capture group here'.
So, in the above, if you replaced '$1' with '$1$2' you'd see the same string you started with, if you did 'foo$2' you'd see foo in place of every word trailed by one or more !, etc.
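The claims in the last paragraph can be checked directly. A sketch using the single-character case; the replacer function is shown as an equivalent of '$1' that receives the groups as named parameters.

```javascript
// Sketch: effect of different replacement strings on the same matches.
const input = "!!hello! my! !!name!!! is !!bob!!";
const regex = /([a-zA-Z]+)(!+)/g;

const keepWord = input.replace(regex, "$1");    // drop the trailing !s
const unchanged = input.replace(regex, "$1$2"); // put both groups back
const fooed = input.replace(regex, "foo$2");    // replace the word only

// Function form: same as "$1", but the groups arrive as parameters.
const viaFunction = input.replace(regex, (whole, word, bangs) => word);

console.log(keepWord);  // !!hello my !!name is !!bob
console.log(unchanged); // !!hello! my! !!name!!! is !!bob!!
console.log(fooed);     // !!foo! foo! !!foo!!! is !!foo!!
```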

js regex - get string between expressions

I want to match a string in JavaScript, but take only part of the matched string.
E.g. I have the string 'my test string' and use the match function to get only 'test':
var text = 'my test string';
var matched = text.match(/(?=my).+(?=string)/g);
I tried to get it this way, but it returns 'my test'.
How can I get only 'test' with a regex?

You can use a capture group:
var match = text.match(/my (.*) string/);
// match[0] will be the whole match, match[1] the capture group
match[1];
This still matches the whole string, but you can get the group's contents with match[1]. Leave off the g flag here: with g, match() returns only the full matches, without the groups.
Some other regex engines have long had a feature called "lookbehind"; JavaScript only gained it in ES2018, so older engines don't support it, and I recommend the method with the capture group for portability.
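A sketch contrasting the capture-group approach with lookbehind, which has been part of JavaScript since ES2018; the lookbehind version assumes a modern engine such as current Node.js.

```javascript
// Sketch: two ways to extract just "test" from "my test string".
const text = "my test string";

// 1) Capture group with a non-global regex: match() returns the groups.
const viaGroup = text.match(/my (.*) string/)[1];

// 2) Lookbehind + lookahead (ES2018+; not available in older engines):
//    only the middle part is consumed, so match[0] is already "test".
const viaLookaround = text.match(/(?<=my ).*(?= string)/)[0];

console.log(viaGroup);      // test
console.log(viaLookaround); // test
```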

You need to change your regex to /my (.+) string/g. A regex literal already is a RegExp object, so wrapping it in new RegExp() is optional:
var regex = /my (.+) string/g;
Then use regex.exec(text) to get the capturing groups:
var matches = regex.exec(text);
matches will be an array with the value: ["my test string", "test"].
matches[0] is the whole match, and matches[1] is the first capturing group, i.e. whatever the .+ inside the parentheses matched.
You need the first group, so you can get it by writing matches[1]:
//This will return the string you want
var matched = matches[1];
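When the text contains several occurrences, String.prototype.matchAll (ES2020) collects every group without a manual exec() loop. A sketch: the input string is made up for illustration, and the lazy .+? keeps each match from running on to the last " string".

```javascript
// Sketch: with the g flag, matchAll (ES2020) yields every match together
// with its capture groups, avoiding the manual exec()/lastIndex loop.
const text = "my test string, my other string";

// .+? is lazy, so each match stops at the nearest " string".
const captured = [...text.matchAll(/my (.+?) string/g)].map((m) => m[1]);

console.log(captured); // [ 'test', 'other' ]
```

With a greedy .+ instead, the whole input would be one match and the group would be "test string, my other".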
