match doesn't return capturing group [duplicate] - javascript

This question already has answers here:
Regex doesn't omit quotes when matching text in quotes
(3 answers)
Closed 7 years ago.
I'm trying to apply a regular expression to a string, in search of any placeholders, but I can't figure out why the result is only the full match, not the capturing group.
//----HTML------------------------//
<p>Let's try plaintext.<br/>
bommdidumpidudidoo</p>
<p id="output"></p>
//----JS------------------------//
var s = $('p').html();
var matches = s.match( /.*(plaintext)/g );
write(matches);
write(matches[0]);
write(matches[1]);
//------- whatever -------//
function write(s) {
$('#output').html( $('#output').html() +'<br/>'+s );
}
// output:
// Let's try plaintext
// Let's try plaintext
// undefined
(I've used my custom function write instead of console.log so that the result would show up in the fiddle)
Why is the third line undefined? I don't understand!
I'm pretty sure the expression is right. I'm 100% certain that this is the right capturing group syntax for JavaScript, and that .match() returns an array with the full match first, then all capturing groups. I've even tested it with regex101.com, and over there I do get the capturing group.
It's not the same as this other problem because there, the OR logic was the crux of the problem and here the pipe | doesn't even come up.

Oooh! It's because I'm doing a global search, with the g modifier. I do get the expected result if I remove that modifier. Tss.
I've used that modifier in the first place in order to grab multiple placeholders, but I guess I can still while-loop that stuff …
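The while-loop idea mentioned above could look roughly like this. It is only a minimal sketch: the sample text and the {{name}} placeholder syntax are assumptions made up for illustration, since the original placeholders are not shown. It relies on RegExp.prototype.exec returning one match (with its groups) per call when the g flag is set.

```javascript
// Hypothetical input and placeholder syntax, chosen only for this example.
const text = "Hello {{first}}, meet {{second}}.";
const placeholder = /\{\{(\w+)\}\}/g;

const names = [];
let m;
// With the g flag, each exec() call resumes after the previous match
// and returns null once there are no more matches.
while ((m = placeholder.exec(text)) !== null) {
  names.push(m[1]); // m[0] is the full match, m[1] is the first capturing group
}

console.log(names); // ["first", "second"]
```

The regex has to keep the g flag here; without it, exec() would return the first match on every call and the loop would never end.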


NodeJS Regex match 4 digit number in string with 'OR' words [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 3 years ago.
I need to match a string with a year inside it. The template is 'Year-<4 digits>-"high OR low"-level'.
I've built this regex: /Year-\d{4}-\b(low|high)\b-level/gi;
In online regex testers my strings pass the check. Sample code:
const template = /Year-\d{4}-\b(low|high)\b-level/gi;
const txtArr = ['Year-2019-low-level', 'Year-2019-high-level', 'Year-low-level', 'Year-high-level', 'Year-2018-low-level', 'Year-2018-low-level']
for (const s of txtArr) {
console.log(template.test(s), s);
}
I expect 2 of the sample strings to fail and 4 to pass. But they don't: only 2 of them pass. They don't pass in the browser console either. I tried in FF and Chrome and can't understand why.
Also, if I copy the string that is not passing the match and just make
console.log(template.test('Year-2018-low-level'), 'Year-2018-low-level');
it passes! I've got only one idea: looks like in every iteration of loop something is not reset in regex, and it keeps something in memory, that is not letting match pass.
P.S. I even copied the same string which must pass the test to array, like that:
const txtArr = ['Year-2019-low-level', 'Year-2019-low-level', 'Year-2019-low-level', 'Year-2019-low-level', 'Year-2019-low-level', 'Year-2019-low-level']
and the results are true-false-true-false-true... Why? And how to fix?
I found an explanation here: https://siderite.dev/blog/careful-when-reusing-javascript-regexp.html
"The moral of the story is to be careful of constructs like _reg.test(input);
when _reg is a global regular expression. It will attempt to match from the index of the last match in any previous string."
So the problem comes from the way the global (g) flag is treated.
The author of the blog also describes the very same problem you have:
"Here is a case that was totally weird. Imagine a javascript function that returns an array of strings based on a regular expression match inside a for loop. In FireFox it would return half the number of items that it should have."
To avoid this problem, you can either drop the global flag or instantiate a new regex at each iteration:
const txtArr = ['Year-2019-low-level', 'Year-2019-high-level', 'Year-low-level', 'Year-high-level', 'Year-2018-low-level', 'Year-2018-low-level']
for (const s of txtArr) {
console.log(/Year-\d{4}-\b(low|high)\b-level/gi.test(s), s);
}
An alternative is to use !!s.match(template) instead of template.test(s), so you don't need to modify your regex.
Working example: https://codesandbox.io/s/zen-carson-z9cq6
An explanation to the weird behavior:
The RegExp object keeps track of the lastIndex where a match occurred,
so on subsequent matches it will start from the last used index,
instead of 0.
from this StackOverflow question: Why does a RegExp with global flag give wrong results?
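To make the lastIndex behaviour visible, here is a small sketch that records lastIndex before and after each test() call. The variable names (inputs, trace) are made up for the example, and the regex is the simplified one without \b.

```javascript
const template = /Year-\d{4}-(low|high)-level/gi;
const inputs = ["Year-2019-low-level", "Year-2019-low-level", "Year-2019-low-level"];

const trace = [];
for (const s of inputs) {
  const before = template.lastIndex; // where this test() call will start searching
  const ok = template.test(s);
  trace.push({ before, ok, after: template.lastIndex });
}

console.log(trace);
// [ { before: 0,  ok: true,  after: 19 },
//   { before: 19, ok: false, after: 0 },
//   { before: 0,  ok: true,  after: 19 } ]
```

Each string is 19 characters long, so after a successful match the next search starts at the very end of the next string, fails, and resets lastIndex to 0. That produces the true-false-true pattern.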
I changed your regex and it's working with this one (no g flag, so lastIndex is not tracked between calls):
const template = /Year-\d{4}-(low|high)-level/

How do I insert something at a specific character with Regex in Javascript [duplicate]

This question already has answers here:
Simple javascript find and replace
(6 answers)
Closed 5 years ago.
I have the string "foo?bar" and I want to insert "baz" at the ?. The ? may not always be at index 3, so I always want to insert a string at this ? character, to get "foo?bazbar".
The String.prototype.replace method is perfect for this.
Example
let result = "foo?bar".replace(/\?/, '?baz');
alert(result);
I have used a RegEx in this example as requested, although you could do it without RegEx too.
Additional notes.
If you expect the string "foo?bar?boo" to result in "foo?bazbar?boo" the above code works as-is
If you expect the string "foo?bar?boo" to result in "foo?bazbar?bazboo" you can change the call to .replace(/\?/g, '?baz')
You don't need a regular expression, since you're not matching a pattern, just ordinary string replacement.
const string = 'foo?bar';
// With a string pattern, replace() only replaces the first occurrence.
const newString = string.replace('?', '?baz');
console.log(newString);

Ignore regex capturing group when using javascript split [duplicate]

This question already has answers here:
JavaScript string split by Regex results sub-strings include empty slices
(2 answers)
Closed 4 years ago.
I'm trying to split up a string into an array, and I'm looking to get back an array with the following format: ['a','b', 'c']
const code = "/*_ ex1.js */a/*_ ex2.js */b/*_ ex3.js */c"
code.split(/\/\*_.+?\*\//)
=> (This is what I want)
['a','b', 'c']
But when I try to ensure that the regex works with new lines
code.split(/\/\*_(.|\s)+?\*\//)
=>(Not what I want)
[' ', 'a', ' ', 'b', ' ', 'c']
I have no idea where these extra spaces are coming from. It obviously has something to do with the bracketed capturing group, but I don't understand how to get around that.
split includes the contents of any capturing group in the output. From MDN:
If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.
In your case, the (.|\s) is a capturing group. Therefore, spaces are getting included in your output. The easiest way around this is to make it a non-capturing group with ?::
code.split(/\/\*_(?:.|\s)+?\*\//)
                  ^^
This still leaves you with an initial empty string in the resulting array. (Your initial, non-multi-line version also behaves that way.) There is no way around that, since your splitter is coming right at the beginning of the string, and so the token to the left is an empty string. If you want to get rid of that, you could filter it out:
.filter(Boolean)
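Putting the two suggestions together, a minimal sketch might look like this. The input is the question's string with a newline added inside the second comment (an assumption, to exercise the multi-line case).

```javascript
// Same shape as the question's input, with a newline inside one comment.
const code = "/*_ ex1.js */a/*_ ex2.js\n */b/*_ ex3.js */c";

const parts = code
  .split(/\/\*_(?:.|\s)+?\*\//) // (?:...) groups without capturing, so nothing extra is spliced in
  .filter(Boolean);             // drops the leading empty string

console.log(parts); // ["a", "b", "c"]
```

Note that filter(Boolean) removes every empty string, so it would also drop a segment that is legitimately empty between two adjacent comments.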
Try using String.prototype.match() with RegExp /[a-z](?=\/|\n|$)/g to match character class a through z followed by / character or newline character or end of input
const code = "/*_ ex1.js */a/*_ ex2.js */b/*_ ex3.js */c\n"
+ "/*_ ex4.js */d/*_ ex5.js */e/*_ ex6.js */f";
var res = code.match(/[a-z](?=\/|\n|$)/g);
console.log(res);

Javascript regex: why groups with one syntax, and not the other

I just spent a couple of hours wondering why a regular expression that I thought I understood, wasn't giving me the results I expected.
Consider these two ways of using the same regular expression:
var str="This will put us on the map!"
var a=str.match(/(?:\bwill\W+)(\w+)(\W+)/g)
alert(a[0]) //will put
alert(a[1]) //undefined
var regex=/(?:\bwill\W+)(\w+)(\W+)/g
var match = regex.exec(str)
alert(match[0]) //will put
alert(match[1]) //put
Obviously, the latter form is working properly; but what's wrong with the former?
Also, for thoroughness:
var re = new RegExp("(?:\\bwill\\W+)(\\w+)(\\W+)","g")
var rematch = re.exec(str)
alert(rematch[0]) //will put
alert(rematch[1]) //put
When I was searching here, I came across this question ("Javascript Regex Missing Groups") which claims that the g flag was causing the problem. However, that is clearly not the problem here, since the RE is exactly the same in the two cases, the only difference is how it's executed.
Thanks for your help!
Edit: The responses below do an excellent job of clearing this up. One thing I learned from this that I'd like to make clear for the record, is that the re.exec() method can be used to get all the matches, and it can also be used to get all the groups, but the way of accessing those two modes is somewhat subtle: With or without the g flag, the return value is always an array with the full match followed by the match groups. It is never an array containing multiple matches. The way to access multiple matches is to call the exec() method again on the same RegExp object.
It was mystifying to me why I was unable to answer this question myself with several hours of Google searching. The behavior in question is described in the documentation of string.match() and RegExp.exec(), although it was not described in a way that made those come up with any of the search strings related to the way I was experiencing the problem. So, for reference, I'm linking those here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
The difference is indeed with the g modifier. When used together with .match() it will yield all values of $0 for each match that was found.
For example:
> "This will put us on the map!".match(/(\w+)/g)
["This", "will", "put", "us", "on", "the", "map"]
But:
> "This will put us on the map!".match(/(\w+)/)
["This", "This"]
With the g flag, string.match returns only the full matches ($0) and leaves off all subexpressions.
Without g, it returns the first match followed by its groups. That is why "This" appears twice above: once as the full match and once as group 1.
Whereas regex.exec with the g modifier is meant to be used in loops, and the groups are retained in the array on each call.
To put it simply:
.match() with g gives you every full match, but no groups.
.exec() gives you one match at a time, together with its groups.
So use .match with g when you only want the matches, and use .exec when you want the groups.
Here's some perspective to clarify the point the others have made:
var str="This will put us on the map will do this!"
var a=str.match(/(?:\bwill\W+)(\w+)(\W+)/g)
console.log(a);
gives you this:
["will put ", "will do "]
So when you have the g modifier, it only does a global match for the full pattern which would normally be $0.
It's not like e.g. php that gives you a multi-dim array of full pattern matches and grouped matches e.g.
preg_match_all('~(?:\bwill\W+)(\w+)(\W+)~',$string,$matches);
Array
(
[0] => Array
(
[0] => will put
[1] => will do
)
[1] => Array
(
[0] => put
[1] => do
)
[2] => Array
(
[0] =>
[1] =>
)
)
In JavaScript, you only ever get a single-dim array from these two methods. With the g modifier, .match gives you one element per full-pattern match; without it, element 0 is the full-pattern match and elements 1+ are the groups. .exec only ever does the latter. Neither one of them will give you a multi-dim array with both, like in my PHP example.
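If a nested result closer to the PHP output is wanted, newer engines (ES2020 and later) also offer String.prototype.matchAll, which yields one exec-style result per match. A rough sketch, where the rows name is just for illustration:

```javascript
const str = "This will put us on the map will do this!";
const re = /(?:\bwill\W+)(\w+)(\W+)/g; // matchAll requires the g flag

// One row per match: [full match, group 1, group 2]
const rows = Array.from(str.matchAll(re), (m) => [m[0], m[1], m[2]]);

console.log(rows);
// [ ["will put ", "put", " "], ["will do ", "do", " "] ]
```

This groups the data by match, whereas preg_match_all's default output groups it by capture group.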

Why `pattern.test(name)` opposite results on consecutive calls [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 7 years ago.
Why is this code returning first true, then false
var pattern = new RegExp("mstea", 'gi'), name = "Amanda Olmstead";
console.log('1', pattern.test(name));
console.log('1', pattern.test(name));
g is for repeating searches. It turns the regular expression object into an iterator. If you want to use the test function to check that your string is valid according to your pattern, remove this modifier:
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
The test function, contrary to replace or match, doesn't consume the whole iteration, which leaves the regex in a "bad" state. You should probably never use this modifier when using the test function.
You don't want to use gi in combination with pattern.test. The g flag means that it keeps track of where you are running so it can be reused. So instead, you should use:
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
console.log('1', pattern.test(name));
console.log('1', pattern.test(name));
Also, you can use /.../[flags] syntax for regex, like so:
var pattern = /mstea/i;
Because you set the g modifier.
Remove it for your case.
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
It isn't a bug.
The g causes it to carry out the next attempted match for the substring, after the first match. And that is why it returns false in every even attempt.
First attempt:
It is testing "Amanda Olmstead"
Second attempt:
It is testing "d" //match found in previous attempt (performs substring there)
Third attempt:
It is testing "Amanda Olmstead" again //no match found in previous attempt
... so on
The MDN page for RegExp.exec states:
If your regular expression uses the "g" flag, you can use the exec
method multiple times to find successive matches in the same string.
When you do so, the search starts at the substring of str specified by
the regular expression's lastIndex property
MDN page for test states:
As with exec (or in combination with it), test called multiple times
on the same global regular expression instance will advance past the
previous match.
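The attempts described above can be traced by reading lastIndex after each call. This is a small sketch, not part of the original answers; the variable names are made up, and the manual reset at the end is just one option if the g flag has to stay.

```javascript
const pattern = /mstea/gi;
const name = "Amanda Olmstead";

const first = pattern.test(name);           // finds "mstea" at index 9
const indexAfterFirst = pattern.lastIndex;  // 14, just past the match
const second = pattern.test(name);          // searches from 14, where only "d" is left
const indexAfterSecond = pattern.lastIndex; // back to 0 after the failed attempt

// If the g flag must stay, resetting lastIndex by hand before each
// test() makes the call start from the beginning again.
pattern.test(name);    // leaves lastIndex at 14
pattern.lastIndex = 0; // manual reset
const afterReset = pattern.test(name);

console.log(first, indexAfterFirst, second, indexAfterSecond, afterReset);
// true 14 false 0 true
```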
