Why is Javascript Regex matching every second time? - javascript

I'm defining a regex object and then matching it in a loop. It only matches sometimes, to be precise, every second time. So I created a minimal working sample of this problem.
I tried this code in Opera and Firefox. The behavior is the same in both:
>>> domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g;
/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
Why is this happening? Is this behaviour documented? Is there a way around, other than defining the regex inside loop body?

exec() works in the manner you have described; with the /g modifier present, it will return a match, starting from lastIndex with every invocation until there are no more matches, at which point it returns null and the value of lastIndex is reset to 0.
However, because you have anchored the expression using $ there won't be more than one match, so you can use String.match() instead and lose the /g modifier:
var domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/;
'mail-we0-f174.google.com'.match(domainRegex); // [".google.com", "google.com"]

Additional info to Ja͢ck's response:
You can also reset lastIndex manually after each call:
var myRgx = /test/g;
myRgx.exec(someString);
myRgx.lastIndex = 0;
or just create a new regex for each execution, which I find even cleaner:
new RegExp(myRgx).exec(someString);
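Both workarounds can be sketched against the domain regex from the question. The helper names `extractWithReset` and `extractWithCopy` are made up for illustration and are not part of any API.

```javascript
// Shared global regex, as in the question.
const domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g;

// Workaround 1 (hypothetical helper): reset lastIndex before every call.
function extractWithReset(host) {
  domainRegex.lastIndex = 0;
  const m = domainRegex.exec(host);
  return m ? m[1] : null;
}

// Workaround 2 (hypothetical helper): exec on a fresh copy.
// A newly constructed RegExp always starts with lastIndex 0.
function extractWithCopy(host) {
  const m = new RegExp(domainRegex).exec(host);
  return m ? m[1] : null;
}

const host = 'mail-we0-f174.google.com';
const viaReset = [extractWithReset(host), extractWithReset(host)];
const viaCopy = [extractWithCopy(host), extractWithCopy(host)];
console.log(viaReset, viaCopy); // both calls succeed in each pair
```

Resetting lastIndex avoids allocating a new object on every call; the copy avoids touching shared state at all.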

When performing a global search with a RegExp, the exec method starts matching beginning at
the lastIndex property. The lastIndex property is set at each exec invocation and
is set to the position following the last match found. If a match fails, lastIndex is reset to 0, which causes exec to match from the start again.
var a = 'asdfeeeasdfeedxasdf'
undefined
var p = /asdf/g
p.exec(a)
["asdf"]
p.lastIndex
4
p.exec(a)
["asdf"]
p.lastIndex
11
p.exec(a)
["asdf"]
p.lastIndex
19
p.exec(a)
null //match failed
p.lastIndex
0 //lastIndex reset. next match will start at the beginning of the string a
p.exec(a)
["asdf"]

Each time you run the exec method of your regex it gets you the next match.
Once it reaches the end of the string, it returns null to let you know you've got all of the matches. The next time, it starts again from the beginning.
You only have one match (which returns an array containing the full match and the match from the parentheses). The first time, the regex starts searching from the start, finds the match, and returns it. The next time, it continues to the end of the string and returns null. So if you had this in a loop, you could do something like this to loop through all matches:
var match;
while ((match = regExpression.exec(string)) !== null) {
// do something with match
}
Then the next time, it starts again from position 0.
"Is there a way around?"
Well, if you know there's only one match, or you only want the first match, you can save the result to a variable. There's no need to reuse .exec. If you are interested in all the matches, then you need to keep going until you get null.
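As a rough sketch of "keep going until you get null", the loop can be wrapped in a small helper. The name `allMatches` is hypothetical, and the sketch assumes the regex has the g flag and never matches the empty string.

```javascript
// Hypothetical helper: collect every match of a global regex in `text`.
function allMatches(re, text) {
  if (!re.global) throw new Error('allMatches expects a regex with the g flag');
  re.lastIndex = 0;               // start from the beginning regardless of earlier use
  const found = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[0]);             // m[0] is the full match
  }
  return found;                   // exec just returned null, so lastIndex is 0 again
}

const words = /[a-z]+/g;
const first = allMatches(words, 'one two three');
const second = allMatches(words, 'four five');
console.log(first, second, words.lastIndex);
```

Because the loop always runs until exec returns null, the same regex object can be reused on the next string without a stale lastIndex.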

Why not simply use the string's match method, like this:
'mail-we0-f174.google.com'.match(/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/)

Related

Why JS Regexp.exec returns an array with more elements than expected?

I'm attempting to regex match various duration strings (e.g. 1d10h, 30m, 90s, etc.) and have come up with a regex string to split the string into pieces, but it seems that I'm getting two undefined results at the ends that shouldn't be there. I imagine it has to do with the greedy matching via the ? groupings, but I'm not sure how to fix it.
My code looks like this:
const regex = /^(\d+?[d])?(\d+?[h])?(\d+[m])?(\d+[s])?$/gmi
const results = regex.exec('1d10h')
and the results I get look like so:
[
"1d10h",
"1d",
"10h",
undefined,
undefined,
]
I was only expecting the first three results (and in fact, I only really want 1d and 10h) but the two remaining undefined results keep popping up.
You have 4 groups in the regular expression, each enclosed in parentheses ( ... ) and numbered in order: the earlier a group's opening parenthesis appears in the expression, the lower its index.
And, of course, there is the whole match, which can be thought of as group zero.
So, result of regex.exec('1d10h') contains 5 items:
results[0] - the whole expression match
results[i] - match of each group, i in {1,2,3,4}
Since in this case each group is optional (followed by ?) - it is allowed to have undefined in place of any unmatched group.
It is easy to see that if you remove a ? symbol after an unmatched group, the whole expression will fail to match and hence regex.exec('1d10h') will return null.
To get rid of undefined elements just filter them out:
const result = regex.exec('1d10h').filter(x => x);
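A slightly more defensive variant is sketched below. It assumes the input is a single duration string, so the g and m flags are dropped, and it also skips the whole-match entry at index 0. The function name `durationParts` is invented for this example.

```javascript
// Same pattern as in the question, minus the g and m flags (an assumption:
// the input is one duration string, so no lastIndex state is needed).
const durationRegex = /^(\d+?[d])?(\d+?[h])?(\d+[m])?(\d+[s])?$/i;

// Hypothetical helper: return only the groups that actually matched.
function durationParts(input) {
  const match = durationRegex.exec(input);
  if (match === null) return [];          // exec returns null when nothing matches
  // slice(1) drops the whole-match entry; the filter drops unmatched groups.
  return match.slice(1).filter(part => part !== undefined);
}

const parts = durationParts('1d10h');
const none = durationParts('abc');
console.log(parts, none);
```

Checking for null first matters because calling filter directly on the result of exec throws when the string does not match.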

Eloquent Javascript Looping over RegExp Matches

The following example is a bit confusing to me:
var text = "A string with 3 numbers in it ... 42 and 88.";
var number = /\b(\d+)\b/g;
var match;
while (match = number.exec(text)){
console.log("Found", match[1], "at", match.index);
}
Specifically, I don't understand how this has a "looping" effect. How does it run through all the matches within one string if it keeps calling match[1]. Is there some kind of side effect with exec that I am unaware of?
Edit:
I still would like an answer to how match[1] is working.
How does match[1] produce any answer? When I test this type of thing myself, I get undefined, look
> var y = /\d+/g.exec('5')
undefined
> y
[ '5', index: 0, input: '5' ]
> y[1]
undefined
What's going on here? Wouldn't it be y[0], or in the case above, match[0]? Like:
> y[0]
'5'
The RegExp object remembers the last matched position with lastIndex property.
Quoting MDN Documentation,
If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).
Important note: the first part of the quoted section matters: "If your regular expression uses the "g" flag". You get this behavior only if the regex has the g flag.
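On the match[1] part of the question: index 0 of the result holds the whole match, and indexes 1 and up hold the capture groups, so match[1] only exists when the pattern contains parentheses. A small sketch with made-up sample strings:

```javascript
// Two capture groups: a letter and the digits that follow it.
const re = /([a-z])(\d+)/g;
const text = 'a1 b22';
const seen = [];
let match;
while ((match = re.exec(text)) !== null) {
  // match[0] = whole match, match[1] = first group, match[2] = second group
  seen.push({ whole: match[0], letter: match[1], digits: match[2], index: match.index });
}

// No parentheses in the pattern, so there is no group 1.
const noGroup = /\d+/.exec('5');
console.log(seen, noGroup[0], noGroup[1]);
```

In the book's example the pattern is /\b(\d+)\b/g, so match[0] and match[1] happen to contain the same text; in the console test the pattern /\d+/g has no group, which is why y[1] is undefined.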

Cutting strings

I'm following the exercises in the book Eloquent JavaScript and came up to this piece of code:
function between(string, start, end)
{
// Start right after the end of the start string
var startAt = string.indexOf(start) + start.length;
// Find the position of the end string's first character, searching from startAt
var endAt = string.indexOf(end, startAt);
return string.slice(startAt, endAt);
}
var betweenIn = between('Inazuma Eleven', 'Ina', 'ven');
console.log(betweenIn);
Code works fine. It extracts the piece of the string between the two given substrings. Now I tried to really understand this piece, but one thing isn't clear to me. The variable endAt holds the position of the first character of the third parameter (in this case my string is 'Inazuma Eleven' and the parameter is 'ven'). I need that for slicing the string, BUT it seems that the second parameter of the indexOf method doesn't do anything. If I remove it, I get the same results. Why is this?
The second parameter of indexOf defaults to 0. This is the place in the string where it will start looking for your matching substring.
Starting after the end of the start string ensures that a) your end string doesn't match the first instance of it if the start string and end string are identical, and b) you have to scan less of the target string so the code runs faster.
In this instance, your start and end strings are different, so the outcome is the same. However since the indexOf method will be searching more of the string (starting from 0 instead of the 4th character) it will run fractionally slower.
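Point a) is easier to see with identical start and end strings. The sketch below uses a made-up input where both delimiters are a double quote:

```javascript
const text = '"quoted" text';
const quote = '"';

// Position just after the opening quote.
const startAt = text.indexOf(quote) + quote.length;

// With the second argument, the search skips the opening quote.
const endWithOffset = text.indexOf(quote, startAt);
// Without it, indexOf finds the opening quote again.
const endWithoutOffset = text.indexOf(quote);

const good = text.slice(startAt, endWithOffset);
const bad = text.slice(startAt, endWithoutOffset);
console.log(JSON.stringify(good), JSON.stringify(bad));
```

Without the offset, the end index lands before the start index, and slice returns an empty string.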

Javascript RegExp Producing Unusual Results with Global Modifer [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 9 years ago.
I was able to work around this issue as it turned out I didn't need /g. But I was wondering if anyone would be able to explain why the following behavior occurred.
x = RegExp( "w", "gi" )
x.test( "Women" )
= true
x.test( "Women" )
= false
It would continue to alternate between true and false when evaluating the expression. Which was an issue because I was using the same compiled RegExp on a list of strings, leading some to evaluate to false when they should have been true.
You should not use the global modifier in a regex used for test, because the regex preserves the index of the last match and starts the next test from there.
I'd asked the same question.
When you use the g flag, the regex stores the end position of the match in its lastIndex property. The next time you call test() or exec() on that regex, it starts from that index in the string to try to find a match. (String.prototype.match() with a global regex does not resume this way; it resets lastIndex to 0 and returns all matches.)
When no match is found, exec() returns null (test() returns false), and lastIndex is reset to 0. This is why your test kept alternating. It would match the W, and then lastIndex would be set to 1. The next time you called it, no match would be found from index 1, so false would be returned and lastIndex would be reset.
A pitfall related to this is when your regex can match the empty string. In that case, lastIndex will not change, and if you are getting all matches, there will be an infinite loop. In this case you should manually adjust lastIndex if it matched the empty string.
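One way to handle the empty-match pitfall is sketched here. The pattern and input are made up; the only point is the manual lastIndex bump.

```javascript
const digits = /\d*/g;        // \d* can match the empty string
const input = 'a12b';
const found = [];
let m;
while ((m = digits.exec(input)) !== null) {
  if (m[0] === '') {
    // An empty match leaves lastIndex where it was; step forward by hand,
    // otherwise the loop would return the same empty match forever.
    digits.lastIndex++;
  } else {
    found.push(m[0]);
  }
}
console.log(found, digits.lastIndex);
```

The loop ends once lastIndex moves past the end of the string, at which point exec returns null and resets lastIndex to 0.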
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp/test
As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
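Essentially, the RegExp object x keeps track of its last match internally. When you call .test again, it attempts to match starting after the "w".
Of course, this is only true when the same regex object instance is reused. Each regex literal typed at the console below creates a new object, so both calls start at index 0: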
> /w/gi.test('Women')
true
> /w/gi.test('Women')
true
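The situation from the question, one regex reused over a list of strings, can be reproduced with a short sketch. The list of names is invented for the example.

```javascript
const names = ['Women', 'Wombat', 'Willow'];

const globalRe = /w/gi;   // keeps lastIndex between calls
const plainRe = /w/i;     // no g flag, so every test starts at index 0

const withGlobal = names.map(name => globalRe.test(name));
const withPlain = names.map(name => plainRe.test(name));
console.log(withGlobal, withPlain);
```

With the g flag, the second string is searched from index 1 (left over from the first match) and fails, even though it starts with "W".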

Why does Javascript's regex.exec() not always return the same value? [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 6 years ago.
In the Chrome or Firebug console:
reg = /ab/g
str = "abc"
reg.exec(str)
==> ["ab"]
reg.exec(str)
==> null
reg.exec(str)
==> ["ab"]
reg.exec(str)
==> null
Is exec somehow stateful and depends on what it returned the previous time? Or is this just a bug? I can't get it to happen all the time. For example, if 'str' above were "abc abc" it doesn't happen.
A JavaScript RegExp object is stateful.
When the regex is global, if you call a method on the same regex object, it will start from the index past the end of the last match.
When no more matches are found, the index is reset to 0 automatically.
To reset it manually, set the lastIndex property.
reg.lastIndex = 0;
This can be a very useful feature. You can start the evaluation at any point in the string if desired, or if in a loop, you can stop it after a desired number of matches.
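Both uses can be sketched with one small function. The name `firstNumbers` and its parameters are hypothetical; the regex and string follow the style of the examples in this answer.

```javascript
// Hypothetical helper: collect at most `limit` numbers, searching from `startAt`.
function firstNumbers(re, str, startAt, limit) {
  const results = [];
  re.lastIndex = startAt;           // begin the search at an arbitrary position
  let match;
  while (results.length < limit && (match = re.exec(str)) !== null) {
    results.push(Number(match[1]));
  }
  re.lastIndex = 0;                 // leaving early leaves state behind, so reset it
  return results;
}

const fooRe = /foo_(\d+)/g;
const sample = 'text foo_123 more text foo_456 foo_789 end text';

const afterMore = firstNumbers(fooRe, sample, sample.indexOf('more'), 1);
const fromStart = firstNumbers(fooRe, sample, 0, 2);
console.log(afterMore, fromStart);
```

The reset at the end is the important design choice: when a loop stops before exec returns null, lastIndex still points into the old string.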
Here's a demonstration of a typical approach to using the regex in a loop. It takes advantage of the fact that exec returns null when there are no more matches by performing the assignment as the loop condition.
var re = /foo_(\d+)/g,
str = "text foo_123 more text foo_456 foo_789 end text",
match,
results = [];
while (match = re.exec(str))
results.push(+match[1]);
DEMO: http://jsfiddle.net/pPW8Y/
If you don't like the placement of the assignment, the loop can be reworked, like this for example...
var re = /foo_(\d+)/g,
str = "text foo_123 more text foo_456 foo_789 end text",
match,
results = [];
do {
match = re.exec(str);
if (match)
results.push(+match[1]);
} while (match);
DEMO: http://jsfiddle.net/pPW8Y/1/
From MDN docs:
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test will also advance the lastIndex property).
Since you are using the g flag, exec continues from the last matched string until it gets to the end (returns null), then starts over.
Personally, I prefer to go the other way around with str.match(reg)
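A brief sketch of what str.match(reg) does with a global regex, using a made-up string. Unlike exec, it does not resume from lastIndex, and it returns only the matched substrings, without capture groups.

```javascript
const abRe = /ab/g;
const subject = 'abc abc';

abRe.lastIndex = 5;                      // simulate state left over from earlier calls
const all = subject.match(abRe);         // match() starts from index 0 anyway
const indexAfter = abRe.lastIndex;       // back to 0 once match() is done
const nothing = 'xyz'.match(abRe);       // null when there is no match at all
console.log(all, indexAfter, nothing);
```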
Multiple Matches
If your regex needs the g flag (global match), you will need to reset the index (the position after the last match) by using the lastIndex property.
reg.lastIndex = 0;
This is because exec() stops at each occurrence so that you can run it again on the remaining part of the string. This behavior also exists with test():
If your regular expression uses the "g" flag, you can use the exec
method multiple times to find successive matches in the same string.
When you do so, the search starts at the substring of str specified by
the regular expression's lastIndex property (test will also advance
the lastIndex property)
Single Match
When there is only one possible match, you can simply rewrite your regex by omitting the g flag; without it, lastIndex is ignored and every search starts at index 0.
