Javascript RegExp Producing Unusual Results with Global Modifer [duplicate] - javascript

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 9 years ago.
I was able to work around this issue as it turned out I didn't need /g. But I was wondering if anyone would be able to explain why the following behavior occurred.
x = RegExp( "w", "gi" )
x.test( "Women" )
= true
x.test( "Women" )
= false
It would continue to alternate between true and false when evaluating the expression. Which was an issue because I was using the same compiled RegExp on a list of strings, leading some to evaluate to false when they should have been true.

You should not be using global modifier in a regex used for test, because it preserves the index of the last search and starts the next test from there.
I'd asked the same question.

When you use the g flag, the regex stores the end position of the match in its lastIndex property. The next time you call any of test(), exec(), or match(), the regex will start from that index in the string to try and find a match.
When no match is found, it will return null, and lastIndex is reset to 0. This is why your test kept alternating. It would match the W, and then lastIndex would be set to 1. The next time you called it, null would be returned, and lastIndex would be reset.
A pitfall related to this is when your regex can match the empty string. In that case, lastIndex will not change, and if you are getting all matches, there will be an infinite loop. In this case you should manually adjust lastIndex if it matched the empty string.

https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp/test
As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
Essentially, the RegExp object x keeps track of its last match internally. When you call .test again, it attempts to match starting after the "w"
Of course this is only true of a regex object instance.
> /w/gi.test('Women')
true
> /w/gi.test('Women')
true

Related

javascript regular expression error in test function? [duplicate]

What is the meaning of the g flag in regular expressions?
What is is the difference between /.+/g and /.+/?
g is for global search. Meaning it'll match all occurrences. You'll usually also see i which means ignore case.
Reference: global - JavaScript | MDN
The "g" flag indicates that the regular expression should be tested against all possible matches in a string.
Without the g flag, it'll only test for the first.
Additionally, make sure to check cchamberlain's answer below for details on how it sets the lastIndex property, which can cause unexpected side effects when re-using a regex against a series of values.
Example in Javascript to explain:
> 'aaa'.match(/a/g)
[ 'a', 'a', 'a' ]
> 'aaa'.match(/a/)
[ 'a', index: 0, input: 'aaa' ]
As #matiska pointed out, the g flag sets the lastIndex property as well.
A very important side effect of this is if you are reusing the same regex instance against a matching string, it will eventually fail because it only starts searching at the lastIndex.
// regular regex
const regex = /foo/;
// same regex with global flag
const regexG = /foo/g;
const str = " foo foo foo ";
const test = (r) => console.log(
r,
r.lastIndex,
r.test(str),
r.lastIndex
);
// Test the normal one 4 times (success)
test(regex);
test(regex);
test(regex);
test(regex);
// Test the global one 4 times
// (3 passes and a fail)
test(regexG);
test(regexG);
test(regexG);
test(regexG);
g is the global search flag.
The global search flag makes the RegExp search for a pattern throughout the string, creating an array of all occurrences it can find matching the given pattern.
So the difference between /.+/g and /.+/ is that the g version will find every occurrence instead of just the first.
There is no difference between /.+/g and /.+/ because they will both only ever match the whole string once. The g makes a difference if the regular expression could match more than once or contains groups, in which case .match() will return an array of the matches instead of an array of the groups.
g -> returns all matches
without g -> returns first match
example:
'1 2 1 5 6 7'.match(/\d+/) returns ["1", index: 0, input: "1 2 1 5 6 7", groups: undefined]. As you see we can only take first match "1".
'1 2 1 5 6 7'.match(/\d+/g) returns an array of all matches ["1", "2", "1", "5", "6", "7"].
Beside already mentioned meaning of g flag, it influences regexp.lastIndex property:
The lastIndex is a read/write integer property of regular expression
instances that specifies the index at which to start the next match.
(...) This property is set only if the regular expression instance
used the "g" flag to indicate a global search.
Reference: Mozilla Developer Network
G in regular expressions is a defines a global search, meaning that it would search for all the instances on all the lines.
Will give example based on string. If we want to remove all occurences from a
string.
Lets say if we want to remove all occurences of "o" with "" from "hello world"
"hello world".replace(/o/g,'');
In my case i have a problem that i need to reevaluate string each time from the first letter, for this a have to remove /my_regexp/g(global flag) to stop moving lastIndex.
as mentioned in mdn:
Be sure that the global (g) flag is set, or lastIndex will never be advanced.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec#specifications

Why is Javascript Regex matching every second time?

I'm defining a regex object and then matching it in a loop. It only matches sometimes, to be precise - every second time. So I created a smallest working sample of this problem.
I tried this code in Opera and Firefox. The behavior is the same in both:
>>> domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g;
/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
Why is this happening? Is this behaviour documented? Is there a way around, other than defining the regex inside loop body?
exec() works in the manner you have described; with the /g modifier present, it will return a match, starting from lastIndex with every invocation until there are no more matches, at which point it returns null and the value of lastIndex is reset to 0.
However, because you have anchored the expression using $ there won't be more than one match, so you can use String.match() instead and lose the /g modifier:
var domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/;
'mail-we0-f174.google.com'.match(domainRegex); // [".google.com", "google.com"]
Additional Info to Ja͢cks response:
You can also set lastIndex
var myRgx = /test/g;
myRgx.exec(someString);
myRgx.lastIndex = 0;
or just create a new regex for each execution, which i find even cleaner
new RegExp(myRgx).exec(someString);
When performing a global search with a RegExp, the exec method starts matching beginning at
the lastIndex property. The lastIndex property is set at each exec invocation and
is set to the position following the last match found. If a match fails, lastIndex is reset to 0, which causes exec to match from the start again.
var a = 'asdfeeeasdfeedxasdf'
undefined
var p = /asdf/g
p.lastIndex
4
p.exec(a)
["asdf"]
p.lastIndex
11
p.exec(a)
["asdf"]
p.lastIndex
19
p.exec(a)
null //match failed
p.lastIndex
0 //lastIndex reset. next match will start at the beginning of the string a
p.exec(a)
["asdf"]
Each time you run the exec method of your regex it gets you the next match.
Once it reaches the end of the string, it returns null to let you know you've got all of the matches. The next time, it starts again from the begining.
As you only have one match (which returns an array of the full match and the match from the brackets), The first time, the regex starts searching from the start. It finds a match and returns it. The next time, it gets to the end and returns null. So if you had this in a loop, you could do something like this to loop through all matches:
while(regExpression.exec(string)){
// do something
}
Then the next time, it starts again from position 0.
"Is there a way around?"
Well, if you know there's only one match, or you only want the first match, you can save te result to a variable. There's no need to resuse .exec. If you are interested in all the matches, then you need to keep going until you get null.
why don't you use simple match method for string like
'mail-we0-f174.google.com'.match(/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/)

Why `pattern.test(name)` opposite results on consecutive calls [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 7 years ago.
Why is this code returning first true, then false
var pattern = new RegExp("mstea", 'gi'), name = "Amanda Olmstead";
console.log('1', pattern.test(name));
console.log('1', pattern.test(name));
Demo: Fiddle
g is for repeating searches. It changes the regular expression object into an iterator. If you want to use the test function to check your string is valid according to your pattern, remove this modifier :
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
The test function, contrary to replace or match doesn't consume the whole iteration, which lets it in a "bad" state. You should probably never use this modifier when using the test function.
You don't want to use gi in combination with pattern.test. The g flag means that it keeps track of where you are running so it can be reused. So instead, you should use:
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
console.log('1', pattern.test(name));
console.log('1', pattern.test(name));
Also, you can use /.../[flags] syntax for regex, like so:
var pattern = /mstea/i;
Because you set the g modifier.
Remove it for your case.
var pattern = new RegExp("mstea", 'i'), name = "Amanda Olmstead";
It isn't a bug.
The g causes it to carry out the next attempted match for the substring, after the first match. And that is why it returns false in every even attempt.
First attempt:
It is testing "Amanda Olmstead"
Second attempt:
It is testing "d" //match found in previous attempt (performs substring there)
Third attempt:
It is testing "Amanda Olmstead" again //no match found in previous attempt
... so on
MDN page for Regexp.exec states:
If your regular expression uses the "g" flag, you can use the exec
method multiple times to find successive matches in the same string.
When you do so, the search starts at the substring of str specified by
the regular expression's lastIndex property
MDN page for test states:
As with exec (or in combination with it), test called multiple times
on the same global regular expression instance will advance past the
previous match.

Why does Javascript's regex.exec() not always return the same value? [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 6 years ago.
In the Chrome or Firebug console:
reg = /ab/g
str = "abc"
reg.exec(str)
==> ["ab"]
reg.exec(str)
==> null
reg.exec(str)
==> ["ab"]
reg.exec(str)
==> null
Is exec somehow stateful and depends on what it returned the previous time? Or is this just a bug? I can't get it to happen all the time. For example, if 'str' above were "abc abc" it doesn't happen.
A JavaScript RegExp object is stateful.
When the regex is global, if you call a method on the same regex object, it will start from the index past the end of the last match.
When no more matches are found, the index is reset to 0 automatically.
To reset it manually, set the lastIndex property.
reg.lastIndex = 0;
This can be a very useful feature. You can start the evaluation at any point in the string if desired, or if in a loop, you can stop it after a desired number of matches.
Here's a demonstration of a typical approach to using the regex in a loop. It takes advantage of the fact that exec returns null when there are no more matches by performing the assignment as the loop condition.
var re = /foo_(\d+)/g,
str = "text foo_123 more text foo_456 foo_789 end text",
match,
results = [];
while (match = re.exec(str))
results.push(+match[1]);
DEMO: http://jsfiddle.net/pPW8Y/
If you don't like the placement of the assignment, the loop can be reworked, like this for example...
var re = /foo_(\d+)/g,
str = "text foo_123 more text foo_456 foo_789 end text",
match,
results = [];
do {
match = re.exec(str);
if (match)
results.push(+match[1]);
} while (match);
DEMO: http://jsfiddle.net/pPW8Y/1/
From MDN docs:
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test will also advance the lastIndex property).
Since you are using the g flag, exec continues from the last matched string until it gets to the end (returns null), then starts over.
Personally, I prefer to go the other way around with str.match(reg)
Multiple Matches
If your regex need the g flag (global match), you will need to reset the index (position of the last match) by using the lastIndex property.
reg.lastIndex = 0;
This is due to the fact that exec() will stop on each occurence so you can run again on the remaining part. This behavior also exists with test()) :
If your regular expression uses the "g" flag, you can use the exec
method multiple times to find successive matches in the same string.
When you do so, the search starts at the substring of str specified by
the regular expression's lastIndex property (test will also advance
the lastIndex property)
Single Match
When there is only one possible match, you can simply rewrite you regex by omitting the g flag, as the index will be automatically reset to 0.

Bug with RegExp in JavaScript when do global search [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 5 years ago.
Possible Duplicate:
Javascript regex returning true.. then false.. then true.. etc
First of all, apologize for my bad english.
I'm trying to test string to match the pattern, so I has wrote this:
var str = 'test';
var pattern = new RegExp('te', 'gi'); // yes, I know that simple 'i' will be good for this
But I have this unexpected results:
>>> pattern.test(str)
true
>>> pattern.test(str)
false
>>> pattern.test(str)
true
Can anyone explain this?
The reason for this behavior is that RegEx isn't stateless. Your second test will continue to look for the next match in the string, and reports that it doesn't find any more. Further searches starts from the beginning, as lastIndex is reset when no match is found:
var pattern = /te/gi;
pattern.test('test');
>> true
pattern.lastIndex;
>> 2
pattern.test('test');
>> false
pattern.lastIndex;
>> 0
You'll notice how this changes when there are two matches, for instance:
var pattern = /t/gi;
pattern.test('test');
>> true
pattern.lastIndex;
>> 1
pattern.test('test');
>> true
pattern.lastIndex;
>> 4
pattern.test('test');
>> false
pattern.lastIndex;
>> 0
I suppose you ran into this problem: https://bugzilla.mozilla.org/show_bug.cgi?id=237111
Removing the g parameter solves the issue. Basically its due to the lastindex property that remember last value every time you execute test() method
To quote the MDN Docs (emphasis mine):
When you want to know whether a pattern is found in a string use the test method (similar to the String.search method); for more information (but slower execution) use the exec method (similar to the String.match method). As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
This is the intended behavior of the RegExp.test(str) method. The regex instance (pattern) stores state which can be seen in the lastIndex property; each time you call "test" it updates that value and following calls with the same argument may or may not yield the same result:
var str="test", pattern=new RegExp("te", "gi");
pattern.lastIndex; // => 0, since it hasn't found any matches yet.
pattern.test(str); // => true, since it matches at position "0".
pattern.lastIndex; // => 2, since the last match ended at position "1".
pattern.test(str); // => false, since there is no match after position "2".

Categories

Resources