Unusual javascript Regex result, explanation please! - javascript

I'm developing in VS2005 and have some JS code in my page. I have set a breakpoint during a particular loop where I was having an issue. Here is my little conversation with the IDE--
? ind
/d/g
? ind.test("d")
true
? ind.test("dtn")
false
? ind.test("dtn")
true
? ind.test("dtn")
false
? ind.test("dtn")
true
? ind.test("dtn")
false
Why is the test alternating between true and false? ind is my RegEx - I set it like this:
case "datetime" : ind = new RegExp("d","g");break;
UPDATE
So I've solved my issue by changing my declaration to
ind = /d/;
i.e. omitting the global modifier. I suppose that
ind = RegExp("d");
would work equally well.
The question remains though. Why was the global modifier causing the test to alternate between true and false?

As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
Source: https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp/test
What happens here is that, because you're using the global option for the regex, each call continues searching the string from where the previous match ended.
ind.test("d")
This will find d at position 0.
ind.test("d")
This now searches for d starting at position 1, but since that's the end of the string, it finds nothing and returns false.
We can use the lastIndex property of the regex to prove that:
ind.lastIndex
>> 0
ind.test("d")
>> true
ind.lastIndex
>> 1
ind.test("d")
>> false

Calling re.test(str) is equivalent to re.exec(str) != null (see specification of RegExp.prototype.test(string)).
And when exec is called repeatedly on a regular expression with the g modifier, the search does not start at the beginning of the string but at the position where the previous search ended (lastIndex, initialized to 0):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test will also advance the lastIndex property).
That’s why you’re getting this odd result.
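The equivalence described above can be sketched as follows. This is a minimal illustration, not code from the original post: the names re, input and log are made up for the example, and the pattern and input are the ones from the question.

```javascript
// Sketch: test() behaves like exec() !== null, and both share lastIndex.
const re = /d/g;       // same pattern as in the question, with the g flag
const input = "dtn";   // sample input from the question

const log = [];
for (let i = 0; i < 4; i++) {
  const before = re.lastIndex;             // where this search will start
  const found = re.exec(input) !== null;   // same outcome as re.test(input)
  log.push({ before, found, after: re.lastIndex });
}

// found alternates true, false, true, false:
// a successful search leaves lastIndex at 1, a failed one resets it to 0.
console.log(log);
```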

Related

javascript regular expression error in test function? [duplicate]

What is the meaning of the g flag in regular expressions?
What is the difference between /.+/g and /.+/?
g is for global search. Meaning it'll match all occurrences. You'll usually also see i which means ignore case.
Reference: global - JavaScript | MDN
The "g" flag indicates that the regular expression should be tested against all possible matches in a string.
Without the g flag, it'll only test for the first.
Additionally, make sure to check cchamberlain's answer below for details on how it sets the lastIndex property, which can cause unexpected side effects when re-using a regex against a series of values.
Example in Javascript to explain:
> 'aaa'.match(/a/g)
[ 'a', 'a', 'a' ]
> 'aaa'.match(/a/)
[ 'a', index: 0, input: 'aaa' ]
As @matiska pointed out, the g flag sets the lastIndex property as well.
A very important side effect of this is that if you reuse the same regex instance on a string that matches, a later call will eventually fail, because each search starts at lastIndex rather than at the beginning of the string.
// regular regex
const regex = /foo/;
// same regex with global flag
const regexG = /foo/g;
const str = " foo foo foo ";
const test = (r) => console.log(
r,
r.lastIndex,
r.test(str),
r.lastIndex
);
// Test the normal one 4 times (success)
test(regex);
test(regex);
test(regex);
test(regex);
// Test the global one 4 times
// (3 passes and a fail)
test(regexG);
test(regexG);
test(regexG);
test(regexG);
g is the global search flag.
The global search flag makes the RegExp search for a pattern throughout the string, creating an array of all occurrences it can find matching the given pattern.
So the difference between /.+/g and /.+/ is that the g version will find every occurrence instead of just the first.
There is no difference between /.+/g and /.+/ as long as the string contains no line breaks, because they will both only ever match the whole string once. The g makes a difference if the regular expression could match more than once or contains groups, in which case .match() will return an array of the matches instead of an array of the groups.
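A small sketch of that distinction, with made-up sample strings (line, pairs and multi are illustrative names, not from the original answers). It assumes the default behavior where . does not match line terminators:

```javascript
// Sketch: when the g flag changes what String.prototype.match returns.
const line = "hello world";
const withoutG = line.match(/.+/);   // one match, plus index/input properties
const withG = line.match(/.+/g);     // plain array containing that one match

// With capture groups and several possible matches, the results differ.
const pairs = "a=1 b=2";
const firstWithGroups = pairs.match(/(\w)=(\d)/);   // full match, then the groups
const allMatches = pairs.match(/(\w)=(\d)/g);       // every full match, no groups

// "." does not match line terminators, so a multi-line string matches per line.
const multi = "one\ntwo";
const perLine = multi.match(/.+/g);

console.log(withoutG[0], withG);
console.log(firstWithGroups, allMatches, perLine);
```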
g -> returns all matches
without g -> returns first match
example:
'1 2 1 5 6 7'.match(/\d+/) returns ["1", index: 0, input: "1 2 1 5 6 7", groups: undefined]. As you see we can only take first match "1".
'1 2 1 5 6 7'.match(/\d+/g) returns an array of all matches ["1", "2", "1", "5", "6", "7"].
Beside already mentioned meaning of g flag, it influences regexp.lastIndex property:
The lastIndex is a read/write integer property of regular expression
instances that specifies the index at which to start the next match.
(...) This property is set only if the regular expression instance
used the "g" flag to indicate a global search.
Reference: Mozilla Developer Network
The g flag in a regular expression defines a global search, meaning that it searches for all instances of the pattern in the string.
Here is an example based on a string, where we want to remove all occurrences of something.
Let's say we want to replace all occurrences of "o" with "" in "hello world":
"hello world".replace(/o/g,'');
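For comparison, here is a short sketch of the same replacement with and without the flag (greeting, firstOnly and everyOne are illustrative names):

```javascript
// Sketch: replace() with and without the g flag.
const greeting = "hello world";

const firstOnly = greeting.replace(/o/, "");   // removes only the first "o"
const everyOne = greeting.replace(/o/g, "");   // removes every "o"

console.log(firstOnly); // "hell world"
console.log(everyOne);  // "hell wrld"
```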
In my case, I needed to re-evaluate the string from the first character each time, so I had to remove the global flag (g) from /my_regexp/g to stop lastIndex from moving.
As mentioned on MDN:
Be sure that the global (g) flag is set, or lastIndex will never be advanced.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec#specifications

Eloquent Javascript Looping over RegExp Matches

The following example is a bit confusing to me:
var text = "A string with 3 numbers in it ... 42 and 88.";
var number = /\b(\d+)\b/g;
var match;
while (match = number.exec(text)){
console.log("Found", match[1], "at", match.index);
}
Specifically, I don't understand how this has a "looping" effect. How does it run through all the matches within one string if it keeps calling match[1]. Is there some kind of side effect with exec that I am unaware of?
Edit:
I still would like an answer to how match[1] is working.
How does match[1] produce any answer? When I test this type of thing myself, I get undefined, look
> var y = /\d+/g.exec('5')
undefined
> y
[ '5', index: 0, input: '5' ]
> y[1]
undefined
What's going on here? Wouldn't it be y[0], or in the case above, match[0]? Like:
> y[0]
'5'
The RegExp object remembers the position after the last match in its lastIndex property.
Quoting MDN Documentation,
If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).
Important note: the first part of the quoted section matters: "If your regular expression uses the "g" flag". You get this behavior only if the regex has the g flag.
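To tie this back to the match[1] part of the question, here is a hedged sketch built on the question's own loop. The found array and the noGroup variable are additions for illustration. In an exec result, index 0 holds the whole match and index 1 holds the first parenthesized group, so match[1] is only defined when the pattern contains a group:

```javascript
// Sketch: looping with exec() on a global regex, and what match[1] refers to.
const text = "A string with 3 numbers in it ... 42 and 88.";
const number = /\b(\d+)\b/g;   // (\d+) is capture group 1

const found = [];
let match;
while ((match = number.exec(text)) !== null) {
  // match[0] is the whole match, match[1] is what group 1 captured.
  // Each successful exec() moves number.lastIndex past the match,
  // which is what makes the next call find the next number.
  found.push({ whole: match[0], group1: match[1], at: match.index });
}
console.log(found);

// Without parentheses there is no group 1, so index 1 is undefined.
const noGroup = /\d+/g.exec("5");
console.log(noGroup[0], noGroup[1]);
```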

Why is Javascript Regex matching every second time?

I'm defining a regex object and then matching it in a loop. It only matches sometimes, to be precise - every second time. So I created a smallest working sample of this problem.
I tried this code in Opera and Firefox. The behavior is the same in both:
>>> domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g;
/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/g
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
>>> domainRegex.exec('mail-we0-f174.google.com');
Array [".google.com", "google.com"]
>>> domainRegex.exec('mail-we0-f174.google.com');
null
Why is this happening? Is this behaviour documented? Is there a way around, other than defining the regex inside loop body?
exec() works in the manner you have described; with the /g modifier present, it will return a match, starting from lastIndex with every invocation until there are no more matches, at which point it returns null and the value of lastIndex is reset to 0.
However, because you have anchored the expression using $ there won't be more than one match, so you can use String.match() instead and lose the /g modifier:
var domainRegex = /(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/;
'mail-we0-f174.google.com'.match(domainRegex); // [".google.com", "google.com"]
Additional info to Ja͢ck's response:
You can also set lastIndex
var myRgx = /test/g;
myRgx.exec(someString);
myRgx.lastIndex = 0;
or just create a new regex for each execution, which I find even cleaner
new RegExp(myRgx).exec(someString);
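A brief sketch of why the copy approach works, assuming an ES5-or-later engine where new RegExp accepts an existing regex (shared, copy and sample are illustrative names): the copy keeps the pattern and flags but starts with its own lastIndex of 0.

```javascript
// Sketch: a copied regex does not inherit the original's lastIndex.
const shared = /test/g;
const sample = "a test";

shared.exec(sample);                     // match ends at index 6
const sharedIndex = shared.lastIndex;    // state left on the shared instance

const copy = new RegExp(shared);         // same source and flags, fresh state
const copyIndex = copy.lastIndex;
const copyResult = copy.exec(sample);    // searches from the start again

console.log(sharedIndex, copyIndex, copy.global, copyResult[0]);
```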
When performing a global search with a RegExp, the exec method starts matching beginning at
the lastIndex property. The lastIndex property is set at each exec invocation and
is set to the position following the last match found. If a match fails, lastIndex is reset to 0, which causes exec to match from the start again.
var a = 'asdfeeeasdfeedxasdf'
undefined
var p = /asdf/g
undefined
p.lastIndex
0
p.exec(a)
["asdf"]
p.lastIndex
4
p.exec(a)
["asdf"]
p.lastIndex
11
p.exec(a)
["asdf"]
p.lastIndex
19
p.exec(a)
null //match failed
p.lastIndex
0 //lastIndex reset. next match will start at the beginning of the string a
p.exec(a)
["asdf"]
Each time you run the exec method of your regex it gets you the next match.
Once it reaches the end of the string, it returns null to let you know you've got all of the matches. The next time, it starts again from the beginning.
Because your pattern has only one match in this string (returned as an array holding the full match and the captured group), the first call searches from the start, finds the match, and returns it. The second call resumes after that match, reaches the end of the string, and returns null. So if you wanted to process all matches, you could loop like this:
while(regExpression.exec(string)){
// do something
}
Then the next time, it starts again from position 0.
"Is there a way around?"
Well, if you know there's only one match, or you only want the first match, you can save the result to a variable. There's no need to reuse .exec. If you are interested in all the matches, then you need to keep going until you get null.
Why not simply use the string's match method, like this:
'mail-we0-f174.google.com'.match(/(?:\.|^)([a-z0-9\-]+\.[a-z0-9\-]+)$/)

Javascript RegExp Producing Unusual Results with Global Modifer [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 9 years ago.
I was able to work around this issue as it turned out I didn't need /g. But I was wondering if anyone would be able to explain why the following behavior occurred.
x = RegExp( "w", "gi" )
x.test( "Women" )
= true
x.test( "Women" )
= false
It would continue to alternate between true and false when evaluating the expression. Which was an issue because I was using the same compiled RegExp on a list of strings, leading some to evaluate to false when they should have been true.
You should not use the global modifier in a regex that you reuse with test() across several strings, because the regex remembers the index where the last search ended and starts the next test from there.
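The list scenario from the question can be sketched like this. The names list and the variable names are invented for the example; only the /w/gi pattern comes from the question:

```javascript
// Sketch: reusing one regex across a list of strings.
const names = ["Women", "Wombat", "Walrus", "Weasel"];

const globalRe = /w/gi;   // keeps lastIndex between calls
const plainRe = /w/i;     // always searches from index 0

// With g, every second string is searched from index 1 and misses its "W".
const withGlobal = names.filter((name) => globalRe.test(name));

// Without g, every string is tested from the start.
const withPlain = names.filter((name) => plainRe.test(name));

// If the g flag is needed elsewhere, resetting lastIndex first also works.
const withReset = names.filter((name) => {
  globalRe.lastIndex = 0;
  return globalRe.test(name);
});

console.log(withGlobal); // [ 'Women', 'Walrus' ]
console.log(withPlain);  // all four names
console.log(withReset);  // all four names
```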
I'd asked the same question.
When you use the g flag, the regex stores the end position of the match in its lastIndex property. The next time you call test() or exec() on that regex, it starts from that index in the string to try to find a match.
When no match is found, exec() returns null (test() returns false) and lastIndex is reset to 0. This is why your test kept alternating: it would match the W and set lastIndex to 1; the next call would find nothing after that position, return false, and reset lastIndex.
A pitfall related to this is when your regex can match the empty string. In that case, lastIndex will not change, and if you are getting all matches, there will be an infinite loop. In this case you should manually adjust lastIndex if it matched the empty string.
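One way to handle the empty-match case is sketched below. The pattern /\d*/g and the input are made up to trigger zero-length matches; this is an illustration of the manual adjustment, not the only possible approach.

```javascript
// Sketch: avoiding an infinite exec() loop when the pattern can match "".
const digits = /\d*/g;   // \d* can match zero characters
const source = "a12b";

const nonEmpty = [];
let hit;
while ((hit = digits.exec(source)) !== null) {
  if (hit[0] === "") {
    // A zero-length match leaves lastIndex where it was,
    // so step forward by hand or the loop never ends.
    digits.lastIndex++;
  } else {
    nonEmpty.push(hit[0]);
  }
}

console.log(nonEmpty); // [ '12' ]
```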
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp/test
As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
Essentially, the RegExp object x keeps track of its last match internally. When you call .test again, it attempts to match starting after the "w"
Of course this is only true of a regex object instance.
> /w/gi.test('Women')
true
> /w/gi.test('Women')
true

Bug with RegExp in JavaScript when do global search [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 5 years ago.
Possible Duplicate:
Javascript regex returning true.. then false.. then true.. etc
First of all, I apologize for my bad English.
I'm trying to test whether a string matches a pattern, so I wrote this:
var str = 'test';
var pattern = new RegExp('te', 'gi'); // yes, I know that simple 'i' will be good for this
But I get these unexpected results:
>>> pattern.test(str)
true
>>> pattern.test(str)
false
>>> pattern.test(str)
true
Can anyone explain this?
The reason for this behavior is that a RegExp with the g flag isn't stateless. Your second test continues looking for the next match in the string and reports that it doesn't find any more. Further searches start from the beginning, because lastIndex is reset when no match is found:
var pattern = /te/gi;
pattern.test('test');
>> true
pattern.lastIndex;
>> 2
pattern.test('test');
>> false
pattern.lastIndex;
>> 0
You'll notice how this changes when there are two matches, for instance:
var pattern = /t/gi;
pattern.test('test');
>> true
pattern.lastIndex;
>> 1
pattern.test('test');
>> true
pattern.lastIndex;
>> 4
pattern.test('test');
>> false
pattern.lastIndex;
>> 0
I suppose you ran into this problem: https://bugzilla.mozilla.org/show_bug.cgi?id=237111
Removing the g flag solves the issue. Basically, it's due to the lastIndex property, which remembers its last value every time you execute the test() method.
To quote the MDN Docs (emphasis mine):
When you want to know whether a pattern is found in a string use the test method (similar to the String.search method); for more information (but slower execution) use the exec method (similar to the String.match method). As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
This is the intended behavior of the RegExp.test(str) method. The regex instance (pattern) stores state which can be seen in the lastIndex property; each time you call "test" it updates that value and following calls with the same argument may or may not yield the same result:
var str="test", pattern=new RegExp("te", "gi");
pattern.lastIndex; // => 0, since it hasn't found any matches yet.
pattern.test(str); // => true, since it matches at position "0".
pattern.lastIndex; // => 2, since the last match ended at position "1".
pattern.test(str); // => false, since there is no match after position "2".
