Local Regex Match Variable Not Updated - javascript

I am looping through an array of objects and mapping them to my own custom objects. I am extracting data via regular expressions. My first run through the loop works fine, but in subsequent iterations, although they match, the match variables do not get set.
Here is one of the regexes:
var gameRegex = /(\^?)([A-z]+)\s?(\d+)?\s+(at\s)?(\^?)([A-z]+)\s?(\d+)?\s\((.*)\)/g;
Here is the initial part of my loop:
for(var i = 1; i <= data.count; i++) {
var gameMatch = gameRegex.exec(data["left" + i]);
var idMatch = idRegex.exec(data["url" + i]);
The first time through, gameMatch and idMatch have values. In the following iterations they do not, even though I have tested that the patterns match those strings.
Is there something about regular expressions, maybe especially in Node.js, that I need to do if I use them more than once?

When you have a regular expression with a global flag /.../g and use exec() with it, JavaScript sets a property named lastIndex on that regex.
s = "abab";
r = /a/g;
r.exec(s); // ["a"]
r.lastIndex // 1
r.exec(s); // ["a"]
r.lastIndex // 3
r.exec(s); // null
r.lastIndex // 0
This is meant to be used for multiple matches in the same string. You could call exec() again and again and with every call lastIndex is increased - defining automagically where the next execution will start:
while (match = r.exec(s)) {
console.log(match);
}
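In your loop, lastIndex is left at a non-zero position after the first invocation of exec(). Since you pass in a different string every time, the next search starts from that stale position in the new string, and the expression no longer matches as expected.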
There are two ways to solve this:
manually set the r.lastIndex = 0 every time or
remove the g global flag
In your case the latter option would be the right one.
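A minimal sketch of both fixes, assuming a simplified pattern and made-up input. The names rows, scoreRegex and globalRegex are hypothetical and not taken from the question:

```javascript
// Hypothetical input: a different string is matched on every iteration.
const rows = ["Lions 21 at Bears 14", "Jets 3 at Bills 10"];

// Fix 1: no g flag, so exec() always starts at index 0.
const scoreRegex = /([A-Za-z]+)\s(\d+)/;
const firstTeams = rows.map((row) => {
  const match = scoreRegex.exec(row);
  return match ? match[1] : null;
});

// Fix 2: keep the g flag, but reset lastIndex before each new string.
const globalRegex = /([A-Za-z]+)\s(\d+)/g;
const firstScores = rows.map((row) => {
  globalRegex.lastIndex = 0;
  const match = globalRegex.exec(row);
  return match ? Number(match[2]) : null;
});

console.log(firstTeams);  // [ 'Lions', 'Jets' ]
console.log(firstScores); // [ 21, 3 ]
```

Dropping the flag is usually the simpler choice when only the first match per string is needed, because there is then no state to forget about.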
Further reading:
.exec() on the MDN
.exec() on the MSDN
"How to Use The JavaScript RegExp Object" on regular-expressions.info

Related

Regular expressions in JavaScript with the global flag

It appears that the RegExp intrinsic is stateful.
So calling it twice on the same string will yield different results when the global flag g is supplied, as it advances a search along the string.
So:
var r = /(\d{3})/g;
console.log(r.test('123')); // true
console.log(r.test('123')); // false - because the search has moved past the first match
But if I add an intermediate test, I get the following:
var r = /(\d{3})/g;
console.log(r.test('123')); // true
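console.log(r.test('456')); // false - the search starts at lastIndex 3, fails, and resets lastIndex to 0
console.log(r.test('123')); // true!
So is it correct to say that RegExp instances operate on the principle of considering only the last string evaluated? If the string differs from the last, it is effectively reset?
So is it correct to say that RegExp instances operate on the principle of considering only the last string evaluated?
No. A RegExp does not remember which string it was last run against. The only state it keeps is lastIndex, the position at which the next search starts.
If the string differs from the last, it is effectively reset?
Not because the string differs. lastIndex is reset to 0 whenever a search fails, which is what happens when the search starts at index 3 of the three-character string '456'.
If the global flag is omitted, is the regular expression reset in between tests?
There is nothing to reset: without the g (or y) flag, test() and exec() ignore lastIndex and always start at index 0.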
Check out RegExp#lastIndex
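One way to check what state the regex carries between calls is to record lastIndex after each test(). The sketch below reuses the pattern from the question; the log array is only there for illustration:

```javascript
const r = /(\d{3})/g;
const log = [];

for (const input of ["123", "456", "123"]) {
  const matched = r.test(input);
  // lastIndex is the only state carried over to the next call.
  log.push({ input, matched, lastIndex: r.lastIndex });
}

console.log(log);
// matched:   true, false, true
// lastIndex: 3,    0,     3
```

The regex holds no reference to the string it was given. The middle call starts searching at index 3, finds nothing, and that failed search is what sets lastIndex back to 0.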

Something strange about regular expression [duplicate]

Say I have the following regular expression code:
var str = "{$ for ( var i = 0, len = O.length; i < len; i++ ) { $}";
var reg = /\{\$(.*?)\$\}/g;
console.log(reg.test(str));
console.log(reg.test(str));
console.log(reg.test(str));
Why does the result alternate between true and false?
Per docs:
When you want to know whether a pattern is found in a string use the test method (similar to the String.search method); for more information (but slower execution) use the exec method (similar to the String.match method). As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
To illustrate, we can use exec, which returns the match itself instead of a boolean. In order of passes:
match array containing the original string (truthy)
The entire string is matched, which evaluates as true.
null (falsy)
Consecutive calls continue from the end of the previous match; since the first call matched the entire string, nothing is left and we get null.
match array containing the original string (truthy)
After the failed call, lastIndex is reset to 0, so the search starts over from the beginning.
For proof, run the following:
var str = '{$ for ( var i = 0, len = O.length; i < len; i++ ) { $}';
var reg = /\{\$(.*?)\$\}/g;
for (var i = 0; i < 3; i++){
var result = reg.exec(str);
console.log(result);
console.log(!!result);
}
JavaScript RegExps maintain state internally, including fields such as the last index matched. Because of that, you can see some interesting behavior when reusing a regex as in the example you give.
The /g flag would cause it to return true once for each successive match against the given string (of which there only happens to be one in your example), after which it would return false once and then start all over again. Between each call, the aforementioned lastIndex property would be updated accordingly.
Consider the following:
var str = "12";
var regex = /\d/g;
console.log(regex.test(str)); // true
console.log(regex.test(str)); // true
console.log(regex.test(str)); // false
Versus:
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
// ...and so on, since you're instantiating a new RegExp each time
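If all you need is a yes/no answer, a sketch like the following, which assumes the g flag is simply not required, gives the same result on every call (hasDigit and results are illustrative names):

```javascript
const str = "12";
const hasDigit = /\d/; // no g flag, so test() never advances lastIndex

const results = [hasDigit.test(str), hasDigit.test(str), hasDigit.test(str)];

console.log(results);            // [ true, true, true ]
console.log(hasDigit.lastIndex); // 0
```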

Replace a Regex capture group with uppercase in Javascript

I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Since you are using the special replacement syntax ($N inserts the Nth capture), you are simply putting the same value back. The toUpperCase call is deceptive because it only upper-cases the replacement string itself, which is pointless: the characters $ and 1 have no upper case, so the value passed to replace is still "$1".
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
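A short sketch of why both attempts are no-ops: the second argument is evaluated before replace() runs, so replace() only ever receives the string "$1". The variable names are illustrative:

```javascript
const a = "foobar";

// Both expressions are evaluated first and produce the plain string "$1".
const viaMethod = "$1".toUpperCase();
const viaApply = String.prototype.toUpperCase.apply("$1");
console.log(viaMethod === "$1", viaApply === "$1"); // true true

// So this is just a.replace(/(f)/, "$1"): the capture is put back unchanged.
const unchanged = a.replace(/(f)/, viaMethod);
console.log(unchanged); // foobar

// A function is called after the match is known, so it can transform it.
const changed = a.replace(/(f)/, (match) => match.toUpperCase());
console.log(changed); // Foobar
```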
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the result of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function.
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypal inheritance. So what we are actually doing looks more like this:
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.apply with its context specifically set to String.prototype.toUpperCase. We now have the following:
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable a: since its prototype chain includes a reference to String.prototype.toUpperCase, we can swap that out for a.toUpperCase. Combining the three solutions above with these byte-saving measures gives the code at the top of this post.
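To see what the bound function actually does, here is a small sketch. The name upper is hypothetical; everything else is the standard Function.prototype.call and Function.prototype.bind:

```javascript
const a = "foobar";

// call.bind(fn) yields a function whose first argument becomes `this` for fn.
const upper = Function.prototype.call.bind(String.prototype.toUpperCase);

console.log(upper("f"));              // F
// replace() also passes the offset and the whole string; toUpperCase ignores them.
console.log(upper("f", 0, "foobar")); // F

const replaced = a.replace(/f/, upper);
console.log(replaced); // Foobar

// String is a function, so it inherits the very same call method.
console.log(String.call === Function.prototype.call); // true
```

In everyday code an explicit callback such as (match) => match.toUpperCase() does the same job and is easier to read; the bound version is mainly a demonstration of how call and bind compose.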
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase())
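The replacer arguments can be inspected directly. This sketch uses a made-up input string and records what replace() passes in; it also rebuilds the match so that only the captured part changes, since returning p1 alone would drop the surrounding xxx and zzz:

```javascript
const input = "xxxyyyzzz"; // hypothetical input
const seen = [];

const result = input.replace(/xxx(yyy)zzz/, (match, p1, offset, string) => {
  seen.push(match, p1, offset, string);
  // Upper-case only the captured group, keep the rest of the match.
  return match.replace(p1, p1.toUpperCase());
});

console.log(seen);   // [ 'xxxyyyzzz', 'yyy', 0, 'xxxyyyzzz' ]
console.log(result); // xxxYYYzzz
```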
This is an old post, but it is worth extending @ChaosPandion's answer to other use cases with a more restrictive regex, e.g. ensuring that the (f) capturing group is surrounded by a specific format such as /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight that the function's two parameters make it possible to find a specific format and replace only the capturing group within it.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
To replace all occurrences of the group, use the /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() both give "$1" (try it in the console yourself), so they change nothing, and in fact you call a.replace( /(f)/, "$1") twice (which also changes nothing).
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary of properties and values (in this case, a Map), and using .bind() as described in other answers:
const regex = /([A-Za-z0-9]+)/; // [A-z] would also match [ \ ] ^ _ and `
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a plain JavaScript object with a method that returns the matched property's value, or the original string if no match is found:
const regex = /([A-Za-z0-9]+)/; // [A-z] would also match [ \ ] ^ _ and `
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
To convert a string from CamelCase to snake_case (e.g. for filenames), use a callback with a ternary operator.
The group captured with () in the regexp passed as the first (left) argument of replace is sent to the second (right) argument, which is a callback function.
x is the whole match and y is the first captured group; they are identical here because the parentheses wrap the entire pattern. The third argument, index, is the offset at which the match begins in the original string.
Therefore a ternary operator can be used to avoid placing _ before the first occurrence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);
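The same idea as a reusable function, as a sketch. The name toSnakeCase is hypothetical, and it matches /([A-Z])/g rather than the [^a-z0-9] class above, on the assumption that only upper-case ASCII letters should start a new word:

```javascript
function toSnakeCase(name) {
  // match and group are equal here because the group spans the whole pattern.
  return name.replace(/([A-Z])/g, (match, group, offset) => {
    const lower = group.toLowerCase();
    // No underscore before a capital at the very start of the string.
    return offset === 0 ? lower : "_" + lower;
  });
}

console.log(toSnakeCase("MyStringName")); // my_string_name
console.log(toSnakeCase("already"));      // already
```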

javascript JSON and Array elements, help me understand the rule about quotes

When using a returned value to determine the number of an element in an array, does javascript throw quotes around it?
Example :
This tallies the number of times unique characters are used.
var uniques = {};
function charFreq(s)
{
for(var i = 0; i < s.length; i++)
{
if(isNaN(uniques[s.charAt(i)])) uniques[s.charAt(i)] = 1;
else uniques[s.charAt(i)] = uniques[s.charAt(i)] + 1;
}
return uniques;
}
console.log(charFreq("ahasdsadhaeytyeyeyehahahdahsdhadhahhhhhhhhhha"));
It just seems funny that uniques[s.charAt(i)] works, and uniques[a] won't work (due to lack of quotes). uniques[a] will get you a nasty 'a is not defined'.
When you access a JavaScript object using the [] notation, you are using a string as a key in the object. You can also address properties using the . notation:
uniques.a is the same as uniques['a']
The reason you aren't adding quotes to the s.charAt(i) is that it returns a string, which is then used as the property to check on the uniques object.
uniques[a] will create an error, because no variable with the name a has been defined.
In the first version -- uniques[s.charAt(i)] -- you're doing the lookup using an expression. JavaScript evaluates the expression -- s.charAt(i) -- and uses the evaluated value (maybe a) to perform the lookup in the uniques map.
In the second version -- uniques[a] -- you want to do the lookup using the literal character a, but unless you wrap it in quotes then JavaScript treats the a as an expression rather than a literal. When it tries to evaluate the "expression" then you get an error.
So the rule is: character/string literals need quotes; expressions that evaluate to characters/strings don't.
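A small sketch of that rule, using a made-up uniques object. The identifier notDeclared is deliberately left undeclared to trigger the error:

```javascript
const uniques = { a: 3, h: 5 };
const s = "ha";

console.log(uniques[s.charAt(1)]); // 3  (expression that evaluates to "a")
console.log(uniques["a"]);         // 3  (string literal)
console.log(uniques.a);            // 3  (dot notation, same property)

const key = "h";
console.log(uniques[key]);         // 5  (a variable is an expression too)

let errorName = null;
try {
  uniques[notDeclared]; // no such variable, so evaluation fails before the lookup
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // ReferenceError
```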
This is how JavaScript evaluates the expression between [] in something like uniques[s.charAt(i)], which is of the form MemberExpression[ Expression ]:
Let propertyNameReference be the result of evaluating Expression.
Let propertyNameValue be GetValue(propertyNameReference).
Let propertyNameString be ToString(propertyNameValue).
So in the 3rd step it is converting the property name into a string.
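The string conversion in that last step can be observed with non-string keys. A minimal sketch with a hypothetical table object:

```javascript
const table = {};

table[1] = "number key";     // 1 is converted to the property name "1"
table[true] = "boolean key"; // true is converted to "true"

console.log(table["1"]);         // number key
console.log(table["true"]);      // boolean key
console.log(Object.keys(table)); // [ '1', 'true' ]
```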

javascript string exec strange behavior [duplicate]

I have a function in my object which is called regularly.
parse : function(html)
{
var regexp = /...some pattern.../
var match = regexp.exec(html);
while (match != null)
{
...
match = regexp.exec(html);
}
...
var r = /...pattern.../g;
var m = r.exec(html);
}
With unchanged html, m is null on every other call. For example:
parse(html);// ok
parse(html);// m is null!!!
parse(html);// ok
parse(html);// m is null!!!
// ...and so on...
Is there any index or something that has to be reset on html? I'm really confused. Why does match always return a proper result?
This is a common behavior when you deal with patterns that have the global g flag, and you use the exec or test methods.
In this case the RegExp object will keep track of the lastIndex where a match was found, and then on subsequent matches it will start from that lastIndex instead of starting from 0.
Edit: In response to your comment asking why the RegExp object isn't re-created when you call the function again:
This is the behavior described for regular expression literals in the ECMAScript 3 specification; let me quote it:
§ 7.8.5 - Regular Expression Literals
...
The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object.
....
You can make a simple proof by:
function createRe() {
var re = /foo/g;
return re;
}
createRe() === createRe(); // true under ES3 semantics (same object); false in ES5+ engines
You can be sure that is the same object, because "two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical", e.g.:
/foo/ === /foo/; // always false...
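However, this behavior was followed by all browsers except IE, which initializes a new RegExp object every time. ECMAScript 5 later changed the rule so that a regular expression literal creates a new RegExp object each time it is evaluated.
To avoid this behavior when it gets in the way, as in this case, simply reset lastIndex: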
var r = /...pattern.../g;
var m = r.exec(html);
r.lastIndex=0;
This worked for me.
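A sketch of the same fix with the reset placed before exec(), which also covers the case where an earlier call returned early or threw. The pattern, the firstLink name and the sample html are all hypothetical:

```javascript
// Hypothetical pattern, shared between calls, so its lastIndex persists.
const linkPattern = /href="([^"]*)"/g;

function firstLink(html) {
  linkPattern.lastIndex = 0; // always start from the beginning of html
  const m = linkPattern.exec(html);
  return m ? m[1] : null;
}

const html = '<a href="/one">1</a>';
console.log(firstLink(html)); // /one
console.log(firstLink(html)); // /one (without the reset this call would return null)
```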
