javascript string exec strange behavior [duplicate] - javascript

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 8 months ago.
have funciton in my object which is called regularly.
parse : function(html)
{
var regexp = /...some pattern.../
var match = regexp.exec(html);
while (match != null)
{
...
match = regexp.exec(html);
}
...
var r = /...pattern.../g;
var m = r.exec(html);
}
with unchanged html the m returns null each other call. let's say
parse(html);// ok
parse(html);// m is null!!!
parse(html);// ok
parse(html);// m is null!!!
// ...and so on...
is there any index or somrthing that has to be reset on html ... I'm really confused. Why match always returns proper result?

This is a common behavior when you deal with patterns that have the global g flag, and you use the exec or test methods.
In this case the RegExp object will keep track of the lastIndex where a match was found, and then on subsequent matches it will start from that lastIndex instead of starting from 0.
Edit: In response to your comment, why doesn't the RegExp object being re-created when you call the function again:
This is the behavior described for regular expression literals, let me quote the specification:
§ 7.8.5 - Regular Expression Literals
...
The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object.
....
You can make a simple proof by:
function createRe() {
var re = /foo/g;
return re;
}
createRe() === createRe(); // true, it's the same object
You can be sure that is the same object, because "two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical", e.g.:
/foo/ === /foo/; // always false...
However this behavior is respected on all browser but not by IE, which initializes a new RegExp object every time.

To avoid this behavior as it might be needed in this case, simply set
var r = /...pattern.../g;
var m = r.exec(html);
r.lastIndex=0;
This worked for me.

Related

ES6 square brackets, comma separated values, as expression [duplicate]

I read this question about the "comma operator" in expressions (,) and the MDN docs about it, but I can't think of a scenario where it is useful.
So, when is the comma operator useful?
The following is probably not very useful as you don't write it yourself, but a minifier can shrink code using the comma operator. For example:
if(x){foo();return bar()}else{return 1}
would become:
return x?(foo(),bar()):1
The ? : operator can be used now, since the comma operator (to a certain extent) allows for two statements to be written as one statement.
This is useful in that it allows for some neat compression (39 -> 24 bytes here).
I'd like to stress the fact that the comma in var a, b is not the comma operator because it doesn't exist within an expression. The comma has a special meaning in var statements. a, b in an expression would be referring to the two variables and evaluate to b, which is not the case for var a, b.
The comma operator allows you to put multiple expressions in a place where one expression is expected. The resulting value of multiple expressions separate by a comma will be the value of the last comma separated expression.
I don't personally use it very often because there aren't that many situations where more than one expression is expected and there isn't a less confusing way to write the code than using the comma operator. One interesting possibility is at the end of a for loop when you want more than one variable to be incremented:
// j is initialized to some other value
// as the for loop executes both i and j are incremented
// because the comma operator allows two statements to be put in place of one
for (var i = 0; i < items.len; i++, j++) {
// loop code here that operates on items[i]
// and sometimes uses j to access a different array
}
Here you see that i++, j++ can be put in a place where one expression is allowed. In this particular case, the multiple expressions are used for side affects so it does not matter that the compound expressions takes on the value of the last one, but there are other cases where that might actually matter.
The Comma Operator is frequently useful when writing functional code in Javascript.
Consider this code I wrote for a SPA a while back which had something like the following
const actions = _.chain(options)
.pairs() // 1
.filter(selectActions) // 2
.map(createActionPromise) // 3
.reduce((state, pair) => (state[pair[0]] = pair[1], state), {}) // 4
.value();
This was a fairly complex, but real-world scenario. Bear with me while I explain what is happening, and in the process make the case for the Comma Operator.
This uses Underscore's chaining to
Take apart all of the options passed to this function using pairs
which will turn { a: 1, b: 2} into [['a', 1], ['b', 2]]
This array of property pairs is filtered by which ones are deemed to be 'actions' in the system.
Then the second index in the array is replaced with a function that returns a promise representing that action (using map)
Finally the call to reduce will merge each "property array" (['a', 1]) back into a final object.
The end result is a transformed version of the options argument, which contains only the appropriate keys and whose values are consumable by the calling function.
Looking at just
.reduce((state, pair) => (state[pair[0]] = pair[1], state), {})
You can see the reduce function starts with an empty state object, state, and for each pair representing a key and value, the function returns the same state object after adding a property to the object corresponding to the key/value pair. Because of ECMAScript 2015's arrow function syntax, the function body is an expression, and as a result, the Comma Operator allows a concise and useful "iteratee" function.
Personally I have come across numerous cases while writing Javascript in a more functional style with ECMAScript 2015 + Arrow Functions. Having said that, before encountering arrow functions (such as at the time of the writing of the question), I'd never used the comma operator in any deliberate way.
Another use for the comma operator is to hide results you don't care about in the repl or console, purely as a convenience.
For example, if you evaluate myVariable = aWholeLotOfText in the repl or console, it will print all the data you just assigned. This might be pages and pages, and if you'd prefer not to see it, you can instead evaluate myVariable = aWholeLotOfText, 'done', and the repl/console will just print 'done'.
Oriel correctly points out† that customized toString() or get() functions might even make this useful.
Comma operator is not specific to JavaScript, it is available in other languages like C and C++. As a binary operator this is useful when the first operand, which is generally an expression, has desired side effect required by second operand. One example from wikipedia:
i = a += 2, a + b;
Obviously you can write two different lines of codes, but using comma is another option and sometimes more readable.
I'd disagree with Flanagan, and say, that comma is really useful and allows to write more readable and elegant code, especially when you know what you're doing:
Here's the greatly detailed article on comma usage:
Several examples from out from there for the proof of demonstration:
function renderCurve() {
for(var a = 1, b = 10; a*b; a++, b--) {
console.log(new Array(a*b).join('*'));
}
}
A fibonacci generator:
for (
var i=2, r=[0,1];
i<15;
r.push(r[i-1] + r[i-2]), i++
);
// 0,1,1,2,3,5,8,13,21,34,55,89,144,233,377
Find first parent element, analogue of jQuery .parent() function:
function firstAncestor(el, tagName) {
while(el = el.parentNode, el && (el.tagName != tagName.toUpperCase()));
return el;
}
//element in http://ecma262-5.com/ELS5_HTML.htm
var a = $('Section_15.1.1.2');
firstAncestor(a, 'div'); //<div class="page">
I haven't found practical use of it other than that but here is one scenario in which James Padolsey nicely uses this technique for IE detection in a while loop:
var ie = (function(){
var undef,
v = 3,
div = document.createElement('div'),
all = div.getElementsByTagName('i');
while ( // <-- notice no while body here
div.innerHTML = '<!--[if gt IE ' + (++v) + ']><i></i><![endif]-->',
all[0]
);
return v > 4 ? v : undef;
}());
These two lines must to execute :
div.innerHTML = '<!--[if gt IE ' + (++v) + ']><i></i><![endif]-->',
all[0]
And inside comma operator, both are evaluated though one could have made them separate statements somehow.
There is something "odd" that can be done in JavaScript calling a function indirectly by using the comma operator.
There is a long description here:
Indirect function call in JavaScript
By using this syntax:
(function() {
"use strict";
var global = (function () { return this || (1,eval)("this"); })();
console.log('Global === window should be true: ', global === window);
var not_global = (function () { return this })();
console.log('not_global === window should be false: ', not_global === window);
}());
You can get access to the global variable because eval works differently when called directly vs called indirectly.
I've found the comma operator most useful when writing helpers like this.
const stopPropagation = event => (event.stopPropagation(), event);
const preventDefault = event => (event.preventDefault(), event);
const both = compose(stopPropagation, preventDefault);
You could replace the comma with either an || or &&, but then you'd need to know what the function returns.
More important than that, the comma separator communicates intent -- the code doesn't care what the left-operand evaluates to, whereas the alternatives may have another reason for being there. This in turn makes it easier to understand and refactor. If the function return type ever changes, the code above would not be affected.
Naturally you can achieve the same thing in other ways, but not as succinctly. If || and && found a place in common usage, so too can the comma operator.
One typical case I end up using it is during optional argument parsing. I think it makes it both more readable and more concise so that the argument parsing doesn't dominate the function body.
/**
* #param {string} [str]
* #param {object} [obj]
* #param {Date} [date]
*/
function f(str, obj, date) {
// handle optional arguments
if (typeof str !== "string") date = obj, obj = str, str = "default";
if (obj instanceof Date) date = obj, obj = {};
if (!(date instanceof Date)) date = new Date();
// ...
}
Let's say you have an array:
arr = [];
When you push onto that array, you are rarely interested in push's return value, namely the new length of the array, but rather the array itself:
arr.push('foo') // ['foo'] seems more interesting than 1
Using the comma operator, we can push onto the array, specify the array as the last operand to comma, and then use the result -- the array itself -- for a subsequent array method call, a sort of chaining:
(arr.push('bar'), arr.push('baz'), arr).sort(); // [ 'bar', 'baz', 'foo' ]
It saves you from using return in nested conditionals and it's very handy especially with the ternary operator. Such as;
function insert(v){
return this.node > v ? this.left.size < this.right.size ? ( this.left.insert(v)
, this
)
: ( this.left.insert(this.node)
, this.node = this.right.popmin()
, this.insert(v)
, this
)
: this.left.size < this.right.size ? ( this.right.insert(this.node)
, this.node = this.left.popmax()
, this.insert(v)
, this
)
: ( this.right.insert(v)
, this
)
}
I just came across this today looking at the proposals for pipeline operator proposal and partial application...
(https://github.com/tc39/proposal-pipeline-operator
(https://github.com/tc39/proposal-partial-application#hack-style-pipelines)
Also, Hack-style pipelines are already feasible without introducing new syntax today:
let $; // Hack-style topic variable
let result = (
$= books,
$= filter($, _ => _.title = "..."),
$= map($, _ => _.author),
$);
The use of comma expressions here can kind of fake the pipeline operator that isn't in the language yet.
Eliminating the space between $= simulates the feeling of a proper pipe token, |>. Note that the "topic" variable, $, can be anything here and that it's just shorthand for repeatedly overwriting the variable. So something more akin to ...
// blocking inside an IIFE
let result = (() => {
let $;
$ = books;
$ = filter($, _ => _.title = "..."),
$ = map($, _ => _.author),
return $;
})()
The "comma" version successfully cuts out some of the noise, getting you closer to what the proposal would be:
let result = books
|> filter($, _ => _.title = "..."
|> map($, _ => _.author)
Here's another example using it to compose functions:
const double = (x) => 2 * x;
const add = (x, y) => x + y;
const boundScore = (min, max, score) => Math.max(min, Math.min(max, score));
const calculateScore = ($) => (
$= double($),
$= add($, 20),
$= boundScore(0, 100, $),
(console.log($), $)
)
const score = calculateScore(28)
The comma operator (,) evaluates each of its operands (from left to right) and returns the value of the last operand. This lets you create a compound expression in which multiple expressions are evaluated, with the compound expression's final value being the value of the rightmost of its member expressions. This is commonly used to provide multiple parameters to a for loop.
let x = 1;
x = (x++, x);
console.log(x);
// expected output: 2
x = (2, 3);
console.log(x);
// expected output: 3
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Comma_Operator
Another area where comma operator can be used is Code Obfuscation.
Let's say a developper writes some code like this:
var foo = 'bar';
Now, she decides to obfuscate the code. The tool used may changed the code like this:
var Z0b=(45,87)>(195,3)?'bar':(54,65)>(1,0)?'':'baz';// Z0b == 'bar'
Demo: http://jsfiddle.net/uvDuE/

Javascript regex test weirdness

Can anyone explain this (using node version 4.2.4 repl)?
var n; //undefined
/^[a-z0-9]+$/.test(n); // true!
/^[a-f0-9]+$/.test(n); // false
The variable passed to .test() is first converted to string. This is specified in the spec:
http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.6.3
which points to:
http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.6.2
which says:
Let R be this RegExp object.
Let S be the value of ToString(string).
So basically you're testing:
/^[a-z0-9]+$/.test("undefined"); // true!
/^[a-f0-9]+$/.test("undefined"); // false
It should now be obvious why the second test returns false. The letters u, n and i are not included in the test pattern.
Note: The function ToString() in the spec refers to the type coercion function in the underlying implementation (most probably C or C++ though there exist other implementations of js in other languages like Java and Go). It does not refer to the global function toString() in js. As such, that second line in the spec basically means that undefined will be treated as "" + undefined which returns "undefined".
Probably it's converting undefined to a string. So:
var pattern1 = /^[a-z0-9]+$/
var pattern2 = /^[a-f0-9]+$/
pattern1.test("undefined") // There are only letters
pattern2.test("undefined") // defed match, but unin does not.
RegExp.test treats n as a string "undefined".So, the range [a-f] does not cover all the characters of undefined string.In your case, the "mimimum allowable" range for passing regexp check would be [a-u]
var n; //undefined
console.log(/^[a-u]+$/.test(n)); // true

Something strange about regular expression [duplicate]

This question already has answers here:
Why does a RegExp with global flag give wrong results?
(7 answers)
Closed 9 years ago.
Say I have the following regular expression code:
var str = "{$ for ( var i = 0, len = O.length; i < len; i++ ) { $}";
var reg = /\{\$(.*?)\$\}/g;
console.log(reg.test(str));
console.log(reg.test(str));
console.log(reg.test(str));
Why is the result an alternation of True and False?
Per docs:
When you want to know whether a pattern is found in a string use the test method (similar to the String.search method); for more information (but slower execution) use the exec method (similar to the String.match method). As with exec (or in combination with it), test called multiple times on the same global regular expression instance will advance past the previous match.
To illustrate what's happening, we can use exec and see what's happening. In order of passes:
original string (truthy)
The entire string is matched which is evaluated as true.
null (falsy)
Consecutive calls continue on, so since the first called returned the entire result, we are left with null.
original string (truthy)
And the pattern returns back to start and continues on.
For proof, run the following:
var str = '{$ for ( var i = 0, len = O.length; i < len; i++ ) { $}';
var reg = /\{\$(.*?)\$\}/g;
for (var i = 0; i < 3; i++){
var result = reg.exec(str);
console.log(result);
console.log(!!result);
}
JavaScript RegExps maintain state internally, including fields such as the last index matched. Because of that, you can see some interesting behavior when reusing a regex as in the example you give.
The /g flag would cause it to return true once for each successive match against the given string (of which there only happens to be one in your example), after which it would return false once and then start all over again. Between each call, the aforementioned lastIndex property would be updated accordingly.
Consider the following:
var str = "12";
var regex = /\d/g;
console.log(regex.test(str)); // true
console.log(regex.test(str)); // true
console.log(regex.test(str)); // false
Versus:
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
console.log(/\d/g.test(str)); // true
// ...and so on, since you're instantiating a new RegExp each time

Local Regex Match Variable Not Updated

I am looping through an array of objects and mapping them to my own custom objects. I am extracting data via regular expressions. My first run through the loop works fine, but in subsequent iterations, although they match, the match variables do not get set.
Here is one of the regex's:
var gameRegex = /(\^?)([A-z]+)\s?(\d+)?\s+(at\s)?(\^?)([A-z]+)\s?(\d+)?\s\((.*)\)/g;
Here is the initial part of my loop:
for(var i = 1; i <= data.count; i++) {
var gameMatch = gameRegex.exec(data["left" + i]);
var idMatch = idRegex.exec(data["url" + i]);
First time through, gameMatch and idMatch have values. The following iterations do not work even though I have tested that they do work.
Is there something about regular expressions, maybe especially in Node.js, that I need to do if I use them more than once?
When you have a regular expression with a global flag /.../g and use exec() with it, JavaScript sets a property named lastIndex on that regex.
s = "abab";
r = /a/g;
r.exec(s); // ["a"]
r.lastIndex // 1
r.exec(s); // ["a"]
r.lastIndex // 3
r.exec(s); // null
r.lastIndex // 0
This is meant to be used for multiple matches in the same string. You could call exec() again and again and with every call lastIndex is increased - defining automagically where the next execution will start:
while (match = r.exec(s)) {
console.log(match);
}
Now lastIndex will be off after the first invocation of exec(). But since you pass in a different string every time, the expression will not match anymore.
There are two ways to solve this:
manually set the r.lastIndex = 0 every time or
remove the g global flag
In your case the latter option would be the right one.
Further reading:
.exec() on the MDN
.exec() on the MSDN
"How to Use The JavaScript RegExp Object" on regular-expressions.info

Replace a Regex capture group with uppercase in Javascript

I'd like to know how to replace a capture group with its uppercase in JavaScript. Here's a simplified version of what I've tried so far that's not working:
> a="foobar"
'foobar'
> a.replace( /(f)/, "$1".toUpperCase() )
'foobar'
> a.replace( /(f)/, String.prototype.toUpperCase.apply("$1") )
'foobar'
Would you explain what's wrong with this code?
You can pass a function to replace.
var r = a.replace(/(f)/, function(v) { return v.toUpperCase(); });
Explanation
a.replace( /(f)/, "$1".toUpperCase())
In this example you pass a string to the replace function. Since you are using the special replace syntax ($N grabs the Nth capture) you are simply giving the same value. The toUpperCase is actually deceiving because you are only making the replace string upper case (Which is somewhat pointless because the $ and one 1 characters have no upper case so the return value will still be "$1").
a.replace( /(f)/, String.prototype.toUpperCase.apply("$1"))
Believe it or not the semantics of this expression are exactly the same.
I know I'm late to the party but here is a shorter method that is more along the lines of your initial attempts.
a.replace('f', String.call.bind(a.toUpperCase));
So where did you go wrong and what is this new voodoo?
Problem 1
As stated before, you were attempting to pass the results of a called method as the second parameter of String.prototype.replace(), when instead you ought to be passing a reference to a function
Solution 1
That's easy enough to solve. Simply removing the parameters and parentheses will give us a reference rather than executing the function.
a.replace('f', String.prototype.toUpperCase.apply)
Problem 2
If you attempt to run the code now you will get an error stating that undefined is not a function and therefore cannot be called. This is because String.prototype.toUpperCase.apply is actually a reference to Function.prototype.apply() via JavaScript's prototypical inheritance. So what we are actually doing looks more like this
a.replace('f', Function.prototype.apply)
Which is obviously not what we have intended. How does it know to run Function.prototype.apply() on String.prototype.toUpperCase()?
Solution 2
Using Function.prototype.bind() we can create a copy of Function.prototype.call with its context specifically set to String.prototype.toUpperCase. We now have the following
a.replace('f', Function.prototype.apply.bind(String.prototype.toUpperCase))
Problem 3
The last issue is that String.prototype.replace() will pass several arguments to its replacement function. However, Function.prototype.apply() expects the second parameter to be an array but instead gets either a string or number (depending on if you use capture groups or not). This would cause an invalid argument list error.
Solution 3
Luckily, we can simply substitute in Function.prototype.call() (which accepts any number of arguments, none of which have type restrictions) for Function.prototype.apply(). We have now arrived at working code!
a.replace(/f/, Function.prototype.call.bind(String.prototype.toUpperCase))
Shedding bytes!
Nobody wants to type prototype a bunch of times. Instead we'll leverage the fact that we have objects that reference the same methods via inheritance. The String constructor, being a function, inherits from Function's prototype. This means that we can substitute in String.call for Function.prototype.call (actually we can use Date.call to save even more bytes but that's less semantic).
We can also leverage our variable 'a' since it's prototype includes a reference to String.prototype.toUpperCase we can swap that out with a.toUpperCase. It is the combination of the 3 solutions above and these byte saving measures that is how we get the code at the top of this post.
Why don't we just look up the definition?
If we write:
a.replace(/(f)/, x => x.toUpperCase())
we might as well just say:
a.replace('f','F')
Worse, I suspect nobody realises that their examples have been working only because they were capturing the whole regex with parentheses. If you look at the definition, the first parameter passed to the replacer function is actually the whole matched pattern and not the pattern you captured with parentheses:
function replacer(match, p1, p2, p3, offset, string)
If you want to use the arrow function notation:
a.replace(/xxx(yyy)zzz/, (match, p1) => p1.toUpperCase()
Old post but it worth to extend #ChaosPandion answer for other use cases with more restricted RegEx. E.g. ensure the (f) or capturing group surround with a specific format /z(f)oo/:
> a="foobazfoobar"
'foobazfoobar'
> a.replace(/z(f)oo/, function($0,$1) {return $0.replace($1, $1.toUpperCase());})
'foobazFoobar'
// Improve the RegEx so `(f)` will only get replaced when it begins with a dot or new line, etc.
I just want to highlight the two parameters of function makes finding a specific format and replacing a capturing group within the format possible.
SOLUTION
a.replace(/(f)/,(m,g)=>g.toUpperCase())
for replace all grup occurrences use /(f)/g regexp. The problem in your code: String.prototype.toUpperCase.apply("$1") and "$1".toUpperCase() gives "$1" (try in console by yourself) - so it not change anything and in fact you call twice a.replace( /(f)/, "$1") (which also change nothing).
let a= "foobar";
let b= a.replace(/(f)/,(m,g)=>g.toUpperCase());
let c= a.replace(/(o)/g,(m,g)=>g.toUpperCase());
console.log("/(f)/ ", b);
console.log("/(o)/g", c);
Given a dictionary (object, in this case, a Map) of property, values, and using .bind() as described at answers
const regex = /([A-z0-9]+)/;
const dictionary = new Map([["hello", 123]]);
let str = "hello";
str = str.replace(regex, dictionary.get.bind(dictionary));
console.log(str);
Using a JavaScript plain object and with a function defined to get return matched property value of the object, or original string if no match is found
const regex = /([A-z0-9]+)/;
const dictionary = {
"hello": 123,
[Symbol("dictionary")](prop) {
return this[prop] || prop
}
};
let str = "hello";
str = str.replace(regex, dictionary[Object.getOwnPropertySymbols(dictionary)[0]].bind(dictionary));
console.log(str);
In the case of string conversion from CamelCase to bash_case (ie: for filenames), use a callback with ternary operator.
The captured group selected with a regexp () in the first (left) replace arg is sent to the second (right) arg that is a callback function.
x and y give the captured string (don't know why 2 times!) and index (the third one) gives the index of the beginning of the captured group in the reference string.
Therefor a ternary operator can be used not to place _ at first occurence.
let str = 'MyStringName';
str = str.replace(/([^a-z0-9])/g, (x,y,index) => {
return index != 0 ? '_' + x.toLowerCase() : x.toLowerCase();
});
console.log(str);

Categories

Resources