Do RegExps made by expression literals share a single instance? - javascript

The following snippet of code (from Crockford's JavaScript: The Good Parts) demonstrates that RegExp objects made by regular expression literals share a single instance:
function make_a_matcher( ) {
return /a/gi;
}
var x = make_a_matcher( );
var y = make_a_matcher( );
// Beware: x and y are the same object!
x.lastIndex = 10;
document.writeln(y.lastIndex); // 10
Question: Is this the same with any other literals? I tried modifying the code above to work with the string "string", but got a bunch of errors.

No, they are not shared. From the spec on Regular Expression Literals:
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.
However, this changed with ES5. Old ECMAScript 3 had a different behavior:
A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. The object is created before evaluation of the containing program or function begins. Evaluation of the literal produces a reference to that object; it does not create a new object. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.
This was meant to share the regex engine's compilation result across evaluations, but it obviously led to buggy programs.
You should throw your book away and get a newer edition.
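A minimal sketch of the same check under ES5-or-later semantics, assuming a current engine. The function name makeMatcher is just a renamed version of the book's function, and console.log stands in for document.writeln so the snippet also runs outside a browser:

```javascript
// Each call evaluates the literal again, so each call returns a new RegExp.
function makeMatcher() {
    return /a/gi;
}

const x = makeMatcher();
const y = makeMatcher();

// Mutating one object leaves the other untouched.
x.lastIndex = 10;

console.log(x === y);      // false
console.log(y.lastIndex);  // 0
```

On an ES3-era engine the same code would report that x and y are one object, which is the behavior the book describes.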

Related

When does Javascript "initialize" a RegEx pattern?

I take care to declare a RegEx pattern once and reuse it if possible, for performance reasons. I'm not entirely certain why - it's something I probably read once many years ago that has been filed away in the ol' skull sponge.
I find myself in a regex-heavy situation, and a thought occurred... does declaring a RegEx pattern "instantiate" or "initialize" that pattern, or does it just store the pattern until it's needed?
var NonNumbers = /[^0-9]/g; //"initialized" here?
"h5u4i15h1iu".replace(NonNumbers, "*"); //or "initialized" here?
Maybe RegExp() actually creates one and the literal waits until it's used, even though both patterns return the same results?
var NonNumbers = /[^0-9]/g; //just stores the pattern
var NonNumbers = RegExp(/[^0-9]/, 'g'); //actually creates the RegExp
Just an itch I'm hoping someone who understands the inner workings can scratch.
From the Mozilla documentation:
You construct a regular expression in one of two ways:
Using a regular expression literal, which consists of a pattern enclosed between slashes, as follows:
var re = /ab+c/;
Regular expression literals provide compilation of the regular expression when the script is loaded. If the regular expression remains constant, using this can improve performance.
Or calling the constructor function of the RegExp object, as follows:
var re = new RegExp('ab+c');
Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
Since the documentation indicates that the regular expression is being compiled when using the literal syntax, it is also safe to assume that it is being initialized as a full, bona-fide regular expression object at that point.
Note, however, that literals are not interned: per the specification, two regular expression literals never evaluate to the same object even if their contents are identical, and since ES5 every evaluation of a literal creates a new RegExp object. Any sharing of compiled patterns is an internal optimization of the engine and is not observable from script.
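As a rough illustration of the point about initialization, the sketch below inspects both forms immediately after they are declared and before either is used for matching. The variable names are made up for the example, and the constructor is given a string pattern rather than the RegExp(/.../, 'g') form from the question:

```javascript
// Both forms yield a complete RegExp object as soon as the statement runs.
const literalRe = /[^0-9]/g;
const constructedRe = new RegExp("[^0-9]", "g");

console.log(literalRe instanceof RegExp);      // true
console.log(constructedRe instanceof RegExp);  // true
console.log(literalRe.source, literalRe.global);

// Used afterwards, they behave identically.
const input = "h5u4i15h1iu";
const viaLiteral = input.replace(literalRe, "*");
const viaConstructor = input.replace(constructedRe, "*");

console.log(viaLiteral);                     // *5*4*15*1**
console.log(viaLiteral === viaConstructor);  // true
```

This only shows what is observable from script; when the engine actually compiles the pattern internally is an implementation detail.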

Why does a number inside parentheses have methods, but a number outside parentheses does not? [duplicate]

If I try to write
3.toFixed(5)
there is a syntax error. Using double dots, putting in a space, putting the three in parentheses or using bracket notation allows it to work properly.
3..toFixed(5)
3 .toFixed(5)
(3).toFixed(5)
3["toFixed"](5)
Why doesn't the single dot notation work and which one of these alternatives should I use instead?
The period is part of the number, so the code will be interpreted the same as:
(3.)toFixed(5)
This will naturally give a syntax error, as you can't immediately follow the number with an identifier.
Any method that keeps the period from being interpreted as part of the number would work. I think that the clearest way is to put parentheses around the number:
(3).toFixed(5)
You can't access it because of a flaw in JavaScript's tokenizer. JavaScript tries to parse the dot after a number as part of a floating-point literal, so you can't follow it directly with a property or method:
2.toString(); // raises SyntaxError
As you mentioned, there are a couple of workarounds that make number literals act as objects too. Any of these is equally valid.
2..toString(); // the second point is correctly recognized
2 .toString(); // note the space to the left of the dot
(2).toString(); // 2 is evaluated first
To understand more behind object usage and properties, check out the Javascript Garden.
It doesn't work because JavaScript interprets the 3. as being either the start of a floating-point constant (such as 3.5) or else an entire floating-point constant (with 3. == 3.0), so you can't follow it by an identifier (in your case, a property-name). It fails to recognize that you intended the 3 and the . to be two separate tokens.
Any of your workarounds looks fine to me.
This is an ambiguity in the JavaScript grammar. When the parser has got some digits and then encounters a dot, it has a choice between "NumberLiteral" (like 3.5) or "MemberExpression" (like 3.foo). I guess this ambiguity cannot be resolved by lookahead because of scientific notation - should 3.e2 be interpreted as 300 or as a property e2 of 3? Therefore they deliberately decided to prefer NumberLiterals here, simply because there is not much demand for things like 3.foo.
As others have mentioned, the JavaScript parser interprets the dot after an integer literal as a decimal point, and hence it won't invoke methods or properties of the Number object.
To explicitly tell the parser to access properties or methods on an integer literal, you can use any of the options below:
Two Dot Notation
3..toFixed()
Separating with a space
3 .toFixed()
Write integer as a decimal
3.0.toFixed()
Enclose in parentheses
(3).toFixed()
Assign to a constant or variable
const nbr = 3;
nbr.toFixed()
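The workarounds listed in the answers above can be checked side by side with a short sketch. The variable names are invented for the example, and new Function is used only so that the failing single-dot form can be parsed at runtime without breaking the rest of the script:

```javascript
// Every workaround keeps the first dot from being read as a bare decimal point.
const results = [
    3..toFixed(5),     // "3." is the number, the second dot is member access
    3 .toFixed(5),     // the space ends the numeric literal
    (3).toFixed(5),    // the parentheses end the numeric literal
    3["toFixed"](5),   // bracket notation involves no dot at all
    3.0.toFixed(5),    // "3.0" is the number, the second dot is member access
];
console.log(results.every((r) => r === "3.00000")); // true

// A dot followed by an exponent is still part of the number.
console.log(3.e2 === 300); // true

// The single-dot form fails while the source is being parsed.
let singleDotError = null;
try {
    new Function("return 3.toFixed(5);");
} catch (e) {
    singleDotError = e;
}
console.log(singleDotError instanceof SyntaxError); // true
```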

What is the algorithm of the search() function?

Does anybody know what algorithm is used for the search() function in JavaScript?
var myRegExp = /Alex/;
var string1 = "Today John went to the store and talked with Alex.";
var matchPos1 = string1.search(myRegExp);
if(matchPos1 != -1)
document.write("There was a match at position " + matchPos1);
else
document.write("There was no match in the first string");
Example copied from tizaq.com
I need to use this function to search a text document for different string values. But I need to document what the algorithm behind this method is, and what the complexity is. Otherwise I have to write my own method that searches the text file that I have.
The specification says it's implemented as a regular expression match:
3) If Type(regexp) is Object and the value of the [[Class]] internal
property of regexp is "RegExp", then let rx be regexp;
4) Else, let rx be a new RegExp object created as if by the
expression new RegExp( regexp) where RegExp is the standard built-in
constructor with that name.
5) Search the value string from its beginning for an occurrence of
the regular expression pattern rx. Let result be a Number indicating
the offset within string where the pattern matched, or –1 if there was
no match. (...)
(Section 15.5.4.12 String.prototype.search (regexp)).
This means your question boils down to the regex matching algorithm. But that is not in the specification either, it depends on the implementation:
The value of the [[Match]] internal property is an implementation dependent representation of the Pattern of the RegExp object.
(Section 15.10.7 Properties of RegExp Instances).
So, if documenting the complexity of that algorithm is really a requirement, I guess you'll have to write your own method. But keep in mind that, by doing that, you'll probably come up with something less efficient, and probably dependent on other built-in methods whose complexity is unknown (maybe even RegExp itself). So, can't you convince the powers that be that documenting the complexity of a built-in, implementation-dependent js method is not your job?
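If writing your own method really is required, one option whose cost is easy to document is a naive substring scan. The sketch below is only an illustration under that assumption: naiveSearch is a hypothetical helper, not what any engine uses for search(), and it handles plain strings rather than regular expressions. For a text of length n and a needle of length m it performs at most about n * m character comparisons.

```javascript
// Naive substring search: try every start position, compare character by character.
// Worst case O(n * m) comparisons, O(1) extra memory.
function naiveSearch(text, needle) {
    const lastStart = text.length - needle.length;
    for (let start = 0; start <= lastStart; start++) {
        let matched = 0;
        while (matched < needle.length && text[start + matched] === needle[matched]) {
            matched++;
        }
        if (matched === needle.length) {
            return start; // offset of the first occurrence
        }
    }
    return -1; // same "not found" convention as search()
}

const string1 = "Today John went to the store and talked with Alex.";
console.log(naiveSearch(string1, "Alex") === string1.search(/Alex/)); // true
console.log(naiveSearch(string1, "Bob"));                             // -1
```

As the answer warns, this is likely to be slower than the built-in method; its only advantage is that its behavior and complexity can be stated exactly.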

Can I avoid code duplication by referencing one Javascript Regexp literal inside another?

Is there a simple way to refer to one Javascript literal (e.g. "string") within another regexp literal?
Kind of familiar with Javascript Regexp but far from a guru. Trying to write a simple parser for a small handful of expression types. E.g. One type is expressions like:
value gender 1='Male' 2 ='Female' 3="Didn't answer" >3 = 'Other';
Rather than write a whole parser in say, Jison, and the attendant learning curve, I thought it would be simple enough to use RegExp.
Since it appears that JavaScript Regexp can't capture an arbitrary number of repeating subgroups, and there's no clear character to split on, I'm parsing subgroups with their own regexps.
The following works okay, but the regexp literals are far from DRY, and all but unreadable. Each higher level construct repeats the lower level constructs.
var re_value_stmt = /value\s+(\w+)((?:\s+(?:[^=]+[=](?:(?:["][^"]+["])|(?:['][^']+[']))))+)/i
var re_value_clause = /([^=]+[=](\s*(?:(['][^']*['])|(["][^"]*["])))+)/ig
var re_value_elems = /([^=]+)[=]\s*(?:(?:[']([^']*)['])|(?:["]([^"]*)["]))/ig
console.log(re_value_elems.exec("1='Male'"));
console.log(re_value_clause.exec("1=\"Male\" 2=\"Female\""));
console.log(re_value_stmt.exec("value gender 1='Male' 2='Female'"));
For instance, (?:(?:["][^"]+["])|(?:['][^']+['])) just means QuotedString. Can I write that instead?
Is there a simple way to refer to one Javascript literal (e.g. "string") within another regexp literal? Specifying regexp by munging strings might work, but also seems awkward and error prone (e.g. needing to escape quote marks and escape escapes).
Or is this already the poster child for why people create parsers based on grammars and move out of Regexp?
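One possible approach, sketched here under assumptions, is to write each building block as its own regexp literal and splice its source property into a larger pattern with the RegExp constructor. The names quotedString, valueClause and valueStmt are invented for the example, and the patterns are simplified versions of the ones in the question rather than drop-in replacements:

```javascript
// The smallest building block is written once, as a literal.
const quotedString = /(?:"[^"]+"|'[^']+')/;

// Larger patterns reuse it through .source, so only the glue text
// around it needs string escaping (backslashes are doubled).
const valueClause = new RegExp("\\s+[^=]+=\\s*" + quotedString.source);
const valueStmt = new RegExp(
    "value\\s+(\\w+)((?:" + valueClause.source + ")+)",
    "i"
);

const m = valueStmt.exec("value gender 1='Male' 2='Female'");
console.log(m[1]); // gender
console.log(m[2]); //  1='Male' 2='Female'
```

Keeping the pieces as literals avoids most of the quote and backslash escaping that building the whole pattern from strings would require.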
