Javascript Browser Quirks - array.Length - javascript

Code:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Unusual Array Lengths!</title>
<script type="text/javascript">
var arrayList = new Array();
arrayList = [1, 2, 3, 4, 5, ];
alert(arrayList.length);
</script>
</head>
<body>
</body>
</html>
Notice the extra comma in the array declaration.
The code above gives different outputs for various browsers:
Safari: 5
Firefox: 5
IE: 6
The extra comma in the array is being ignored by Safari and FF while IE treats it as another object in the array.
On some search, I have found mixed opinions about which answer is correct. Most people say that IE is correct but then Safari is also doing the same thing as Firefox. I haven't tested this on other browsers like Opera but I assume that there are discrepancies.
My questions:
i. Which one of these is correct?
Edit: By general consensus (and ECMAScript guidelines) we assume that IE is again at fault.
ii. Are there any other such Javascript browser quirks that I should be wary of?
Edit: Yes, there are loads of Javascript quirks. www.quirksmode.org is a good resource for the same.
iii. How do I avoid errors such as these?
Edit: Use JSLint to validate your javascript. Or, use some external libraries. Or, sanitize your code.
Thanks to DamienB, JasonBunting, John and Konrad Rudolph for their inputs.

It seems to me that the Firefox behavior is correct. What is the value of the 6th value in IE (sorry I don't have it handy to test). Since there is no actual value provided, I imagine it's filling it with something like 'null' which certainly doesn't seem to be what you intended to have happen when you created the array.
At the end of the day though, it doesn't really matter which is "correct" since the reality is that either you are targeting only one browser, in which case you can ignore what the others do, or you are targeting multiple browsers in which case your code needs to work on all of them. In this case the obvious solution is to never include the dangling comma in an array initializer.
If you have problems avoiding it (e.g. for some reason you have developed a (bad, imho) habit of including it) and other problems like this, then something like JSLint might help.

I was intrigued so I looked it up in the definition of ECMAScript 262 ed. 3 which is the basis of JavaScript 1.8. The relevant definition is found in section 11.1.4 and unfortunately is not very clear. The section explicitly states that elisions (= omissions) at the beginning or in the middle don't define an element but do contribute to the overall length.
There is no explicit statements about redundant commas at the end of the initializer but by omission I conclude that the above statement implies that they do not contribute to the overall length so I conclude that MSIE is wrong.
The relevant paragraph reads as follows:
Array elements may be elided at the beginning, middle or end of the element list. Whenever a comma in the element list is not preceded by an Assignment Expression (i.e., a comma at the beginning or after another comma), the missing array element contributes to the length of the Array and increases the index of subsequent elements. Elided array elements are not defined.

"3" for those cases, I usually put in my scripts
if(!arrayList[arrayList.length -1]) arrayList.pop();
You could make a utility function out of that.

First off, Konrad is right to quote the spec, as that is what defines the language and answers your first question.
To answer your other questions:
Are there any other such Javascript browser quirks that I should be wary of?
Oh, too many to list here! Try the QuirksMode website for a good place to find nearly everything known.
How do I avoid errors such as these?
The best way is to use a library that abstracts these problems away for you so that you can get down to worrying about the logic of the application. Although a bit esoteric, I prefer and recommend MochiKit.

Which one of these is correct?
Opera also returns 5. That means IE is outnumbered and majority rules as far as what you should expect.

Ecma 262 edition 5.1 section 11.1.4 array initializer states that a comma at the end if the array does not contribute to the length if the array. "if an element is elided at the end of the array it does not contribute to the length of the array"
That means [ "x", ] is perfectly legal javascript and should return an array of length 1

#John: The value of arrayList[5] comes out to be 'undefined'.
Yes, there should never be a dangling comma in declarations. Actually, I was just going through someone else's long long javascript code which somehow was not working correctly in different browers. Turned out that the dangling comma was the culprit that has accidently been typed in! :)

Related

In a stringified array is it possible to differentiate between quotes that were in a string and those that surrounded the string itself?

Some Context:
• I'm still learning to code atm (started less than a year ago)
• I'm mostly self taught at that since I think my computer science class feels
too slow.
• The website I'm learning on is code.org, specifically in the "game lab"
• The site's coding environments only use ES5 because they don't want to
update them to ES6 or something like that
• In class we're making function libraries and while not required, I want
mine to be "highly usable," for lack of a better term, while also being
reasonably short (prefer not to automate things if I can get them done
quicker somehow, but that's just personal preference).
So now for where the actual question comes in: in a stringified array, is it possible to differentiate between a quotation mark that was inside a string and a quotation mark that actually denotes a string? Because I noticed something confusing with the output of JSON.parse(JSON.stringify()) on code.org, specifically, if you write something like,
JSON.parse(JSON.stringify(['hi","hi']))
the output will be ["hi","hi"] which looks just like an array containing two strings (on code.org it doesn't show the \'s), but still contains just one, which is fine unless you're using a regular expression to detect whether or not a match is within a string (if every quotation mark after the match has a "partner"), which is what I'm doing in 4 different functions. One flattens a list (since ES5 doesn't have Array.prototype.flat()), one removes all instances of the arguments from a list, one removes all instances of specified operand types, and one replaces all instances of an argument with the one that follows it.
Now I know the odds of a string containing an odd number of quotation marks (whether single or double) is likely extremely low, but it still bothers me that not having a way to differentiate between quotes formerly within a string and quotes which formerly denoted a string (in an array after it's been stringified) as these functions otherwise function exactly as intended. The regular expression I'm using to determine if there's an even number of quotes left in the stringified array is /(?=[^"]*(?:(?:"[^"]*){2})*$)/ where you put the match before the lookahead assertion and anything you absolutely want to follow before the first [^"]*.
To highlight the actual issue I'm trying to solve, this is my flatten function (since it's the shortest of the 4), and yeah, yeah, I know "eval bad" but it's extremely convenient to use here since it shortens the actual modification into a single line, and I highly doubt anyone's actually going to find a way to abuse it given its implementation ("this" needs to be an array for splice to work, so if I'm not mistaken, there isn't really a way to abuse it, but tell me if I'm wrong, since I probably am).
Array.prototype.flatten = function() {
eval(('this.splice(0,this.length,' + JSON.stringify(this).replace(/[\[\]](?=[^"]*(?:(?:"[^"]*){2})*$)/g, '') + ')').replace(/,(?=((,[^"]*(?:(?:"[^"]*){2})*)*.$))/g, ''));
return this;
};
This works really well outside of the previously specified conditions, but if I were to call it with something like [1,'"'] it'd find 3 quotation marks after the \[ and wouldn't be able to remove it but would be able to remove the \], thus when eval actually gets to .splice(), it would look like eval('this.splice(0,this.length,[1,"\"")') causing the error Unexpected token ')' to be thrown
Any help on this is appreciated, even if it's just telling me it isn't possible, thanks for reading my ramblings.
TL;DR: in a stringified array is it possible to differentiate between " and \" (string wrapping quotes of strings within a stringified array and quotes within a string within a stringified array) in a regular expression or any other method using only the tools available in ES5 (site I'm learning on doesn't want to update their project environments for whatever reason)
You are having a problem because your input is not a context free grammar and can not be correctly parsed with regular expressions.
Can you explain why JSON.parse is unacceptable? It is even in ancient browsers and versions of node.js.
Someone writing a json parser might use bison or yacc, so if this is a learning experience consider playing with jison.
I ended up finding a way to do this, for whatever reason (either I didn't notice last night because I was tired or it legitimately changed overnight, though likely the former) I can now see the " when viewing the value of the the stringified array, and lo and behold modifying the regular expression so that it ignored instances of " resolved the issue.
New regular expression for quotation mark pair matching now reads:
// old even number of quotation marks after match check
/(?=[^"]*(?:(?:"[^"]*){2})*$)/
// new even number of quotation marks after match check
/(?=(\\"|[^"])*(?:(?:(?<!\\)"(\\"|[^"])*){2})*$)/
// (only real difference is that it accounts for the \)
Sorry for anyone who may have misunderstood the question due to how all over the place it was, I'm aware that I tend to end up writing a lot more than is necessary and it often leads to tangents that muddle my view of what I was initially asking, which in turn makes the point I'm actually trying to get across even harder to grasp at. Thanks to those who still tried to help me regardless of how much of a mess of a first question this was.

Why is toString of JavaScript function implementation-dependent?

From the EcmaScript 5 specification
15.3.4.2 Function.prototype.toString( )
An implementation-dependent representation of the function is returned. This representation has the syntax of a FunctionDeclaration. Note in particular that the use and placement of white space, line terminators, and semicolons within the representation String is implementation-dependent.
Why is it implementation-dependent? It shouldn't be too hard to make it output standardized string consisting of the original code of the function. Also the reasons that I can come up with such as optimization, doesn't seem to be too heavily used as pretty much all the browsers give the original code as result of toString.
If the toString wouldn't be implementation-dependent and thus would be standardized to be the original code for function (with the new lines etc. handled on standard way), wouldn't it make it possible to include the functions on JSON?
I do realize that the JSON, despite its name, is independent of JavaScript and thus the functions shouldn't be part of it. But this way the functions could in theory be passed with it as strings, without losing the cross-browser support.
Internally, Function.prototype.toString() has to get the function declaration code for the function, which it may or may not have. According to the MDN page, FF used to decompile the function, and now it stores the declaration with the function, so it doesn't have to decompile it.
Since Gecko 17.0 (Firefox 17 / Thunderbird 17 / SeaMonkey 2.14),
Function.prototype.toString() has been implemented by saving the
function's source. The decompiler was removed
*https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/toString
Decompiling it requires extra work. Storing it requires extra memory. A given ECMAscript implementation may have different resource requirements.
Further, if it is decompiled, that is dependent on how it was stored in the first place. An engine may be unable to return comments in the original, because it didn't store them when the function was evaluated. Or whitespace/newlines might be different if the engine collapsed them. Or the engine may have optimized the code, such as by ignoring unreachable code, making it not possible to return that code back in the toString() call.
...some engines omit newlines. And others omit comments. And others
omit "dead code". And others include comments around (!) function. And
others hide source completely...
*http://perfectionkills.com/state-of-function-decompilation-in-javascript/
These are just a few reasons why Function.prototype.toString() is implementation dependent.

why {key:value}["key"] doesn't work?

1:{key:value}["key"]
2:({key:value})["key"]
I'm wondering how the JS interpreter works on the above codes, and why 1 doesn't work and why 2 works?
I assume you are asking the question because you saw this effect in a JavaScript REPL (shell). You are using a JavaScript shell which assumes the leading "{" begins a block statement instead of an object literal.
For example, if you use the JavaScript interpreter that comes with the Chrome browser, you see the following:
> {key:"value"}["key"]
["key"]
Here, Chrome saw what you entered as a block statement, followed by the expression that was an array of one element, the string "key". So it responded with the result of that expression, namely the array ["key"]
But not all shells work this way. If you use the interpreter with node.js, then #1 will work for you!
$ node
> {key:"value"}["key"]
'value'
>
In interpreters like Chrome, you have to use parentheses to tell it that you want the first part to be an object literal. (This technique, by the way, is guaranteed to work in all shells, including node's).
EDIT
As pointed out in one of the comments, if you use that construct in an expression context anywhere in an actual script, it will produce "value". It's the use in the shell that looks confusing.
This fact was actually exploited in the famous WAT video by Gary Bernhardt.

Finding comments in HTML

I have an HTML file and within it there may be Javascript, PHP and all this stuff people may or may not put into their HTML file.
I want to extract all comments from this html file.
I can point out two problems in doing this:
What is a comment in one language may not be a comment in another.
In Javascript, remainder of lines are commented out using the // marker. But URLs also contain // within them and I therefore may well eliminate parts of URLs if I
just apply substituting // and then the
remainder of the line, with nothing.
So this is not a trivial problem.
Is there anywhere some solution for this already available?
Has anybody already done this?
Problem 2: Isn't every url quoted, with either "www.url.com" or 'www.url.com', when you write it in either language? I'm not sure. If that's the case then all you haft to do is to parse the code and check if there's any quote marks preceding the backslashes to know if it's a real url or just a comment.
Look into parser generators like ANTLR which has grammars for many languages and write a nesting parser to reliably find comments. Regular expressions aren't going to help you if accuracy is important. Even then, it won't be 100% accurate.
Consider
Problem 3, a comment in a language is not always a comment in a language.
<textarea><!-- not a comment --></textarea>
<script>var re = /[/*]not a comment[*/]/, str = "//not a comment";</script>
Problem 4, a comment embedded in a language may not obviously be a comment.
<button onclick="// this is a comment//
notAComment()">
Problem 5, what is a comment may depend on how the browser is configured.
<noscript><!-- </noscript> Whether this is a comment depends on whether JS is turned on -->
<!--[if IE 8]>This is a comment, except on IE 8<![endif]-->
I had to solve this problem partially for contextual templating systems that elide comments from source code to prevent leaking software implementation details.
https://github.com/mikesamuel/html-contextual-autoescaper-java/blob/master/src/tests/com/google/autoesc/HTMLEscapingWriterTest.java#L1146 shows a testcase where a comment is identified in JavaScript, and later testcases show comments identified in CSS and HTML. You may be able to adapt that code to find comments. It will not handle comments in PHP code sections.
It seems from your word that you are pondering some approach based on regular expressions: it is a pain to do so on the whole file, try to use some tools to highlight or to discard interesting or uninteresting text and then work on what is left from your sieve according to the keep/discard criteria. Have a look at HTML::Tree and TreeBuilder, it could be very useful to deal with the HTML markup.
I would convert the HTML file into a character array and parse it. You can detect key strings like "<", "--" ,"www", "http", as you move forward and either skip or delete those segments.
The start/end indices will have to be identified properly, which is a challenge but you will have full power.
There are also other ways to simplify the process if performance is not a problem. For example, all tags can be grabbed with XML::Twig and the string can be parsed to detect JS comments.

Is there any practical reason to use quoted strings for JSON keys?

According to Crockford's json.org, a JSON object is made up of members, which is made up of pairs.
Every pair is made of a string and a value, with a string being defined as:
A string is a sequence of zero or more
Unicode characters, wrapped in double
quotes, using backslash escapes. A
character is represented as a single
character string. A string is very
much like a C or Java string.
But in practice most programmers don't even know that a JSON key should be surrounded by double quotes, because most browsers don't require the use of double quotes.
Does it make any sense to bother surrounding your JSON in double quotes?
Valid Example:
{
"keyName" : 34
}
As opposed to the invalid:
{
keyName : 34
}
The real reason about why JSON keys should be in quotes, relies in the semantics of Identifiers of ECMAScript 3.
Reserved words cannot be used as property names in Object Literals without quotes, for example:
({function: 0}) // SyntaxError
({if: 0}) // SyntaxError
({true: 0}) // SyntaxError
// etc...
While if you use quotes the property names are valid:
({"function": 0}) // Ok
({"if": 0}) // Ok
({"true": 0}) // Ok
The own Crockford explains it in this talk, they wanted to keep the JSON standard simple, and they wouldn't like to have all those semantic restrictions on it:
....
That was when we discovered the
unquoted name problem. It turns out
ECMA Script 3 has a whack reserved
word policy. Reserved words must be
quoted in the key position, which is
really a nuisance. When I got around
to formulizing this into a standard, I
didn't want to have to put all of the
reserved words in the standard,
because it would look really stupid.
At the time, I was trying to convince
people: yeah, you can write
applications in JavaScript, it's
actually going to work and it's a good
language. I didn't want to say, then,
at the same time: and look at this
really stupid thing they did! So I
decided, instead, let's just quote the
keys.
That way, we don't have to tell
anybody about how whack it is.
That's why, to this day, keys are quoted in
JSON.
...
The ECMAScript 5th Edition Standard fixes this, now in an ES5 implementation, even reserved words can be used without quotes, in both, Object literals and member access (obj.function Ok in ES5).
Just for the record, this standard is being implemented these days by software vendors, you can see what browsers include this feature on this compatibility table (see Reserved words as property names)
Yes, it's invalid JSON and will be rejected otherwise in many cases, for example jQuery 1.4+ has a check that makes unquoted JSON silently fail. Why not be compliant?
Let's take another example:
{ myKey: "value" }
{ my-Key: "value" }
{ my-Key[]: "value" }
...all of these would be valid with quotes, why not be consistent and use them in all cases, eliminating the possibility of a problem?
One more common example in the web developer world: There are thousands of examples of invalid HTML that renders in most browsers...does that make it any less painful to debug or maintain? Not at all, quite the opposite.
Also #Matthew makes the best point of all in comments below, this already fails, unquoted keys will throw a syntax error with JSON.parse() in all major browsers (and any others that implement it correctly), you can test it here.
If I understand the standard correctly, what JSON calls "objects" are actually much closer to maps ("dictionaries") than to actual objects in the usual sense. The current standard easily accommodates an extension allowing keys of any type, making
{
"1" : 31.0,
1 : 17,
1n : "valueForBigInt1"
}
a valid "object/map" of 3 different elements.
If not for this reason, I believe the designers would have made quotes around keys optional for all cases (maybe except keywords).
YAML, which is in fact a superset of JSON, supports what you want to do. ALthough its a superset, it lets you keep it as simple as you want.
YAML is a breath of fresh air and it may be worth your time to have a look at it. Best place to start is here: http://en.wikipedia.org/wiki/YAML
There are libs for every language under the sun, including JS, eg https://github.com/nodeca/js-yaml

Categories

Resources