1:{key:value}["key"]
2:({key:value})["key"]
I'm wondering how the JS interpreter handles the code above: why doesn't 1 work, and why does 2 work?
I assume you are asking the question because you saw this effect in a JavaScript REPL (shell). You are using a JavaScript shell which assumes the leading "{" begins a block statement instead of an object literal.
For example, if you use the JavaScript interpreter that comes with the Chrome browser, you see the following:
> {key:"value"}["key"]
["key"]
Here, Chrome parsed what you entered as a block statement followed by an expression statement: an array literal with one element, the string "key". So it responded with the value of that expression, namely the array ["key"].
But not all shells work this way. If you use the interpreter with node.js, then #1 will work for you!
$ node
> {key:"value"}["key"]
'value'
>
In interpreters like Chrome, you have to use parentheses to tell it that you want the first part to be an object literal. (This technique, by the way, is guaranteed to work in all shells, including node's).
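The difference between the two parses can be reproduced outside any particular shell. The following is a minimal sketch, assuming only that eval parses its argument as a whole program (statement context) and returns the value of the last statement; the variable names are made up for illustration.

```javascript
// eval parses its argument as a program, so a leading "{" starts a block statement.
// Inside that block, key:"value" is a labeled expression statement, not a property.
var asStatement = eval('{key:"value"}["key"]');

// The parentheses force an expression context, so "{" starts an object literal.
var asExpression = eval('({key:"value"})["key"]');

console.log(asStatement);  // the one-element array ["key"]
console.log(asExpression); // the string "value"
```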
EDIT
As pointed out in one of the comments, if you use that construct in an expression context anywhere in an actual script, it will produce "value". It's the use in the shell that looks confusing.
This fact was actually exploited in the famous WAT video by Gary Bernhardt.
Some Context:
• I'm still learning to code atm (started less than a year ago)
• I'm mostly self taught at that since I think my computer science class feels too slow.
• The website I'm learning on is code.org, specifically in the "game lab"
• The site's coding environments only use ES5 because they don't want to update them to ES6 or something like that
• In class we're making function libraries and while not required, I want mine to be "highly usable," for lack of a better term, while also being reasonably short (prefer not to automate things if I can get them done quicker somehow, but that's just personal preference).
So now for where the actual question comes in: in a stringified array, is it possible to differentiate between a quotation mark that was inside a string and a quotation mark that actually denotes a string? Because I noticed something confusing with the output of JSON.parse(JSON.stringify()) on code.org, specifically, if you write something like,
JSON.parse(JSON.stringify(['hi","hi']))
the output will be ["hi","hi"] which looks just like an array containing two strings (on code.org it doesn't show the \'s), but still contains just one, which is fine unless you're using a regular expression to detect whether or not a match is within a string (if every quotation mark after the match has a "partner"), which is what I'm doing in 4 different functions. One flattens a list (since ES5 doesn't have Array.prototype.flat()), one removes all instances of the arguments from a list, one removes all instances of specified operand types, and one replaces all instances of an argument with the one that follows it.
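To make the observation concrete, here is a small sketch of what the stringified text actually contains, independent of how a given console chooses to display it. The variable names are illustrative only.

```javascript
var original = ['hi","hi'];
var text = JSON.stringify(original);

// The inner quotation marks are escaped with a backslash in the JSON text,
// while the two quotes that delimit the string are not.
console.log(text === '["hi\\",\\"hi"]'); // true

// Round-tripping still yields one string, not two.
var roundTripped = JSON.parse(text);
console.log(roundTripped.length); // 1
```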
Now I know the odds of a string containing an odd number of quotation marks (whether single or double) are likely extremely low, but it still bothers me that I don't have a way to differentiate between quotes formerly within a string and quotes which formerly denoted a string (in an array after it's been stringified), as these functions otherwise work exactly as intended. The regular expression I'm using to determine if there's an even number of quotes left in the stringified array is /(?=[^"]*(?:(?:"[^"]*){2})*$)/ where you put the match before the lookahead assertion and anything you absolutely want to follow before the first [^"]*.
To highlight the actual issue I'm trying to solve, this is my flatten function (since it's the shortest of the 4), and yeah, yeah, I know "eval bad" but it's extremely convenient to use here since it shortens the actual modification into a single line, and I highly doubt anyone's actually going to find a way to abuse it given its implementation ("this" needs to be an array for splice to work, so if I'm not mistaken, there isn't really a way to abuse it, but tell me if I'm wrong, since I probably am).
Array.prototype.flatten = function() {
  eval(('this.splice(0,this.length,' + JSON.stringify(this).replace(/[\[\]](?=[^"]*(?:(?:"[^"]*){2})*$)/g, '') + ')').replace(/,(?=((,[^"]*(?:(?:"[^"]*){2})*)*.$))/g, ''));
  return this;
};
This works really well outside of the previously specified conditions, but if I were to call it with something like [1,'"'] it'd find 3 quotation marks after the \[ and wouldn't be able to remove it but would be able to remove the \], thus when eval actually gets to .splice(), it would look like eval('this.splice(0,this.length,[1,"\"")') causing the error Unexpected token ')' to be thrown
Any help on this is appreciated, even if it's just telling me it isn't possible, thanks for reading my ramblings.
TL;DR: in a stringified array is it possible to differentiate between " and \" (string wrapping quotes of strings within a stringified array and quotes within a string within a stringified array) in a regular expression or any other method using only the tools available in ES5 (site I'm learning on doesn't want to update their project environments for whatever reason)
You are having a problem because your input is not a regular language (JSON is context-free, with nesting and escape sequences) and cannot be correctly parsed with regular expressions alone.
Can you explain why JSON.parse is unacceptable? It is even in ancient browsers and versions of node.js.
Someone writing a json parser might use bison or yacc, so if this is a learning experience consider playing with jison.
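If the goal is only to flatten nested arrays in ES5, one way to sidestep the quoting problem is to work on the array values directly instead of on their stringified form. This is a hedged sketch, not the asker's function: flattenInPlace is a hypothetical helper name, and it relies only on Array.isArray and splice, both available in ES5.

```javascript
// Hypothetical helper: flattens nested arrays in place without eval or regexes.
function flattenInPlace(arr) {
  var out = [];
  (function walk(items) {
    for (var i = 0; i < items.length; i++) {
      if (Array.isArray(items[i])) {
        walk(items[i]);        // descend into nested arrays
      } else {
        out.push(items[i]);    // strings are copied untouched, quotes and all
      }
    }
  })(arr);
  // Replace the original contents, like the splice(0, this.length, ...) call in the question.
  Array.prototype.splice.apply(arr, [0, arr.length].concat(out));
  return arr;
}

var nested = [1, '"', [2, ['hi","hi', [3]]]];
console.log(JSON.stringify(flattenInPlace(nested)));
```

Because no string is ever re-parsed, a value such as '"' or 'hi","hi' cannot confuse it.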
I ended up finding a way to do this. For whatever reason (either I didn't notice last night because I was tired or it legitimately changed overnight, though likely the former), I can now see the \" when viewing the value of the stringified array, and lo and behold, modifying the regular expression so that it ignored instances of \" resolved the issue.
New regular expression for quotation mark pair matching now reads:
// old even number of quotation marks after match check
/(?=[^"]*(?:(?:"[^"]*){2})*$)/
// new even number of quotation marks after match check
/(?=(\\"|[^"])*(?:(?:(?<!\\)"(\\"|[^"])*){2})*$)/
// (only real difference is that it accounts for the \)
Sorry for anyone who may have misunderstood the question due to how all over the place it was, I'm aware that I tend to end up writing a lot more than is necessary and it often leads to tangents that muddle my view of what I was initially asking, which in turn makes the point I'm actually trying to get across even harder to grasp at. Thanks to those who still tried to help me regardless of how much of a mess of a first question this was.
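For comparison, the same "am I inside a string?" decision can be made with a single left-to-right scan instead of a lookahead, which also avoids lookbehind. The sketch below is an assumption-laden illustration, not the code from the question: stripBracketsOutsideStrings is a hypothetical name, and it expects valid JSON text such as the output of JSON.stringify.

```javascript
// Hypothetical helper: removes [ and ] that are structural, keeps those inside strings.
function stripBracketsOutsideStrings(json) {
  var out = '';
  var inString = false;
  for (var i = 0; i < json.length; i++) {
    var ch = json.charAt(i);
    if (inString) {
      out += ch;
      if (ch === '\\') {
        // An escape: copy the next character verbatim so \" never ends the string.
        i++;
        out += json.charAt(i);
      } else if (ch === '"') {
        inString = false;     // an unescaped quote closes the string
      }
    } else if (ch === '"') {
      inString = true;        // an unescaped quote opens a string
      out += ch;
    } else if (ch !== '[' && ch !== ']') {
      out += ch;
    }
  }
  return out;
}

var stripped = stripBracketsOutsideStrings(JSON.stringify([1, '"', ['[x]']]));
console.log(JSON.parse('[' + stripped + ']'));
```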
Looking at the output of UglifyJS2, I noticed that no spaces are required between literals and the in operator (e.g., 'foo'in{foo:'bar'} is valid).
Playing around with Chrome's DevTools, however, I noticed that hex and binary number literals require a space before the in keyword:
Internet Explorer returned true for all three tests, while Firefox 48.0.1 threw a SyntaxError for the first one (1in foo); however, it is okay with string literals ('1'in foo==true).
It seems that there should be no problem parsing JavaScript, allowing for keywords to be next to numeric literals, but I can't find any explicit rule in the ECMAScript specification (any of them).
Further testing shows that statements like for(var i of[1,2,3])... are allowed in both Chrome and Firefox (IE11 doesn't support for..of loops), and typeof"string" works in all three.
Which behavior is correct? Is it, in fact, defined somewhere that I missed, or are all these effects a result of idiosyncrasies of each browser's parser?
Not an expert - I haven't done a JS compiler, but have done others.
ecma-262.pdf is a bit vague, but it's clear that an expression such as 1 in foo should be parsed as 3 input elements, which are all tokens. Each token is a CommonToken (11.5); in this case, we get numericLiteral, identifierName (yes, in is an identifierName), and identifierName. Exactly the same is true when parsing 0b1 in foo (see 11.8.3).
So, what happens when you take out the WS? It's not covered explicitly (as far as I can see), but it's common practice (in other languages) when writing a lexer to scan the longest character sequence that will match something you could potentially be looking for. The introduction to section 11 pretty much says exactly that:
The source text is scanned from left to right, repeatedly taking the
longest possible sequence of code points as the next input element.
So, for 0b1in foo the lexer goes through 0b1, which matches a numeric literal, and reaches i, giving 0b1i, which doesn't match anything. So it passes the longest match (0b1) to the rest of the parser as a token, and starts again at i. It finds n, followed by WS, so passes in as the second token, and so on.
So, basically, and rather bizarrely, it looks like IE is correct.
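The longest-match scanning described above can be sketched with a toy lexer. This is only an illustration under strong assumptions: it knows three kinds of input element, the patterns are simplified, and tokenize and TOKEN_PATTERNS are hypothetical names rather than anything from a real engine.

```javascript
// Toy lexer covering a tiny subset of ECMAScript input elements.
var TOKEN_PATTERNS = [
  { type: 'NumericLiteral', re: /^(?:0[bB][01]+|0[xX][0-9a-fA-F]+|\d+)/ },
  { type: 'IdentifierName', re: /^[A-Za-z_$][A-Za-z0-9_$]*/ },
  { type: 'WhiteSpace', re: /^\s+/ }
];

function tokenize(source) {
  var tokens = [];
  var rest = source;
  while (rest.length > 0) {
    var best = null;
    // Try every pattern at the current position and keep the longest match.
    for (var i = 0; i < TOKEN_PATTERNS.length; i++) {
      var m = TOKEN_PATTERNS[i].re.exec(rest);
      if (m && (!best || m[0].length > best.text.length)) {
        best = { type: TOKEN_PATTERNS[i].type, text: m[0] };
      }
    }
    if (!best) {
      throw new SyntaxError('Unexpected character: ' + rest.charAt(0));
    }
    if (best.type !== 'WhiteSpace') {
      tokens.push(best.text);
    }
    rest = rest.slice(best.text.length);
  }
  return tokens;
}

console.log(tokenize('0b1in foo')); // [ '0b1', 'in', 'foo' ]
console.log(tokenize('1 in foo'));  // [ '1', 'in', 'foo' ]
```

Under pure longest-match scanning, both inputs produce the same three tokens, which is the reasoning behind the conclusion above.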
TL;DR
There would be no change to how code would be interpreted if whitespace weren't required in these circumstances, but it's part of the spec.
Looking at the source code of v8 that handles number literal parsing, it cites ECMA 262 § 7.8.3:
The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.
NOTE For example:
3in
is an error and not the two input elements 3 and in.
This section seems to contradict the introduction of section 7. However, it does not seem that there would be any problems with breaking that rule and allowing for 3in to be parsed. There are cases where allowing for no spaces between literals and identifiers would change how the source is parsed, but all cases merely change which errors are generated.
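A quick way to see how a particular engine treats these cases is to ask it to parse each snippet without running it. The following is a rough sketch: parses is a hypothetical helper, the sample list is mine, and the results for the literal-adjacent cases are deliberately not asserted here because they depend on the engine and its version.

```javascript
// Returns true if the engine accepts `source` as an expression, false on SyntaxError.
function parses(source) {
  try {
    // new Function only compiles the body; nothing is executed.
    new Function('foo', 'return (' + source + ');');
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e;
  }
}

var samples = ['1 in foo', "'1'in foo", 'typeof"string"', '1in foo', '0x1in foo', '0b1in foo'];
samples.forEach(function (s) {
  console.log(s + ' -> ' + (parses(s) ? 'parses' : 'SyntaxError'));
});
```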
I saw this piece of code in some obfuscated JavaScript:
if(1s Q.is.ep=='a')
Do you have any idea what this might mean? I'm quite confused about the space.
Thanks :)
The code looks like it was generated by Dean Edwards' packer (or another similar one). You could unpack it with this tool.
It is indeed JavaScript, but with keywords, methods, and variables replaced by meaningless strings. The bottom half of the file you provided is actually a mapping between the obscured and original names.
And this is the power of eval (and don't use eval if you can possibly do without it).
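The general idea of such a mapping can be sketched as follows. This is a simplified illustration, not the packer's actual decoder: it assumes each short token is a base-36 index into a keyword list (the real tool uses its own encoding), and both the unpack helper and the dictionary contents are made up.

```javascript
// Hypothetical, simplified unpacker: replaces each word by the dictionary entry
// at the index the word encodes, and leaves unknown words unchanged.
function unpack(packed, keywords) {
  return packed.replace(/\b\w+\b/g, function (word) {
    var index = parseInt(word, 36);
    return keywords[index] || word;
  });
}

// Made-up dictionary; a real packed file carries its own list.
var keywords = ['if', 'typeof', 'options'];
console.log(unpack("0(1 2.is.ep=='a')", keywords));
// prints: if(typeof options.is.ep=='a')
```

The space in the packed text survives because it separates two tokens that both expand to words.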
I'm currently working on a small little dsl, not unlike rabl. I'm struggling with the implementation of one of my rules. Before we get to the problem, I'll explain a bit about my syntax/grammar.
In my little language you can define properties, object/array blocks, or custom blocks (these are all used to build a json object/array). A "custom block" can either be one that contains my standard expressions (property, object/array block, etc) or some JavaScript. These expressions are written as such -
-- An object block
object #model
-- A property node
property some, key(name="value")
-- A custom node
object custom_obj as
property some(name="key")
end
-- A custom script node
property full_name as (u)
// This is JavaScript
return u.first_name + ' ' + u.last_name;
end
end
The problem I'm running into is with my custom script node. I'm having a really hard time defining the script token so that JISON can properly capture the stuff inside the block.
In my lexer, I currently have...
# script_param is basically a regex to match "(some_ident)"
{script_param} { this.begin('js'); return 'SCRIPT_PARAM'; }
<js>(.|\n|\r)*?"end" %{
this.popState();
yytext = yytext.substr(0, yyleng - 3).trim();
return 'SCRIPT';
%}
That SCRIPT token will basically match anything after (u) up to (and including) the end token (which usually ends a block). I really dislike this because my usual block terminator (end) is actually part of the script token, which feels totally hacky to me. Unfortunately, I'm not able to find a better way to capture ANYTHING between (..) and end.
I've tried writing a regex that captures anything that ends with a ";", but that poses problems when I have multiple script nodes in my dsl code. I've only been able to make this work by including the "end" keyword as part of my capture.
Here are the links to my grammar and lexer files.
I'd greatly appreciate any insight into solving my problem! If I didn't explain my problem clearly, let me know and I'll try my best to clarify!
Many thanks in advance!!
I will also happily accept any advice as to how to clean up my grammar. I'm still fairly new at this stuff and feel like my stuff is a mess right now :)
It's easy enough to match a string up to but not including the first instance of end:
([^e]|e[^n]|en[^d])*
(And it doesn't even need non-greedy repetition.)
However, that's not what you want. The included JavaScript might include:
variables whose names happen to include the characters end (tendency)
comments (/* Take the values up to the end of the line */)
character strings (if (word == "end"))
and, indeed, the word end itself, which is not a reserved word in js.
Really, the only clean solution is to be able to lex javascript. Fortunately, you don't have to do it precisely, because you're not interpreting it, but even so it is a bit of work. The most annoying part of javascript lexing, like other similar languages, is identifying when / is the beginning of a regular expression, and when it is just division; getting that right requires most of a javascript parser, particularly since it also interacts with the semicolon rule.
To deal with the fact that the included javascript might actually use a variable named end, you have a couple of choices, as far as I can see:
Document the fact that end is a reserved word.
Only recognize end when it appears outside of parentheses and in a place where a statement might start (not too difficult if you end up building enough of a JS parser to correctly identify regular expressions)
Only recognize end when it appears by itself on a line.
This last choice would really simplify your problem a lot, so you might want to think about it, although it's not really very elegant.
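The last option, recognizing end only when it stands alone on a line, can be sketched in plain JavaScript before wiring it into a lexer rule. This is a minimal sketch under that assumption; splitAtEndLine is a hypothetical helper, not part of JISON.

```javascript
// Splits `text` at the first line that contains nothing but "end"
// (optionally surrounded by spaces or tabs). Returns null if there is none.
function splitAtEndLine(text) {
  var match = /^[ \t]*end[ \t]*$/m.exec(text);
  if (!match) {
    return null;
  }
  return {
    script: text.slice(0, match.index).trim(),
    rest: text.slice(match.index + match[0].length)
  };
}

var body = [
  '  // Take the values up to the end of the line',
  '  var tendency = (word == "end");',
  '  return tendency;',
  'end',
  'end'
].join('\n');

var parts = splitAtEndLine(body);
console.log(parts.script); // the three JavaScript lines, untouched
console.log(JSON.stringify(parts.rest)); // "\nend" (the enclosing block's terminator)
```

Occurrences of end inside identifiers, comments, and string literals are ignored because none of them occupies a line by itself.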
Given there is no cross-browser const in JavaScript and most of the work-arounds are more complex than I care for, I am just going to go with the naming convention of THIS_IS_A_CONSTANT. All well and good, but it occurred to me that if there were a way to get my IDE (VS.NET 2010 with Resharper 6) to warn on any JavaScript code that assigns to a variable with that naming convention anywhere except in its declaration, this would handle most of the potential issues around the lack of real constants in JavaScript (at least for my needs).
So does anyone know of a good way to generate such warnings? In-IDE would be the best thing but other solutions are fine as well. I have looked for something like FX-Cop for Javascript; jslint doesn't seem to allow the creation of new rules but maybe I didn't look deep enough. I may also suggest this as a feature in Resharper (assuming I am not missing a way to make it do so already).
Thanks,
Matthew
So you want to find any assignment of the form:
id = exp ;
where id doesn't contain the substring CONSTANT and exp is a numeric constant?
Our Source Code Search Engine (SCSE) can do this pretty directly. The SCSE reads source code for a large set of files for many languages (including JavaScript), breaks it into tokens ignoring whitespace, and indexes it all to enable fast search for token sequences. Any hits are displayed in a hit window and can be clicked to see the actual file text in context.
Your particular query would be stated:
(I - I=*CONSTANT*) '=' N ( ';' | O | K | I)
This hunts for any assignment in which the target identifier doesn't contain the string CONSTANT (note the wildcard stars around the match string) and is assigned a constant *N*umber that is followed by a ';' or an *O*perator, *K*eyword or *I*dentifier (all this extra stuff is because JavaScript might not have a semicolon to terminate the statement). It probably picks up some cases it should not, but these are easily inspected.
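For the convention described in the question (warn when an ALL_CAPS name is assigned anywhere other than its var declaration), a rough line-based check can also be scripted directly. This is a heuristic sketch only, unrelated to the SCSE query syntax: findConstantReassignments is a hypothetical helper, it does not understand comments, strings, compound operators such as +=, or declarations that list several variables after one var.

```javascript
// Reports plain "NAME = ..." assignments to ALL_CAPS identifiers
// that are not directly preceded by "var" on the same line.
function findConstantReassignments(source) {
  var warnings = [];
  var lines = source.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    // (?!=) keeps == and === comparisons from being reported.
    var pattern = /(\bvar\s+)?\b([A-Z][A-Z0-9_]*)\s*=(?!=)/g;
    var m;
    while ((m = pattern.exec(lines[i])) !== null) {
      if (!m[1]) {
        warnings.push({ line: i + 1, name: m[2] });
      }
    }
  }
  return warnings;
}

var sample = [
  'var MAX_SIZE = 10;',
  'var total = MAX_SIZE * 2;',
  'MAX_SIZE = 20;',
  'if (MAX_SIZE == 20) {}'
].join('\n');

console.log(findConstantReassignments(sample)); // [ { line: 3, name: 'MAX_SIZE' } ]
```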