I just started to learn JavaScript with the book "Eloquent JavaScript", which is accessible free of charge at eloquentjavascript.net. So far I really like the book, there's just one section I don't understand. It's the one about expressions and statements:
http://eloquentjavascript.net/chapter2.html#p65af5913
I know this topic has been mentioned on StackOverflow before, however, these were more specific questions and I - quite frankly - don't get the whole thing.
In the beginning of the paragraph, the author explains what expressions are: If I understand it correctly, atomic values such as 42 or "23" are being considered expressions. If one applied those values to an operator (as in 42 - 19), this would be considered an expression, too. (I guess because it obviously turns out to be 23, which is an atomic value once again.) I interpret this the following way: Every value - no matter whether it's directly typed in or is yet to be calculated - is being called an expression. Is that correct?
Then the author says the following: "There exists a unit that is bigger than an expression. It is called a statement. [...] Most statements end with a semicolon (;). The simplest kind of statement is an expression with a semicolon after it." As an example he mentions !false; as an example statement. My questions is "What makes this a statement? Just the semicolon at the end?" When I use the JavaScript console, it makes absolutely no difference whether I type that in with or without the semicolon. It always returns true. However, the author says that "[A] statement only amounts to something if it somehow changes the world." So the given example is not even a statement since it "just produce[s] the value [...] true, and then immediately throw[s it] into the bit bucket"? I am really confused now... If I did not entirely mess it up in my head, a statement has to have some "side-effect" (like a declaration of a variable), right?
However, I would be very happy if someone could explain what a statement is. It would also be very helpful if someone could give an example in which the distinction of those terms is actually useful, because right now I cannot even imagine why the author even bothers to introduce these vocabularies. Thank you very much in advance!
A simple, albeit vague, analogy would be a text in a natural language, consisting of phrases grouped into sentences. A phrase, like "it's raining" can form a sentence by itself, like "I won't go out. It's raining." or be a part of a bigger sentence, like in "Terrible weather, it's raining all the time."
That said, the distinction between expressions and statements is very blurred in javascript. Unlike other C-alike languages, you can have not only expressions in statements, but also statements inside expressions:
a = 1 + function(x) { if(x > 1) return 10 } (20)
Some modern javascript programs, such as Jquery, use declaration techniques that basically make them one single expression.
This blurry distinction (not to say confusion) comes from the fact that Javascript, being a C/pascal/algol-like imperative language, was at the same time heavily influenced by functional languages like Lisp, which don't have a notion of "statements". In a functional language, everything is an expression.
To make things more interesting, the semicolon between statements is (sometimes) optional, so that there's no easy way to tell if two expressions belong to one statement or form two distinct ones. Consider:
a = 1 // two
!2 // statements
a = 1 // one
+2 // statement
Related
I'm currently working on a small little dsl, not unlike rabl. I'm struggling with the implementation of one of my rules. Before we get to the problem, I'll explain a bit about my syntax/grammar.
In my little language you can define properties, object/array blocks, or custom blocks (these are all used to build a json object/array). A "custom block" can either be one that contains my standard expressions (property, object/array block, etc) or some JavaScript. These expressions are written as such -
-- An object block
object #model
-- A property node
property some, key(name="value")
-- A custom node
object custom_obj as
property some(name="key")
end
-- A custom script node
property full_name as (u)
// This is JavaScript
return u.first_name + ' ' + u.last_name;
end
end
The problem I'm running into is with my custom script node. I'm having a real hard defining the script token so that JISON can properly capture the stuff inside the block.
In my lexer, I currently have...
# script_param is basically a regex to match "(some_ident)"
{script_param} { this.begin('js'); return 'SCRIPT_PARAM'; }
<js>(.|\n|\r)*?"end" %{
this.popState();
yytext = yytext.substr(0, yyleng - 3).trim();
return 'SCRIPT';
%}
That SCRIPT token will basically match anything after (u) up to (and including) the end token (which usually ends a block). I really dislike this because my usual block terminator (end) is actually part of the script token, which feels totally hacky to me. Unfortunately, I'm not able to find a better way to capture ANYTHING between (..) and end.
I've tried writing a regex that captures anything that ends with a ";", but that poses problems when I have multiple script nodes in my dsl code. I've only been able to make this work by including the "end" keyword as part of my capture.
Here are the links to my grammar and lexer files.
I'd greatly appreciate any insight into solving my problem! If I didn't explain my problem clearly, let me know and I'll try my best to clarify!
Many thanks in advance!!
I will also happily accept any advice as to how to clean up my grammar. I'm still fairly new at this stuff and feel like my stuff is a mess right now :)
It's easy enough to match a string up to but not including the first instance of end:
([^e]|e[^n]|en[^d])*
(And it doesn't even need non-greedy repetition.)
However, that's not what you want. The included JavaScript might include:
variables whose names happen to include the characters end (tendency)
comments (/* Take the values up to the end of the line */)
character strings (if (word == "end"))
and, indeed, the word end itself, which is not a reserved word in js.
Really, the only clean solution is to be able to lex javascript. Fortunately, you don't have to do it precisely, because you're not interpreting it, but even so it is a bit of work. The most annoying part of javascript lexing, like other similar languages, is identifying when / is the beginning of a regular expression, and when it is just division; getting that right requires most of a javascript parser, particularly since it also interacts with the semicolon rule.
To deal with the fact that the included javascript might actually use a variable named end, you have a couple of choices, as far as I can see:
Document the fact that end is a reserved word.
Only recognize end when it appears outside of parentheses and in a place where a statement might start (not too difficult if you end up building enough of a JS parser to correctly identify regular expressions)
Only recognize end when it appears by itself on a line.
This last choice would really simplify your problem a lot, so you might want to think about it, although it's not really very elegant.
I'm interested in finding a more-sophisticated-than-typical algorithm for finding differences between strings, that can be "tuned" via some parameters, to balance between such things as "maximize count of identical characters" vs. "maximize the length of spans" vs. "try to keep whole words intact".
Ultimately, I want to be able to make the results as human readable as possible. For instance, if a long sentence has been replaced with an entirely new sentence, where the only things it has in common with the original are the words "the" "and" and "a" in that order, I might want it treated as if the whole sentence is changed, rather than just that 4 particular spans are changed --- just like how a reasonable person would see it.
Does such a thing exist? Although I'm working in javascript/node.js, an algorithm in any language would be helpful.
I'm actually ok with something that uses Monte Carlo methods or the like, if its results are better. Computation time is not an issue (within reason), nor is determinism.
Note: although this is beyond the scope of what I'm asking, I'll throw one more thing out there just in case: It would also be great if it could recognize changes that are out of order....for instance if someone changes the order of two paragraphs while leaving them otherwise identical, it would be awesome if it recognized it as a simple move, rather than as one subtraction and and one unrelated addition.
I've had good luck with diff_match_patch. There are some good options for tuning it for readability.
Try http://prettydiff.com/ Its code is already formatted for compatibility with CommonJS, which is the framework Node uses.
I am looking for a way, using static analysis of two JavaScript functions, to tell if they are the same. Let me define multiple definitions of "the same".
Level 1: The functions are the same except for possible different whitespace, e.g. TABS, CR, LF and SPACES.
Level 2 The functions may have different whitespace like Level 1, but also may have different variable names.
Level 3 ???
For level one, I think I could just remove all (non-literal, which may be tough) whitespace from each string containing the two JS function definitions, and then compare the strings.
For level two, I think I would need to use something like SpiderMonkey's parser to generate a two parse trees, and then write a comparer which walks the trees and allows variables to have different names.
[Edit] Williham, makes a good point below. I do mean identical. Now, I'm looking for some practical strategies, particularly with regards to using parse trees.
Reedit:
To expound on my suggestion for determining identical functions, the following flow can be suggested:
Level 1: Remove any whitespace that is not part of a string literal; insert newlines after each {, ; and } and compare. If equal; the functions are identical, if not:
Level 2: Move all variable declarations and assignments that don't depend on the state of other variables defined in the same scope to the start of the scope they are declared in (or if not wanting to actually parse the JS; the start of the braces); and order them by line length; treating all variable names as being 4 characters long, and falling back to alphabetical ordering ignoring variable names in case of tied lengths. Reorder all collections in alphabetical order, and rename all variables vSNN, where v is literal, S is the number of nested braces and NN is the order in which the variable was encountered.
Compare; if equal, the functions are identical, if not:
Level 3: Replace all string literals with "sNN", where " and s are literal, and NN is the order in which the string was encountered. Compare; if equal, the functions are identical, if not:
Level 4: Normalize the names of any functions known to be the same by using the name of the function with the highest priority according to alphabetical order (in the example below, any calls to p_strlen() would be replaced with c_strlen(). Repeat re-orderings as per level 1 if necessary. Compare; if equal, the functions are identical, if not; the functions are almost certainly not identical.
Original answer:
I think you'll find that you mean "identical", not "the same".
The difference, as you'll find, is critical:
Two functions are identical if, following some manner of normalization, (removing non-literal whitespace, renaming and reordering variables to a normalized order, replacing string literals with placeholders, …) they compare to literally equal.
Two functions are the same if, when called for any given input value they give the same return value. Consider, in the general case, a programming language which has counted, zero-terminated strings (hybrid Pascal/C strings, if you will). A function p_strlen(str) might look at the character count of the string and return that. A function c_strlen(str) might count the number of characters in the string and return that.
While these functions certainly won't be identical, they will be the same: For any given (valid) input value they will give the same value.
My point is:
Determining wether two functions are identical (what you seem to want to achieve) is a (moderately) trivial problem, done as you describe.
Determining wether two functions are truly the same (what you might actually want to achieve) is non-trivial; in fact, it's downright Hard, probably related to the Halting Problem, and not something that can be done with static analysis.
Edit: Of course, functions that are identical are also the same; but in a highly specific and rarely useful way for complete analysis.
Your approach for level 1 seems reasonable.
For level 2, how about do some rudimentary variable substitution on each function and then do approch for level 1? Start at the beginning and for each variable declaration you encounter rename them to var1, var2, ... varX.
This does not help if the functions declare variables in different orders... var i and var j may be used the same way in both functions but are declared in different orders. Then you are probably left doing a comparison of parse trees like you mention.
See my company's (Semantic Designs) Smart Differencer tool. This family of tools parses source code according to compiler-level-detail grammar for the language of interest (in your case, JavaScript), builds ASTs, and then compares the ASTs (which effectively ignores whitespace and comments). Literal values are normalized, so it doesn't matter how they are "spelled"; 10E17 has the same normalized value as 1E18.
If the two trees are the same, it will tell you "no differences". If they differ by a consistent renaming of an identifier, the tool will tell you the consisten renaming and the block in which it occurs. Other differences are reported as language-element (identifier, statement, block, class,...) insertions, deletions, copies, or moves. The goal is to report the small set of deltas that plausibly explain the differences. You can see examples for a number of languages at the web site.
You can't in practice go much beyond this; to determine if two functions compute the same answer, in principle you have to solve the halting problem. You might be able to detect where two language elements that are elements of a commutative list, can be commuted without effect; we're working on this. You might be able to apply normalization rewrites to canonicalize certain forms (e.g., map all multiple declarations into a sequence of lexically sorted single declarations). You might be able to convert the source code into its equivalent set of dataflows, and do a graph isomorphism match (the Programmer's Apprentice from MIT back in the 1980's proposed to do this, but I don't think they ever got there).
All of there are more work to do than you might expect.
I'm working on creating a basic RPG game engine prototype using JavaScript and canvas. I'm still working out some design specs on paper, and I've hit a bit of a problem I'm not quite sure how to tackle.
I will have a Character object that will have an array of Attribute objects. Attributes will look something like this:
function(name, value){
this.name = name;
this.value = value;
...
}
A Character will also have "skills" that are calculated off attributes. A skills value can also be determined by a formula entered by the user. A legit formula would look something like this:
((#attribute1Name + (#attribute2Name / 2) * 5)
where any text following the # sign represents the name of an attribute belonging to that character. The formula will be entered into a text field as a string.
What I'm having a problem with is understanding the proper way to parse and evaluate this formula. Initially, my plan was to do a simple replace on the attribute names and eval the expression (if invalid, the eval would fail). However, this presents a problem as it would allow for JavaScript injection into the field. I'm assuming I'll need some kind of FSM similar to an infix calculator to solve this, but I'm a little rusty on my computation theory (thanks corporate world!). I'm really not asking for someone to just hand me the code so much as I'd like to get your input on what is the best solution to this problem?
EDIT: Thanks for the responses. Unfortunately life has kept me busy and I haven't tried a solution yet. Will update when I get a result (good or bad).
Different idea, hence a separate suggestion:
eval() works fine, and there's no need to re-invent the wheel.
Assuming that there's only a small and fixed number of variables in your formula language, it would be sufficient to scan your way through the expression and verify that everything you encounter is either a parenthesis, an operator or one of your variable names. I don't think there would be any way to assemble those pieces into a piece of code that could have malicious side effects on eval.
So:
Scan the expression to verify that it draws from just a very limited vocabulary.
Let eval() work it out.
Probably the compromise with the least amount of work and code while bringing risk down to (near?) 0. At worst, a misuser could tack parentheses on a variable name in an attempt to execute the variable.
I think instead of letting them put the whole formula in, you could have select tags that have operations and values, and let them choose.
ie. a set of tags with attribute-operation-number:
<select> <select> <input type="text">
#attribute1Name1 + (check if input is number)
#attribute1Name2 -
#attribute1Name3 *
#attribute1Name4 /
etc.
There is a really simple solution: Just enter a normal JavaScript formula (i.e. as if you were writing a method for your object) and use this to reference the object you're working on.
To change this when evaluating the method use apply() or call() (see this answer).
I recently wrote a similar application. I probably invested far too much work, but I went the whole 9 yards and wrote both a scanner and a parser.
The scanner converted the text into a series of tokens; tokens are simple objects consisting of token type and value. For the punctuation marks, value = character, for numbers the values would be integers corresponding to the numeric value of the number, and for variables it would be (a reference to) a variable object, where that variable would be sitting in a list of objects having a name. Same variable object = same variable, natch.
The parser was a simple brute force recursive descent parser. Here's the code.
My parser does logic expressions, with AND/OR taking the place of +/-, but I think you can see the idea. There are several levels of expressions, and each tries to assemble as much of itself as it can, and calls to lower levels for parsing nested constructs. When done, my parser has generated a single Node containing a tree structure that represents the expression.
In your program, I guess you could just store that Node, as its structure will essentially represent the formula for its evaluation.
Given all that work, though, I'd understand just how tempting it would be to just cave in and use eval!
I'm fascinated by the task of getting this done by the simplest means possible.
Here's another approach:
Convert infix to postfix;
use a very simple stack-based calculator to evaluate the resulting expression.
The rationale here being, once you get rid of the complication of "* before +" and parentheses, the remaining calculation is very straightforward.
You could look at running the user-defined code in a sandbox to prevent attacks:
Is It Possible to Sandbox JavaScript Running In the Browser?
This question already has answers here:
Do you recommend using semicolons after every statement in JavaScript?
(11 answers)
Closed 9 years ago.
Are there any reasons, apart from subjective visual perception and cases where you have multiple statements on the same line, to use semicolon at the end of statements in JavaScript?
It looks like that there's plenty of evidence suggesting that use of semicolons is highly optional and is required in only few of the specific cases.
Because JavaScript does nasty things to you when it guesses where to put semicolons. It's better to be explicit and let the interpreter know exactly what you meant than it is to let the idiot box guess on your behalf.
References:
http://www.webmasterworld.com/forum91/521.htm
http://www.howtocreate.co.uk/tutorials/javascript/semicolons
http://robertnyman.com/2008/10/16/beware-of-javascript-semicolon-insertion/
...and a cast of thousands.
It looks like there are very few reasons, or, actually, edge cases, when one would want to use semicolons.
http://aresemicolonsnecessaryinjavascript.com/ <- this is down now, use
https://github.com/aresemicolonsnecessaryinjavascript/aresemicolonsnecessaryinjavascript.github.com
If you asked, because you come from a Python background: The difference is:
in Python you shouldn't terminate lines with anything, but are allowed to use the semicolon, if you must
in JavaScript you should terminate the lines with a semicolon, but are allowed (PDF, page 26, point 7.9) to omit it, if it's unambiguous
Because
Debugging the subtle bugs that occur when you don't is a waste of time you could be spending doing something cool
It makes it clearer to someone maintaining the code later what you intend
Not all code maintainers understand the rules for automatic insertion well enough to maintain code with them left out
Leaving them out relies on all tools that process JavaScript code in your toolchain getting the rules exactly right (for instance, some minifiers/packers/compressors don't, including Crockford's jsmin, which will break code that relies on ASI in some places)
As Douglas Crockford suggests -
Put a ; (semicolon) at the end of every simple statement. Note that an assignment statement which is assigning a function literal or object literal is still an assignment statement and must end with a semicolon.