The interpreter built by my university, codeboot.org, offers step-by-step execution of an expression. As a result, I was able to see how the program reads an arithmetic expression. And this is where I started to get confused.
For example, this expression: 10-5+(7+2)/3
We are always told to calculate the expression in the parentheses first, so this is the order I expected:
7+2=9, 9/3=3, 10-5=5, 5+3=8
However, what the interpreter executes is completely different.
10-5=5, 7+2=9, 9/3=3, 5+3=8
The result is the same, but why would it calculate 10-5 first? And what happened to "we have to calculate whatever is in the parentheses first"? This really confuses me.
I would like to know whether this is the correct behavior: does the interpreter always go from left to right and calculate whatever it can calculate first, instead of jumping straight into the () as we would expect?
"Do the parentheses first" is not a rule in JS. And "go from left to right" isn't really a rule either. E.g. consider 1 + 4 * 6. Strict left-to-right would result in
1+4 = 5, 5*6 = 30
and that's not what JS does.
Instead, JS parses your expression into an expression tree, and then evaluates it starting at the root of the tree. (Strictly speaking, a JS implementation isn't required to build a tree, but it's required to give the same results as if it did.)
For instance, your example expression 10-5+(7+2)/3 would result in a tree roughly like this:
AdditiveExpression:
    AdditiveExpression
        AdditiveExpression
            MultiplicativeExpression
                ... NumericLiteral 10
        - -
        MultiplicativeExpression
            ... NumericLiteral 5
    + +
    MultiplicativeExpression
        MultiplicativeExpression
            ... ParenthesizedExpression
                ( (
                Expression
                    ... AdditiveExpression 7+2
                ) )
        MultiplicativeOperator /
        ExponentiationExpression
            ... NumericLiteral 3
where:
I've used indentation to convey nesting;
I've used "..." when I've left out lots of intermediate derivations; and
I haven't bothered to give the full sub-tree for "7+2".
(I couldn't find a way to get codeboot.org to show its parse tree. If there is some way, or if you use some other tool to show an expression's parse tree, note that it may not look exactly as above, but it should be similar enough that it will give the same behavior.)
To evaluate the expression, it starts at the root, an AdditiveExpression whose children are:
another AdditiveExpression (for 10-5),
the + token, and
a MultiplicativeExpression (for (7+2)/3).
The rule is to
(a) evaluate the left operand, then
(b) evaluate the right, then
(c) perform the addition on the results.
So that's why (a) 10-5 => 5 is the first thing your interpreter calculates.
Next is to (b) evaluate the MultiplicativeExpression for (7+2)/3. The rule here is similar, so we need to:
(b1) evaluate the left operand (the MultiplicativeExpression for (7+2)), then
(b2) evaluate the right operand (the ExponentiationExpression for 3), then
(b3) perform the operation indicated by the MultiplicativeOperator /.
So (b1) 7+2 => 9 is the next thing,
then (b2) 3 => 3,
then (b3) 9/3 => 3.
We're now finished step (b), so we proceed to (c) 5+3 => 8.
This matches the series of calculations that your interpreter performs.
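The walk described above can be reproduced with a small sketch. The node shape ({ op, left, right }) and the names tree, steps and evaluate are made up for illustration; a real engine's tree is far more detailed, but the left-then-right-then-combine rule is the same.

```javascript
// Hypothetical mini-AST for 10-5+(7+2)/3: a node is a number or { op, left, right }.
const tree = {
  op: "+",
  left: { op: "-", left: 10, right: 5 },
  right: { op: "/", left: { op: "+", left: 7, right: 2 }, right: 3 },
};

const steps = []; // records each calculation in the order it is performed

function evaluate(node) {
  if (typeof node === "number") return node;
  const left = evaluate(node.left);   // (a) evaluate the left operand
  const right = evaluate(node.right); // (b) evaluate the right operand
  let result;                         // (c) perform the operation
  switch (node.op) {
    case "+": result = left + right; break;
    case "-": result = left - right; break;
    case "*": result = left * right; break;
    case "/": result = left / right; break;
    default: throw new Error(`unknown operator ${node.op}`);
  }
  steps.push(`${left}${node.op}${right}=${result}`);
  return result;
}

const value = evaluate(tree);
console.log(steps.join(", ")); // 10-5=5, 7+2=9, 9/3=3, 5+3=8
console.log(value);            // 8
```

Note that the parentheses never appear in the tree at all: they only decided that 7+2 became the left child of the division.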
I'm working on an expression parser made in Jison, which supports basic things like arithmetic, comparisons etc. I want to allow chained comparisons like 1 < a < 10 and x == y != z. I've already implemented the logic needed to compare multiple values, but I'm struggling with the grammar: Jison keeps grouping the comparisons like (1 < a) < 10 or x == (y != z) and I can't make it recognize the whole thing as one relation.
This is roughly the grammar I have:
expressions = e EOF
e = Number
| e + e
| e - e
| Relation %prec '=='
| ...
Relation = e RelationalOperator Relation %prec 'CHAINED'
| e RelationalOperator Relation %prec 'NONCHAINED'
RelationalOperator = '==' | '!=' | ...
(Sorry, I don't know the actual Bison syntax, I use JSON. Here's the entire source.)
The operator precedence is roughly: NONCHAINED, ==, CHAINED, + and -.
I have an action set up on e → Relation, so I need that Relation to match the whole chained comparison, not only a part of it. I tried many things, including tweaking the precedence and changing the right-recursive e RelationalOperator Relation to a left-recursive Relation RelationalOperator e, but nothing worked so far. Either the parser matches only the smallest Relation possible, or it warns me that the grammar is ambiguous.
If you decided to experiment with the program, cloning it and running these commands will get you started:
git checkout develop
yarn
yarn test
There are basically two relatively easy solutions to this problem:
Use a cascading grammar instead of precedence declarations.
This makes it relatively easy to write a grammar for chained comparison, and does not really complicate the grammar for binary operators nor for tight-binding unary operators.
You'll find examples of cascading grammars all over the place, including most programming languages. A reasonably complete example is seen in this grammar for C expressions (just look at the grammar up to constant_expression:).
One of the advantages of cascading grammars is that they let you group operators at the same precedence level into a single non-terminal, as you try to do with comparison operators and as the linked C grammar does with assignment operators. That doesn't work with precedence declarations because precedence can't "see through" a unit production; the actual token has to be visibly part of the rule with declared precedence.
Another advantage is that if you have specific parsing needs for chained operators, you can just write the rule for the chained operators accordingly; you don't have to worry about it interfering with the rest of the grammar.
However, cascading grammars don't really get unary operators right, unless the unary operators are all at the top of the precedence hierarchy. This can be seen in Python, which uses a cascading grammar and has several unary operators low in the precedence hierarchy, such as the not operator, leading to the following oddity:
>>> if False == not True: print("All is well")
File "<stdin>", line 1
if False == not True: print("All is well")
^
SyntaxError: invalid syntax
That's a syntax error because == has higher precedence than not. The cascading grammar only allows an expression to appear as the operand of an operator with lower precedence than any operator in the expression, which means that the expression not True cannot be the operand of ==. (The precedence ordering allows not a == b to be grouped as not (a == b).) That prohibition is arguably ridiculous, since there is no other possible interpretation of False == not True other than False == (not True), and the fact that the precedence ordering forbids the only possible interpretation makes the only possible interpretation a syntax error. This doesn't happen with precedence declarations, because the precedence declaration is only used if there is more than one possible parse (that is, if there is really an ambiguity).
Your grammar puts not at the top of the precedence hierarchy, although it should really share that level with unary minus rather than being above unary minus [Note 1]. So that's not an impediment to using a cascading grammar. However, I see that you also want to implement an if … then … else operator, which is syntactically a low-precedence prefix operator. So if you wanted 4 + if x then 0 else 1 to have the value 5 when x is false (rather than being a syntax error), the cascading grammar would be problematic. You might not care about this, and if you don't, that's probably the way to go.
Stick with precedence declarations and handle the chained comparison as an exception in the semantic action.
This will allow the simplest possible grammar, but it will complicate your actions a bit. To implement it, you'll want to implement the comparison operators as left-associative, and then you'll need to be able to distinguish in the semantic actions between a comparison (which is a list of expressions and comparison operators) from any other expression (which is a string). The semantic action for a comparison operator needs to either extend or create the list, depending on whether the left-hand operand is a list or a string. The semantic action for any other operator (including parenthetic grouping) and for the right-hand operand in a comparison needs to check if it has received a list, and if so compile it into a string.
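As a rough sketch of what those semantic actions could look like in JavaScript (which is what Jison actions are written in): the helper names compileChain, finish, comparison and binary are all hypothetical, and compiling a chain to an && of pairwise comparisons is only one possible choice, which evaluates the middle operands twice.

```javascript
// Hypothetical helpers meant to be called from Jison semantic actions.
// A comparison in progress is an array like ["1", "<", "a", "<", "10"];
// every other expression is a plain string.

// Compile a finished comparison list into a string.
function compileChain(chain) {
  const parts = [];
  for (let i = 0; i + 2 < chain.length; i += 2) {
    parts.push(`(${chain[i]} ${chain[i + 1]} ${chain[i + 2]})`);
  }
  return parts.length === 1 ? parts[0] : `(${parts.join(" && ")})`;
}

// Anything that is not the left operand of a comparison passes through this.
function finish(value) {
  return Array.isArray(value) ? compileChain(value) : value;
}

// Action for a left-associative comparison rule: extend or create the list.
function comparison(left, op, right) {
  const rhs = finish(right);
  return Array.isArray(left) ? [...left, op, rhs] : [left, op, rhs];
}

// Action for any other binary operator: operands must be compiled first.
function binary(left, op, right) {
  return `(${finish(left)} ${op} ${finish(right)})`;
}

// 1 < a < 10 is reduced left to right: first (1 < a), then (... < 10).
const chained = comparison(comparison("1", "<", "a"), "<", "10");
console.log(finish(chained)); // ((1 < a) && (a < 10))
```

The rule for parenthetic grouping would also call finish, so that (1 < a) < 10 stays an ordinary nested comparison rather than being absorbed into a chain.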
Whichever of those two options you choose, you'll probably want to fix the various precedence errors in the existing grammar, some of which were already present in your upstream source (like the unary minus / not confusion mentioned above). These include:
Exponentiation is configured as left-associative, whereas it is almost universally considered a right-associative operator. Many languages also give it higher precedence than unary minus, since -a² is pretty well always read as the negative of a squared rather than the square of minus a (which would just be a squared).
I suppose you are going to ditch the ternary operator ?: in favour of your if … then … else operator. But if you leave ?: in the grammar, you should make it right associative, as it is in every language other than PHP. (And the associativity in PHP is generally recognised as a design error. See this summary.)
The not in operator is actually two tokens, not and in, and not has quite high precedence. And that's how it will be parsed by your grammar, with the result that 4 + 3 in (7, 8) evaluates to true (because it was grouped as (4 + 3) in (7, 8)), while 4 + 3 not in (7, 8) evaluates rather surprisingly to 5, having been grouped as 4 + (3 not in (7, 8)).
Notes
If you used a cascading precedence grammar, you'd see that only one of - not 0 and not - 0 is parseable. Of course, both are probably type violations, but that's not something the syntax should concern itself with.
I've been coding for years and suddenly got stuck on a simple thing about operator precedence in the case of the increment/decrement operators.
According to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Operator_Precedence
the postfix increment/decrement has higher priority than the prefix one.
So, I expect that in expression
x = --a + a++;
the increment will be calculated first and only after that the decrement.
But in tests this expression is calculated left to right, as if the operators had the same priority. As a result,
a=1;x = --a + a++ equals 0 instead of 2.
Ok. Assuming that prefix/postfix operators have the same precedence, I try to reorder it with parentheses:
a=1;x = --a + ( a++ )
But again, the result will be 0 and not 2 as I expected.
Can someone explain that, please? Why do the parentheses not affect anything here? How can I see that postfix has higher precedence than prefix?
In the expression, evaluation proceeds like this:
--a is evaluated. The variable a is decremented, and the value of --a is therefore 0.
a++ is evaluated. The value of a is obtained, and then a is incremented. The value of a++ is therefore 0.
The + operation is performed, and the result is 0.
The final value of a is 1.
Because --a and a++ are on either side of the lower-precedence + operator, the difference in precedence between pre-decrement and post-increment doesn't matter; the + operator evaluates the left-hand subexpression before it evaluates the right-hand subexpression.
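A quick way to check those steps yourself (the variable names b and y are just extra copies added here for the parenthesized variant):

```javascript
let a = 1;
const x = --a + a++;   // --a yields 0 (a is now 0); a++ yields 0 (a is now 1)
console.log(x, a);     // 0 1

let b = 1;
const y = --b + (b++); // the parentheses only restate the grouping that already applied
console.log(y, b);     // 0 1
```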
Operator precedence is not the same thing as evaluation order.
Operator precedence tells you that
f() + g() * h()
is parsed as
f() + (g() * h())
(because * has higher precedence than +), but not which function is called first. That is controlled by evaluation order, which in JavaScript is always left-to-right.
Parentheses only override precedence (i.e. they affect how subexpressions are grouped), not order of evaluation:
(f() + g()) * h()
performs addition before multiplication, but in all cases f is called first and h last.
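To see this directly, here is a small sketch where f, g and h record the order in which they are called (the calls array and the return values 1, 2, 3 are made up for the demonstration):

```javascript
const calls = [];
const f = () => { calls.push("f"); return 1; };
const g = () => { calls.push("g"); return 2; };
const h = () => { calls.push("h"); return 3; };

const r1 = f() + g() * h();               // grouped as f() + (g() * h())
const order1 = calls.splice(0).join("");  // empty the log and keep what was in it

const r2 = (f() + g()) * h();             // grouping changed by the parentheses
const order2 = calls.splice(0).join("");

console.log(r1, order1); // 7 fgh
console.log(r2, order2); // 9 fgh
```

The results differ because the grouping differs, but the call order is the same in both cases.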
In your example
--a + a++
the relative precedence of prefix -- and postfix ++ doesn't matter because they're not attached to the same operand. Infix + has much lower precedence, so this expression parses as
(--a) + (a++)
As always, JS expressions are evaluated from left to right, so --a is done first.
If you had written
--a++
that would have been parsed as
--(a++)
(and not (--a)++) because postfix ++ has higher precedence, but the difference doesn't matter here because either version is an error: You can't increment/decrement the result of another increment/decrement operation.
However, in the operator precedence table you can see that all prefix operators have the same precedence, so we can show an alternative:
// ! has the same precedence as prefix ++
!a++
is valid code because it parses as !(a++) due to postfix ++ having higher precedence than !. If it didn't, it would be interpreted as (!a)++, which is an error.
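A minimal check of that parse, using a throwaway variable a starting at 0:

```javascript
let a = 0;
const v = !a++;    // parsed as !(a++): a++ yields the old value 0, so v is !0
console.log(v, a); // true 1
```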
Well, it's strange, but it looks like increment/decrement behaves in expressions like a function call, not like an operator.
I could not find any documentation regarding the order of function evaluation in expressions, but it looks like it ignores any precedence rules. So a strange expression like this:
console.log(1) + console.log(2)* ( console.log(3) + console.log(4))
will show 1234 in the console. That is the only explanation I could find for why parentheses do not affect the order of evaluation of dec/inc in expressions.
I am trying to make a web-app in JavaScript that converts arithmetic expressions to i486-compatible assembly. You can see it here:
http://flatassembler.000webhostapp.com/compiler.html
I have tried to make it able to deal with expressions that contain the incremental and decremental operators ("--" and "++"). Now it appears to correctly deal with expressions such as:
c++
However, in a response to an expression such as:
c--
the web-app responds with:
Tokenizer error: Unable to assign the type to the operator '-' (whether it's unary or binary).
The error message seems quite self-explanatory. Namely, I made the tokenizer assign to the "-" operator a type (unary or binary) and put the parentheses where they are needed, so that the parser can deal with the expressions such as:
10 * -2
And now, because of that, I am not able to implement the decremental operator. I've been thinking about this for days, and I can't decide what to even try. Do you have any ideas?
Please note that the web-app now correctly deals with the expressions such as:
a - -b
The way that this works in all existing languages (that I know of anyway) that have these operators, is that -- is a single token. So when you see a -, you check whether the very next character is another -. If it is, you generate a -- token (consuming both - characters). If it isn't, you generate a - token (leaving the next character in the buffer).
Then, in the parser, an l-expression followed by a -- token becomes a postfix decrement expression, and a -- token followed by an l-expression becomes a prefix decrement expression. A -- token in any other position is a syntax error.
This means that spaces between -s matter: --x is a prefix decrement (or a syntax error if the language doesn't allow prefix increment and decrement), - -x is a double negative that cancels out to just x.
I should also note that in languages where postfix increment/decrement is an expression, it evaluates to the original value of the operand, not the incremented value. So if x starts out as 5, the value of x++ should be 5 and afterwards the value of x should be 6. So your current code does not actually correctly implement postfix ++ (or at least not in a way consistent with other languages). Also x++ + y++ currently produces a syntax error, so it doesn't seem like it's really supported at all.
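A minimal sketch of the tokenizer side of this, assuming a tiny token set (numbers, identifiers, and the operators + - * / ( ) ++ --). The function name tokenize and the token shape { type, text } are invented here and are not taken from the web-app in question.

```javascript
// Hypothetical tokenizer: on seeing '-' or '+', peek at the next character
// and emit a two-character token if it is the same character.
function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }           // whitespace separates tokens
    if (/[0-9]/.test(ch)) {
      let j = i;
      while (j < src.length && /[0-9]/.test(src[j])) j++;
      tokens.push({ type: "number", text: src.slice(i, j) });
      i = j;
      continue;
    }
    if (/[A-Za-z_]/.test(ch)) {
      let j = i;
      while (j < src.length && /[A-Za-z0-9_]/.test(src[j])) j++;
      tokens.push({ type: "identifier", text: src.slice(i, j) });
      i = j;
      continue;
    }
    if ((ch === "-" || ch === "+") && src[i + 1] === ch) {
      tokens.push({ type: "operator", text: ch + ch }); // consume both characters
      i += 2;
      continue;
    }
    if ("+-*/()".includes(ch)) {
      tokens.push({ type: "operator", text: ch });
      i++;
      continue;
    }
    throw new Error(`Unexpected character '${ch}' at position ${i}`);
  }
  return tokens;
}

const texts = (src) => tokenize(src).map((t) => t.text);
console.log(texts("c--"));     // [ 'c', '--' ]
console.log(texts("a - -b"));  // [ 'a', '-', '-', 'b' ]
console.log(texts("10 * -2")); // [ '10', '*', '-', '2' ]
```

Deciding whether a lone - is unary or binary, and whether -- is prefix or postfix, is then left to the parser, which knows what came before the token.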
I have gone through similar questions and answers on StackOverflow and found this:
parseInt("123hui")
returns 123
Number("123hui")
returns NaN
Since parseInt() parses up to the first non-digit and returns whatever it has parsed, while Number() tries to convert the entire string into a number, why do parseInt('') and Number('') behave differently?
I feel ideally parseInt should return NaN just like it does with Number("123hui")
Now my next question:
As 0 == '' returns true I believe it interprets like 0 == Number('') which is true. So does the compiler really treat it like 0 == Number('') and not like 0 == parseInt('') or am I missing some points?
The difference is due in part to Number() making use of additional logic for type coercion. Included in the rules it follows for that is:
A StringNumericLiteral that is empty or contains only white space is converted to +0.
Whereas parseInt() is defined to simply find and evaluate numeric characters in the input, based on the given or detected radix. And, it was defined to expect at least one valid character.
13) If S contains a code unit that is not a radix-R digit, let Z be the substring of S consisting of all code units before the first such code unit; otherwise, let Z be S.
14) If Z is empty, return NaN.
Note: 'S' is the input string after any leading whitespace is removed.
As 0=='' returns true I believe it interprets like 0==Number('') [...]
The rules that == uses are defined as Abstract Equality.
And, you're right about the coercion/conversion that's used. The relevant step is #6:
If Type(x) is Number and Type(y) is String,
return the result of the comparison x == ToNumber(y).
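The behaviours quoted above can be checked in any console; the results object here is just a convenient way to collect them:

```javascript
const results = {
  parseIntPrefix: parseInt("123hui"), // 123: stops at the first non-digit
  numberWhole: Number("123hui"),      // NaN: the whole string must be numeric
  parseIntEmpty: parseInt(""),        // NaN: no digits at all
  numberEmpty: Number(""),            // 0: empty string converts to +0
  numberBlank: Number("   "),         // 0: whitespace-only also converts to +0
  looseEq: 0 == "",                   // true: compared as 0 == ToNumber("")
  strictEq: 0 === "",                 // false: different types, no conversion
};
console.log(results);
```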
To answer your question about 0=='' returning true:
Below is the comparison of a number and string:
The Equals Operator (==)
Type(x)                      Type(y)    Result
------------------------------------------------------------------
x and y are the same type               Strict Equality (===) Algorithm
Number                       String     x == toNumber(y)
and toNumber does the following to a string argument:
toNumber:
Argument type    Result
----------------------------------------------------
String           In effect evaluates Number(string)
                 “abc” -> NaN
                 “123” -> 123
Number('') returns 0. So that leaves you with 0==0 which is evaluated using Strict Equality (===) Algorithm
The Strict Equals Operator (===)
Type      Values                  Result
----------------------------------------------------------
Number    x same value as y       true
          (but not NaN)
You can find the complete list at javascriptweblog.wordpress.com - truth-equality-and-javascript.
parseInt("") is NaN because the standard says so even if +"" is 0 instead (also simply because the standard says so, implying for example that "" == 0).
Don't look for logic in this because there's no deep profound logic, just history.
You are in my opinion making a BIG mistake... the sooner you correct it the better will be for your programming life with Javascript. The mistake is that you are assuming that every choice made in programming languages and every technical detail about them is logical. This is simply not true.
Especially for Javascript.
Please remember that Javascript was "designed" in a rush and, just because of fate, it became extremely popular overnight. This forced the community to standardize it before any serious thought was given to the details, and therefore it was basically "frozen" in its current sad state before any serious testing in the field.
There are parts that are so bad they aren't even funny (e.g. with statement or the == equality operator that is so broken that serious js IDEs warn about any use of it: you get things like A==B, B==C and A!=C even using just normal values and without any "special" value like null, undefined, NaN or empty strings "" and not because of precision problems).
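One concrete instance of that non-transitivity, using only ordinary values (the names A, B and C mirror the sentence above; the particular values are my choice):

```javascript
const A = "1";
const B = 1;
const C = "1.0";

console.log(A == B); // true:  "1" is converted to the number 1
console.log(B == C); // true:  "1.0" is converted to the number 1
console.log(A == C); // false: two strings are compared as strings
```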
Nonsense special cases are everywhere in Javascript and trying to put them in a logical frame is, unfortunately, a wasted effort. Just learn its oddities by reading a lot and enjoy the fantastic runtime environment it provides (this is where Javascript really shines... browsers and their JIT are a truly impressive piece of technology: you can write a few lines and get real useful software running on a gajillion of different computing devices).
The official standard where all oddities are enumerated is quite hard to read because it aims to be very accurate, and unfortunately the rules it has to specify are really complex.
Moreover as the language gains more features the rules will get even more and more complex: for example what is for ES5 just another weird "special" case (e.g. ToPrimitive operation behavior for Date objects) becomes a "normal" case in ES6 (where ToPrimitive can be customized).
Not sure if this "normalization" is something to be happy about... the real problem is the frozen starting point and there are no easy solutions now (if you don't want to throw away all existing javascript code, that is).
The normal path for a language is starting clean and nice and symmetric and small. Then when facing real world problems the language gains (is infected by) some ugly parts (because the world is ugly and asymmetrical).
Javascript is like that. Except that it didn't start nice and clean and moreover there was no time to polish it before throwing it in the game.
I'm implementing a pretty-printer for a JavaScript AST and I wanted to ask if someone is aware of a "proper" algorithm to automatically parenthesize expressions with minimal parentheses based on operator precedence and associativity. I haven't found any useful material on the google.
What seems obvious is that an operator whose parent has a higher precedence should be parenthesized, e.g.:
(x + y) * z // x + y has lower precedence
However, there are also some operators which are not associative, in which case parentheses are still needed, e.g.:
x - (y - z) // both operators have the same precedence
I'm wondering what would be the best rule for this latter case. Whether it's sufficient to say that for division and subtraction, the rhs sub-expression should be parenthesized if it has less than or equal precedence.
I stumbled on your question in search of the answer myself. While I haven't found a canonical algorithm, I have found that, like you say, operator precedence alone is not enough to minimally parenthesize expressions. I took a shot at writing a JavaScript pretty printer in Haskell, though I found it tedious to write a robust parser so I changed the concrete syntax: https://gist.github.com/kputnam/5625856
In addition to precedence, you must take operator associativity into account. Binary operations like / and - are parsed as left associative. However, assignment = and exponentiation (written ^ here, ** in JavaScript) are right associative; note that in JavaScript itself equality == is left associative. This means the expression Div (Div a b) c can be written a / b / c without parentheses, but Exp (Exp a b) c must be parenthesized as (a ^ b) ^ c.
Your intuition is correct: for left-associative operators, if the left operand's expression binds less tightly than its parent, it should be parenthesized. If the right operand's expression binds as tightly or less tightly than its parent, it should be parenthesized. So Div (Div a b) (Div c d) wouldn't require parentheses around the left subexpression, but the right subexpression would: a / b / (c / d).
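The rule can be sketched in a few lines of JavaScript. The operator table OPS, the AST shape ({ op, left, right } with strings as leaves) and the function name print are assumptions made for this illustration, covering binary operators only.

```javascript
// Hypothetical operator table: a higher prec binds more tightly.
const OPS = {
  "+":  { prec: 1, assoc: "left" },
  "-":  { prec: 1, assoc: "left" },
  "*":  { prec: 2, assoc: "left" },
  "/":  { prec: 2, assoc: "left" },
  "**": { prec: 3, assoc: "right" },
};

// AST nodes: a string (identifier or literal) or { op, left, right }.
function print(node) {
  if (typeof node === "string") return node;
  const { prec, assoc } = OPS[node.op];

  const operand = (child, isRight) => {
    const text = print(child);
    if (typeof child === "string") return text;
    const childPrec = OPS[child.op].prec;
    // Lower precedence always needs parentheses. Equal precedence needs them
    // only on the side the parent does not associate toward: the right side
    // for left-associative operators, the left side for right-associative ones.
    const needsParens =
      childPrec < prec || (childPrec === prec && (assoc === "left") === isRight);
    return needsParens ? `(${text})` : text;
  };

  return `${operand(node.left, false)} ${node.op} ${operand(node.right, true)}`;
}

const div = (left, right) => ({ op: "/", left, right });
console.log(print(div(div("a", "b"), div("c", "d")))); // a / b / (c / d)
console.log(print({ op: "**", left: { op: "**", left: "a", right: "b" }, right: "c" })); // (a ** b) ** c
```

This always keeps the parentheses in x + (y - z), since dropping them would change the tree even where the arithmetic result happens to be the same.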
Next, unary operators, specifically operators which can either be binary or unary, like negation and subtraction -, coercion and addition +, etc might need to be handled on a case-by-case basis. For example Sub a (Neg b) should be printed as a - (-b), even though unary negation binds more tightly than subtraction. I guess it depends on your parser, a - -b may not be ambiguous, just ugly.
I'm not sure how unary operators which can be both prefix and postfix should work. In expressions like ++ (a ++) and (++ a) ++, one of the operators must bind more tightly than the other, or ++ a ++ would be ambiguous. But I suspect even if parentheses aren't needed in one of those, for the sake of readability, you may want to add parentheses anyway.
It depends on the rules for the specific grammar. I think you have it right for operators with different precedence, and right for subtraction and division.
Exponentiation, however, is often treated differently, in that it is right-associative: a ** b ** c groups as a ** (b ** c). So you need
(a ** b) ** c
when c is the right child of the root.
Which way the parenthesization goes is determined by what the grammar rules define. If your grammar is of the form of
exp = sub1exp ;
exp = sub1exp op exp ;
sub1exp = sub2exp ;
sub1exp = sub1exp op1 sub2exp ;
sub2exp = sub3exp ;
sub2exp = sub3exp op2 sub2exp ;
sub3exp = ....
subNexp = '(' exp ')' ;
with op1 and op2 being non-associative, then you want to parenthesize the right subtree of op1 if the subtree root is also op1, and you want to parenthesize the left subtree of op2 if the left subtree has root op2.
There is a generic approach to pretty printing expressions with minimal parentheses. Begin by defining an unambiguous grammar for your expression language which encodes precedence and associativity rules. For example, say I have a language with three binary operators (*, +, #) and a unary operator (~), then my grammar might look like
E -> E0
E0 -> E1 '+' E0 (+ right associative, lowest precedence)
E0 -> E1
E1 -> E1 '*' E2 (* left associative; # non-associative; same precedence)
E1 -> E2 '#' E2
E1 -> E2
E2 -> '~' E2 (~ binds the tightest)
E2 -> E3
E3 -> Num (atomic expressions are numbers and parenthesized expressions)
E3 -> '(' E0 ')'
Parse trees for the grammar contain all necessary (and unnecessary) parentheses, and it is impossible to construct a parse tree whose flattening results in an ambiguous expression. For example, there is no parse tree for the string
1 # 2 # 3
because '#' is non-associative and always requires parentheses. On the other hand, the string
1 # (2 # 3)
has parse tree
E(E0(E1( E2(E3(Num(1)))
'#'
E2(E3( '('
E0(E1(E2(E3(Num(2)))
'#'
E2(E3(Num(3)))))
')')))
The problem is thus reduced to the problem of coercing an abstract syntax tree to a parse tree. The minimal number of parentheses is obtained by avoiding coercing an AST node to an atomic expression whenever possible. This is easy to do in a systematic way:
Maintain a pair consisting of a pointer to the current node in the AST and the current production being expanded. Initialize the pair with the root AST node and the 'E' production. In each case for the possible forms of the AST node, expand the grammar as much as necessary to encode the AST node. This will leave an unexpanded grammar production for each AST subtree. Apply the method recursively on each (subtree, production) pair.
For example, if the AST is (* (+ 1 2) 3), then proceed as follows:
expand[ (* (+ 1 2) 3); E ] --> E( E0( E1( expand[(+ 1 2) ; E1]
'*'
expand[3 ; E2] ) ) )
expand[ (+ 1 2) ; E1 ] --> E1(E2(E3( '('
E0( expand[ 1 ; E1 ]
'+'
expand[ 2 ; E0 ] )
')' )))
...
The algorithm can of course be implemented in a much less explicit way, but the method can be used to guide an implementation without going insane :).
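As one example of a "less explicit" implementation, here is a sketch for the example grammar above (+, *, #, ~ and numbers). Instead of building the parse tree, it tracks only the index of the production being expanded (0 for E0 up to 3 for E3). The AST shape ({ op, args }) and the function name expand are my own choices.

```javascript
// expand(node, level) prints `node` as something derivable from E<level>.
// Each AST form has a "natural" level, the E<n> whose production creates it.
// If natural >= level, the unit productions E0 -> E1 -> E2 -> E3 reach it for
// free; otherwise the only route is E3 -> '(' E0 ')', which costs parentheses.
function expand(node, level) {
  let natural;
  let text;
  if (typeof node === "number") {
    natural = 3;                                   // E3 -> Num
    text = String(node);
  } else {
    const [a, b] = node.args;
    switch (node.op) {
      case "+":                                    // E0 -> E1 '+' E0
        natural = 0;
        text = `${expand(a, 1)} + ${expand(b, 0)}`;
        break;
      case "*":                                    // E1 -> E1 '*' E2
        natural = 1;
        text = `${expand(a, 1)} * ${expand(b, 2)}`;
        break;
      case "#":                                    // E1 -> E2 '#' E2
        natural = 1;
        text = `${expand(a, 2)} # ${expand(b, 2)}`;
        break;
      case "~":                                    // E2 -> '~' E2
        natural = 2;
        text = `~${expand(a, 2)}`;
        break;
      default:
        throw new Error(`unknown operator ${node.op}`);
    }
  }
  return natural >= level ? text : `(${text})`;
}

const ast = { op: "*", args: [{ op: "+", args: [1, 2] }, 3] }; // (* (+ 1 2) 3)
console.log(expand(ast, 0)); // (1 + 2) * 3
```

The operand levels are read straight off the right-hand sides of the productions, so associativity (including the non-associative #) needs no special-casing.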