Re: Compiling expressions

torbenm@diku.dk (Torben Ęgidius Mogensen)
Thu, 03 Jan 2013 16:49:29 +0100

          From comp.compilers

Related articles
Compiling expressions james.harris.1@gmail.com (James Harris) (2012-12-29)
Re: Compiling expressions gah@ugcs.caltech.edu (glen herrmannsfeldt) (2012-12-29)
Re: Compiling expressions mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2012-12-30)
Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-02)
Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-02)
Re: Compiling expressions matzebraun@googlemail.com (matzebraun@googlemail.com) (2013-01-03)
Re: Compiling expressions torbenm@diku.dk (2013-01-03)
Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-03)
Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-03)
Re: Compiling expressions mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2013-01-04)
Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-06)
Re: Compiling expressions vonbrand@inf.utfsm.cl (Horst von Brand) (2013-01-14)
Re: Compiling expressions james.harris.1@gmail.com (James Harris \(es\)) (2013-03-07)
| List of all articles for this month |

From: torbenm@diku.dk (Torben Ęgidius Mogensen)
Newsgroups: comp.compilers
Date: Thu, 03 Jan 2013 16:49:29 +0100
Organization: SunSITE.dk - Supporting Open source
References: 12-12-035
Keywords: parse
Posted-Date: 03 Jan 2013 15:17:10 EST

James Harris <james.harris.1@gmail.com> writes:


> The requirements [for parsing expressions] are:
>
> 1. Hand-written, not the output of a parser generator.
> 2. Efficient and without backtracking.
> 3. Precedences (and possibly associativities) defined in tables.
> 4. Output to be a tree structure.
> 5. Parenthesised subexpressions allowed.
> 6. Some operator families are *not* to associate with each other. See
> below.
> 7. Monadic prefix, dyadic infix and monadic postfix operators are all
> allowed.
> 8. Prefix and infix operators can use some same symbols (e.g. minus
> sign).
>
> Infix and postfix operators use distinct symbols. For example, if a
> certain symbol were used as a postfix operator it could not also be
> used as an infix operator.


It sounds like you could use the classic precedence parsing method.
Basically, each infix operator is given two precedence values: One for
its left-hand side and one for its right-hand side. Left-associative
operators would have a higher left-hand precdence than right-hand
precedence and vice-versa. Left parens have the highest possible left
precedence and the lowest possible right precedence. Right parens
have the opposite.


If you only have infix operators, atoms (numbers, variables, etc) and
parentheses, the method is as follows:


Start with an empty stack of treess and an operator stack containing a
left paren. Then loop the following until explicitly stopped:


  - If the next input symbol is an atom, it is read and pushed on the
      tree stack.


  - If the next input symbol is an operator, compare its left-hand
      precedence to the right-hand precedence of the top-most operator on
      the operator stack.


      If the stacked symbol has higher precedence, pop the two top-most
      trees from the tree stack and the operator from the operator stack
      and push on the tree stack a tree build from the two popped trees and
      the popped symbol. The input symbol is kept unread.


      If the stacked symbol has lower precedence, read the input symbol and
      push it on the operator stack.


  - If the next input symbol is a left paren, read it and push it on the
      operator stack.


  - If the next input symbol is a right paren and the topmost symbol on
      the operator stack is a left paren, pop this and read the right
      paren.


  - If the next input symbol is a right paren and the topmost symbol on
      the operator stack is not a left paren, pop the two topmost trees and
      the topmost operator and push a tree built from these. Keep the
      right paren unread.


  - If the end of file is reached and the topmost symbol on the operator
      stack is a left paren, then stop.


  - If the end of file is reached and the topmost symbol on the operator
      stack is not a left paren, pop the two topmost trees and the topmost
      operator and push a tree built from these.


When stopped, the tree stack contains the desired syntax tree.


This needs some modification to catch all syntax errors. For example, 3
4 + + 5 would parse the same way as 3 + 4 + 5. The simplest solution is
to add a bit that tells whether the most recently read symbol was an
atom or an operator.


A prefix operator will never follow an atom, so by checking this bit,
you can treat a - as a prefix or infix symbol depending on context.
Prefix operators are pushed on the operator stack (with a bit saying
they are prefix operators) and the above algorithm is modified so a
stacked prefix operator with higher precedence than the input operator
builds a tree with only one child. A stacked prefix operator with lower
precedence causes the new operator to be read and pushed.


Postfix operators in input with higher predecence than the topmost
operator are read and a tree is built from this and the topmost tree.
Postfix operators with lower precedence causes a tree to be built from
the topmost operator and one or two trees from the tree stack (depending
on whether the topmost operator is prefix or infix).


Note that postfix operators are never pushed on the operator stack and
prefix operators never stay unread. Neither change the bit that
indicate whether an atom or infix operator was read.


If specific pairs of operators do not associate with each other, you
report an error if one is in the input and the other is at the top of
the operator stack.


Torben


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.