Thu, 03 Jan 2013 16:49:29 +0100

Related articles |
---|

Compiling expressions james.harris.1@gmail.com (James Harris) (2012-12-29) |

Re: Compiling expressions gah@ugcs.caltech.edu (glen herrmannsfeldt) (2012-12-29) |

Re: Compiling expressions mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2012-12-30) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-02) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-02) |

Re: Compiling expressions matzebraun@googlemail.com (matzebraun@googlemail.com) (2013-01-03) |

Re: Compiling expressions torbenm@diku.dk (2013-01-03) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-03) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-03) |

Re: Compiling expressions mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2013-01-04) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris) (2013-01-06) |

Re: Compiling expressions vonbrand@inf.utfsm.cl (Horst von Brand) (2013-01-14) |

Re: Compiling expressions james.harris.1@gmail.com (James Harris \(es\)) (2013-03-07) |

From: | torbenm@diku.dk (Torben Ęgidius Mogensen) |

Newsgroups: | comp.compilers |

Date: | Thu, 03 Jan 2013 16:49:29 +0100 |

Organization: | SunSITE.dk - Supporting Open source |

References: | 12-12-035 |

Keywords: | parse |

Posted-Date: | 03 Jan 2013 15:17:10 EST |

James Harris <james.harris.1@gmail.com> writes:

*> The requirements [for parsing expressions] are:*

*>*

*> 1. Hand-written, not the output of a parser generator.*

*> 2. Efficient and without backtracking.*

*> 3. Precedences (and possibly associativities) defined in tables.*

*> 4. Output to be a tree structure.*

*> 5. Parenthesised subexpressions allowed.*

*> 6. Some operator families are *not* to associate with each other. See*

*> below.*

*> 7. Monadic prefix, dyadic infix and monadic postfix operators are all*

*> allowed.*

*> 8. Prefix and infix operators can use some same symbols (e.g. minus*

*> sign).*

*>*

*> Infix and postfix operators use distinct symbols. For example, if a*

*> certain symbol were used as a postfix operator it could not also be*

*> used as an infix operator.*

It sounds like you could use the classic precedence parsing method.

Basically, each infix operator is given two precedence values: One for

its left-hand side and one for its right-hand side. Left-associative

operators would have a higher left-hand precdence than right-hand

precedence and vice-versa. Left parens have the highest possible left

precedence and the lowest possible right precedence. Right parens

have the opposite.

If you only have infix operators, atoms (numbers, variables, etc) and

parentheses, the method is as follows:

Start with an empty stack of treess and an operator stack containing a

left paren. Then loop the following until explicitly stopped:

- If the next input symbol is an atom, it is read and pushed on the

tree stack.

- If the next input symbol is an operator, compare its left-hand

precedence to the right-hand precedence of the top-most operator on

the operator stack.

If the stacked symbol has higher precedence, pop the two top-most

trees from the tree stack and the operator from the operator stack

and push on the tree stack a tree build from the two popped trees and

the popped symbol. The input symbol is kept unread.

If the stacked symbol has lower precedence, read the input symbol and

push it on the operator stack.

- If the next input symbol is a left paren, read it and push it on the

operator stack.

- If the next input symbol is a right paren and the topmost symbol on

the operator stack is a left paren, pop this and read the right

paren.

- If the next input symbol is a right paren and the topmost symbol on

the operator stack is not a left paren, pop the two topmost trees and

the topmost operator and push a tree built from these. Keep the

right paren unread.

- If the end of file is reached and the topmost symbol on the operator

stack is a left paren, then stop.

- If the end of file is reached and the topmost symbol on the operator

stack is not a left paren, pop the two topmost trees and the topmost

operator and push a tree built from these.

When stopped, the tree stack contains the desired syntax tree.

This needs some modification to catch all syntax errors. For example, 3

4 + + 5 would parse the same way as 3 + 4 + 5. The simplest solution is

to add a bit that tells whether the most recently read symbol was an

atom or an operator.

A prefix operator will never follow an atom, so by checking this bit,

you can treat a - as a prefix or infix symbol depending on context.

Prefix operators are pushed on the operator stack (with a bit saying

they are prefix operators) and the above algorithm is modified so a

stacked prefix operator with higher precedence than the input operator

builds a tree with only one child. A stacked prefix operator with lower

precedence causes the new operator to be read and pushed.

Postfix operators in input with higher predecence than the topmost

operator are read and a tree is built from this and the topmost tree.

Postfix operators with lower precedence causes a tree to be built from

the topmost operator and one or two trees from the tree stack (depending

on whether the topmost operator is prefix or infix).

Note that postfix operators are never pushed on the operator stack and

prefix operators never stay unread. Neither change the bit that

indicate whether an atom or infix operator was read.

If specific pairs of operators do not associate with each other, you

report an error if one is in the input and the other is at the top of

the operator stack.

Torben

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.