1 Oct 2000 00:27:05 -0400

Related articles |
---|

Trying to understand the bison parser algorithm. comments@vrml3d.com (vrml3d.com) (2000-09-28) |

Re: Trying to understand the bison parser algorithm. cfc@world.std.com (Chris F Clark) (2000-10-01) |

From: | Chris F Clark <cfc@world.std.com> |

Newsgroups: | comp.compilers |

Date: | 1 Oct 2000 00:27:05 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 00-09-195 |

Keywords: | yacc, parse |

CC: | cfc@world.std.com, compilers@iecc.com, comments@vrml3d.com |

*> In other words, I can't make any sense out of their explanation. It*

*> seems to contradict itself.*

It's probably just overly terse and assumes things to be obvious that

are not.

*> Can somebody point me to a better explanation, prefereably with a*

*> flow chart? Simpler is better.*

Yes, find a copy of a good compiler text book that covers LALR

parsing, see the comp.compilers FAQ for recommendations. The LALR

parsing method is actually just a mechanical process.

*> Also, I've tried looking at the source of a Bison generated parser*

*> for a simple expression evaluator, and it was difficult to analyze,*

*> what with all the gotos and stuff.*

With this you are barking up the wrong tree, the code in the generated

parser is the same for all grammars. It is simply a hand optimized

program that traverses the data structures that represent the parsing

tables (the arrays of numbers). Those arrays of numbers represent the

"real" parser.

I have included the tables generated by Yacc++ (in its most generally

readable mode) for your grammar. Perhaps they will help you

understand. You start in state 0. The first part of the state

listing is the "items" the state represents. It is followed by the

table of actions that implement that state (first column is the token

# and name) followed by the action (first as an index into a table

that I didn't include and then as comments telling you what that

action does.

Ignore the "push" parts of the actions, they are a Yacc++ extension

and not part of the standard LALR notation. However, when you reduce

in Yacc++, you return to the state that executed the most recent

"push" and that state will tell you what to do with the non-terminal

you just reduced. Yacc++ does this by putting non-terminals on the

stack (a technique called non-canonical LR parsing). Normal LALR

parsers do the same thing, but hide the details in the "goto" table

and the semantics of how LALR reduce works. (I will omit the details

of the canonical/non-canonical LALR parsing, as I find non-canonical

parsing much easier to explain, at both terminal and non-terminals are

handled identically.)

Thus, for your 1 + 2 case:

Your input as tokens looks like this:

.NUMBER '+' NUMBER (yy_eof)

^ the dot divides what you haven't parsed yet, from what you have

and not having parsed anything yet the dot starts at the beginning

You are in state 0 and have 1 (a NUMBER). The entry for number in

that state says to shift the number and reduce to a term. The shift

swallows the number from your input (putting it "on the parser stack",

which means before the dot)

NUMBER .'+' NUMBER (yy_eof)

^ again, the dot divides what you haven't parsed yet, from what

you have

Then, the reduce takes the contents of the

parser stack (and removes the NUMBER from it) and says you are now

looking for what to do with a "term" in state 0.

.term '+' NUMBER (yy_eof)

^ we haven't parsed the term yet, so it is after the dot

In state 0, the rule for term, says you swallow that and

change to state 1. Your parser stack now looks like this:

term .'+' NUMBER (yy_eof)

Now, you are in state 1 with a lookahead of the next token '+'. In

state 1, the actions say to shift the '+' and goto state 4.

term '+' .NUMBER (yy_eof)

In state 4, you shift and reduce NUMBER to a term again. And the

action for term shifts and takes you back to state 1, with the

following stack:

term '+' term .(yy_eof)

I'll leave you to figure out what happens then.

BTW, this state 1 where I stopped explaining, is the one where the

Bison example left you cold. If you look at the "items" in state 1,

you can see where the shift versus reduce actions come from.

The item

Line 42: term : '(' expr .')' ; is a lookahead from state 5 for

Line 39: expr : term .;

says that you can be in state 1 in the case where some containing rule

(line 42) has the dot before a closing paren (that happens in state

5--if you take the example "(1+2)" you will pass through state 5) and

the current rule is expr : term (and we are at the end of the rule).

At the end of a rule, the parser always reduces.

The item

Line 43: term : term .'!' ;

says the you can be in state one with the dot after a term, but before

a '!'. Since that is not the end of the rule, the parser always shifts to

get the dot to the next spot in the rule.

A conflict occurs when you have two items in a state and the items

tell the parser to do two different things.

Note, if you have trouble understanding the above actions. Find

yourself a tool the executes the parse tables graphically. Dr Susan

Rodgers has written some. Visual Parse++ is another candidate.

/******************************************************************************

Parser State: 0

---- core items ----

Line 0: <start> : .expr yy_eof <<accept>>

---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog0[]

= {

/* 0 yy_eof */ 0, /* error */

/* 1 NUMBER */ 1, /* push->shift->reduce term */

/* 2 '+' */ 0, /* error */

/* 3 '(' */ 3, /* push->shift->change state 2 */

/* 4 ')' */ 0, /* error */

/* 5 '!' */ 0, /* error */

/* 6 term */ 5, /* push->shift->change state 1 */

/* 7 expr */ 7, /* shift->change state 3 */

*};*

/******************************************************************************

Parser State: 1

---- core items ----

Line 38: expr : term .'+' expr ;

Line 42: term : '(' expr .')' ; is a lookahead from state 5 for

Line 39: expr : term .;

Line 0: <start> : expr .yy_eof <<accept>> is a lookahead from state 3 for

Line 39: expr : term .;

Line 43: term : term .'!' ;

Line 39: expr : term .;

---- expd items ----

******************************************************************************/

int yy_psr_prog1[]

= {

/* 0 yy_eof */ 9, /* reduce expr */

/* 1 NUMBER */ 0, /* error */

/* 2 '+' */ 11, /* shift->change state 4 */

/* 3 '(' */ 0, /* error */

/* 4 ')' */ 9, /* reduce expr */

/* 5 '!' */ 13, /* shift->reduce term */

/* 6 term */ 0, /* error */

/* 7 expr */ 0, /* error */

*};*

/******************************************************************************

Parser State: 2

---- core items ----

Line 42: term : '(' .expr ')' ;

---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog2[]

= {

/* 0 yy_eof */ 0, /* error */

/* 1 NUMBER */ 1, /* push->shift->reduce term */

/* 2 '+' */ 0, /* error */

/* 3 '(' */ 15, /* push->shift */

/* 4 ')' */ 0, /* error */

/* 5 '!' */ 0, /* error */

/* 6 term */ 5, /* push->shift->change state 1 */

/* 7 expr */ 18, /* shift->change state 5 */

*};*

/******************************************************************************

Parser State: 3

---- core items ----

Line 0: <start> : expr .yy_eof <<accept>>

---- expd items ----

******************************************************************************/

int yy_psr_prog3[]

= {

/* 0 yy_eof */ 20, /* shift-><<accept>> */

/* 1 NUMBER */ 0, /* error */

/* 2 '+' */ 0, /* error */

/* 3 '(' */ 0, /* error */

/* 4 ')' */ 0, /* error */

/* 5 '!' */ 0, /* error */

/* 6 term */ 0, /* error */

/* 7 expr */ 0, /* error */

*};*

/******************************************************************************

Parser State: 4

---- core items ----

Line 38: expr : term '+' .expr ;

---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog4[]

= {

/* 0 yy_eof */ 0, /* error */

/* 1 NUMBER */ 1, /* push->shift->reduce term */

/* 2 '+' */ 0, /* error */

/* 3 '(' */ 3, /* push->shift->change state 2 */

/* 4 ')' */ 0, /* error */

/* 5 '!' */ 0, /* error */

/* 6 term */ 5, /* push->shift->change state 1 */

/* 7 expr */ 21, /* shift->reduce expr */

*};*

/******************************************************************************

Parser State: 5

---- core items ----

Line 42: term : '(' expr .')' ;

---- expd items ----

******************************************************************************/

int yy_psr_prog5[]

= {

/* 0 yy_eof */ 0, /* error */

/* 1 NUMBER */ 0, /* error */

/* 2 '+' */ 0, /* error */

/* 3 '(' */ 0, /* error */

/* 4 ')' */ 13, /* shift->reduce term */

/* 5 '!' */ 0, /* error */

/* 6 term */ 0, /* error */

/* 7 expr */ 0, /* error */

*};*

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.