Re: Trying to understand the bison parser algorithm.

Chris F Clark <cfc@world.std.com>
1 Oct 2000 00:27:05 -0400

From comp.compilers

Related articles
Trying to understand the bison parser algorithm. comments@vrml3d.com (vrml3d.com) (2000-09-28)
*Re: Trying to understand the bison parser algorithm. cfc@world.std.com (Chris F Clark)* (2000-10-01)**

| List of all articles for this month |

From:	Chris F Clark <cfc@world.std.com>
Newsgroups:	comp.compilers
Date:	1 Oct 2000 00:27:05 -0400
Organization:	The World Public Access UNIX, Brookline, MA
References:	00-09-195
Keywords:	yacc, parse
CC:	cfc@world.std.com, compilers@iecc.com, comments@vrml3d.com

> In other words, I can't make any sense out of their explanation. It
> seems to contradict itself.

It's probably just overly terse and assumes things to be obvious that
are not.

> Can somebody point me to a better explanation, prefereably with a
> flow chart? Simpler is better.

Yes, find a copy of a good compiler text book that covers LALR
parsing, see the comp.compilers FAQ for recommendations. The LALR
parsing method is actually just a mechanical process.

> Also, I've tried looking at the source of a Bison generated parser
> for a simple expression evaluator, and it was difficult to analyze,
> what with all the gotos and stuff.

With this you are barking up the wrong tree, the code in the generated
parser is the same for all grammars. It is simply a hand optimized
program that traverses the data structures that represent the parsing
tables (the arrays of numbers). Those arrays of numbers represent the
"real" parser.

I have included the tables generated by Yacc++ (in its most generally
readable mode) for your grammar. Perhaps they will help you
understand. You start in state 0. The first part of the state
listing is the "items" the state represents. It is followed by the
table of actions that implement that state (first column is the token
# and name) followed by the action (first as an index into a table
that I didn't include and then as comments telling you what that
action does.

Ignore the "push" parts of the actions, they are a Yacc++ extension
and not part of the standard LALR notation. However, when you reduce
in Yacc++, you return to the state that executed the most recent
"push" and that state will tell you what to do with the non-terminal
you just reduced. Yacc++ does this by putting non-terminals on the
stack (a technique called non-canonical LR parsing). Normal LALR
parsers do the same thing, but hide the details in the "goto" table
and the semantics of how LALR reduce works. (I will omit the details
of the canonical/non-canonical LALR parsing, as I find non-canonical
parsing much easier to explain, at both terminal and non-terminals are
handled identically.)

Thus, for your 1 + 2 case:

Your input as tokens looks like this:

.NUMBER '+' NUMBER (yy_eof)
^ the dot divides what you haven't parsed yet, from what you have
    and not having parsed anything yet the dot starts at the beginning

You are in state 0 and have 1 (a NUMBER). The entry for number in
that state says to shift the number and reduce to a term. The shift
swallows the number from your input (putting it "on the parser stack",
which means before the dot)

NUMBER .'+' NUMBER (yy_eof)
          ^ again, the dot divides what you haven't parsed yet, from what
          you have

Then, the reduce takes the contents of the
parser stack (and removes the NUMBER from it) and says you are now
looking for what to do with a "term" in state 0.

.term '+' NUMBER (yy_eof)
^ we haven't parsed the term yet, so it is after the dot

In state 0, the rule for term, says you swallow that and
change to state 1. Your parser stack now looks like this:

term .'+' NUMBER (yy_eof)

Now, you are in state 1 with a lookahead of the next token '+'. In
state 1, the actions say to shift the '+' and goto state 4.

term '+' .NUMBER (yy_eof)

In state 4, you shift and reduce NUMBER to a term again. And the
action for term shifts and takes you back to state 1, with the
following stack:

term '+' term .(yy_eof)

I'll leave you to figure out what happens then.

BTW, this state 1 where I stopped explaining, is the one where the
Bison example left you cold. If you look at the "items" in state 1,
you can see where the shift versus reduce actions come from.

The item
Line 42: term : '(' expr .')' ; is a lookahead from state 5 for
Line 39: expr : term .;

says that you can be in state 1 in the case where some containing rule
(line 42) has the dot before a closing paren (that happens in state
5--if you take the example "(1+2)" you will pass through state 5) and
the current rule is expr : term (and we are at the end of the rule).
At the end of a rule, the parser always reduces.

The item
Line 43: term : term .'!' ;

says the you can be in state one with the dot after a term, but before
a '!'. Since that is not the end of the rule, the parser always shifts to
get the dot to the next spot in the rule.

A conflict occurs when you have two items in a state and the items
tell the parser to do two different things.

Note, if you have trouble understanding the above actions. Find
yourself a tool the executes the parse tables graphically. Dr Susan
Rodgers has written some. Visual Parse++ is another candidate.

/******************************************************************************

        Parser State: 0

        ---- core items ----

Line 0: <start> : .expr yy_eof <<accept>>

        ---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog0[]
= {
/* 0 yy_eof */ 0, /* error */
/* 1 NUMBER */ 1, /* push->shift->reduce term */
/* 2 '+' */ 0, /* error */
/* 3 '(' */ 3, /* push->shift->change state 2 */
/* 4 ')' */ 0, /* error */
/* 5 '!' */ 0, /* error */
/* 6 term */ 5, /* push->shift->change state 1 */
/* 7 expr */ 7, /* shift->change state 3 */
};

/******************************************************************************

        Parser State: 1

        ---- core items ----

Line 38: expr : term .'+' expr ;

Line 42: term : '(' expr .')' ; is a lookahead from state 5 for
Line 39: expr : term .;

Line 0: <start> : expr .yy_eof <<accept>> is a lookahead from state 3 for
Line 39: expr : term .;

Line 43: term : term .'!' ;

Line 39: expr : term .;

        ---- expd items ----

******************************************************************************/

int yy_psr_prog1[]
= {
/* 0 yy_eof */ 9, /* reduce expr */
/* 1 NUMBER */ 0, /* error */
/* 2 '+' */ 11, /* shift->change state 4 */
/* 3 '(' */ 0, /* error */
/* 4 ')' */ 9, /* reduce expr */
/* 5 '!' */ 13, /* shift->reduce term */
/* 6 term */ 0, /* error */
/* 7 expr */ 0, /* error */
};

/******************************************************************************

        Parser State: 2

        ---- core items ----

Line 42: term : '(' .expr ')' ;

        ---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog2[]
= {
/* 0 yy_eof */ 0, /* error */
/* 1 NUMBER */ 1, /* push->shift->reduce term */
/* 2 '+' */ 0, /* error */
/* 3 '(' */ 15, /* push->shift */
/* 4 ')' */ 0, /* error */
/* 5 '!' */ 0, /* error */
/* 6 term */ 5, /* push->shift->change state 1 */
/* 7 expr */ 18, /* shift->change state 5 */
};

/******************************************************************************

        Parser State: 3

        ---- core items ----

Line 0: <start> : expr .yy_eof <<accept>>

        ---- expd items ----

******************************************************************************/

int yy_psr_prog3[]
= {
/* 0 yy_eof */ 20, /* shift-><<accept>> */
/* 1 NUMBER */ 0, /* error */
/* 2 '+' */ 0, /* error */
/* 3 '(' */ 0, /* error */
/* 4 ')' */ 0, /* error */
/* 5 '!' */ 0, /* error */
/* 6 term */ 0, /* error */
/* 7 expr */ 0, /* error */
};

/******************************************************************************

        Parser State: 4

        ---- core items ----

Line 38: expr : term '+' .expr ;

        ---- expd items ----

Line 38: expr : .term '+' expr ;

Line 39: expr : .term ;

Line 42: term : .'(' expr ')' ;

Line 43: term : .term '!' ;

Line 44: term : .NUMBER ;

******************************************************************************/

int yy_psr_prog4[]
= {
/* 0 yy_eof */ 0, /* error */
/* 1 NUMBER */ 1, /* push->shift->reduce term */
/* 2 '+' */ 0, /* error */
/* 3 '(' */ 3, /* push->shift->change state 2 */
/* 4 ')' */ 0, /* error */
/* 5 '!' */ 0, /* error */
/* 6 term */ 5, /* push->shift->change state 1 */
/* 7 expr */ 21, /* shift->reduce expr */
};

/******************************************************************************

        Parser State: 5

        ---- core items ----

Line 42: term : '(' expr .')' ;

        ---- expd items ----

******************************************************************************/

int yy_psr_prog5[]
= {
/* 0 yy_eof */ 0, /* error */
/* 1 NUMBER */ 0, /* error */
/* 2 '+' */ 0, /* error */
/* 3 '(' */ 0, /* error */
/* 4 ')' */ 13, /* shift->reduce term */
/* 5 '!' */ 0, /* error */
/* 6 term */ 0, /* error */
/* 7 expr */ 0, /* error */
};

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Trying to understand the bison parser algorithm.

Chris F Clark <cfc@world.std.com>1 Oct 2000 00:27:05 -0400

Chris F Clark <cfc@world.std.com>
1 Oct 2000 00:27:05 -0400