Re: Is that difficult to write a Shift-Reduce parser (Rob Warnock)
Sat, 01 May 2010 21:31:49 -0500

          From comp.compilers

Related articles
Is that difficult to write a Shift-Reduce parser (kuangpma) (2010-04-29)
Re: Is that difficult to write a Shift-Reduce parser (russell kym horsell) (2010-05-01)
Is that difficult to write a Shift-Reduce parser (Chariton Karamitas) (2010-05-02)
Re: Is that difficult to write a Shift-Reduce parser (2010-05-01)
Re: Is that difficult to write a Shift-Reduce parser (Chris F Clark) (2010-05-02)
Re: Is that difficult to write a Shift-Reduce parser (Tony Finch) (2010-05-04)
Re: Is that difficult to write a Shift-Reduce parser (Stephen Horne) (2010-05-07)
Re: Is that difficult to write a Shift-Reduce parser (2010-05-09)
| List of all articles for this month |

From: (Rob Warnock)
Newsgroups: comp.compilers
Date: Sat, 01 May 2010 21:31:49 -0500
Organization: Rob Warnock, Consulting Systems Architect
References: 10-04-072 10-05-003
Keywords: parse, history
Posted-Date: 02 May 2010 22:24:07 EDT
Originator: (Rob Warnock)

russell kym horsell <> wrote:
| kuangpma <> wrote:
| > I am learning compiler theory, several books teach how to write (even
| > with source code) a Recursive Decent parser but not Shift Reduce
| > parser, only tell the theroy behind it. Particularly the Louden book,
| > it says that handcrafting a Shift Reduce parser is very difficult so
| > it suggests to use tools such like yacc to generate the parser.
| There are various shift/reduce stategies. Most modern texts probably
| assume or are locked onto LR(k) or LALR as the only S/R method around.
| Some old stuff is fairly easy to understand. E.g. the precedence
| grammar idea or -- even more pedagogic -- operator precedence.
| Usually S/R parsers have a fixed "engine" that is driven by tables.
| Making the tables compact takes up a lot of the intelligibility bandwidth.
| But some very old co-no ("current operator -- next operator") parsers(*)
| were nothing more than a fancy 2-d switch statement.
| (*) Yes, John, I'm talking about compilers that took 80 column cards as input.
| [Yes, I used them too. It sounds like you're talking about operator
| precedence expression parsers, which are simpler than what we usually refer
| to as shift/reduce. -John]

One of my favorite hand-coded parser styles is the hybrid Recursive Descent/
Simple Operator Precedence parser used by the BLISS family of compilers,
which uses a variant of what Russell calls "co-no"; namely, a *four*-
lexeme[1] "window" [lexeme shift register] with the elements called
SYM (current symbol or oter value), DEL (current operator/delimiter),
FUTSYM (next SYM), and FUTDEL (next operator/delimiter). Since at this
level the grammar is Simple Operator Precedence, FUTSYM might be empty
but DEL nor FUTDEL could be [that is, there must always be at least one
delimiter between any two "symbols" (values)].

The basic lexical reader is called RUND (Read Until Next Delimiter).
A simplified version in Scheme [cribbed from some old code of mine
so apologies for any incompleteness/inconsistencies]:

        (define (rund) ; Operates on globals, ugh!
            (set! oldsym sym) ; Slide the window
            (set! olddel del)
            (set! del (read-token))
            (if (token-is-del? del)
                (set! sym null-token) ; Empty value between delims
                    (set! sym del)
                    (set! del (read-token))
                    (unless (token-is-del? del)
                        (error "Two adjacent non-delimeters!")))))

and the parse code for (say) binary infix operators looks
something like this:

        (define (expr-infix)
            (let ((mysym sym) ; My left operand
(myprio (del-prio del)) ; Me
(mycode (del-code del)))
(rund) ; Step forward
(do ()
((<= (del-prio del) myprio) ; "<=" causes left-associativity
(make-graph-lexeme mycode mysym sym))
(set! sym ((del-reduce-function del))))))

and the generic expression parser is:

        (define (expression)
            (do ()
((<= (del-prio del) prio-stopper) ; Right-paren, semicolon, etc.
(set! sym ((del-reduce-function del)))))


The "trick" to hybridizing this with recursive descent is to assign
*very* high operator priorities to the start tokens of control expressions
[e.g., "keywords" such as BEGIN, IF, WHILE, etc.] and *very* low priorities
to the ends of expressions ["stoppers" such as END, ELSE, FI, right-paren,
right-bracket, comma, semicolon, &c]. This causes the simple operator
precedence parser to quite naturally behave as a recursive descent parser
when it encounters any of those constructs. ;-}


[1] In the BLISS parsers, a lexeme is either a terminal valued token
        [literal, variable, etc.] or the result of a reduce operation
        representing a value (in which case it is called a "graph table lexeme").

Rob Warnock <>
627 26th Avenue <URL:>
San Mateo, CA 94403 (650)572-2607

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.