Re: Is that difficult to write a Shift-Reduce parser

rpw3@rpw3.org (Rob Warnock)
Sat, 01 May 2010 21:31:49 -0500

          From comp.compilers

Related articles
Is that difficult to write a Shift-Reduce parser kuangpma@gmail.com (kuangpma) (2010-04-29)
Re: Is that difficult to write a Shift-Reduce parser kym@sdf.lonestar.org (russell kym horsell) (2010-05-01)
Is that difficult to write a Shift-Reduce parser chakaram@auth.gr (Chariton Karamitas) (2010-05-02)
Re: Is that difficult to write a Shift-Reduce parser rpw3@rpw3.org (2010-05-01)
Re: Is that difficult to write a Shift-Reduce parser cfc@shell01.TheWorld.com (Chris F Clark) (2010-05-02)
Re: Is that difficult to write a Shift-Reduce parser dot@dotat.at (Tony Finch) (2010-05-04)
Re: Is that difficult to write a Shift-Reduce parser sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-05-07)
Re: Is that difficult to write a Shift-Reduce parser rpw3@rpw3.org (2010-05-09)
| List of all articles for this month |

From: rpw3@rpw3.org (Rob Warnock)
Newsgroups: comp.compilers
Date: Sat, 01 May 2010 21:31:49 -0500
Organization: Rob Warnock, Consulting Systems Architect
References: 10-04-072 10-05-003
Keywords: parse, history
Posted-Date: 02 May 2010 22:24:07 EDT
Originator: rpw3@rpw3.org (Rob Warnock)

russell kym horsell <kym@sdf.lonestar.org> wrote:
+---------------
| kuangpma <kuangpma@gmail.com> wrote:
| > I am learning compiler theory, several books teach how to write (even
| > with source code) a Recursive Decent parser but not Shift Reduce
| > parser, only tell the theroy behind it. Particularly the Louden book,
| > it says that handcrafting a Shift Reduce parser is very difficult so
| > it suggests to use tools such like yacc to generate the parser.
|
| There are various shift/reduce stategies. Most modern texts probably
| assume or are locked onto LR(k) or LALR as the only S/R method around.
|
| Some old stuff is fairly easy to understand. E.g. the precedence
| grammar idea or -- even more pedagogic -- operator precedence.
|
| Usually S/R parsers have a fixed "engine" that is driven by tables.
| Making the tables compact takes up a lot of the intelligibility bandwidth.
| But some very old co-no ("current operator -- next operator") parsers(*)
| were nothing more than a fancy 2-d switch statement.
....
| (*) Yes, John, I'm talking about compilers that took 80 column cards as input.
| [Yes, I used them too. It sounds like you're talking about operator
| precedence expression parsers, which are simpler than what we usually refer
| to as shift/reduce. -John]
+---------------


One of my favorite hand-coded parser styles is the hybrid Recursive Descent/
Simple Operator Precedence parser used by the BLISS family of compilers,
which uses a variant of what Russell calls "co-no"; namely, a *four*-
lexeme[1] "window" [lexeme shift register] with the elements called
SYM (current symbol or oter value), DEL (current operator/delimiter),
FUTSYM (next SYM), and FUTDEL (next operator/delimiter). Since at this
level the grammar is Simple Operator Precedence, FUTSYM might be empty
but DEL nor FUTDEL could be [that is, there must always be at least one
delimiter between any two "symbols" (values)].


The basic lexical reader is called RUND (Read Until Next Delimiter).
A simplified version in Scheme [cribbed from some old code of mine
so apologies for any incompleteness/inconsistencies]:


        (define (rund) ; Operates on globals, ugh!
            (set! oldsym sym) ; Slide the window
            (set! olddel del)
            (set! del (read-token))
            (if (token-is-del? del)
                (set! sym null-token) ; Empty value between delims
                (begin
                    (set! sym del)
                    (set! del (read-token))
                    (unless (token-is-del? del)
                        (error "Two adjacent non-delimeters!")))))


and the parse code for (say) binary infix operators looks
something like this:


        (define (expr-infix)
            (let ((mysym sym) ; My left operand
(myprio (del-prio del)) ; Me
(mycode (del-code del)))
(rund) ; Step forward
(do ()
((<= (del-prio del) myprio) ; "<=" causes left-associativity
(make-graph-lexeme mycode mysym sym))
(set! sym ((del-reduce-function del))))))


and the generic expression parser is:


        (define (expression)
            (do ()
((<= (del-prio del) prio-stopper) ; Right-paren, semicolon, etc.
sym)
(set! sym ((del-reduce-function del)))))


See


The "trick" to hybridizing this with recursive descent is to assign
*very* high operator priorities to the start tokens of control expressions
[e.g., "keywords" such as BEGIN, IF, WHILE, etc.] and *very* low priorities
to the ends of expressions ["stoppers" such as END, ELSE, FI, right-paren,
right-bracket, comma, semicolon, &c]. This causes the simple operator
precedence parser to quite naturally behave as a recursive descent parser
when it encounters any of those constructs. ;-}




-Rob


[1] In the BLISS parsers, a lexeme is either a terminal valued token
        [literal, variable, etc.] or the result of a reduce operation
        representing a value (in which case it is called a "graph table lexeme").


-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.