Re: Parsing newbie -- recursion question (probably simple)

Hans-Peter Diettrich <DrDiettrich@compuserve.de>
13 Nov 2005 22:04:53 -0500

          From comp.compilers

Related articles
Parsing newbie -- recursion question (probably simple) sonicsmooth@gmail.com (2005-11-12)
Re: Parsing newbie -- recursion question (probably simple) wyrmwif@tsoft.org (SM Ryan) (2005-11-13)
Re: Parsing newbie -- recursion question (probably simple) DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-11-13)
Re: Parsing newbie -- recursion question (probably simple) desw@sussex.ac.uk (Des Watson) (2005-11-13)
Re: Parsing newbie -- recursion question (probably simple) 63q2o4i02@sneakemail.com (2005-11-15)
Re: Parsing newbie -- recursion question (probably simple) cfc@shell01.TheWorld.com (Chris F Clark) (2005-11-15)
Re: Parsing newbie -- recursion question (probably simple) nicola.musatti@gmail.com (Nicola Musatti) (2005-11-15)
| List of all articles for this month |

From: Hans-Peter Diettrich <DrDiettrich@compuserve.de>
Newsgroups: comp.compilers
Date: 13 Nov 2005 22:04:53 -0500
Organization: Compilers Central
References: 05-11-059
Keywords: parse,
Posted-Date: 13 Nov 2005 22:04:53 EST

sonicsmooth@gmail.com wrote:


> I'm having trouble getting my head wrapped around recursion. If I have
> the following rule to generate a list of nodes:
>
> nodelist := nodelist node | node


This syntax is appropriate for LR (bottum up) parsers, but not for
handwritten LL (top down) parsers.


> I don't understand how to keep from getting caught in an infinite
> recursive loop. The only way out is to find a terminal node, but
> I'm not sure how to do that if the first thing in the rule is to
> check another nodelist.


Perhaps you should use something like EBNF notation, to get rid of the
recursion:
    nodelist ::= node { node }. or
    nodelist ::= { node }+. #1 to n occurences of node


Now it's easier to write an LL(1) recursive descent parser, using loop
statements wherever appropriate. Useful additional syntax elements are:
  term1 | term2 #alternative, either term1 or term2 (or...)
  [ term ]   { term } #any number [0..n] of occurences
and possibly another extension for [1..n] occurences, as indicated
above.
"term" here stands for one or more (sequential) terminals or
nonterminals.




> Should I check for the terminal first?


That's the usual way, to find out which rule(s) may apply at all. If
exactly one rule applies, you know immediately how to proceed.


> Should I
> carry around a flag someplace that says where I am in the rule? I'd
> like to avoid explicit stacks and just use the function return stack as
> my stack.


In an recursive descent parser a rule corresponds to one subroutine,
where everything on the right hand side is handled (sequentially) in the
subroutine code. A terminal must match the actual token, a nonterminal
becomes a call of the according subroutine.


> I also want to avoid an official parser generator because a)
> I'm doing this in python, b) SPICE syntax is simple enough that this
> shouldn't be toooo hard, and c) I want to learn it this way.


(c)? What do you want to learn this way? How not to write parsers? ;-)


IMO you should:
a) Learn what's a recursive descent (top down) parser; that's almost the
only useful model for hand written parsers.
b) Rewrite your grammar, using an appropriate notation.
c) Learn how to construct and use FIRST and FOLLOW sets.


Most of this can be understood and applied intuitively, without
bothering much with theory.




Sorry, I don't speak Python, but the subroutine for your example should
read like this:


def nodelist:
    if CurrentToken in FirstOfNode:
    #above check should have been made before calling nodelist at all!
        nodelist := node() #start construction of nodelist
        while CurrentToken in FirstOfNode:
            nodelist := nodelist + node() #append to nodelist
        return nodelist
    else if CurrentToken in FirstOfSomethingElse:
        SomethingElse() #doesn't apply in this example
    else:
        return nothing #signal nothing matched


Here CurrentToken is a global variable, holding the actual token.
Whenever a token is matched, CurrentToken receives the next token from
the input stream.


DoDi


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.