Re:Some questions about recursive descent?

"Paul Robinson" <xdpascal@xdpascal.com>
5 Mar 2022 14:46:08 -0600

          From comp.compilers

Related articles
[3 earlier articles]
Re: Some questions about recursive descent? anton@mips.complang.tuwien.ac.at (2022-02-28)
Re: Some questions about recursive descent? alain@universite-de-strasbourg.fr.invalid (Alain Ketterlin) (2022-02-28)
Re: Some questions about recursive descent? johann@myrkraverk.invalid (Johann 'Myrkraverk' Oskarsson) (2022-03-01)
Re: Some questions about recursive descent? anton@mips.complang.tuwien.ac.at (2022-03-01)
Re: Some questions about recursive descent? alain@universite-de-strasbourg.fr (Alain Ketterlin) (2022-03-01)
Re: Some questions about recursive descent? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2022-03-02)
Re:Some questions about recursive descent? xdpascal@xdpascal.com (Paul Robinson) (2022-03-05)
| List of all articles for this month |

From: "Paul Robinson" <xdpascal@xdpascal.com>
Newsgroups: comp.compilers
Date: 5 Mar 2022 14:46:08 -0600
Organization: Compilers Central
References: 22-02-021 22-02-024
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="17429"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 05 Mar 2022 16:53:47 EST

On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson
<johann@myrkraverk.com> asked in Newsgroup comp.compilers:


| I have some questions about the construction of
| recursive descent parsers. The reason for my questions
| is that I know production compilers /use/ recursive
| descent. My first question is: how are production
| recursive descent parsers constructed?


If you want to see another example of this, since I added features to
it, is the XDPascal compiler, available from
https://github.com/electric-socket/xdpw or https://XDpascal.com. Now,
what I'm going to do is explain why and how this is done.


Pascal syntax requires that, after a procedure or function signature,
and any constants, types or variables are declared, it must have a
block. This is a BEGIN, zero or more statements, each separated by a
semicolon, then an optional semicolon, then END and a semicolon if it's
a procedure or function, a period if it's the end of a program. A syntax
diagram may help here:


Block: ^______________>|
                        | V
--->BEGIN---+-->Statement---+-->V--------->^----> END --->
                        ^_______ ; <____V |__> ;_>__|


In the code example below, any identifier ending
in TOK is the token for that symbol or keyword,
e.g. DOTOK for DO, DOTTOK for period, etc.


So at the end of the compile, the compiler would
have a piece of code similar to this:


    block;
    nexttoken;
    if thistoken<>DOTTOK then error('Period expected');
    finishcompiling;


    Procedure "block" would have something similar to
    the following:


    repeat
              statement;
    until thistoken = ENDTOK or END_OF_FILE ;




Procedure "Statement" would look like this:


    nextToken;
      Case thisToken of
            Identifier : Identstmt;
            FORTOK: ForStmt
            REPEATTOK: RepeatStmt;
            BEGINTOK: Block; // [1]
            ...
            WHILETOK: WhileStmt
        ELSE
              Error('Syntax error');




FOR statement:
      ----> FOR --> := --> ident ------> expression -->|
      |<-----------------------------------------------V
      V--V------- TO ----->----> expression --> DO --|
            V____> DOWNTO -->| |
                                                                                                    |
      |<---------------------------------------------V
      V--------- statement ----------------->


Now, procedure ForStmt might be coded like this:


        nexttoken;
        if thistok <> BECOMESTOK then
            error(' := expected');
        nexttoken;
        if thistok <> identTok then
              error(' Identifier expected ');
        ...


and so on until:


        nexttoken;
        if thistok <> DOTOK then
            error(' DO expected');
        nexttoken;
        statement;


Now, if you look, at // [1], the case selector for
the token BEGIN calls procedure block again, while
it itself was called by block earlier. And, after the
DO token in FOR, statement calls itself.


If you notice, recursive descent works because each
statement in the program is processed until a "trigger"
token exits that state back to where it was called.
On return, syntax checking continues with the next
token after the one that caused a block or statement
procedure to be called.


I hope this example helps you to understand recursive
descent parsing.




XDPascal.com: The (new) home of the XDPascal Compiler.
----
"The lessons of history teach us - if they teach us anything - that no
one learns the lessons that history teaches us."
Paul Robinson <XDpascal@XDPascal.com> / <paul@paul-robinson.us>


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.