Include files and yacc grammars

jao@red6.acpub.duke.edu (JOSEPH OSWALD)
Wed, 15 Apr 1992 02:27:05 GMT

          From comp.compilers

Related articles
Include files and yacc grammars jao@red6.acpub.duke.edu (1992-04-15)
Include files and yacc grammars smk@dcs.edinburgh.ac.uk (1992-04-16)

Newsgroups: comp.compilers
From: jao@red6.acpub.duke.edu (JOSEPH OSWALD)
Keywords: design, question
Organization: Compilers Central
Date: Wed, 15 Apr 1992 02:27:05 GMT

I'm creating a simple interpreter for a Pascal/C-like language.


One of the important features that I want the language to have is an
include facility. However, since this language is going to be used by
beginners, I want the include directive to appear like a statement, like so:


    include("a.init"); /* contains statement that initializes a */
    print ("Variable a is ", a);


rather than some exceptional notation like #include that might be
confusing.


I would like for it to be an ordinary statement in the grammar:


    statement: IDENTIFIER '(' args ')' ';'    /* procedure call */
             | INCLUDE '(' arg ')' ';'
                   { tell lexer to insert file "arg" into the token stream. }
             | IF condition ......
             ;


The problem with this is that I can't figure out how to make sure that the
include file's contents synchronize properly with the grammar. What do I
do with the lookahead token (if there is one), given that it belongs to the
main file and I want to start parsing the include file?


(Note: I am quite willing to restrict the include file's contents to
complete declarations and statements, as opposed to code fragments.)


The solution I'm using is for the lexical analyzer to recognize the sequence


    INCLUDE '(' QUOTEDSTRING ')' ';'


and replace it with the following:


    BEGININC ...tokenized input file contents... ENDINC ...rest of main file...


where BEGININC and ENDINC are marker tokens.
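

A minimal C sketch of that lexer-side substitution, under several
assumptions that are not in the post: the token codes, the Token struct,
and the functions raw_next, push_file and next_token are all hypothetical
names; the raw scanner knows only identifiers, quoted strings and
single-character tokens (no comments or numbers); and a malformed include
is handled by passing INCLUDE and the tokens already read through to the
parser, so that a grammar with no INCLUDE token reports an ordinary syntax
error.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Token codes; single characters such as '(' stand for themselves. */
enum { T_EOF = 0, T_INCLUDE = 256, T_STRING, T_IDENT, T_BEGININC, T_ENDINC };

typedef struct { int type; char text[64]; } Token;

#define MAX_DEPTH 8
static FILE *files[MAX_DEPTH];  /* stack of open inputs; top is current */
static int depth = 0;
static Token pending[4];        /* tokens to replay after a failed match */
static int npending = 0, pend_pos = 0;

/* Raw scanner: identifiers, "quoted strings", single-character tokens. */
static Token raw_next(FILE *fp) {
    Token t = { T_EOF, "" };    /* text is zero-filled, so it stays NUL-terminated */
    int c, n = 0;
    while ((c = getc(fp)) != EOF && isspace(c))
        ;
    if (c == EOF)
        return t;
    if (c == '"') {
        while ((c = getc(fp)) != EOF && c != '"')
            if (n < 63) t.text[n++] = (char)c;
        t.type = T_STRING;
    } else if (isalpha(c)) {
        do {
            if (n < 63) t.text[n++] = (char)c;
        } while ((c = getc(fp)) != EOF && isalnum(c));
        if (c != EOF) ungetc(c, fp);
        t.type = strcmp(t.text, "include") == 0 ? T_INCLUDE : T_IDENT;
    } else {
        t.text[0] = (char)c;
        t.type = c;
    }
    return t;
}

/* Open a file and make it the current input. Returns 1 on success. */
int push_file(const char *name) {
    FILE *fp;
    if (depth >= MAX_DEPTH || (fp = fopen(name, "r")) == NULL)
        return 0;
    files[depth++] = fp;
    return 1;
}

/* What the parser calls: include statements become BEGININC ... ENDINC. */
Token next_token(void) {
    static const int want[4] = { '(', T_STRING, ')', ';' };
    Token t = { T_EOF, "" }, look[4];
    int i;

    if (pend_pos < npending)
        return pending[pend_pos++];
    if (depth == 0)
        return t;
    t = raw_next(files[depth - 1]);
    if (t.type == T_EOF) {
        fclose(files[--depth]);
        if (depth > 0)
            t.type = T_ENDINC;  /* an include file ended; resume its parent */
        return t;
    }
    if (t.type != T_INCLUDE)
        return t;
    for (i = 0; i < 4; i++) {   /* try to match '(' STRING ')' ';' */
        look[i] = raw_next(files[depth - 1]);
        if (look[i].type != want[i])
            break;
    }
    if (i == 4 && push_file(look[1].text)) {
        t.type = T_BEGININC;
        return t;
    }
    /* Malformed include or unopenable file: return INCLUDE now and replay
       the tokens already read, unchanged. An EOF is not queued; the next
       raw read will see it again and pop the file as usual. */
    npending = (i < 4 && look[i].type != T_EOF) ? i + 1 : i;
    memcpy(pending, look, (size_t)npending * sizeof(Token));
    pend_pos = 0;
    return t;
}
```

Because the whole include statement is consumed inside next_token before
the parser sees anything, the parser's own lookahead token never straddles
the file boundary. The cost is the one described here: the lexer has to do
a little parsing of its own.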


In order that the include be recognized only when it is a proper
statement, my grammar is set up thus:


    actionlist: /* empty */
              | BEGININC actionlist ENDINC
              | action
              | actionlist action
              ;


    action: declaration
          | statement
          ;


(and the include statement is NOT part of the grammar.)
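

To illustrate what the marker tokens enforce, here is a small hand-written
recognizer in C. It is only a sketch under simplifying assumptions: the
token codes are hypothetical, and T_DECL and T_STMT each stand for an
already-recognized complete declaration or statement. It also accepts an
include block at any position in a list, which is slightly more permissive
than the rule as posted, where the BEGININC alternative can only begin a
list.

```c
#include <stddef.h>

/* Hypothetical token codes; T_END terminates the token array. */
enum { T_END = 0, T_BEGININC, T_ENDINC, T_DECL, T_STMT };

/* actionlist: zero or more of { action | BEGININC actionlist ENDINC }.
   Stops at the first token that cannot continue the list and leaves it
   for the caller to check. Returns 1 on success, 0 on a marker mismatch. */
static int parse_actionlist(const int *toks, size_t *pos) {
    for (;;) {
        switch (toks[*pos]) {
        case T_DECL:
        case T_STMT:
            (*pos)++;                       /* one complete action */
            break;
        case T_BEGININC:
            (*pos)++;
            if (!parse_actionlist(toks, pos))
                return 0;
            if (toks[*pos] != T_ENDINC)     /* include must close itself */
                return 0;
            (*pos)++;
            break;
        default:
            return 1;                       /* ENDINC or end of input */
        }
    }
}

/* A whole program is an actionlist followed by end of input. */
int parse_program(const int *toks) {
    size_t pos = 0;
    return parse_actionlist(toks, &pos) && toks[pos] == T_END;
}
```

Since each include file must reduce to a complete actionlist between its
own BEGININC and ENDINC, a statement cannot start in one file and finish
in another, which matches the restriction to complete declarations and
statements.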


This works (so far as I can tell), but requires that the lexical analyzer
do some parsing to recognize the include.


Is there a better way; i.e., one that permits me to put it in the grammar?


(One thing that worries me about my approach, other than its kludginess, is
how to handle an error in the include statement itself, such as a missing
final semicolon...)


--Thanks
Joe Oswald
[I suppose you could muck around in the parser and try to save and restore
the lookahead token, but handling the include feature in the lexer is
probably the best way to go. -John]