Re: How do I implement check_type() ?

Michiel Helvensteijn <m.helvensteijn@gmail.com>
Mon, 25 May 2009 01:44:00 -0700 (PDT)

          From comp.compilers

Related articles
How do I implement check_type() ? seescreen@gmail.com (2009-05-22)
Re: How do I implement check_type() ? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2009-05-25)
Re: How do I implement check_type() ? m.helvensteijn@gmail.com (Michiel Helvensteijn) (2009-05-25)
| List of all articles for this month |

From: Michiel Helvensteijn <m.helvensteijn@gmail.com>
Newsgroups: comp.compilers
Date: Mon, 25 May 2009 01:44:00 -0700 (PDT)
Organization: Compilers Central
References: 09-05-108
Keywords: C++, parse
Posted-Date: 25 May 2009 05:09:10 EDT

seescreen@gmail.com wrote:


> I am transforming the YACC grammar and LEX specification into a very
> simple C++ interpreter. I am only adding classes from C++ to the
> grammar referenced below. I parse all of the input source code file
> into an Abstract Syntax Tree.
>
> In the case of this project I need to return TYPE_NAME from the lexer
> whenever I encounter a Class name. The problem seems to be that only
> the parser knows whether or not the IDENTIFIER is a class name or not,
> and yet the lexer sees the name before it goes to the parser.
>
> What is the simplest way to get around this problem?


The lexer/parser stage may not be the right time to make that
distinction. The grammar should just specify IDENTIFIER, wherever you
now have TYPE_NAME. Then, in a later stage of compilation, you check
the symbol-table to see if that particular identifier was indeed a
type. To resolve resulting grammar ambiguities, you might consider
using a GLR parser, rather than an LALR parser. I'm not sure if YACC
supports them, but Bison does. The two largely compatible, as I
understand it, so you shouldn't have a steep learning curve.


There may be another way. Since C++ requires declarations textually
before any reference to them, you could write the parser to
dynamically build the symbol-table. If you make it global, or somehow
otherwise available to the lexer, the lexer could then check it to
return the right lexeme (TYPE_NAME instead of IDENTIFIER). It's called
a lexical tie-in, and that makes your grammar context-sensitive.


But that second way has some disadvantages:


* Context sensitive grammars are more complicated to understand. They
make error recovery more problematic, too. You're basically passing
information back and forth between the lexer and the parser, rather
than just one way.


* You'd have to squeeze a lot of code into that parser. It must now
build the AST and the symbol-table. I personally find it cleaner for
the semantic actions in the grammar to just build the AST, then use
the AST to build all the other stuff.


I'd personally go for the first option (and indeed, I did). But the
choice is yours.


Good luck!
--
Michiel Helvensteijn
[Given the syntactic ambiguties in C++, I'd also go with the GLR parser.
John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.