Re: Parser for C++ implemented in Java

Theodore Norvell <theo@engr.mun.ca>
12 Jan 2003 17:39:11 -0500

          From comp.compilers

Related articles
Parser for C++ implemented in Java davidp@imec.be (2003-01-07)
Re: Parser for C++ implemented in Java theo@engr.mun.ca (Theodore Norvell) (2003-01-12)

From: Theodore Norvell <theo@engr.mun.ca>
Newsgroups: comp.compilers,comp.compilers.tools.javacc
Date: 12 Jan 2003 17:39:11 -0500
Organization: Memorial University of Newfoundland
References: 03-01-037
Keywords: C++, parse, Java
Posted-Date: 12 Jan 2003 17:39:11 EST

Patrick wrote:
> Does anybody has experience on parsing large C++ source code with
> javacc or another java based parser ? After reading the previous
> posts, it seems to be quite tricky, any information would be greatly
> appreciated.
>
> Patrick


I've been working on a parser for a C++ subset using JavaCC. Yes, it is tricky.
Very tricky. I plan to put the parser in the public domain once I'm happy
with it, but that isn't quite yet. Here are some of the issues that make
it hard:


        Interaction with the symbol table. How you treat an identifier depends
            on whether it has been declared as a type. In some cases this
            requires peeking ahead in the token stream to make the decision.
            Consider
                a::b::c::d
            Whether you treat this as a type name depends on the declaration
            of d. JavaCC's semantic lookahead and its ability to peek ahead in
            the token stream make this possible.
        Distinguishing declarations from function definitions. At first I tried
            doing this by looking ahead for a comma or semicolon. This turned
            out not to work when a class specification appears as a decl_specifier,
            so I ended up combining the nonterminal for function definitions
            with that for simple declarations.
        Declaration before use. For the most part C++ requires declaration
            before use, but not (entirely) within classes. Consider
                class A {
                    int foo() { T (i) ; i = 0 ; return i ; }
                    typedef int T ;
                } ;
            Is T(i) a function call or a variable declaration? My solution is
            to delay parsing of the function bodies until the end of the class
            specification. JavaCC should make this fairly easy (using a custom
            token manager), but I haven't implemented it yet.
        Templates. I'm not implementing templates, but if I were, I'd want to
            delay parsing until the template is instantiated. Again, JavaCC
            should make this possible. To make this work, you also need to
            design the symbol table so that it can be backed up to the right
            place, providing the right context for the parse.
        Telling when the decl-specifiers stop. A simple declaration in C++
            has the form
                (decl_specifier)* [init_declarator ("," init_declarator)*] ";"
            In some cases it can be hard to tell when to jump out of the first
            loop. It sounds as if it should be easy, but there are some subtle
            issues.
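For the symbol-table issue, here is a minimal Java sketch (not the parser described above) of deciding whether a qualified name such as a::b::c::d names a type. The classes `QualifiedLookup`, `Scope` and `Entry` are hypothetical. The lookup starts in the global scope only and ignores enclosing scopes, using-directives and base classes, all of which real C++ lookup must handle. In a JavaCC grammar, a check like this would typically sit inside semantic lookahead, with the identifiers gathered through the generated parser's `getToken(i)` method.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical symbol table in which namespaces and classes
// own nested scopes, used to decide whether a qualified name is a type name.
class QualifiedLookup {
    enum Kind { NAMESPACE, CLASS, TYPEDEF, VARIABLE, FUNCTION }

    static final class Entry {
        final Kind kind;
        final Scope inner; // non-null only for namespaces and classes

        Entry(Kind kind) {
            this.kind = kind;
            this.inner = (kind == Kind.NAMESPACE || kind == Kind.CLASS) ? new Scope() : null;
        }

        boolean isType() {
            return kind == Kind.CLASS || kind == Kind.TYPEDEF;
        }
    }

    static final class Scope {
        final Map<String, Entry> members = new HashMap<>();

        Entry declare(String name, Kind kind) {
            Entry e = new Entry(kind);
            members.put(name, e);
            return e;
        }
    }

    // parts holds the identifiers of a qualified name such as a::b::c::d,
    // collected by peeking ahead past the "::" tokens. Every prefix must name
    // a namespace or class; the answer depends on how the last part is declared.
    static boolean isTypeName(Scope global, String... parts) {
        Scope scope = global;
        for (int i = 0; i < parts.length; i++) {
            Entry e = scope.members.get(parts[i]);
            if (e == null) return false;                  // undeclared
            if (i == parts.length - 1) return e.isType();
            if (e.inner == null) return false;            // cannot qualify through a non-scope
            scope = e.inner;
        }
        return false;                                     // empty name
    }
}
```

With a, b declared as namespaces, c as a class, and d as a typedef inside c, `isTypeName(global, "a", "b", "c", "d")` is true; if d were a data member it would be false, even though the tokens are identical.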
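Why looking ahead for a comma or semicolon fails, and how a combined nonterminal avoids it, can be sketched on a flat list of string tokens. This is a hypothetical simplification, not the grammar described above: the `specifiers` set stands in for both specifier keywords and names the symbol table knows as types; declarators are limited to `*`/`&`, one name and an optional parameter list; qualified names and malformed input are not handled.

```java
import java.util.List;
import java.util.Set;

// Sketch only: deciding "function definition or simple declaration?"
class DeclOrDefinition {
    // Naive test: scan ahead and see whether "{" comes before ";" or ",".
    // A class specifier among the decl-specifiers fools it.
    static boolean naiveIsDefinition(List<String> t) {
        for (String s : t) {
            if (s.equals("{")) return true;
            if (s.equals(";") || s.equals(",")) return false;
        }
        return false;
    }

    // Index just past the token that closes the bracket opened at t[i].
    static int skipBalanced(List<String> t, int i, String open, String close) {
        int depth = 0;
        do {
            if (t.get(i).equals(open)) depth++;
            else if (t.get(i).equals(close)) depth--;
            i++;
        } while (depth > 0);
        return i;
    }

    // Combined approach: parse the decl-specifiers and the first declarator
    // once, then let the token that follows decide.
    static boolean isDefinition(List<String> t, Set<String> specifiers) {
        int i = 0;
        while (true) {
            String s = t.get(i);
            if (s.equals("class") || s.equals("struct") || s.equals("union")) {
                i++;
                if (!t.get(i).equals("{")) i++;                  // optional class name
                if (t.get(i).equals("{")) i = skipBalanced(t, i, "{", "}");
            } else if (specifiers.contains(s)) {
                i++;
            } else {
                break;                                           // declarator starts here
            }
        }
        while (t.get(i).equals("*") || t.get(i).equals("&")) i++;
        i++;                                                     // declarator name
        if (t.get(i).equals("(")) i = skipBalanced(t, i, "(", ")");
        while (t.get(i).equals("const")) i++;                    // e.g. int f() const { ... }
        String next = t.get(i);
        return next.equals("{") || next.equals(":") || next.equals("try");
    }
}
```

For `struct S { int f ( ) { return 1 ; } } s ;` the naive scan meets the class body's `{` first and wrongly reports a definition, while the combined approach skips the class specifier as one decl-specifier, reads the declarator `s`, and sees `;`.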
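Delaying the parse of member-function bodies amounts to buffering their tokens during a first pass over the class and revisiting them once the closing brace has been read. The Java sketch below illustrates that idea on string tokens; `DeferredBodies`, `scanMembers` and `classify` are hypothetical names, only member typedefs are tracked as types, and nested class specifiers or brace initializers inside members are not handled. A real JavaCC parser would buffer `Token` objects, for instance through a custom token manager as suggested above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: save member-function bodies unparsed, then classify their
// statements after the class is complete and all member types are known.
class DeferredBodies {
    final List<List<String>> savedBodies = new ArrayList<>();
    final Set<String> memberTypes = new HashSet<>();

    // Copy the balanced { ... } starting at toks[start] into out and return
    // the index just past the matching "}".
    static int skipBody(List<String> toks, int start, List<String> out) {
        if (!toks.get(start).equals("{")) throw new IllegalArgumentException("expected {");
        int depth = 0;
        int i = start;
        do {
            String t = toks.get(i++);
            if (t.equals("{")) depth++;
            else if (t.equals("}")) depth--;
            out.add(t);
        } while (depth > 0);
        return i;
    }

    // First pass over the tokens between the class's braces.
    void scanMembers(List<String> members) {
        List<String> current = new ArrayList<>();   // tokens of the current member
        int i = 0;
        while (i < members.size()) {
            String t = members.get(i);
            if (t.equals("{")) {
                List<String> body = new ArrayList<>();
                i = skipBody(members, i, body);
                savedBodies.add(body);
                current.clear();                     // a function definition ends at its body
            } else if (t.equals(";")) {
                if (!current.isEmpty() && current.get(0).equals("typedef")) {
                    memberTypes.add(current.get(current.size() - 1));
                }
                current.clear();
                i++;
            } else {
                current.add(t);
                i++;
            }
        }
    }

    // Second pass: a statement NAME ( NAME ) ; declares a variable if the
    // first name is a member type, and is a function call otherwise.
    String classify(List<String> stmt) {
        boolean shape = stmt.size() == 5 && stmt.get(1).equals("(")
                && stmt.get(3).equals(")") && stmt.get(4).equals(";");
        if (!shape) return "other";
        return memberTypes.contains(stmt.get(0)) ? "declaration" : "function call";
    }
}
```

Applied to the members of class A above, the typedef for T is recorded before the saved body of foo is examined, so `T (i) ;` is classified as a declaration; without the typedef the same tokens would be taken as a call.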
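A symbol table that can be "backed up to the right place" for templates could be built in several ways. The sketch below swaps in one standard technique, a persistent linked list of bindings, purely as an illustration; `PersistentSymbols`, `declare` and `lookup` are hypothetical, scopes are flattened into a single chain, and real template name lookup is considerably more involved.

```java
// Sketch only: a persistent (immutable) symbol table. Each declaration
// returns a new table that shares the old one, so the table as it stood at
// a template's definition can be kept and consulted later, when the
// template is instantiated, without undoing declarations made since.
final class PersistentSymbols {
    static final PersistentSymbols EMPTY = new PersistentSymbols(null, null, null);

    private final String name;
    private final String kind;
    private final PersistentSymbols rest;

    private PersistentSymbols(String name, String kind, PersistentSymbols rest) {
        this.name = name;
        this.kind = kind;
        this.rest = rest;
    }

    PersistentSymbols declare(String name, String kind) {
        return new PersistentSymbols(name, kind, this);
    }

    // The most recent declaration of name wins; null means undeclared.
    String lookup(String name) {
        for (PersistentSymbols s = this; s != EMPTY; s = s.rest) {
            if (s.name.equals(name)) return s.kind;
        }
        return null;
    }
}
```

Keeping a reference to the table at the template's definition costs nothing extra, because later declarations only add nodes in front of it.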
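One of the subtle issues with ending the decl-specifier loop is that an identifier naming a type is not always a specifier: in `int T;` the name T is the declarator even if T is a typedef name in an enclosing scope. The Java sketch below shows one heuristic for this; it is an illustration, not the rule used in the parser described above. `DeclSpecifiers` and `countSpecifiers` are hypothetical, the keyword sets are illustrative, and cases such as constructor declarations or `T (i);` are not handled.

```java
import java.util.List;
import java.util.Set;

// Sketch only: before any type has been seen, an identifier that names a
// type is a specifier; once a type has been seen, the next identifier starts
// the declarator even if it, too, names a type.
class DeclSpecifiers {
    static final Set<String> TYPE_KEYWORDS = Set.of(
            "char", "wchar_t", "bool", "short", "int", "long",
            "signed", "unsigned", "float", "double", "void");
    static final Set<String> OTHER_SPECIFIERS = Set.of(
            "const", "volatile", "typedef", "static", "extern", "register",
            "auto", "mutable", "inline", "virtual", "explicit", "friend");

    // Returns the number of leading tokens that are decl-specifiers.
    static int countSpecifiers(List<String> toks, Set<String> typeNames) {
        boolean sawType = false;
        int i = 0;
        while (i < toks.size()) {
            String t = toks.get(i);
            if (TYPE_KEYWORDS.contains(t)) {
                sawType = true;                   // keywords like "unsigned long" may combine
            } else if (typeNames.contains(t) && !sawType) {
                sawType = true;                   // a type name used as the type-specifier
            } else if (!OTHER_SPECIFIERS.contains(t)) {
                break;                            // start of the declarator
            }
            i++;
        }
        return i;
    }
}
```

With T known as a type name, `T x ;` has one specifier, `int T ;` has one (T is the declarator), and `const T * p ;` has two.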


JavaCC's flexibility is very useful in dealing with some of these issues.
But it is clear that C++ evolved in an environment where parsing was mostly
bottom-up. So I've sometimes thought that the job might be easier with
a bottom-up parser generator such as CUP. Maybe this is just the grass
seeming greener on the other side of the hill.


If you don't need the accuracy a compiler needs (say, for computing metrics),
there are a lot of shortcuts you can take. An accurate parse of full ISO C++,
however, requires a considerable investment of effort.


Cheers,
Theodore Norvell

