Re: Infinite look ahead required by C++?

Stephen Horne <sh006d3592@blueyonder.co.uk>
Tue, 09 Feb 2010 22:14:13 +0000

          From comp.compilers

Related articles
Infinite look ahead required by C++? ng2010@att.invalid (ng2010) (2010-02-05)
Re: Infinite look ahead required by C++? cfc@shell01.TheWorld.com (Chris F Clark) (2010-02-06)
Re: Infinite look ahead required by C++? idbaxter@semdesigns.com (Ira Baxter) (2010-02-06)
Re: Infinite look ahead required by C++? thurston@complang.org (Adrian Thurston) (2010-02-08)
Re: Infinite look ahead required by C++? sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-02-09)
Re: Infinite look ahead required by C++? gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-02-10)
Re: Infinite look ahead required by C++? sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-02-10)
Re: Infinite look ahead required by C++? cfc@shell01.TheWorld.com (Chris F Clark) (2010-02-10)
Re: Infinite look ahead required by C++? martin@gkc.org.uk (Martin Ward) (2010-02-11)
Re: Infinite look ahead required by C++? idbaxter@semdesigns.com (Ira Baxter) (2010-02-13)
Re: Infinite look ahead required by C++? sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-02-14)
[19 later articles]
| List of all articles for this month |

From: Stephen Horne <sh006d3592@blueyonder.co.uk>
Newsgroups: comp.compilers
Date: Tue, 09 Feb 2010 22:14:13 +0000
Organization: virginmedia.com
References: 10-02-024
Keywords: C++
Posted-Date: 10 Feb 2010 11:04:42 EST

On Fri, 5 Feb 2010 22:27:54 -0600, "ng2010" <ng2010@att.invalid>
wrote:


>What elements of C++ make it so hard to parse? Is it a weakness of
>compiler designs rather than a weakness of the language design? I've read
>somewhere that the language requires potentially infinite look ahead.
>Why? And how do compilers handle it?
>[It's ambiguous syntax. Others can doubtless fill in the details. -John]


Most C and C++ compilers seem to use a hand-crafted mix of recursive
descent and precedence parsing. The main reason for that is because
that's what the language designers used from the start, and therefore
the language is most easily parsed using that approach.


Consider a typical C++ variable declaration...


    int x;


No problem there. But now, let's assume that we want a variable of
some struct type.


    mystruct x;


The fact that "mystruct" identifies a type is significant - it is how
this is recognised as a variable declaration. But how does the parser
*know* that "mystruct" is a type at all?


The answer is that semantic analysis of earlier parts of the source
code was done concurrently with the parsing. The grammar is strictly
ambiguous. "mystruct" is, in normal parsing terms, just an identifier.


In short, to separate out parsing from semantic analysis, you would
have to accept incredibly ambiguous parser output and filter down the
options by semantic analysis.


C used to require that you write something like...


    struct mystruct x;


This neatly resolves the ambiguity issue, but also implies an
additional namespace, which is exactly what C used.


There are parser generators now that can handle C and C++. One example
is Kelbt.


http://www.complang.org/kelbt/


A partial C++ grammar is available as a proof of concept.


I've been using Kelbt for a while now - it is perfectly usable, though
with some build-time error-handling issues and lacking some obvious
basic features (precedence, associativity, syntax error recovery). The
resulting parsers are fine, but I've never needed any of the advanced
features - the reason being that the DSLs I've used it for have syntax
that is LR(1) by design.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.