Re: language twiddling, was Infinite look ahead required by C++?

Robert A Duff <bobduff@shell01.TheWorld.com>
Fri, 05 Mar 2010 17:57:24 -0500

          From comp.compilers


From: Robert A Duff <bobduff@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Fri, 05 Mar 2010 17:57:24 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 10-02-024 10-02-039 10-02-086 10-02-088 10-03-003 10-03-005
Keywords: parse, design
Posted-Date: 05 Mar 2010 23:30:40 EST

Chris F Clark <cfc@shell01.TheWorld.com> writes:


> The basic idea of reserved word (i.e. a keyword you can't use for
> anything else) specifying a declaration followed by a list of
> identifiers being declared is a sound one and is used in the languages
> leading up to C. Using a list of specifiers in declarations works OK
> also, as in "static int x". However, if you allow user defined
> identifiers in those locations, then you need to follow C's method of
> declaring those identifiers before their use and making the lexer turn
> them into special tokens, or you need some other special syntax to key
> off of. Adding a new keyword that introduces variable declarations
> will not be sufficient, not unless you are going to severely restrict
> the syntax of the declarations.


Yeah, it really is a bad language design that forces the lexer and/or
parser to depend on the output of semantic analysis. It really wrecks
the nice phase structure of the compiler.
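
To make that dependence concrete, here is a minimal Python sketch of the C-style feedback loop. The names `lex`, `classify_statement`, and `typedef_names` are made up for illustration, and the token set is far smaller than a real C lexer's. The point is only that the same characters lex differently depending on what semantic analysis has already recorded.

```python
import re

# Hypothetical, tiny token set: identifiers and a few punctuation marks.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<punct>[*;()=,]))")

def lex(source, typedef_names):
    """Tokenize `source`, consulting the set of known typedef names.

    `typedef_names` stands in for the symbol table that semantic
    analysis maintains; the lexer cannot classify identifiers without it.
    """
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group("ident"):
            name = m.group("ident")
            kind = "TYPE_NAME" if name in typedef_names else "IDENT"
            tokens.append((kind, name))
        else:
            tokens.append(("PUNCT", m.group("punct")))
    return tokens

def classify_statement(source, typedef_names):
    """Decide declaration vs. expression from the first token's kind."""
    tokens = lex(source, typedef_names)
    return "declaration" if tokens[0][0] == "TYPE_NAME" else "expression"
```

With `a` recorded as a typedef name, `a * b;` declares a pointer `b`; without it, the same text is a multiplication. The lexer's answer changes with the symbol table, which is exactly the coupling between phases.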


The solution need not involve reserved words, though.
For example, Ada's variable declaration syntax:


        X : Integer := 123;


does not start with a reserved word (like Pascal's "var"), but
is easy to parse (by compilers and by humans). The colon makes
it easy. And I like it, because the most important thing about
a declaration is its name, so that should come first (not some noise
word like "var"), followed by what sort of thing it is (its
"type").

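As a rough illustration of why the colon makes this easy, here is a small Python sketch that tells a declaration from an assignment by looking at the token after the name. It handles only a tiny subset invented for this example (a single name, a type name, an optional integer or identifier initializer), not real Ada, and `tokenize` and `parse_simple` are hypothetical helpers. No symbol table is consulted anywhere.

```python
import re

# ":=" must be tried before ":" so the assignment symbol lexes as one token.
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|[:;])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_simple(text):
    """Parse `Name : Type [:= Value];` or `Name := Value;`."""
    toks = tokenize(text)
    if len(toks) < 4:
        raise SyntaxError("statement too short")
    name = toks[0]
    if toks[1] == ":":
        # The colon alone identifies a declaration.
        type_name, rest, init = toks[2], toks[3:], None
        if rest[0] == ":=":
            if len(rest) < 2:
                raise SyntaxError("missing initializer")
            init, rest = rest[1], rest[2:]
        if rest != [";"]:
            raise SyntaxError("expected ';'")
        return ("declaration", name, type_name, init)
    if toks[1] == ":=":
        if toks[3:] != [";"]:
            raise SyntaxError("expected ';'")
        return ("assignment", name, toks[2])
    raise SyntaxError(f"unexpected token {toks[1]!r}")
```

The decision is made purely from the token stream, so the parser never needs to know whether `Integer` has been declared as a type.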

> The reason I recommended the person consider Pascal is that it has all
> those problems solved and solved simply.


Well, Pascal has some syntactic problems, too. The dangling else
comes to mind. There's no excuse for designing a new language
with the dangling else problem. See Ada for a good solution.
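
A sketch of the Ada-style fix, in Python: every "if" is closed by an explicit "end if;", so an "else" can only belong to the innermost "if" that is still open. The grammar below is a toy invented for this example (single-identifier conditions, statements that are just `Name;`), and the `Parser` class is hypothetical.

```python
import re

def tokenize(text):
    # Words and semicolons are the only tokens in this toy language.
    return re.findall(r"[A-Za-z_]\w*|;", text)

class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.take()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")

    def statements(self):
        # A statement list stops at "else", "end", or end of input.
        stmts = []
        while self.peek() not in (None, "else", "end"):
            stmts.append(self.statement())
        return stmts

    def statement(self):
        if self.peek() == "if":
            self.expect("if")
            cond = self.take()
            self.expect("then")
            then_part = self.statements()
            else_part = []
            if self.peek() == "else":
                self.expect("else")
                else_part = self.statements()
            # The explicit terminator removes the ambiguity.
            self.expect("end")
            self.expect("if")
            self.expect(";")
            return ("if", cond, then_part, else_part)
        name = self.take()
        self.expect(";")
        return ("call", name)

def parse(text):
    p = Parser(text)
    result = p.statements()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return result
```

Where the programmer puts "end if;" decides which "if" owns the "else": `if A then if B then P; else Q; end if; end if;` attaches it to the inner "if", while `if A then if B then P; end if; else Q; end if;` attaches it to the outer one.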


The Ada grammar has some syntactic ambiguities, too, but they are
generally not resolved the C way (feedback into the
lexer). Instead, the parser builds a tree for "X(Y)" that says
"this might be a function call, or a type conversion, or
an array indexing", and then lets the semantic analysis phase
sort it out.
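
A minimal Python sketch of that division of labor follows. The node type `NameApply`, the functions `parse_apply` and `resolve`, and the string-valued `symbols` table are all assumptions made up for this example; a real Ada front end is of course far more involved (overloading, slices, multiple arguments).

```python
from dataclasses import dataclass

@dataclass
class NameApply:
    """Parser output for `prefix(arg)`; its meaning is not yet decided."""
    prefix: str
    arg: str

def parse_apply(text):
    # Purely syntactic: split "X(Y)" into prefix and argument.
    text = text.strip()
    if not text.endswith(")") or "(" not in text:
        raise SyntaxError("expected something of the form X(Y)")
    prefix, arg = text[:-1].split("(", 1)
    return NameApply(prefix.strip(), arg.strip())

def resolve(node, symbols):
    """Semantic analysis: decide what the ambiguous node means.

    `symbols` maps a name to 'function', 'type', or 'array'.
    """
    kind = symbols.get(node.prefix)
    if kind == "function":
        return ("call", node.prefix, node.arg)
    if kind == "type":
        return ("convert", node.prefix, node.arg)
    if kind == "array":
        return ("index", node.prefix, node.arg)
    raise NameError(f"{node.prefix!r} is not a function, type, or array")
```

The parser produces the same `NameApply` node no matter what `X` is, so it runs without any symbol table; only `resolve` looks names up.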


>...C is not a particularly easy
> language to parse. If you go trying to extend C, it is very easy to
> make the language completely unparseable. A few tiny bandaids on C
> don't resolve those issues.


Indeed.


>> As BGB has said a few times, parsing is nothing compared with some
>> of the other bits of a compiler.
>
> Having built all the pieces of a compiler several times, I can say
> that it is generally true, especially if you are working with a
> well-defined syntax.


Sure. If the language has an interesting type system, then
semantic analysis will certainly be more complicated than
the parser. And if the compiler does any serious optimization,
that's where most of the complexity will be.


>...However, syntax design of your own language is
> actually quite hard to do well, easily on par with the complexity of
> any part of a compiler.


Hmm. I don't find syntax design to be particularly hard.
Easier than type system design, or run-time semantics design.


>...People confuse the ease of
> writing a parser for a well-defined syntax with the difficulty of
> coming up with a well-defined syntax in the first place. This is
> partially why most languages are syntactically similar to other
> existing languages.


Partially, maybe. I think the main reason is people like what
they're used to. If a language designer knows (only) languages
in the C family, they will come up with a syntax containing
lots of curly braces, even if they're not trying to be
compatible with any particular language.


>...People can add an operator or two and a new
> keyword or two, but actually changing the shape of a language (and
> making it work) is exceptionally difficult to do.


Well, yeah, language design is hard. I guess I'd say nobody has
done a really excellent job of it, yet.


- Bob


