Re: Building a parse tree that reflects C Semantics

"Rodney M. Bates" <rbates@southwind.net>
20 Oct 2002 22:53:39 -0400

          From comp.compilers

Related articles
Building a parse tree that reflects C Semantics vbdis@aol.com (VBDis) (2002-10-13)
Re: Building a parse tree that reflects C Semantics grosch@cocolab.de (Josef Grosch) (2002-10-18)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-10-20)
Re: Building a parse tree that reflects C Semantics whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-10-20)
Re: Building a parse tree that reflects C Semantics david.thompson1@worldnet.att.net (David Thompson) (2002-10-25)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-11-06)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-11-07)
| List of all articles for this month |

From: "Rodney M. Bates" <rbates@southwind.net>
Newsgroups: comp.compilers
Date: 20 Oct 2002 22:53:39 -0400
Organization: EarthLink Inc. -- http://www.EarthLink.net
References: 02-10-058
Keywords: C, parse, semantics
Posted-Date: 20 Oct 2002 22:53:39 EDT

Josef Grosch wrote:
>
> > How to create an parse tree in general?
>
> I would build an abstract syntax tree that reflects the semantics of a
> language and not its concrete syntax.


I once did this for C (using an early version of Josef Grosch's
scanner, parser, and AST generators), and did more or less what he
recommends, i.e. unravelling the convoluted surface syntax into an
abstract syntax that better reflects the meaning. Here is my way of
thinking about what this unravelling amounts to.


In most languages, a programmer-defined type is built by what is
sometimes called a "type constructor", that builds a new type from
some set of already-known constituents, usually types and constants.
C's declarators are upside down in that their syntax looks like they
are instead deconstructing a previously known constituent type from
the new type they are defining. Whether this is really upside down is
a value judgement (though there seems to be good evidence that it is
counterintuitive).


More fundamentally, the deconstruction idea simply can't work for any
type that has more than one constituent, because a tree node can have
only one parent, and the constituent of a declarator-constructed type
is an ancestor, not a descendent, in the syntax. Only pointer declarators
follow this deconstruction syntax consistently. A function declarator is
upside down with respect to its result type but right side up WRT its
formal parameters. An array declarator is upside down WRT its element
type, but right side up WRT its bounds. Structs, unions, and enumerations
all have a variable number of constituents, none of them particularly
distinct, so they are built with type specifiers, which are entirely
right side up.


So, I agree with Josef that it is best to define an abstract syntax
where all type constructors have all their constituents as children,
not ancestors. As I recall, building the AST was somewhat confusing,
but this approach gets all the confusion out of the way early. Then
later stages of whatever you are writing will be protected from it.


I could provide details of my implementation, but it uses LALR parsing,
so I think you would probably need to do it somewhat differently.


--
Rodney M. Bates
rodney.bates@cs.twsu.edu


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.