Representing C as an AST

trzynadl@unr.nevada.edu (Bart T.)
25 Jan 2003 01:11:02 -0500

          From comp.compilers

Related articles
Representing C as an AST trzynadl@unr.nevada.edu (2003-01-25)
| List of all articles for this month |

From: trzynadl@unr.nevada.edu (Bart T.)
Newsgroups: comp.compilers
Date: 25 Jan 2003 01:11:02 -0500
Organization: http://groups.google.com/
Keywords: C, parse, analysis
Posted-Date: 25 Jan 2003 01:11:01 EST

Hello,


I've been thinking about how to represent C programs as trees. The
original approach I settled on was to have 2 children, left and right,
and to chain together statements using an stmt node:


(stmt (stmt (stmt statement_1 statement_2) statement_3))


When dealing with if-else statements, loops, and functional calls,
more children are needed in the node to naturally represent the
construct.


I took a look at how other C compilers and related programs represent
trees:


LCC: Trees are build for statements only, it appears, and these trees
are linked together to create a "forest." This is similar to how older
versions of GCC worked: Expression trees were built for each statement
and then converted into RTL code. This happened one statement at a
time. I could be wrong about the details.


SUIF: (http://suif.stanford.edu/suif/suif1/docs/suif_toc.html) A brief
read of this seems to indicate a really complicated system of trees
and lists and symbol tables all linked together to form the actual
abstract syntax tree.


C-Tree: Now called cTool, a SourceForge project. If-statements and for
loops are handled as special case nodes with more than 2 children.
Every node has common fields, the different types of nodes merely have
a different number of children. I'm not sure how function calls and
types are handled.


CAST: C Abstract Syntax Tree:
http://www.cs.utah.edu/flux/flick/current/doc/guts/gutsch6.html
Something complicated, like SUIF, as far as I can tell. I think it
also involves lists (parameters passed to functions appear to be
handled as lists.)




I think function calls could be represented as nodes which use a
symbol pointer to indicate the name of the function and all of the
children would represent individual arguments.


Foo(a, b, c * d);


FUNC_CALL (Foo)
/ | \
a b *
/ \
c d


[ In case the above doesn't turn out formatted properly: (FUNC_CALL a
b (* c d)) -- the Foo function name is part of the FUNC_CALL node, not
a child. ]


This makes sense to me, and it seems as though it would be very easy
to perform analysis on such a tree and to generate code from it.


BTW, for types, I'm choosing to use a system similar to LCC: A special
structure which can be linked together to form complete type
definitions. I'm not sure whether to have DECL nodes in the AST (these
would simply identify the variable declared and what it was
initialized to [if anything], the type info would be in the symbol
table.)


Any thoughts? My implementation of n-ary trees in the past was really
ugly (involving an inelegant way of constructing lists.) I think this
could be overcome and a more simple approach could be taken.


Implementation difficulties aside, what would an ideal C syntax tree
look like? Allowing it to be used for other languages would be a plus,
but probably not something I should concern myself with now (this is
my first real compiler project.)


Thanks in advance...


(If you wish to reply by email, reply to trzynadl AT unr DOT nevada
DOT edu -- the dynarec.com address is no longer valid, I'll have to
register a new Google Groups account.)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.