Re: AST for several modules

Chris F Clark <>
14 Jun 2000 12:47:27 -0400

          From comp.compilers

Related articles
AST for several modules (Andreas Ames) (2000-06-10)
Re: AST for several modules (Ira D. Baxter) (2000-06-14)
Re: AST for several modules (Rodney M. Bates) (2000-06-14)
Re: AST for several modules (Chris F Clark) (2000-06-14)
Re: AST for several modules (2000-06-14)
Re: AST for several modules (Daniel C. Wang) (2000-06-20)
Re: AST for several modules (Andreas Ames) (2000-06-20)
Re: AST for several modules (Joachim Durchholz) (2000-06-20)
Re: AST for several modules (2000-07-18)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 14 Jun 2000 12:47:27 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 00-06-038
Keywords: parse

> Till now I only know about descriptions how to build ASTs for single
> modules for programming languages like C, assembler etc. I wonder if
> somebody could give me some hints about how to parse and build a
> single ( or several connected ) ASTs for several modules/files,
> e. g. for a whole program or software system. Do you know about
> books or online documents concerning this topic? If one can build a
> single AST from several modules, are then the normal techniques for
> data and control flow analysis efficently applicable?

If your question is how to build an AST for several modules (functions
or classes) in a single language, the answer is there is no trick to
it. Simply build AST's for each individual module and connect them
together in a linked list. Or in other words, treat your entire
program as if it was:
program: module+; // a program is one or more modules
// possibly from different files

However, if your question is how to design an AST for several
languages, that is an entirely different question. Part of the reason
you may not have found examples of what you are looking for is
terminology. The term AST is short for Abstract *Syntax* Tree. That
is ASTs tend to be abbreviations of the underlying syntax of the
language. Different languages, of course, have different syntaxes.

The internal representation that most compilers use for data and
control flow analysis typically goes by a different name. Usually it
is called the "Intermediate Language (IL)" or "Intermediate
Representation (IR)"--although there are probably as many variations
on the name as there are multi-language compilers (i.e. dozens ;-)).

The "Intermediate" representations don't usually look much like syntax
(parse) trees. That information is irrelevant to optimizers and code
generators (with the exception of the ones in Sharir style). Instead,
they represent contol flow as verticies and edges between "basic
blocks" (stretches of statements with no control flow transfer).

A typical example is Ucode. It started as pcode the IL for a set of
Pascal compilers. It was picked up and transformed into an optimizing
form by Fred Chow at Stanford University. The became the basis of the
MIPS compiler suite and the IL was extended to cover Fortran and C
(and probably some other languages).

It is worth noting the "bytecodes" and "virtual machines" (as in the
JVM) are closely related to IL's. So are the members of the Forth
family (such as Postscript). The other term worth tracking down is SSA
(Static Single Assignment form), as it is essentially an IL with some
nice properties.

As to books, I would recommend either Bob Morgan's "Building an
Optimizing Compiler" (ISBN 1-55558-179-X) or Steven Muchnick's
"Advanced Compiler Design & Implementation" (ISBN 1-55860-320-4).
These books are really about optimization algorithms. However, in
this case, the algorithms are what define the data structures
required, and thus the design of the IL.

Note, this was actually a timely question for me, as the next edition
of my "Practical Parsing Patterns" column in Sigplan Notices is on
this exact topic--although I haven't written more than a couple of
paragraphs of it yet, so I can't forward a preprint of it. Also,
anyone who want to disabuse me of my misconceptions (e.g. argue that
bytecodes or Forth are not like ILs at all) would do me a favor of
posting (or emailing) in response to this, lest the same errors get
set down in print. (The relevant caveat is: One man tells a lie, a
thousand others repeat it as true.)

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.