Re: Object-oriented compiler construction: ideas?



From: Arch Robison <>
Newsgroups: comp.compilers
Date: 1 Jul 1996 22:38:10 -0400
Organization: Kuck & Associates, Inc.
References: 96-06-047 96-06-064 96-06-117 96-06-134
Keywords: OOP

Chris Clark writes:
> Arch Robison presented a very good case for iterators to solve just
> these problems at the 94 OOPSLA workshop of Object Oriented
> Compilation techniques. His paper and others can be found at

Thanks for the citation, Chris. Two more years of experience using iterators
to traverse an IR have convinced me that iterators are the way to go.
I'll summarize some of the arguments here. By iterator, I mean
"iterator with cursor": something over which you can write
C-style loops such as:

for( iterator i = root; !i; i++ )
inspect( *i );

even though the traversal may be over a tree. I'll take "visitor"
to mean a procedure that applies some visitor function to each node
of an IR. By "explicit recursive traversal", I mean writing this
sort of thing:

function inspect( node ) {
if( node has left child )
inspect( left child of node );
if( node has right child )
inspect( right child of node );
}

Here are some issues that favor iterators.

1. Reuse - explicit recursive traversal is not reusable. You have to rewrite
      the same recursions each time. In a complicated IR, this can
      be painful. Visitors win here too.
      I suppose attribute grammars win even more since you don't have
      to write traversal code. On the other hand, the last time I wrote
      attribute grammars, I remember writing "auxiliary" attributes to
      express functionally what would have been trivial to write procedurally.
      (This might be the "grass is greener on the other side" syndrome.
      I can't remember the examples any more.)

2. Control over iteration order - with an iterator, this problem is trivially
      solved by passing "mode flags" to the iterator's constructor.
      I suppose that mode flags solve the order problem with visitors too.

3. Convenience - this is where visitors start to lose. With iterators,
      I can write a triply nested loop without difficulty in a few lines.
      With visitors, I have to write a bunch of functions, one for each
      level of nesting. (Okay, if you have lambda abstractions in your
      language, this is not a problem. But I have to write my stuff in C++.)
      The result is that an optimizer can look more like good ole FORTRAN
      and less like LISP :-)

4. Computational complexity - this is where recursive traversals and attribute
      grammars lose. What is the computational complexity of a traversal?
      If I write:

for( iterator i = root; !i; i++ )
for( iterator j = root; !j; j++ )
for( iterator k = root; !k; k++ )
inspect( *i, *j, *k )

      I know I have cubic complexity, assuming (amortized) constant time
      for the iterator primitives. With a recursive traversal, I have to
      study the recursions very carefully to figure out the complexity.
      And in a real production compiler, these recursions are going to be
      spread out over *pages* of code, not a few lines of code.
      With attribute grammars, I probably have to have the attribute-grammar
      compiler tell me how clever it's been in compiling my grammar.
      With iterators, I simply look at nesting. In production compilers,
      knowing the computational complexity is essential.

So why isn't everyone using iterators? There are some drawbacks:

5. You need a language that has enough polymorphism to let you conveniently
      write and use a reusable iterator. Sure, you could do it in C, but the
      attendant hassles from the lack of polymorphism would probably make it a pain.

6. Certain kinds of information passing between IR nodes is more suited
      to recursive traversal. Visitors are probably even worse with regard
      to these problems. (How do "visits" communicate between themselves?)
      For this reason, we're not religious about iterators here. Our optimizer
      uses both iterators and recursive traversal. But probably 4 out of 5
      traversals use iterators for the simple reason that writing a for-loop is
      much simpler than writing a recursion.

      In some cases, iterators force using a different algorithm than would
      have been used for explicit recursive traversal. To a pure theoretician,
      picking an algorithm based on this difference might seem to be putting
      the cart before the horse, but in a production optimizer it becomes
      one of many engineering choices based on pragmatics.
      And the iterators are definitely easier to maintain.
      Explicit recursive traversals seem to involve adding more and more
      parameters to the recursive function over time.

Changing topics here. As others have noted, there is a tension between
the OO and procedural organization of a compiler.
As Greg Morrisett put it:

> > What we need are languages that allow programmers to view the organization
> > or their programs in different ways in order to support different kinds
> > of modifications. Ideally, when adding a new kind of primitive to the
> > AST, the system would present the OO version, so that all I have to
> > add is one new definition, with appropriate methods. But when I go to
> > add a new transformation or optimization, I'd rather see the "old-
> > fashioned" view of my program where I code one big procedure that
> > switches on the different kinds of AST objects. This way, I don't have
> > to touch all of the classes to add new methods.

I find that with experience, I can organize a compiler to have the better of
both worlds, though not the best. (I'd need the aforementioned multiview
language for that.) The trick is to write procedural transformations and
optimizations, but avoid having them operate on concrete AST nodes. For
example, instead of a pass operating on "add", "subtract", "multiply", and
"shift" nodes, it might operate on "referentially transparent nodes".
Building inclusion hierarchies of properties is useful -- they allow some
passes to paint with a broader brush. E.g., "no references to memory" is a
subproperty of "no writes to memory". (Don't need those pointy-headed OO
class hierarchies for this - look-up tables suffice.) So even though
transforms are written procedurally, adding a new kind of referentially
transparent node is not a problem. Finding the right set of abstract
properties takes time. There are some in our optimizer that are motivated
by no higher principle than "Gee, this property is needed here and here and
here!" Of course, to add a new kind of node with radically different
properties not anticipated in the original design requires much effort. But
such trouble is probably inescapable no matter how the compiler is
organized, because it probably invalidates fundamental assumptions of the
original design.

This is the sort of practical practice that gets one excommunicated by
the religious purists on both sides of the OOP wars.

Arch D. Robison Kuck & Associates Inc. 1906 Fox Drive
217-356-2288 Champaign IL 61820
Lead Developer for KAI C++
