Re: [?] Trees vs. Tuples for IRs

Chris Clark USG <>
18 Sep 1998 23:02:07 -0400

          From comp.compilers

Related articles
[?] Trees vs. Tuples for IRs (Nick Shaffner) (1998-09-13)
Re: [?] Trees vs. Tuples for IRs (Chris Clark USG) (1998-09-18)
Re: [?] Trees vs. Tuples for IRs (1998-09-18)
Re: [?] Trees vs. Tuples for IRs (1998-09-19)
Re: [?] Trees vs. Tuples for IRs (Clifford Click) (1998-09-22)
Re: [?] Trees vs. Tuples for IRs (William D Clinger) (1998-09-26)
Re: [?] Trees vs. Tuples for IRs (Peter Klausler) (1998-09-26)
| List of all articles for this month |

From: Chris Clark USG <>
Newsgroups: comp.compilers
Date: 18 Sep 1998 23:02:07 -0400
Organization: Digital Equipment Corporation - Marlboro, MA
References: 98-09-042
Keywords: optimize, design

> What are the pros and cons of using trees versus tuples for a
> compiler's intermediete representation?

Okay, there are fixed a-rity tuples and n-ary tuples. Most people
when talking about tuples mean fixed arity ones, so I'll discuss them

There are 2-kinds of fixed-arity tuple representations, usually called
triples and quadruples (based on how they represent a binary tree).

Triples are almost exactly equivalent to trees, so I'll explain them
first. In fact, anything you can do with trees you can do with
triples and vice-versa and neither is generally substantially easier
that the other. The reason for this is obvious once you see how
triples represent a tree.

You simply number all of the nodes in a tree on a left to right
post-order pass and those are the nodes of the triple representation
in order. You then use the node numbers in place of pointers.

That is if you have a tree like:

  / \
a +
        / \
      a 2

It is coded in triples as:

1: addr a
2: load a
3: const 2
4: add 2, 3 // a + 2
5: assign 1, 4 // a := a + 2

The number in the first column is the number of the triple. The next
entry is the opcode of the triple followed by its operands. The final
entry is a comment giving the tree represented. (Converting from
triples back to trees is simply a case of replacing the indicies with
pointers.) Therefore, any tree operation is just an operation on the
index numbers rather than pointers when working with triples.

I've seen more than a few compilers which allocate their trees out of
an array-like structure, so that they could easily compute triples for
writing to an intermediate-file. (I.e. you rarely want to write raw
pointers to a file, but writing sequential indicies is generally quite
portable.) This is the one exception to the equal ease between
triples and trees rule, because triple are explcitly ordered by their
triple index, things which depend on order are easier to express in a
triple representation.

By the way, triple get their name because there are three fields in
the record to represent a binary operator (the opcode and the two

Okay, so now we have triples. Getting from them to quadruples is also
easy. Take the implicit number of the triple (the first column) and
make it a field of the tuple record (member of the tuple object if you
prefer object-oriented terminology) as below:

addr 1, a
load 2, a
const 3, a
add 4, 2, 3
assign 5, 1, 4

That extra field to make the number of the triple explicit in a
quadruple may seem like a waste. However, it allows one to do two

First, with unique record numbers quadruples are fully self-secribing
and you can rearrange them in any order you like and they still
describe the same tree. With triples, the order they appear in their
index list is important, since that is how they are addressed, while
the address of a quadruple is explicit in the node itself.

Second, many quadruple representations allow the quadruple index to be
reused and packed, which allows it to express register allocation, as
in the following:

load 1, a
const 2, a
add 2, 1, 2
addr 1, a
assign 1, 1, 2

Note, when this is done, quadruples lose their order independence and
become more like triples.


Now, that we have discussed fixed arity tuples, lets talk about n-ary
tuples. They come in the same varieties: target address implicit,
i.e. triple-like and target address explicit i.e. quadruple-like. The
same points from above apply.

The key difference in an n-ary tuple from its finite arity cousin is
how it represents nested expressions. In an n-ary tuple, trees with
the same operator are flattened (except where the semantics of the
language prohibit it, as in FORTRAN parenthesis).

For example:

            / \
          a +
                  / \
                + b
              / \
            a 2

Is represented as:

1: addr a
2: load a
3: const 2
4: load b
5: add 2, 3, 4 // a + 2 + b
6: assign 1, 5 // a := a + 2 + b

In addition, the operands are often sorted also (often so that
constants are first or last, depending on the implementors whim).

1: addr a
2: load a
3: load b
4: const 2
5: add 2, 3, 4 // a + b + 2
6: assign 1, 5 // a := a + b + 2

Tuples of this form have advantages for optimizing, because certain
algebraic optimizations the require tree reordering fall-out
trivially. They can also save space. On the other hand, everything
needs to be coded to handle multi-child trees, which adds its own
complexity, since certain algorithms are simpler if you can assure
that they applied only to binary trees.

Note even fancier n-ary tuples have been designed and the tradeoffs
are always the same. The fancier the representation the more one can
pack into a tree and handle as one entry. However, the more one also
has to deal with the potentially subtle semantics of the packed entity

-Chris Clark
Compiler Resources, Inc. email:
3 Proctor St.
Hopkinton, MA 01748 phone: (508) 435-5016
USA 24hr fax: (508) 435-4847

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.