Re: Wanted: Register Assignemnt Algorithm for a DAG (Anton Ertl)
29 Jul 2000 23:13:57 -0400

          From comp.compilers

Related articles
Wanted: Register Assignemnt Algorithm for a DAG (Don Mackay) (2000-07-27)
Re: Wanted: Register Assignemnt Algorithm for a DAG (2000-07-29)
| List of all articles for this month |

From: (Anton Ertl)
Newsgroups: comp.compilers
Date: 29 Jul 2000 23:13:57 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 00-07-066
Keywords: analysis, theory

  "Don Mackay" <> writes:
>I have been using the ideas/algorithms from the 'Dragon Book' and have got
>to the stage of creating a DAG representation of my code. I now want to
>assign registers to the various nodes (ie this is the start of the 'back-end
>specific stuff - I'm targeting the x86 processors at this stage).

A popular way that I recommend is to use an unlimited number of
pseudo-registers thoughout the back-end, and allocate real registers
only at the end.

>The 'Dragon Book' shows an algorithm that works with a DAG that only has
>single parent nodes (ie is a tree) [at least that's the way I understand

The problem that this algorithm solves is scheduling the instructions
in the tree for minimal register pressure and allocating registers for
them (but only within the tree).

AFAIK the DAG equivalent of this problem is NP-complete.

>Can someone please describe/provide a pointer to an algorithm that will
>allocate a (variable sized?) set of general purpose registers to the nodes
>of a general DAG?

The usual way to deal with this is to use an instruction scheduler to
transform the DAG into a sequence, and then apply a register allocator
of your choice; if you just want to allocate registers within this
DAG, I would recommend a simple reverse pass that assigns a free
register whenever it sees a use of an unallocated pseudo-register, and
frees the register when it sees the definition. The goal of the
instruction schedulers I have seen in the literature has been mainly
to reduce the number of pipeline stalls, not primarily to reduce
register pressure.

There have been several papers about the interaction of instruction
scheduling and register allocation. [goodman&hsu88] is well-written
and proposes two approaches for the problem on the basic-block level.
In [ambrosch+94] we proposed a variant of graph-colouring global
register allocation that works on DAGs; the paper also contains
references to the other papers about this topic I know (except

I (and probably others) thought that the 386 architecture and thus
this problem would go away soon (if you have 32 registers, the
benefits of an integrated approach (register pressure reduction of
maybe 1-2 registers on average) do not justify the additional
complexity). But the 386 architecture has proven to be more
long-lived than expected, so more research in this direction is
probably justified. Of course nowadays the schedulig problem is
somewhat different through the introduction of out-of-order

- anton

    author = "James R. Goodman and Wei-Chung Hsu",
    title = "Code Scheduling and Register Allocation in Large
Basic Blocks",
    booktitle = "International Conference on Supercomputing",
    year = "1988",
    pages = "442--452",
    annote = "After an overview of the phase ordering problems in
instruction scheduling and register allocation two
algorithms are introduced: Integrated Prepass
Scheduling keeps track of the number of registers
left and switches between scheduling to minimize
pipeline stalls and scheduling to minimize register
usage accordingly. A variation of this algorithm also
spills registers in certain circumstances. DAG-Driven
Register Allocation is to be used with a postpass
scheduler and tries to allocate the registers without
generating new dependencies. If this cannot be
achieved, the register is chosen in a way that
minimizes the path length of the additional paths.
The two algorithms (combined with a register
allocator and an instruction scheduler, respectively)
perform better than the usual prepass, postpass
or two-pass approaches."

    author = "Wolfgang Ambrosch and M. Anton Ertl and Felix Beer
and Andreas Krall",
    title = "Dependence-Conscious Global Register Allocation",
    booktitle = "Programming Languages and System Architectures",
    year = "1994",
    editor = "J{\"u}rg Gutknecht",
    publisher = "Springer LNCS~782",
    address = "{Z\"urich}",
    pages = "125--136",
    url = "",
    abstract = "Register allocation and instruction scheduling are
antagonistic optimizations: Whichever is applied
first, it will impede the other. To solve this problem, we
propose dependence-conscious colouring, a register
allocation method that takes the dependence graph
used by the instruction scheduler into
consideration. Dependence-conscious colouring
consists of two parts: First, the interference graph
is built by analysing the dependence graphs,
resulting in fewer interference edges and less
spilling than the conventional preordering approach.
Secondly, during colouring the register selection
keeps dependence paths short, ensuring good
scheduling. Dependence-conscious
colouring reduces the number of interference edges
by 7\%--24\% and antidependences by 46\%--100\%."

    author = "Cindy Norris and Lori. L. Pollock",
    title = "A Scheduler-Sensitive Global Register Allocator",
    booktitle = "Supercomputing'93",
    year = "1993",
    url = "",
    annote = "Like \cite{pinter93}, their global register
allocator builds a maximal interference graph, i.e.,
a graph that contains all intereferences possible in
various schedules. The register allocator does not
introduce antidependences, and therefore provides
maximum scheduling freedom. If the register
allocator thinks it will run out of registers, it
adds dependences to reduce the number of
interferences. The paper empirically studies various
heuristics for adding dependences, applied at
various stages of the register allocator. The best
one introduces dependences that remove the maximum
number of interferences already before building
the interference graph (based on a count of live
values). With this heuristic, scheduler-sensitive
register allocation is a little better than
integrated prepass scheduling for 20 registers or
more and a little worse for fifteen registers or
less. Unfortunately they do not compare these
methods with plain prepass scheduling. "
M. Anton Ertl Some things have to be seen to be believed Most things have to be believed to be seen

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.