Re: Reg. Alloc. - Graph Coloring - Copy Minimization

markhall@pyrps5.pyramid.com (Mark Hall)
Thu, 25 Oct 90 14:25:06 -0700

          From comp.compilers

Related articles
Reg. Alloc. - Graph Coloring - Copy Minimization jsp@milton.u.washington.edu (Jeff Prothero) (1990-10-22)
Re: Reg. Alloc. - Graph Coloring - Copy Minimization markhall@pyrps5.pyramid.com (1990-10-25)
| List of all articles for this month |

Newsgroups: comp.compilers
From: markhall@pyrps5.pyramid.com (Mark Hall)
Keywords: optimize, design
Organization: Compilers Central
Date: Thu, 25 Oct 90 14:25:06 -0700

In article <9010222019.AA27142@milton.u.washington.edu> Jeff Prothero <jsp@milton.u.washington.edu> writes:
>Question: Has anyone looked at coloring heuristics designed to
>minimize the number of copies needed by the resulting code?


Yes. In fact, the original "Chaitin et al" paper describes an algorithm to
coalesce nodes of an interference graph when there's a register copy
operation between two registers. When we implemented our coloring
algorithm for the Pyramid MIServer about 2 years ago, we did some
further research into the problem.


>Background: The register allocation selected can affect the number of MOVE
>instructions generated, since by assigning two temps with consecutive
>lifetimes to the same register, one can avoid the need to do an explicit copy
>between them.


The lifetimes can actually be overlapping. More on that later.


>In general, increasing the number of degrees of freedom before searching for
>a solution should be a Good Thing, but in this case the extra degrees of
>freedom are bought at the cost of potentially introducing extra copy
>instructions.


Fortunately for us at Pyramid, the MIServer architecture has upwards of
48 registers available per stack frame. Our first cut at a coloring
algorithm didn't consider copy elimination. However, the first time we
compiled the source distribution with the coloring compiler and looked,
there were *way* too many unused registers lying around. We figured we
better do something else with them quickly before the hardware guys saw
that we weren't using them anymore and tried to take them back :-). That's
when we looked into the copy elimination problem.


>Unfortunately, existing coloring heuristics seem to be
>designed only to pack the registers as full as possible, ignoring the cost of
>the copies introduced by a given coloring.


The technique we chose interacted well with our coloring technique
(which is priority-based). We use a priority-based copy elimination
algorithm (yawn...). When you coalesce two nodes of the IG, you are
doing it based on a particular copy operation that appears somewhere in
the instruction stream you are analyzing. Well, if that copy
instruction appears in a loop, you certainly want to get rid of it
before you get rid of one that isn't in a loop. So you assign
priorities to the copy operations, and then you coalesce nodes in the
IG based on those priorities.


The technique worked well for us. I think this was mostly because
the MIServer is a 2-address architecture, which requires more register
copies than a 3-address machine does. For example, the naive code generated
for the C line:


i = j + k;
is:
movw Rj,Rt #move register j into a temp reg
addw Rk,Rt #add register k to the temp reg
movw Rt,Ri #move the temp reg back to register i.


Now with some effort, any decent code generator can produce:


movw Rj,Ri #move register j into a Register i
addw Rk,Ri #add register k to the temp reg


But if "i" can share a register with either of k or j,
it takes a copy-elimination register allocator to get rid of both of
the register copies, yeilding:


addw Rk,Rij
or
addw Rj,Rik




To do this well, you have to play some games when building the
interference graph. Specifically, if you are doing lifetime analysis
on the instruction:


i = k;


And "i" remains alive after the instruction (ie, the lifetimes overlap), you
might still be able to assign the same register to both i and k, so
you don't want to put the edge (i,k) into your interference graph (IG)
based on this instruction. Of course, the edge might eventually appear in the
IG due to some *other* instruction, and that's OK, it just means you can't
coalesce the nodes.


>Has anyone looked at this issue? Is it wrong-headed to do so? Should one
>have two separate sets of coloring heuristics, one for use when the registers
>are very full, and one for use when registers are going begging?


For us, since we have so many registers, it was appropriate to coalesce nodes
without worrying about how it affected coloring. However, this may be true
even if you have only a few registers. I say this because coalescing nodes
of an IG doesn't affect the "colorability" of the graph in a linear way.
That is, coalescing two nodes of the IG doesn't necessarily cause the
chromatic number to increase by one. For the average program, you might be
able to coalese 10 nodes before increasing the chromatic number. Moreover,
if the copy instruction is in a loop, you want to get rid of it even if you
raise the chromatic number beyond the number of available registers! (so long
as this doesn't cause you to spill any registers of higher priority than the
register copy instruction). I maintain that there's a good thesis in somehow
quantifying this relationship.


>Are there practical heuristics which can be tuned to handle both situations?
>(I presume simulated annealing, for example, could be so tuned, but would be
>too slow to be practical?)


Now that I think of it, one simple thing to do is to check the combined
number of incident edges which would result from coalescing two nodes. If
this number is less than the number of available registers, then you can
coalesce without worry.


Watch out for a sinister sub-problem here. Even if you have an
infinite number of registers, it's still an Extremely Hard Problem
(I'll bet it's NP-complete) to coalesce IG nodes such that you minimize
the "weights" of the remaining copy operations. You are constrained by
the interference edges of the IG, and the problem gets really ugly. We
looked at heuristic solutions using Min-cut/Max-flow algorithms (which
are much better than annealing for this problem because you can retain an
acceptable compile time using them) but none of these seemed to be worth
the "extra effort" (ie, Pyramid doesn't pay me for pure research).


-Mark Hall (smart mailer): markhall@pyrps5.pyramid.com
(uucp paths): {ames|decwrl|sun|seismo}!pyramid!markhall
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.