Related articles |
---|
How many vector registers are useful? kirchner@uklira.informatik.uni-kl.de (1993-01-25) |
Re How many vector registers are useful? ssimmons@convex.com (1993-01-26) |
Re: Re How many vector registers are useful? billms@corp.corp.mot.com (1993-01-28) |
Re: Re How many vector registers are useful? idacrd!desj@uunet.UU.NET (1993-01-31) |
Re: Re How many vector registers are useful? billms@corp.mot.com (1993-02-02) |
Newsgroups: | comp.compilers |
From: | billms@corp.mot.com (Bill Mangione-Smith) |
Keywords: | optimize, vector |
Organization: | University of Michigan |
References: | 93-01-174 93-02-016 |
Date: | Tue, 2 Feb 1993 14:05:40 GMT |
Bill Mangione-Smith <billms@corp.corp.mot.com> writes:
> Santosh Abraham, Ed Davidson, and I had a paper two asplos's ago that
> looked at the minimal number of vector registers required for specific
> codes. [.... W]e decided to focus on the minimal number of registers
> required to achieve optimal performance.
idacrd!desj@uunet.UU.NET (David desJardins) writes:
I haven't looked at your paper, but I think that you have to be very
careful in using the word "optimal" here. I have written a fair number of
assembly-language routines for vector machines, and it is very often the
case that the number of vector registers needed for "nearly optimal" code
is substantially less than that needed for "perfectly optimal" code.
I've certainly seen that happen as well. We constructed a tool that used
implicit enumeration, along with a predictable set of latencies (as you
suspected), to guarantee optimal performance and minimal register
requirements. The study really focused on register requirements, and not
code scheduling. But once I convince you that we meant real optimal, and
not fuzzy optimal like most people seem to, you've got to either assume
that I don't know the problem complexity or I'm not selling compiler
technology.
In my experience, what often happens is that you can get a code which is
"nearly optimal" in the sense of taking the correct number of chimes to
execute the loop, but a few more ticks than is strictly necessary, because
the usage of the vector registers is not perfectly synchronized. A vector
functional unit might have to wait for its input for a few ticks, for
example, because the latency of the unit feeding it is greater than its
own latency. These few ticks might only add a few percent to the
execution time of the loop, but it might take as much as double the number
of vector registers to eliminate them.
In our study we did assume an idealized machine model, but you could
easily account for these effects without increasing the complexity of the
problem. In fact, the addition of resource hazards tends to reduce the
search time by pruning the tree sooner. But you are right, in practice
you probably want to ignore these issues. Jalby, Eisenbies, and
Lichnewsky had a very simple machine model for the Cray-2 which achieved
very good results. Of course, you could get into trouble if you were
scheduling on a machine with either short vector registers or a
programmable scratch pad that the compiler decided to divide into lots of
short registers.
Perhaps you were looking at some sort of "ideal" vector machine? Assuming
things like constant latencies in the functional units would certainly
simplify a truly optimal analysis while probably producing nearly
equivalent results for practical purposes.
I don't see how we can discuss real optimal schedules without assuming
constant latencies. For many machines the function units really have
constant latencies, and even without multiprocessor contention memory
references are a tough problem (to say the least). But one thing we can
do is assume a pessimistic memory system (maybe modeled in the compiler as
a varied load depending on the code) and schedule to that, like the
Cydrome people did.
Ultimately, who knows what the memory is going to do from run to run,
except in a really degenerate case? Scheduling pessimistically can cause
resource collisions that you would not have otherwise seen. It may not be
possible to say for sure what the optimal performance on live hardware is,
let alone identify or generate code that achieves it.
Bill
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.