Related articles:
How many vector registers are useful? kirchner@uklira.informatik.uni-kl.de (1993-01-25)
Re How many vector registers are useful? ssimmons@convex.com (1993-01-26)
Re: Re How many vector registers are useful? billms@corp.corp.mot.com (1993-01-28)
Re: Re How many vector registers are useful? idacrd!desj@uunet.UU.NET (1993-01-31)
Re: Re How many vector registers are useful? billms@corp.mot.com (1993-02-02)
Newsgroups: comp.compilers
From: idacrd!desj@uunet.UU.NET (David desJardins)
Keywords: vector, architecture
Organization: IDA Center for Communications Research, Princeton
References: 93-01-174 93-01-211
Date: Sun, 31 Jan 1993 05:15:43 GMT

Bill Mangione-Smith <billms@corp.corp.mot.com> writes:
> Santosh Abraham, Ed Davidson, and I had a paper two asplos's ago that
> looked at the minimal number of vector registers required for specific
> codes. [.... W]e decided to focus on the minimal number of registers
> required to achieve optimal performance.

I haven't looked at your paper, but I think that you have to be very
careful in using the word "optimal" here. I have written a fair number of
assembly-language routines for vector machines, and it is very often the
case that the number of vector registers needed for "nearly optimal" code
is substantially smaller than that needed for "perfectly optimal" code.

In my experience, what often happens is that you can get code that is
"nearly optimal" in the sense of taking the correct number of chimes to
execute the loop, but a few more ticks than strictly necessary, because
the usage of the vector registers is not perfectly synchronized. A vector
functional unit might have to wait a few ticks for its input, for
example, because the latency of the unit feeding it is greater than its
own latency. These few ticks might add only a few percent to the
execution time of the loop, but it might take as much as double the number
of vector registers to eliminate them.
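
To make the arithmetic concrete, here is a toy sketch in Python. It is not a model of any real machine: it assumes that one chime costs roughly one vector length in ticks, and the numbers used (3 chimes, 64-element vectors, 6 stall ticks) are made up purely for illustration. The function names are hypothetical as well.

```python
def loop_ticks(chimes, vector_length, stall_ticks=0):
    """Ticks for one pass of a vector loop under a toy model.

    Assumption: each chime costs about `vector_length` ticks.
    `stall_ticks` are extra ticks a functional unit spends waiting
    for input from a unit with a longer latency.
    """
    return chimes * vector_length + stall_ticks


def overhead_percent(chimes, vector_length, stall_ticks):
    """Percentage by which stalls lengthen the loop over the stall-free case."""
    ideal = loop_ticks(chimes, vector_length)
    actual = loop_ticks(chimes, vector_length, stall_ticks)
    return 100.0 * (actual - ideal) / ideal


# Hypothetical numbers: 3 chimes, 64-element vectors, 6 stall ticks.
print(loop_ticks(3, 64))           # stall-free tick count
print(loop_ticks(3, 64, 6))        # tick count with the stalls
print(overhead_percent(3, 64, 6))  # relative cost of the stalls
```

With these made-up values the loop has the correct chime count either way, and the stalls cost 6 ticks out of 192, or about 3 percent. That is the kind of gap I mean by "nearly optimal" versus "perfectly optimal".
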
Perhaps you were looking at some sort of "ideal" vector machine? Assuming
things like constant latencies in the functional units would certainly
simplify a truly optimal analysis while probably producing nearly
equivalent results for practical purposes.
David desJardins