Register Allocation and Instruction Scheduling firstname.lastname@example.org (1994-03-23)
Register allocation and instruction scheduling email@example.com (1999-01-17)
Re: Register allocation and instruction scheduling firstname.lastname@example.org (David A. Greene) (1999-01-19)
Re: Register allocation and instruction scheduling email@example.com (1999-01-20)
Re: Register allocation and instruction scheduling firstname.lastname@example.org (David A. Greene) (1999-01-22)
Re: Register allocation and instruction scheduling email@example.com (1999-01-25)
Re: Register allocation and instruction scheduling firstname.lastname@example.org (1999-01-27)
Re: Register allocation and instruction scheduling email@example.com (1999-01-27)
Re: Register allocation and instruction scheduling firstname.lastname@example.org (1999-01-31)
[1 later articles]
From: "David A. Greene" <email@example.com>
Date: 19 Jan 1999 00:55:42 -0500
Organization: Department of Electrical Engineering and Computer Science, The University of Michigan
> I'm presently reading Appel's book on compilers and a few questions
> came to mind.
> 1) How can the compiler take into account register renaming by the
> hardware, especially when using graph coloring techniques?
The extra physical registers are not (generally) architecturally
visible, so I don't see what can be gained here. Perhaps someone else
can shed some light.
You'd like the register allocator to reduce the memory bandwidth
requirements of the program as much as possible. A better question to
ask might be, "how can we modify the architecture to allow an
optimizing compiler to make better decisions?" One thing I really
like about Wen-mei Hwu's group at Illinois is that they approach
optimizing compilers from this angle.
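To make the graph-coloring reference above concrete, here is a minimal sketch of a Chaitin/Briggs-style simplify-and-select allocator. The interference graph, register names, and `color` function are hypothetical illustrations, not anyone's production algorithm:

```python
def color(graph, k):
    """Color an interference graph with k physical registers.

    graph: dict mapping each virtual register to the set of
    virtual registers it interferes with (lives alongside).
    Returns (coloring, spilled)."""
    orig = {n: set(adj) for n, adj in graph.items()}
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Simplify: repeatedly remove nodes with degree < k; they are
    # guaranteed a color no matter how their neighbors are colored.
    while work:
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            # No trivially colorable node left; optimistically push
            # the highest-degree node as a spill candidate (Briggs).
            node = max(work, key=lambda n: len(work[n]))
        stack.append(node)
        for m in work.pop(node):
            work[m].discard(node)
    # Select: pop nodes and color them against already-colored neighbors.
    coloring, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {coloring[m] for m in orig[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spilled.append(n)
    return coloring, spilled
```

A triangle of three mutually interfering registers colors cleanly with k=3 but forces one spill with k=2, which is the spill-code cost the rename hardware cannot see.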
> 2) How to take into account hardware instruction scheduling/reordering
> when trying to do different types of compiler instruction optimization.
There are a couple of things going on here. An optimizing compiler
can look much farther down the (static) instruction stream than the
hardware can dynamically. The trade-off is the amount of run-time
information available. Profiling can bridge this gap somewhat. So
one might look at this problem as a way for the compiler to schedule
code "close enough" together so that the hardware can do the
fine-grained tweaking at execution time.
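The "close enough" scheduling above can be sketched as greedy list scheduling over a dependence DAG. The ops, latencies, and issue width here are invented for illustration:

```python
def list_schedule(deps, latency, width=2):
    """Greedy list scheduling with a critical-path-ish heuristic.

    deps: dict mapping each op to the set of ops it depends on.
    latency: dict mapping each op to its result latency in cycles.
    width: issue slots per cycle.
    Returns a dict mapping each op to its issue cycle."""
    remaining = set(deps)
    done_at = {}   # op -> cycle at which its result is available
    issue = {}
    cycle = 0
    while remaining:
        # An op is ready once all of its inputs are available.
        ready = [op for op in remaining
                 if all(p in done_at and done_at[p] <= cycle
                        for p in deps[op])]
        # Prefer long-latency ops so their results are ready sooner.
        ready.sort(key=lambda op: -latency[op])
        for op in ready[:width]:
            issue[op] = cycle
            done_at[op] = cycle + latency[op]
            remaining.discard(op)
        cycle += 1
    return issue
```

The compiler's job, in this view, is to get independent ops like 'a' and 'b' adjacent so they fall into the same dynamic window; the hardware handles the rest.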
Another way to approach this problem involves the organization of the
processor. Every out-of-order (O-O-O) processor that I know of has
some in-order
part to it. Generally this includes the fetch-decode mechanism. So
the compiler might want to consider scheduling for this part of
instruction processing rather than scheduling (out-of-order) execution
units. The idea here is to optimize the instruction stream so that it
can be decoded as quickly as possible to supply a steady stream of
work to the core. This may be especially useful for PentiumPro-style
machines where the instruction decode (two simple, one complex, or
whatever it is these days) can become a bottleneck.
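A toy illustration of scheduling for the in-order decoders: suppose (hypothetically) three decode slots per cycle, of which only the first handles complex instructions, and assume the instructions are independent so they can be freely reordered. The instruction tuples and the `pack_decode_groups` helper are made up for this sketch:

```python
def pack_decode_groups(insns, group=3):
    """Reorder independent instructions into decode groups so each
    group carries at most one complex instruction.

    insns: list of (name, is_complex) tuples, assumed independent.
    Returns a list of decode groups."""
    complex_ops = [i for i in insns if i[1]]
    simple_ops = [i for i in insns if not i[1]]
    groups = []
    while complex_ops or simple_ops:
        g = []
        if complex_ops:
            # The complex instruction takes the privileged slot 0.
            g.append(complex_ops.pop(0))
        while len(g) < group and simple_ops:
            g.append(simple_ops.pop(0))
        groups.append(g)
    return groups
```

Two complex ops back to back would otherwise stall the second decoder for a cycle; interleaving simple ops between them keeps all decode slots busy.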
> Doesn't the hardware essentially undo all of my hard work in trying to
> write an optimized compiler? The book doesn't talk about these topics.
Somewhat. The register rename process essentially does on-the-fly
register allocation. So it seems that the whole purpose of the
compiler's static allocator is to reduce the amount of spill code
(i.e. to get as many candidates as possible into the rename hardware).
The scheduler becomes responsible for allowing the run-time
instruction window to contain as many independent instructions as
possible. The O-O-O engine then has more work to schedule for the
execution units, keeping everyone happy.
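One rough way to quantify what the static scheduler is buying the O-O-O engine: slide a fixed-size window over the instruction stream and count how many instructions in each window read nothing written earlier in that window. The register tuples and the function itself are hypothetical illustrations:

```python
def independent_in_window(insns, window):
    """Count, per window, instructions independent of earlier
    instructions in the same window (read-after-write only).

    insns: list of (writes, reads) tuples of register names.
    Returns a list with one count per window."""
    counts = []
    for start in range(0, len(insns), window):
        chunk = insns[start:start + window]
        written = set()
        free = 0
        for writes, reads in chunk:
            # Independent if it reads no value produced in-window.
            if not (set(reads) & written):
                free += 1
            written |= set(writes)
        counts.append(free)
    return counts
```

A schedule that raises these counts gives the run-time window more instructions it can issue at once, which is exactly the "keeping everyone happy" goal above.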