Re: References to MC88K compilers?

phillip@oakhill.sps.mot.com (Mike Phillip)
Wed, 4 Dec 1991 00:42:28 GMT

          From comp.compilers

Related articles
References to MC88K compilers? esswein@kmx.gmd.dbp.de (1991-12-02)
Re: References to MC88K compilers? phillip@oakhill.sps.mot.com (1991-12-04)

Newsgroups: comp.compilers
From: phillip@oakhill.sps.mot.com (Mike Phillip)
Keywords: architecture, optimize
Organization: Motorola Inc., Austin, Texas
References: 91-12-007
Date: Wed, 4 Dec 1991 00:42:28 GMT

In article 91-12-007 esswein@kmx.gmd.dbp.de (Dieter Esswein) writes:
>
>Does anybody know of published papers on compilers for the MC88K family?
>In particular, I am interested in the instruction scheduling technique of
>the MC88100 and MC88110 (basic block oriented,
>(trace/region/percolation/...) scheduling, also). I possess a paper of
>Motorola surveying the MC88110 which mentions an a(lpha) compiler without
>any reference or detail on its optimization strategy. Does GCC offer a
>backend with instruction scheduling?


I'm assuming that the paper you refer to was the one presented in Japan
last month at the IPSJ (Information Processing Society of Japan)
conference. There are some other papers in preparation that discuss in
more detail the 88110 compilers that we are working on in Austin.


The "alpha" compilers that were referenced are these Motorola 88110
compilers.


In answer to your instruction scheduling question, there are a number of
heuristics that are being used to efficiently schedule code; the issues
differ somewhat between the 88100 and the 88110, however.


For the 88100, which is a single-issue machine with fairly long floating
point pipelines, 32 32-bit registers and 32-bit data paths to cache, the
primary scheduling concern was to hide pipeline latencies while avoiding
structural hazards such as shared pipeline stages (in the floating point
units) and memory access bandwidth. Of course, the 88100 provided
hardware interlocks to enforce these hazards and other data dependencies,
but to optimize performance the compiler needed to reorder code to avoid
the resulting stalls. Register pressure was also of significant concern,
especially for double precision floating point data, which burned a pair
of 32-bit registers. As far as I know, most commercially available 88100
compilers limited scheduling to within basic blocks. The GNU 1.xx
compilers didn't really do instruction scheduling, although the "new" 2.xx
versions have added this feature. Although we're not actively using GCC
for our compiler development in Austin, most feedback I've heard from 88K
users has been very positive for the 2.xx releases of the 88100 GNU
compiler.
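
As an illustration only (this is not Motorola's or GCC's scheduler), the
kind of basic-block scheduling described above can be sketched as a greedy
list scheduler for a single-issue, interlocked machine. The instruction
names, the dependence table and the latencies below are all hypothetical
and are not actual 88100 timings; the sketch also assumes the two stores
go to distinct addresses, so no memory dependence links them.

```python
# Hypothetical basic block in original program order.
INSTRS = ["ld1", "fmul", "st1", "ld2", "add", "st2"]
# instruction -> instructions whose results it reads (assumed dependences)
DEPS = {"fmul": ["ld1"], "st1": ["fmul"], "add": ["ld2"], "st2": ["add"]}
# Made-up result latencies in cycles, NOT real 88100 numbers.
LATENCY = {"ld1": 3, "fmul": 6, "st1": 1, "ld2": 3, "add": 1, "st2": 1}


def total_cycles(order, deps, latency):
    """Cycle at which the last result is available when `order` is issued
    one instruction per cycle and interlocks stall on unready operands."""
    ready_at = {}
    cycle = 0
    for instr in order:
        start = max([cycle] + [ready_at[d] for d in deps.get(instr, [])])
        ready_at[instr] = start + latency[instr]
        cycle = start + 1
    return max(ready_at.values())


def heights(instrs, deps, latency):
    """Latency-weighted longest path from each instruction to the block end."""
    users = {i: [] for i in instrs}
    for i in instrs:
        for d in deps.get(i, []):
            users[d].append(i)
    height = {}
    for i in reversed(instrs):  # users always appear later in program order
        height[i] = latency[i] + max((height[u] for u in users[i]), default=0)
    return height


def list_schedule(instrs, deps, latency):
    """Each cycle, issue the ready instruction with the greatest height."""
    height = heights(instrs, deps, latency)
    ready_at, order, remaining, cycle = {}, [], list(instrs), 0
    while remaining:
        ready = [i for i in remaining
                 if all(d in ready_at and ready_at[d] <= cycle
                        for d in deps.get(i, []))]
        if ready:
            pick = max(ready, key=lambda i: height[i])
            remaining.remove(pick)
            order.append(pick)
            ready_at[pick] = cycle + latency[pick]
        cycle += 1  # nothing ready: a stall cycle on a single-issue machine
    return order


if __name__ == "__main__":
    scheduled = list_schedule(INSTRS, DEPS, LATENCY)
    print(scheduled)
    print(total_cycles(INSTRS, DEPS, LATENCY), "->",
          total_cycles(scheduled, DEPS, LATENCY))
```

With these made-up numbers the second load is moved up into the shadow of
the first, and the independent add/store pair fills part of the multiply
latency. A real scheduler would also have to model the structural hazards
and register pressure mentioned above, which this sketch ignores.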


For the 88110, the scheduling issues shift towards utilizing the
superscalar nature of the processor (2 instructions per clock). The
floating point pipelines are much shorter, and do not share stages, which
means there is less latency to hide and fewer resource conflicts to track.
The internal data paths can accommodate 64-bit transfers, so execution
latencies do not vary based on data size. Register pressure is not as
severe, either, since the 88110 adds 32 80-bit registers to the existing
88100 register file. (Of course, there is still lots of register pressure
for certain applications, but there is effectively 3x as much double
precision register name space on the 88110 as there is on the 88100...)
Currently, the "alpha" versions of our compilers only schedule within basic
blocks, with the exception of filling delayed branch slots in predecessor
blocks. We are working on various implementations of global scheduling
heuristics similar to "trace" or "percolation" scheduling to improve
utilization of the multiple execution units on the chip. On a practical
note, much of the research-oriented discussion of global scheduling cannot
be "safely" applied to most existing commercial processors due to nagging
little details like exceptions, etc. That's not to say that global
scheduling doesn't work - it's just that a lot of instructions can't be
moved past conditional branches due to the possibility of program semantics
being altered by an exception. Some of the actual/proposed machines that
aggressively apply global scheduling techniques have special architectural
support in hardware to handle exceptional cases.
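
The exception problem can be shown with a small sketch. It is written in
Python purely for illustration; the function names and the opcode set are
hypothetical and do not describe the 88K exception model. Moving a divide
above the branch that guards it makes it execute on both paths, so a trap
that the original program could never take becomes possible.

```python
def guarded(n, d):
    """Original code: the divide runs only on the path where it is safe."""
    q = 0
    if d != 0:
        q = n // d
    return q


def hoisted(n, d):
    """Same code after naively moving the divide above the branch."""
    t = n // d  # speculative: now runs on both paths and may trap
    q = 0
    if d != 0:
        q = t
    return q


# Hypothetical classification a global scheduler might consult: loads can
# fault, divides can trap, and stores change memory visible on other paths.
UNSAFE_TO_SPECULATE = {"load", "div", "store"}


def safe_to_speculate(opcode):
    """True if moving `opcode` above a conditional branch cannot, under the
    assumptions above, introduce a new exception or side effect."""
    return opcode not in UNSAFE_TO_SPECULATE


if __name__ == "__main__":
    print(guarded(7, 2), hoisted(7, 2))  # same result when d != 0
    print(guarded(7, 0))                 # original never divides by zero
    try:
        hoisted(7, 0)
    except ZeroDivisionError:
        print("hoisted version trapped")
```

A scheduler restricted in this way can still move instructions such as
register-to-register adds across branches, which is why the limitation
reduces, rather than eliminates, the benefit of global scheduling.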


Our 88110 compilers also try some other approaches to increasing basic
block sizes for the instruction scheduler. Examples include loop
unrolling and branch elimination through logical reduction (the 88K has
some really nice bit field instructions that can be used to eliminate
certain classes of conditional branches). The 88110 also provides
architectural support for static branch prediction, so the compiler
attempts to use the right "flavor" of conditional branches to accelerate
execution. We have also spent a great deal of effort trying to account
for memory behavior when performing optimizations and instruction
scheduling. This is of particular importance on the 88110 when a
secondary cache is not used, since the on-chip instruction and data caches
are 8 KB each, and only 2-way set associative.
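
To make "branch elimination through logical reduction" concrete, here is a
generic sketch of the idea in Python. It does not use or model the actual
88K bit field instructions, and the function names are hypothetical. A
condition is reduced to a 0/1 bit or an all-ones/all-zeros mask and
combined with logical operations, so the basic block is no longer split by
a conditional branch.

```python
MASK32 = 0xFFFFFFFF  # work with unsigned 32-bit values


def count_flag_branchy(words, bit):
    """Count words with `bit` set, using a conditional branch per word."""
    count = 0
    for w in words:
        if w & (1 << bit):
            count += 1
    return count


def count_flag_branchless(words, bit):
    """Same count: extract the one-bit field and add it, with no branch."""
    count = 0
    for w in words:
        count += (w >> bit) & 1
    return count


def max_branchy(a, b):
    """Unsigned maximum written with a conditional branch."""
    if a < b:
        return b
    return a


def max_branchless(a, b):
    """Unsigned maximum via a mask built from the comparison result."""
    lt = int(a < b)        # 0 or 1, standing in for a compare result bit
    mask = (-lt) & MASK32  # 0x00000000 or 0xFFFFFFFF
    return (b & mask) | (a & ~mask & MASK32)


if __name__ == "__main__":
    words = [0b1010, 0b0110, 0b0001, 0b1000]
    print(count_flag_branchy(words, 3), count_flag_branchless(words, 3))
    print(max_branchy(3, 5), max_branchless(3, 5))
```

Whether such a rewrite pays off depends on the cost of the extra logical
operations relative to the branch, which is a target-specific judgment the
sketch does not attempt to make.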


--------------------------------------------------------------------------
    Mike Phillip E-mail: phillip@oakhill.sps.mot.com
    RISC Compiler Development or oakhill!phillip@cs.utexas.edu
    Motorola, Inc.
    Austin, TX Phone: (512) 891-3656
--

