re: GCC Software Pipelining

hedley@iit.com (Hedley Rainnie)
Sat, 17 Oct 1992 20:29:50 GMT

          From comp.compilers

Related articles
Re: Getting GCC to software pipeline loops sanjay@equalizer.cray.com (1992-10-13)

Newsgroups: comp.compilers
From: hedley@iit.com (Hedley Rainnie)
Organization: Compilers Central
Date: Sat, 17 Oct 1992 20:29:50 GMT
References: 92-10-054
Keywords: optimize, GCC

GCC 2.2.2 supports software pipelining in 2 forms.


1) Delay slot scheduling via reorg.c, which fills the delay slots of:


- branches (annulled or "normal")
- calls
- jumps


2) Instruction costing via attributes, so that instructions
such as loads, stores and FP operations can be scheduled
around their latencies.
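
In GCC these costs come from the machine description. The C below is not GCC's scheduler; it is only a minimal sketch of the general idea behind form 2 (greedy list scheduling driven by per-instruction latencies). The Block layout, the latency numbers and list_schedule are hypothetical, and the example assumes a load needs two cycles before its result can be used, which matches the one-instruction MIPS-X load hazard described below.

```c
#include <assert.h>

#define MAX_INSNS 16
#define STALL (-1)

/* Hypothetical block description: lat[i] is the number of cycles before
   instruction i's result can be used; dep[i][j] != 0 means j reads a
   value produced by i (so i < j in the original order). */
typedef struct {
    int n;
    int lat[MAX_INSNS];
    int dep[MAX_INSNS][MAX_INSNS];
} Block;

/* Greedy single-issue list scheduler.  Writes the issue order into out[]
   (STALL marks an empty cycle) and returns the number of cycles used,
   or -1 if out[] is too small. */
int list_schedule(const Block *b, int out[], int max_out)
{
    int height[MAX_INSNS], ready_at[MAX_INSNS], done[MAX_INSNS];
    int i, j, scheduled = 0, cycle = 0;

    assert(b->n >= 0 && b->n <= MAX_INSNS);
    /* Priority: longest latency-weighted path from i to the block end. */
    for (i = b->n - 1; i >= 0; i--) {
        height[i] = b->lat[i];
        for (j = i + 1; j < b->n; j++)
            if (b->dep[i][j] && b->lat[i] + height[j] > height[i])
                height[i] = b->lat[i] + height[j];
    }
    for (i = 0; i < b->n; i++) {
        ready_at[i] = 0;
        done[i] = 0;
    }

    while (scheduled < b->n) {
        int best = STALL;
        for (i = 0; i < b->n; i++) {
            /* Ready: not yet issued, operands available this cycle,
               and every producer already issued. */
            int ok = !done[i] && ready_at[i] <= cycle;
            for (j = 0; ok && j < i; j++)
                if (b->dep[j][i] && !done[j])
                    ok = 0;
            if (ok && (best == STALL || height[i] > height[best]))
                best = i;
        }
        if (cycle >= max_out)
            return -1;
        out[cycle] = best;
        if (best != STALL) {
            done[best] = 1;
            scheduled++;
            /* Consumers may not issue until the latency has elapsed. */
            for (j = 0; j < b->n; j++)
                if (b->dep[best][j] && ready_at[j] < cycle + b->lat[best])
                    ready_at[j] = cycle + b->lat[best];
        }
        cycle++;
    }
    return cycle;
}
```

For a block of ld r6; st r6; addi (load latency 2, the store depending on the load), the sketch issues the independent addi between the load and the store, so no stall cycle is needed.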


It should be noted that, as of 2.2.2, 1 and 2 do not communicate, so reorg
can break a pipeline schedule produced by the scheduler. The processor I
am most familiar with is the Stanford MIPS-X; I modified the GCC 2.2.2
R3000 code generator to reflect the Stanford differences, and some of
those experiences may be useful to all.


Right off the bat, because 1 and 2 don't communicate, the bare-bones
pipeline of the MIPS-X is a sitting duck for constructs like this:


beq r0,r2,L101
ld 0[r4],r6
st _count[r25],r6


The above sequence is illegal on MIPS-X. The branch has two delay slots,
so both the ld and the st execute after it, and because of the way the
pipeline is structured with respect to the external data caches, a load
has a one-instruction hazard: its result is not available to the
instruction immediately following it. Here the st in the second slot
reads r6 before the ld in the first slot has delivered it.
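
To make the two rules concrete, here is a minimal C sketch of a hazard checker. The instruction encoding, register numbering and function names are hypothetical (this is not the author's assembler). It knows only what is described here: a load's result cannot be used by the next instruction, and, with two delay slots, a taken branch makes the first instruction at the target follow the load in the second slot directly.

```c
#include <assert.h>

/* Hypothetical encoding of a few MIPS-X-like instructions. */
typedef enum { OP_ALU, OP_LD, OP_ST, OP_BR, OP_NOP } Op;

typedef struct {
    Op op;
    int dst;   /* register written, or -1 */
    int src1;  /* registers read, or -1   */
    int src2;
} Insn;

static int reads(const Insn *in, int r)
{
    return r >= 0 && (in->src1 == r || in->src2 == r);
}

/* Index of the first load whose result is read by the very next
   instruction (the one-instruction load hazard), or -1 if none. */
int find_load_use(const Insn *code, int n)
{
    int i;
    assert(code != 0 || n == 0);
    for (i = 0; i + 1 < n; i++)
        if (code[i].op == OP_LD && reads(&code[i + 1], code[i].dst))
            return i;
    return -1;
}

/* With two delay slots, a load in the second slot of the branch at br is
   followed, when the branch is taken, by the instruction at target.
   Returns 1 if that pair is a load hazard. */
int branch_target_hazard(const Insn *code, int n, int br, int target)
{
    if (br < 0 || br + 2 >= n || target < 0 || target >= n)
        return 0;
    return code[br + 2].op == OP_LD && reads(&code[target], code[br + 2].dst);
}
```

Encoding beq r0,r2 / ld 0[r4],r6 / st _count[r25],r6 as three Insn values, find_load_use reports the ld at index 1.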


Compromise:
  To get the job done and still use GCC (which is a good compiler), I
pushed these problems onto the assembler. I wrote a new assembler that
assumes the compiler has done only a partial job of reorg and scheduling,
and applies the remaining MIPS-X intelligence at the back end. The
assembler selectively reorganizes code so that hazards don't occur. This
can be quite involved: not only must linear instruction sequences be
studied, but also the targets of branches, since a load in the second
delay slot can be a hazard at the destination. Thus the above becomes:


ld 0[r4],r6
beq r0,r2,L101
st _count[r25],r6
nop
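
The reordering above can be sketched in C, under stated assumptions: the same hypothetical Insn encoding as before, a non-annulling branch with two delay slots, and a hypothetical hoist_slot_load function. Because the load in the first slot executes whether or not the branch is taken, it can move above the branch as long as it does not write a register the branch compares; the vacated slot is refilled with a nop so the branch still owns exactly two slots.

```c
#include <assert.h>

#define DELAY_SLOTS 2

typedef enum { OP_ALU, OP_LD, OP_ST, OP_BR, OP_NOP } Op;
typedef struct { Op op; int dst, src1, src2; } Insn;  /* -1 = unused */

static int reads(const Insn *in, int r)
{
    return r >= 0 && (in->src1 == r || in->src2 == r);
}

/* If the first delay slot of the branch at br holds a load that does not
   write a branch operand, move the load above the branch and pad the
   slots with a nop.  out must hold n + 1 entries.  Returns the new
   length, or n (out = unchanged copy) if the rewrite does not apply. */
int hoist_slot_load(const Insn *code, int n, int br, Insn *out)
{
    static const Insn nop = { OP_NOP, -1, -1, -1 };
    int i, k = 0;

    assert(code != 0 && out != 0);
    if (br < 0 || br + DELAY_SLOTS >= n || code[br].op != OP_BR ||
        code[br + 1].op != OP_LD || reads(&code[br], code[br + 1].dst)) {
        for (i = 0; i < n; i++)
            out[i] = code[i];
        return n;
    }
    for (i = 0; i < br; i++)
        out[k++] = code[i];
    out[k++] = code[br + 1];                 /* hoisted load          */
    out[k++] = code[br];                     /* the branch itself     */
    for (i = br + 2; i <= br + DELAY_SLOTS; i++)
        out[k++] = code[i];                  /* remaining slot insns  */
    out[k++] = nop;                          /* refill vacated slot   */
    for (i = br + DELAY_SLOTS + 1; i < n; i++)
        out[k++] = code[i];
    return k;
}
```

Applied to the beq / ld / st sequence with br = 0, this yields ld, beq, st, nop, the same order shown above; the st now runs two instructions after the ld, clear of the hazard. A real pass would also look for a useful instruction to put in place of the nop.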


Other pipeline vagaries exist, such as:


st _count1[r25],r0
st _count2[r25],r0


which is illegal; the remedy is:


st _count1[r25],r0
nop ; or a useful instr found by the assembler (very rare)
st _count2[r25],r0
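
A minimal C sketch of this kind of padding, again with a hypothetical encoding and a hypothetical pad_hazards function, inserts a nop between any two adjacent instructions that may not issue back to back: two stores, or a load followed by a use of its result.

```c
#include <assert.h>

typedef enum { OP_ALU, OP_LD, OP_ST, OP_BR, OP_NOP } Op;
typedef struct { Op op; int dst, src1, src2; } Insn;  /* -1 = unused */

static int reads(const Insn *in, int r)
{
    return r >= 0 && (in->src1 == r || in->src2 == r);
}

/* Nonzero if b may not directly follow a in this hypothetical model. */
static int conflicts(const Insn *a, const Insn *b)
{
    if (a->op == OP_ST && b->op == OP_ST)
        return 1;                            /* back-to-back stores */
    if (a->op == OP_LD && reads(b, a->dst))
        return 1;                            /* load-use hazard     */
    return 0;
}

/* Copies straight-line code into out, inserting a nop between every
   conflicting pair.  out must hold at least 2 * n entries.
   Returns the new length. */
int pad_hazards(const Insn *code, int n, Insn *out)
{
    static const Insn nop = { OP_NOP, -1, -1, -1 };
    int i, k = 0;

    assert(n == 0 || (code != 0 && out != 0));
    for (i = 0; i < n; i++) {
        if (i > 0 && conflicts(&code[i - 1], &code[i]))
            out[k++] = nop;
        out[k++] = code[i];
    }
    return k;
}
```

Since the sketch inserts nops blindly, it is only safe for straight-line code: dropping a nop into a branch delay slot would change what the slots contain.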


MIPS-X also has false-annulled branches, whose delay slot instructions are
annulled when the branch is not taken. These are great for C like this:


if(x) count++;
else count--;


ld _count[r25],r4
bnesq r0,r3,L101 ; assume x in r3
addi r4,#1,r4
nop
addi r4,#-1,r4
L101:
st _count[r25],r4
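
To check that the sequence really implements the if/else, here is a small C interpreter for a hypothetical mini-ISA with just these instructions. It assumes, as the example implies, that bnesq branches when its two registers differ and that its two delay slots complete only when the branch is taken, being annulled on fall-through. Load latency and the r25 base address are not modelled; "memory" is a single count variable.

```c
#include <assert.h>

#define DELAY_SLOTS 2

/* Hypothetical mini-ISA, just enough for the example above. */
typedef enum { OP_LD, OP_ST, OP_ADDI, OP_BNESQ, OP_NOP } Op;

typedef struct {
    Op op;
    int a, b;  /* register numbers                      */
    int imm;   /* ADDI immediate, or BNESQ target index */
} Insn;

/* Executes one non-branch instruction. */
static void step(const Insn *in, int reg[], int *count)
{
    switch (in->op) {
    case OP_LD:   reg[in->a] = *count; break;
    case OP_ST:   *count = reg[in->a]; break;
    case OP_ADDI: reg[in->a] = reg[in->b] + in->imm; break;
    default:      break;  /* nop; branches in slots are not modelled */
    }
}

/* Runs prog and returns the final value of count.  BNESQ branches if
   reg[a] != reg[b]; its slots execute only when the branch is taken. */
int run(const Insn prog[], int n, int reg[], int count)
{
    int pc = 0, s;

    assert(prog != 0 && reg != 0);
    while (pc >= 0 && pc < n) {
        const Insn *in = &prog[pc];
        if (in->op != OP_BNESQ) {
            step(in, reg, &count);
            pc++;
        } else if (reg[in->a] != reg[in->b]) {
            for (s = 1; s <= DELAY_SLOTS && pc + s < n; s++)
                step(&prog[pc + s], reg, &count);  /* slots complete */
            pc = in->imm;
        } else {
            pc += 1 + DELAY_SLOTS;                 /* slots annulled */
        }
    }
    return count;
}

/* The sequence from the post: x in r3, count loaded into r4. */
const Insn squash_example[] = {
    { OP_LD,    4, 0, 0 },   /* ld    _count[r25],r4    */
    { OP_BNESQ, 0, 3, 5 },   /* bnesq r0,r3,L101        */
    { OP_ADDI,  4, 4, 1 },   /* slot 1: addi r4,#1,r4   */
    { OP_NOP,   0, 0, 0 },   /* slot 2: nop             */
    { OP_ADDI,  4, 4, -1 },  /* addi  r4,#-1,r4         */
    { OP_ST,    4, 0, 0 },   /* L101: st _count[r25],r4 */
};
```

Running squash_example with r3 = x and an initial count of 10 yields 11 for any nonzero x and 9 for x == 0, matching the C if/else.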


Again, the compiler can mess these up, and the assembler must be extra
careful when reorganizing them, since the code in the slots is often
taken from the branch target and cannot be hoisted above the branch.


---------------------
My message, then: use GCC 2.2.2 to get almost where you want to be,
and fix what you don't like afterwards with a smart assembler.


Hedley


hedley@iit.com | Integrated Information Tech.
{decwrl|sun}!imagen!iitinc!hedley | Santa Clara, CA. (408)-727-1885 x266
--

