re: GCC Software Pipelining (Hedley Rainnie)
Sat, 17 Oct 1992 20:29:50 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: (Hedley Rainnie)
Organization: Compilers Central
Date: Sat, 17 Oct 1992 20:29:50 GMT
References: 92-10-054
Keywords: optimize, GCC

GCC 2.2.2 supports software pipelining in 2 forms.

1) Delay slot scheduling via reorg.c

- branches (annulled or "normal")
- calls
- jumps

2) Instruction costing via attributes, so that instructions
such as loads, stores, and FP operations can be scheduled
in an intelligent way.

Note that as of 2.2.2, 1 and 2 do not communicate, so reorg can break a
pipeline schedule produced by the scheduler. The processor I am most
familiar with is the Stanford MIPS-X: I modified the GCC 2.2.2 R3000 code
generator to reflect the Stanford differences, and some of that experience
may be useful to all.

Right off the bat, because 1 and 2 above don't communicate, the bare-bones
pipeline of the MIPS-X is a sitting duck for constructs like this:

beq r0,r2,L101
ld 0[r4],r6
st _count[r25],r6

The above sequence is illegal on MIPS-X, which has a 2-slot delay after a
branch and, because of how the pipeline is structured with respect to
external data caches, a 1-instruction hazard after a load. The result of
the load above is not ready for the store that follows it.
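
The load-use rule above can be checked mechanically on a straight-line
sequence. The following is only a minimal sketch, not the assembler
described in this post. It assumes the operand order seen in the listings
here (destination register last, except for stores and branches, which
only read registers), treats any mnemonic starting with "b" as a branch,
and uses hypothetical helper names (parse, registers_read,
load_use_hazards).

```python
import re

def parse(line):
    """Split 'op a,b,c' into (op, [operands]); text after ';' is a comment."""
    text = line.split(";")[0].strip()
    op, _, rest = text.partition(" ")
    return op, [a.strip() for a in rest.split(",") if a.strip()]

def registers_read(op, args):
    """Registers an instruction reads (assumption: destination is the last
    operand, except for stores and branches, which write no register)."""
    srcs = args if (op == "st" or op.startswith("b")) else args[:-1]
    return {r for a in srcs for r in re.findall(r"\br\d+\b", a)}

def load_use_hazards(lines):
    """Return (load_index, user_index, register) for every load whose
    result is read by the very next instruction."""
    insns = [parse(line) for line in lines]
    hazards = []
    for i in range(len(insns) - 1):
        op, args = insns[i]
        if op != "ld" or not args:
            continue
        dest = args[-1]
        next_op, next_args = insns[i + 1]
        if dest in registers_read(next_op, next_args):
            hazards.append((i, i + 1, dest))
    return hazards

bad = ["beq r0,r2,L101", "ld 0[r4],r6", "st _count[r25],r6"]
print(load_use_hazards(bad))  # the ld/st pair is reported
```

Only adjacent instructions are examined here; branch targets need a
separate check.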

  In order to get the job done and still use GCC (which is a good
compiler), I pushed these problems onto the assembler. Basically, a new
assembler was written that understood that the compiler had done only a
partial job of reorg and scheduling, and that some final MIPS-X
intelligence was required at the back end. The assembler selectively
reorganizes code so that hazards don't occur. This can be quite involved:
not only must linear instruction sequences be studied, but also the targets
of branches, since a load could occur in the second slot and thus be a
hazard at the destination. Thus the above becomes:

ld 0[r4],r6
beq r0,r2,L101
st _count[r25],r6
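
The branch-target case mentioned above (a load in the second delay slot
whose result is read by the first instruction at the destination) could be
detected along these lines. This is a rough sketch under stated
assumptions, not the actual assembler: labels are assumed to sit on their
own line as "NAME:", the branch target is assumed to be the last operand,
mnemonics starting with "b" are assumed to be branches, and the function
name target_hazards is hypothetical.

```python
import re

def target_hazards(lines, delay_slots=2):
    """Return (branch_index, target_index, register) where a load in the
    last delay slot feeds the first instruction at the branch target."""
    insns, labels = [], {}
    for line in lines:
        text = line.split(";")[0].strip()
        if not text:
            continue
        if text.endswith(":"):          # assumed label syntax
            labels[text[:-1]] = len(insns)
            continue
        op, _, rest = text.partition(" ")
        insns.append((op, [a.strip() for a in rest.split(",") if a.strip()]))

    found = []
    for i, (op, args) in enumerate(insns):
        if not op.startswith("b") or not args:
            continue
        last_slot = i + delay_slots
        target = labels.get(args[-1])
        if last_slot >= len(insns) or target is None or target >= len(insns):
            continue
        slot_op, slot_args = insns[last_slot]
        if slot_op != "ld" or not slot_args:
            continue
        dest = slot_args[-1]            # assumption: destination is last
        t_op, t_args = insns[target]
        srcs = t_args if (t_op == "st" or t_op.startswith("b")) else t_args[:-1]
        if any(dest in re.findall(r"\br\d+\b", a) for a in srcs):
            found.append((i, target, dest))
    return found

prog = [
    "beq r0,r2,L101",
    "addi r5,#1,r5",      # first delay slot
    "ld 0[r4],r6",        # second delay slot
    "addi r7,#1,r7",
    "L101:",
    "st _count[r25],r6",  # reads r6 right after the load on the taken path
]
print(target_hazards(prog))
```

Indices in the result refer to instructions only (label lines are not
counted).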

More pipeline vagaries exist such as:

st _count1[r25],r0
st _count2[r25],r0

is illegal. The remedy:

st _count1[r25],r0
nop ; or assembler found useful instr (very rare)
st _count2[r25],r0
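
The nop remedy above is simple enough to sketch as a small pass over the
instruction list. This is an illustration only, assuming one instruction
per line, labels written as "NAME:" on their own line, and a hypothetical
function name (separate_stores). It always inserts a nop and does not try
to find a useful instruction to put there.

```python
def separate_stores(lines):
    """Insert a nop between any two back-to-back st instructions."""
    out = []
    prev_op = None
    for line in lines:
        text = line.split(";")[0].strip()
        if not text or text.endswith(":"):
            # Blank, comment-only, and label lines do not separate
            # instructions in the pipeline, so they are passed through.
            out.append(line)
            continue
        op = text.split()[0]
        if op == "st" and prev_op == "st":
            out.append("nop")
        out.append(line)
        prev_op = op
    return out

fixed = separate_stores(["st _count1[r25],r0", "st _count2[r25],r0"])
print("\n".join(fixed))
```

Running the pass a second time leaves the output unchanged, since the
inserted nop already separates the two stores.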

MIPS-X also has false annulled branches. These are great for C like this:

if(x) count++;
else count--;

ld _count[r25],r4
bnesq r0,r3,L101 ; assume x in r3
addi r4,#1,r4
addi r4,#-1,r4
st _count[r25],r4

Again, the compiler can mess these up, and the assembler must be extra
careful when reorganizing these statements, since the code in the slot is
often code from the target that cannot be hoisted.

My message, then, is: use GCC 2.2.2 to get you almost where you want
to be, and then fix what you don't like later with a smart assembler.

Hedley | Integrated Information Tech.
{decwrl|sun}!imagen!iitinc!hedley | Santa Clara, CA. (408)-727-1885 x266
