Instruction Scheduling for UltraSPARC

Kuriakose Kuruvilla <>
11 Dec 2000 01:59:12 -0500

          From comp.compilers


From: Kuriakose Kuruvilla <>
Newsgroups: comp.compilers
Date: 11 Dec 2000 01:59:12 -0500
Organization: Wipro
Keywords: architecture, optimize
Posted-Date: 11 Dec 2000 01:59:12 EST

Hi People

I am trying to analyse the performance improvement given by an
instruction scheduler for code generated on-the-fly, targeting the
SPARC-V9 compliant UltraSPARC-IIi processor.

This processor can issue up to 4 instructions per cycle, subject to
rules for grouping instructions; these are described in
"Chapter 22: Grouping Rules and Stalls" of the "UltraSPARC-IIi User's
Manual".

For example, SLLX uses IEU0 and ADD is a non-specific IEU instruction.
                sllx %i2, 2, %i2 ! Group1
                sllx %i3, 2, %i3 ! Group2
                sllx %i4, 2, %i4 ! Group3
                add %l4, 2, %l4 ! Group4
                add %l5, 2, %l5 ! Group4
                add %l6, 2, %l6 ! Group5
would be better scheduled as...
                sllx %i2, 2, %i2 ! Group1
                add %l4, 2, %l4 ! Group1
                sllx %i3, 2, %i3 ! Group2
                add %l5, 2, %l5 ! Group2
                sllx %i4, 2, %i4 ! Group3
                add %l6, 2, %l6 ! Group3
thereby giving an improvement of 2 cycles.
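
The pairing in the example above can be sketched as a small greedy model.
This is a minimal sketch under stated assumptions, not the Chapter 22 rule
set: instructions issue in program order, a group holds at most two integer
instructions, at most one of them may need IEU0, and register dependencies
are ignored (the example has none). The names IEU0_ONLY, form_groups and
interleave are hypothetical and used only for illustration.

```python
# Simplified grouping model (assumption): in-order issue, at most two
# integer instructions per group, at most one IEU0-only instruction per
# group. Dependencies, loads/stores, branches and FP ops are not modelled.

IEU0_ONLY = {"sllx"}  # hypothetical table: only the opcode from the example


def opcode(instr):
    """Return the mnemonic of an assembly line such as 'sllx %i2, 2, %i2'."""
    return instr.split()[0]


def form_groups(instrs):
    """Greedily pack instructions, in program order, into issue groups."""
    groups, current = [], []
    for ins in instrs:
        needs_ieu0 = opcode(ins) in IEU0_ONLY
        ieu0_taken = any(opcode(i) in IEU0_ONLY for i in current)
        # Close the group when it is full or IEU0 is already claimed.
        if len(current) == 2 or (needs_ieu0 and ieu0_taken):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


def interleave(instrs):
    """Alternate IEU0-only and unrestricted instructions.

    Only valid when the instructions are independent, as in the example.
    """
    shifts = [i for i in instrs if opcode(i) in IEU0_ONLY]
    others = [i for i in instrs if opcode(i) not in IEU0_ONLY]
    out = []
    while shifts or others:
        if shifts:
            out.append(shifts.pop(0))
        if others:
            out.append(others.pop(0))
    return out


original = [
    "sllx %i2, 2, %i2",
    "sllx %i3, 2, %i3",
    "sllx %i4, 2, %i4",
    "add %l4, 2, %l4",
    "add %l5, 2, %l5",
    "add %l6, 2, %l6",
]
scheduled = interleave(original)
for number, group in enumerate(form_groups(scheduled), start=1):
    print("Group%d: %s" % (number, " | ".join(group)))
```

Under these assumptions the interleaved order matches the second listing
above and packs into three groups, one SLLX and one ADD in each.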

The instructions seem to be properly reordered according to these rules
in the small code samples I looked at. But the standard test suites do
not show the expected improvements.

So I tried using the Performance Control Register (PCR) and the
Performance Instrumentation Counters (PICs) provided by the processor.
These I accessed using a freeware perfmon driver.

But even the PICs do not show the expected results when I try to
determine the number of instruction cycles for a small piece of code.
Also, the number of instructions the PIC reports as executed is not
exactly the number of instructions that were timed; it also depends on
where the instructions are located in the address space (if the first
instruction timed is the last instruction of a block of 8 instructions
aligned on a 32-byte boundary, 7 more instructions are added to the
counter value for "instructions executed").
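
The alignment effect described above can be written down as a small
correction. This is a sketch only: it assumes the overcount equals the
number of instruction slots that precede the first timed instruction
within its 32-byte block. The post reports only the last-slot case (7
extra), so the other slots are an extrapolation. The function names and
the addresses used are hypothetical.

```python
INSTR_BYTES = 4   # SPARC instructions are 4 bytes wide
BLOCK_BYTES = 32  # 8 instructions per aligned block, as described above


def slot_in_block(addr):
    """Index (0..7) of the instruction at addr within its 32-byte block."""
    return (addr % BLOCK_BYTES) // INSTR_BYTES


def expected_overcount(first_addr):
    """Assumed extra count: the slots before the first timed instruction."""
    return slot_in_block(first_addr)


def corrected_count(raw_count, first_addr):
    """Subtract the assumed overcount from a raw counter reading."""
    return raw_count - expected_overcount(first_addr)


# Hypothetical addresses: one block-aligned, one in the last slot of a block.
for addr in (0x10000, 0x1001C):
    print(hex(addr), "slot", slot_in_block(addr),
          "overcount", expected_overcount(addr))
```

If the assumption holds, aligning the first timed instruction on a 32-byte
boundary should make the correction zero, which gives one way to check it.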

Can someone help me out on this? Are the PIC registers inaccurate in
some way; do they not do what the manual says? What about the
implementation of the grouping logic? Or am I missing something?

Anyone have prior experience with scheduling based on grouping rules on
the UltraSPARC? Or experience with using the PIC/PCR registers?

