Re: Block Scheduling: determ vs nondeterm

grunwald@foobar.cs.colorado.edu (Dirk Grunwald)
Mon, 20 Jun 1994 01:58:27 GMT

          From comp.compilers

Related articles
Block Scheduling: determ vs nondeterm PAULLONG@delphi.com (1994-06-05)
Re: Block Scheduling: determ vs nondeterm preston@noel.cs.rice.edu (1994-06-13)
Re: Block Scheduling: determ vs nondeterm paullong@delphi.com (1994-06-15)
Re: Block Scheduling: determ vs nondeterm preston@noel.cs.rice.edu (1994-06-16)
Re: Block Scheduling: determ vs nondeterm chase@Think.COM (1994-06-16)
Re: Block Scheduling: determ vs nondeterm grunwald@foobar.cs.colorado.edu (1994-06-20)

Newsgroups: comp.compilers
From: grunwald@foobar.cs.colorado.edu (Dirk Grunwald)
Keywords: optimize
Organization: University of Colorado at Boulder
References: 94-06-060 94-06-103
Date: Mon, 20 Jun 1994 01:58:27 GMT

A similar greedy algorithm is used by Wen-mei Hwu and Pohua P. Chang in:


@INPROCEEDINGS{hwu89:icache,
AUTHOR = "Wen-mei W. Hwu and Pohua P. Chang",
TITLE = "Achieving High Instruction Cache Performance with an
Optimizing Compiler",
BOOKTITLE = ISCA89,
YEAR = 1989,
PAGES = "242--251",
ORGANIZATION = ACM,
PUBLISHER = ACM
}


For each procedure, they take the entry block, then ``grow it down,''
following the most common execution path. They then select the next block
and ``grow it down'' and ``grow it up.'' In later papers, some of which
should be in SP&E by now, they show that by ``tail duplication'' of what
are essentially traces (which they call ``superblocks''), they can greatly
reduce Icache miss rates and improve branch prediction.
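The greedy ``grow down'' idea can be sketched roughly as below. The data
format, function names, and the rule for seeding new traces are my own
illustrative assumptions, not taken from the paper:

```python
# Sketch of greedy profile-driven block layout: repeatedly follow the
# most frequently taken successor edge until hitting an already-placed
# block. Input format (edge-count dict keyed by (src, dst)) is assumed.

def grow_down(block, succs, counts, placed):
    """Follow the hottest unplaced successor edge to build one trace."""
    trace = [block]
    placed.add(block)
    while True:
        cands = [(counts.get((block, s), 0), s)
                 for s in succs.get(block, []) if s not in placed]
        if not cands:
            return trace
        _, block = max(cands)       # most common execution path
        trace.append(block)
        placed.add(block)

def layout(entry, succs, counts):
    """Place the entry trace first, then seed new traces from the
    hottest remaining blocks (an assumed, simplistic seeding rule)."""
    blocks = set(succs) | {s for ss in succs.values() for s in ss}
    placed, order = set(), []
    order += grow_down(entry, succs, counts, placed)
    # hotness of a block = total profile count on its incoming edges
    hot = lambda b: sum(c for (u, v), c in counts.items() if v == b)
    for b in sorted(blocks - placed, key=hot, reverse=True):
        if b not in placed:
            order += grow_down(b, succs, counts, placed)
    return order
```

On a diamond CFG where A->B->D is the hot path, this lays out A, B, D
first and pushes the cold block C after them, so the hot path is
contiguous and mostly falls through.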


Pettis and Hansen used a similar technique:


@INPROCEEDINGS{pettis90:profile-guided,
AUTHOR = "Karl Pettis and Robert C. Hansen",
TITLE = "Profile Guided Code Positioning",
BOOKTITLE = pldi90,
YEAR = 1990,
PAGES = "16--27",
ORGANIZATION = "ACM",
PUBLISHER = "ACM",
MONTH = Jun
}


Pettis and Hansen didn't duplicate code, but they did ``segregate'' the
frequently executed traces from the infrequently executed ones. This
improved the Icache miss rate, and probably the TLB miss and paging rates
as well.
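The segregation step amounts to splitting the layout into a hot section
followed by a cold section. A minimal sketch, where the threshold rule
and names are illustrative assumptions rather than Pettis and Hansen's
exact method:

```python
def segregate(order, exec_counts, threshold):
    """Split a block order into hot blocks followed by cold blocks,
    preserving relative order within each section. Blocks with no
    profile count are treated as cold (an assumption)."""
    hot  = [b for b in order if exec_counts.get(b, 0) >= threshold]
    cold = [b for b in order if exec_counts.get(b, 0) <  threshold]
    return hot + cold
```

Packing the hot blocks together means the frequently executed code
occupies fewer cache lines and pages, which is where the Icache and TLB
improvements come from.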


My student, Brad Calder, and I compared several such techniques and
devised one that performs an exhaustive search over the set of most
commonly executed blocks. This has (marginally) better behavior than the
previous two techniques. We were mainly interested in branch prediction,
and compared the performance for several static prediction mechanisms and
some dynamic mechanisms.


You want to balance having the ``fall through'' path be the most common
one against the desire to eliminate branches. I have read other papers
(whose citations I've forgotten) that attempt to reduce unnecessary jumps
by duplicating code.
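A toy illustration of the duplication idea: when two traces meet at a
shared join block, only one of them can fall through into it; giving the
other trace its own copy removes the unconditional jump. This is my own
crude sketch, not any particular paper's pass:

```python
def tail_duplicate(traces, join):
    """Append a private copy of `join` to every trace that branches to
    it but does not already contain it, so each trace falls through
    into its own copy instead of jumping. Traces are lists of block
    names; the "'" suffix marks the duplicated copy (illustrative)."""
    out = []
    for t in traces:
        if join in t:
            out.append(t)                 # already falls through
        else:
            out.append(t + [join + "'"])  # duplicated copy, jump removed
    return out
```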


The implications for software pipelining are interesting. We applied these
transformations after the program was compiled, so we didn't have to worry
about this.


The paper I mentioned will be in the next ASPLOS, and we should have a
ready version in a couple of weeks if you want it sooner.
--

