Branch prediction (was: eliminating array bounds checking overhead)

Patryk Zadarnowski <patrykz@ilion.eu.org>
4 May 2000 17:15:09 -0400

          From comp.compilers

Related articles
Re: Re: eliminating array bounds checking overhead patrykz@ilion.eu.org (Patryk Zadarnowski) (2000-05-01)
Branch prediction (was: eliminating array bounds checking overhead) patrykz@ilion.eu.org (Patryk Zadarnowski) (2000-05-04)
Re: Branch prediction (was: eliminating array bounds checking overhead anton@mips.complang.tuwien.ac.at (2000-05-08)
Re: Branch prediction (was: eliminating array bounds checking overhead dave@complang.tuwien.ac.at (David Gregg) (2000-05-12)
| List of all articles for this month |

From: Patryk Zadarnowski <patrykz@ilion.eu.org>
Newsgroups: comp.compilers
Date: 4 May 2000 17:15:09 -0400
Organization: Compilers Central
References: 00-05-006
Keywords: optimize, architecture

> [I was under the impression that guessing that backward branches are
> taken and forward branches aren't does surprisingly well. -John]


Well, it certainly does better than always predicting branches as not
taken (which would be the simplest thing to do in hardware, as it
would eliminate the need for any instruction decoding in the prefetch
logic) simply because predicting backward branches as taken does so
well.


There are two sides to the branch prediction story: static
(compile-time) prediction and dynamic prediction done by the
prefetcher. A lot of work went into the area, particulary in the
modern x86 camp. The problem is that branch misprediction is painfully
expensive, especially if your instruction decoder is a mess of the x86
kind, or you are a superscalar CPU that has to abort many
speculatively-executed instructions if it fails to predict a branch
correctly. I've read somewhere that many post-pentium CPUs incur a
10-20 clock cycle penalty for mispredicted branches (don't quote me on
that). Of course, this means that your predictor should be really
smart: these CPUs expect prediction accuracy well into the 90% range.


The scheme you mention has been used on plain pentiums, and did
reasonably well as it predicted backward branches in loops
well. However, this scheme does particulary poorly for forward
branches. In particular, assuming them as `not taken' is merely to
simplify prefetch logic rather than because that's the common case
(ie. you make no particular assumptions about forward branches, and
just keep ploughing through your instruction stream unless you know
better.) Whether forward branches are taken or not depends heavily
one's programming style: my code tends to have a lot of error checking
of the form "if (fn() == -1) die();" making forward branches *usually
taken*. I haven't got any benchmarks handy, but Patterson (SIGPLAN
'95, pp.67-78) discusses performance of the technique when predicting
actual probabilities of branches taken, and the forward/backward rule
doesn't do significantly better (sometimes worse) than purely random
guessing.


All post-pentium x86 CPUs, and many high-performance RISCs maintain a
branch history, usually encoded as a bit queue, and hence are capable
of accurately predicting fairly complex branching patterns such as
those found in most non-trivial inner loops. Athlons even have a
fairly large cache that associates those bit queues with particural
branch sites, and Itanium uses even more convoluted (although still
kept under wrap by Intel) techniques.


All of this is aimed at improving prediction accuracy at runtime and
at the hardware level. Of course, many (most?) branches can be
predicted very high accurately by the compiler (think all those error
checks that most C programs do) and that's where static branch hints
come into play (discussed in my previous posting). Only if neither is
supplied, do modern CPUs revert to the backward/forward rule.


Pat.


Patryk Zadarnowski University of New South Wales
<pat@ia64.org> School of Computer Science and Engineering


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.