Re: Branch prediction (Anton Ertl)
31 May 2000 23:08:44 -0400

          From comp.compilers


From: (Anton Ertl)
Newsgroups: comp.compilers
Date: 31 May 2000 23:08:44 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 00-05-103
Keywords: architecture, performance

[...] writes:
>>In virtual machine (VM) interpreters BTBs have only 0%-20% prediction
>>accuracy if the interpreter uses a central dispatch routine, but they
>>give about 50% prediction accuracy if every VM instruction has its own
>>dispatch routine.
>This is possible with GCC's label as values. Another good reason to use

Yes, that's the most portable way for a user to ensure that this
happens. (Fortran's computed GOTO is another way, but IMO Fortran is
less portable than GNU C, and I believe GNU C has other advantages when
implementing interpreters.)
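A minimal sketch of the difference, assuming a toy stack VM with hypothetical opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and hypothetical function names (run_switch, run_threaded). The first interpreter uses a central switch, which compilers typically turn into one shared indirect jump. The second uses GCC's labels-as-values extension (&&label and goto *) and copies the dispatch code (the NEXT macro) to the end of every VM instruction, so each VM instruction gets its own indirect jump and hence its own BTB entry. It needs GCC or Clang, and an optimizing compiler may still merge or duplicate dispatch jumps on its own.

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Central dispatch: every VM instruction goes back to the switch, which
   typically compiles to a single indirect jump shared by all dispatches. */
long run_switch(const long *ip)
{
    long stack[16], *sp = stack;   /* sp points to the next free slot */
    for (;;) {
        switch (*ip++) {
        case OP_PUSH: *sp++ = *ip++; break;          /* immediate operand */
        case OP_ADD:  sp--; sp[-1] += sp[0]; break;
        case OP_MUL:  sp--; sp[-1] *= sp[0]; break;
        case OP_HALT: return sp[-1];
        default:      assert(0 && "bad opcode"); return 0;
        }
    }
}

/* Per-instruction dispatch (GNU C labels as values): NEXT is replicated
   at the end of every VM instruction, giving each its own indirect jump. */
long run_threaded(const long *ip)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[16], *sp = stack;
#define NEXT goto *labels[*ip++]
    NEXT;
do_push: *sp++ = *ip++;           NEXT;
do_add:  sp--; sp[-1] += sp[0];   NEXT;
do_mul:  sp--; sp[-1] *= sp[0];   NEXT;
do_halt: return sp[-1];
#undef NEXT
}
```

For the program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}, both functions compute (2 + 3) * 4 and return 20; only the shape of the dispatch branches differs.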

>There is an interesting aspect I just ran after with BTB. If you
>implement inline caches with an indirect jump (instead of patching the

[I assume this is about OO method dispatch]
What does that look like? Isn't this just ordinary OO dispatch?

> you have no penalty because the jump through the inline cache is
>always predicted correctly (by the very definition of inline caching).

Modulo conflict and capacity misses in the BTB.
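One plausible reading of the quoted scheme, sketched in C under stated assumptions (all names are hypothetical, and a one-selector "class" stands in for real method lookup): each call site holds a pointer that the caller jumps through indirectly, and each method's prologue checks that the receiver's class is the one the site was bound for. On a miss, the site's pointer is rebound instead of patching a call instruction. Because SEND is a macro, each expansion is a separate indirect call instruction, roughly mimicking distinct call sites in generated code; a monomorphic site then always jumps to the same target, which is the case a BTB predicts well.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CallSite CallSite;
typedef struct Class Class;
typedef struct { const Class *cls; int x; } Object;
typedef int (*Entry)(CallSite *site, const Object *obj);

struct Class { Entry method; };   /* hypothetical one-selector method table */
struct CallSite {
    const Class *cached_class;    /* receiver class the site is bound for */
    Entry target;                 /* the inline cache: where the jump goes */
};

static int lookups;               /* counts slow-path lookups */

/* Slow path: look the method up, then rebind the call site. */
static int miss(CallSite *site, const Object *obj)
{
    assert(obj->cls != NULL && obj->cls->method != NULL);
    lookups++;
    site->cached_class = obj->cls;
    site->target = obj->cls->method;
    return site->target(site, obj);
}

/* Method prologues verify the receiver class, as in classic inline caching. */
static int double_it(CallSite *site, const Object *obj)
{
    if (obj->cls != site->cached_class)
        return miss(site, obj);
    return 2 * obj->x;
}

static int negate_it(CallSite *site, const Object *obj)
{
    if (obj->cls != site->cached_class)
        return miss(site, obj);
    return -obj->x;
}

static const Class Doubler = { double_it };
static const Class Negator = { negate_it };

/* An unbound site starts at miss(); each SEND expansion is its own
   indirect call instruction. */
#define CALL_SITE_INIT { NULL, miss }
#define SEND(site, obj) ((site).target(&(site), (obj)))
```

Starting from CALL_SITE_INIT, the first SEND through a site performs one lookup; further sends with receivers of the same class go straight to the cached method, and a receiver of another class triggers a second lookup and rebinds the site.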

>>I believe this can be improved even more by combining common sequences of
>>VM instructions into one VM instruction
>An easier way is to combine similar adjacent bytecodes into a single
>routine. For example (I use a switch statement syntax here):
> case 0: case 1: ... pushOOP(instanceVariable(*ip++ & 15));

That's contrary to my suggestion: I suggested creating more instances
of the dispatch code, whereas the method above combines several
dispatches into one. The disadvantage is that if you have several of
these VM instructions in an inner loop or somesuch, the shared
dispatch will probably be followed by different next instructions,
and the BTB will predict these dispatches badly.
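To make the contrast concrete, here is a sketch in GNU C (labels as values) using the quoted encoding, where opcodes 0 to 15 push the instance variable named by the low four bits. All sixteen opcodes map to the same routine, so they share one dispatch jump. The add_ivar0 routine illustrates the alternative from the original suggestion: a superinstruction for the sequence "push instance variable 0; add", which keeps a dispatch of its own. Opcode values other than 0 to 15 and all names are hypothetical.

```c
#include <assert.h>

/* Opcodes 0..15: push instance variable n, with n in the low 4 bits
   (the quoted encoding). The remaining opcodes are hypothetical. */
enum { OP_PUSH_IVAR0 = 0, OP_ADD = 16, OP_ADD_IVAR0 = 17, OP_HALT = 18, N_OPS };

long run(const unsigned char *ip, const long *ivars)
{
    static void *labels[N_OPS];
    long stack[16], *sp = stack;
    if (!labels[0]) {                      /* fill the table on first call */
        for (int i = 0; i < 16; i++)
            labels[i] = &&push_ivar;       /* 16 opcodes, one routine...   */
        labels[OP_ADD] = &&add;
        labels[OP_ADD_IVAR0] = &&add_ivar0;
        labels[OP_HALT] = &&halt;
    }
#define NEXT do { assert(*ip < N_OPS); goto *labels[*ip++]; } while (0)
    NEXT;
push_ivar:                                 /* ...and one shared dispatch   */
    *sp++ = ivars[ip[-1] & 15];            /* decode operand from opcode   */
    NEXT;
add:
    sp--; sp[-1] += sp[0];
    NEXT;
add_ivar0:                                 /* superinstruction: push 0; add */
    sp[-1] += ivars[0];
    NEXT;
halt:
    return sp[-1];
#undef NEXT
}
```

With instance variables {10, 20, 30} and the program {1, 2, OP_ADD, OP_ADD_IVAR0, OP_HALT}, run returns 20 + 30 + 10 = 60. In a loop, the single jump at the end of push_ivar sees several different successors, which is the BTB problem described above.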

>But check out decode penalties!

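Yes. Not only does it add decoding overhead, it is also incompatible
with threaded code: there each VM instruction is represented by the
address of its routine, so there are no opcode bits to extract an
operand from.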
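A direct-threaded sketch (GNU C, hypothetical names) shows why: the code stream holds routine addresses, so the operand must occupy a separate cell instead of the low bits of an opcode. A common trick, assumed here, is to call the engine once with a NULL code pointer so that a translator can obtain the label addresses.

```c
#include <assert.h>
#include <stddef.h>

/* A threaded-code cell holds either a routine address or a literal. */
typedef union Cell { void *addr; long lit; } Cell;

enum { PUSH_IVAR, ADD, HALT, N_OPS };   /* token numbers for the translator */

/* With code == NULL, hands out the routine addresses; otherwise runs code. */
long engine(const Cell *code, const long *ivars, void *const **table_out)
{
    static void *const table[N_OPS] = { &&push_ivar, &&add, &&halt };
    long stack[16], *sp = stack;
    const Cell *ip = code;
    if (code == NULL) { *table_out = table; return 0; }
#define NEXT goto *(ip++)->addr
    NEXT;
push_ivar:                   /* no opcode bits to mask: index is its own cell */
    *sp++ = ivars[(ip++)->lit];
    NEXT;
add:
    sp--; sp[-1] += sp[0];
    NEXT;
halt:
    return sp[-1];
#undef NEXT
}

/* Translate token code (PUSH_IVAR takes one operand token) into threaded
   code, one cell per token; assumes well-formed input. */
void translate(const long *tokens, size_t n, Cell *out)
{
    void *const *table;
    engine(NULL, NULL, &table);
    for (size_t i = 0; i < n; i++) {
        long op = tokens[i];
        out[i].addr = table[op];
        if (op == PUSH_IVAR) { i++; out[i].lit = tokens[i]; }
    }
}
```

Translating {PUSH_IVAR, 3, PUSH_IVAR, 1, ADD, HALT} and running it with instance variables {10, 20, 30, 40} yields 40 + 20 = 60.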

- anton
M. Anton Ertl
Some things have to be seen to be believed
Most things have to be believed to be seen
