|[4 earlier articles]|
|Re: Compiler or interpreter? firstname.lastname@example.org (BGB / cr88192) (2010-06-18)|
|Re: Compiler or interpreter? email@example.com (Paul Biggar) (2010-06-18)|
|Re: Compiler or interpreter? firstname.lastname@example.org (Al Kossow) (2010-06-18)|
|Re: Compiler or interpreter? email@example.com (glen herrmannsfeldt) (2010-06-18)|
|Re: Compiler or interpreter? firstname.lastname@example.org (BGB / cr88192) (2010-06-19)|
|Re: Compiler or interpreter? email@example.com (glen herrmannsfeldt) (2010-06-19)|
|Re: Compiler or interpreter? firstname.lastname@example.org (BGB / cr88192) (2010-06-20)|
|From:||"BGB / cr88192" <email@example.com>|
|Date:||Sun, 20 Jun 2010 11:54:13 -0700|
|References:||10-06-032 10-06-038 10-06-045 10-06-049 10-06-053 10-06-055|
|Keywords:||interpreter, design, performance|
|Posted-Date:||21 Jun 2010 01:31:52 EDT|
"glen herrmannsfeldt" <firstname.lastname@example.org> wrote in message
> BGB / cr88192 <email@example.com> wrote:
> (snip, I wrote)
>>> Note the quote above. The different implementations can be faster
>>> or slower, even on different versions of the same architecture.
>>> Among others, the branch prediction logic may be sensitive
>>> to the differences.
>> possibly, but I suspect matters are more basic:
>> it takes several operations to decode arguments.
>> for example:
>> if(op>=192)op=((op-192)<<8)|(*ip++); //naively handle multi-byte ops
> (big snip)
> As far as I know, the usual implementation of switch/case, at least
> with dense packing of cases, is either an indexed branch to a table of
> branch instructions, or an indexed load followed by a register branch
MSVC manages to pull off the table dispatch in something like 10 or so
this does things like:
subtracting the starting index;
verifies that the index is within the jump table range (so that it can jump
to the "default label");
doing a "movzx" and adding the address of a label (IIRC MSVC uses 16-bit
jump offsets for the switch, which it typically extends to the native
register size, and adds a label, to recreate the address of the target);
jumping to this register.
doing something similar (I don't remember the actual sequence off hand):
;eax=value to be dispatched
sub eax, 10 ;say, table is 10..75
cmp eax, 65
movzx ecx, word ptr [jump_table+eax*2]
add ecx, Lbase
although, I think I may be missing a few instructions here (this is shorter
than I remember seeing).
but, when done in a tight loop, the code for doing such a switch dispatch
tends to rise to the top of the list in the profiler, and when this happens,
this is often an internal limit for the speed of an interpreter (along with
whatever instructions are used for decoding the opcode/...).
a while loop like in the prior example, is likely to look more like:
mov eax, [ebp-20] ;assume naive compiler
and eax, eax
; mov eax, [ebp-20] ;(MSVC tends to include this...)
mov ecx, [eax+8]
mov [ebp-20], eax
now, this thing is dispatched in a shorter instruction path than simply
needed for the switch (which to work would need many additional
in this case, it is not that much of a matter of guesswork.
the only real (likely) downside is that if there is a lot of code, given the
relatively low memory density of such linked-lists, they might take a bit of
a hit due to reduced cache effectiveness (whereas directly interpreting the
bytecode could have a smaller memory footprint, and thus have less cache
> Not having followed branch prediction logic too recently, (I suppose
> comp.arch is where that would go) it would seem that both cases are
> difficult for branch prediction, but that the former would be a little
> easier to process.
> That is, one could build branch prediction logic that understood that
> the value in a register was being used in an indexed branch and use
> that register for prediction. Though another possibility would be to
> provide an explicit switch/case instruction in the architecture.
branch prediction is not the only problem here...
branch prediction is more important for things like "if" and "for"
statements (which are relatively cheaper, and so are more effected in a
relative sense by the branch predictor).
a switch involves a big and complicated piece of logic to dispatch the
thing, and this logic is not free.
this is why for small switches, it is more commonly the case to use an
if-else dispatch, and in some tests, I have found that in many cases if-else
trees are actually a faster way to dispatch than using a switch (at least
with MSVC and x86/x64).
granted, a lot of this doesn't matter too much in most cases, since the
program does more than endlessly re-dispatch through a switch.
but, in interpreters, the use of a function pointer seems to be much
cheaper, since this is essentially just a load and indirect call (followed
by "push ebp; mov ebp, esp; ..." and like).
but, anyways, most of us can't simply add a special "switch" instruction,
and if one can, it is not clear what this instruction would do which could
really make the operation cheaper.
Return to the
Search the comp.compilers archives again.