Re: Compiler or interpreter?

"BGB / cr88192" <>
Sat, 19 Jun 2010 10:57:20 -0700

          From comp.compilers

From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Sat, 19 Jun 2010 10:57:20 -0700
References: 10-06-032 10-06-038 10-06-045 10-06-049
Keywords: interpreter, design
Posted-Date: 19 Jun 2010 14:35:56 EDT

"glen herrmannsfeldt" <> wrote in message
> BGB / cr88192 <> wrote:
> (snip, I previously quoted)

<snip, Fortran>

>> oddly, a linked list of structs each containing a function pointer (and
>> typically pre-decoded arguments), can be somewhat faster than the "read an
>> opcode word, dispatch via switch, read-in args, execute" strategy.
> Note the quote above. The different implementations can be faster
> or slower, even on different versions of the same architecture.
> Among others, the branch prediction logic may be sensitive
> to the differences.

possibly, but I suspect matters are more basic:
it takes several operations to decode arguments.

for example:
op=*ip++; //read the opcode byte
if(op>=192)op=((op-192)<<8)|(*ip++); //naively handle multi-byte ops
switch(op) {
        case OP_NOP:
                break;
        case OP_LOAD:
                i=*ip++; i=(i<<8)|(*ip++); //decode a 16-bit argument
                ...
        case ...
}

also, at least in MSVC, the jump tables are fairly slow, I suspect because
MSVC apparently tries to "compress" the jump offsets (each table entry is a
relative offset rather than a full pointer), ...

but, I also noted that this limitation happened often when using GCC as

pre-decoding any opcodes into structs, and then running things more like:
while(op) { op=op->fcn(ctx, op); }

with fcn being the function pointer to the opcode handler, which returns the
next opcode.

where, for example:
Foo_Opcode *Foo_OpNop(Foo_Context *ctx, Foo_Opcode *op)

and for less common opcodes, something like:

Foo_Opcode *Foo_OpGeneric(Foo_Context *ctx, Foo_Opcode *op)

IME, the instruction dispatching tends to be much faster this way, where
many common opcodes can be encoded directly, and then potentially bypass a
fair amount of fairly expensive logic.
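
A minimal sketch of the pre-decoded, function-pointer style described above. Only the handler shape (`fcn(ctx, op)` returning the next opcode) and the names `Foo_Context`, `Foo_Opcode` and `Foo_OpNop` come from the post; the struct fields, the other handlers and `Foo_Demo` are assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical layout: the post only shows the handler signature. */
typedef struct Foo_Context Foo_Context;
typedef struct Foo_Opcode Foo_Opcode;

struct Foo_Context {
    int stack[64];   /* small operand stack */
    int sp;          /* number of values currently on the stack */
};

struct Foo_Opcode {
    Foo_Opcode *(*fcn)(Foo_Context *ctx, Foo_Opcode *op); /* handler */
    Foo_Opcode *next;  /* next opcode in the linked list */
    int arg;           /* argument decoded once, ahead of time */
};

/* Does nothing and moves on to the next opcode. */
Foo_Opcode *Foo_OpNop(Foo_Context *ctx, Foo_Opcode *op)
{
    (void)ctx;
    return op->next;
}

/* Pushes the pre-decoded constant; no byte fetching at run time. */
Foo_Opcode *Foo_OpPushInt(Foo_Context *ctx, Foo_Opcode *op)
{
    ctx->stack[ctx->sp++] = op->arg;
    return op->next;
}

/* Pops two values and pushes their sum. */
Foo_Opcode *Foo_OpAdd(Foo_Context *ctx, Foo_Opcode *op)
{
    int b = ctx->stack[--ctx->sp];
    int a = ctx->stack[--ctx->sp];
    ctx->stack[ctx->sp++] = a + b;
    return op->next;
}

/* The whole dispatch loop: call the handler, get the next opcode back. */
void Foo_Run(Foo_Context *ctx, Foo_Opcode *op)
{
    while (op) { op = op->fcn(ctx, op); }
}

/* Builds "push 2; nop; push 3; add" by hand, runs it, returns the result. */
int Foo_Demo(void)
{
    Foo_Opcode ops[4] = {
        { Foo_OpPushInt, NULL, 2 },
        { Foo_OpNop,     NULL, 0 },
        { Foo_OpPushInt, NULL, 3 },
        { Foo_OpAdd,     NULL, 0 },
    };
    Foo_Context ctx = { {0}, 0 };
    int i;
    for (i = 0; i < 3; i++) ops[i].next = &ops[i + 1];
    Foo_Run(&ctx, &ops[0]);
    return ctx.stack[ctx.sp - 1];
}
```

In a real interpreter the `ops` list would be produced by a translation pass over the bytecode rather than written by hand; the point here is only that the run loop no longer reads or decodes anything.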

admittedly though, having a bytecode and then translating it into this form
does somewhat complicate an interpreter.

this strategy worked especially well when writing an interpreter for x86,
which otherwise has a notably expensive opcode decoder (although, I had
ended up largely pre-decoding opcodes to deal with this cost, essentially
using an opcode cache).
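
One way such an opcode cache could look is a direct-mapped table keyed by guest address, so the expensive decoder only runs on a miss. This is a sketch under assumptions, not the post's actual design: the `DecodedOp` fields, the slot count and the toy `slow_decode` encoding are all invented for illustration.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 256   /* power of two, chosen arbitrarily */

typedef struct {
    uint32_t addr;    /* guest address this entry was decoded from */
    int      valid;   /* nonzero once the slot holds a decoded op */
    int      opnum;   /* decoded opcode number */
    int      length;  /* instruction length in bytes */
    uint32_t imm;     /* pre-decoded immediate, if any */
} DecodedOp;

static DecodedOp cache[CACHE_SLOTS];
static unsigned long decode_count;  /* how often the slow path ran */

/* Stand-in for an expensive decoder (toy encoding, not x86):
   byte 0 is the opcode, opcodes >= 0x80 carry a 16-bit
   little-endian immediate. */
static void slow_decode(const uint8_t *mem, uint32_t addr, DecodedOp *out)
{
    decode_count++;
    out->addr = addr;
    out->valid = 1;
    out->opnum = mem[addr];
    if (out->opnum >= 0x80) {
        out->imm = (uint32_t)mem[addr + 1] | ((uint32_t)mem[addr + 2] << 8);
        out->length = 3;
    } else {
        out->imm = 0;
        out->length = 1;
    }
}

/* Returns the decoded form, decoding only on a cache miss. */
DecodedOp *fetch_op(const uint8_t *mem, uint32_t addr)
{
    DecodedOp *e = &cache[addr & (CACHE_SLOTS - 1)];
    if (!e->valid || e->addr != addr)
        slow_decode(mem, addr, e);
    return e;
}

/* Would need to be called whenever guest code is overwritten. */
void flush_op_cache(void)
{
    memset(cache, 0, sizeof(cache));
}

unsigned long get_decode_count(void) { return decode_count; }
```

Fetching the same address twice decodes once; the second fetch is just an index and a compare. Combining this with handler function pointers stored in the cached entry gets close to the dispatch style described earlier.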

but, even as such, most of the time was still going into the main big

something like:

after changing to the function-pointer strategy, it ran a good deal faster,
and most of the running time shifted over to the logic for memory-address

>> the reason I say "almost entirely" above is because, as noted,
>> there are very few compilers which don't generate (at least some)
>> hidden API calls in some cases.
> Many C compilers on smaller systems do API calls for every floating
> point operation. In the MS-DOS days when the x87 processor wasn't
> so common, it wasn't unusual to see self-modifying code. The API
> call would detect that the math processor was present, and then
> patch over the call with the appropriate x87 opcode.

fair enough.

I haven't done much of this sort of SMC. I have done some SMC, but it works
differently and for different reasons.

>> something loosely similar, but typically used within the main codegen,
>> is to preserve all registers, but pass/return values on the stack
> This reminds me of the API call commonly used with MS-DOS to execute
> the appropriate DOS INT (interrupt) call. The interrupt code is the
> second byte of the INT instruction. While one could implement a table
> full of INT instructions, it is commonly implemented by generating the
> instruction in memory and then executing it, pretty much self modifying
> code.

well, in this case, I was mostly dealing with things like 128 bit integers:
(subtract 32 from virtual SP, re-sync real ESP to this point);
movaps [esp+16], xmm5
movaps [esp+0], xmm7
call __int128_add
movaps xmm7, [esp+0]
(add 32 back to virtual SP)
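
For reference, the arithmetic a handler like `__int128_add` has to do on those two stack slots can be sketched in C. The slot layout (little-endian 64-bit halves), the type name `Int128Slot` and the function name `int128_add_slots` are assumptions; the only thing taken from the sequence above is that the operand at [esp+0] is overwritten with the result and the one at [esp+16] is left alone.

```c
#include <stdint.h>

/* Assumed layout of one 16-byte stack slot: low half first. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Int128Slot;

/* slots[0] plays the role of [esp+0]  (left operand, receives the sum),
   slots[1] plays the role of [esp+16] (right operand, unchanged). */
void int128_add_slots(Int128Slot *slots)
{
    uint64_t lo = slots[0].lo + slots[1].lo;
    uint64_t carry = (lo < slots[0].lo);  /* unsigned wraparound means carry out */
    slots[0].hi = slots[0].hi + slots[1].hi + carry;
    slots[0].lo = lo;
}
```

The real handler would be written in ASM and would preserve every register it touches; this only shows the carry propagation between the two halves.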

with something like XMeta, the call would look something more like:
[re-sync ESP]
call _XM_int128_1add__xmm7__xmm5

with the linker causing an appropriate handler to be generated.
other registers may be used within the handler, but all used registers are
to be preserved.
(the downside is that if one wants to call back into C, this means
essentially saving all the scratch, XMM registers, and FPU state, since it
is not certain what the called C function may touch).

sub esp, 524 ;512+12
fxsave [esp]
... ;(call into the C function here)
fxrstor [esp]
add esp, 524

and this is not exactly cheap...

(so, the general idea is to not use XMeta to call into C, but typically
implement the handler logic purely in ASM such as to avoid potentially
expensive full register state saving...).

now, with an ASM extension for XMeta, it would look more like:
xmeta int128_add xmm7, xmm5

which would transform into the call from before.
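
The name transformation can be sketched as follows. The rules are inferred from the single example in this post (prefix `_XM_`, each `_` in a name written as `_1`, each register argument appended after `__`), so the real XMeta mangling may well differ; `xmeta_mangle` and `append_escaped` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Appends src to the NUL-terminated string in dst, writing "_1" for
   each '_' (assumed escape rule). Stops early if dst would overflow. */
static void append_escaped(char *dst, size_t dstsz, const char *src)
{
    size_t n = strlen(dst);
    for (; *src && n + 2 < dstsz; src++) {
        if (*src == '_') { dst[n++] = '_'; dst[n++] = '1'; }
        else             { dst[n++] = *src; }
    }
    dst[n] = '\0';
}

/* Builds the symbol name for "xmeta <name> <args...>".
   outsz must be at least 1. */
void xmeta_mangle(char *out, size_t outsz, const char *name,
                  const char **args, int nargs)
{
    int i;
    snprintf(out, outsz, "_XM_");
    append_escaped(out, outsz, name);
    for (i = 0; i < nargs; i++) {
        size_t n = strlen(out);
        snprintf(out + n, outsz - n, "__");   /* argument separator */
        append_escaped(out, outsz, args[i]);
    }
}
```

With `name = "int128_add"` and `args = {"xmm7", "xmm5"}` this yields `_XM_int128_1add__xmm7__xmm5`, the symbol used in the call shown earlier.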
