Re: Efficient bytecode design and interpretation

anton@mips.complang.tuwien.ac.at (Anton Ertl)
3 Jun 2001 17:04:08 -0400

          From comp.compilers

Related articles
Re: Efficient bytecode design and interpretation jonm@fishwife.cis.upenn.edu (2001-05-30)
Re: Efficient bytecode design and interpretation loewis@informatik.hu-berlin.de (Martin von Loewis) (2001-05-30)
Re: Efficient bytecode design and interpretation eugene@datapower.com (Eugene Kuznetsov) (2001-05-30)
Re: Efficient bytecode design and interpretation korek@icm.edu.pl (2001-05-31)
Re: Efficient bytecode design and interpretation usenet.2001-05-30@bagley.org (Doug Bagley) (2001-05-31)
Re: Efficient bytecode design and interpretation anton@mips.complang.tuwien.ac.at (2001-06-03)
Re: Efficient bytecode design and interpretation anton@mips.complang.tuwien.ac.at (2001-06-03)

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: 3 Jun 2001 17:04:08 -0400
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 01-05-068 01-05-082 01-05-097
Keywords: performance, architecture
Posted-Date: 03 Jun 2001 17:04:08 EDT

  korek@icm.edu.pl writes:
>In article 01-05-082, Eugene Kuznetsov wrote:
>>> [It's been discussed before. My suggestion is that unlike the design for
>>> a physical architecture, there's little added cost to providing zillions
>>> of operators, and the more each operator does the fewer times you go
>>> through the interpreter loop, so a CISC design likely wins. You might
>>> also
>>
>> That's true up to a point -- there are two breakpoints proportional to
>> L1 and L2 instruction cache sizes. That makes a huge difference, and can
>> balance out the advantage of using very many opcodes.
>
>Would you suggest some way (if it is feasible) to detect cache sizes
>before full compilation (as part of a ./configure script)?


You could take a look at lmbench. Cache size also plays a role in
many numerical algorithms, so there may be programs in that area that
configure themselves this way. Instruction and data cache sizes are
the same on most processors, so you may get away with measuring the
data cache size (which is easier).


However, in interpreters the number of I-cache misses is strongly
influenced by the locality of the interpreted programs, so a huge
interpreter may cause few I-cache misses for some programs and many
for others.


Consider the worst case: every VM instruction is used in the
interpreted program statically only once; then (the executed part of)
the interpreter is larger by a constant factor k than the native-code
version of the program (k should be about 2-3 if the interpreter has
many superinstructions). So if a program has few I-cache misses in
native code with a 4KB I-cache, it should have few I-cache misses when
interpreted with a 16KB I-cache, no matter how large the interpreter is.


Usually VM instructions are used several times statically, so the
usual case will be much better. Tom Pittman even once argued that
interpreters will be faster than native code due to this effect.


Anyway, the bottom line is: getting the interpreter plus all support
libraries into the I-cache guarantees that I-cache misses are rare, but
does not necessarily result in the best performance, since you have to
forgo the benefits of having a large number of superinstructions for
that.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html

