Related articles:
Register Based Interpreter Model acampbellb@hotmail.com (Avatar) (2005-11-01)
Re: Register Based Interpreter Model basile-news@starynkevitch.net (Basile Starynkevitch [news]) (2005-11-02)
Re: Register Based Interpreter Model nkavv@skiathos.physics.auth.gr (Uncle Noah) (2005-11-02)
Re: Register Based Interpreter Model christian.mueller@aktivanet.de (2005-11-02)
Re: Register Based Interpreter Model spammer@b2z.com (Omri Barel) (2005-11-02)
Re: Register Based Interpreter Model kurt-gmane@oakadaptive.com (Kurt Jung) (2005-11-02)
Re: Register Based Interpreter Model anton@mips.complang.tuwien.ac.at (2005-11-02)
Re: Register Based Interpreter Model fw@deneb.enyo.de (Florian Weimer) (2005-11-02)
Re: Register Based Interpreter Model lhf@csgpwr1.uwaterloo.ca (2005-11-02)
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: 2 Nov 2005 22:12:56 -0500
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 05-11-016
Keywords: interpreter, VM, bibliography
Posted-Date: 02 Nov 2005 22:12:56 EST
"Avatar" <acampbellb@hotmail.com> writes:
>I am doing some research on the register-based interpreter model in
>comparison to the more traditional stack-based approach. I would like
>to know if anyone has had any practical experience with register-based
>interpreters
There are some register-based VM interpreters, and their implementors
should have practical experience and have sometimes written about it.
Some that come to mind are Inferno [winterbottom&pike97], Parrot, and
IIRC the Lua VM. The last two are especially interesting because the
language was originally implemented with a stack-based VM and later
converted to a register-based VM. Moreover, there is the WAM for
implementing Prolog [vanroy94].
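To make the basic difference concrete, here is a minimal sketch of the two models evaluating the same statement "a = b + c". The opcode names and encodings are invented for illustration and are not taken from any of the VMs mentioned above; the point is only the dispatch count.

```c
#include <assert.h>

/* Stack VM: "a = b + c" compiles to four instructions plus HALT,
   so five dispatches.  Operand locations are implicit in the stack. */
enum { S_LOAD_B, S_LOAD_C, S_ADD, S_STORE_A, S_HALT };

/* Register VM: one three-operand instruction plus HALT,
   so two dispatches.  Operands are named explicitly in the code. */
enum { R_ADD, R_HALT };

int run_stack(int b, int c) {
    static const int code[] = { S_LOAD_B, S_LOAD_C, S_ADD, S_STORE_A, S_HALT };
    int stack[8], sp = 0, a = 0, dispatches = 0;
    for (const int *ip = code; ; ip++) {
        dispatches++;
        switch (*ip) {
        case S_LOAD_B:  stack[sp++] = b; break;
        case S_LOAD_C:  stack[sp++] = c; break;
        case S_ADD:     sp--; stack[sp-1] += stack[sp]; break;
        case S_STORE_A: a = stack[--sp]; break;
        case S_HALT:    assert(dispatches == 5); return a;
        }
    }
}

int run_reg(int b, int c) {
    /* instruction format: opcode, dst, src1, src2 */
    static const int code[] = { R_ADD, 1, 2, 3, R_HALT };
    int regs[4] = { 0, 0, b, c };   /* r2 = b, r3 = c */
    int dispatches = 0;
    for (const int *ip = code; ; ) {
        dispatches++;
        switch (*ip) {
        case R_ADD:  regs[ip[1]] = regs[ip[2]] + regs[ip[3]]; ip += 4; break;
        case R_HALT: assert(dispatches == 2); return regs[1];
        }
    }
}
```

The register version trades the extra dispatches for explicit operand fetches (ip[1]..ip[3]), which is exactly the load/dispatch trade-off quantified in [davis+03] below.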
However, I still believe that, if you want good performance, a hybrid
stack-based VM (hybrid because accesses to local variables are usually
indexed) is a good idea, because:
- The main advantage of register VMs is fewer dispatches. If you
really want performance, you use dynamic superinstructions (aka
selective inlining) [piumarta&riccardi98,ertl&gregg03], which bring
the number of dispatches to the same level (one dispatch per control
flow change), eliminating the advantage gained from register VMs.
- It's easier to keep VM stack items in real-machine registers than VM
registers. Keeping one stack item in real-machine registers is
almost trivial, and can be very beneficial (factor 2 speedup on
PPC970). For more complex schemes, see our work on stack caching
[ertl&gregg04ivme,ertl&gregg05].
- Stack-based VMs are simpler to implement, leaving more time for
implementing highly effective optimizations like dynamic
superinstructions.
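The one-item scheme from the second point can be sketched in a few lines of C: the top of stack lives in a local variable, which the C compiler can keep in a machine register, so a binary operation does one memory access instead of three. The opcodes here are invented for illustration, not taken from Gforth or any other system.

```c
#include <assert.h>

enum { I_LIT, I_ADD, I_MUL, I_HALT };

/* Interpreter with the top of stack cached in `tos`.  I_ADD reads one
   item from memory and writes none; without the cache it would read
   two items and write one.  Note that the first I_LIT spills the
   (unused) initial tos, leaving one garbage slot at the stack bottom,
   which is the usual price of this scheme. */
int run_tos(const int *code) {
    int stack[32];
    int *sp = stack;        /* first free slot below the cached item */
    int tos = 0;
    for (const int *ip = code; ; ) {
        switch (*ip++) {
        case I_LIT:  *sp++ = tos; tos = *ip++; break;  /* spill old tos */
        case I_ADD:  tos += *--sp; break;              /* one memory read */
        case I_MUL:  tos *= *--sp; break;
        case I_HALT: return tos;
        }
    }
}
```

For example, (2 + 3) * 4 compiles to LIT 2, LIT 3, ADD, LIT 4, MUL, HALT; the multi-state schemes in [ertl&gregg04ivme,ertl&gregg05] generalize this to several cached items.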
>or has any good suggestions for references about this
>topic?
Well, this has been discussed here several times, so you can look up
the old discussions. There is one paper that does a direct empirical
comparison [davis+03] (a journal version of this paper appeared
recently in Science of Computer Programming).
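Dynamic superinstructions concatenate native code at run time, which is more than a short sketch can show, but the static variant already illustrates where the dispatch saving comes from: a fused VM instruction replaces a common pair. The opcodes below are invented for illustration.

```c
#include <assert.h>

enum { OP_LIT, OP_ADD, OP_LIT_ADD, OP_HALT };

/* OP_LIT_ADD is a static superinstruction fusing the LIT/ADD pair
   into one VM instruction, saving one dispatch per occurrence.
   `ndisp` returns the dispatch count so the saving is observable. */
int run_super(const int *code, int *ndisp) {
    int stack[32], sp = 0;
    *ndisp = 0;
    for (const int *ip = code; ; ) {
        (*ndisp)++;
        switch (*ip++) {
        case OP_LIT:     stack[sp++] = *ip++; break;
        case OP_ADD:     sp--; stack[sp-1] += stack[sp]; break;
        case OP_LIT_ADD: stack[sp-1] += *ip++; break;   /* fused pair */
        case OP_HALT:    return stack[sp-1];
        }
    }
}
```

Running LIT 2, LIT 3, ADD, HALT costs four dispatches; the fused LIT 2, LIT_ADD 3, HALT costs three for the same result. Applied per basic block at run time, as in [piumarta&riccardi98,ertl&gregg03], this gets down to one dispatch per control-flow change.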
@InProceedings{davis+03,
author = {Brian Davis and Andrew Beatty and Kevin Casey and
David Gregg and John Waldron},
title = {The Case for Virtual Register Machines},
booktitle = {Interpreters, Virtual Machines and Emulators
(IVME~'03)},
pages = {41--49},
year = {2003},
url1 = {http://www.complang.tuwien.ac.at/anton/ivme03/proceedings/davis.ps.gz},
url2 = {http://portal.acm.org/ft_gateway.cfm?id=858575&type=pdf},
abstract = {Virtual machines (VMs) are a popular target for
language implementers. Conventional wisdom tells us
that virtual stack architectures can be implemented
with an interpreter more efficiently, since the
location of operands is implicit in the stack
pointer. In contrast, the operands of register
machine instructions must be specified
explicitly. In this paper, we present a working
system for translating stack-based Java virtual
machine (JVM) code to a simple register code. We
describe the translation process, the complicated
parts of the JVM which make translation more
difficult, and the optimisations needed to eliminate
copy instructions. Experimental results show that a
register format reduces the number of executed
instructions by 34.88%, while increasing the number
of bytecode loads by an average of 44.81%. Overall,
this corresponds to an increase of 2.32 loads for
each dispatch removed. We believe that the high cost
of dispatches makes register machines attractive
even at the cost of increased loads.}
}
@InProceedings{winterbottom&pike97,
author = "Phil Winterbottom and Rob Pike",
title = "The Design of the {Inferno} Virtual Machine",
OPTeditor = "{IEEE}",
booktitle = "Hot Chips {IX}: Stanford University, Stanford,
California, August 24--26, 1997",
publisher = "IEEE Computer Society Press",
OPTaddress = "1109 Spring Street, Suite 300, Silver Spring, MD
20910, USA",
year = "1997",
ISBN = "????",
OPTpages = "??--??",
bibdate = "Mon Jan 08 16:33:30 2001",
acknowledgement = ack-nhfb,
}
@InProceedings{piumarta&riccardi98,
author = {Ian Piumarta and Fabio Riccardi},
title = {Optimizing Direct Threaded Code by Selective
Inlining},
crossref = {sigplan98},
pages = {291--300},
url = {ftp://ftp.inria.fr/INRIA/Projects/SOR/papers/1998/ODCSI_pldi98.ps.gz},
annote = {They reduce the overhead of a direct threaded
interpreter by combining all the VM instructions in
a basic block into a single virtual machine
instruction. This is done by simply concatenating
the machine code for the virtual machine
instructions (except for the Next code). Multiple
instances of the same sequence are just translated
once, and then reused. They evaluated this technique
empirically in the context of a fine-grained
RISC-like VM, and in an Objective Caml
interpreter. The speedup over plain direct threaded
code for the RISC-like VM is a factor
1.33--2.06. For the Caml VM the speedups varied
with benchmark and processor, from 1 to 2.2. The
code expansion ranges from 2.2--4 for the Sparc,
with larger benchmarks having less expansion (due to
more reuse of sequences). Implementing this
technique on the Caml VM took only one day.}
}
@InProceedings{ertl&gregg03,
author = "M. Anton Ertl and David Gregg",
title = "Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters",
crossref = "sigplan03",
OPTpages = "",
url = "http://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz",
abstract = "Interpreters designed for efficiency execute a huge
number of indirect branches and can spend more than
half of the execution time in indirect branch
                  mispredictions. Branch target buffers are the best
                  widely available form of indirect branch prediction;
                  however, their prediction accuracy for existing
                  interpreters is only 2\%--50\%. In this paper we
investigate two methods for improving the prediction
accuracy of BTBs for interpreters: replicating
virtual machine (VM) instructions and combining
sequences of VM instructions into superinstructions.
We investigate static (interpreter build-time) and
dynamic (interpreter run-time) variants of these
techniques and compare them and several combinations
of these techniques. These techniques can eliminate
nearly all of the dispatch branch mispredictions,
and have other benefits, resulting in speedups by a
factor of up to 3.17 over efficient threaded-code
interpreters, and speedups by a factor of up to 1.3
over techniques relying on superinstructions alone."
}
@Proceedings{sigplan03,
booktitle = "SIGPLAN '03 Conference on Programming Language
Design and Implementation",
title = "SIGPLAN '03 Conference on Programming Language
Design and Implementation",
year = "2003",
key = "SIGPLAN '03"
}
@InProceedings{ertl&gregg04ivme,
author = {M. Anton Ertl and David Gregg},
title = {Combining Stack Caching with Dynamic
Superinstructions},
crossref = {ivme04},
pages = {7--14},
URL = {http://www.complang.tuwien.ac.at/papers/ertl%26gregg04ivme.ps.gz},
abstract = {Dynamic superinstructions eliminate most of the
interpreter dispatch overhead. This results in a
higher proportion of interpreter time spent in stack
accesses (on a stack-based virtual machine). Stack
caching reduces the stack access overhead. Each of
these optimizations provides more speedup, if the
other one is applied, too. Combining these
optimizations also opens additional opportunities:
we can insert stack state transitions without
dispatch cost; this reduces the number of necessary
VM instruction instances significantly. A
shortest-path search can find the optimal sequence
of state transitions and VM instructions. In this
paper we describe an implementation of static stack
                  caching employing these ideas. We also present
                  empirical results for our implementation, resulting
in a speedup of up to 58\% over a version that keeps
one value in registers all the time.}
}
@Proceedings{ivme04,
booktitle = {Interpreters, Virtual Machines and Emulators (IVME '04)},
title = {Interpreters, Virtual Machines and Emulators (IVME '04)},
year = {2004}
}
@InProceedings{ertl&gregg05,
author = {M. Anton Ertl and David Gregg},
title = {Stack Caching in {Forth}},
crossref = {euroforth05},
pages = {6--15},
url = {http://www.complang.tuwien.ac.at/anton/euroforth2005/papers/ertl%26gregg05.ps.gz},
OPTnote = {not refereed},
abstract = {Stack caching speeds Forth up by keeping stack items
in registers, reducing the number of memory accesses
for stack items. This paper describes our work on
extending Gforth's stack caching implementation to
support more than one register in the canonical
state, and presents timing results for the resulting
Forth system. For single-representation stack
caches, keeping just one stack item in registers is
                  usually best, and provides speedups up to a factor
                  of 2.84 over the straightforward stack
representation. For stack caches with multiple stack
representations, using the one-register
representation as canonical representation is
usually optimal, resulting in an overall speedup of
up to a factor of 3.80 (and up to a factor of 1.53
over single-representation stack caching).}
}
@Proceedings{euroforth05,
title = {21st EuroForth Conference},
booktitle = {21st EuroForth Conference},
year = {2005},
key = {EuroForth'05},
url = {http://www.complang.tuwien.ac.at/anton/euroforth2005/papers/proceedings.pdf}
}
@Article{vanroy94,
author = {Van Roy, Peter},
title = {1983--1993: The Wonder Years of Sequential Prolog Implementation},
journal = {Journal of Logic Programming},
year = 1994,
volume = {19,20},
pages = {385--441},
  url1 = {ftp://ftp.digital.com/pub/DEC/PRL/research-reports/PRL-RR-36.ps.Z},
  url2 = {http://www.ps.uni-sb.de/Papers/abstracts/SequentialPrologImp.html}
}
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html