Re: General byte-codes reference email@example.com (2000-12-31)
From: firstname.lastname@example.org (Anton Ertl)
Date: 31 Dec 2000 03:03:02 -0500
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Posted-Date: 31 Dec 2000 03:03:01 EST
Norman Culver <Norman_member@newsguy.com> writes:
>It is possible to fit an entire interpreter into the L1
>cache (64 KB) of a 1 Ghz AMD but it won't fit into the 16 KB cache of
>a Pentium III.
That depends on the interpreter. E.g., the Gforth engine for the 386
architecture currently uses 16238 bytes. It contains more than 300
primitives; only a few of these are used frequently, see
I.e., 77 primitives make up the top 99% of the dynamic executions of
primitives, and in the benchmarks that this data is based on, only 152
of the primitives were actually executed. So the frequently-executed
part of the interpreter easily fits into 16KB.
Note that this is cumulative data over three benchmarks; the working
set for a stretch of time in one benchmark run will be biased towards
even fewer primitives.
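To make the "top 99%" figure concrete, here is a minimal sketch of how
such a hot set could be computed from per-primitive execution counts.
The function names, the primitive names, the counts and the code sizes
are all made up for illustration; they are not Gforth's measured data.

```python
def hot_set(counts, coverage=0.99):
    """Return the primitives, most-executed first, whose combined
    executions reach `coverage` of all dynamic executions."""
    total = sum(counts.values())
    if total == 0:
        return []
    hot, running = [], 0
    # Walk the primitives from most to least frequently executed.
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        hot.append(name)
        running += n
    return hot


def hot_set_bytes(hot, sizes):
    """Sum the native-code sizes (in bytes) of the primitives in `hot`."""
    return sum(sizes[name] for name in hot)


# Hypothetical execution counts and code sizes, for illustration only.
counts = {"lit": 500, "@": 300, "+": 150, "branch": 45, "f+": 4, "getrusage": 1}
sizes = {"lit": 12, "@": 10, "+": 9, "branch": 14, "f+": 20, "getrusage": 60}

hot = hot_set(counts, 0.99)
print(hot, hot_set_bytes(hot, sizes))
```

Comparing the byte total of the hot set against the I-cache size gives
a rough idea of whether the frequently-executed part fits, although it
ignores alignment, conflict misses and any library code that is called.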
Now, some of the primitives call library routines, and you might want
to include them in the interpreter size; however, there are only two
primitives in the top 99% that do calls (to long division routines and
to fwrite), and these account for <0.6% of the executed primitives.
You may wonder what the other primitives are for: Many are for stuff
that you can do only as primitives in Gforth, even though they may be
rarely used, like performing the getrusage system call; many others
are for stuff that is not used in these benchmarks, e.g., FP.
Currently I cannot offer performance counter results for Gforth on the
Pentium-III, but the timings I have done indicate that the Pentium-III
is about as fast on Gforth as the Athlon of similar clock frequency,
for both small and large benchmarks. So the larger L1 cache does not
seem to give an advantage to the Athlon.
Looking at other people's work, [romer+96] show I-cache miss
cycles on a 21064 (8KB I-cache) for several interpreters; for the
MIPSI interpreter the I-cache miss cycles are less than 5% of the
total cycles for all benchmarks.
>The choice of byte codes is highly dependent upon the CPU architecture
Why do you think so?
M. Anton Ertl Some things have to be seen to be believed
email@example.com Most things have to be believed to be seen