
Newsgroups: comp.compilers
From: chase@Think.COM (David Chase)
Keywords: courses
Organization: Thinking Machines Corporation, Cambridge MA, USA
References: 94-05-018
Date: Mon, 9 May 1994 23:05:32 GMT

Only half a lecture on optimization, eh? (just kidding.)

I think what might be missing from your list (maybe needed as a
prerequisite, or perhaps implicit in the topic "what is a compiler") is
some notion of "equivalent program" and what things are irrelevant to
equivalence. Note that there are at least two common notions of
equivalence -- "I/O equivalence" (any two programs with the same inputs,
outputs, and termination behavior are equivalent) and
"debugger-observable equivalence" (any two programs which perform the
same set of steps are equivalent).

It should be clear by now (well, given acceptably good compilers) that
there's no difference among "a++;", "a+=1;", and "a=a+1;". It should be
clear by now that "register" is a noise word in C. It should be clear
that a compiler ought to be free to move things in and out of memory, and
to reuse memory, and to heap-allocate stack frames if that suits the
implementor's purposes. It should be clear that a compiler can convert a
switch statement into a linear sequence of tests, or a binary search tree,
or a jump table, or a hashed lookup table, depending upon the speed of the
various operations in the underlying architecture.

The reason I suggest this is that people often understand their programs
(or worse, their programming language) at "too low a level", which tends
to limit their idea of what it is a compiler can do. At times, I've seen
people get into a tail-wagging-dog mode where "the hardware does things
thusly (e.g., division) and thus the language must surely do what the
hardware does." At times this reaches the level of language standards --
signed integer division depends on an implementation-defined choice "lest
existing fast code be gravely slowed" (Rationale for the ANSI C standard
-- and just how fast was integer division running in 1989, anyway?)

=======================================================

By the way, the switch example makes a decent exercise on picky code
generation for superscalar machines (recall recent discussions of the
goodness or badness of "arrays of labels").

For SuperSparc, for instance, you get:

linear tests, test value in %o0, assume small integer cases:

        cmp   %o0,K1
        be    case1        ! cycle 1
        cmp   %o0,K2
        be    case2        ! cycle 2
        ...
        cmp   %o0,KN
        be,a  caseN        ! cycle N
        delay inst
        <default>

This costs one cycle per branch, plus one for the (useless) delay slot
instruction when you finally branch. If you assume a uniform distribution
of cases, and none to the default, then N cases take (N+3)/2 cycles, on
average, to get you into the case code (reaching case k costs k cycles
plus the delay slot, and k averages (N+1)/2).

indexed table:

        sub   %o0,K0,%o0          ! Normalize to zero
        sll   %o0,2,%g1           ! (cyc 1)
        cmp   %o0,KN
        sethi %hi(table),%g2
        bgu   default             ! Use unsigned comparison hack (cyc 2)
        add   %g2,%lo(table),%g2  ! (cyc 3)
        ld    [%g1+%g2],%g1       ! (cyc 4)
        ! pipeline bubble, I think  (cyc 5)
        jmp   %g1                 ! (cyc 6)
        nop                       ! Useless delay slot (cyc 7)

So, this runs faster (assuming uniform distribution, and no branches to
the default) in 7 cycles. For the linear case, you could do 10 tests in 7
cycles. On the other hand, this is much faster going into the default
case. (These numbers get worse for position-independent code.) (You can
burn table space and avoid the normalize to zero if it thrills you, but it
only buys you about 1/2 cycle.)

binary search (assume cases 0-13, so N=14):

        cmp   %o0,8
        bge   GE8                 ! (cyc 1)
        <linear tests for 0-7 (8 cases)>
GE8:
        <linear tests for 8-13 (6 cases)>

In this case, the costs are

        [8*(1 + 11/2) + 6*(2 + 9/2)] / 14

or 91/14 = 6.5 cycles. In general, that is

        [M*(1 + T(M)) + (N-M)*(2 + T(N-M))] / N,

where T(X) is the time required to do an X-case switch by whatever method
suits you. The "1+" is the fall-through portion of the two-way split, the
"2+" is the branch-taken portion of the two-way split.

So, given uniform distribution, no defaults taken (but still compiled for,
since that is what the language requires), we find that linear search
takes 7 cycles at N=11, jump tables always take 7 cycles, and this simple
two-way split reaches 7 cycles at N=16 (an 8-and-8 split costs
[8*(1 + 11/2) + 8*(2 + 11/2)]/16 = 112/16 = 7).

Given this model, the even break between linear and binary search comes at
N=6, assuming that you split it into either 4 and 2 or 3 and 3:

        T(4) = 7/2, T(3) = 3, T(2) = 5/2, T(6) = 9/2
        [4*(1 + 7/2) + 2*(2 + 5/2)]/6 = 27/6 = 9/2
        [3*(1 + 3) + 3*(2 + 3)]/6 = 27/6 = 9/2.

On the other hand, if you use a deeper binary tree, you can get an
improved result for larger numbers. For example, if you take N=19, and
first split it into 12 and 7, and then split 7 into 4 and 3, and 12 into 7
and 5, you get:

        T(7)  = [4 + 4*T(4) + 6 + 3*T(3)]/7   = (4+14+6+9)/7    = 33/7
        T(5)  = (5+3)/2 = 4
        T(12) = [7 + 7*T(7) + 10 + 5*T(5)]/12 = (7+33+10+20)/12 = 70/12
        T(19) = [12 + 12*T(12) + 14 + 7*T(7)]/19
              = (12+70+14+33)/19 = 6.79,

which is slightly better than the table lookup. Thus, when compiling
switches for the SS10, if you believe that the default is rarely taken
(i.e., if you are nuts), from 1 to 5 cases are treated with linear tests,
from 6 to 19 are treated with binary search, and 20 and more use table
lookup (assuming that they are dense).

NOW, if you assume a certain percentage of default cases, that changes
everything. OR, if you have any profiling feedback at all, that changes
everything.

Treating the hashed case is more entertaining yet, since the time spent
hashing is in competition with the binary search case (which, if you
haven't noticed by now, is more or less a Fibonacci tree). The closed
form for the time spent executing one of these cases is a little messy,
both because they are Fibonacci trees, not binary trees, and because they
have these little linear tails.

This, by the way, is code generation at an extraordinarily picky low
level. There's higher-level stuff that you could do, too. Constant
propagation is a good introduction. I'm a big fan of the SSA framework,
myself.

David Chase, speaking for myself

Thinking Machines Corp.
