Mon, 29 Dec 2014 13:30:10 -0800 (PST)

Related articles |
---|

Writing A Plain English Compiler gerry.rzeppa@pobox.com (Gerry Rzeppa) (2014-11-04) |

Fundamentals for a Compiler and Language (was: Writing A Plain English federation2005@netzero.com (2014-12-29) |

From: | federation2005@netzero.com |

Newsgroups: | comp.compilers |

Date: | Mon, 29 Dec 2014 13:30:10 -0800 (PST) |

Organization: | Compilers Central |

References: | 14-11-004 |

Injection-Date: | Mon, 29 Dec 2014 21:30:10 +0000 |

Keywords: | design |

Posted-Date: | 29 Dec 2014 18:09:55 EST |

On Tuesday, November 4, 2014 10:27:29 PM UTC-6, Gerry Rzeppa wrote:

*> [And with that, this thread is over unless someone has something*

*> to say that's related to compilers. Thanks, all. -John]*

*> Exactly how does one go about writing a Plain English compiler?*

What you should be focusing on (and to sorta abide by the moderator's wish) is

what goes on in the semantics, and on the back end more so than how the

language is presented to someone.

That means that the "fundamentals"

*> 1. Type definitions;*

*> 2. Global Variable definitions;*

*> 3. Routine Headers;*

*> 4. Conditional Commands; and*

*> 5. Unconditional Commands.*

have to be properly addressed; and first you have to ask whether these really

are the fundamentals, and if not, then what they should be.

In fact, you can get away with just the following for the procedural element:

* the comma statement A, B (do A, discard its results, then do B [with

results])

* the conditional A? B: C. The comma can be treated as a special case A, B

= A? B: B.

* recursive definitions of continuation or call-return segments. Doing

things by continuations is preferrable since you can define call-return

semantics in terms of it, but not so easily the other way around.

* the ability to reference continuations (i.e. gotos).

Fortran actually headed in that direction early on (as did the scripting

language DC), but decided to fall back in more recent years.

The distinction between locals and globals has to go one level deeper: to

include also the distinction between thread-local and shared.

Finally, as for type systems: it appears that contemporary languages have been

progressively trying to incorporate more and more of the Curry-Howard

isomorphism (or "Type-Proposition" correspondence), and ... in particular ...

the various modifications that try to this correspondence to include

quantifiers in predicate logic (which then requires indexed or parametrized

types). These are embodied by the various higher-order type theories that have

cropped up since Carnac and Goedel (Goedel's Dialectica interpretation,

Martin-Loef, Carnac's systems B and C, Lambek/Scott's formalism). I think C++

was moving in the direction of Hindley-Milnor type theory.

This aspect of the language design -- which is (in fact) the aspect that bears

the greatest affinity to natural language (think Montague semantics) -- is

highly non-trivial. This is borne witness by the large number of ways that

different language have tried to approach this issue.

In particular, what's non-trivial about it is that objects which can be named

and addressed can be contained in or contain other objects. Fortran actually

went a step further allowing *overlapping* objects (by way of the equivalence

statement). Languages like C++, in contrast, seem to shy away from this.

The Lambda operator *can* be incorporate *seamlessly* into any of these

languages provided you recognize the continuation for what it really is: an

infinitary lambda expression (i.e. an expression with an infinitary parse

tree.) So, for instance, the cyclic statement (while (E) S) followed by a

continuation Q would correspond to the infinitary expression E? (S (E? S (E? S

(...) Q) Q) Q, which is called "rational" since it has only a finite number of

distinct subexpressions. A goto label is simply a label of a subexpression,

and a goto statement is just a compact reference to the labelled

subexpression.

This works quite well with continuation semantics, since the call and return

are already factored out by the time you get to this semantics. All you have

are jumps. So, I won't belabor how you represent calls and returns in

continuation form.

The correspondence goes all the way back to Landin, via the following

equivalence for assignments to simple variables (x = A, Q) <-> (lambda x.Q) A,

where Q is a continuation. But Landin never incorporated the above-mentioned

insight on cyclic control-flow structures, so the whole programme languished

in the 1960's.

Ultimately, the correspondence is of the following form: to each statement S

is a context S[] (i.e. an infinitary expression with a missing subexpression)

such that S[Q] is the result of applying Q after executing S. Thus, for

instance, the context corresponding to the simple assignment statement x = A

is just (lambda x.[]) A.

But the non-trivial part: the type system, the objects residing within it and

pointers. What's non-trivial about is what goes on with the object-subobject

hierarchy; i.e. what does lambda V[I].Q mean, for instance, for the I element

of the vector V? There is, in fact, an facility native to the lambda calculus

for addressing this very issue; the equivalence

lambda V[I].Q = (V[I] = [], Q)

= (V = (lambda J.(I == J? []: V[J])), Q)

= (lambda V.Q) (lambda J.(I == J? []: V[J]))

which effectively treats the structured object as a function applied to its

"designators" (here, the index [I]) and the assignment as a partial update to

the function itself.

A good translation method will be one that treats *all* instances equivalent

to this type of statement the same (no matter whether they come from an

assignment to a designated subobject or not) while avoiding any unnecessary

structure-wide copying of unchanged elements.

A proper account of how pointers fit into this meshes completely with the

question of how the inheritance and object/subobject hierarchies are defined

in the language. To each object type T, the corresponding pointer operator *

is actually a universal object (i.e. the type-cast pointer operator *(T *)) so

that if P is a pointer variable, then P is the designator that bears the same

relation to *P that the index I bears to component V[I]. This fits snugly into

the type/subtype hierarchy in such a way that if T' is a derived from T, then

the operator *(T' *) is a subobject of *(T *).

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.