Re: extensible compilers

"N. D. Culver" <ndc@alum.mit.edu>
24 Mar 1998 22:37:55 -0500

          From comp.compilers

Related articles
extensible compilers mtrofimov@glas.apc.org (1998-03-18)
Re: extensible compilers ichisugi@etl.go.jp (ICHISUGI Yuuji) (1998-03-20)
Re: extensible compilers pardo@cs.washington.edu (1998-03-20)
Re: extensible compilers bruce@cenderis.demon.co.uk (Bruce Stephens) (1998-03-22)
extensible compilers mtrofimov@glas.apc.org (1998-03-22)
Re: extensible compilers ndc@alum.mit.edu (N. D. Culver) (1998-03-24)
RE: extensible compilers vic@paragraph.com (Zhukov, Victor) (1998-03-30)
Re: extensible compilers ndc@alum.mit.edu (N. D. Culver) (1998-04-03)

From: "N. D. Culver" <ndc@alum.mit.edu>
Newsgroups: comp.compilers
Date: 24 Mar 1998 22:37:55 -0500
Organization: Atlantic Biomedical Engineering
References: 98-03-212
Keywords: design

I think that the way to do an extensible compiler/language is as
follows:


ASSUMPTION: a complete AST is retained after the parse and is
processed by:


0. A multi-pass front end that munches
      on the AST described in 1. below and
      emits something to a code-generating
      back end.
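A minimal Python sketch of the step-0 pipeline, assuming a toy Node class and stub passes; every name here is invented for illustration, not from the post:

```python
# Hypothetical sketch of the multi-pass front end: each pass walks the
# canonical AST and annotates or lowers it, and the last pass hands
# nodes to a code-generating back end.

class Node:
    def __init__(self, kind, children=(), **attrs):
        self.kind = kind
        self.children = list(children)
        self.attrs = attrs

def resolve_names(ast):   # pass 1: bind identifiers (stub)
    return ast

def check_types(ast):     # pass 2: attach type info (stub)
    return ast

def emit(ast, out):       # final pass: preorder walk feeds the back end
    out.append(ast.kind)
    for c in ast.children:
        emit(c, out)
    return out

def front_end(ast):
    for p in (resolve_names, check_types):
        ast = p(ast)
    return emit(ast, [])

tree = Node("assign", [Node("var", name="x"), Node("lit", value=1)])
print(front_end(tree))  # ['assign', 'var', 'lit']
```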


1. Invent a set of AST nodes and branches
      that encompass all (or most?) of the semantics
      embodied in the computer languages we know.
      This would be a nice research project for
      a PhD candidate. How many computer language
      semantic categories have been produced in the
      last 50 years? I would guess that there would
      be fewer than 200 main node types.
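As a toy illustration of the inventory step 1 asks for, here is a handful of invented canonical node kinds; the real taxonomy would take the survey work the post describes:

```python
# A toy slice of the "canonical node" inventory. These names are
# invented for illustration, not a proposed standard.
from enum import Enum, auto

class NodeKind(Enum):
    # declarations
    MODULE = auto(); FUNC_DECL = auto(); VAR_DECL = auto(); TYPE_DECL = auto()
    # control flow
    IF = auto(); LOOP = auto(); CALL = auto(); RETURN = auto()
    # data and expressions
    ASSIGN = auto(); BINOP = auto(); LITERAL = auto(); NAME = auto()

print(len(NodeKind))  # 12 kinds here; the post guesses < 200 overall
```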


2. Use a parser table generator that produces
      tables that include encoded actions that emit
      intermediate codes; Lrgen50 by Paul Mann comes
      to mind.


3. Write a parser engine that will save up the
      intermediate codes in a buffer and then proceed
      to process the codes when the original source
      is exhausted. The grammar for the codes would have
      been added onto the grammar for the language
      so that the parser tables encode both languages.
      It is trivial to separate the two by simply using
      language-illegal characters to identify the
      intermediate codes.
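The buffering trick in steps 2 and 3 might look like the following sketch, with '\x01' standing in for a language-illegal marker character; the token handling is a simplified assumption, not LRgen output:

```python
# Parser actions append intermediate codes to a buffer; each code
# starts with a character illegal in the source language, so a
# combined grammar can tell the two streams apart.

MARK = "\x01"          # language-illegal prefix for intermediate codes

def parse(tokens, code_buffer):
    """Consume source tokens (postfix here); actions emit marked codes."""
    for tok in tokens:
        if tok.isdigit():
            code_buffer.append(MARK + "PUSH " + tok)
        elif tok == "+":
            code_buffer.append(MARK + "ADD")
    return code_buffer

def process_codes(code_buffer):
    """Once the source is exhausted, process the buffered codes."""
    stack = []
    for code in code_buffer:
        assert code.startswith(MARK)   # a code, not source text
        op, *arg = code[1:].split()
        if op == "PUSH":
            stack.append(int(arg[0]))
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

buf = parse(["2", "3", "+"], [])
print(process_codes(buf))  # 5
```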


4. Invent a set of codes that can tell the parser
      engine to restructure and rename the AST nodes
      so that they can be adjusted from the natural
      form that the language grammar imposes into the
      'canonical' form that the front end in 0. can
      deal with.
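A sketch of step 4's restructure/rename codes, applied bottom-up to a toy tuple-based AST; the command names RENAME and HOIST are hypothetical:

```python
# Apply a tiny command set that bends the grammar's natural tree shape
# toward the canonical form: RENAME changes a node's kind, HOIST
# replaces a single-child wrapper node with its child.

def apply_codes(node, codes):
    kind, children = node
    children = [apply_codes(c, codes) for c in children]
    for cmd in codes:
        if cmd[0] == "RENAME" and kind == cmd[1]:
            kind = cmd[2]                      # RENAME old new
        elif cmd[0] == "HOIST" and kind == cmd[1] and len(children) == 1:
            return children[0]                 # drop the wrapper node
    return (kind, children)

# e.g. collapse a single-child 'expr' wrapper and rename 'stmt_list'
tree = ("stmt_list", [("expr", [("add", [])])])
codes = [("HOIST", "expr"), ("RENAME", "stmt_list", "block")]
print(apply_codes(tree, codes))  # ('block', [('add', [])])
```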


5. After you get the grammar for the language working
      you would add code emission actions to the productions
      of the language. You would then write the grammar
      for the codes (which could have productions that emit
      codes ...) so that eventually the AST would be bent
      into the 0. form. I suspect that most of the type
      system could be built at this time.
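Since step 5 allows code productions that themselves emit further codes, processing continues until nothing more fires; a sketch of that fixed-point loop, with invented rewrite rules:

```python
# Run hypothetical code-level rewrites to a fixed point: each rule may
# introduce codes that other rules consume on a later iteration.

RULES = {"NEG NEG": "", "SUB": "NEG ADD"}   # invented example rewrites

def rewrite_to_fixpoint(codes):
    """Apply rewrites until no rule fires (the code stream has settled)."""
    changed = True
    while changed:
        changed = False
        text = " ".join(codes)
        for old, new in RULES.items():
            if old in text:
                text = text.replace(old, new, 1)
                changed = True
        codes = text.split()
    return codes

print(rewrite_to_fixpoint(["SUB"]))  # ['NEG', 'ADD']
```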


6. Bingo: you have a very versatile system that
      embodies ALL of the variability in one grammar
      module, so it should be reasonably extensible by
      the learned, or by the not-so-learned if they are
      given some helpful graphical tools.


NOTE: The grammar that I am describing is not of the yacc or pccts or
JavaCC kind, which embeds C code or 'object-oriented' callouts in the
middle of the EBNF. IMHO that format is much too messy and hard to
read if you wish to accomplish step 5 above.


Norman Culver
[IMP72 did something like that 25 years ago. It used an Earley parser
so it could tolerate ambiguous syntax and let you add syntax on the
fly. Semantics were either expressed as macros or as calls to the
internal semantic routines. It worked, but had two big problems: one
was the "every program in a different language" problem, the other was
that it was very hard to do any optimization if you couldn't make any
assumptions about the source language. I suspect the optimization
problem would be more tractable now, but the syntax problem wasn't a
technical one. -John]



