Re: A Compiler for a Functional language ?

Ray Dillinger <bear@sonic.net>
6 Apr 2002 23:43:54 -0500

          From comp.compilers

Related articles
A Compiler for a Functionel language ? ztig@mail.tele.dk (Peter Hansen) (2002-03-31)
Re: A Compiler for a Functional language ? sarah@telergy.com (2002-04-06)
Re: A Compiler for a Functional language ? haberg@matematik.su.se (2002-04-06)
Re: A Compiler for a Functional language ? gdm@gedamo.demon.co.uk (George Morrison) (2002-04-06)
Re: A Compiler for a Functional language ? bear@sonic.net (Ray Dillinger) (2002-04-06)
Re: A Compiler for a Functional language ? journeyman@compilerguru.com (2002-04-06)
| List of all articles for this month |

From: Ray Dillinger <bear@sonic.net>
Newsgroups: comp.compilers
Date: 6 Apr 2002 23:43:54 -0500
Organization: Compilers Central
References: 02-03-179
Keywords: functional
Posted-Date: 06 Apr 2002 23:43:54 EST

Peter Hansen wrote:
> I am currently trying to create a compiler / interpreter for a Functionel
> language. However i seem to be getting stuck right after having created the
> abstract / concrete syntax of my little language. Can anyone tell me what
> the next step is going to be, because i can't seem to figure out where to go
> next.


Well, the next thing you need to do is decide what the program's
running image looks like in memory. Often this will be something like
a "root set" block of memory (which is pointed to by one of the
registers) that contains pointers to blocks of machine code and
pointers to blocks of data. For C-like (simply stack disciplined)
languages, there will be a rule about the CPU's stack pointer pointing
at the stack segment of the running routine (which contains a value to
set the stack pointer to when the stack segment of the running routine
is "popped"). If your stack discipline is otherwise, you may have one
or more stacks allocated out of "heap" memory as linked lists of
invocation frames.


Depending on the semantics of your procedure calls, the individual
code blocks that are pointed to from the "root set" block may be
procedures; alternatively, they may also be groups of several
related procedures, or vectors of bytecodes that serve as data
for a bytecode interpreter written in machine code.


With a basic model of what a program looks like roughed out, you
will have enough information to figure out what instructions the
machine needs to carry out when, eg, making a procedure call or
returning from a procedure, putting arguments on the stack and
getting them off, creating a new code or data segment and putting
a pointer to it into the "root set", getting rid of a code or data
segment, etc.


At this point, you are going to iterate from designing the running
image of an arbitrary program back to how to do the things you need to
do, until you have a model of a program that works for your language
and in which there are clear ways to do everything you need to do. It
may take a while to develop this well, but it's time well spent and
the better you know where you're going now the more trouble you can
avoid later.


Once that's done, and you know what the image of the program that
you want to load into memory looks like, you will need to start
finding ways to map program constructs onto that model. Usually
this is done with an output function that has state. You traverse
the parse tree that you've built of the program, and at each node
you will call your output function. Depending on the values you
call it with, your output function will have to take various actions;
these actions include modifying its state, outputting machine code
instructions, moving from one "segment" to another in the output
file, etc.


The biggest item of state that the output function needs is the
Symbol Table. It has to contain the names of all the variables and
all the procedures, along with the basic information you will need
to know about them to write machine code for the program code that
refers to them. Usually this means at least the type and some kind
of specification of where the value will be located at runtime at a
bare minimum. Symbol tables can become arbitrarily complex depending
on what you want them to do. In most languages, symbol tables will
also need scope information because there may be a dozen different
variables all named "count" that have different scopes. In languages
that have constants, you'll need a way to tell a constant from a
variable. And if your language supports user-defined types, the
type specifiers will need to be up to the task of referring to these
defined types. If you know the basic program model, or how all the
parts of the program will be represented when the program is running,
then the Symbol Table is the next thing you work on after you get
the lexing and parsing done.


Note that the root set block at runtime will often contain a lot
of the same information that you are gathering in the symbol
table at compile time. So design the symbol table very carefully,
keeping in mind both what information the output function will need
to write code during compile time, and what information the program
you're compiling will need at runtime. You'll need to figure out
how and where you can get all of that information from the source
code.


It may be easier if you traverse the parse tree more than once: On
the first pass, you can have your output function build its symbol
table. On the second, you can have your output function build
images of the machine code it's going to output, with pointers at
entries in the symbol table substituted for pointers at where the
data's actually going to be. After the second pass, the output
function can tell how big the segments are going to be, so it can
decide where exactly all the data will be stored and substitute
actual pointers to the data's location before it finally writes
the executable to a file.


Note, this is a simplified explanation. I've left out a lot of
stuff that depends on your operating system, like linking and
loading and resolving system calls. And I've left out optimizing
the output. And I'm guilty of using the words "output function"
to hide about forty-seven acres of complexity that nobody would
ever really implement as a single function in code.


But this is the overview of the basic mission you've taken on and
what needs to be accomplished for that mission to succeed.


Good luck.


Bear


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.