language independent intermediate representation email@example.com (1997-05-08)
Re: language independent intermediate representation firstname.lastname@example.org (David Chase) (1997-05-12)
language independent intermediate representation Dave@occl-cam.demon.co.uk (Dave Lloyd) (1997-05-12)
Re: language independent intermediate representation email@example.com (1997-05-13)
From: David Chase <firstname.lastname@example.org>
Date: 12 May 1997 00:18:07 -0400
Amir Michail wrote:
> I am working on a project where I need to perform various analyses on
> a language independent intermediate format. I was looking into the
> gnu RTL and parse tree structures and I am not sure what to use. I
> will probably need to convert the structure into a program dependence
> graph or something similar. I will also need to convert the result
> back into source code (in a fixed language independent of the original
> source language).
You might look at ILOC, or whatever it has mutated into, which was/is
used at Rice University (I think it has been in use for about ten years
now). It is a low-level, RISC-like intermediate code. I've worked on a
couple of compilers now, and that is generally the way to go, EXCEPT:
1. you'll need a primitive of some sort for your constant-case-switch
   statements (common to many languages, inscrutable if translated).
2. "structures" are going to give you grief. Going in and out of a
   high-level language, where the target language is C, structures
   can be a pain. If possible, forget they ever existed, and simply
   do pointers and offsets.
3. use an infinite register set. Again, structures are a pain; do
   they have "value" status, meaning that they are loaded and stored
   from "wide" registers? There are two reasons to preserve structures,
   one is that they can simplify your aliasing analysis a little (MAYBE),
   and the other is that it is nice to use block copies to move them
   around in the generated code. If you can write a general-purpose
   recognizer for the structure movement idiom in your code generator,
   you'd be better off.
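To make the three points above concrete, here is a minimal sketch in Python. It is not ILOC's actual syntax; the opcode names (`addi`, `load`, `switch`), the `Builder` class, and the tiny `run` interpreter are all invented for illustration. It assumes an unbounded supply of virtual registers, lowers a field access to pointer-plus-offset, and keeps the constant-case switch as a single primitive rather than a chain of compares.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Builder:
    """Emits three-address tuples over an unbounded set of virtual registers."""
    code: list = field(default_factory=list)
    _next: count = field(default_factory=count)

    def fresh(self):
        # "Infinite register set": never reuse a name, never spill.
        return f"r{next(self._next)}"

    def emit(self, op, *args):
        self.code.append((op, *args))

    def load_field(self, base, offset):
        # p->f becomes address arithmetic plus a load; no struct type survives.
        addr = self.fresh()
        self.emit("addi", addr, base, offset)
        dst = self.fresh()
        self.emit("load", dst, addr)
        return dst

    def switch(self, reg, table, default):
        # One primitive carrying the whole case table (hypothetical opcode).
        self.emit("switch", reg, dict(table), default)


def run(code, regs, mem):
    """Interpret straight-line code; return the label a switch selects, if any."""
    for op, *args in code:
        if op == "addi":
            dst, src, imm = args
            regs[dst] = regs[src] + imm
        elif op == "load":
            dst, addr = args
            regs[dst] = mem[regs[addr]]
        elif op == "switch":
            reg, table, default = args
            return table.get(regs[reg], default)
    return None


b = Builder()
p = b.fresh()                # r0: pointer to some record
tag = b.load_field(p, 4)     # field assumed to live at byte offset 4
b.switch(tag, {1: "L_one", 2: "L_two"}, "L_default")
target = run(b.code, {"r0": 100}, {104: 2})
```

Because the case table stays attached to one instruction, an analysis or a back-translator can still see that the original code was a switch.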
A second choice is a sort of cleaned up abstract syntax tree. This
makes more sense if you wish to preserve more of the structure of the
input program; it's been used in the Rice vectorizer (and whatever it
mutated into) as well as some subsequent compilers written by ex-Rice
people (the Dana/Ardent/Stardent Fortran compiler, for one). There
are some troubles using ASTs with C, because the language is not
quite as block-structured as it appears (e.g., Duff's device).
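For comparison, a cleaned-up AST might look roughly like the following Python sketch. The node classes (`Assign`, `While`, `If`) and the `unparse` function are hypothetical names, and expressions are kept as plain strings to stay short. The point is only that strictly nested nodes make it easy to print the program back out in one fixed target language.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    target: str
    expr: str


@dataclass
class While:
    cond: str
    body: list


@dataclass
class If:
    cond: str
    then: list
    orelse: list


def unparse(stmts, indent=0):
    """Return the statement list as lines of C-like source."""
    pad = "    " * indent
    out = []
    for s in stmts:
        if isinstance(s, Assign):
            out.append(f"{pad}{s.target} = {s.expr};")
        elif isinstance(s, While):
            out.append(f"{pad}while ({s.cond}) {{")
            out.extend(unparse(s.body, indent + 1))
            out.append(f"{pad}}}")
        elif isinstance(s, If):
            out.append(f"{pad}if ({s.cond}) {{")
            out.extend(unparse(s.then, indent + 1))
            if s.orelse:
                out.append(f"{pad}}} else {{")
                out.extend(unparse(s.orelse, indent + 1))
            out.append(f"{pad}}}")
    return out


prog = [
    Assign("i", "0"),
    While("i < n", [Assign("s", "s + i"), Assign("i", "i + 1")]),
]
lines = unparse(prog)
```

A construct like Duff's device, where case labels sit inside a loop body, has no place in a tree of strictly nested nodes like this one; it would need either a goto-style escape hatch or a lowering step first.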
I don't know if ANDF is a good intermediate format; the acronym stands
for Architecture Neutral DISTRIBUTION Format. The Java byte codes are
also a distribution format, and are not suited to analysis in that
form (they can be translated, of course).
[Last time I looked at ANDF, it was getting to be an awful lot like
obfuscated C, since they wanted to be able to use per-platform stdio.h
and the like. Ugh. -John]