Converting C to C++

heinrich@gazoo.dvs.com (Kenn Heinrich)
Tue, 16 Nov 1993 15:06:20 GMT

          From comp.compilers

Related articles
Converting C to C++ jhall@whale.WPI.EDU (1993-11-14)
Converting C to C++ heinrich@gazoo.dvs.com (1993-11-16)
Re: Converting C to C++ apardon@rc1.vub.ac.be (1993-11-17)
Re: Converting C to C++ pkl@mundil.cs.mu.OZ.AU (1993-11-22)

Newsgroups: comp.compilers
From: heinrich@gazoo.dvs.com (Kenn Heinrich)
Keywords: C++, C, translator
Organization: Compilers Central
References: 93-11-089
Date: Tue, 16 Nov 1993 15:06:20 GMT

John Clinton Hall <jhall@whale.WPI.EDU> writes:


> For my senior project, I am developing a program to convert C to C++. I
> have a working C parser, and I am adding on to it code to build an
> intermediate representation of the input source. However, I am wondering
> if a "traditional" AST is the way for me to go.




A while ago I wrote a translator for a logic device description language
that added some more features to the language. The silicon vendor
supplied a compiler that would understand flip-flop declarations like:
dff( q_out, d_in, clk);
but would not understand an arbitrary bus declaration like:
dff( q_out_bus[7..0], d_in[7..0], clk);
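
Roughly, the enhancement just unrolls the bus form into the single-bit
instances the vendor compiler already accepts. A minimal sketch of that
expansion (the function name and the bit-naming convention are
illustrative guesses, not my actual code):

#include <stdio.h>

/* Sketch: unroll a bus-style flip-flop declaration such as
 *   dff( q_out_bus[7..0], d_in[7..0], clk );
 * into single-bit dff() calls, assuming bits of a bus are named by
 * suffixing the index onto the bus name. */
static void expand_dff_bus(FILE *out, const char *q, const char *d,
                           const char *clk, int high, int low)
{
    int bit;
    for (bit = high; bit >= low; bit--)
        fprintf(out, "dff( %s%d, %s%d, %s );\n", q, bit, d, bit, clk);
}

/* expand_dff_bus(stdout, "q_out_bus", "d_in", "clk", 7, 0) prints eight
 * declarations, from dff( q_out_bus7, d_in7, clk ); down to
 * dff( q_out_bus0, d_in0, clk );                                      */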


I wrote a parser that added this and other enhancements to the source
language and spat out the original simple language. To do this I wound up
using both AST and text-string approaches. I started by tokenizing the
complete file, parsing the tokens, and building data structures like:


struct expression_or_declaration {
    char *start_whitespace;   /* whitespace and comments before the text */
    char *useful_text;        /* the syntactically significant text      */
    char *end_whitespace;     /* whitespace and comments after it        */
};
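
Regenerating the output was then just a matter of gluing the pieces back
together. A minimal sketch of the idea, given the struct above (the
function name is only illustrative):

#include <stdio.h>

/* Sketch: write one fragment back out, reproducing the surrounding
 * whitespace and comments exactly as they appeared in the input. */
static void emit_fragment(FILE *out,
                          const struct expression_or_declaration *e)
{
    fputs(e->start_whitespace, out);   /* leading blanks, tabs, comments */
    fputs(e->useful_text, out);        /* the (possibly rewritten) text  */
    fputs(e->end_whitespace, out);     /* trailing whitespace            */
}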


This allowed me to have an output file that was just as readable as the
input: the comments were preserved (though repeated), the tab alignment
was pretty, and so on. It made the parsing (I used recursive descent,
BTW) really fiddly, because in some places I needed to look ahead for a
syntactically significant token (equal sign, identifier, etc.), but in
other places I wanted to stop the production at the first whitespace, or
first newline, or first keyword following an optionally missing semicolon.


This resulted in all sorts of mode flags in the lexer, and all sorts of
duplicated semantic match functions, one for whitespace-or-certain-token,
another for certain-token-only, and so on.
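
To give the flavour of it, the lexer ended up with something like this
(a sketch only; the token names and the classification step are
placeholders, not my real code):

#include <ctype.h>
#include <stdio.h>

/* Sketch: a single mode flag decides whether the lexer silently skips
 * whitespace or reports it as a token in its own right. */
enum lex_mode { SKIP_WS, WS_SIGNIFICANT };
enum token    { TOK_EOF, TOK_WHITESPACE, TOK_OTHER };

static enum lex_mode mode = SKIP_WS;

static enum token next_token(FILE *in)
{
    int c = getc(in);

    while (mode == SKIP_WS && c != EOF && isspace(c))
        c = getc(in);                  /* eat whitespace silently */
    if (c == EOF)
        return TOK_EOF;
    if (mode == WS_SIGNIFICANT && isspace(c))
        return TOK_WHITESPACE;         /* let the production stop here */
    /* ... classify identifiers, '=', keywords, and so on ... */
    return TOK_OTHER;
}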


Then I decided I wanted to do some Boolean transformations on the
equations at the same time, and tried converting the program to run on
ASTs while preserving the whitespace. This was a disaster, because I had
to find clean ways of inheriting/synthesising whitespace, and I couldn't
find any.


Now the translator runs on ASTs and only preserves comments in very
specific contexts. I gave each operator a numeric priority, which is
used to decide whether or not to put brackets around a subtree when I
output the AST as a new text equation. The output routine is actually
quite compact.
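
In outline it is the usual priority-driven printer. A sketch of the
scheme (the node layout, the priorities, and the operator set are
invented here for illustration, not my actual code):

#include <stdio.h>

/* Sketch: emit an equation tree, bracketing a subtree only when its
 * operator binds less tightly than its parent's. */
struct node {
    char         op;             /* '!', '&', '|', or 0 for a leaf */
    const char  *name;           /* identifier, used when op == 0  */
    struct node *left, *right;
};

static int priority(char op)
{
    switch (op) {
    case '!': return 3;
    case '&': return 2;
    case '|': return 1;
    default:  return 4;          /* leaves never need brackets */
    }
}

static void emit(FILE *out, const struct node *n, int parent_prio)
{
    int prio = priority(n->op);
    int need = (prio < parent_prio);

    if (need) fputs("( ", out);
    if (n->op == 0) {
        fputs(n->name, out);
    } else if (n->op == '!') {
        fputs("! ", out);
        emit(out, n->right, prio);
    } else {
        emit(out, n->left, prio);
        fprintf(out, " %c ", n->op);   /* one space around each operator */
        emit(out, n->right, prio);
    }
    if (need) fputs(" )", out);
}

/* A whole equation then gets a forced trailing semicolon: */
static void emit_equation(FILE *out, const char *lhs, const struct node *rhs)
{
    fprintf(out, "%s = ", lhs);
    emit(out, rhs, 0);
    fputs(" ;\n", out);
}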


I used a hard-and-fast rule for regenerating spaces (one before and after
each operator and parenthesis), and a hard-and-fast semicolon after each
equation (the original language had optional semicolons). The only
exception is due to a parsing quirk in the original vendor's compiler.


My point is that treating the program as text and linear lists will only
let you go so far with your transformations, but may allow a greater
fidelity of reproduction (imagine a compiler with 96 dB SNR :-), while an
AST will probably let you do more exciting things to the program you are
translating. If your app is never going to grow, you may well be best off
using the non-AST method.


Hope this helped some,
Kenn.
------------------------------
heinrich@dvs.com

