Converting C to C++

jhall@whale.WPI.EDU (John Clinton Hall)
Sun, 14 Nov 1993 06:45:02 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: jhall@whale.WPI.EDU (John Clinton Hall)
Keywords: C++, C, translator, question, comment
Organization: Worcester Polytechnic Institute
Date: Sun, 14 Nov 1993 06:45:02 GMT

For my senior project, I am developing a program to convert C to C++. I
have a working C parser, and I am adding code to it that builds an
intermediate representation of the input source. However, I am wondering
whether a "traditional" AST is the way for me to go.

The way my conversion rules are written, conversion to C++ keeps the guts
of each function relatively unchanged. Basically, structures are
converted to classes, each function is associated with a class, and
references to structure variables inside each function are changed. (For
example, if a function becomes associated with a class Queue and the
function takes a parameter struct Queue *q, that parameter is removed from
the function's parameter list and references to the tail (for example) via
"q->tail" are changed to just "tail," since the structure's members are
now data members of the class. It's actually a little more involved than
this, but that's the gist of things.)
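
A minimal sketch of the Queue example, written as one C++ file so both
forms can sit side by side. The Node type, the enqueue function, and the
two namespaces are hypothetical names chosen for illustration; they are
not taken from the converter itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type shared by both versions.
struct Node {
    int value;
    Node *next;
};

namespace c_input {
// C-style input: the queue is passed explicitly as the first parameter.
struct Queue {
    Node *head;
    Node *tail;
};

void enqueue(struct Queue *q, Node *n) {
    assert(n != NULL);
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
}
}  // namespace c_input

namespace cpp_output {
// Converted form: the struct is now a class, the function is a member,
// the "struct Queue *q" parameter is gone, and "q->tail" is just "tail".
class Queue {
public:
    Queue() : head(nullptr), tail(nullptr) {}

    void enqueue(Node *n) {
        assert(n != nullptr);
        n->next = nullptr;
        if (tail != nullptr)
            tail->next = n;
        else
            head = n;
        tail = n;
    }

    // Left public only to keep the sketch short.
    Node *head;
    Node *tail;
};
}  // namespace cpp_output
```

The function body is the same in both versions apart from the dropped
"q->" prefixes, which is the property the rest of this post relies on.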

My point is that the output C++ code is much like the input C code, except
for a few changes to references to structure members.

I'm wondering what to use for my intermediate representation: a
"traditional" AST or a flat list of tokens? An expression such as "x1 =
(a + bb) * 12;" is translated to the following AST:

      =
     / \
   x1   *
       / \
      +   12
     / \
    a   bb

Although this form is great if you want to generate assembly code, I don't
think it is the best form for me to use to convert C to C++, basically
because information is lost. Not only have we omitted the semicolon at
the end of the expression (which would not be that hard to regenerate),
but the parentheses are gone. To regenerate the code, I would need an
algorithm that compares operator precedences and decides whether or not
to insert parentheses. An AST would also make iterating through the
function's code more difficult.
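
To give a sense of the work involved, here is a rough sketch of such a
precedence-comparing regenerator. The Expr node layout, the three-level
precedence table, and the function names are assumptions made for this
example; a real C table has many more levels and operator kinds.

```cpp
#include <cassert>
#include <string>

// Hypothetical AST node: a leaf (identifier or constant) or a binary operator.
struct Expr {
    std::string text;  // leaf spelling, or the operator for inner nodes
    const Expr *lhs;   // null for leaves
    const Expr *rhs;   // null for leaves
};

// Binding strength for the few operators used here; higher binds tighter.
int precedence(const std::string &op) {
    if (op == "=") return 1;
    if (op == "+" || op == "-") return 2;
    if (op == "*" || op == "/") return 3;
    return 0;
}

bool rightAssociative(const std::string &op) { return op == "="; }

// Regenerate source text from the tree. A subexpression is parenthesized
// when its operator binds more loosely than its parent's, or equally
// loosely on the side where associativity would otherwise regroup it.
std::string unparse(const Expr &e, int parentPrec = 0, bool parenOnTie = false) {
    if (e.lhs == nullptr) return e.text;  // leaf
    assert(e.rhs != nullptr);             // inner nodes are binary
    int p = precedence(e.text);
    bool ra = rightAssociative(e.text);
    std::string s = unparse(*e.lhs, p, ra) + " " + e.text + " " +
                    unparse(*e.rhs, p, !ra);
    bool paren = p < parentPrec || (p == parentPrec && parenOnTie);
    return paren ? "(" + s + ")" : s;
}
```

For the tree above this yields "x1 = (a + bb) * 12". Note that it emits
only the parentheses the tree requires, so any redundant parentheses in
the original source are still lost, as are spacing and comments.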

This seems like a lot of work to output something that looks a lot like
the input.

I think a better idea would be to have for each function a linked list of
the tokens in the function. The above expression would then look like

"x1" -> "=" -> "(" -> "a" -> "+" -> "bb" -> ")" -> "*" -> "12" -> ";"

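As a sketch of how the member-reference rewrite might look on such a
list, the following uses std::list<std::string> as the linked list. The
names TokenList, dropReceiver, and join are hypothetical, and the rewrite
deliberately ignores the harder cases (the parameter being shadowed, or
being used without "->", for instance when passed to another function).

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

using TokenList = std::list<std::string>;

// Remove every "<param> ->" pair so that "q -> tail" becomes "tail".
// Uses of the parameter that are not followed by "->" are left alone.
void dropReceiver(TokenList &toks, const std::string &param) {
    assert(!param.empty());
    auto it = toks.begin();
    while (it != toks.end()) {
        auto next = std::next(it);
        if (*it == param && next != toks.end() && *next == "->") {
            it = toks.erase(it, std::next(next));  // erase both tokens
        } else {
            ++it;
        }
    }
}

// Write the tokens back out, separated by single spaces.
std::string join(const TokenList &toks) {
    std::string out;
    for (const std::string &t : toks) {
        if (!out.empty()) out += ' ';
        out += t;
    }
    return out;
}
```

With this, the list "q" -> "->" -> "tail" -> "=" -> "n" -> ";" becomes
"tail" -> "=" -> "n" -> ";", and no precedence logic is needed because
the parentheses in other expressions are ordinary tokens that pass
through untouched.
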
Does anybody see a problem with using this as my intermediate
representation? Of course I would store this list of tokens only with the
functions to which it belongs. (I would not create a list of tokens of
the entire program!) During my research on C to C++ conversions, I never
found anything discussing a _compiler_ that did the conversions; the
articles only discussed general methods. Does anyone know of any relevant
research I could look into?
[I would concur that it's probably better to cheat than to parse the whole
thing. My inclination would be to store strings of tokens, probably attaching
to each token the white space and comments that followed, so you can
reconstitute something that looks like the original. -John]
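
One way the suggestion in the note above might be sketched: each token
carries the white space and comments that followed it, so writing the
tokens back out reproduces the input exactly. The Token layout and the
toy lexer are assumptions for illustration only; the lexer handles just
identifiers, numbers, "->", single-character punctuation, and /* ... */
comments, not string literals or the rest of C.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string text;      // the token itself
    std::string trailing;  // white space and comments that followed it
};

static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Consume white space and /* ... */ comments starting at position i.
static std::string takeTrivia(const std::string &src, std::size_t &i) {
    std::string trivia;
    for (;;) {
        if (i < src.size() && std::isspace(static_cast<unsigned char>(src[i]))) {
            trivia += src[i++];
        } else if (src.compare(i, 2, "/*") == 0) {
            std::size_t end = src.find("*/", i + 2);
            end = (end == std::string::npos) ? src.size() : end + 2;
            trivia.append(src, i, end - i);
            i = end;
        } else {
            return trivia;
        }
    }
}

std::vector<Token> lex(const std::string &src) {
    std::vector<Token> toks;
    std::size_t i = 0;
    // Leading trivia has no token to follow, so hang it on an empty one.
    std::string leading = takeTrivia(src, i);
    if (!leading.empty()) toks.push_back(Token{"", leading});
    while (i < src.size()) {
        Token t;
        if (isWordChar(src[i])) {
            while (i < src.size() && isWordChar(src[i])) t.text += src[i++];
        } else if (src.compare(i, 2, "->") == 0) {
            t.text = "->";
            i += 2;
        } else {
            t.text = std::string(1, src[i++]);
        }
        assert(!t.text.empty());  // each iteration must consume input
        t.trailing = takeTrivia(src, i);
        toks.push_back(t);
    }
    return toks;
}

// Concatenate every token with its trailing trivia.
std::string reconstitute(const std::vector<Token> &toks) {
    std::string out;
    for (const Token &t : toks) out += t.text + t.trailing;
    return out;
}
```

Because layout travels with the tokens, an edit such as deleting the "q"
and "->" tokens leaves the rest of the line, including its comment,
looking as it did in the original.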
