Re: expression rewrites - how to?

Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Thu, 3 Feb 1994 00:34:58 GMT

From comp.compilers

Related articles
expression rewrites - how to? intrepid!gary@uunet.uu.net (1994-01-31)
*Re: expression rewrites - how to? synaptx!thymus!daveg@uunet.UU.NET (Dave Gillespie)* (1994-02-03)**

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Keywords:	translator, parse
Organization:	Compilers Central
References:	94-02-011
Date:	Thu, 3 Feb 1994 00:34:58 GMT

Gary Funck writes:
> The source language (Pascal) and the target language (Ada) have
> differing operator priorities (and associativity). The translator
> must rewrite expressions from the source language into those
> of the target language, while preserving the order of evaluation
> expressed in the source language.

Our moderator outlined the basic answer in his postscript to your
message, but I thought I'd contribute some more background and a
few suggestions.

> An additional constraint that
> is desirable is to keep the parenthesization present in the original
> source text, if this parenthesization is adequate to preserve the
> order of evaluation when translated into the target language.
> Additionally, the expression should be rewritten with minimal
> parenthesization.

These two sentences are at odds with each other, so you'll have to figure
out what exactly you mean by them.

Basically, a good translator has to worry about the following four steps:

    1. Add parentheses when Ada requires them but Pascal doedn't.
    2. Remove parentheses which Pascal requires but Ada doesn't.
    3. Add optional parentheses to enhance clarity.
    4. Preserve unnecessary parentheses which were there for clarity.

Only step 1 is necessary for a correct translation. In my Pascal to C
translator "p2c" I found steps 2 and 3 to be a very good idea; I don't do
step 4 in p2c but I probably should. (I've had one or two comments about
this from p2c users in the past few years, but it's interesting that none
of them considered step 4 to be vital.)

I also faced this problem with Calc, a calculator/algebra system for GNU
Emacs that supports several "language modes" for algebraic formulas. It
can both parse and display in C, Pascal, FORTRAN, TeX, and a few other
languages. The "translation" happens implicitly when you enter a formula
in one language mode and then switch to another language mode for display.

I found the operator precedence parsing algorithm to work well in both
programs. Calc uses an explicit table-driven recursive operator
precedence parser. The table keeps a "left" precedence and a "right"
precedence for each operator. Calc uses the same table for both parsing
and formatting expressions.

The parsing algorithm looks like this; you ought to find something like it
in any compiler book:

parse_expr(prec)
{
    parse a "factor", e.g., variable name or literal;
    while (next-operator-left-precedence >= prec) {
        skip and save operator token;
        call parse_expr(the-operator-right-precedence);
    }
}

For left associative operators, the right-precedence is one greater than
the left-precedence; for right associating operators, the left-precedence
is one greater.

The display algorithm in Calc looks like this:

format_expr(expr, prec)
{
    if (expr is a factor)
        return formatted-factor;
    else if (prec > min(expr-left-prec, expr-right-prec))
        return "(" + format_expr(expr, 0) + ")";
    else
        return format_expr(left-subexpr, expr-left-prec)
                      + operator
                      + format_expr(right-subexpr, expr-right-prec);
}

P2c uses an ad-hoc algorithm for parsing, but its formatting algorithm is
essentially like Calc's.

The above routines will give you steps 1 and 2 of my outline but not 3 or
4. I've found it helps quite a bit to add some unnecessary parentheses as
described in step 3. For example, p2c adds the unnecessary parens in "(a
< b) == (c < d)" by default, and optionally also parenthesizes "(a * b) +
(c * d)" and "(a && b) || (c && d)". P2c uses ad-hoc heuristics to decide
when to add unnecessary parens; I worked out the heuristics just by trying
lots of things and seeing what looked "nice," although most of them would
be familiar to any experienced C programmer.

In this sense, you're actually pretty lucky going from Pascal to Ada,
since the basic precedence layouts are similar even if they differ in
details.

For step 4, you want to include explicit parentheses as a kind of
pseudo-operator. Be careful not to generate two sets of parentheses by
accident. Also, there is the tricky question of what to do about
parentheses that *were* necessary in Pascal but are not in Ada. Consider
the expression "(a = b) and (c = d)", in which the parens are required.
Translating to Ada, "a = b and c = d" is sufficient. I would be inclined
to remove the parens in this case and then allow step 3 to add them back
in if the extra-parens option is turned on. But you might be able to kill
two birds with one stone by preserving all parens, even if they were used
"under duress" in the Pascal. This strategy might preserve too many
undesirable parentheses, but I'll bet Pascal and Ada are similar enough
for you to get away with it.

Another case to watch out for, taking Pascal-to-C as an example, is
"if (a = b) then ...", which contains unnecessary parentheses that should
*not* lead to "if ((a == b)) ...". My C code generator emits the parens
around an "if" condition separately, not as part of the expression
generator. If I had p2c preserve parens, I'd have to do something special
to avoid this pitfall. I don't remember my Ada syntax well enough to say
whether or not this kind of thing crops up with Ada.

If other parts of your translator actually examine and modify expression
trees, watch out for funny interactions with explicit parentheses. For
example, if the array A is 1-based in Pascal but 0-based in C, p2c
translates "A[i+1]" to "A[i]", not "A[i+1-1]". If for some reason the
Pascal code said "A[(i+1)]", you'd have to ask whether it's better to
produce "A[(i+1)-1]" or "A[i]"; you probably *don't* want to produce
"A[(i)]". This example is kind of silly since people don't write [( ...
)], but there are plenty of other situations in p2c where this kind of
thing could plausibly happen. Again, Ada is much more Pascal-like than C,
so maybe this will never come up for you.

-- Dave
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: expression rewrites - how to?

Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>Thu, 3 Feb 1994 00:34:58 GMT

Dave Gillespie <synaptx!thymus!daveg@uunet.UU.NET>
Thu, 3 Feb 1994 00:34:58 GMT