Re: Source-to-source transformation: best approach?



From: Jim Cordy <cordy@cs.queensu.ca>
Newsgroups: comp.compilers
Date: Tue, 14 Aug 2007 01:29:02 -0400
Organization: Compilers Central
References: 07-08-013 07-08-033 07-08-036
Keywords: parse, translator
Posted-Date: 15 Aug 2007 11:44:18 EDT

(Disclaimer: this will probably be TXL propaganda since I'm writing it.)


I think Ira's list of issues is a good one (Hi Ira!). One of the
problems in comparing these tools is that they don't compare well -
they're made for different things. DMS is a complete framework for
addressing industrial-scale transformations, and that's the kind of
thing it's used for.


TXL is a programming language, and is not in itself such a framework -
frameworks are built using it, and it is a good language for
expressing large parts (not all) of such frameworks. It is built
around multi-pass thinking, for the most part without a single common
data structure. For that reason, frameworks built using it typically
involve a database or relational algebra component as well, and use a
variety of other standard tools, each best suited to its task.
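

To make that concrete, here is a minimal sketch of a complete TXL
program - a toy grammar and rule of my own invention, not taken from
any production framework - that folds constant additions in a tiny
assignment language:

    % fold.txl - fold constant additions (toy example)

    compounds
        :=
    end compounds

    define program
        [repeat statement]
    end define

    define statement
        [id] := [expression] ;
    end define

    define expression
        [number] + [number]
        | [number]
    end define

    rule foldPlus
        % reapplies until no two-number sum remains
        replace [expression]
            N1 [number] + N2 [number]
        by
            N1 [+ N2]
    end rule

    function main
        replace [program]
            P [program]
        by
            P [foldPlus]
    end function

Run over an input like "x := 1 + 2 ;", this produces "x := 3 ;".
Real frameworks chain many such passes, which is where the multi-pass
thinking comes in.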


One of the problems with comparing frameworks is that the technology
of any really successful one is commercial and hidden from view. This
is true of DMS and it's also true of the frameworks that have been
successfully implemented to do the same kinds of things as DMS using
TXL, ASF+SDF and Stratego. There are high-level papers about
experiences, but they don't tell you how the details were handled, and
as Ira points out that's the hard part.


The production-quality grammars, translation frameworks and tools
built using TXL are proprietary, and you're never going to see them on
the TXL website or anywhere else public. Not to contradict Ira, but
TXL-based frameworks for Cobol, PL/I, RPG and Fortran have had no
problem handling real applications of thousands of files and tens of
millions of lines, and individual Cobol files of 500,000 lines are
parsed by TXL in about 20 seconds. TXL also appears embedded in some
popular commercial software tools for C and C++, but again the
grammars are proprietary and you're not going to see them in public.


To me parser technology issues can to some extent be a red herring, in
the sense that you can make any parser technology work for any
language you want to work with; it's only a question of how many
kludges you're willing to add to correct for the limitations of the
method. But the general parsing methods such as GLR (DMS, SDF,
Stratego) and LL(inf) (TXL, Spirit, ANTLR (almost)) definitely seem to
make things a lot easier than Yacc and other traditional LALR and
LL(1) methods, because they can deal with ambiguity and context
dependency conveniently and directly.


GLR and LL(inf) are both general and differ mostly in how they handle
ambiguity resolution - GLR (at least in theory) generates all the
possible parses and then pares them to the one(s) you want using
contextual disambiguation rules, whereas LL(inf) methods (at least in
theory) explore all the possible parses in a priority-ordered fashion
until they hit the first one that works (or another one if you force
it). Of course the devil's in the details, and actual implementations
optimize these general ideas to be highly efficient in practice using
both automated and manual means.
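

To sketch what that priority ordering looks like in TXL terms, here
is an invented grammar fragment for the classic C ambiguity in
"a * b ;", which can be read either as a declaration of b as a
pointer to type a, or as an expression statement multiplying a and b.
Alternatives are tried in the order written, so the declaration
reading wins whenever both parses succeed:

    define statement
        [declaration]               % tried first
        | [expression_statement]   % tried only if no declaration parse
    end define

    define declaration
        [type_id] * [id] ;
    end define

    define type_id
        [id]
    end define

    define expression_statement
        [expression] ;
    end define

    define expression
        [id] * [id]
        | [id]
    end define

A GLR-based tool would instead build both parses and rely on a
contextual disambiguation rule to pare away the unwanted reading
afterwards.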


Perhaps another important question though is how you express your
translations once inputs are parsed. Some tools and techniques use
"concrete syntax" patterns, some use abstract syntax trees, and some
use term algebras. The notation matters in that it can help you
understand what you're doing, and help you avoid mistakes. There are
some interesting differences - for example, TXL transformation rules
constrain you to grammatically correct replacements, but ASF allows
you to create arbitrarily shaped replacements. The former prevents
you from making syntax errors in the result at the cost of having to
live within the grammar forms, whereas the latter frees you from the
grammar forms at the cost of possibly malformed results. Depending on
your background and personality, you may find that you prefer one or
the other of these ways of expressing things. While we're on this
topic, I should mention that TXL is actually perfectly happy to work
with two separate grammars - there's no need to "union" them if you
don't want to in a translation; that's just one of the paradigms you
can use with the language.
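

As an illustration of the concrete-syntax style, here is a toy rule
of my own (assuming an ordinary imperative grammar in which a
statement is "[id] := [expression] ;" and an identifier can appear as
an expression) that deletes do-nothing self-assignments. Because the
replacement must itself parse as a [repeat statement], TXL rejects
any rule whose result would be syntactically malformed before it can
ever run:

    rule removeSelfAssign
        % the second X in the pattern means "the same identifier
        % again" - TXL patterns are nonlinear, so this matches only
        % statements of the form "x := x ;"
        replace [repeat statement]
            X [id] := X ;
            Rest [repeat statement]
        by
            Rest
    end rule

An ASF-style rewrite could instead build an arbitrarily shaped
replacement term, trading that static guarantee for flexibility.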


If you are facing a big project needing commercial-grade translation
with support, you probably want to look into a company offering a
mature framework like DMS and invest in the learning that Ira
suggests. If you're looking for an in-house tool, a one-off
consulting project, a rapid prototype or (heaven forbid) you have in
mind to design your own framework, you might want to go it on your
own with something like TXL or Stratego, which will get you 95% of
the way in a hurry and let you correct the rest by hand. In my
experience, though, if you want the last 5% you're going to end up
having to put in the learning one way or another.


Jim Cordy

