Txl revisited

hdev@cp.tn.tudelft.nl (Hans de Vreught)
Mon, 21 Mar 1994 16:52:06 GMT

          From comp.compilers

Related articles
Txl revisited hdev@cp.tn.tudelft.nl (1994-03-21)
| List of all articles for this month |

Newsgroups: comp.compilers
From: hdev@cp.tn.tudelft.nl (Hans de Vreught)
Keywords: tools, translator
Address: Lorentzweg 1, 2628 CJ Delft, The Netherlands
Organization: Delft University of Technology (TN FI-CP)
Date: Mon, 21 Mar 1994 16:52:06 GMT

Somewhere in January I reacted on a question of a person that wanted to use
Txl (a tree transformer) for some C++ transformations and he asked for user's
experiences with Txl. To summarize my reaction:
* Txl's parser can take ages to parse.
* Txl handles external functions (and the like) symbolically.
* Txl has an elegant syntax.


One of the Txl team members, Jim Cordy, reacted on my article and to make
a long story short, we have had a fruitful email discussion. At this
moment I like to present some of the bottom lines of that discussion.


WRT the first point left recursion is the trouble maker. TXL's naive
parser can be hopelessly slow on traditional Yacc-style grammars (it is
good practise to use left recursive rules in LALR grammars since that will
keep the stack small) and the TXL documentation does not properly warn
about or provide solutions to this problem. There is a way to get around
the problem of left recursion in Txl.


The second point is only important if you really want to use *many*
external functions: it isn't that easy to maintain and it is not as fast
as if the external stuff could be simply linked. TXL (intentionally)
provides no facilities whatsoever for externalization of trees and is not
well suited to applications in which it is necessary to manipulate the
trees outside TXL.


The third point is very important (to me in any case): Txl's syntax is
clean, simple, and easy to use. A *definite* advantage. This is actually
the main reason why we are going to use Txl for one of our projects over
here.


Finally, I like to mention that the Txl team encourage any user who
encounters trouble to contact them. I must say that they quickly respond.


-----------------------------------------------------------------------------


For the rest of the article I like to concentrate on the first point only,
substantiating my initial claim that Txl's parser can take ages to parse.
(But with the help of the Txl team I can now parse my programs quite fast.)


Although it might look that I'm not pleased with Txl from the rest of the
article, this is not the case. The fact that we actually are using Txl
should say enough (there are alternative tools available to us). Txl is
apart from the shortcoming mentioned hereafter a great tool to work with.


According to the Txl manual the parser is a top-down, fully backtracking
parser that can handle any context-free grammar. As you might know both
top down and fully backtracking parsers can be quite fast *even* for
general CFGs. However, in the case of Txl this is not the case.


To give you some indication, I trimmed down my grammar (which was exactly
the grammar of the definition of the language) such that it could only
derive expressions (string, relational and boolean included):


*BAD:*
E -> AE | E RelOp AE
AE -> T | AE AddOp T
T -> F | T MulOp F
F -> string | number | UnOp? id | UnOp? \( E \) (? = optional)


Parsing of "(a)=0" toke 13 minutes and locating the syntax error in "(@"
took 1 hour and 42 minutes on a micro Sparc. Quite disappointing for a
system that is said to be able to handle *any* CFG. Personnaly, I find
that handling also means that it can process its input in reasonable time.
The footnote in the manual that "Grammars with left recursion occasionally
require a long parsing time", is IMHO an euphemism...


So the question is whether left recusion can be avoided in a natural way
(I know how it can be done, but the transformed grammar ain't natural
anymore)? It can be avoided because Txl use a notation similar to EBNF.
The following grammar looks quite natural:


*GOOD:*
E -> AE (RelOp AE)^* (^* = Kleene star)
AE -> T (AddOp T)^*
T -> F (MulOp F)^*
F -> string | number | UnOp? id | UnOp? \( E \)


The improvement is enormous: parsing 1024 times "(a)=0" took 2.0 seconds
(it's probably less since the Txl spec has to be read in as well). As a
matter of fact after these transformations in our grammar, we were able to
process our programs in a dash.


Although in Txl you can use EBNF like notation, internally it is
translated to normal production rules. This means that the resulting parse
trees don't look like the ones you would expect from left associative
operations. This has some consequences on the tree transformations.
However, the transformations the Txl team supplied to handle left
associativity, were short, simple, and general (the transformations could
work for any left associative operator). Although this is still somewhat
annoying, the disadvantage is minor.


To round things up, I believe the best you can say is that Txl can handle
any non-left recursive CFG in an EBNF like notation. The fact that in
theory it can process left recursive grammars as well is IMO uninteresting
when you are trying to make use of that feature. A big red warning signal
should be included in the manual.
--
Hans de Vreught
J.P.M.deVreught@CP.TN.TUDelft.NL
Delft University of Technology
The Netherlands
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.