LL(1) translation problem

jan@si.hhs.nl (Schramp)
Mon, 10 Feb 92 16:06:18 +0100

From comp.compilers

Related articles
grammar rule rewriting question jos@and.nl (Jos Horsmeier) (1992-02-04)
*LL(1) translation problem jan@si.hhs.nl* (1992-02-10)**

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	jan@si.hhs.nl (Schramp)
Keywords:	LL(1), parse, question
Organisation:	Sector Informatica, Haagse Hogeschool, The Hague, The Netherlands
Organization:	Compilers Central
References:	92-02-017
Date:	Mon, 10 Feb 92 16:06:18 +0100

jos@and.nl (Jos Horsmeier) writes:
>Bluntly stated: I have a problem. I'm currently working on a grammar rule
>rewriting program. One feeds it with a description of a CF grammar and it
>tries to rewrite the rules into an LL(1) form. This means:

>1: Removal of all epsilon rules
>2: Removal of all (indirect) left recursion
>3: Left factoring of all the necessary rules.

>So far, life is all quite simple. But, and here's my problem: some of the
>terminal symbols are, what I call `static semantic' symbols. These symbols
>contain snippets of actual programming language statements. Still not much
>of a problem, but these statements contain references to other (non)
>terminal elements of the grammar rules.

> ...

>There's more to it: if an epsilon rule contains a semantic token, I cannot
>remove that rule (I would simply lose the semantic actions then.) But if I
>can't remove the rule, I cannot rewrite the left recursive rules! On the
>other hand, if I *do* remove such an epsilon rule and transfer the
>semantics to all the other rules which have the LHS of this rule somewhere
>in their RHS (read this again ...), I might end up with semantic tokens,
>completely ripped out their context. Ok, enough whining ...

With respect to your question some remarks:

- Removal of all epsilon-rules is not neccesary. I suppose your wish to
remove epsilon rules derives from the restriction that the algorithm
metioned in the Dragon-book only works for epsilon-free grammars.
Consider that just a simplification of the problem.

To see the problem look at the next grammar fragment:

A -> A B
A -> C

The standard solution after removing left recursion is:

A -> C A'
A' -> B A' | epsilon

The problem is that B might derive epsilon, in which case you're stuck
with left recusion of the form A' -> A'.

So you have to make sure B doesn't derive epsilon!

Now what if B does indeed derive epsilon?

Then you substitute B by all of its right sides. Of course this raises the
number of A-rules. Then reconsider all left recursive right sides of
A-rules and repeat the process of substituting the nonterminal symbol to
the right of A in case this whole (!) string derives epsilon.

In the end you're stuck with some offending rules of the form:
A -> A
and/or
A -> A {any of your actions}.

Rules of the form A -> A can be left out without harm, they contribute
nothing to the language. Rules of the form A -> A {action} define an
infinite repetition of {action} and require a better grammar, throw them
away too.

Blindly removing all epsilon-productions however is of no use as the
standard solution introduces new ones.

- A simpler solution to your problem might be to use a "translation
grammar" to define translation to a kind of postfix code, to build an
abstract syntax tree from the postfix code and to hook on actions to the
nodes of the tree. I'am presently working on that kind of approach.

To be more specific:

A translation grammar is a CFG with two sets of terminal symbols: input
terminals and output terminals. As such it defines some intermediate
language with sentences built from both sets of terminals. Leaving the
output terminals out of a sentence gives a sentence of the input language,
leaving the input terminals out of a sentence defines the corresponding
output sentence. This way the translation grammar defines a language to
language translation.

A translation grammar can be modified by all transformations that leave
the intermediate language invariant. These include substitution, removal
of left recursion, and factorization among others.

A table-controled LL(1) parser needs only a minor modification to
translate according to a tranlation grammar: When a output terminal
appears on the top of the stack, then pop it and send it to the output.

As an output language I prefer an extended form of binary abstract syntax
tree. It should not just be used for expressions, but also for other
constructs like lists and structured statements.

An AST (abstract syntax tree) can be constructed with a variant of the
commonly known stack-evaluator for expressions. This variant uses a stack
of AST's instead of a stack of values. It's action is mainly:
- given any operator, (don't take this to strictly)
- pop two AST's from the stack, (binary AST's!)
- create a single tree from the operator with the two trees as children,
- push this AST on the stack.

Define actions as recursive evaluators of (parts of) the AST. This leaves
room for inherited and synthesized attributes within the AST.

What we win with this approach: Translation becomes completely independent
of the messed up LL(1) grammar you obtain by automated transformation.
The AST you get, can reflect associativity and precedence of operators in
a natural way.

Current state of affairs:

- I've completed a working system acceptiong regular right-part
translation grammar as input.

- I'm currently working on a version that models the right-parts of the
productions of each nonterminal in a finit-state-acceptor fashion. That
way factorizing becomes equivalent to making it deterministic and
minimizing the number of states. That way elimination of left and right
recursion can be obtained by leaving out the left or right recursive
transitions, replacing them by an epsilon-transition. For instance an
epsilon from the final state of the fsa to the end-point of the offending
left recursive transition. Likewise a rightrecursive transition can be
replaced by an epsilon from its startstate to the startstate of the fsa.
Next you have to make it deterministic and to minimize the number of
states. Other issues: elimination of equivalent fsa's in such a scheme.

I'm not sure whether this is to concise for you readers around the world.
You may mail me with your questions.

Jan Schramp.

--
Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,
2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

LL(1) translation problem

jan@si.hhs.nl (Schramp)Mon, 10 Feb 92 16:06:18 +0100

jan@si.hhs.nl (Schramp)
Mon, 10 Feb 92 16:06:18 +0100