Re: Writing Grammars

"Mark" <>
31 Jul 2002 01:15:44 -0400

          From comp.compilers

Related articles
Writing Grammars (Tim Newsham) (2002-07-25)
Re: Writing Grammars (Ira Baxter) (2002-07-31)
Re: Writing Grammars (Mark) (2002-07-31)
Re: Writing Grammars (Peter H. Froehlich) (2002-07-31)
Re: Writing Grammars (Tim Newsham) (2002-08-04)
Re: Writing Grammars (SLK Parsers) (2002-08-04)
Re: Writing Grammars (VBDis) (2002-08-10)
| List of all articles for this month |

From: "Mark" <>
Newsgroups: comp.compilers
Date: 31 Jul 2002 01:15:44 -0400
Organization: University of Wisconsin - Milwaukee, Computing Services Division
References: 02-07-118
Keywords: parse, syntax
Posted-Date: 31 Jul 2002 01:15:44 EDT

"Tim Newsham" <> writes:
>It seems pretty typical in compiler (and other parsing) implementation
>that you would:
> - start with a grammar of the language
> - modify the grammar to be easy to parse (ie. LALR(1))
> - add rules to build an abstract syntax tree from the
> resulting parse (either the parse tree, or as actions
> during the parse process).
>My question is: Are there any tools or research into tools to help
>automate the process of going from step 1 to step 2.

The key is to recognize that parsers don't work with grammars for
context-free languages, but grammars for SSDT's (simple syntax
directed transductions). The latter comprise the class of
context-free subsets of
T = X* x Y*

X = input alphabet, Y = output alphabet.

A point of a parser is to translate inputs to outputs by finding the
set T(w) = { v in Y*: wv is in T } of all outputs that corresponding
via T to the word w.

As long as you recognize that you're actually working with SSDT's and
not CFL's, then you're free to do any algebraic manipulations you want
on the corresponding grammars.

One only says they're parsing a CFG via one of its canonical
extensions to SSDT grammars:

Bottom-Up SSDTG:
      Every rule n -> w is modified to n -> w reduce_nw
      New start rule: start -> S accept

The output alphabet is
        Y = {accept} union { reduce_nw: n->w is a rule in the CFG }


Top-Down SSDTG:
      Every rule n -> w is modified to n -> node_nw w
      New start rule: start -> root S

Output alphabet:
      Y = {root} union {node_nw: n->w is a rule in the CFG }

The concern you may commonly hear about "losing the 'structure' when
transforming a grammar" actually pertains to the fact that different
CFG's for the same CFL give you different canonical extensions to
SSDT's, whereas it's the SSDT's, themselves, that's relevant to
parsing not the CFL's.

So, then, any transformation which can be derived from the underlying
algebra is valid -- as long as you keep in mind that you're
transforming the entire SSDTG, not the base CFG.

A grammar is a system of inequalities over the algebra. This is true,
regardless of what type of grammar you're talking about: Chomsky type
0, 1, 2 or 3; grammars for languages (subsets of X*), transductions
(subsets of X* x Y*), nary relations (X1* x X2* x ... x Xn*), or even
over more general word algebras (even groups, like SU(2), SL(2,C) or
Z5, etc.)

One variable, S, is designated the "main" variable. The other
variables, along with S, form the set N of variables of the grammar.
These correspond respectively to the "start" symbol and

Write the non-terminals as a set N = {S = n0, n1, n2, ... }, there can
be many solutions (n0 = N0, n1 = N1, ..., nk = Nk) to the system. But
one, alone, will stand out as the "least" solution -- that is, the
least with respect to all the variables.

The language (or transduction or relation, etc.) is that part of the
least solution corresponding to the main variable n0.

The possible rules for transforming a grammar are those of
transforming the corresponding system. A transformation should yield
the same least solution to n0. These may change the set N.

A stronger type of equivalence is where the set N is preserved and the
least solution for ALL the variables is preserved. This provides you
with a more restricted set of possible transformations.

Reference is made to an ordering relation, with the rule n -> w
corresponding to the inequality n >= w. That means the underlying
algebra is partially ordered and embodies the rules associated with a
word algebra, itself:

Axioms: x1 = x = 1x; (xy)z = x(yz)
1 <-> "empty word"
xy <-> "concatenation of x and y.

For regular languages, an algebra is already well-known: the regular
expressions. The ordering relation is
E >= F <==> set(E) includes set(F),
where set(E), set(F) are the sets corresponding to the regular expressions
E and F.

The key property of the ordering is:

Every A-type subset has a least upper bound (sum A).
If U, V are A-type subsets, then
sum(UV) = sum(U) sum(V)
                          where UV = { uv: u in U, v in V }.

You get different types of algebras depending on what you define as
"A-type", with the following table:

A Algebra
Finite subsets Dioid
Rational subsets *-continuous Kleene algebra (*KA)
Context-free subsets (Unnamed)
Turing subsets (Unnamed)
Countable subsets omega-continuous KA (omegaKA)
                        All subsets Quantale

Conway presented 3 algebras (two called N-algebra and S-algebra) which
are the ones corresponding to *KA, omegaKA and Quantale, but I forget
which is which.

A-type subsets in general must include all finite subsets and be
closed under products of sets in order for the definitions above
to make sense and in order to define the usual operations:

0 = sum {} <--> "fail"
a+b = sum {a,b} <--> "non-deterministic branch" (a | b).

A grammar rule
S -> T | ST
for instance, then corresponds to the inequality
S >= T + ST,
which is equivalent (for illustration's sake) to
S >= T + ST + 0.

If the family A includes all rational operations (as all those do,
which are listed at and under "Rational subsets"), you can also define

a* = sum {1,a,aa,aaa,...} <--> "repetition"

So, with that background, you can formulate transformation rules as
theorems. For example it will be true that:

In any system containing the rule: n >= a + bn + nc + ndn, the rule
can be replaced by
n >= b* a c* (d b* a c*)*
to yield an equivalent system.

This is true regardless of whether the expressions a, b, c, d themselves
contain the variable n.

Another example
TRANSFORMATION 2: (Special case of left-recursion removal)
In any system containing rules of the form:
                                  n >= na + pb + u
p >= nc + pd + v
these rules can be replaced by
n >= (u + v d* b) (a + c d* b)*
p >= (v + u a* c) (d + b a* c)*
to yield an equivalent system.

Again, this is true regardless of what variables the expressions a, b,
c, d, u and v contain (even p and n).

TRANSFORMATION 3: (Special case of substitution)
In any system containing rules of the form:
n >= u
p >= F(n)
where F(n) is some algebra expression in n, these rules can be replaced by
n >= u
p >= F(u)

In each of these transformations, equivalence is of the strong variety
where all variables' least solutions are preserved ... and, in fact,
of the even stronger variety where the transformations preserves ALL
solutions to the system (not just the least solution).

One, where equivalence in the stronger sense is clearly not preserved

TRANSFORMATION 4: (Partial solution, substitution & elimination)
In any system of the form
                                ni >= Fi(n0,...,nk,p1,...,pl), i = 0,...,k
pj >= Gj(p1,...,pl), j = 1,...,l
with main variable n0, where the F's and G's are in general algebraic
expressions; if pj = Pj is the least solution to the subsystem
comprising the p's alone, then
ni >= Fi(n0,...,nk,P1,...,Pl), i = 0,...,k
is an equivalent system.

Another, is:

TRANSFORMATION 5: (Elimination)
Any system of the form
ni >= Fi(n0,...,nk); i = 0,...,k
pj >= Gj(n0,...,nk,p1,...,pl), j = 1,...,l
with main variable n0, where the F's and G's are algebraic expressions
is equivalent to the reduced system
ni >= Fi(n0,...,nk); i = 0,...,k

In fact, TR. 4 -- as suggested by its name -- can be factored into TR. 5 plus
two strong-equivalence transformations

TRANSFORMATION 6: Partial solution
Any system with rules of the form
pj >= Gj(p1,...,pl); j = 1,...,l
is equivalent to the system with these rules replaced by
pj >= Pj; j = 1,...,l
where (pj = Pj: j = 1,...,l) is the least solution to the subsystem
comprosing these l rules alone.

TRANSFORMATION 7: Substitution
Any system with rules of the form
ni >= Fi(n0,...,nk,p1,...,pl); i = 0,...,k
pj >= vj; j = 1,...,l
is equivalent to one where the rules involving the n's are replaced by:
ni >= Fi(n0,...,nk,v1,...,vl); i = 0,...,k.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.