
Related articles:
   Writing Grammars newsham@lava.net (Tim Newsham) (2002-07-25)
   Re: Writing Grammars idbaxter@semdesigns.com (Ira Baxter) (2002-07-31)
   Re: Writing Grammars whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-31)
   Re: Writing Grammars pfroehli@ics.uci.edu (Peter H. Froehlich) (2002-07-31)
   Re: Writing Grammars newsham@lava.net (Tim Newsham) (2002-08-04)
   Re: Writing Grammars parsersinc@yahoo.com (SLK Parsers) (2002-08-04)
   Re: Writing Grammars vbdis@aol.com (VBDis) (2002-08-10)

From: "Mark" <whopkins@alpha2.csd.uwm.edu>
Newsgroups: comp.compilers
Date: 31 Jul 2002 01:15:44 -0400
Organization: University of Wisconsin - Milwaukee, Computing Services Division
References: 02-07-118
Keywords: parse, syntax
Posted-Date: 31 Jul 2002 01:15:44 EDT

"Tim Newsham" <newsham@lava.net> writes:

*>It seems pretty typical in compiler (and other parsing) implementation*

*>that you would:*

*> - start with a grammar of the language*

*> - modify the grammar to be easy to parse (ie. LALR(1))*

*> - add rules to build an abstract syntax tree from the*

*> resulting parse (either the parse tree, or as actions*

*> during the parse process).*

*>My question is: Are there any tools or research into tools to help*

*>automate the process of going from step 1 to step 2.*

The key is to recognize that parsers don't work with grammars for
context-free languages, but with grammars for SSDT's (simple syntax
directed transductions). The latter comprise the class of context-free
subsets of

   T = X* x Y*,   X = input alphabet, Y = output alphabet.

The point of a parser is to translate inputs to outputs by finding the
set T(w) = { v in Y*: wv is in T } of all outputs that correspond via
T to the word w.

As long as you recognize that you're actually working with SSDT's and
not CFL's, you're free to do any algebraic manipulations you want on
the corresponding grammars.

One only says they're parsing a CFG via one of its canonical
extensions to SSDT grammars:

   Bottom-Up SSDTG:
      Every rule n -> w is modified to n -> w reduce_nw
      New start rule: start -> S accept
      The output alphabet is
         Y = {accept} union { reduce_nw: n -> w is a rule in the CFG }

or

   Top-Down SSDTG:
      Every rule n -> w is modified to n -> node_nw w
      New start rule: start -> root S
      The output alphabet is
         Y = {root} union { node_nw: n -> w is a rule in the CFG }
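To make the two extensions concrete, here is a small Python sketch.
The dict encoding and the exact spelling of the output symbols
(reduce_..., node_...) are my own illustrative choices, applied to the
toy CFG S -> T | S T, T -> x:

```python
# Sketch of the two canonical SSDTG extensions of a CFG.
# The dict encoding and output-symbol names are illustrative choices.
cfg = {
    "S": [["T"], ["S", "T"]],   # S -> T | S T
    "T": [["x"]],               # T -> x
}

def bottom_up(cfg):
    # Each rule n -> w becomes n -> w reduce_nw; new start emits 'accept'.
    g = {"start": [["S", "accept"]]}
    for n, alts in cfg.items():
        g[n] = [w + ["reduce_%s_%s" % (n, "".join(w))] for w in alts]
    return g

def top_down(cfg):
    # Each rule n -> w becomes n -> node_nw w; new start emits 'root'.
    g = {"start": [["root", "S"]]}
    for n, alts in cfg.items():
        g[n] = [["node_%s_%s" % (n, "".join(w))] + w for w in alts]
    return g

bu, td = bottom_up(cfg), top_down(cfg)
print(bu["S"])   # [['T', 'reduce_S_T'], ['S', 'T', 'reduce_S_ST']]
print(td["S"])   # [['node_S_T', 'T'], ['node_S_ST', 'S', 'T']]
```

A parser for the extended grammar then emits the reduce_/node_ symbols
interleaved with the input, which is exactly the output half of the
transduction.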

The concern you may commonly hear about "losing the 'structure' when
transforming a grammar" actually pertains to the fact that different
CFG's for the same CFL give you different canonical extensions to
SSDT's; and it's the SSDT's themselves that are relevant to parsing,
not the CFL's.

So, then, any transformation which can be derived from the underlying

algebra is valid -- as long as you keep in mind that you're

transforming the entire SSDTG, not the base CFG.

A grammar is a system of inequalities over the algebra. This is true
regardless of what type of grammar you're talking about: Chomsky type
0, 1, 2 or 3; grammars for languages (subsets of X*), transductions
(subsets of X* x Y*), n-ary relations (subsets of X1* x X2* x ... x Xn*),
or even over more general word algebras (even groups, like SU(2),
SL(2,C) or Z5, etc.)

One variable, S, is designated the "main" variable. The other

variables, along with S, form the set N of variables of the grammar.

These correspond respectively to the "start" symbol and

"non-terminals".

Writing the non-terminals as a set N = { S = n0, n1, n2, ..., nk },
there can be many solutions (n0 = N0, n1 = N1, ..., nk = Nk) to the
system. But one alone will stand out as the "least" solution -- that
is, the least with respect to all the variables.

The language (or transduction or relation, etc.) is that part of the
least solution corresponding to the main variable n0.

The possible rules for transforming a grammar are those for
transforming the corresponding system. A transformation should yield
the same least solution for n0; it may change the set N.

A stronger type of equivalence is where the set N is preserved and the

least solution for ALL the variables is preserved. This provides you

with a more restricted set of possible transformations.

Reference is made to an ordering relation, with the rule n -> w
corresponding to the inequality n >= w. That means the underlying
algebra is partially ordered and embodies the rules of a word algebra
itself:

   Axioms: x1 = x = 1x; (xy)z = x(yz)
   1  <-> "empty word"
   xy <-> "concatenation of x and y"

For regular languages, an algebra is already well-known: the regular

expressions. The ordering relation is

E >= F <==> set(E) includes set(F),

where set(E), set(F) are the sets corresponding to the regular expressions

E and F.

The key properties of the ordering are:

   A-Completeness:
      Every A-type subset has a least upper bound (sum A).

   A-Distributivity:
      If U, V are A-type subsets, then
         sum(UV) = sum(U) sum(V)
      where UV = { uv: u in U, v in V }.

You get different types of algebras depending on what you define as
"A-type", with the following table:

   A                      Algebra
   --------------------   ---------------------------------
   Finite subsets         Dioid
   Rational subsets       *-continuous Kleene algebra (*KA)
   Context-free subsets   (Unnamed)
   Turing subsets         (Unnamed)
   Countable subsets      omega-continuous KA (omegaKA)
   All subsets            Quantale

Conway presented 3 algebras (two called N-algebra and S-algebra) which

are the ones corresponding to *KA, omegaKA and Quantale, but I forget

which is which.

A-type subsets in general must include all finite subsets and be

closed under products of sets in order for the definitions above

to make sense and in order to define the usual operations:

0 = sum {} <--> "fail"

a+b = sum {a,b} <--> "non-deterministic branch" (a | b).

A grammar rule

S -> T | ST

for instance, then corresponds to the inequality

S >= T + ST,

which is equivalent (for illustration's sake) to

S >= T + ST + 0.

If the family A includes all rational operations (as all those do,

which are listed at and under "Rational subsets"), you can also define

a* = sum {1,a,aa,aaa,...} <--> "repetition"
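The "least solution" reading can be checked directly by Kleene
iteration. A sketch in Python (my own encoding; words longer than an
arbitrary bound MAXLEN are discarded so the iteration terminates), for
the rule S -> T | S T with the terminal T read as the one-letter word
"t":

```python
# Compute the least solution of S >= T + S T by iterating upward from
# the bottom element 0 = sum {}.  Truncating at MAXLEN makes the
# iteration finite; MAXLEN is an arbitrary illustrative bound.
MAXLEN = 4

def cat(U, V):
    return {u + v for u in U for v in V if len(u + v) <= MAXLEN}

T = {"t"}     # interpret the terminal T as the one-letter word "t"
S = set()     # bottom element: 0 = sum {} <-> "fail"
while True:
    step = T | cat(S, T)      # right-hand side T + S T
    if step <= S:
        break                 # fixed point reached
    S |= step

print(sorted(S))   # ['t', 'tt', 'ttt', 'tttt'] -- i.e. t+ cut at MAXLEN
```

Each pass adds the words derivable from the previous approximation, so
the limit is exactly the least solution (here t+, cut off at MAXLEN).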

So, with that background, you can formulate transformation rules as
theorems. For example, it will be true that:

TRANSFORMATION 1:

In any system containing the rule: n >= a + bn + nc + ndn, the rule

can be replaced by

n >= b* a c* (d b* a c*)*

to yield an equivalent system.

This is true regardless of whether the expressions a, b, c, d themselves

contain the variable n.
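When a, b, c, d are plain letters, Transformation 1 can be
sanity-checked by brute force: iterate n >= a + bn + nc + ndn to its
least solution and compare with the language of b* a c* (d b* a c*)*,
both truncated at a length bound (the bound and encoding below are my
own choices):

```python
# Brute-force check of Transformation 1 over single-letter a, b, c, d,
# with all languages truncated at MAXLEN words.
MAXLEN = 6

def cat(U, V):
    return {x + y for x in U for y in V if len(x + y) <= MAXLEN}

def star(U):
    # least solution of s >= 1 + s U, truncated at MAXLEN
    s = {""}
    while True:
        step = {""} | cat(s, U)
        if step <= s:
            return s
        s |= step

a, b, c, d = {"a"}, {"b"}, {"c"}, {"d"}

# Least solution of n >= a + b n + n c + n d n, by iteration.
n = set()
while True:
    step = a | cat(b, n) | cat(n, c) | cat(cat(n, d), n)
    if step <= n:
        break
    n |= step

# Closed form from Transformation 1: b* a c* (d b* a c*)*
core = cat(cat(star(b), a), star(c))
closed = cat(core, star(cat(d, core)))

assert n == closed
```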

Another example:

TRANSFORMATION 2: (Special case of left-recursion removal)

In any system containing rules of the form:

n >= na + pb + u

p >= nc + pd + v

these rules can be replaced by

n >= (u + v d* b) (a + c d* b)*

p >= (v + u a* c) (d + b a* c)*

to yield an equivalent system.

Again, this is true regardless of what variables the expressions a, b,

c, d, u and v contain (even p and n).
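The same kind of bounded-enumeration check works for Transformation 2,
with a, b, c, d, u, v as plain letters (again, bound and encoding are
my own):

```python
# Brute-force check of Transformation 2 (left-recursion removal for a
# two-variable system), with all languages truncated at MAXLEN.
MAXLEN = 6

def cat(U, V):
    return {x + y for x in U for y in V if len(x + y) <= MAXLEN}

def star(U):
    s = {""}
    while True:
        step = {""} | cat(s, U)
        if step <= s:
            return s
        s |= step

a, b, c, d, u, v = ({ch} for ch in "abcduv")

# Iterate the pair of rules  n >= n a + p b + u,  p >= n c + p d + v.
n, p = set(), set()
while True:
    n_step = cat(n, a) | cat(p, b) | u
    p_step = cat(n, c) | cat(p, d) | v
    if n_step <= n and p_step <= p:
        break
    n |= n_step
    p |= p_step

# Closed forms from Transformation 2:
#   n = (u + v d* b)(a + c d* b)*,  p = (v + u a* c)(d + b a* c)*
n_closed = cat(u | cat(cat(v, star(d)), b), star(a | cat(cat(c, star(d)), b)))
p_closed = cat(v | cat(cat(u, star(a)), c), star(d | cat(cat(b, star(a)), c)))

assert n == n_closed and p == p_closed
```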

TRANSFORMATION 3: (Special case of substitution)

In any system containing rules of the form:

n >= u

p >= F(n)

where F(n) is some algebra expression in n, these rules can be replaced by

n >= u

p >= F(u)

In each of these transformations, equivalence is of the strong variety
where all variables' least solutions are preserved ... and, in fact,
of the even stronger variety where the transformation preserves ALL
solutions to the system (not just the least solution).

One where equivalence in the stronger sense is clearly not preserved
is:

TRANSFORMATION 4: (Partial solution, substitution & elimination)

In any system of the form

ni >= Fi(n0,...,nk,p1,...,pl), i = 0,...,k

pj >= Gj(p1,...,pl), j = 1,...,l

with main variable n0, where the F's and G's are in general algebraic

expressions; if pj = Pj is the least solution to the subsystem

comprising the p's alone, then

ni >= Fi(n0,...,nk,P1,...,Pl), i = 0,...,k

is an equivalent system.
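As a concrete check of this transformation (letters and length bound
are my own choices): in the system n >= na + pb + u, p >= pd + v with
main variable n, the p-subsystem alone has least solution P = v d*;
substituting and eliminating p leaves n >= na + Pb + u, which
left-recursion removal solves as (u + P b) a*:

```python
# Check of Transformation 4: solve the p-subsystem, substitute its
# least solution, and compare with iterating the full system.
MAXLEN = 6

def cat(U, V):
    return {x + y for x in U for y in V if len(x + y) <= MAXLEN}

def star(U):
    s = {""}
    while True:
        step = {""} | cat(s, U)
        if step <= s:
            return s
        s |= step

a, b, d, u, v = ({ch} for ch in "abduv")

# Step 1 (partial solution): p >= p d + v alone has least solution
# P = v d*.
P = cat(v, star(d))
# Step 2 (substitution & elimination): n >= n a + P b + u remains;
# its least solution is (u + P b) a*.
n_reduced = cat(u | cat(P, b), star(a))

# Compare with direct iteration of the original two-rule system.
n, p = set(), set()
while True:
    n_step = cat(n, a) | cat(p, b) | u
    p_step = cat(p, d) | v
    if n_step <= n and p_step <= p:
        break
    n |= n_step
    p |= p_step

assert n == n_reduced
```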

Another is:

TRANSFORMATION 5: (Elimination)

Any system of the form

ni >= Fi(n0,...,nk); i = 0,...,k

pj >= Gj(n0,...,nk,p1,...,pl), j = 1,...,l

with main variable n0, where the F's and G's are algebraic expressions

is equivalent to the reduced system

ni >= Fi(n0,...,nk); i = 0,...,k

In fact, TR. 4 -- as suggested by its name -- can be factored into
TR. 5 plus two strong-equivalence transformations:

TRANSFORMATION 6: Partial solution

Any system with rules of the form

pj >= Gj(p1,...,pl); j = 1,...,l

is equivalent to the system with these rules replaced by

pj >= Pj; j = 1,...,l

where (pj = Pj: j = 1,...,l) is the least solution to the subsystem
comprising these l rules alone.

TRANSFORMATION 7: Substitution

Any system with rules of the form

ni >= Fi(n0,...,nk,p1,...,pl); i = 0,...,k

pj >= vj; j = 1,...,l

is equivalent to one where the rules involving the n's are replaced by:

ni >= Fi(n0,...,nk,v1,...,vl); i = 0,...,k.
