Fri, 18 Sep 2015 12:48:02 -0700 (PDT)

Related articles |
---|

I rewrote Yacc federation2005@netzero.com (2015-09-18) |

Re: I rewrote Yacc federation2005@netzero.com (2015-12-16) |

From: | federation2005@netzero.com |

Newsgroups: | comp.compilers |

Date: | Fri, 18 Sep 2015 12:48:02 -0700 (PDT) |

Organization: | Compilers Central |

Keywords: | tools, parse, available |

Posted-Date: | 21 Sep 2015 01:02:33 EDT |

This is actually just the first stage ("normalization") of a long process,

with simplifications, better use of the language (C), and other efficiencies

and changes in the layout. Right now it's around 4000 lines + "skeleton" code.

So, for the time being, regression tests all pass and equivalency is

preserved.

I pulled back from a decision to try and collate this with the Java

version (byacc/J) deferring that to a later time. However I will be

collating this with Bison -- which has also been undergoing the

normalization process for a while now -- before further analysis and

collation takes place.

Berkeley Yacc has included a feature seen in Prolog's DCG parser: the

ability to explicitly parametrize non-terminals. It has also included

a feature ("backtracking") that apparently is paving the way to move

up from BNF to EBNF.

Bison, on the other hand, has included facility for handling the Tomita

parsing formalism (which these days apparently is called GLR). This is

something I've been working with since around 1985-1986 (when Tomita's book

was fresh off the press) -- and actually had some demo software in the

comp.compilers archive for a while in the 1990's. It will also be collated in

with the code. The LR part of this code is dramatically simpler (yet more

robust) than what's used in Yacc or Bison. So it is likely that the innards

are going to be completely gutted and replaced with my stuff.

During the intervening period (since the late 1980's) I developed a framework

(descended ultimately from the Chomsky-Schuetzenberger Theorem) that

seamlessly extends the algebra of regular and rational expressions up from

type 3 to type 2 in the Chomsky Hierarchy into a full-fledged calculus for

"context free expressions". I no longer use the Tomita framework because of

some serious deficiencies that become readily visible from the vantage point

of this latter framework. Though not published,the mathematical foundation for

it has been (2008 under LNCS 4988) -- a *complete* and *consistent* (but

second order theory) which has recently seen equivalent formulations (e.g. the

"Chomsky Algebras" of 1309.0893v1) published in the literature.

There are also serious inefficiencies underlying the LR method itself that

become readily apparent when rendered in the language of context free

expressions -- much of which not only serves as an obstacle to doing EBNF (=

BNF with regular expressions on the right hand sides of rules) but even "CBNF"

(BNF with full-fledged context free expressions on the right hand sides of

rules).

Following normalization/simplification and collation I plan on changing

everything over to a foundation based on context-free expressions. There are

actually multiple compilation methods (each of which serves as an algebraic

version of a proof to the Chomsky-Schuetzenberger Theorem) that can be used --

including one that captures the LR methodology. One of the things I've been

toying with is "semi-LR" where the underlying "characteristic FA" is left as

an NFA (the NFA -> DFA conversion of the characteristic FA is actually the

whole deal of what's going on with item sets).

There are other things I'll be experimenting with. For instance, the

compilation method I normally use (and heretofore I've done all parsers by

hand and lots of them) has a close link to LR but seems to by-pass much of

what's involved there. After a long period (> 2 decades) I have yet to prove

its equivalence with the LR method. So there will be a lot of

experimentation.

I've also intended all along to include most or all the features of Prolog's

DCG formalism (particularly the handling of operator precedences) and to even

turn everything around backwards to make Prolog itself run as a parametrized

non-deterministic push down transducer. (So I've also spent some time in the

recent past redoing the WAM formalism into a version based on the mathematical

theory known as "magmas" but that's another story for another time).

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.