Fri, 25 Feb 1994 18:33:53 GMT

Related articles |
---|

Parsing with infinite lookahead Matt_Timmermans@msl.isis.org (Matt Timmermans/MSL) (1994-02-22) |

Re: Parsing with infinite lookahead jos@and.nl (1994-02-24) |

Parsing with infinite lookahead bevan@cs.man.ac.uk (Stephen J Bevan) (1994-02-24) |

Re: Parsing with infinite lookahead dwohlfor@cs.uoregon.edu (1994-02-24) |

Re: Parsing with infinite lookahead parrt@s1.arc.umn.edu (Terence Parr) (1994-02-25) |

Re: Parsing with infinite lookahead corbett@lupa.Eng.Sun.COM (1994-02-26) |

Re: Parsing with infinite lookahead nandu@cs.clemson.edu (1994-02-27) |

Re: Parsing with infinite lookahead hbaker@netcom.com (1994-02-28) |

Re: Parsing with infinite lookahead hbaker@netcom.com (1994-03-01) |

Re: Parsing with infinite lookahead bromage@mundil.cs.mu.OZ.AU (1994-03-02) |

Re: Parsing with infinite lookahead mareb@cis0.levels.unisa.edu.au (1994-03-24) |

Newsgroups: | comp.compilers |

From: | "Terence Parr" <parrt@s1.arc.umn.edu> |

Keywords: | parse, theory, bibliography |

Organization: | Compilers Central |

References: | 94-02-174 |

Date: | Fri, 25 Feb 1994 18:33:53 GMT |

Matt Timmermans/MSL <Matt_Timmermans@msl.isis.org> writes:

*> Does anyone know of a deterministic method for parsing any unambiguous*

*> context-free grammar -- including those that require infinite lookahead?*

^^^^^^^ "language" (subtle, but important point)

*>*

*> For example:*

*>*

*> S: A a | B b*

*> A: c | A c*

*> B: c | B c*

I think the Early Algorithm (in O(n^3) time?) parses any unambiguous

context-free language. However, most parser-generators are restricted to

the LALR(1), LR(1), or LL(1) grammars and, hence, cannot handle all

context-free languages.

Much work has been done in theory, and some research tools have been

built, that extend LR-based parsers to use regular expressions as

lookahead rather than a single symbol. This is very very cool stuff. See

[CuC73] and [BeS90].

In practice, the only commonly-available tool that I know of that uses

more than a single symbol of lookahead is ANTLR (the parser-generator of

PCCTS). It accepts LL(k) for k>1 grammars and has a nifty feature called

a syntactic predicate [PaQ94] which indicates a syntactic context that

must be satisfied before application of an associated production is

authorized. Removing left-recursion and using the EBNF notation of ANTLR,

I get the following grammar:

s : (c)+ a

| (c)+ b

;

This is non-LL(k) for any finite k (assuming that we cannot left-factor to

keep it interesting). However, I may use a syntactic predicate to specify

what context must be seen for the first production to be seen (the second

would be attempted by default upon failure of the predicate):

s : ((c)+ a)? (c)+ a

| (c)+ b

;

As a shorthand, we allow:

s : ((c)+ a)?

| (c)+ b

;

which says "I'm not sure if production one will match; give it a try."

Basically, the parser scans ahead in the infinite lookahead buffer to see

if an 'a' follows one or more 'c's.

Note that a syntactic predicate is predicting a production with a

context-free language which is stronger than predicting with a regular

language.

ANTLR-generated parsers can recognize all context-free languages if you're

willing to use syntactic predicates all over the place. However, the

majority of the decisions in most grammar are deterministic and, hence,

(...)? predicates are rarely needed, yielding near-linear parse

complexities.

Regards,

Terence Parr

Refs:

[CuC73] Karel Culik II, and Rina Cohen, ``LR-Regular Grammars--an

Extension of LR(k) Grammars,'' Journal of Computer and System

Sciences 7, 1973, pp 66-96.

[BeS90] Manuel E. Bermudez and Karl M. Schimpf, ``Practical Arbitrary

Lookahead LR Parsing,'' Journal of Computer and System

Sciences 41, 1990, pp 230-250.

[PaQ94] Terence J. Parr and Russell W. Quong, ``Adding Semantic and

Syntactic Predicates to LL(k) -- pred-LL(k).'' To Appear in

Proceedings of the International Conference on Compiler;

Edinburgh, Scotland; April 1994.

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.