Re: Parsing Expression Grammar

Chris F Clark <cfc@shell01.TheWorld.com>
23 Sep 2005 15:56:19 -0400

From comp.compilers

Related articles
[21 earlier articles]
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22)
*Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark)* (2005-09-23)**
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25)
Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25)
Re: Parsing Expression Grammar cfc@TheWorld.com (Chris F Clark) (2005-09-27)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-27)
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-30)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-10-02)

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	23 Sep 2005 15:56:19 -0400
Organization:	The World Public Access UNIX, Brookline, MA
References:	05-08-115 05-09-008 05-09-100
Keywords:	parse
Posted-Date:	23 Sep 2005 15:56:19 EDT

Sylvain Schmitz <schmitz@i3s.unice.fr> writes:
> All this thread on PEGs has convinced me of giving them a better look.
...
> All this to get to the conclusion that the immediate translation of a
> CFG into a PEG by replacing "|" with "/" will not necesseraly yield the
> expected result. Even the simple grammar "S -> Ac, A -> a| ab" should
> not be translated "S <- Ac, A <- a / ab" but "S <- Ac, A <- ab / a",
> otherwise input "abc" would not be accepted.

Yes, this is "the" problem with PEG's. / implies ordering of the
search (parsing) space. You need to order your / operators so that
special cases (e.g. longer matches), so that they appear first.
Unfortunately, if you don't do this, nothing will tell you you have a
problem with your grammar, it will simply not parse some inputs. To
me this implies that if one wants to use a PEG to parse some input,
then one must exhaustively test the parser.

The exception being, if the grammar is LL(1), then a PEG will
correctly parse it and the above issue won't occur. However, if you
have made your grammar LL(1), it won't have rules like "A -> a | ab"
because you will have had to left-factor them. The parsing correctly
of "A -> ab / a" is a feature of PEG's since it obviates
left-factoring. However, one could be just as well off using one of
the LL(k) parsers, which automatically left-factor your grammars and
thus handle "A -> a | ab" without the user worrying about ordering or
having inputs that won't parse.

> Russ Cox wrote:
> > One thing PEGs definitely do well (and this is at least one reason why
> > Bryan worked on them) is that they are composable: you can take any
> > terminal symbol in a PEG, replace it with a non-terminal with
> > its own PEG grammar, and you get a bigger, but still valid, PEG grammar.
> This is definitely an excellent property.

Composability is an extremely useful feature. It really helps novice
and naive grammar writers.

Another userful feature is determinism, which PEG's have, by
determinism I mean that there is a unique parse for any legal input,
so you don't have ambiguous specifications.

The final useful property which PEG's have is run-time "linearity",
you can predict parsing time by taking the input length and
multiplying by some constant factor.

The fourth useful feature, which PEG's don't have, is the one cited
above, a kind of "completeness" or transparency. It is possible to
write a PEG grammar such that some alternative never participates in a
parse, and as a result some inputs which "look" parseable to the naive
grammar writer are actually not.

It is useful to compare this to other popular parsing technologies.

LL(1) and LR(1) parsers both fail composability, but succeed on
determinism, linearity, and transparency.

GLR and Earley parsers are composable and transparent, but are not
deterministic or linear. I think the same properties holds for CYK
parsers, but I'm not well-versed in them, so don't quote me.

I think it is possible to get a parser which nearly succeeds on all
four properties by combining the PEG ordering idea with GLR parsing.
In particular, one runs the GLR parser to get multiple ambiguous
parses, but uses the PEG idea of ordering those parses to
deterministically disambiguate them. Doing so, would give one a parser
that is composable, deterministic, and transparent. It would also be
linear on all LL and LR grammars, and just non-linear on ambiguous
grammars. It is even possible to warn on non-LL/LR grammars, so that
the user would know if the grammar is potentially non-linear (and
where the PEG disambiguation rules may be pruning away parses).

By the way, note that the PEG disambiguation rules are not the only
way of disambiguating the parses, but it does have a certain
simplicity to it. Another approach that could be used for
disambiguation is "cost functions" like are used in the Burg tools.
If I recall correctly, iburg has cost functions that impose linear
orders on the result of ambiguous parses.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Parsing Expression Grammar

Chris F Clark <cfc@shell01.TheWorld.com>23 Sep 2005 15:56:19 -0400

Chris F Clark <cfc@shell01.TheWorld.com>
23 Sep 2005 15:56:19 -0400