Re: Parsing Expression Grammar

Sylvain Schmitz <schmitz@i3s.unice.fr>
22 Sep 2005 23:52:20 -0400

From comp.compilers

Related articles
[19 earlier articles]
Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18)
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22)
*Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz)* (2005-09-22)**
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-23)
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25)
Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25)
Re: Parsing Expression Grammar cfc@TheWorld.com (Chris F Clark) (2005-09-27)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-27)
[2 later articles]

| List of all articles for this month |

From:	Sylvain Schmitz <schmitz@i3s.unice.fr>
Newsgroups:	comp.compilers
Date:	22 Sep 2005 23:52:20 -0400
Organization:	Compilers Central
References:	05-08-115 <432FE722.1070801@i3s.unice.fr> <ee9e417a05092006516e0d9d95@mail.gmail.com> <200509201743.39415.baford@mit.edu>
Keywords:	parse
Posted-Date:	22 Sep 2005 23:52:20 EDT

Bryan Ford wrote:
> The fact that it's possible to transform most CFGs into a syntactically
> correct PEG by replacing "->" with "<-" and "|" with "/" does not by any
> means imply that the PEG so generated will actually represent the same
> language as the CFG did. CFGs and PEGs are different syntactic
> meta-languages whose expressions have substantially different
> interpretations, and I've never yet even tried to define a "direct" formal
> relationship directly between similar-looking PEGs and CFGs. In other words,
> the fact that the CFG "P -> x P x | x" looks a lot like the PEG "P <- x P x /
> x" means absolutely nothing formally.
>
> It's probably possible to define some such formal correspondence between a
> very restrictive subset of CFGs, and a similar very restrictive subset of
> PEGs, and prove that the corresponding grammars represent the same language.
> The LL(0) CFGs might be a good starting point for building such a
> correspondence. I haven't done so yet, however, nor has anyone else that I
> know of. Needless to say, the CFG "P -> x P x | x" would probably not be
> covered by this correspondence.

It seems like a good idea; I would have liked a generative
interpretation of all PEGs, but the undecidability of the emptiness
problem is a sign that it's unlikely to happen. The bottom line here is
that it can be difficult to tell what language a PEG will accept.
      Am I wrong if I write that the language L(alpha / beta) recognized by
the PEG "alpha / beta" is

L(alpha / beta) = L(alpha) \cup { xy \in L(beta) | x \not\in L(alpha) }

It looks like a good starting point for understanding PEGs as generative
devices. (At least I find it helpful.)

>> Now, strictly speaking, the language recognized by "P <- x P x / x"
>>*is* {x^(2^n - 1) | n > 0} (and that's quite an achievement in itself),
>
> Actually, if you analyze the behavior of the above PEG further I think you'll
> find that it represents the language {"x", "xxx"}. In other words, the
> PEG-based parser fails to behave in the way the similar-looking CFG does for
> input strings of 5 or more x's, because it deterministically makes the
> "wrong" parsing choices in the middle of the string and never goes back to
> explore different choices as a fully backtracking CFG-based parser might.
>
> It's easy in this case to write a _different_ PEG that recognizes the same
> language {x^(2^n - 1) | n > 0}, e.g., "P <- x x P | x", but that fact doesn't
> seem to generalize to CFGs in general.

It seems like you read the language expression a bit fast; I concur

                                      P -> x P x | x,
                                      P -> x x P | x and
                                      P <- x x P / x

all recognize the same language { x^(2n - 1) | n > 0 }, but

                                      P <- x P x / x

recognizes { x^(2^n - 1) | n > 0 }, a non CF language, not {x, xxx}. I
joined the source of a packrat parser for it (and I apologize for my
poor Haskell skills).

>>Looking at this, I also wonder whether there exists a PEG for the CFL
>>{ww^R | w \in {a,b}^*}.
>
> My guess is no. The basic problem seems to be the lack of syntactic markers
> in the middle of the string from which the PEG-based parser might start
> building correct intermediate results.

Isn't this overall problem handled in Birman's Information and Control
paper? I haven't really read it yet, but he introduces multiple-failure
schemes, (l,m)-TS, which look like a way of expressing the need for more
backtracking power in a "bounded" way; the problem with palindromes
being that they need an "unbounded" number of backtracks, hence their
quadratic time complexity in Earley parsing. (This is just an intuition.)

--
Regards,

        Sylvain
-- PalPackrat.hs
--
-- Packrat parser for grammar P <- x P x / x.
-- See "4.3 Recognition Limitations" in Bryan Ford, Packrat Parsing: Simple,
-- Powerful, Lazy, Linear Time. ICFP'02.
-- The motivation for the study of this grammar is its possibly inherent
-- quadratic time parsing complexity; see Jay Earley, An Efficient Context-Free
-- Parsing Algorithm. Communications of the ACM, 13(2):94-102, 1970.
--
-- Usage in Hugs98: let's see why "xxxxx" is not accepted:
-- Hugs.Base> :load PalPackrat.hs
-- PalPackrat> dvPalindrom (parse "345")
-- "(P <- '3'(P <- '4')'5')"
-- Remains ""
-- PalPackrat> dvPalindrom (parse "2345")
-- "(P <- '2')"
-- Remains "345"
-- PalPackrat> dvPalindrom (parse "12345")
-- "(P <- '1'(P <- '2')'3')"
-- Remains "45"
-- PalPackrat> map eval [1,3..66]
-- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]

module PalPackrat where

data Result v = Parsed v Derivs | NoParse
instance Show v => Show (Result v) where
    show (Parsed v (Derivs r p c)) = show v ++ "\nRemains \"" ++ r ++ "\""
    show (NoParse ) = "No Parse"

data Derivs = Derivs {
                                remain :: String,
                                dvPalindrom :: Result String,
                                dvChar :: Result Char
                            }

-- Returns `n' if the parser accepts an input of length `n', `0' otherwise.
eval :: Int -> Int
eval n = case dvPalindrom (parse (map (\i -> 'x') [1..n])) of
                      Parsed v d -> case dvChar d of -- check for EOF
                          NoParse -> n
                          _ -> 0
                      _ -> 0

-- Construct a parse result structure for an input string.
parse :: String -> Derivs
parse s = d where
        d = Derivs r pal char
        r = s
        pal = pPalindrom d
        char = case s of
                          (c:s') -> Parsed c (parse s')
                          [] -> NoParse

-- P
pPalindrom :: Derivs -> Result String
pPalindrom d = alt1 where
    -- P <- x P x
    alt1 = case dvChar d of
                      Parsed c1 d' -> case dvPalindrom d' of
                          Parsed p d'' -> case dvChar d'' of
                              Parsed c2 d''' -> Parsed ("(P <- " ++ (show c1) ++ p ++ (show c2) ++ ")") d'''
                              _ -> alt2
                          _ -> alt2
                      _ -> alt2
    -- P <- x
    alt2 = case dvChar d of
                      Parsed c d' -> Parsed ("(P <- " ++ (show c) ++ ")") d'
                      _ -> NoParse

-- An equivalent result can be obtained using pappy
-- [http://www.pdos.lcs.mit.edu/~baford/packrat/thesis/]
-- to generate a packrat parser from the following 'Pal.pappy' file:

-- parser Pal:
--
-- top Axiom
--
-- Axiom :: Int =
-- p:Palindrom !Char -> { p }
--
-- Palindrom :: Int =
-- 'x' p:Palindrom 'x' -> { p + 2 }
-- / 'x' -> { 1 }
--
-- {
-- -- test:
-- -- Pal> map eval [1,3..66]
-- -- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]
-- --
-- eval :: Int -> Int
-- eval n = case palAxiom (palParse "Palindrom" (map (\i -> 'x') [1..n])) of
-- Parsed v _ _ -> v
-- NoParse e -> 0
-- }

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Parsing Expression Grammar

Sylvain Schmitz <schmitz@i3s.unice.fr>22 Sep 2005 23:52:20 -0400

Sylvain Schmitz <schmitz@i3s.unice.fr>
22 Sep 2005 23:52:20 -0400