Re: Grammar for optional elements

Chris F Clark <cfc@shell01.TheWorld.com>
Tue, 19 Jun 2007 09:45:19 -0400

          From comp.compilers

Related articles
[2 earlier articles]
Re: Grammar for optional elements anton@mips.complang.tuwien.ac.at (2007-06-16)
Re: Grammar for optional elements bobduff@shell01.TheWorld.com (Robert A Duff) (2007-06-16)
Re: Grammar for optional elements cfc@shell01.TheWorld.com (Chris F Clark) (2007-06-16)
Re: Grammar for optional elements 148f3wg02@sneakemail.com (Karsten Nyblad) (2007-06-17)
Re: Grammar for optional elements dot@dotat.at (Tony Finch) (2007-06-17)
Re: Grammar for optional elements torbenm@app-2.diku.dk (2007-06-18)
Re: Grammar for optional elements cfc@shell01.TheWorld.com (Chris F Clark) (2007-06-19)
Re: Grammar for optional elements dot@dotat.at (Tony Finch) (2007-06-19)
Re: Grammar for optional elements lowell@coasttocoastresearch.com (Lowell Thomas) (2007-06-19)
Re: Grammar for optional elements dot@dotat.at (Tony Finch) (2007-06-20)
Re: Grammar for optional elements Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2007-06-20)
Re: Grammar for optional elements cfc@shell01.TheWorld.com (Chris F Clark) (2007-06-21)
Re: Grammar for optional elements dot@dotat.at (Tony Finch) (2007-06-21)
[2 later articles]
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Tue, 19 Jun 2007 09:45:19 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 07-06-019 07-06-029
Keywords: parse, design
Posted-Date: 19 Jun 2007 10:28:33 EDT

Tony Finch <dot@dotat.at> writes:


> Mohitz <coolmohitz@gmail.com> wrote:
>>
>>I need to create a parser for a language something like this.
>>
>>attribute1: value;
>>attribute2: value;
>>attribute3: value;
>>
>>All the attributes are optional but can occur only once...
>
> You can do this with Parsing Expression Grammars.
> http://pdos.csail.mit.edu/~baford/packrat/
>
> start = attr*
> attr = attr1 / attr2 / attr3
> attr1 = "attribute1" COLON value SEMICOLON !(attr* attr1 attr*)
> attr2 = "attribute2" COLON value SEMICOLON !(attr* attr2 attr*)
> attr3 = "attribute3" COLON value SEMICOLON !(attr* attr3 attr*)
>
> This says that each attribute expression is not followed by a sequence
> of attributes containing another of the same kind of attribute expression.
>
> Tony.
> --
> f.a.n.finch <dot@dotat.at> http://dotat.at/


This is an interesting approach. It should work with predicate
grammars also, as long as they support negative predicates (negative
predicate = match this rule unless the lookahead is this regular
expression). I don't recall if pccts has negative predicates; I know
plain yacc doesn't (but yacc++ does) and the notation used suggests
that we are discussing yacc or pccts.


There is one little part I don't understand. Why doesn't it require
two copies of the attr1, attr2, and attr3 rules, one covered by the !
condition and one "raw"? Here's the reason I ask that question:


The attr1 rule uses a ! which should mean it applies unless the
lookahead is "attr* attr1 attr*". Ok, so if there is another attr1
later in the sequence it won't apply, so a sequence like attr1 attr2
attr1 is rejected. However, now such a sequence is not an attr. Does
that mean that a sequence like attr1 attr1 attr2 attr1 can match (for
one token), since the second attr1 is not matched as one, which means
the first attr is not followed by the prohibited sequence? I guess
the point I'm asking is whether the proposed grammar has the
(desirable) fails on first error property. It definitely fails
whenever there is an error. However, does it point out the first
occurence?


Compare to this grammar:


start: attr*;
attr: attr1 | attr2 | attr3;
attr_raw: attr1_raw | attr2_raw | attr3_raw;
attr1: attr1_raw !(attr_raw* attr1_raw attr_raw*);
attr1_raw: "attribute1" COLON value SEMICOLON;
attr2: attr2_raw !(attr_raw* attr2_raw attr_raw*);
attr2_raw: "attribute2" COLON value SEMICOLON;
attr3: attr3_raw !(attr_raw* attr3_raw attr_raw*);
attr3_raw: "attribute3" COLON value SEMICOLON;


In it, the non-terminals in the predicate use unguarded rules, so they
will match no matter what. Thus, the predicate is not influenced by
whether the predicate has matched in the rules the predicate depends
upon (because there are no predicates in the rules the predicates
depend upon).


I don't know if this matters in this case, but in some cases it
matters (for performance) because nested predicates can cause n**2
behavior. If I understand it, they can't for PEGs that are memoized,
but that should mean there are some predicated languages PEGs can't
parse (the ones which require the n**2 behaviour, perhaps
palindromes--my theoretical knowledge gets thin in that area).


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.