Re: A Grammar Writing Question

Steve Horne <>
Fri, 27 Jul 2007 03:41:51 -0700

          From comp.compilers

Related articles
A Grammar Writing Question (Lowell Thomas) (2007-07-23)
Re: A Grammar Writing Question (Chris F Clark) (2007-07-26)
Re: A Grammar Writing Question (glen herrmannsfeldt) (2007-07-26)
Re: A Grammar Writing Question (Steve Horne) (2007-07-27)
Re: A Grammar Writing Question (Hans-Peter Diettrich) (2007-07-27)
Re: A Grammar Writing Question (Chris F Clark) (2007-07-28)
Re: A Grammar Writing Question (Sylvain Schmitz) (2007-07-29)
A Grammar Writing Question (Lowell Thomas) (2007-08-07)
| List of all articles for this month |

From: Steve Horne <>
Newsgroups: comp.compilers
Date: Fri, 27 Jul 2007 03:41:51 -0700
Organization: Compilers Central
References: 07-07-08107-07-093
Keywords: parse
Posted-Date: 27 Jul 2007 09:31:02 EDT

Chris F Clark wrote:
> hand-written copy of unix2dos that I've lost the source code to, but


> For this I wrote a grammar that tried to handle whitespace at the end
> of the line specially, such that while doing the \r \n conversions, I
> could also trim trailing whitespace on a line.

Are you sure a CF parser is the way to go? Personally, my first choice
would be Ragel ( for
something like this. Very powerful regular grammar handling, support
for lex-like backtracking scanning, and explicit call/ret nesting when
some simple recursive descent is needed.

> That is the point, one wants to write grammars fairly naively. But,
> then, one wants a formal tool to check and prove one hasn't composed
> nonsense. We don't have good tools for that beyond LL and LR.

This is a good point, IMO. Error checking requires that there must be
erroneous grammars to complain about. The more powerful the parsing
method, the fewer the grammars that the method considers erroneous,
meaning less chance of useful error messages.

I've been toying with a tabular LR parsing idea for a while. You may
remember me from the "Have I discovered something new" thread in 2002
- that time I was very very wrong, but this idea will work. I am
confident because it turns out to be a re-invention - once I had the
idea (inspired of course by tabular LL), I was disappointed to
discover that others had it first (in particular, Mark-Jan Nederhof) :-(

In theory, tabular LR should handle all CF grammars that Tomita or
backtracking LR can handle. However, there is always a possibility
that either Tomita or backtracking LR can get caught in an infinite
loop that cannot be predicted. I think of the problem as local
infinite ambiguity (the ambiguity might get resolved later, but in an
L-to-R scan you'll never reach the resolution because of the infinite

In these cases, tabular LR equally cannot predict the problem when
compiling parse tables - but it can detect the problem for a
particular input while parsing, so that termination can be guaranteed.
There are other advantages in terms of easy handling of EBNF (with no
need to convert to BNF rules) since reduction pop-counts can be
determined in an input-sensitive way from the table, rather than being
built into the state model.

Of course a major price to pay is the table itself - size proportional
to the number of states in the LR-ish state model times the number of
tokens in the input. Much of this is empty though, and there are ways
to determine which parts of the table will never be referenced again,
so this can be managed in various ways depending on what trade-offs
you are willing to make.

But if no CF grammar is ever considered erroneous at compile-time, how
can you possibly sanity check your grammar? The best you can do is
probably to warn in the cases where LR(1) would give an error.

Also, I'm not certain that context handling of the type needed for C-
like grammars can work using tabular LR.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.