Re: Philosophical question regarding statement terminators

Chris F Clark <cfc@world.std.com>
25 Nov 2000 01:12:40 -0500

          From comp.compilers

Related articles
[6 earlier articles]
Re: Philosophical question regarding statement terminators cfc@world.std.com (Chris F Clark) (2000-11-14)
Re: Philosophical question regarding statement terminators jerrold.leichter@smarts.com (Jerry Leichter) (2000-11-14)
Re: Philosophical question regarding statement terminators cfc@world.std.com (Chris F Clark) (2000-11-15)
Re: Philosophical question regarding statement terminators vbdis@aol.com (2000-11-17)
Re: Philosophical question regarding statement terminators vbdis@aol.com (2000-11-19)
Re: Philosophical question regarding statement terminators adrian@sartre.cs.rhbnc.ac.uk (A Johnstone) (2000-11-21)
Re: Philosophical question regarding statement terminators cfc@world.std.com (Chris F Clark) (2000-11-25)
| List of all articles for this month |

From: Chris F Clark <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 25 Nov 2000 01:12:40 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 00-11-069 00-11-090 00-11-101 00-11-143
Keywords: syntax
Posted-Date: 25 Nov 2000 01:12:40 EST

I wrote:
> If the 1st token of every rule is unique, it is easy to show that
> the grammar is LL(1).


Adrian corrected me:
> Much as I hate to tinker with Chris's usually reliable posts, this
> isn't so. For a grammar to be LL(1) it must be both left factored
> and follow determined. . . . . Follow determinism for the simple
> case of one symbol lookahead simply means that for a rule that can
> match the empty string (epsilon or lambda depending on whose papers
> you read), not only must the first sets of the alternate productions
> be unique but they must also not share any tokens with the follow
> set of the nonterminal. . . .


Sometimes one gets a little casual in postings and doesn't clearly say
what one means. Of course, this results in miscommunication.


What I meant to say was that every rule (alternative) begins with a
unique token (terminal). I also meant that the token had to appear
directly in the rule (and not as the result of a reference to another
non-terminal).


Grammars where every rule begins with a unique token are called
"Simple LL(1) Grammars" (See _The Theory of Parsing, Translation, and
Compiling_ Vol I, by Aho and Ullman, pg 336.) Note, my definition was
overly strong even for Simple LL(1), as the tokens only need to be
unique within the alternatives of one non-terminal, not across all
non-terminals in the grammar.


By beginning with a unique token there are no empty (epsilon or
lambda) rules, as every rule has at least one token. Thus, the follow
sets are never invoked in determining whether to apply a rule or not.
Each rule is implied by the appearance of its first token.


However, if you allow empty rules, then it is NOT SUFFICIENT (as
Adrian points out) for the only the non-empty rules to begin with a
unique token. With an empty rule (or a rule which can derive an empty
rule), one must consult the follow set to determine when to apply the
rule. If there is a case where some rule begins with a token and that
token is in the follow set of an empty rule at the same point in the
derivation tree, the grammar is not LL(1).


The other thing one has to disallow is regular expressions in the
right-hand-side. There is a simple translation that turns them into
recursive rules, so that's not much of an issue. However, regular
expressions are effectively implicit rules and they need to be
considered when determining if a rule has a unique initial token.


Thus, there were several possible sources of misinterpretation.


1) Some authors use the term rule to apply to all alternatives
      describing a single non-terminal. Others use rule to mean a single
      alternative. In the first usage, the grammar can be un-left-
      factored.


2) Starting with a unique token, might have been thought to only apply
      to rules that cannot derive an empty string. Rules that can derive
      (directly or indirectly) an empty string are called "nullable"
      require consulting the follow sets to determine when they apply.


3) Regular expressions can create implicit rules where the first token
      is not unique or that involve empty expansions.


If any one of the three things are included, my rule is incorrect. I
apologize for any confusion that I might have caused as a result.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.