
Related articles

What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-07)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-09)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-12)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@shell01.TheWorld.com (Chris F Clark) (2004-01-12)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-16)
Re: What is the trick of languages being LL(k+1), but not LL(k)? cfc@world.std.com (Chris F Clark) (2004-01-17)
Re: What is the trick of languages being LL(k+1), but not LL(k)? oliver@zeigermann.de (Oliver Zeigermann) (2004-01-22)

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 9 Jan 2004 23:34:15 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 04-01-027
Keywords: parse, LL(1)
Posted-Date: 09 Jan 2004 23:34:15 EST

Oliver asked why a grammar containing the following rule is LL(k+1):

> a : A^k B s | C ;

This sounds suspiciously like a homework problem, so I will only give

a partial answer--one that should be sufficient for you to understand,

but not sufficient to allow you to turn it in as a homework answer.

The reason why the grammar is LL(k+1) is that it takes k tokens to get

through the A's and the k+1'st token is the B. No amount of rewriting

can get rid of those k A's (and still accept the same language). I

will leave it as an exercise to you to find the rule(s) which

"conflict(s)" with the "a" rule and can also start with k A's.

That is the key concept of LL(k) processing. There must be a way

within the first k tokens of a rule to determine which rule applies.

If there are two possible rules that apply at the k-th token, then the

grammar is at least LL(k+1).
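To make the concept concrete without giving away the exercise, here is a minimal sketch in Python using a toy grammar of my own (not the one in the question): x : A A B | A A C. Both alternatives agree on the first two tokens, so the parser must peek at the third token to choose, making the grammar LL(3) but not LL(2).

```python
# Toy grammar (my own example, not the poster's):
#   x : A A B | A A C
# The alternatives agree on the first 2 tokens, so the choice
# requires 3 tokens of lookahead: LL(3), but not LL(2).

def parse_x(tokens):
    """Pick an alternative of x by peeking at token 3 (index 2)."""
    if tokens[:2] != ["A", "A"]:
        raise SyntaxError("expected A A")
    if tokens[2] == "B":       # the third token decides the alternative
        return "alt1"          # x -> A A B
    elif tokens[2] == "C":
        return "alt2"          # x -> A A C
    raise SyntaxError("expected B or C")

print(parse_x(["A", "A", "B"]))  # alt1
print(parse_x(["A", "A", "C"]))  # alt2
```

No rewriting of this toy grammar can avoid reading past the two A's, which is exactly the situation in the question with k A's.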

In practice, this is not such a bad limit as most "constructs" in most

programming languages have a unique starting token that tells you the

statement type--think of BASIC and how each line consists of a line

number followed by a unique keyword for each "type" of line. Even

languages like C have only a few keywords that can begin a

declaration but not an assignment statement. The key part there is

that one must know which identifiers represent types and which

represent variables to make that true, and hence the common hack of

tying the lexer to the symbol table.
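That hack (often called the "lexer hack" in C circles) can be sketched in a few lines of Python; the set of type names below is an assumed stand-in for the parser's symbol table:

```python
# A toy sketch of tying the lexer to the symbol table: the lexer
# classifies an identifier as TYPE_NAME or IDENT by consulting the
# (assumed) symbol table, so a parser can tell a declaration like
# "T x;" apart from an expression statement like "x * y;".

type_names = {"int", "char", "T"}   # stand-in for the parser's symbol table

def lex_ident(word):
    """Return a (token_kind, text) pair for an identifier-like word."""
    if word in type_names:
        return ("TYPE_NAME", word)
    return ("IDENT", word)

print(lex_ident("T"))   # ('TYPE_NAME', 'T')
print(lex_ident("x"))   # ('IDENT', 'x')
```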

The one place this tends to get one in trouble is "expressions",

because in expressions the keyword (operator) is often infix (or

perhaps suffix) and you cannot tell "expression + expression" from

"expression * expression" by looking at the tokens of the first

expression (which may be of unbounded length). Thus, grammars for

expressions which are written with recursion on both the left and

right sides are not LL(k) for any k, even though they are simple for

humans to understand--the recursion on the left is the problem.

However, it is usually possible to recast those expressions into a

form where there is no left recursion, by making up names for the

different types of sub-expressions (e.g. factor, term, primary, etc.)

and making them not left recursive.
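The recast described above can be sketched as a small recursive-descent parser in Python. This is my own illustrative fragment: the left-recursive grammar "expr : expr '+' term | term" becomes the LL(1) form "expr : term ('+' term)*", with a "term : factor ('*' factor)*" level underneath, matching the factor/term naming mentioned above.

```python
# Left-recursive (not LL(k) for any k):
#   expr : expr '+' term | term
#   term : term '*' factor | factor
# Recast without left recursion (LL(1)):
#   expr : term ('+' term)*
#   term : factor ('*' factor)*

def parse_expr(tokens):
    pos = 0

    def factor():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return int(tok)            # a factor is just a number here

    def term():
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            value *= factor()
        return value

    value = term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        value += term()            # the loop keeps '+' left-associative
    return value

print(parse_expr(["1", "+", "2", "*", "3"]))  # 7
```

Note that the iteration, rather than recursion on the left, is what preserves left-associativity while keeping each parsing decision down to a single token of lookahead.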

To assure yourself that you understand the problem, determine whether

the reverse of the language is LL(k), and if so, for what k:

s: a s A | ;

a: s B A^k | C;

Hope this helps,

-Chris

*****************************************************************************

Chris Clark                  Internet : compres@world.std.com
Compiler Resources, Inc.     Web Site : http://world.std.com/~compres
19 Bronte Way #33M           voice    : (508) 435-5016
Marlboro, MA 01752 USA       fax      : (508) 251-2347 (24 hours)
