Re: Are lalr parse-tables viable?

Chris F Clark <>
28 Jun 2004 19:59:41 -0400

          From comp.compilers

Related articles
Are lalr parse-tables viable? (craig) (2004-06-11)
Re: Are lalr parse-tables viable? (2004-06-12)
Re: Are lalr parse-tables viable? (A Johnstone) (2004-06-13)
Re: Are lalr parse-tables viable? (Chris F Clark) (2004-06-13)
Re: Are lalr parse-tables viable? (craig) (2004-06-21)
Re: Are lalr parse-tables viable? (craig) (2004-06-25)
Re: Are lalr parse-tables viable? (Chris F Clark) (2004-06-28)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 28 Jun 2004 19:59:41 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 04-06-038 04-06-055 04-06-086
Keywords: LALR, parse, practice
Posted-Date: 28 Jun 2004 19:59:41 EDT

> My Analyzer Generated 327 Unique States And 16258 Transitions Between
> These. 16258 / 327 ~= 50 Transitions Per State.

I must have fat-fingered the division, resulting in an off by 10x
error. 50 transitions per state is quite reasonable (not small) but
certainly in the ballpark of what one would expect.

The key point is that many (most, probably 75% of the) states will
have a few (< 10) transitions, and a few key states (starts of things
with a lot of fan-out) will be the places where the transitions will
be concentrated.

Key examples:

At the start of a statement "state", you will have one transition for
each keyword the starts a statement, thus "if", "while", "for", "let",
"print", ....

At the start of an expression, you will have one transition for each
basic constituent, nummbers (int and float probably), strings,
identifiers, prefix operators (+, -), and parens surrounding nested
expressions. Because expressions get embedded, you will probably have
several copies of this state with some extra context info.

At the 2nd slot in an expression, you will have transitions for each
binary operator (+, -, *, /), and any suffix operators. Again, you
may get multiple copies of this state depending on context.

Those are the kinds of states that tend to pull the number up. Unless
your grammar uses expressions in a wide variety of contexts or has
many places you can nest statements, the number of these high fan-out
states should be less than a quarter of your grammar. That means that
for them to bring the number of transitions per state up, the number
of transitions in these high fan-out states must be substantially
higher than the number of states in the low fan-out states (perhaps
twice the average). Twice 50 is 100, so that's a fairly large number
of different transitions in the high fan-out states.

Now, you did say you were using precedence in your parser generator,
meaning you have to write the rules using different non-terminals to
express the expression hierarchy. This could (I can't say for sure)
increase the percentage of high-fan-out states, as there may be
different states that work through the different expression
non-terminals. Your fan-out numbers could also be elevated, if the
transition counts include both terminal (token) and non-terminal
transitions (our parser generator Yacc++ does that, but it isn't
normal to count them that way, as traditionally the parse action and
goto tables are separated).

Another way to increase fan-out is if the keywords of your language
are PL/I style (can be used as identifiers also) and not Pascal style
"reserved words", which cannot be used in any place except where they
are specified by the grammar. PL/I style keywords increase the
fan-out drammtically anywhere an identifier is allowed, unless one
uses a lexical hack to map them into identifiers where they aren't

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.