Technical question about yacc and LALR parsers

jar@florida.HQ.Ileaf.COM (Jim Roskind x5570)
Sat, 20 Apr 91 22:04:33 EDT

          From comp.compilers

Related articles
Technical question about yacc and LALR parsers jar@florida.HQ.Ileaf.COMid AA07671; Sun, 21 Apr 91 (1991-04-20)
Re: Technical question about yacc and LALR parsers jml@wally.altair.fr (1991-04-22)
Re: Technical question about yacc and LALR parsers mauney@eos.ncsu.edu (1991-04-22)
Re: Technical question about yacc and LALR parsers bliss@sp64.csrd.uiuc.edu (1991-04-22)
Re: Technical question about yacc and LALR parsers boyland@aspen.berkeley.edu (1991-04-23)
Technical question about yacc and LALR parsers compres!chris@crackers.clearpoint.com (1991-04-21)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jar@florida.HQ.Ileaf.COM (Jim Roskind x5570) (5.61/UUNET-shadow-mx) id AA07671; Sun, 21 Apr 91 00:33:37 -0400
Keywords: yacc, parse, question, errors
Organization: Compilers Central
Date: Sat, 20 Apr 91 22:04:33 EDT

I have been spending this weekend hacking byacc to automate the
process that I go through to evaluate conflicts. One part of this is
walking backwards through a parser table. When I generated backwards
pointers for each state in the parse, I noticed something rather
surprising (at least to me). Basically, for each state k in the
parser, there are many states that can transition to k (no surprise),
but all such states transition with for the same reason (I was
surprised) i.e, all shifts/goto's into state k are made on the same
terminal/nonterminal. To put it another way, when the parser
generator is sitting in any given state k, the token just to the left
of the carrot is a function of k only.


When I first coded my data structures, I assumed that for each state
k, I would need a list of prior states, and I also wanted to save the
token name that was used to transition from the "prior state". When
all was said and done, I found that it was sufficient to save a "list
of prior states" for each state, as well as a single "prior token" for
each state.


If I haven't still made myself clear, the following is an example of
something that will *never* be found in the verbose output from yacc:


state 727
parened_type : '(' VOID . ')'
parened_type : '(' INT . ')'


Basically, if this was ever seen, it would mean that in state 727, it
was possible to have two distinct tokens to the immediate left of the
carrot. Note that the following is possible (I think):


state 747
gathered_type : '[' VOID . ']'
gathered_type : '(' VOID . ')'


as it is certainly *not* the case that the state defines the entire
left context of the carrot.


One corollary (via the pigeon hole principle) of this observation is
that there must me more states in such a parser than there are
terminals plus nonterminals.


Assuming that I have explained myself, the question is: Is this
property true of all LALR parsers? It was very conceivable that I
could hand code a table driven "LR like" parser that would *not* have
this property. Needless to say, my fear is that my "simplified data
structures" are not sufficient for all grammars that byacc might
generate.


Jim Roskind- Author of a YACCable C++ grammar
Independent Consultant
(407)729-4348 or (617)290-4990 x5570
jar@hq.ileaf.com or ...!uunet!leafusa!jar
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.