Re: Lookahead vs. Scanner Feedback

bliss@sp64.csrd.uiuc.edu (Brian Bliss)
Wed, 8 Jan 92 17:55:13 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: bliss@sp64.csrd.uiuc.edu (Brian Bliss)
Keywords: parse, C
Organization: UIUC Center for Supercomputing Research and Development
References: 92-01-032
Date: Wed, 8 Jan 92 17:55:13 GMT

In article 92-01-032, smk@dcs.edinburgh.ac.uk writes:
|> [Reusing a typedef name] shouldn't be a problem, because this is not really
|> an ambiguous occurrence. You can deal with that by having a production
|>
|> any_ident : ident | type_ident;
|>
|> and using any_ident for the identifier in a declarator (and several other
|> places). This should be possible without introducing any ambiguities.
|>
|> But for some parts of the C syntax this is not so easy, for labels you
|> probably have to expand the any_ident production to allow programs like
|>
|> typedef int foo;
|> main ()
|> { foo: ;
|> }
|>
|> because otherwise there is a shift-reduce conflict
|> (reduce type_ident to any_ident for labels, shift for declarations).
[It's not impossible, but it's tricky and messy to get right. -John]


O.K. I haven't gotten out the grammar and done the actual table construction
(read: disclaimer), but declarations ARE the one place where you do need
separate tokens for ident and type_ident. Anywhere else, the
any_ident -> ident | type_ident rule works fine. (On labels, for instance, the
: in the lookahead stream resolves the ambiguity. I have also successfully
used the above productions to allow a typedef name to also be a tag name.)
Consider the code fragment:


typedef int z;
main() {
      long z;
}


Is z being redeclared as a local variable in main(), or are you just
specifying an empty declaration for a long int type? The answer
depends upon which token the lexical analyzer returns when z is
encountered for the second time. The ANSI C grammar in the back of K&RII
is not ambiguous: it assumes that the lexer resolves the ambiguity, not
the parser.


The fix to this problem is much easier than I first thought: just use
lex's right-context sensitivity operator (/) to search ahead in the input
stream for one of [,{;] (preceded by optional whitespace) when an
identifier is encountered. In cases that match, always return the IDENT
token; in cases that don't, look up the name and return TYPE_NAME if the
identifier is a typedef name, and IDENT otherwise.
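
For illustration, here is a rough C sketch of that decision done by hand
rather than with lex's / operator. Everything named in it is made up for the
example: the token codes, the fixed typedef_names table standing in for a
real symbol table, and the classify_identifier() helper. It only shows the
rule described above, not how any particular compiler does it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would take these from y.tab.h. */
enum token { IDENT, TYPE_NAME };

/* Hypothetical stand-in for the symbol table: names declared via typedef. */
static const char *typedef_names[] = { "z", "foo" };

/* Return 1 if the len-character name is in the typedef table, else 0. */
static int is_typedef_name(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof typedef_names / sizeof typedef_names[0]; i++) {
        if (strlen(typedef_names[i]) == len &&
            strncmp(typedef_names[i], name, len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Classify the identifier of length len that starts at id.  The rest of
 * the NUL-terminated input follows the identifier, which lets us peek at
 * the right context the way the lex / operator would.
 */
static enum token classify_identifier(const char *id, size_t len)
{
    const char *p = id + len;

    /* Skip optional whitespace after the identifier. */
    while (isspace((unsigned char)*p))
        p++;

    /* Followed by one of , { ; : always treat it as a plain identifier. */
    if (*p != '\0' && strchr(",{;", *p) != NULL)
        return IDENT;

    /* Otherwise fall back to the typedef lookup. */
    return is_typedef_name(id, len) ? TYPE_NAME : IDENT;
}
```

With this table, the z in "long z;" comes back as IDENT because a ; follows
it, while the z in "z x;" comes back as TYPE_NAME.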


As for my original statement


>One place where every yacc/lex based C compiler I know of is broken


I knew Sun's cc was broken, and so was every C compiler I had worked on;
I couldn't figure out a way to easily fix the problem, and over-generalized :-)


bb
--

