Using LALR machine to disambiguate tokens

"Monty Hall" <chickenkungpao@hotmail.com>
7 Sep 2004 23:58:33 -0400

          From comp.compilers


From: "Monty Hall" <chickenkungpao@hotmail.com>
Newsgroups: comp.compilers
Date: 7 Sep 2004 23:58:33 -0400
Organization: SBC http://yahoo.sbc.com
Keywords: LALR
Posted-Date: 07 Sep 2004 23:58:33 EDT

        I just finished an LALR(k) DFA generator that also generates the lexer's
regular-expression DFA, in hopes of creating an integrated parse/lex rapid
development tool that's relatively 'hands free'. One thing that I am toying
with is disambiguating tokens. Consider the RE/grammar BNF snippet below:


    string = [a-z]+
    <start> ::= 'max' 'lookahead' '=' int
                            | 'start' 'rule' '=' int
                            | string '=' int


        When a token may assume only one accept symbol, I find it annoying
that max, lookahead, start, and rule are also matched by string's DFA. One
common solution that I've seen is to lex everything as string and check the
spelling in a semantic action:


    <start> ::= string string
            { string[0] = 'max' && string[1] = 'lookahead' .....}


        I was thinking of using the LALR(k) machine to disambiguate
tokens. It could be done by adding a bitmask to each LALR state for the
tokens allowable as input in that state, and using lookahead if the
bitmasking still yields a truly ambiguous token. Does anybody have
information on token disambiguation or on parsing keywordless programming
languages (pitfalls, concerns & considerations), if possible as it relates
to an LR machine?
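
        To make the idea concrete, here is a rough Python sketch of what I
have in mind. It is only an illustration under assumptions: the TRANSITIONS
table is a hand-written, simplified stand-in for the parser states (a
generated LALR table would have more states), int is assumed to be [0-9]+,
input is split on whitespace, and the names Tok, candidates, allowed,
classify and tokenize are made up for this example.

```python
import re
from enum import IntFlag


class Tok(IntFlag):
    """One bit per token kind, so a set of kinds fits in a bitmask."""
    MAX = 1
    LOOKAHEAD = 2
    START = 4
    RULE = 8
    STRING = 16
    EQ = 32
    INT = 64


KEYWORDS = {"max": Tok.MAX, "lookahead": Tok.LOOKAHEAD,
            "start": Tok.START, "rule": Tok.RULE}

# Hypothetical, simplified parser states: state -> {token kind: next state}.
TRANSITIONS = {
    0: {Tok.MAX: 1, Tok.START: 2, Tok.STRING: 3},
    1: {Tok.LOOKAHEAD: 3},
    2: {Tok.RULE: 3},
    3: {Tok.EQ: 4},
    4: {Tok.INT: 5},
    5: {},
}


def candidates(lexeme):
    """Bitmask of every token kind whose pattern matches the lexeme."""
    mask = Tok(0)
    if lexeme in KEYWORDS:
        mask |= KEYWORDS[lexeme]
    if re.fullmatch(r"[a-z]+", lexeme):
        mask |= Tok.STRING
    if lexeme == "=":
        mask |= Tok.EQ
    if re.fullmatch(r"[0-9]+", lexeme):  # assumed definition of int
        mask |= Tok.INT
    return mask


def allowed(state):
    """Bitmask of token kinds the parser can accept in this state."""
    mask = Tok(0)
    for kind in TRANSITIONS[state]:
        mask |= kind
    return mask


def classify(state, lexeme, next_lexeme=None):
    """Pick the token kind for lexeme, using the state mask first and
    one lexeme of lookahead only if the mask leaves several kinds."""
    matched = candidates(lexeme)
    viable = [kind for kind in TRANSITIONS[state] if kind & matched]
    if len(viable) > 1 and next_lexeme is not None:
        following = candidates(next_lexeme)
        viable = [kind for kind in viable
                  if following & allowed(TRANSITIONS[state][kind])]
    if len(viable) != 1:
        raise SyntaxError(f"cannot classify {lexeme!r} in state {state}")
    return viable[0]


def tokenize(text):
    """Classify whitespace-separated lexemes while stepping the states."""
    lexemes = text.split()
    state, result = 0, []
    for i, lexeme in enumerate(lexemes):
        nxt = lexemes[i + 1] if i + 1 < len(lexemes) else None
        kind = classify(state, lexeme, nxt)
        result.append(kind)
        state = TRANSITIONS[state][kind]
    return result
```

        With this table, "lookahead = 7" needs no lookahead at all, because
the start state's mask does not contain the 'lookahead' keyword. "max = 3"
is the truly ambiguous case: both 'max' and string are allowable in the
start state, and only the following '=' settles it as string.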


Regards,




Monty
chickenkungpao@hotmail.com

