Re: bison parser : retrieving values from recursive pattern

Kaz Kylheku <864-117-4973@kylheku.com>
Fri, 07 Jul 2023 01:14:04 -0000

          From comp.compilers

Related articles
bison parser : retrieving values from recursive pattern desharchana19@gmail.com (Archana Deshmukh) (2023-07-06)
Re: bison parser : retrieving values from recursive pattern 864-117-4973@kylheku.com (Kaz Kylheku) (2023-07-07)
| List of all articles for this month |

From: Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups: comp.compilers
Date: Fri, 07 Jul 2023 01:14:04 -0000
Organization: Compilers Central
References: 23-07-001
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="50127"; mail-complaints-to="abuse@iecc.com"
Keywords: yacc, design
Posted-Date: 06 Jul 2023 21:17:02 EDT

On 2023-07-06, Archana Deshmukh <desharchana19@gmail.com> wrote:
> Hello,
>
> I have a following rule
>
> num :
>| integer comma num
>| integer closeroundbkt
>| integer closesquarebkt
>


Recognizing close brackets in a different rule from the open ones is
not absolutely off the table, but it's a code smell.


Consider a nice grammar like


    list : '(' items ')'
              | '(' ')'
              | '[' items ']'
              | '[' ']'


    items : items ',' item
                | item
                ;


    item : list | num | type | decl


    decl : keyword ':' oper list


    keyword : KW_main | KW_data | KW_param_1


    type : TYPE_float32 | ...


    oper : OPER_r | OPER_or


I'd make all the symbols just one token type SYMBOL, and deal with it
all semantically later in the pipeline.


I.e. the over-generated grammar would allow nonsense like


    @data(%float32: foo[(1, 2, 3, 4), param_1], main: ...)


This would be checked for validity semantically; that the right
kinds of symbols are in the right positions in the shape.




    list : '(' items ')'
              | '(' ')'
              | '[' items ']'
              | '[' ']'


    items : items ',' item
                | item
                ;


    item : list | num | SYMBOL | decl


    decl : SYMBOL ':' SYMBOL list


Lisp teaches us that reserved keywords are largely inflexible
and counterproductive.


Make your SYMBOl objects interned, and give them a type like
"struct symbol *". Interned means that when the same symbol
occurs more than once, the parser returns the same pointer:


      SYMBOL { $$ = intern($1); } /* $1 is the yytext lexeme */


The first time intern("foo") is called it creates and return
s a symbol sym such that sym->name is foo (a strdup-ed copy)
The second time intern("foo") is called, it returns exactly
the same object!


In your program you can have initialization like this:


    struct symbol *float32_s;


    void global_init(void)
    {
          float32_s = intern("float32");


          ...
    }


Then when the parser sees float32, it will produce
the same pointer.


The upshot is that you never have to compare strings.
If you want to check, is x the float32 symbol, you just use
the == operator;


    void foo(struct symbol *x)
    {
        if (X == float32_s) {
            // we are looking at the float32 symbol


        }


    }


Because symbols are just pointers, they are also fast to hash.
A hash table which maps symbols to other things just has
to hash the 4 or 8 byte pointer, not the string. This can
be done in a few bit operations.


Important global properties about symbols can be stored
in the struct symbol itself. For instance float32 is
a type, so there can be a sym->is_type property,
which is true for float32. Then you can easily check
whether some list has a type symbol in a certain position.
First check there is a symbol and if so, that it is
one with the is_type property true.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.