Re: Why are LR parsers faster than using if conditions

Hans Aberg <haberg@matematik.su.se>
30 Jun 2004 23:13:59 -0400

          From comp.compilers

Related articles
[4 earlier articles]
Re: Why are LR parsers faster than using if conditions cdc@maxnet.co.nz (Carl Cerecke) (2004-06-21)
Re: Why are LR parsers faster than using if conditions t.zielonka@zodiac.mimuw.edu.pl (Tomasz Zielonka) (2004-06-25)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-26)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-28)
Re: Why are LR parsers faster than using if conditions clint@0lsen.net (Clint Olsen) (2004-06-28)
Re: Why are LR parsers faster than using if conditions tmoog@panix.com (2004-06-30)
Re: Why are LR parsers faster than using if conditions haberg@matematik.su.se (Hans Aberg) (2004-06-30)
Re: Why are LR parsers faster than using if conditions christian.bau@cbau.freeserve.co.uk (Christian Bau) (2004-07-11)
Re: Why are LR parsers faster than using if conditions try_vanevery_at_mycompanyname@yahoo.com (Brandon J. Van Every) (2004-07-13)
Re: Why are LR parsers faster than using if conditions paulbmann@yahoo.com (Paul B Mann) (2004-07-28)
| List of all articles for this month |

From: Hans Aberg <haberg@matematik.su.se>
Newsgroups: comp.compilers
Date: 30 Jun 2004 23:13:59 -0400
Organization: Compilers Central
References: 04-06-122
Keywords: parse
Posted-Date: 30 Jun 2004 23:13:59 EDT

Clint Olsen <clint@0lsen.net> wrote:
>... your comment about matching long input doesn't jive with the
>flex documentation. In fact, you usually *want* to match the longest
>input possible to spend as little time as possible doing work per
>character of input. The flex documentation goes to great lengths to
>show how refactoring can affect performance in a positive way at the
>expense of some more verbose and seemingly redundant regular
>expressions. The LOPLAS article on re2c also suggests this technique.
>It can be likened to loop unrolling in the regular-expression domain.
...
>I do seem to recall that the flex developers have dealt with the
>historical problem of keeping track of yylineno affecting lexer
>performance. The long input match problem you mentioned seems to
>indicate a problem with their input buffering mechanism.


These matters have (of course) been thoroughly discussed in the Help-flex
mailing list:
    help-flex@gnu.org
    http://mail.gnu.org/mailman/listinfo/help-flex


And, indeed, I also think that the long rule matching problem suggests
a buffering problem in flex. The problem has been noted, if not fixed
by now, it will be in the future. Until then, the case where
tokenization make a significant difference is for comments: There it
is far more faster to scan a multi-line comment line-by-line that
doing it in one whole chunk. See the thread "Nested comments" in the
Flex mailing list, especially Akim Demaille's suggesting 2003-07-03. I
gather the same thing applies if one admits multiline strings. Then it
is probably faster to also scan them line by line, and join the
segments together in the lexer actions. As for the question of
yylineno tracking, turning it on/off, and lexing performance I could
not find any significant difference in the example.


Here is the Flex code I use for nestable [* ... *] style comments:
"[*" { BEGIN(comment); comment_level = 1; }
<comment>{ /* Comments. */
    "*]" { /* End of the comment. */
        if (--comment_level == 0) {
            BEGIN INITIAL;
        }
    }


    "[*" { comment_level++; }
    [^*[\n]+ {}
    \n+ {}
    . { /* Stray `*' and `['. */ }


    <<EOF>> {
        std::cerr << "Error: Nested comments not properly closed at end of
file.\n";
        BEGIN(INITIAL); return YYERRCODE;
        /* exit_set (exit_scan); */
    }
}
"*]" { std::cerr << "Error: Too many comment closings *].\n";
                BEGIN(INITIAL); return YYERRCODE; }


As you can see, the line
    [^*[\n]+
only scans the comment line-by-line.


    Hans Aberg


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.