Re: State-of-the-art algorithms for lexical analysis?

Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Tue, 7 Jun 2022 06:52:45 +0200

          From comp.compilers


Organization: Compilers Central
References: 22-06-006 22-06-007 22-06-008 22-06-013
Keywords: lex, comment
Posted-Date: 07 Jun 2022 10:51:05 EDT

On 6/6/22 8:16 PM, Christopher F Clark wrote:


> In fact, there is only one thing that I have not seen a DFA lexer do that I
> think is worth doing at the lexical level (and not via a screener). That is
> recognizing tokens that start with a length prefix, e.g. 10Habcdefghij. Such
> tokens are common in things like network protocols and they would be
> relatively easy to implement, but I've not seen it done.


I'm not sure what you mean. The nnH syntax would have to be included in the
general number syntax (like 0x... or nnE...).


Or do you mean a token built from the next nn input characters? In that
case both a lower and an upper bound would be of interest, e.g. for the
(recognized) identifier length or for distinguishing Unicode code point formats.


> Beyond that it is my relatively firm belief that one should almost always
> have only simple regular expressions, e.g. that the one for floating point
> numbers should be one of the most complex ones. Otherwise you are trying
> to do too much in the scanner. And you are asking for trouble when you do.


ACK


DoDi
[I believe he means Fortran style Hollerith strings, where the number says
how many characters are in the following string. The number is just a count,
not semantically a number in the language. DFAs can't do that other than by
enumerating every possible length. -John]
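
To make the moderator's point concrete, here is a minimal sketch of how a hand-written scanner could handle a Hollerith-style token: read the decimal count, expect an H, then take exactly that many characters with no further pattern matching. The function name `scan_hollerith` and its return convention are hypothetical, chosen only for illustration; they do not come from any lexer generator or from the posts in this thread.

```python
DIGITS = "0123456789"

def scan_hollerith(text, pos=0):
    """Try to scan a token of the form nnH<nn characters> at text[pos].

    Hypothetical helper for illustration. Returns (payload, next_pos) on
    success, or None if no complete length-prefixed token starts at pos.
    """
    i = pos
    # Collect the decimal count; it is only a length, not a numeric literal.
    while i < len(text) and text[i] in DIGITS:
        i += 1
    # Need at least one digit followed by the H marker.
    if i == pos or i >= len(text) or text[i] not in "Hh":
        return None
    count = int(text[pos:i])
    start = i + 1
    end = start + count
    # Assumption of this sketch: a zero count or a truncated payload is rejected.
    if count == 0 or end > len(text):
        return None
    # The payload is taken verbatim, whatever characters it contains.
    return text[start:end], end


if __name__ == "__main__":
    print(scan_hollerith("10Habcdefghij+1"))
    print(scan_hollerith("3Hab"))
```

The counting step is the part a pure DFA cannot express without one set of states per possible length, which is why a sketch like this sits in hand-written scanner code (or in an action attached to a rule that matches only the nnH prefix) rather than in the regular expression itself.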

