Re: Learning only one lexer made me blind to its hidden assumptions

"Ev. Drikos" <drikosev@gmail.com>
Wed, 13 Jul 2022 14:58:50 +0300

From comp.compilers


From:	"Ev. Drikos" <drikosev@gmail.com>
Newsgroups:	comp.compilers
Date:	Wed, 13 Jul 2022 14:58:50 +0300
Organization:	Aioe.org NNTP Server
References:	22-07-006
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78718"; mail-complaints-to="abuse@iecc.com"
Keywords:	lex, history
Posted-Date:	13 Jul 2022 11:23:44 EDT
Content-Language:	en-US

On 07/07/2022 20:49, Roger L Costello wrote:
> ...

> Difference:
> - Flex allows overlapping regexes. It is up to Flex to use the 'correct'
> regex. Flex has rules for picking the correct one: longest match wins, regex
> listed first wins.
> - ScanGen does not allow overlapping regexes. Instead, you create one regex
> and then, if needed, you create "Except" clauses. E.g., the token is an
> Identifier, except if the token is 'Begin' or 'End' or 'Read' or 'Write'
>
> ...

As you can imagine, there are many such options. A DFA builder may have
options a) to behave like Flex, b) to treat only some tokens as reserved
and others as non-reserved, and c) to allow you to examine shorter matches.

Who knows what else there is out there! (I don't claim to be an expert)
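
To make the quoted difference concrete, here is a minimal Python sketch
of the two strategies. It is not how Flex or ScanGen are implemented;
the rule list, the function names and the token names are made up for
illustration, and only the keywords come from the quoted example.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Flex-like: overlapping rules are allowed. The longest match wins and,
# on a tie, the rule listed first wins. (Hypothetical rule set.)
FLEX_RULES = [
    ("BEGIN", re.compile(r"Begin")),
    ("END", re.compile(r"End")),
    ("IDENT", IDENT_RE),
]

def flex_style(text, pos=0):
    best = None  # (match length, token kind)
    for kind, rx in FLEX_RULES:
        m = rx.match(text, pos)
        # Only a strictly longer match replaces the current best, so an
        # earlier rule keeps priority when the lengths are equal.
        if m and (best is None or m.end() - pos > best[0]):
            best = (m.end() - pos, kind)
    if best is None:
        raise ValueError("no rule matches at position %d" % pos)
    length, kind = best
    return kind, text[pos:pos + length]

# ScanGen-like: a single identifier pattern, then a table of exceptions.
# (Hypothetical table, keywords taken from the quoted text.)
EXCEPT = {"Begin": "BEGIN", "End": "END", "Read": "READ", "Write": "WRITE"}

def except_style(text, pos=0):
    m = IDENT_RE.match(text, pos)
    if m is None:
        raise ValueError("no identifier at position %d" % pos)
    lexeme = m.group()
    return EXCEPT.get(lexeme, "IDENT"), lexeme

print(flex_style("Begin"), flex_style("Beginning"))
print(except_style("Begin"), except_style("Beginning"))
```

Both functions classify "Begin" as a keyword and "Beginning" as an
identifier; the difference is only where the decision is made, in the
rule priorities or in the exception table.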

> Difference:
> - Flex deals with individual characters
> - ScanGen lumps characters into character classes and deals with classes. Use
> of character classes decreases (quite significantly) the size of the
> transition table
>

FYI, there is also a related, controversial issue that may spark flames!

Bison doesn't support character classes either, and this could be one
reason why scannerless parsing sounds weird to many people. Of course, one
may use Bison down to the character level, but with many more states.
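
As a rough illustration of why character classes shrink the tables, the
Python sketch below builds a tiny per-character transition table and
then merges the characters whose columns are identical. The three-state
DFA (identifiers and integers over 7-bit ASCII) and all the names are
assumptions made up for this example; they are not taken from ScanGen,
Flex or Bison.

```python
N_CHARS = 128  # 7-bit ASCII, an assumption for this sketch

def build_table():
    # States: 0 = start, 1 = in identifier, 2 = in integer; -1 = no move.
    table = [[-1] * N_CHARS for _ in range(3)]
    for c in range(N_CHARS):
        ch = chr(c)
        if ch.isalpha() or ch == "_":
            table[0][c] = 1
            table[1][c] = 1
        elif ch.isdigit():
            table[0][c] = 2
            table[1][c] = 1
            table[2][c] = 2
    return table

def compress(table):
    # Characters with identical columns behave the same in every state,
    # so they can share one class and one column.
    class_of = []  # character code -> class index
    columns = {}   # column (as a tuple) -> class index
    for c in range(len(table[0])):
        col = tuple(row[c] for row in table)
        class_of.append(columns.setdefault(col, len(columns)))
    reps = list(columns)  # dicts keep insertion order: one column per class
    small = [[col[s] for col in reps] for s in range(len(table))]
    return class_of, small

def step(small, class_of, state, ch):
    # One extra lookup per character, in exchange for a much smaller table.
    return small[state][class_of[ord(ch)]]

table = build_table()
class_of, small = compress(table)
full_size = len(table) * N_CHARS
packed_size = len(small) * len(small[0]) + len(class_of)
print(full_size, packed_size)
```

Here letters and the underscore collapse into one class, digits into a
second, and everything else into a third. The saving grows with the
number of states, since the character-to-class map is paid for only once.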

Also, if the grammar allows two consecutive identifiers, a lookahead
operator is likely necessary. (Admittedly, a better alternative to
scannerless parsing may be different start states, as supported by Flex.)

When I played in the past with a scannerless GLR parser for SQL, I didn't
see dramatic runtime slowdowns with a few single- and multi-line commands.
Yet I wouldn't try (or suggest) such an approach for XML processing.

> ...
