Re: What does it mean to "move characters" in the lexer?

Kaz Kylheku <480-992-1380@kylheku.com>
Wed, 22 Jun 2022 01:05:52 -0000 (UTC)

          From comp.compilers

Related articles
What does it mean to "move characters" in the lexer? costello@mitre.org (Roger L Costello) (2022-06-21)
Re: What does it mean to "move characters" in the lexer? gah4@u.washington.edu (gah4) (2022-06-21)
Re: What does it mean to "move characters" in the lexer? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? 480-992-1380@kylheku.com (Kaz Kylheku) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? 480-992-1380@kylheku.com (Kaz Kylheku) (2022-06-22)
Re: What does it mean to "move characters" in the lexer? tkoenig@netcologne.de (Thomas Koenig) (2022-06-22)
| List of all articles for this month |

From: Kaz Kylheku <480-992-1380@kylheku.com>
Newsgroups: comp.compilers
Date: Wed, 22 Jun 2022 01:05:52 -0000 (UTC)
Organization: A noiseless patient Spider
References: 22-06-057
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="24266"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance
Posted-Date: 21 Jun 2022 21:47:45 EDT

On 2022-06-21, Roger L Costello <costello@mitre.org> wrote:
> Hi Folks,
>
> Page 89 of the dragon book says:
>
> Because a large amount of time can be consumed moving characters, specialized
> buffering techniques have been developed to reduce the amount of overhead to
> process an input character.


It's not clear what exactly this is referring to, but probably just to
the practice of making multiple copies of the same data in the
processing stack. If we put an imaginary tracer on a single character,
we may see it hopping among multiple buffers. In the Chaper 2 lexical
analyzer, getchar is used to read a character; getchar fills a buffer in
the stdio stream, and the program is sucking it out from there one
character at a time. So to build a lexeme, it has to have its own buffer
for the lexeme, which is another copy of the data.


The technique described in chapter 3 allows bulk reads into a buffer,
eliminating the stream library. The tokens are delimited right inside
the buffer, reducing a copy.


[The technique seems closely related to the "flip buffer",
which can be found in places like TTY implementations of Unix
kernels. Linux had one; there was once a "struct tty_flip_buffer".
At a quick glance, it looks like that today there are nmore than two
buffers used which are allocated and freed on the fly. There is still a
<linux/tty_flip.h> which contains a modicum of "flip" terminology,]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.