Re: What does it mean to "move characters" in the lexer?

gah4 <gah4@u.washington.edu>
Tue, 21 Jun 2022 10:30:58 -0700 (PDT)

          From comp.compilers


From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Date: Tue, 21 Jun 2022 10:30:58 -0700 (PDT)
Organization: Compilers Central
References: <AdiFWBix4QF9p6qWTPmZjnkljZpiHA==> 22-06-057
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="37104"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance, comment
Posted-Date: 21 Jun 2022 15:49:51 EDT
In-Reply-To: 22-06-057

On Tuesday, June 21, 2022 at 9:25:12 AM UTC-7, Roger L Costello wrote:


(snip)
> Because a large amount of time can be consumed moving characters, specialized
> buffering techniques have been developed to reduce the amount of overhead to
> process an input character.


(snip)
> I don't understand what they mean by "moving characters". Do they mean copying
> characters? Do they mean reading characters from a file into memory? Would you
> explain what this "character movement" thing is all about, please?


Yes, it is copying, and yes, it can take a lot of time.


On many systems, the disk controller reads the data into its own buffer,
and then the OS copies the data from the controller buffer into its buffer.


Then when the user does an I/O (input) request, the data is copied
into the program's own buffer, and finally into the place where the data
actually goes. So maybe four copies.


Early in the days of TCP/IP there was trailer encapsulation.
(I never saw it used, but some systems have an option to turn it on.)


Whether or not you follow the ISO seven-layer model, the layering works like this:


The program gives data to TCP, which divides it up into
packets to send. Each of those gets a TCP header.
It is then passed to IP where IP puts its header on.
And then before sending, it gets an Ethernet header.


There is often something already sitting just before the buffer,
but the buffer might not be full, so there might be free space at
the end. That is the idea behind trailer encapsulation: instead of
putting the TCP and IP headers at the beginning, you
put them at the end! Less copying!


I believe people found other ways to reduce copying, though.


The I/O hardware for IBM S/360 copies data directly
from the I/O device into memory. (Memory was expensive!)
Also, it is blocked on disk the same as it is for the user,
unlike most systems now. It would be usual, though,
for the last copy -- from the I/O buffer to/from the actual
data area -- to be an actual copy. IBM has locate mode
I/O to eliminate that one. For locate mode, instead of
copying, the program gets a pointer to the actual buffer.
(That works in assembly and PL/I; C hadn't been invented yet.)


For write, you request the address of the output buffer,
operate on the data there, and then request it be written.
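
As a rough illustration of the difference on the read side, here is a minimal C sketch. The names (rec_stream, get_move, get_locate) and the fixed record length are made up for the example and are not IBM's actual interface; the buffer is assumed to have been filled by the device already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-length-record stream over an in-memory I/O buffer. */
enum { RECLEN = 8 };

struct rec_stream {
    const char *buf;  /* the I/O buffer, assumed already filled */
    size_t len;       /* bytes in buf, a multiple of RECLEN */
    size_t pos;       /* offset of the next unread record */
};

/* "Move mode": copy the next record into the caller's own area.
   Returns 1 on success, 0 at end of data. */
static int get_move(struct rec_stream *s, char *dst)
{
    assert(s->len % RECLEN == 0);
    if (s->pos + RECLEN > s->len)
        return 0;
    memcpy(dst, s->buf + s->pos, RECLEN);  /* the extra copy */
    s->pos += RECLEN;
    return 1;
}

/* "Locate mode": hand back a pointer into the buffer, no copy.
   Returns NULL at end of data. */
static const char *get_locate(struct rec_stream *s)
{
    assert(s->len % RECLEN == 0);
    if (s->pos + RECLEN > s->len)
        return NULL;
    const char *rec = s->buf + s->pos;
    s->pos += RECLEN;
    return rec;
}
```

The catch with the second form is that the pointer is only good until the buffer is reused, so the caller has to be finished with the record before asking for the next block.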


There has been much work over the years on reducing
the amount of data copying, or operations needed to
copy it. On byte-addressed machines, for example, data can be
copied a whole word at a time (depending on alignment).
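
A sketch of what that looks like in C, assuming a 64-bit word. The function name copy_words is made up for the example, and in real code you would just call memcpy, which already does this (and more). The fixed-size memcpy calls inside the loop are only the portable way to express a single word load and store; compilers normally turn each into one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes from src to dst (non-overlapping), a word at a time
   when both pointers can be brought to the same word alignment. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t W = sizeof(uint64_t);

    assert(dst != NULL && src != NULL);

    /* Word copies only help if src and dst are misaligned by the
       same amount; otherwise fall through to the byte loop. */
    if ((uintptr_t)d % W == (uintptr_t)s % W) {
        /* Leading bytes, until d (and therefore s) is word aligned. */
        while (n > 0 && (uintptr_t)d % W != 0) {
            *d++ = *s++;
            n--;
        }
        /* Bulk of the data, one word per iteration. */
        while (n >= W) {
            uint64_t w;
            memcpy(&w, s, W);  /* word load */
            memcpy(d, &w, W);  /* word store */
            d += W;
            s += W;
            n -= W;
        }
    }
    /* Trailing bytes, or everything if the alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```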


There are also search algorithms like Boyer-Moore,
to search strings without looking at every character.
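
To show how a search can skip characters, here is a minimal sketch of the Horspool simplification of Boyer-Moore, which keeps only the bad-character shift table. It is not the full Boyer-Moore algorithm, and the function name is made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return the offset of the first occurrence of pat (length m) in
   text (length n), or (size_t)-1 if there is none. */
size_t horspool_search(const unsigned char *text, size_t n,
                       const unsigned char *pat, size_t m)
{
    size_t shift[UCHAR_MAX + 1];

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return 0;
    if (m > n)
        return (size_t)-1;

    /* A character not in the pattern lets us skip a full pattern length. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    /* Otherwise shift so its last occurrence (excluding the final
       pattern position) lines up with the text character. */
    for (size_t i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos <= n - m) {
        /* Compare right to left within the current window. */
        size_t j = m;
        while (j > 0 && text[pos + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
            return pos;
        /* Skip ahead based on the text character under the window's end. */
        pos += shift[text[pos + m - 1]];
    }
    return (size_t)-1;
}
```

With a long pattern and a text that mostly does not contain its characters, most windows are rejected after looking at a single character, and the search moves ahead by the pattern length each time.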
[Now you can usually ask operating systems to map a file into your
process so there is no extra copying at all, the disk reads a block
into a page frame and you address the data directly in that page
frame. In a program called grepcidr that does grep-like searches for
IP address strings, for large files it got somewhat faster when I
switched from stdio to mapping the whole file in and treating it as
one big string. This is pretty remote from compilers, though. tl;dr
less copying is faster. -John]
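
For illustration, a minimal sketch of the mapping approach on a POSIX system. This is not grepcidr's code; the function name and the choice of counting newlines are made up for the example. The point is only that the file contents are scanned in place as one big string, with no read() into a private buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count newline characters in a file by mapping it and scanning the
   mapping directly. Returns -1 on error. */
long count_lines_mapped(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    long count = 0;
    const char *p = base;
    const char *end = base + len;
    /* Treat the whole file as one big string. */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;
    }

    munmap((void *)base, len);
    return count;
}
```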

