Re: State-of-the-art algorithms for lexical analysis?

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Tue, 7 Jun 2022 19:40:11 +0300

From comp.compilers


From:	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups:	comp.compilers
Date:	Tue, 7 Jun 2022 19:40:11 +0300
Organization:	Compilers Central
References:	22-06-006 22-06-007 22-06-008 22-06-013 22-06-015
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="50232"; mail-complaints-to="abuse@iecc.com"
Keywords:	lex, comment
Posted-Date:	07 Jun 2022 13:05:09 EDT

Yes, as our moderator explained, I was talking about things like
FORTRAN Hollerith strings, but more importantly network packets, where
they give the size of the "field" within a packet and then you simply
take that many characters (or bytes or bits or some other quantum) as
the "token". This is quite important for parsing "binary" data. And
sometimes the numbers are text, like I showed, but in many protocols
the numbers are "binary", e.g. something like

\xAHabcdefghij where \xA is a single 8-bit character (octet) whose
bits are "0000 1010" (or maybe four 8-bit characters, i.e. 4 octets,
that represent a 32-bit integer).

And, as our moderator pointed out, this makes a terrible regular
expression, NFA, or DFA, but it is actually quite easy in nearly any
programming language. You read the length in, convert it to an
integer, and then loop reading that many characters from the input
and call that a "token".
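
To make that concrete, here is a minimal sketch in Python of the
read-the-length-then-take-that-many approach. The function names, the
optional marker argument, and the big-endian byte order for the
4-octet case are assumptions made for illustration; a real protocol
or Fortran dialect would dictate those details.

```python
import io
import struct


def read_counted_token(stream, length_size=1, marker=b""):
    """Read one length-prefixed token from a binary stream.

    length_size: 1 for a single-octet count, 4 for a 32-bit count
                 (big-endian is an assumption, not a given).
    marker:      optional bytes expected between count and payload,
                 e.g. b"H" for the \\xAHabcdefghij example.
    """
    header = stream.read(length_size)
    if len(header) != length_size:
        raise EOFError("input ended inside the length prefix")
    if length_size == 1:
        length = header[0]
    elif length_size == 4:
        (length,) = struct.unpack(">I", header)
    else:
        raise ValueError("unsupported length size")
    if marker and stream.read(len(marker)) != marker:
        raise ValueError("expected marker after the length")
    token = stream.read(length)
    if len(token) != length:
        raise EOFError("input ended inside the token")
    return token


def read_hollerith(text, pos=0):
    """Text form: decimal count, 'H', then that many characters.

    Returns the token and the position just past it.
    """
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos or end >= len(text) or text[end] != "H":
        raise ValueError("not a counted string")
    count = int(text[pos:end])
    start = end + 1
    if start + count > len(text):
        raise ValueError("count runs past the end of the input")
    return text[start:start + count], start + count


# The binary example from above: one octet of length, 'H', ten characters.
packet = io.BytesIO(b"\x0aHabcdefghij")
print(read_counted_token(packet, length_size=1, marker=b"H"))

# The text form, as in a Fortran Hollerith constant.
print(read_hollerith("10Habcdefghij"))
```

Note that the count is ordinary program state rather than something
the automaton has to encode, which is why this is trivial in code and
awkward as a regular expression.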

Kind regards,
Chris

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[Right. When I was writing Fortran lexers, Hollerith strings were among the
simplest of the kludges I had to use. -John]
