|lexing backwards email@example.com (Stefan Monnier) (2003-04-05)|
|Re: lexing backwards firstname.lastname@example.org (2003-04-07)|
|Re: lexing backwards email@example.com (Chris F Clark) (2003-04-07)|
|Re: lexing backwards firstname.lastname@example.org (Marat Boshernitsan) (2003-04-07)|
|Re: lexing backwards email@example.com (Stan Zaborowski) (2003-04-13)|
|Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-04-13)|
|Re: lexing backwards firstname.lastname@example.org (Stefan Monnier) (2003-04-15)|
|Re: lexing backwards cfc@TheWorld.com (Chris F Clark) (2003-04-15)|
|Re: lexing backwards email@example.com (2003-05-06)|
|Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-14)|
|Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-16)|
|Re: lexing backwards firstname.lastname@example.org (2003-05-16)|
|[2 later articles]|
From: "Ron Pinkas" <Ron@Profit-Master.com>
Date: 13 Apr 2003 12:18:45 -0400
Posted-Date: 13 Apr 2003 12:18:44 EDT
> One can understand this by looking at the three general classes of
> tokens that exist in most programming languages.
I'm happy you brought up this point, because I was "forced" to develop a lexing
engine after recognizing that there are a few very specific classes of tokens,
but no lexing engine I was familiar with tried to offer a solution based on
this approach. After carefully reviewing a few programming languages, I found
the following classes of tokens:
Delimiters
These are like the C language:
They are single characters that, outside the context of Streams, are
unconditional delimiters of prior input and are also tokens on their own.
White Space
These may also be considered a subclass of Delimiters, i.e. Disposable
Delimiters. Once found outside the context of a Stream, they function as
terminators of the prior input, but they themselves are no longer valuable
and may be disposed of.
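A minimal Python sketch of the two classes above, assuming a hypothetical C-like set of single-character Delimiters and treating space and tab as the Disposable Delimiters (the post's own example characters are not preserved). Both kinds end the pending input; only Delimiters survive as tokens.

```python
# Hypothetical character sets: the post's own C examples for these
# classes are not preserved, so these are assumptions for illustration.
DELIMITERS = set("(),[]{}")   # single-character tokens that end prior input
WHITE_SPACE = set(" \t")      # Disposable Delimiters: end input, then dropped


def split_on_delimiters(text):
    """Split text into tokens, keeping Delimiters and discarding White Space."""
    tokens = []
    pending = ""              # input accumulated since the last terminator
    for ch in text:
        if ch in DELIMITERS or ch in WHITE_SPACE:
            if pending:       # any terminator ends the prior input
                tokens.append(pending)
                pending = ""
            if ch in DELIMITERS:
                tokens.append(ch)  # a Delimiter is also a token on its own
        else:
            pending += ch
    if pending:
        tokens.append(pending)
    return tokens


print(split_on_delimiters("f(a, b)"))  # ['f', '(', 'a', ',', 'b', ')']
```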
Self Contained
These are tokens like the C language:
-> ++ -- := ==
That is, no Delimiter is required to terminate such a token. One may also
think of this class of tokens as Multi-Character Delimiters. Once found in
the input outside the context of a Stream, they serve as unconditional
terminators of the prior input, and are also tokens on their own.
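One way to recognize Self Contained tokens is longest match: try the longer spellings first so that `++` is not read as two `+` tokens. The sketch below uses the post's C examples plus a few single-character operators that are assumptions added only to show the longest-match ordering.

```python
# The post's multi-character examples, plus assumed single-character operators.
SELF_CONTAINED = ["->", "++", "--", ":=", "==", "-", "+", "=", ":"]
# Longest match: try longer tokens first so "++" wins over "+".
SELF_CONTAINED.sort(key=len, reverse=True)


def split_self_contained(text):
    """Split text wherever a Self Contained token occurs; no Delimiter needed."""
    tokens, pending, i = [], "", 0
    while i < len(text):
        match = next((t for t in SELF_CONTAINED if text.startswith(t, i)), None)
        if match:
            if pending:            # the token terminates the prior input
                tokens.append(pending)
                pending = ""
            tokens.append(match)   # and is a token on its own
            i += len(match)
        else:
            pending += text[i]
            i += 1
    if pending:
        tokens.append(pending)
    return tokens


print(split_self_contained("p->x++"))  # ['p', '->', 'x', '++']
```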
Streams
These are like the C language:
"This is a string"
Streams are made of a Stream Prefix, like " in C (or ' and [, or even [[ in
some other languages), followed by a stream of any number of characters,
terminated by the given *matching* Stream Terminator, like " in C (this may
also be multi-character).
[Comments may also be considered streams, though they may commonly be
handled at a pre-lexing stage.]
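A sketch of reading one Stream, assuming a hypothetical prefix-to-terminator table modeled on the prefixes mentioned above. Inside a Stream, Delimiters and End of Line characters lose their meaning; only the matching terminator ends it. Escape sequences such as C's `\"` are deliberately not handled here.

```python
# Assumed Stream Prefix -> matching Stream Terminator table.
STREAMS = {'"': '"', "'": "'", "[[": "]]", "[": "]"}
PREFIXES = sorted(STREAMS, key=len, reverse=True)  # try "[[" before "["


def read_stream(text, i):
    """If a Stream starts at text[i], return (stream_token, next_index), else None.

    Everything up to the matching terminator belongs to the Stream, so
    characters like ',' or ';' inside it are not terminators.
    (No escape-sequence handling: an assumption for brevity.)
    """
    for prefix in PREFIXES:
        if text.startswith(prefix, i):
            terminator = STREAMS[prefix]
            end = text.find(terminator, i + len(prefix))
            if end < 0:
                raise SyntaxError("unterminated stream starting at %d" % i)
            end += len(terminator)
            return text[i:end], end
    return None


print(read_stream('x = "a, b;" + y', 4))  # ('"a, b;"', 11)
```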
End of Line
These tokens are like the C language:
; (and the OS-dependent New Line character)
Once found outside the context of a Stream, they serve as unconditional
terminators of the prior input, and are typically used as flags indicating
the context of a New Line.
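The "flag" role of End of Line can be sketched as follows: each EOL terminates the pending input and records that the next token starts a new logical line. This assumes `;` and `\n` as the EOL set (the C examples above) and, for brevity, treats all other white space as disposable; the function name is hypothetical.

```python
EOL_CHARS = {";", "\n"}  # the C examples from the post


def tokens_with_line_flags(text):
    """Return (token, starts_line) pairs; an EOL ends the prior input and
    flags that the next token begins a new logical line."""
    result, pending, at_line_start = [], "", True   # BOF counts as a line start
    for ch in text + "\n":          # a trailing EOL flushes the last token
        if ch in EOL_CHARS or ch.isspace():
            if pending:
                result.append((pending, at_line_start))
                pending, at_line_start = "", False
            if ch in EOL_CHARS:
                at_line_start = True   # flag: next token starts a new line
        else:
            pending += ch
    return result


print(tokens_with_line_flags("a b; c\nd"))
# [('a', True), ('b', False), ('c', True), ('d', True)]
```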
Words
These are tokens like the C language:
int void signed volatile function
This class of tokens *must* be *delimited* by a preceding and a following
Delimiter (or Disposable Delimiter). While these tokens are reserved words in
the C language, they may just as well be non-reserved in other languages,
where context allows them to be treated as ordinary tokens.
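The "must be delimited" rule is what keeps `int` from matching inside `print` or `integer`. A small sketch, assuming a hypothetical terminator set; a real scanner would more likely look up each completed token in a table, but the explicit before/after check makes the rule visible.

```python
WORDS = {"int", "void", "signed", "volatile", "function"}  # from the post
TERMINATORS = set(" \t\n(),;[]{}")  # assumed Delimiters, White Space and EOL


def find_words(text):
    """Return sorted (index, word) pairs for Words delimited on both sides."""
    found = []
    for word in WORDS:
        start = text.find(word)
        while start >= 0:
            end = start + len(word)
            # A Word needs a terminator (or BOF/EOF) before and after it.
            before_ok = start == 0 or text[start - 1] in TERMINATORS
            after_ok = end == len(text) or text[end] in TERMINATORS
            if before_ok and after_ok:
                found.append((start, word))
            start = text.find(word, start + 1)
    return sorted(found)


print(find_words("int print; integer(void)"))  # [(0, 'int'), (19, 'void')]
```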
Key Words
These tokens are like the C language:
static switch case while
This class of tokens *has* to be the *first* non-disposable token in a
given line (a line starts at BOF or after an EOL).
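A sketch of the positional rule for Key Words, assuming the input has already been split into logical lines of non-disposable tokens (for example by EOL handling like the one described above). The `KEYWORD`/`OTHER` labels are hypothetical.

```python
KEY_WORDS = {"static", "switch", "case", "while"}  # from the post


def classify_key_words(lines):
    """lines: one list of non-disposable tokens per logical line.
    A Key Word is recognized only as the first token of its line."""
    out = []
    for tokens in lines:
        for pos, tok in enumerate(tokens):
            kind = "KEYWORD" if pos == 0 and tok in KEY_WORDS else "OTHER"
            out.append((kind, tok))
    return out


print(classify_key_words([["while", "x"], ["y", "while"]]))
# [('KEYWORD', 'while'), ('OTHER', 'x'), ('OTHER', 'y'), ('OTHER', 'while')]
```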
Any and all input found *between* the 5 kinds of unconditional terminators
(Delimiters, Self Contained, White Space, Streams, and End of Line) that
is *not* a Word or Key Word is an Element of the given language; Elements
are usually divided into further sub-classes.
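Elements are defined by exclusion, which a short sketch makes concrete. It assumes a hypothetical set of terminator characters and small subsets of the post's Words and Key Words, and it ignores Self Contained tokens, Streams and the Key Word position rule for brevity; it does not attempt the further sub-classification of Elements.

```python
import re

WORDS = {"int", "void"}          # subset of the post's Words
KEY_WORDS = {"static", "while"}  # subset of the post's Key Words
# Assumed terminators: white space plus C-like Delimiters and ';'.
TERMINATOR_RUN = re.compile(r"[\s(),;\[\]{}]+")


def elements(text):
    """Return the pieces between terminators that are not Words or Key Words."""
    pieces = [p for p in TERMINATOR_RUN.split(text) if p]
    return [p for p in pieces if p not in WORDS and p not in KEY_WORDS]


print(elements("static int count(42);"))  # ['count', '42']
```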
Since I regard data-driven solutions as generally superior, I developed
SimpLex (http://sourceforge.net/projects/simplex), a lexing engine that
accepts simple definitions of the above classes of tokens for a given
language and thus serves as a full-featured scanner for that language.
Such a scanner is typically about 1/4 the size of an equivalent
[F]Lex-generated scanner [marginally faster too], and does not require a
"compilation" step as [F]Lex does.
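To illustrate what "data-driven" means here, the sketch below drives one generic scanner from a plain dictionary describing the token classes, so changing languages means changing data rather than regenerating code. The dictionary layout, class labels and `scan` function are hypothetical and are *not* SimpLex's actual definition format or API. Streams are simplified to single-character prefixes.

```python
# Hypothetical language definition (plain data, not SimpLex's format).
C_LIKE = {
    "delimiters": set("(),[]{}"),
    "white_space": set(" \t"),
    "self_contained": ["->", "++", "--", ":=", "==", "="],
    "streams": {'"': '"'},            # single-character prefixes only
    "end_of_line": {";", "\n"},
    "words": {"int", "void"},
    "key_words": {"static", "while"},
}


def scan(text, lang):
    """Table-driven scanner: return a list of (kind, text) tokens."""
    ops = sorted(lang["self_contained"], key=len, reverse=True)  # longest match
    tokens, pending, line_start, i = [], "", True, 0

    def flush():
        # Classify the input accumulated since the last terminator.
        nonlocal pending, line_start
        if pending:
            if pending in lang["key_words"] and line_start:
                kind = "KEYWORD"
            elif pending in lang["words"]:
                kind = "WORD"
            else:
                kind = "ELEMENT"
            tokens.append((kind, pending))
            pending, line_start = "", False

    while i < len(text):
        ch = text[i]
        op = next((o for o in ops if text.startswith(o, i)), None)
        if ch in lang["streams"]:              # Stream: read to its terminator
            flush()
            term = lang["streams"][ch]
            end = text.find(term, i + 1)
            if end < 0:
                raise SyntaxError("unterminated stream at %d" % i)
            end += len(term)
            tokens.append(("STREAM", text[i:end]))
            line_start, i = False, end
        elif op:                               # Self Contained token
            flush()
            tokens.append(("SELF", op))
            line_start, i = False, i + len(op)
        elif ch in lang["end_of_line"]:        # EOL: terminator and flag
            flush()
            tokens.append(("EOL", ch))
            line_start, i = True, i + 1
        elif ch in lang["delimiters"] or ch in lang["white_space"]:
            flush()
            if ch in lang["delimiters"]:       # White Space is dropped
                tokens.append(("DELIM", ch))
                line_start = False
            i += 1
        else:
            pending += ch
            i += 1
    flush()
    return tokens


print(scan('static int x = "a;b";', C_LIKE))
```

Because the scanner reads its behavior from the table at run time, there is no separate generation or compilation step, which is the trade-off the post contrasts with [F]Lex.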