Re: Tokenizer theory and practice

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Sun, 18 May 2008 10:29:59 +0200

          From comp.compilers

From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.compilers
Date: Sun, 18 May 2008 10:29:59 +0200
Organization: cbb software GmbH
References: 08-05-050 08-05-066 08-05-069
Keywords: lex
Posted-Date: 19 May 2008 21:26:18 EDT

On Sat, 17 May 2008 10:22:34 +0200, Hans-Peter Diettrich wrote:


> Dmitry A. Kazakov schrieb:
>
>> When I do similar stuff, I do it in such a way that the parser returns
>> typed objects rather than copies of the source. The whole idea of
>> copying the source is bogus, IMO.
>
> Indeed, textual copies are of little use. Can you suggest a
> descriptive formalism for the objects returned by a lexer?


Not with a bottom-up approach. But when the parser works top-down, or
else from somewhere in the middle, it knows well what to expect at the
cursor. At the top it knows the exact type, so parsing either fails or
yields a token of that type. Below that it knows only some set of
types, i.e. in OO terms, a class of types. In this case the returned
token would be a polymorphic object from that class (or else a
failure). The class could be something like "infix operation",
"literal", etc. In fact, this is merely the abstract factory pattern:
the parser acts as a factory, and the parsed source at the cursor
determines the concrete token type and then its value.
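
To make that concrete, here is a minimal sketch in Python. All the
names (Token, Literal, IntLiteral, StringLiteral, InfixOperation, Add,
Mul, parse) and the regular expressions are invented for illustration
and belong to no existing library. The caller passes what it expects
at the cursor: either an exact type or a class of types. The function
returns a typed token plus the new cursor position, or None on failure.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple, Type


class Token:
    """Root of a hypothetical token hierarchy."""
    pattern = None  # concrete (leaf) types override this with a regex

    @classmethod
    def convert(cls, text: str) -> "Token":
        # Default: the token carries no value beyond its type.
        return cls()


class Literal(Token):
    """A class of types: any literal."""


@dataclass
class IntLiteral(Literal):
    value: int
    pattern = re.compile(r"\d+")

    @classmethod
    def convert(cls, text: str) -> "IntLiteral":
        return cls(int(text))


@dataclass
class StringLiteral(Literal):
    value: str
    pattern = re.compile(r'"[^"]*"')

    @classmethod
    def convert(cls, text: str) -> "StringLiteral":
        return cls(text[1:-1])  # drop the surrounding quotes


class InfixOperation(Token):
    """A class of types: any infix operation."""


@dataclass
class Add(InfixOperation):
    pattern = re.compile(r"\+")


@dataclass
class Mul(InfixOperation):
    pattern = re.compile(r"\*")


def parse(source: str, pos: int,
          expected: Type[Token]) -> Optional[Tuple[Token, int]]:
    """Act as the factory: build a token of class `expected` at `pos`."""
    if expected.pattern is not None:
        # Exact type known: parsing either fails or yields that type.
        m = expected.pattern.match(source, pos)
        return (expected.convert(m.group()), m.end()) if m else None
    # Only a class of types known: the source picks the concrete type.
    for sub in expected.__subclasses__():
        result = parse(source, pos, sub)
        if result is not None:
            return result
    return None
```

For example, parse("42+x", 0, Literal) yields an IntLiteral with value
42 and the cursor at 2, while parse("42+x", 0, InfixOperation) fails.
No copy of the source text is kept in either case, only the typed
value.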


I think this could be formalized. One premise is that the set of
tokens forms a tree/forest-like hierarchy, which is, I believe, almost
always the case.
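
As a rough illustration of that premise, the hierarchy can be written
down as a child-to-parent map, where every token type has at most one
parent and several roots make it a forest. The type names below are
made up for the example. Given such a map, "what the parser knows at
the cursor" is a node, and the candidate concrete types are the leaves
below it.

```python
# Hypothetical token hierarchy: child -> parent, roots map to None.
PARENT = {
    "literal": None,
    "int_literal": "literal",
    "string_literal": "literal",
    "infix_operation": None,
    "add": "infix_operation",
    "mul": "infix_operation",
}


def ancestors(kind):
    """Yield `kind` and every class above it, up to its root."""
    while kind is not None:
        yield kind
        kind = PARENT[kind]


def belongs_to(kind, cls):
    """True if token type `kind` is `cls` or lies below it in the tree."""
    return cls in ancestors(kind)


def concrete_types(cls):
    """Leaves below `cls`: the types a parser expecting `cls` may yield."""
    children = [k for k, p in PARENT.items() if p == cls]
    if not children:
        return [cls]
    return [leaf for child in children for leaf in concrete_types(child)]
```

Because each type has a single parent, the chain from any concrete
type up to its root is unique, which is what lets the expected class
at the cursor be a single node rather than an arbitrary set.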


--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

