Re: What should be check in Lexical Analyzer along with generating tokens?

"Joachim Durchholz" <joachim_d@gmx.de>
22 Sep 2002 12:15:53 -0400

          From comp.compilers


From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 22 Sep 2002 12:15:53 -0400
Organization: Compilers Central
References: 02-09-087 02-09-110 02-09-121
Keywords: lex, design
Posted-Date: 22 Sep 2002 12:15:53 EDT

Clint Olsen wrote:
> Joachim Durchholz wrote:
>
>> Contrary to common wisdom, I believe that the lexer should
>> not really [do checking]
>
> But you do agree that it's the lexer's job to check comments,
> right?


That depends.
If your software simply discards comments, it's easiest to discard
them in the lexer.
If comments are just opaque blobs of text, you're still best off
recognizing them by hand.
Things begin to change once you want to do more with the contents of
comments. For example, documentation extraction tools expect language
entities in some comments; most languages require string detection
within comments; or you may be writing a language processing toolkit
and want to lex the comment contents anyway, because you foresee that
some editor will want to do syntax highlighting even within comments.
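For the simple discard case, a hand-written pass might look like the
following (an illustrative sketch, not from the post; it assumes
C-style, non-nesting /* ... */ comments):

```python
import re

def strip_comments(src):
    """Replace each /* ... */ blob with a single space.

    The non-greedy .*? makes each comment end at the first '*/'.
    """
    return re.sub(r'/\*.*?\*/', ' ', src, flags=re.DOTALL)

assert strip_comments('a /* note */ b') == 'a   b'
assert strip_comments('x/*a*/y/*b*/z') == 'x y z'
```

Note that such a naive pass would also treat a "*/" inside a string
literal as a comment terminator, which is one reason string detection
matters, as the /* "*/" */ example below shows.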


For a lexer in full generality, it can be advantageous to split the
lexer into several levels:
- Reader: character-set conversion and line-ending conventions;
      also keeps track of line and column numbers.
- Tokenizer: groups characters into tokens. In particular, it does
      string recognition (for nestable comments, strings must be
      recognized to avoid mis-lexing input like /* "*/" */).
- Comment recognition.
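As a sketch, the lowest level might look like this (illustrative
Python; the name Reader and all details are assumptions, not from the
post). The point of the split is that only the Reader knows about
positions and line endings; the tokenizer above it just asks for
characters.

```python
class Reader:
    """Level 1: normalized character stream with line/column tracking."""

    def __init__(self, text):
        # A real reader would also do character-set conversion here;
        # this sketch only normalizes CRLF line endings.
        self.text = text.replace('\r\n', '\n')
        self.pos = 0
        self.line = 1
        self.col = 1

    def next(self):
        """Return the next character, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        if ch == '\n':
            self.line, self.col = self.line + 1, 1
        else:
            self.col += 1
        return ch

# After consuming 'ab\ncd' the reader sits at line 2, column 3.
r = Reader('ab\ncd')
while r.next() is not None:
    pass
assert (r.line, r.col) == (2, 3)
```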


> How do you expect to be able to represent comments in a CFG?


comment ::= "/*" {token} "*/"
token ::= comment
                  | string | integer | ... (literals)
                  | "if" | "then" | ... (keywords)
                  | <error>


(The <error> token is meant to be whatever the lexer returns if a
character sequence is not a legal token.)
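Read as a procedure, the grammar says a comment lexes tokens until its
closing "*/"; and since strings and nested comments are themselves
tokens, a "*/" inside either cannot terminate it. A sketch as a
recursive-descent recognizer (hypothetical code, assuming C-like
string syntax; not from the post):

```python
def lex_comment(text, i):
    """Assume text starts a comment at i; return the index just past
    the matching '*/', lexing the tokens inside along the way."""
    assert text.startswith('/*', i)
    i += 2
    while i < len(text):
        if text.startswith('*/', i):
            return i + 2                  # comment ::= "/*" {token} "*/"
        if text.startswith('/*', i):
            i = lex_comment(text, i)      # token ::= comment (nesting)
        elif text[i] == '"':
            i = lex_string(text, i)       # token ::= string
        else:
            i += 1                        # any other token, or <error>
    raise SyntaxError("unterminated comment")

def lex_string(text, i):
    """Assume text[i] == '\"'; return the index past the closing quote."""
    i += 1
    while i < len(text) and text[i] != '"':
        i += 2 if text[i] == '\\' else 1  # skip escaped characters
    if i >= len(text):
        raise SyntaxError("unterminated string")
    return i + 1

# The troublesome input from above lexes as one whole comment,
# because the string token swallows its embedded '*/'.
src = '/* "*/" */'
assert lex_comment(src, 0) == len(src)
```

Because string contents are consumed as a unit, and nested comments
recurse, both /* "*/" */ and /* a /* b */ c */ come out as single
comment tokens.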


Regards,
Joachim

