Re: What should be check in Lexical Analyzer along with generating tokens?

"Joachim Durchholz" <joachim_d@gmx.de>
19 Sep 2002 01:12:56 -0400


From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 19 Sep 2002 01:12:56 -0400
Organization: Compilers Central
References: 02-09-087
Keywords: lex
Posted-Date: 19 Sep 2002 01:12:56 EDT

Vikrama Sanjeeva wrote:
> The primary job of a Lexical Analyzer is to generate tokens. But what
> other functionality is added to the Lexical Analyzer in order to make
> it efficient? I mean,
>
> 1. It may check for defined variables/identifiers
> 2. It may check for accurate opening and closing comments
> etc.


A lexer should avoid checks as far as possible IMHO. Reasons:


1. There isn't much efficiency to be gained. The check will be done,
and it will take about the same time, regardless of which phase of the
compiler does it. (There is some efficiency to be gained by keeping
the total number of passes down, but I think the days when that was
relevant are gone.)


Anyway, it's better to design the code for overall flexibility,
i.e. set the checks up so that you can move them between lexer, parser,
and code generation with no more adaptation work than absolutely
necessary. This way, you can redesign the compiler with relative ease
when efficiency (or any of a dozen other design criteria) mandates it.
Besides, your code will be cleaner - making code modular does require
work, but you'll get better interfaces, fewer bugs, better
maintainability, and probably better efficiency as well (efficiency
considerations added at an early stage tend to make things really
complicated after a couple of maintenance cycles, at which point the
compiler spends a lot of its time working around the limitations
imposed by the early implementation).
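To make the idea concrete, here is a minimal sketch (all names are my
own illustrative assumptions, not from the original post) of a check
written as a standalone pass over tokens. Because it takes tokens and
returns diagnostics instead of being wired into one phase, the same
function can be called from the lexer, the parser, or a later pass with
no adaptation work:

```python
# Sketch: a check as a standalone, movable pass over tokens.
# Token and check_identifier_length are hypothetical names.
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # e.g. "IDENT", "NUMBER", "COMMENT"
    text: str
    line: int

def check_identifier_length(tokens, max_len=63):
    """Return diagnostics instead of raising, so whichever phase
    runs the check decides what to do with the results."""
    errors = []
    for tok in tokens:
        if tok.kind == "IDENT" and len(tok.text) > max_len:
            errors.append((tok.line, "identifier too long: " + tok.text[:10] + "..."))
    return errors

# Any phase can run the same check over its token stream:
toks = [Token("IDENT", "x" * 100, 1), Token("IDENT", "ok", 2)]
print(check_identifier_length(toks))  # one diagnostic, for line 1
```

The design choice here is simply that the check owns no state of its
own; moving it between phases is a matter of moving one call site.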


(Been there, done all that. I've spent much of the last few months
trying to figure out how much of a long-forgotten optimization is
still lurking in our code.)


2. Other phases of the compiler may have better information to check
against. E.g. checks for defined names are impossible until semantic
analysis has built the symbol table.
You can interleave lexing, parsing, and semantic analysis, but this
usually results in a complicated web of interdependencies between the
phases, with the associated maintenance problems.
This doesn't mean it isn't done. In fact, many compilers do it - there
are lots of languages around where semantics has repercussions on the
syntax. But if you have a choice, avoid intermixing layers.
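The best-known example of such an interdependency is the "lexer hack"
used for C-like languages: the lexer consults a symbol table that the
parser fills in, so that after `typedef unsigned long size_t;` the word
`size_t` is tokenized as a type name rather than an ordinary
identifier. A minimal sketch (names are illustrative assumptions):

```python
# Sketch of the classic C "lexer hack": the lexer asks a symbol table,
# maintained by the parser/semantic analysis, how to classify a word.
# This is exactly the feedback loop between phases the text warns about.

typedef_names = set()          # updated by the parser as it sees typedefs

def classify(word):
    """Lexer-side classification that depends on parser-side state."""
    return "TYPE_NAME" if word in typedef_names else "IDENT"

typedef_names.add("size_t")    # parser saw: typedef unsigned long size_t;
print(classify("size_t"))      # classified as TYPE_NAME
print(classify("count"))       # still a plain IDENT
```

Note how the global `typedef_names` couples the two phases: the lexer
can no longer be tested, reused, or rerun independently of the parser,
which is precisely the maintenance problem described above.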


In summary: first make it run, then make it run fast.
This is old advice that probably predates my own professional
experience by years, but it obviously still bears repeating...


Afterthought: If eliminating comments from the source is part of your
lexer's job, then of course it must check whether opening and closing
comment delimiters match. Contrary to common wisdom, I believe that
the lexer should not really do that - these days, there is too much
software around that performs source transformations beyond
translating to machine language (e.g. prettyprinting, documentation
extraction). Many of these tools want to retain the comments but still
need a lexed representation of the source.
A lossless lexer (i.e. a lexer that emits enough information to allow a
character-by-character reconstruction of the source) is the most
flexible approach.
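One way to read "lossless" is that every character of the input,
including whitespace and comments, lands in some token, so
concatenating the token texts reproduces the source exactly. A toy
sketch of that idea (the token categories are my own assumptions for a
made-up little language, not a prescription):

```python
# Sketch of a lossless lexer: whitespace and comments become tokens
# too, so joining the token texts reconstructs the source verbatim.
import re

TOKEN_RE = re.compile(r"""
    (?P<COMMENT>\#[^\n]*)      # line comment, kept as a token
  | (?P<WS>\s+)                # whitespace, kept as a token
  | (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<OP>[^\s\w])            # any other single character
""", re.VERBOSE)

def lex(source):
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

src = "x = 42  # answer\n"
toks = lex(src)
assert "".join(text for _, text in toks) == src   # lossless round-trip
```

A prettyprinter or documentation extractor can then work on the token
stream and still emit the original comments untouched; a compiler
front end simply filters out the WS and COMMENT tokens before parsing.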


Regards,
Joachim

