Re: Buffered input for a lexer?

Ray Dillinger <bear@sonic.net>
23 Apr 2002 00:03:46 -0400

          From comp.compilers


From: Ray Dillinger <bear@sonic.net>
Newsgroups: comp.compilers
Date: 23 Apr 2002 00:03:46 -0400
Organization: Compilers Central
References: 02-04-061 02-04-081 02-04-094
Keywords: lex, practice
Posted-Date: 23 Apr 2002 00:03:46 EDT

ralph@inputplus.co.uk wrote:


> Unix text files have always had lines *terminated*, not separated,
> with newlines, i.e. ASCII character 10.
>
> AFAIK it wasn't until EMACS showed its face that Unix text files
> started appearing with an unterminated final `line'.
>
> What editor do you use?


Emacs, as you surmised.


> > I actually think it would make sense if an editor's language mode
> > just did it automatically (terminate all saved source files with a
> > single end-of-line in the native syntax of the host system when you
> > save the file).
>
> EMACS does have some option or other to always terminate the last line.
> It's a shame it isn't on by default.


Actually, I wouldn't mind terribly if it simply appended a newline
character. The default setting for Emacs in my Linux distribution
(which I disabled immediately because it was annoying) was to ask
the user instead. Every time I saved a file, it interrupted me with
"File does not end in newline. Add one?" instead of just saving the
file. So I edited the modes, and it no longer asks.


Anyway, I've always regarded this as one of the edge cases a lexer
has to handle. You have no assurance that someone won't try to lex an
arbitrary binary file, or a directory, or stdin from some other thread
that might get cut off without a final newline, or anything else. You
can treat such input as an error if you prefer, and halt with a
reasonable error message instead of lexing it; but if your lexer
*crashes* on it, your lexer is just wrong.
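A minimal sketch of the "return a reasonable error instead of crashing" option, in Python purely for brevity (the post names no language). The names `load_source` and `SourceError` are hypothetical, and treating a NUL byte as a sign of binary input and requiring UTF-8 are illustrative assumptions, not anything the post specifies. Note that a missing final newline is deliberately not rejected here; that case is left for the lexer to handle.

```python
class SourceError(Exception):
    """Input that can't be lexed, with a message suitable for the user."""


def load_source(path):
    """Read a file for lexing, turning unusable input into a SourceError
    rather than letting the lexer crash on it later.

    A missing final newline is NOT treated as an error: the text is
    returned exactly as stored, and the lexer must cope with that.
    """
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except IsADirectoryError:
        raise SourceError(f"{path}: is a directory, not a source file") from None
    except OSError as e:
        # Missing file, permission denied, etc.
        raise SourceError(f"{path}: cannot read file ({e.strerror})") from None
    # Assumption: a NUL byte means the file is binary, not source text.
    if b"\x00" in raw:
        raise SourceError(f"{path}: looks like a binary file (contains NUL bytes)")
    try:
        # Assumption: source files are UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError as e:
        raise SourceError(f"{path}: not valid UTF-8 at byte offset {e.start}") from None
```

The caller catches `SourceError` once, prints its message, and stops, so bad input ends in a diagnostic rather than a crash deep inside the lexer.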


I get a character count before I start lexing, and keep a count of
the characters lexed (and a separate count of newlines) while lexing.
I needed those counts anyway to report line and column numbers in
messages about lexical errors. And since I keep them, it's very simple
to stop reading after the last character in the file. Speed isn't
important enough to me to justify deliberately introducing the
possibility of an error.
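A minimal sketch of the counting approach, again in Python only for brevity. The function `tokenize`, the exception `LexError`, and the small token set (identifiers, integers, a few punctuation characters) are hypothetical, chosen just to make the loop concrete. The length is taken before lexing starts, and every read is bounded by it, so the lexer needs no trailing newline or other sentinel; the newline count doubles as the line number for error messages. In C the count would come from the file size or the number of bytes read into the buffer rather than from `len()`.

```python
class LexError(Exception):
    pass


def tokenize(text):
    """Yield (kind, lexeme, line, column) tuples, 1-based positions.

    The character count is known up front, so the loop stops after the
    last character whether or not the input ends with a newline.
    """
    n = len(text)      # character count, taken before lexing
    pos = 0            # characters lexed so far
    line, col = 1, 1   # position of text[pos]; line = newlines seen + 1
    while pos < n:
        ch = text[pos]
        if ch == "\n":
            pos += 1
            line, col = line + 1, 1
            continue
        if ch.isspace():
            pos += 1
            col += 1
            continue
        if ch.isalpha() or ch == "_":
            end = pos + 1
            while end < n and (text[end].isalnum() or text[end] == "_"):
                end += 1
            kind = "ident"
        elif ch.isdigit():
            end = pos + 1
            while end < n and text[end].isdigit():
                end += 1
            kind = "number"
        elif ch in "+-*/=();":
            end = pos + 1
            kind = "punct"
        else:
            raise LexError(f"line {line}, column {col}: unexpected character {ch!r}")
        # Tokens never contain newlines, so only the column advances.
        yield kind, text[pos:end], line, col
        col += end - pos
        pos = end
```

For example, `list(tokenize("x = 42\ny=x+1"))` lexes both lines even though the input has no final newline, and `tokenize("a\n  $")` fails with a message pointing at line 2, column 3.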


Bear
[I think this thread has run its course. -John]




