|Simple question on lex/yacc specifications firstname.lastname@example.org (Eric Fowler) (2009-03-13)|
|Re: Simple question on lex/yacc specifications email@example.com (Eric Fowler) (2009-03-14)|
|Re: Simple question on lex/yacc specifications firstname.lastname@example.org (russell kym horsell) (2009-03-15)|
|Re: Simple question on lex/yacc specifications email@example.com (Max Hailperin) (2009-03-15)|
|Re: Simple question on lex/yacc specifications firstname.lastname@example.org (Eric Fowler) (2009-03-15)|
|From:||Eric Fowler <email@example.com>|
|Date:||Sun, 15 Mar 2009 16:30:26 -0700|
|Posted-Date:||15 Mar 2009 21:56:24 EDT|
I am aware that using lex for this project is overkill, but (a) I have a lot
of different sentence types to scan, and I want a consistent and
bulletproof way to do it (the specification I am working from defines
about 100-200 "sentences" that all look a little like this), and (b)
some of the fields themselves can be complicated and I want to tackle
them with a parser anyway, and (c) it's an excuse to get back into
learning lex and yacc with a simple problem set.
It seems most of my issues revolve around not knowing where I should
be doing error checking on the input. For instance, if I am expecting
a number less than 100 in a particular place, e.g., "...,50,...", at
what point should I be weeding out empty tokens, e.g., "...,,..." (in
other places I will have numeric fields that can be blank)?
Intuitively I think you want to catch them early in the process, but
the tokenizer can only tell you whether you have an empty field or a
NUMBER token; it does not know whether a blank is legal at that
position. So I am defining tokens for NUMBER and for COMMA (overkill
again) and leaving it to the parser to figure it out ... which is, as
far as I can see now, the Right Way[tm] to do it.
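To make that split concrete, here is a rough hand-written sketch in C rather than an actual lex/yacc specification. The scanner only reports NUMBER, COMMA, or end of input; the "parser" functions decide whether a blank or an out-of-range value is acceptable at a given position. All of the names (next_token, parse_required_field, parse_optional_field, the TOK_ constants) are made up for illustration and are not from any real specification or generated code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical token kinds; a lex spec would declare similar ones. */
enum token { TOK_NUMBER, TOK_COMMA, TOK_END, TOK_ERROR };

/* Scanner: classifies the next token at *sp and advances *sp past it.
   It has no idea whether a number or a blank is legal here. */
enum token next_token(const char **sp, long *value)
{
    const char *s = *sp;
    if (*s == '\0')
        return TOK_END;
    if (*s == ',') {
        *sp = s + 1;
        return TOK_COMMA;
    }
    if (isdigit((unsigned char)*s)) {
        char *end;
        *value = strtol(s, &end, 10);   /* first char is a digit, so digits only */
        *sp = end;
        return TOK_NUMBER;
    }
    return TOK_ERROR;
}

/* Parser-level rule: this field must hold a NUMBER below `limit`.
   An empty field shows up here as TOK_COMMA or TOK_END and is rejected. */
bool parse_required_field(const char **sp, long limit, long *out)
{
    long v;
    if (next_token(sp, &v) != TOK_NUMBER || v >= limit)
        return false;
    *out = v;
    return true;
}

/* Parser-level rule: this field may be blank. On a blank, the delimiter
   is left unconsumed and *present is set to false. */
bool parse_optional_field(const char **sp, long *out, bool *present)
{
    const char *save = *sp;
    long v;
    enum token t = next_token(sp, &v);
    if (t == TOK_NUMBER) {
        *out = v;
        *present = true;
        return true;
    }
    *sp = save;                         /* put the token back */
    *present = false;
    return t == TOK_COMMA || t == TOK_END;
}
```

With the input "50,,7", a required field yields 50, the optional field after the first comma reports "not present", and the last required field yields 7. The same required-field rule applied to ",5" fails, which is the "...,,..." case being caught in the parser rather than in the tokenizer.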
Yes, I could be doing it all with strtok(). But I like doing things
the hard way.
PS. strtok() actually is not your best friend here because when you
get delimiters side-by-side with nothing intervening, strtok() skips
them all. For example, running strtok() over ",,,FOO,,," with "," as
the delimiter will return the single token "FOO" on its first call and
NULL thereafter. So you have to tokenize another way. Not that it's
real hard.