|Parsing a text stream firstname.lastname@example.org (2004-04-28)|
|Re: Parsing a text stream email@example.com (Dmitry A.Kazakov) (2004-04-29)|
|Re: Parsing a text stream firstname.lastname@example.org (2004-05-02)|
|Re: Parsing a text stream email@example.com (Pete Jinks) (2004-05-02)|
|Re: Parsing a text stream Postmaster@paul.washington.dc.us (Paul Robinson) (2004-05-24)|
|Date:||28 Apr 2004 14:38:41 -0400|
|Posted-Date:||28 Apr 2004 14:38:41 EDT|
Surprisingly, I can't seem to find a clear explanation of
"how to lexically analyse a chunk of text data".
What I'm looking for is a bit different from what I've found so far.
For example, using a parser such as GOLD Parser with a grammar for,
let's say, HTML, we can parse an HTML file and tokenize it. The
problem is this: these parsers only get to the end of the data as
long as everything goes according to plan. Say you have left an HTML
tag open, e.g. "<FONT color=black <I>something</I>"; here the FONT
tag is not closed with a corresponding ">". None of the lexical
analysers I have found so far can handle this sort of situation. If
you ask why they should, the answer is a text editor: somebody is not
done with the code yet, the syntax highlighting feature is supposed
to ease the writer's task, and so even unfinished tokens must be
highlighted.
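
To make the requirement concrete, here is a minimal sketch in Python of a lexer that never gives up on the FONT example. It is only an illustration under assumptions: the token names (TAG, UNTERMINATED_TAG, TEXT) and the rule "an open tag ends at the next '<' or at end of input" are invented for this sketch, and it is nowhere near a real HTML lexer (no comments, no quoted attribute values).

```python
import re

# Token kinds are hypothetical names chosen for this sketch.
# Alternatives are tried left to right, so a complete tag wins
# over an unterminated one.
TOKEN_RE = re.compile(r"""
    (?P<TAG><[^<>]*>)               # complete tag, opened and closed
  | (?P<UNTERMINATED_TAG><[^<>]*)   # opened tag that is never closed
  | (?P<TEXT>[^<]+)                 # plain text up to the next tag
""", re.VERBOSE)


def tokenize(source):
    """Yield (kind, text) pairs that together cover every character."""
    pos = 0
    while pos < len(source):
        # One of the three alternatives always matches at least one
        # character, so the loop always advances and never fails.
        m = TOKEN_RE.match(source, pos)
        yield m.lastgroup, m.group()
        pos = m.end()


if __name__ == "__main__":
    for kind, text in tokenize("<FONT color=black <I>something</I>"):
        print(kind, repr(text))
```

The point of the design is that the error case is just another token kind, so a highlighter can colour the unfinished FONT tag differently and carry on with the rest of the line.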
I already have one idea, which is to use regular expressions. The
problem with regexes, however, is that we can only search and find a
match. We can't recognize parts and sections of the code, such as a
function body in a C program or any other section made up of logical
sub-parts.
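
One way this gap is sometimes bridged is to let the regex find only the delimiters and keep a stack beside it to recover the nesting. The Python sketch below is an assumption-laden illustration, not a C parser: the function name brace_blocks is made up, it looks only at '{' and '}', and it ignores braces inside strings and comments. Unclosed blocks are reported with an end of None rather than treated as a failure.

```python
import re

BRACE_RE = re.compile(r"[{}]")


def brace_blocks(source):
    """Return (start, end, depth) for each brace-delimited block.

    start and end are offsets such that source[start:end] is the block;
    end is None when the block is never closed. depth is 0 for an
    outermost block.
    """
    stack = []   # offsets of '{' characters that are still open
    blocks = []
    for m in BRACE_RE.finditer(source):
        if m.group() == "{":
            stack.append(m.start())
        elif stack:
            start = stack.pop()
            blocks.append((start, m.end(), len(stack)))
        # A stray '}' with nothing open is simply skipped.
    while stack:
        # Whatever is left was opened but never closed.
        start = stack.pop()
        blocks.append((start, None, len(stack)))
    return sorted(blocks, key=lambda block: block[0])


if __name__ == "__main__":
    code = "int f() { if (x) { y(); } }"
    for start, end, depth in brace_blocks(code):
        print(depth, code[start:end])
```

A highlighter could use the depth to shade nested bodies, and the None end to mark a body the writer has not finished typing yet.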
So, in summary, what I'm looking for is something with a lexical
parser's capability that can at the same time handle errors such as
the one I described above (whether by telling it how, or by building
a new error-handling mechanism).
If anyone is at all familiar with syntax highlighting (my actual
goal) or parsing in general, I would be very pleased to hear any
suggestions, help, recommendations, etc.
Thanks for your time
[I'm not aware of any good way to parse snippets of code other than
ad-hoc regex hacks. Hey, PhD students, get on it. -John]