Syntax Highlighting email@example.com (Tim Roberts) (1997-01-16)
Re: Syntax Highlighting firstname.lastname@example.org (John Lilley) (1997-01-16)
Re: Syntax Highlighting email@example.com (Scott Stanchfield) (1997-01-19)
From: "Scott Stanchfield" <firstname.lastname@example.org>
Date: 19 Jan 1997 21:46:52 -0500
I posted a bit about this last month in comp.compilers.tools.pccts. Here
Not off topic at all -- it's a language parsing issue, ya know...
Anyway, I don't have any source (any more -- I started to write an editor
that did this a while ago on the Amiga) but I have some ideas.
First, will this be for an editor or just for output?
If for output, it's very easy. Basically, you write a recognizer for the
language you want to colorize (using PCCTS of course!) and have the
recognizer echo everything it sees with color information around the pieces
you want to color. One of the easiest ways to do this would be to create
an HTML file from the source code that adds <font> tags around the
keywords, comments and other items you want to colorize. Then the browser
you use (MSIE, Netscape, or any browser that supports the color attribute
of the <font> tag) does all the grunt work. For colorizing, you can take a
decent stab at it JUST BY SCANNING.
This means don't bother writing a grammar, just a lexical analyzer. For
example:
#lexclass START
#token "if" <<out("<font color=red>if</font>");>>
#token "else" <<out("<font color=red>else</font>");>>
#token "//" <<out("<font color=blue>"); mode(COMMENT);>>
#token "~[]" <<out(lextext());>>
#lexclass COMMENT
#token "\n" <<out("</font><br>"); mode(START);>>
#token "~[\n]" <<out(lextext());>>
and so on...
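
The same scan-only idea can be sketched outside PCCTS. The following is a minimal Python version, not the code from my old editor: the keyword set, the regular expression and the name colorize are all assumptions made for illustration. It knows nothing about strings, so a "//" inside a string literal would be colored as a comment.

```python
import html
import re

# Assumed keyword set for illustration; extend it for the language at hand.
KEYWORDS = {"if", "else"}

# One alternative per token class. Anything not recognized falls through
# to "other", one character at a time.
TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*)|(?P<word>[A-Za-z_]\w*)|(?P<newline>\n)|(?P<other>.)",
    re.DOTALL,
)

def colorize(source: str) -> str:
    """Echo the source as HTML, wrapping keywords and // comments in <font> tags."""
    out = []
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup
        text = html.escape(m.group())  # so '<' in the source is not read as a tag
        if kind == "comment":
            out.append(f"<font color=blue>{text}</font>")
        elif kind == "word" and m.group() in KEYWORDS:
            out.append(f"<font color=red>{text}</font>")
        elif kind == "newline":
            out.append("<br>\n")
        else:
            out.append(text)
    return "".join(out)
```

Matching whole words before checking the keyword set keeps an identifier such as "iffy" from being half-colored.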
For an editor, it gets more complicated because you have to think about
partial parsing -- that is, the source code might not be valid input. You
can do the parsing at several points:
-- as the user types each character
-- as the user types a whitespace character, color the previous word
-- as the user moves off a line, parse the line
Again, you can do this with just a simple scanner that reads the segment to
parse. When I started my editor I used a line-based approach. Each line
would have attached to it some state information about the context of the
first character in the line. This information is basically:
-- am I in a string?
-- am I in a comment?
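
As a rough illustration of that per-line state, here is a Python sketch that assumes C-style syntax ("/* */" and "//" comments, double-quoted strings with backslash escapes). The names State, scan_line and start_states are made up for this example, and character literals such as '"' are not handled.

```python
from enum import Enum

class State(Enum):
    NORMAL = 0
    IN_STRING = 1
    IN_COMMENT = 2

def scan_line(line: str, state: State) -> State:
    """Return the state at the end of `line`, given the state at its start."""
    i = 0
    while i < len(line):
        two = line[i:i + 2]
        if state is State.IN_COMMENT:
            if two == "*/":
                state, i = State.NORMAL, i + 2
                continue
        elif state is State.IN_STRING:
            if line[i] == "\\":
                i += 2              # skip the escaped character
                continue
            if line[i] == '"':
                state = State.NORMAL
        else:
            if two == "/*":
                state, i = State.IN_COMMENT, i + 2
                continue
            if two == "//":
                break               # a // comment ends with the line
            if line[i] == '"':
                state = State.IN_STRING
        i += 1
    return state

def start_states(lines):
    """State at the first character of every line, scanning from the top."""
    states, state = [], State.NORMAL
    for line in lines:
        states.append(state)
        state = scan_line(line, state)
    return states
```

Each line then needs only one small value attached to it, and coloring a line never requires looking further back than its own stored start state.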
Those were the two things that could span multiple lines of a file. My
editor would re-color when the user left the line. Basically, you need to
scan from the edited line on until the context at the start of a line is
the same as it was before the edit OR you have passed the last visible line
on the screen. So if you are in a comment at the start of the line, and the
user enters a "*/" in the line and leaves it, the parse will continue
through the following lines, changing their state, until the old "*/" is
found or you have moved off screen. If you stop the parse because you have moved off screen
but the state of the next line would have changed, you need to keep track
of the fact that a parse is pending for that line -- if the user scrolls
down and the line (or lines after a pending line) become visible, you need
to restart the parse at the pending line and continue until the state
doesn't change or you hit an invisible line.
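
The re-scan logic described above might look roughly like this in Python. It is a sketch under simplifying assumptions: only "/* */" comments are tracked (a single boolean per line), and the names Buffer, rescan, edit and scroll are hypothetical. It also keeps a set of pending lines rather than a single one, which is my assumption here so that a second edit higher up cannot lose an earlier deferred line.

```python
def end_state(line: str, in_comment: bool) -> bool:
    """True if we are still inside a /* ... */ comment after `line`."""
    i = 0
    while i < len(line) - 1:
        two = line[i:i + 2]
        if in_comment and two == "*/":
            in_comment, i = False, i + 2
        elif not in_comment and two == "/*":
            in_comment, i = True, i + 2
        else:
            i += 1
    return in_comment

class Buffer:
    def __init__(self, lines):
        self.lines = list(lines)
        self.start = []        # context at the first character of each line
        self.pending = set()   # lines whose re-scan was deferred (off screen)
        state = False
        for line in self.lines:
            self.start.append(state)
            state = end_state(line, state)

    def rescan(self, first, last_visible):
        """Re-scan from `first`; return the line numbers that need repainting."""
        repaint, n = [], first
        while n < len(self.lines):
            self.pending.discard(n)
            repaint.append(n)
            state = end_state(self.lines[n], self.start[n])
            nxt = n + 1
            if nxt == len(self.lines) or self.start[nxt] == state:
                break                      # context unchanged: stop here
            self.start[nxt] = state
            if nxt > last_visible:
                self.pending.add(nxt)      # remember where to resume later
                break
            n = nxt
        return repaint

    def edit(self, n, text, last_visible):
        """Call when the user leaves line `n` after changing it to `text`."""
        self.lines[n] = text
        return self.rescan(n, last_visible)

    def scroll(self, last_visible):
        """Call when more lines become visible; resumes any deferred re-scan."""
        repaint = []
        for n in sorted(self.pending):
            if n <= last_visible and n in self.pending:
                repaint += self.rescan(n, last_visible)
        return repaint
```

For example, closing a comment on line 0 while only lines 0 and 1 are visible repaints those two lines and leaves line 2 pending; a later scroll picks up from line 2 and stops as soon as a line's stored start state already matches.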
Doing it line-wise isn't too difficult. Word- or character-wise could be
accomplished using the same logic (starting the parse on the current line
and moving on) just performing the parse after each keypress or space
entered. (There are lots of optimizations you can do here -- certain
keystrokes will never affect the state of the next line; only string and
comment delimiters will...)
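
One way to sketch that optimization in Python, assuming C-style delimiters ('"', "/*", "*/", "//") and backslash escapes. The function name and its parameters are hypothetical. The test is deliberately conservative: it also returns True when the edit point sits between two delimiter characters, since inserting a plain letter between "/" and "*" (or deleting one from "/x*") changes the state too.

```python
# Characters that can take part in a string or comment delimiter, plus the
# escape character. This set is an assumption for C-like languages.
DELIMITER_CHARS = frozenset('"/*\\')

def edit_may_change_state(line: str, pos: int, inserted: str = "", deleted: str = "") -> bool:
    """Conservative check: False means the edit cannot alter the line's end state.

    `line` is the text without the changed characters (before an insertion,
    or after a deletion) and `pos` is the index where the change happened.
    """
    if any(c in DELIMITER_CHARS for c in inserted + deleted):
        return True
    left = line[pos - 1] if pos > 0 else ""
    right = line[pos] if pos < len(line) else ""
    # Splitting or joining a two-character delimiter such as "/*" also matters.
    return left in DELIMITER_CHARS and right in DELIMITER_CHARS
```

When this returns False, the editor can re-color just the current word or line and skip the scan of the following lines entirely.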
> Sorry for the off topic question... Is there any public domain source
> code available for syntax coloring...