Re: How to do on-the-fly parsing ( like in IDEs ) ?

"Bob Foster" <>
14 Sep 2003 17:09:51 -0400

          From comp.compilers

Related articles
How to do on-the-fly parsing ( like in IDEs ) ? (2003-09-09)
Re: How to do on-the-fly parsing ( like in IDEs ) ? (Andrew Begel) (2003-09-14)
Re: How to do on-the-fly parsing ( like in IDEs ) ? (Bob Foster) (2003-09-14)
Re: How to do on-the-fly parsing ( like in IDEs ) ? (Marco van de Voort) (2003-09-22)
| List of all articles for this month |

From: "Bob Foster" <>
Newsgroups: comp.compilers
Date: 14 Sep 2003 17:09:51 -0400
Organization: Comcast Online
References: 03-09-049
Keywords: parse
Posted-Date: 14 Sep 2003 17:09:50 EDT

"Remesh Job" <> wrote in message
> Can anybody tell me how to do "on-the-fly parsing" ... like they do in
> IDEs ie parsing while typing? This can be used for actions like code
> completion, error highlighting while editing text etc . I would
> preferably want to do the parsing with a parser generated using an
> automatic parser generator.

An interesting question which I hope will garner more responses.

I am intimately familiar with the parsing strategies used by two
products, Visual Cafe (an ex-product) and Eclipse, and know several
others, e.g., JEdit, NetBeans, less well. These products employ at
least two different levels of parsing. Foreground parsing is used to
determine syntax coloring and other easy-to-derive information in real
time; background parsing, by which I mean employing a full parser or
compiler in a background thread, is used for all other purposes, e.g.,
constructing the DAG for code completion, error highlighting, etc.

The engineering reason to divide up the work like this is simple:
syntax coloring is needed immediately, or drawing cannot be done
correctly, and needed fast, or editing will have user-perceptible
delays; on the other hand, the error highlighting, the derivation of
annotated DAGs for code completion, etc., is not needed so
urgently. If it follows the true state of the program by a few
seconds, it is not an issue.

This strategy is so successful commercially that it has been a long
while since I have heard of a product that uses only one parser for
all such applications. (Some products use three or four, but that is
another story.) One of the benefits of the foreground-background
strategy is that gives good results without requiring new science. It
is of little theoretical interest and might not be worth mentioning
except I have witnessed two occasions over the past few years where
otherwise professional developers did not appreciate the need for this
kind of division, and went through needlessly long periods where the
performance of their respective editors was unacceptable before coming
to grips with the problem.

Foreground parsing is more akin to lexing than parsing. Most syntax
coloring only needs to know what a lexer needs to know in order to
tokenize correctly. More sophisticated coloring schemes color (lex)
differently within different regions of a document. JEdit is a good
example of the latter; it builds table-driven lexing rules based on
XML declarations which allow recursive parsing states, including the
ability to tokenize differently for different languages in the same
document. A JSP document containing three languages - JSP
declarations, Java code and HTML or XML - is a practical example
served well by this kind of tool.

Eclipse uses ad hoc, hand-coded "scanners" each of which applies a set
of "rules" to partition a document into contiguous subranges; each
partition represents both a starting parse state and a consistent
lexing range. A second kind of scanner applies similiar rules to each
partition to color the text. This two-level representation handles
coloring in simple nested languages like the JSP example and, e.g.,
JavaDoc comments in Java code, but is not so easily extended to
recursive rules. The scanners could, of course, be generated from
grammars, but for most Eclipse editors, they are not. I have extended
this scheme to allow the partitions to be used as simple DAGS for such
things as outline views, by separating reconstruction of parent-child
relationships from partition scanning. Obviously, one could take this
further in the direction of a real-time incremental parser, but there
seems little engineering reason to do so.

Bob Foster

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.