30 Apr 2001 22:21:46 -0400

Related articles |
---|

partial parsing Sonali@SetuIndia.com (Sonali Chitnis) (2001-04-26) |

Re: partial parsing daw@mozart.cs.berkeley.edu (2001-04-30) |

Re: partial parsing rkrayhawk@aol.com (2001-05-03) |

From: | daw@mozart.cs.berkeley.edu (David Wagner) |

Newsgroups: | comp.compilers |

Date: | 30 Apr 2001 22:21:46 -0400 |

Organization: | University of California, Berkeley |

References: | 01-04-134 |

Keywords: | parse |

Posted-Date: | 30 Apr 2001 22:21:45 EDT |

Originator: | daw@mozart.cs.berkeley.edu (David Wagner) |

Sonali Chitnis wrote:

*>I am working on an editor whose files adhere to a specific grammar.*

*>My requirement is the Editor should also be able to parse the file*

*>when the file is yet being created and not only after the file is*

*>written compeletly.*

Although I am not expert in this area, I know that there are many

papers on this topic (look up incremental parsing, etc.), and you

might find them interesting to read.

Nonetheless, despite my infamiliarity with the area, let me mention

one beautiful idea from the literature. I don't know whether it is

practical, but I want to mention it merely because it is so pretty.

In the usual parsing problem, we are given a context-free language L

and a word w, and we must decide whether w is in L.

In your partial parsing problem, I think you are asking us to imagine

a user who might write a program from start to end, and the goal is to

determine whether a partial program is a possible prefix of some legal

program or not. If I got that right, the partial parsing problem can

be formalized in the following way: given a cfl L and a word w, decide

whether there is any word w' in L so that w is a prefix of w'. In

other words, we are asking whether there exists any word in L matching

the regular expression w.Sigma^*, where Sigma is the alphabet.

Thus, we can consider the following problem:

The "intersection problem": Given a context-free language L and

a regular language R, decide whether L intersect R is non-empty.

Note that partial parsing can be reduced to this problem, if we let

R be the language recognized by w.Sigma^*; also, the usual parsing

can be reduced to this problem by taking R = {w} (so that R is the

language containing just the word w). Note that in both cases, R

forms a regular language.

Now the interesting part is that most parsing algorithms for general

context-free languages seem to apply not only to the usual parsing

problem but can also be applied to the "intersection problem" stated

above as well. Thus, we can take CYK parsing or Tomita parsing and

use them to solve the partial parsing problem by reducing to a special

case of the "intersection problem".

Or, you can note that the "intersection problem" is easily solved

by methods taught in basic automata theory classes. In particular,

we know that L' := L intersect R is a context-free language if L is

context-free and R is regular, and L' can be effectively computed.

Moreover, we know that given any context-free language L' we can

easily test whether L' is empty or not. These two basic results from

language theory yield algorithms for the "intersection problem".

We can also use this framework to consider more general types of

partial parsing, for instance, where one edits in the middle of the

program. This example can be expressed as some regular expression

like v.Sigma^*.w, where v is the part before what is currently edited

and w is the part after.

Endnote: Strictly speaking, I am describing above recognition problems

(where the answer is yes/no), not parsing problems (where one should

also output a parse tree), but everything I write about recognition

problems generalizes to full parsing as well.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.