|Building a scanner(or pattern matcher) from a CFG grammar? email@example.com (Georgios Petasis) (2004-12-16)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? firstname.lastname@example.org (Ira Baxter) (2004-12-17)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? email@example.com (Georgios Petasis) (2004-12-19)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? firstname.lastname@example.org (christopher diggins) (2004-12-19)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? email@example.com (christopher diggins) (2004-12-22)|
|From:||"Georgios Petasis" <firstname.lastname@example.org>|
|Date:||19 Dec 2004 23:50:18 -0500|
|Organization:||National Technical University of Athens, Greece|
|Posted-Date:||19 Dec 2004 23:50:18 EST|
Thank you very much for your reply. I am downloading the
"XT" tools to see what they provide.
What I am trying to do is to use a CFG for pattern matching.
I have a corpus, and I want to identify some specific text portions in it.
I dump all the annotated text portions I want to learn how to locate, and
using a grammatical inference algorithm of my own, I create a
CFG grammar from the text portions. What I am missing is a
way to locate the text portions covered by the (generalised) CFG
I cannot use a parser (like yacc) for this, as I don't want to match
the whole corpus, just parts of it. A way would have been with a
parser that does "partial matches", but my tests with PCPATR for
example have shown that the return parse trees are not so
trustworthy. So, I am looking for a solution for this problem...
(Also, I think that yacc cannot handle arbitrary CFGs...)
What is the "DMS Software Reengineering Toolkit" you are reffering at
the end of your post? Is it something I can try? Right now, the only
solutions I have found is the XT tools (which I haven't yet evaluated)
and the GrammarForge toolkit, which seems to have a parser capable of
working in scanning mode.
"Ira Baxter" <email@example.com> wrote
>> I am trying to find software that can convert a CFG grammar into a
>> scanner, for locating patterns in text. I don't want a full parse, but
>> find patterns in free text (like for example lex-based scanner do),
>> but with the pattern language being CFG instead of regular. Any ideas?
> Well, if you want to use a CFG, then you can't avoid "parsing" in the
> conventional sense of the word. There's an easy way to do this: build
> a parser whose tokens are individual characters; then you can use
> "YACC". This may be rather inefficient, but perhaps you don't care.
> Eelco Visser did a lot of interesting work on "scannerless" parsers
> that are driven by arbitrary CFGs. Google for "scannerless parsers".
> Much of this work has been packaged up in "XT", which is an academic
> program transformation tool; you can find this at.
> http://catamaran.labs.cs.uu.nl/twiki/pt/bin/view/Tools/XT. You don't
> have to use all the machinery; you could use just the parser generator
> part. (I haven't used these tools personally, but I have great
> respect for the people that built them).
> Our experience is that when you have enough complexity to justify
> parsing (as opposed to regular expressions), that the "text" you are
> searching really does break into words, and so conventional lexing
> techniques apply. This helps make the process efficient. We do
> arbitrary pattern recognition on arbitrary structured text using
> arbitrary CFGs with our DMS Software Reengineering Toolkit.
> Can you describe why you don't want to have a lexer?
Return to the
Search the comp.compilers archives again.