Re: How to verify a parser from the language grammar

Chris F Clark <cfc@shell01.TheWorld.com>
11 Jan 2007 09:03:57 -0500

          From comp.compilers

Related articles
How to verify a parser from the language grammar salam3@connect.carleton.ca (Shahid Alam) (2007-01-09)
Re: How to verify a parser from the language grammar cfc@shell01.TheWorld.com (Chris F Clark) (2007-01-11)
Re: How to verify a parser from the language grammar torbenm@app-0.diku.dk (2007-01-11)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 11 Jan 2007 09:03:57 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 07-01-031
Keywords: parse, testing
Posted-Date: 11 Jan 2007 09:03:57 EST

Shahid Alam <salam3@connect.carleton.ca> writes:


> Hi,
>
> Does anybody know of any tool that lets you validate and verify your
> parser (hand implemented) from the BNF grammar of the language.
>
> -- Shahid
> [I doubt it. It's wouldn't be hard to contrive test cases that
> exercise all of the rules in a BNF grammar, but how are you going to
> test that the parser rejects everything that's not in the grammar?
> -John]


I doubt that are any existing tools that tool that do exactly what you
want. Now, whether you could build such a thing depends on what you
mean by validate.


For example, if by validate, you mean as John, our moderater, has
interpreted, that you wish to create a series of test inputs that
demonstrate irrefutable compliance, then John is absolutely correct,
you cannot do that, not even for simple regular grammars. The
theorems to prove that are from the machine learning circles, where
they try to establish whether a program can automatically "learn" a
language from examples and counter-examples. Except for finite
languages, it isn't possible without some simplifying assumptions.


Of course, you could make certain simplifying assumptions and make it
very possible. For example, very few languages that allow exactly n
items in a list, but not unboundedly long lists. Moreover, you aren't
likely to miscode the program so that it allowed only a specific
number of items in the list when you meant to code an unbounded list.
Therefore, one could substitute a 3 (or 4 or 5) item list for an
unbounded list and be reasonably certain you've caught any likely
logic errors. If you assume that you can get away with representing
unbounded lists by a small finite number, then you simply have to
repetitively walk the grammar creating a test consisting of all the
terminals you encounter on each walk, exploring all the possiblities
upto the desired length of list repetitions. That set of walks of the
grammar describe a finite language and will give you every string in
the finite language. You can even create the set of error cases, by
figuring out which finite strings, you have skipped. (If you look
hard enough, you may even find a tool that does this. I believe
someone wrote about such a tool and discussed it on this group (or in
Sigplan Notices) in the last couple of years.)


Now, there are some wrinkles if you go this route. Here are a couple
that I recall.


First, unless your grammar is exceptionally simple, certain tokens,
such as identifiers and literal constants, e.g. numbers and strings
are actually little languages themselves. One key thing one has to
determine is how one will substitute in values (actual identifier
names) when the grammar says that an identifer should occur.


Second, the tests so created are only guaranteed to be syntactically
valid. There may be semantics associated with the language that such
a tool will not know how to enforce. (For example, perhaps your
language requires variables to be declared before use.) Now, for
testing simple parser validity, this might not be an issue. However,
once your parser is complete and you have semantic checks, some of the
automatically generated tests might fail those checks.


Ok, that's one definition of validate. Another way of validating a
parser is to compare it to a known good parser. I actually use this
method when checking Yacc++. I compare the next version of the
generated parsers from Yacc++ with the previous version, and except
where I've made intentional changes, they should be identical.


Let's look at how you might do this. You say that you have a
hand-written parser. Therefore, it is likely that you've written a
recursive descent parser. You say, you also have a BNF for the
language. So, how do you get a known good parser to compare yours
too. Simple you run the BNF through a recursive descent parser
generator. Then, you compare that parser to yours. Of course, if you
can do that, then why generate the parser by hand to begin with?


Now, there is also a problem with this approach. The BNF may not be
the same grammar as you used to generate your parser. For example, it
might have different non-terminals and it may divide them up
differently. Sadly, if it does this, the comparison can be impossible
to make, because deciding if two arbitrary grammars parse the same
string is undecicable.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.