Re: Parser testing while grammar rules are unknown

russell kym horsell <>
Sun, 15 Mar 2009 03:05:10 +0000 (UTC)

          From comp.compilers

Related articles
Parser testing while grammar rules are unknown (0x0badf00d) (2009-03-15)
Re: Parser testing while grammar rules are unknown (russell kym horsell) (2009-03-15)
Re: Parser testing while grammar rules are unknown (Chris F Clark) (2009-03-15)
Re: Parser testing while grammar rules are unknown (Martin Ward) (2009-03-16)
| List of all articles for this month |

From: russell kym horsell <>
Newsgroups: comp.compilers
Date: Sun, 15 Mar 2009 03:05:10 +0000 (UTC)
Organization: Central Iowa (Model) Railroad, Plano, TX, USA
References: 09-03-061
Keywords: C, testing
Posted-Date: 15 Mar 2009 16:58:01 EDT

0x0badf00d <> wrote:
> Hi.
> I'm sorry if this group is not relevant. I'll be thankful for any idea,
> on where to look for answer.
> Let's say, we have language somehow similar to ADA. And we have a parser
> /compiler for this language. And we know it contain bugs.
> We do not have grammar rules for language, instead, we have a tons of
> source code written in this language.
> We would like to test parser against possible bugs. We could take source
> code, alter it slightly and pass to parser and see, if it will return
> error. Or, we could apply Markov chains algorithm (which could alter
> human readable texts into humorous): but this is not seems very good
> solution.
> Are there something ready and known?

OK. You have at least 2 choices. An old test dating back to the 60s is
to simply take real source code and shuffle up the lines (or
statements in a modern version). This at least checks the compiler
doesn't dump core. You may also be able to automate a test for
"reasonableness of error messages/recovery" using this "mash up" idea.

Of course, learning the language from samples of source code is an
undecidable problem and even approximations to it are in the AI "hard"
basket. A simple approximation as you suggest is a statistical
approach where sample code is segmented (e.g. tokenised or broken into
common sequences of tokens) and a Markov model built to generate
further samples not guaranteed to be in the language. I.e. you again
can only check for core dumps, because generated samples can not be
absolutely guaranteed to be correct -- if the compiler gets syntax
errors from your Markov model it doesn't mean there is a parser bug.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.