|Parser testing while grammar rules are unknown firstname.lastname@example.org (0x0badf00d) (2009-03-15)|
|Re: Parser testing while grammar rules are unknown email@example.com (russell kym horsell) (2009-03-15)|
|Re: Parser testing while grammar rules are unknown cfc@shell01.TheWorld.com (Chris F Clark) (2009-03-15)|
|Re: Parser testing while grammar rules are unknown firstname.lastname@example.org (Martin Ward) (2009-03-16)|
|From:||russell kym horsell <email@example.com>|
|Date:||Sun, 15 Mar 2009 03:05:10 +0000 (UTC)|
|Organization:||Central Iowa (Model) Railroad, Plano, TX, USA|
|Posted-Date:||15 Mar 2009 16:58:01 EDT|
0x0badf00d <firstname.lastname@example.org> wrote:
> I'm sorry if this group is not relevant. I'll be thankful for any idea,
> on where to look for answer.
> Let's say, we have language somehow similar to ADA. And we have a parser
> /compiler for this language. And we know it contain bugs.
> We do not have grammar rules for language, instead, we have a tons of
> source code written in this language.
> We would like to test parser against possible bugs. We could take source
> code, alter it slightly and pass to parser and see, if it will return
> error. Or, we could apply Markov chains algorithm (which could alter
> human readable texts into humorous): but this is not seems very good
> Are there something ready and known?
OK. You have at least 2 choices. An old test dating back to the 60s is
to simply take real source code and shuffle up the lines (or
statements in a modern version). This at least checks the compiler
doesn't dump core. You may also be able to automate a test for
"reasonableness of error messages/recovery" using this "mash up" idea.
Of course, learning the language from samples of source code is an
undecidable problem and even approximations to it are in the AI "hard"
basket. A simple approximation as you suggest is a statistical
approach where sample code is segmented (e.g. tokenised or broken into
common sequences of tokens) and a Markov model built to generate
further samples not guaranteed to be in the language. I.e. you again
can only check for core dumps, because generated samples can not be
absolutely guaranteed to be correct -- if the compiler gets syntax
errors from your Markov model it doesn't mean there is a parser bug.
Return to the
Search the comp.compilers archives again.