|Parser validation/test suites ? email@example.com (Kenn Heinrich) (2006-08-09)|
|Re: Parser validation/test suites ? firstname.lastname@example.org (Karsten Nyblad) (2006-08-10)|
|Re: Parser validation/test suites ? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-08-10)|
|Re: Parser validation/test suites ? Colin_Paul_Gloster@ACM.org (Colin Paul Gloster) (2006-08-14)|
|Re: Parser validation/test suites ? email@example.com (2006-08-14)|
|Date:||14 Aug 2006 15:10:34 -0400|
|Posted-Date:||14 Aug 2006 15:10:34 EDT|
Kenn Heinrich wrote:
> A question perhaps more software engineering than compiler theory, but
> here goes:
> How do people construct a good validation and regression test system
> for a compiler, in particular a parser module? I'm thinking along the
> lines of a set of test source programs, and expected results. Some
> kind of script automagically runs all the sources, and produces a
> simple log file (Web page, ...?) which could be checked into your
> version control for tracking the compiler right along with the source.
> How this set might be organized? By directories, by a suite that must
> pass, a suite that must fail, a suite that tests optional extensions,
My compiler test suite consists of a series of directories where test
cases related to a particular compiler pass are collected. For example,
tests for the parser are in one directory, tests for the semantic
analysis in another, tests for a particular analysis in its own
directory, etc. Each test case is essentially a text file with the
first few lines containing a comment which has information for the test
harness. For example, a test case for the parser has a line that
indicates the test targets the parser; the testing framework locates a
harness that runs the parsing phase for the file and then compares the
result with what is specified as the expected result in the file.
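As a rough illustration of the scheme just described, here is a minimal Python sketch of a test file whose leading comment lines tell the framework which harness to use and what result to expect. The `// @key: value` header syntax, the names `read_header`, `run_test` and `HARNESSES`, and the toy parenthesis-matching "parser" are all assumptions made up for this example; they are not the actual harness.

```python
import re

# Hypothetical header convention: leading comment lines of the form
#   // @key: value
HEADER_RE = re.compile(r"^//\s*@(\w+):\s*(.*)$")

def read_header(text):
    """Collect @key: value directives from the leading comment lines."""
    header = {}
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m is None:
            break  # the header ends at the first non-directive line
        header[m.group(1)] = m.group(2).strip()
    return header

def run_parser_phase(source):
    """Stand-in for a real parser: balanced parentheses count as 'ok'."""
    depth = 0
    for ch in source:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "error"
    return "ok" if depth == 0 else "error"

# One harness per compiler phase; only the parser is sketched here.
HARNESSES = {"parser": run_parser_phase}

def run_test(text):
    """Return True when the targeted phase yields the expected result."""
    header = read_header(text)
    harness = HARNESSES[header["target"]]
    body = "\n".join(l for l in text.splitlines() if not HEADER_RE.match(l))
    return harness(body) == header["expect"]
```

A driver script could walk the test directories, call `run_test` on each file, and write one pass/fail line per file to a log.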
> And how would you best indicate pass or fail? I've seen systems using
> special embedded comments that should match a compiler output, parsed
> out and checked by a test suite dispatcher script, as well as systems
> that organize "good" source in one directory, "should fail with error
> XXX" sources in directory "XXX". What are some of the other schemes
> the masters recommend?
I think the answer depends on what part of the compiler the case
targets. For example, if you are testing a semantic error, you might
want to have a test case that should produce a compile error, and the
expected result should indicate the expected error type and/or message
and its location in the source test case. For an optimization test
case, the test case may include a code snippet that should match the
result of performing the optimization, which the test harness will
compare to the optimized code in a structural way. For an end-to-end
compiler test (i.e. compile this program to machine code and run it),
you could indicate the expected output of the program; the test harness
would compile the program, run it, and compare its output with the expected result.
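For the semantic-error case, one possible way to record the expected error kind and location is to annotate the offending source line itself. The sketch below assumes a hypothetical `// expect-error: KIND` marker and a hypothetical `Diagnostic` record; a real compiler would supply its own diagnostic type, and the list of actual diagnostics would come from running the phase under test.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Diagnostic:
    kind: str   # e.g. "type-mismatch" (hypothetical error kind)
    line: int   # 1-based line number in the test source

def parse_expectations(source):
    """Read '// expect-error: KIND' markers, at most one per source line."""
    marker = "// expect-error:"
    expected = set()
    for lineno, line in enumerate(source.splitlines(), start=1):
        if marker in line:
            kind = line.split(marker, 1)[1].strip()
            expected.add(Diagnostic(kind, lineno))
    return expected

def check(source, actual):
    """Return (missing, unexpected) diagnostics; both empty means pass."""
    expected = parse_expectations(source)
    actual = set(actual)
    return expected - actual, actual - expected
```

Reporting the missing and unexpected sets separately makes it easier to tell a lost diagnostic from a spurious one when a test fails.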
> And what types of source ought to go into a good test suite? A set of
> small programs, each exercising a small section of the grammar or
> semantic checker? Or one big program filled with nuance and subtlety?
I find small programs to be preferable. It is far easier to pinpoint
the source of a failure in a small program that only uses a specific
feature than in a large complex program. Try to test each source
construct as independently as possible. There will be dependencies of
course, but do your best here; they'll be your best shot at pinning
down the root cause of a failure when changes are made later to the compiler.
Do include some large programs as test cases as well, however. I found
that some features can interact in complex ways that small programs do not exercise.
> How about grammar testing? I know there's a trick of introducing a
> special keyword or token into the token stream to allow parsing from
> arbitrary productions (any production = goal), has anyone tried building
> a "grammar production tester" which lets you run a bunch of tests of one
> production in isolation? For example, a set of files containing only
> <expr> or <stmt> text to avoid having to repeatedly boilerplate a
> complete, legal top-level program just to check a simple statement.
I haven't felt the need to test individual productions this way; I have
a suite of parsing test cases that are essentially pass or fail. For
nasty corners, I treat them on a case-by-case basis, as I encounter
errors working on later stages of compilation.
At some point one must apply induction in testing the compiler and
assume for the sake of testing later phases that the earlier phases are
more or less correct, or at the very least, that errors in earlier
phases will manifest themselves as errors in later phases. Thus good coverage
in a later phase may be enough to establish confidence that earlier
passes of the compiler are working correctly.
You may also want to consider writing an interpreter for the language
that operates on the ASTs or IRs that your compiler uses internally.
This can be a huge productivity boost when testing
optimizations, as you can write optimization and transformation passes
that automatically check themselves by actually running the code in
the interpreter and comparing the output with that of the original code.
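A minimal sketch of that self-checking idea, assuming a toy tuple-based AST (an int literal, a variable name, or `(op, left, right)`) and using constant folding as the pass under test. The names `evaluate`, `fold_constants` and `check_pass` are invented for this example; a real compiler would substitute its own AST or IR and its own passes.

```python
import operator

# Hypothetical AST: an int literal, a variable name (str),
# or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Reference interpreter that walks the AST directly."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))

def fold_constants(node):
    """Pass under test: fold operators whose operands are both literals."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)

def check_pass(optimize, tree, envs):
    """Run the tree before and after the pass and compare the results."""
    optimized = optimize(tree)
    return all(evaluate(tree, env) == evaluate(optimized, env) for env in envs)
```

Calling `check_pass(fold_constants, tree, envs)` over a handful of variable bindings catches a pass that changes the program's meaning, without anyone having to write the expected optimized tree by hand.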