Re: Debugging of optimized code (Max Copperman)
Tue, 24 Jan 1995 07:13:06 GMT

Newsgroups: comp.compilers
From: (Max Copperman)
Keywords: debug, optimize
Organization: Rank Xerox Research Centre - Grenoble Laboratory
References: 95-01-036
Date: Tue, 24 Jan 1995 07:13:06 GMT

There are improvements in debugging optimized code which are
technologically practical. I have no idea when they will get into all
production compilers.

The most important of these is tracking variables as they move between
memory and registers. This is becoming commonplace for debuggers; it
isn't hard, and new debug formats support it.
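
As a rough illustration of what "tracking" means here, the compiler can emit,
for each variable, a list of address ranges together with where the variable
lives in each range; the debugger looks up the current program counter in that
list. The sketch below is only an assumption about how such a record might
look (the names LocationRange and locate, the "reg:"/"mem:" strings, and the
addresses are all made up for the example). It is similar in spirit to the
location lists in DWARF but is not the layout of any actual debug format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocationRange:
    """Hypothetical record: where a variable lives over one address range."""
    start: int   # first address at which this location is valid (inclusive)
    end: int     # first address past the range (exclusive)
    where: str   # illustrative location string, e.g. "reg:r3" or "mem:fp-8"


def locate(ranges: List[LocationRange], pc: int) -> Optional[str]:
    """Return the variable's location at address pc, or None if the
    variable has no known location there (e.g. it is dead or optimized away)."""
    for r in ranges:
        if r.start <= pc < r.end:
            return r.where
    return None


# Made-up data: x starts in memory, is kept in a register for a while,
# has no location between 0x130 and 0x138, then lives in memory again.
x_locations = [
    LocationRange(0x100, 0x110, "mem:fp-8"),
    LocationRange(0x110, 0x130, "reg:r3"),
    LocationRange(0x138, 0x140, "mem:fp-8"),
]

print(locate(x_locations, 0x114))
print(locate(x_locations, 0x134))
```

A debugger that gets None back can at least report that the variable is
unavailable at that point rather than printing a stale value.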

Another is 'clear ways of describing the "current location in code" in
source terms, when code motion and cross-statement scheduling has blurred
the statement boundaries', in the absence of radical loop transformations.
The "clear way" is simple: highlight the source position of the
instruction about to be executed (which could be the middle of a
statement, or a declaration, if it involves initialization). If there is
more than one position, highlight them all. Control flow will potentially
dance about the screen, as opposed to proceeding down it. You lose the
characteristic that the code above the cursor has been executed and the
code below it has not---but you've lost that anyway once you've optimized
the code.

The enabling technology is an enhanced line table, in which the compiler
records the address of the first instruction in each contiguous block of
instructions that are generated from the same statement or set of
statements, along with the set of statements it is generated from.

That description is a bit hard to read, but the idea is not complicated.
Take the case when we have no sharing of code, so each instruction is
generated from at most one statement. Have the compiler record, for each
generated instruction, the statement it is generated from. Put this in a
table in increasing order of instruction addresses. Compress the table so
contiguous instructions generated from the same statement are represented
by the first instruction in the group. Extend this to the case when
instructions are generated from multiple statements by making the second
field in the table a set.
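
The steps in the paragraph above can be sketched roughly as follows. This is a
minimal illustration, not the layout of any real line table: the function
names build_line_table and statements_at, the use of integers as statement
identifiers, and the addresses are all assumptions made for the example.

```python
from bisect import bisect_right


def build_line_table(instructions):
    """Build a compressed table from (address, statements) pairs.

    Each entry of the result is (start_address, frozenset_of_statements)
    and stands for a contiguous run of instructions generated from the
    same set of statements.
    """
    table = []
    for addr, stmts in sorted(instructions, key=lambda pair: pair[0]):
        stmts = frozenset(stmts)
        # Start a new entry only when the statement set changes.
        if not table or table[-1][1] != stmts:
            table.append((addr, stmts))
    return table


def statements_at(table, pc):
    """Return the set of statements to highlight for address pc."""
    starts = [start for start, _ in table]
    i = bisect_right(starts, pc) - 1   # last entry starting at or before pc
    if i < 0:
        return frozenset()             # pc lies before the first entry
    return table[i][1]


# Made-up per-instruction records: statement 2 appears in two separate
# blocks (code motion), and two instructions are shared by statements 1 and 3.
instructions = [
    (0x100, {1}),
    (0x104, {1}),
    (0x108, {2}),
    (0x10c, {1, 3}),
    (0x110, {1, 3}),
    (0x114, {2}),
]

table = build_line_table(instructions)
for start, stmts in table:
    print(hex(start), sorted(stmts))
print(sorted(statements_at(table, 0x110)))
```

When statements_at returns more than one statement, the debugger would
highlight all of them, as described earlier.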

Note that the descriptive information about the instruction need not be
limited to the statement it is generated from. Convex has a
compiler/debugger interface that records expressions, loops, blocks---I
forget exactly, but you could look it up in a PLDI proceedings (SIGPLAN
Notices from 1992 or 1993, I think).

This handles simple code motion and code sharing. It runs into problems
with major source-to-source transformations like loop interchange (when the
inner loop becomes the outer loop) and loop reversal (running a loop
backwards), especially when these kinds of optimizations are composed with
each other. (Loop unrolling by itself can be handled, but it gets hard when
composed with other loop transformations.) The problem is not that the
wrong point in the program will be highlighted, but that it doesn't help
the debugger user much. A highlighted source location tells the debugger
user next to nothing about the program, since the program has been so
highly transformed. With the more commonplace optimizations (code motion,
dead store elimination, common-sub-expression elimination, ...) the
execution can still be understood.

Now, suppose you are debugging optimized code, with or without this
"current location in code" information. When you go to examine a
variable, you don't know if optimization has caused it to be set early or
late---you don't know if you can trust its value to be meaningful. I
developed a compiler/debugger interface and some algorithms to solve this
problem in the general case (for non-parallelizing optimizations) in my
dissertation. More practical work (smaller, faster, less complex) that
handles most common optimizations was done at CMU. So it is
technologically practical to solve this problem as well. The problem is
getting compiler and debugger writers to implement the solutions.

There are two markets in which there are production compilers that support
debugging optimized code to some extent---the supercomputer market and the
embedded market (really big and really small machines), where running the
unoptimized code is not an option.

Standard debug-info formats are limiting, but not fundamentally so. They
evolve, most support extensions, and the compiler and debugger can always
communicate via some other files. The issues are not technological but
economic. DWARF2 does eliminate a number of shortcomings of earlier
formats, but is not "the answer" to debugging optimized code.

If Tandem is willing to trade performance for debugging support, is it
willing to trade money for debugging support? Compiler writers are often
willing to extend their compilers for a price, and debugger writers as
well (the debugger writers might extend the debuggers for free, if they
could just get the information they need out of the compiler).

Max Copperman
