Re: object code vs. assembler code (Detailed response)

segfault!rfg@uunet.UU.NET (Ron Guilmette)
Sat, 13 Mar 1993 19:22:55 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: segfault!rfg@uunet.UU.NET (Ron Guilmette)
Keywords: assembler, performance
Organization: Ron Guilmette Computing
References: 93-02-105 93-02-122
Date: Sat, 13 Mar 1993 19:22:55 GMT

(Clyde Smith-Stubbs) writes:
+The issues affecting assembler vs. object code as I've seen them over the
+years are these:
+Firstly, my customers overwhelmingly tell me they like being able to read
+the compiler output. In fact a significant effort goes into making the
+output readable (formatting, source code as comments, notes about register
+allocation etc.). I can recall very few occasions (none in recent years)
+where a user would have happily traded this for slightly faster

It is my (biased) opinion that compilers producing assembly code should
provide an option (or options) which would render the generated assembly
code "readable". This should *not* be the default behavior however, as
"verbose" human-readable assembly code can in fact cause a degradation
(albeit a small one) in overall compilation speed.

As it happens, I am the implementor of the code in the GNU C compiler
which is responsible for the production of DWARF symbolic debugging
information. This feature of the compiler has been implemented in such a
way that the user may obtain either a "terse" representation of the DWARF
information (in the assembly code file) or, alternatively, a "verbose"
representation of that same information.

Unless a special option is used, the DWARF symbolic debugging information
generated by GCC (which itself consists of just a set of additional
assembly code statements which get put into a distinct set of sections of
the object file) is generated in a "terse" way, suitable only for getting
this information through the assembler and into the object file. When
expressed in this way (to the assembler) the assembly code containing the
DWARF format debugging information is totally incomprehensible to humans.
It consists of lengthy blocks of data definition directives containing
various standardized "magic codes" which are unique to (and defined by)
DWARF. For example, given the source code:

extern int printf ();
int main () { printf ("Hello world!"); }

The DWARF debugging information entry for `main' (on an i486/SVR4) looks
like this:

.section .debug
.4byte .L_D3_e-.L_D3
.2byte 0x6
.2byte 0x12
.4byte .L_D4
.2byte 0x38
.string "main"
.2byte 0x55
.2byte 0x7
.2byte 0x111
.4byte main
.2byte 0x121
.4byte .L_f1_e
.2byte 0x8041
.4byte .L_b1
.2byte 0x8051
.4byte .L_b1_e
.4byte 0x4

I think that most readers will agree that this is not easily
comprehensible by any normal human. However when the special
-fverbose-asm option is used, the DWARF debugging information entries are
annotated with comments which make these entries a good deal more
decipherable by humans who know DWARF (and its rules):

.section .debug
.4byte .L_D3_e-.L_D3
.2byte 0x6 / TAG_global_subroutine
.2byte 0x12 / AT_sibling
.4byte .L_D4
.2byte 0x38 / AT_name
.string "main"
.2byte 0x55 / AT_fund_type
.2byte 0x7 / FT_integer
.2byte 0x111 / AT_low_pc
.4byte main
.2byte 0x121 / AT_high_pc
.4byte .L_f1_e
.2byte 0x8041 / AT_body_begin
.4byte .L_b1
.2byte 0x8051 / AT_body_end
.4byte .L_b1_e
.4byte 0x4
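
The difference between the two listings can be pictured with a minimal C
sketch. This is not the actual GCC code: the function name `append_2byte`
and its parameters are assumptions made for illustration only. It appends
one `.2byte` directive to a text buffer and adds the symbolic name as an
assembler comment only when a verbose flag is set, using the same `/`
comment character as the listing above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from GCC: append one ".2byte" directive
   to the NUL-terminated text already in buf (capacity: size bytes).
   When verbose is nonzero and a symbolic name is given, the name is added
   as a trailing assembler comment introduced by '/'.
   Returns 0 on success, or -1 if the directive did not fit. */
int append_2byte(char *buf, size_t size, unsigned value,
                 const char *name, int verbose)
{
    size_t used;
    int n;

    assert(buf != NULL && size > 0);
    used = strlen(buf);
    assert(used < size);

    if (verbose && name != NULL)
        n = snprintf(buf + used, size - used,
                     "\t.2byte\t0x%x\t/ %s\n", value, name);
    else
        n = snprintf(buf + used, size - used,
                     "\t.2byte\t0x%x\n", value);

    /* snprintf reports the length it needed; anything that does not
       leave room for the terminating NUL was truncated. */
    return (n >= 0 && (size_t)n < size - used) ? 0 : -1;
}
```

Calling it with the value 0x6, the name "TAG_global_subroutine" and a zero
flag yields only the bare directive. With a nonzero flag the same call
yields the annotated form. The extra formatting and output is where the
small cost of the verbose mode comes from.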

I implemented the -fverbose-asm option in GCC (which currently has an
influence *only* on the verbosity of the DWARF stuff, and not on the
verbosity of, for example, the executable code) simply because I could not
live without it. I found it damn difficult to debug the code which
generates this stuff unless I had some help like this (from the compiler)
so that I could read and interpret what I was generating.

I have a strong feeling that such annotations could also be quite helpful
for executable code. For example, if an instruction fetches a word from
(for example) @(sp+16), wouldn't it be helpful to know that it was really
fetching the contents of the user-level local variable called `foo'?
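
A minimal sketch of what such an annotation might involve, assuming the
compiler keeps a table that maps stack offsets to user-level variable
names. The `frame_slot` structure and both function names are hypothetical
and exist only for this illustration; they do not describe anything GCC
does today.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one local variable and its offset from sp. */
struct frame_slot {
    int offset;
    const char *name;
};

/* Return the name of the variable stored at sp+offset, or NULL if the
   table has no entry for that offset. */
const char *slot_name(const struct frame_slot *slots, size_t count,
                      int offset)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (slots[i].offset == offset)
            return slots[i].name;
    return NULL;
}

/* Format the operand text for an access to sp+offset.  If the offset
   belongs to a known variable, append its name as a '/' comment.
   Returns whatever snprintf returns. */
int annotate_sp_access(char *buf, size_t size,
                       const struct frame_slot *slots, size_t count,
                       int offset)
{
    const char *name = slot_name(slots, count, offset);

    assert(buf != NULL && size > 0);
    if (name == NULL)
        return snprintf(buf, size, "@(sp+%d)", offset);
    return snprintf(buf, size, "@(sp+%d)\t/ local variable `%s'",
                    offset, name);
}
```

With a table that records `foo' at offset 16, an access to sp+16 is
rendered with the variable name in a trailing comment, while an offset
that is not in the table is rendered without one.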
// Ronald F. Guilmette
// domain address: rfg@segfault.uucp
// uucp address: ...!uunet!!segfault!rfg
