Re: High Level Assemblers vs. High Level Language Compilers

"Randall Hyde" <>
6 Apr 2002 22:42:49 -0500

          From comp.compilers

From: "Randall Hyde" <>
Newsgroups: comp.compilers
Date: 6 Apr 2002 22:42:49 -0500
Organization: Prodigy Internet
References: 02-03-120 02-03-127 02-03-202
Keywords: assembler, design
Posted-Date: 06 Apr 2002 22:42:48 EST

>In the former case, the code directs the formatting of memory images,
>in the latter case the code IS the memory image (modulo translation).
>The closest equivalent to high-level assembly would be if the C
>macro processor, itself, were made into a full-fledged programming

Been there, tried to do that.
Soundly rejected by the C/C++ crowd.
They really want nothing to do with macros ("templates and
consts give you everything you need without the headaches
of macros." Or so I've been told).

Still, you might want to check out the High Level Assembler (HLA).
It *does* have a macro processor that is a full-fledged programming
language (indeed, I refer to it as the "compile-time language" within

>>>There were two major problems which were unresolved which blocked
>>>that line of development: (1) a comprehensive scheme for properly
>>>handling the weirdness associated with the way assemblers allow for
>>>references to yet-to-be-defined addresses but yet allows them to
>>>be used in assembler directives and expressions (a VERY nasty
>>>recursion issue lurks beneath this),
>>This problem doesn't get resolved via multi-phase assembly? Maybe I
>>don't understand the exact nature of the problem you are describing.
>Code such as this:
[code example snipped.]
>which illustrates, actually, both major issues: (1) the resolution of
>conditional code generation in the presence of forward or unresolved
>references and (2) the problem of resolving conditional code
>generation in the presence of references that are only made available
>at LOAD TIME (e.g., if the absolute addresses of the segments are only
>defined by the loader).

Okay, now I understand; I was unaware that you were referring to
load time objects.
Forward references are not a problem with multi-phase assembly.
But, obviously, if the reference cannot be resolved during assembly,
no number of phases (short of moving the linking phase into the
assembler) would solve that issue.
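To make the forward-reference point concrete, here is a minimal sketch in Python (not HLA, and not any real assembler's implementation). The toy instruction set (`jmp`, `nop`, `label:`), the one-unit instruction size, and the `assemble` function are all assumptions made up for illustration. Pass 1 only records label addresses; pass 2 emits code once every local label is known. A symbol that is never defined in the source, such as one supplied at link or load time, still cannot be resolved no matter how many passes are made.

```python
def assemble(lines):
    """Two-pass sketch for a toy source: 'label:', 'jmp label', or 'nop'."""
    # Pass 1: record the address of every label.
    # Assumption: every instruction occupies exactly one address unit.
    symbols = {}
    address = 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1

    # Pass 2: emit instructions; forward references are now known.
    output = []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            continue
        parts = line.split()
        if parts[0] == "jmp":
            target = parts[1]
            if target not in symbols:
                # Not defined in this assembly: more passes would not help.
                raise KeyError(f"unresolved symbol: {target}")
            output.append(("jmp", symbols[target]))
        else:
            output.append((parts[0],))
    return output


# 'done' is used before it is defined; pass 1 makes that harmless.
program = ["start:", "jmp done", "nop", "done:", "jmp start"]
result = assemble(program)
```

Calling `assemble(["jmp elsewhere"])` raises `KeyError`, which is the load-time case: the information simply is not present in the source being assembled.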

>This is only the simplest example of this kind of phenomenon.
>Either the general problem must be dealt with or it must be avoided or
>limited by the use of more or less ad hoc expediencies, such as the
>one commonly used by assemblers (of distinguishing pass 1 time
>assembly from pass 2 time).

You are aware, of course, that many modern assemblers don't have
a notion of pass 1 vs. pass 2. They make as many passes (phases)
as necessary to resolve and optimize the code. I can't speak for
every assembler on the planet, but most of the x86 ilk I've been using
for the past 5-10 years have had this support. They still can't optimize
displacements to external references (outside the current assembly),
but they do a great job on source code that is part of a single assembly.

>>In HLA, I differentiate these two *VERY DIFFERENT things* by referring
>>to them as the "compile-time language" and the "run-time language".
>Right. The former gives you a high-level assembler. The latter a
>compiler. What you're calling HLA is actually a mix of the two,
>since you're doing some compilation of high level source language
>(your "run time language").

HLA technically is a "compiler for an assembly language,"
though its run-time language is only a little more powerful than that of,
say, MASM or TASM (which most people think of as "traditional assemblers"
despite high-level control statements like .IF or .WHILE).
However, the argument of whether HLA is really a compiler or an
assembler is a complete non-issue to me. Like Beth Stone says,
"it's a language translator" and that's good enough for me.

>>Our definitions of a "high level assembler" definitely diverge at this
>>point. You're after a "reconfigurable assembler" (i.e., you want to
>>provide a parser generator as part of the package).
>Reconfigurability and high-levelness in a high-level assembler are
>in essence all about the same thing. The main benchmark test of a
>high-level code generator tool is whether it can (in principle) define
>the entire binary formatting for the mnemonics of the given CPU by macros.

Perhaps HLA would fit your definition of a "high level assembler"
because its macro capabilities are sufficiently powerful that I could
write an 8051 assembler with it. However, I would not consider an
arbitrary "meta-assembler," even one that is configurable at
compile-time, to be a "high level assembler." To me, a high level
assembler provides high-level language like control structures (e.g.,
the run-time .if/.while that MASM/TASM provides, and the
if/elseif/else/endif and while/endwhile that HLA provides) as well as
support for high level data abstractions (records, classes, unions,
etc.). I'm not claiming that my definition is the right one, or even
the best. David Salomon's text has two others that are quite
different from mine. I'm just pointing out that I think that a "high
level assembler" needs to be something more than a reconfigurable
assembler IMO. So while HLA may certainly fit your definition of a
'high level assembler' I suspect that my definition is a superset of

OTOH, I would point out that based on the previous posts, I have
relaxed reserved words in the design of HLA v2.0 so that they can now
be redefined (so you can use MOV as a macro name if you really want to
redefine the MOV instruction). Somewhat essential if you want to
write an assembler for a different processor within HLA (without
expending a whole lot of work).

>A parser generator is not needed even for a universal assembler.
>At most, you require a general syntactic framework for expression
>Ex -> "(" Ex ")" | [Ex] "[" Ex "]" | Ex "?" Ex ":" Ex | "{" Ex "}"
>    | unary Ex | Ex binary Ex | Ex postfix | constant
>plus a set of assemble-time directives for defining arities and
>precedences, exactly as in Prolog:
>Prefix: fx Op, Prec; fy Op, Prec
>Postfix: xf Op, Prec; yf Op, Prec
>Infix: xfx Op, Prec; xfy Op, Prec; yfx Op, Prec
>>>define move(accum + @ (reg R), #D) { ... }
>>>which matches (for instance) to "move A + @R3, #35".
>>Overloaded macros are something HLA doesn't directly support, though
>>overloading is not necessary to do what you're trying here.
>All processors define overloaded mnemonics. So, it's an absolute
>necessity in order to reach the benchmark.

Not at all.

  macro mov( arg1, arg2 );
    << compile-time code >>

The compile-time code inside this macro can do the pattern matching
on the arguments you pass in. HLA's macro processing facilities do
not need to provide any support for overloading; users can take
care of that themselves (so, I guess, you could say that HLA *indirectly*
provides support for overloading since it provides the mechanism
whereby users can implement it themselves).
Of course, it would be a lot easier if HLA did the pattern matching for
you (this would reduce the amount of pattern-matching compile-time
code you'd have to write).
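As a rough illustration of that idea, here is a Python sketch (not HLA compile-time code) of a single `mov` entry point that does its own pattern matching on the operand text. Everything in it is hypothetical: the register list, the three operand classes, the form names, and the operand order are assumptions chosen for the example, not HLA's actual rules.

```python
import re

# Hypothetical operand vocabulary, for illustration only.
REGISTERS = {"eax", "ebx", "ecx", "edx"}


def classify(operand):
    """Classify an operand string as 'reg', 'imm', or 'mem'."""
    operand = operand.strip().lower()
    if operand in REGISTERS:
        return "reg"
    if re.fullmatch(r"\d+", operand):
        return "imm"
    if re.fullmatch(r"\[\s*\w+\s*\]", operand):
        return "mem"
    raise ValueError(f"unrecognized operand: {operand!r}")


def mov(arg1, arg2):
    """One 'macro' name; the body picks the form from the argument shapes."""
    forms = {
        ("reg", "reg"): "reg-to-reg",
        ("imm", "reg"): "imm-to-reg",
        ("mem", "reg"): "mem-to-reg",
        ("reg", "mem"): "reg-to-mem",
    }
    key = (classify(arg1), classify(arg2))
    if key not in forms:
        raise ValueError(f"no mov form for operands {key}")
    return forms[key]


chosen = [mov("eax", "ebx"), mov("35", "eax"), mov("[ebx]", "eax")]
```

The dispatch table plays the role that built-in overloading would otherwise play: one name, several forms, selected by inspecting the arguments. An unsupported combination such as `mov("eax", "35")` raises `ValueError`.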

>C-AS came out in 1992 and these issues were discussed extensively
>here over the last decade.

Well, it probably wasn't *that* extensive since I managed to miss it
and I've been following this newsgroup on and off for a bit longer
than that.
Do you have references? (Google wasn't very helpful here).
I can't comment on the applicability of this without further research.
Randy Hyde
[Re two-pass assemblers, of course assemblers still have two passes.
Generating long and short displacements has been well understood for
25 years. The easiest approach is in pass 1 to assume they're all long
and to make a table of all of the jumps that might be shortened. Then
between the passes, iterate over the table looking for one you can
shorten, then adjust the symbols after that jump, repeat until you
don't find any more. Then do pass 2. Works great, as close to optimal
as you can reasonably get. -John]
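The shortening scheme in the note above can be sketched in Python roughly as follows. The sizes (5-byte long jump, 2-byte short jump), the signed 8-bit displacement range measured from the end of the jump, and the item format are assumptions for illustration. For simplicity this sketch recomputes the whole layout on each trial instead of adjusting only the symbols after the shortened jump.

```python
LONG, SHORT = 5, 2  # assumed encoding sizes in bytes


def layout(items, sizes):
    """Return (label addresses, start address of each item)."""
    labels, starts, addr = {}, [], 0
    for i, (kind, arg) in enumerate(items):
        starts.append(addr)
        if kind == "label":
            labels[arg] = addr
        elif kind == "jmp":
            addr += sizes[i]
        else:  # "data": arg is a byte count
            addr += arg
    return labels, starts


def fits_short(items, sizes, i):
    """Would jump i reach its target if it were encoded short?"""
    trial = dict(sizes)
    trial[i] = SHORT
    labels, starts = layout(items, trial)
    disp = labels[items[i][1]] - (starts[i] + SHORT)
    return -128 <= disp <= 127


def shorten_jumps(items):
    """Start with every jump long; shorten until nothing more fits."""
    sizes = {i: LONG for i, (kind, _) in enumerate(items) if kind == "jmp"}
    changed = True
    while changed:
        changed = False
        for i in sizes:
            if sizes[i] == LONG and fits_short(items, sizes, i):
                sizes[i] = SHORT
                changed = True
    return sizes


# The first jump only fits after the second one has been shortened.
example = [("jmp", "end"), ("jmp", "end"), ("data", 124), ("label", "end")]
final_sizes = shorten_jumps(example)
```

Because shortening a jump never moves any two points further apart, a jump that has been shortened stays in range, so the loop only ever shrinks code and must terminate.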
