Re: High Level Assemblers vs. High Level Language Compilers

"Randall Hyde" <rhyde@cs.ucr.edu>
6 Apr 2002 23:25:07 -0500

          From comp.compilers


From: "Randall Hyde" <rhyde@cs.ucr.edu>
Newsgroups: comp.compilers
Date: 6 Apr 2002 23:25:07 -0500
Organization: Prodigy Internet http://www.prodigy.com
References: 02-03-120 02-03-127 02-03-202
Keywords: assembler
Posted-Date: 06 Apr 2002 23:25:07 EST

Mark wrote:


>Code such as this:
>
>segment A:
>...
>If (address A1 - address B1 < 1000) {
> generate this code
>} else {
> generate this code
>}
>A1:
>...
>
>segment B:
>...
>B1:
>...
>
>which illustrates, actually, both major issues: (1) the resolution of
>conditional code generation in the presence of forward or unresolved
>references and (2) the problem of resolving conditional code
>generation in the presence of references that are only made available
>at LOAD TIME (e.g., if the absolute addresses of the segments are only
>defined by the loader).


I thought I had mentioned my issue here in a previous post, but this
thread is getting somewhat convoluted, so forgive me if I repeat
myself here.


If your tool set has complete control over all the code from source to
executable, then what you are proposing is, unquestionably, the best
solution. In an 8051 environment, you can probably get away with the
1 & 1/3 pass architecture of C-AS.


In the "real world" however, the object code emitted by an assembler
is most often linked with object files created by other tools (e.g.,
most x86 assembly code is not written as part of a stand-alone
assembly program, but, rather, is linked with code emitted by a HLL
compiler). This means that you cannot create your own super-duper
object file format that your special linker/loader can process; you
must live within the confines of the standard object file formats that
other tools use. This gets especially tricky if you have to deal with
different formats (e.g., OMF and PE/COFF under Windows, ELF and a.out
under Linux) and/or tools that support only a subset of the object
file formats' capabilities (e.g., Delphi uses a subset of OMF under
Windows and Kylix uses a subset of ELF under Linux).


Considering all this, it's realistic to simply treat external
references as "maximum displacement" objects or require the external
definition to supply additional information (e.g., segment
information) to the assembler. Anything beyond this point becomes an
intractable problem if you need interoperability with other tools
(over which you have no control).
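
As a rough illustration of the "maximum displacement" policy described
above, here is a minimal Python sketch. Everything in it is an assumption
made for illustration: the names Item and assign_sizes, the two jump sizes
(2 and 3 bytes, loosely modeled on the 8051's short and long jumps), and
the signed 8-bit short range. Jumps to external symbols get the long form
immediately; jumps to local labels start short and are widened until the
layout stops changing.

```python
from dataclasses import dataclass

SHORT, LONG = 2, 3               # assumed encoded sizes, in bytes
SHORT_RANGE = range(-128, 128)   # assumed reach of the short form


@dataclass
class Item:
    kind: str        # "label", "jump", or "bytes"
    name: str = ""   # label name, or the jump's target symbol
    size: int = 0    # byte count for "bytes"; chosen size for "jump"


def assign_sizes(items, externals):
    """Choose a short or long form for every jump; return label addresses."""
    # External targets are unknown until link time, so assume the worst case.
    for it in items:
        if it.kind == "jump":
            it.size = LONG if it.name in externals else SHORT

    while True:
        # Lay out the segment using the current size choices.
        addr, labels, starts = 0, {}, []
        for it in items:
            starts.append(addr)
            if it.kind == "label":
                labels[it.name] = addr
            else:
                addr += it.size

        # Widen any short jump that cannot reach its local target.
        changed = False
        for it, start in zip(items, starts):
            if it.kind == "jump" and it.size == SHORT:
                displacement = labels[it.name] - (start + SHORT)
                if displacement not in SHORT_RANGE:
                    it.size = LONG
                    changed = True

        # Jumps only ever grow, so this loop must terminate.
        if not changed:
            return labels
```

Because the external case never depends on information from another
object file, this scheme works with any standard object format; the cost
is that some external jumps are longer than they strictly need to be.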


>>Our definitions of a "high level assembler" definitely diverge at this
>>point. You're after a "reconfigurable assembler" (i.e., you want to
>>provide a parser generator as part of the package).
>
>Reconfigurability and high-levelness in a high-level assembler are
>in essence all about the same thing. The main benchmark test of a
>high-level code generator tool is whether it can (in principle) define
>the entire binary formatting for the mnemonics of the given CPU by macros.


This is a good definition for a "meta-assembler." However, I've never
seen (nor used) this definition for a "high-level assembler." Of
course, the term "high-level assembler" doesn't have a standard
definition and I've seen it used (and abused) all over the place.
However, David Salomon's "Assemblers and Loaders" textbook (1992,
Ellis Horwood) devotes a whole chapter to describing and defining high
level assemblers; I don't particularly agree with his definitions
either, but one could argue that he's researched the term and we
should be willing to accept something like his definitions for "high
level assemblers."


FWIW, here are the two definitions Salomon provides:


(1):
A high-level assembler language (HLA) is a programming language where
each instruction is translated into a few machine instructions. The
translator is somewhat more complex than an assembler, but much
simpler than a compiler. Such a language should not have features like
the if, for, and case control structures, complex arithmetic, logical
expressions, and multi-dimensional arrays. It should consist of simple
instructions, closely resembling traditional assembler instructions,
and of a few simple data types.


------------------
This definition suggests that a high level assembler contains some built-in
macro instructions that expand to two or more machine instructions. Since
this definition covers nearly every modern (macro) assembler available,
I feel that the definition is a bit too loose.


(2):
A high-level assembler language (HLA) is a language that combines most
of the features of higher-level languages (easy to use control
structures, variables, scope, data types, block structure) with one
important feature of assembler languages namely, machine dependence.
--------------------
The problem I have with this definition is that some high level
languages fall under this definition. Indeed, if we relax the term
"machine dependence" a bit, then "C" falls into this category. The
big problem I have with this definition is that it doesn't require
that the high level assembler provide access to the architectural
features of the CPU (is it just "some machine dependence" or "complete
machine dependence"?).


Therefore, I crafted the following definition for high level assemblers:


A "high level assembly language" (HLAL) is a language that provides a
set of statements or instructions that practically map one-to-one to
machine instructions of the underlying architecture. The HLAL exposes
the underlying machine architecture including access to machine
registers, flags, memory, I/O, and addressing modes. Any operation
that is possible with a traditional assembler should be possible
within the HLAL. In addition to providing access to the underlying
architecture, the HLAL must provide some abstractions that are not
normally found in traditional assemblers and that are typically found
in traditional high level languages; this could include structured
control statements (e.g., if, for, and while), high level data types
and data structuring facilities, extensive compile-time language
facilities, run-time expression evaluation, and standard library
support. A "High Level Assembler" is a translator that converts a high
level assembly language to machine code.
-------------------------


"Extensive compile-time language support" means support for a
compile-time interpreter that goes above and beyond the normal
facilities provided by traditional macro assemblers (e.g., something
beyond macros and conditional assembly). The point of my definition
is to differentiate modern macro assemblers from high level
assemblers. Note, btw, that this definition does *not* require
run-time high level control structures like if, while, etc. They are
a sufficient condition, but not a necessary condition, for a high level
assembler (by this definition).


>A parser generator is not needed even for a universal assembler.
>At most, you require a general syntactic framework for expression
>syntax:
>
>Ex -> "(" Ex ")" | [Ex] "[" Ex "]" | Ex "?" Ex ":" Ex | "{" Ex "}"
> unary Ex | Ex binary Ex | Ex postfix | constant
>
>plus a set of assemble-time directives for defining arities and
>precedences, exactly as in Prolog:
>
>Prefix: fx Op, Prec; fy Op, Prec
>Postfix: xf Op, Prec; yf Op, Prec
>Infix: xfx Op, Prec; xfy Op, Prec; yfx Op, Prec


If you've not implemented macros in C-AS yet, definitely take a look
at the macro facilities provided by the Dylan programming language.
They provide a very powerful facility that lets you define syntactical
constructs on the fly with a minimum of effort on the programmer's
part.


Having said that, I would recommend *some* sort of generalized grammar
parsing scheme in your system. Specifying expressions is only a tiny
part of the overall syntax a user needs to specify when using a
meta-assembler. E.g., how well would your scheme work when developing
an IA-64 assembler with its syntax?
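
For reference, the operator-declaration scheme quoted above amounts to a
table-driven expression parser. The following Python sketch shows the
general idea under simplifying assumptions: OpTable, declare_infix,
declare_prefix, and parse are hypothetical names, a larger number binds
tighter (Prolog uses the opposite convention), and a single right_assoc
flag stands in for Prolog's xfx/xfy/yfx distinctions. Postfix and bracket
operators are left out.

```python
import re


class OpTable:
    """User-extensible operator table (layout is illustrative only)."""

    def __init__(self):
        self.infix = {}    # operator -> (precedence, right_assoc)
        self.prefix = {}   # operator -> precedence

    def declare_infix(self, op, prec, right_assoc=False):
        self.infix[op] = (prec, right_assoc)

    def declare_prefix(self, op, prec):
        self.prefix[op] = prec


def tokenize(text, table):
    # Try longer operators first so "**" is not read as two "*" tokens.
    ops = sorted(set(table.infix) | set(table.prefix) | {"(", ")"},
                 key=len, reverse=True)
    pattern = re.compile(
        r"\s*(\d+|[A-Za-z_]\w*|" + "|".join(re.escape(o) for o in ops) + ")")
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if not match:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text, table):
    """Parse text into nested tuples using precedence climbing."""
    tokens, pos = tokenize(text, table), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok == "(":
            node = expr(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in table.prefix:
            return (tok, expr(table.prefix[tok]))
        if tok == ")" or tok in table.infix:
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    def expr(min_prec):
        left = primary()
        while peek() in table.infix and table.infix[peek()][0] >= min_prec:
            op = advance()
            prec, right_assoc = table.infix[op]
            # A right-associative operator lets its own level recur.
            left = (op, left, expr(prec if right_assoc else prec + 1))
        return left

    tree = expr(0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

A table like this covers operand expressions, but it has nothing to say
about statement-level syntax (labels, bundles, directives, and so on),
which is the part a meta-assembler user still has to specify some other
way.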


Basically, you're creating a framework by which people can develop
DSELs (domain specific embedded languages) within C-AS. The domain,
of course, is assembly languages. The real power to an embedding
language for use in creating DSELs is how well the embedding language
lets the ultimate end-user escape the syntax of the embedding language
and use the syntax of the DSL. You'll need more than a generalized
pattern matching scheme for expressions to handle this.




>All processors define overloaded mnemonics. So, it's an absolute
>necessity in order to reach the benchmark.


I guess that depends upon how you define "overloaded." HLA's
compile-time language certainly gives the programmer the ability to
examine macro arguments and deal with those arguments differently,
depending on the examination. I guess you could consider this
overloading. Personally, I don't. Overloading is something the
compiler/assembler takes care of automatically for you (and this is
definitely what your examples suggest happens). However, HLA can
achieve the same result (albeit with a little more work) in a
different manner. OTOH, HLA's approach is a bit more general and
flexible. Of course, the ideal solution would be to support both
schemes. That allows the end-user to define overloaded macros when
they work well (thus reducing the programming effort) or use the
compile-time language to handle more complex syntax.
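
The difference between the two styles can be sketched in Python. This is
not HLA or C-AS code; classify, overload, expand_overloaded, and
mov_macro are hypothetical names, the operand rules are invented for the
example, and the output is a symbolic string rather than a real encoding.
In the first style the translator selects a form from declared operand
kinds; in the second, a single macro body examines its own arguments.

```python
REGISTERS = {"A", "B"} | {f"R{n}" for n in range(8)}   # illustrative set


def classify(operand):
    """Sort an operand string into a coarse kind (invented rules)."""
    if operand.startswith("#"):
        return "imm"
    if operand.upper() in REGISTERS:
        return "reg"
    return "mem"


# Style 1: overloading. Forms are declared by operand kinds and the
# translator picks the matching one automatically.
OVERLOADS = {}


def overload(name, *kinds):
    def register(handler):
        OVERLOADS[(name, kinds)] = handler
        return handler
    return register


@overload("mov", "reg", "imm")
def mov_reg_imm(dst, src):
    return f"MOV_REG_IMM {dst}, {src[1:]}"


@overload("mov", "reg", "mem")
def mov_reg_mem(dst, src):
    return f"MOV_REG_MEM {dst}, {src}"


def expand_overloaded(name, *operands):
    kinds = tuple(classify(o) for o in operands)
    if (name, kinds) not in OVERLOADS:
        raise TypeError(f"no form of {name} accepts {kinds}")
    return OVERLOADS[(name, kinds)](*operands)


# Style 2: one macro body inspects its arguments and decides for itself.
def mov_macro(dst, src):
    if classify(dst) != "reg":
        raise TypeError("destination must be a register")
    kind = classify(src)
    if kind == "imm":
        return f"MOV_REG_IMM {dst}, {src[1:]}"
    if kind == "mem":
        return f"MOV_REG_MEM {dst}, {src}"
    raise TypeError(f"unsupported source kind {kind}")
```

Both routes produce the same expansion for the same operands. The second
takes more writing per macro, but the body is free to apply tests that a
fixed table of operand kinds cannot express.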


>
>C-AS came out in 1992 and these issues were discussed extensively
>here over the last decade.


Interestingly enough, that's also when Salomon's text was published
with his definitions for high level assemblers. I'd be really
surprised to find that those definitions did not wind up in the
discussion.


I actually downloaded C-AS from the following URL:
www.csd.uwm.edu/~whopkins/8051/index.html


After reading the documentation, I must admit I was a bit
disappointed. The documentation claims that this is an 8051 assembler
that uses a C-like syntax. This raised all kinds of expectations on
my part that weren't met. Granted, I'm a special case having written
an x86 assembler that uses a Pascal-like syntax, but I'd bet I'm not
the only one that expected more.


The C-AS version I downloaded offers the following two C-like
features: (1) address expressions use a C-like syntax, and (2)
conditional assembly directives use a (run-time) C-like syntax.


Everything else about C-AS (at least the version I grabbed) is
traditional assembler fare. Now, before I go too far, I do want to
point out that C-AS is one of the better 8051 assemblers I've seen
(free or otherwise) and the 1 & 1/3 pass architecture is nice.
However, in its current form I argue it doesn't meet any of the
definitions appearing earlier for a high level assembler.


Here are the actual claims:
>>>>>>
    (a) Features
      This is a free full-featured one-pass 8051 assembler, it could very
well be the first one-pass assembler for the popular MCS-51 family of
microprocessors. What you get are the following features:


      * Seperately assembleable files. There are two stages of assembly:
            - Pass 1: Creation of object files
            - Pass 1 1/3: Linking of object files
      * Segmentation
            - RELATIVE ADDRESSING supported for all segment types
      * Conditional assembly, with a C-like syntax. Example:
            if (Condition) {
                  Assembly instructions...
            } else {
                  Assembly instructions...
            }
      * Multiple statements per line with C-like syntax.
      * C-like expression syntax.
      * Command-line options similar to those of *NIX C compilers.
      * An extensive archive of real-life assembly language programs,
        including a multi-tasking library and an 8051 disassembler.


      Plus, if you don't want to learn all the elaborate ins and outs of this
tool right away, it is just as easy to use the first time out as an 8051
assembler which contains only minimal features.


      You simply will not find anything this extensive anywhere in the public
domain. But it's yours, here, for free.
<<<<<<<<<<<<<




Here are my concerns with C-AS:


Quite honestly, the name suggests a C-like variant of PL/M. That's
what I fully expected when I first read the name and the description
of the product ("an 8051 assembler that uses a C-like syntax."). It
was a bit of a disappointment to discover that the conditional
assembly directives and address expressions were the extent of the
C-like syntax. While these features are better than found on many
other 8051 assemblers, they aren't particularly spectacular when
considering the set of assemblers as a whole.


Even considering the use of C-like syntax as part of the compile-time
language rather than the run-time language, there are many things that
surprised me. For example, I'd naturally have assumed that for
conditional assembly you'd use CPP syntax (i.e., #if, #endif) rather
than the C run-time language syntax. Likewise, why not use '#include'
rather than 'include'? Also, why not just drop EQU and SET and use
the C assignment statement syntax? e.g.,


        label = expr; // for redefinable symbols, if you allow this.
        const label = expr; // for symbols you can't redefine.


While you're at it, why not use C-like syntax for variable declarations?
E.g.,


xdata
        byte b;
        byte bs[5];


code
        byte "Hello World";


Also, why not add typedefs?


typedef char byte;


data
        char ch[16];
        char "Hello World";


structs and unions would also be nice. Another really cool feature
would be array and struct constants (union constants would be neat
too, but I've never figured out how to do them properly, so I can't
expect them from C-AS).


Certainly by adding these features C-AS would meet the definition
of a "high level assembler."


One thing that surprises me is that the compile-time language
(if, include, =) is very sparse. Why not add 'while', 'switch', and
other directives? Maybe 'switch' is going a bit far, but 'while'
is an especially useful compile-time statement.
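
To show why a compile-time 'while' is handy (table generation, for
instance), here is a small Python sketch of an expander for such a
directive. The input syntax is invented for this example and is not
C-AS syntax: 'name = expr' sets an assemble-time symbol, 'while (cond) {'
through '}' repeats its body, and '$(expr)' splices a computed value into
an emitted line. Python's eval stands in for a real assemble-time
expression evaluator, and the iteration limit is an arbitrary safety net.

```python
import re

ASSIGN = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$")
SUBST = re.compile(r"\$\(([^()]*)\)")


def evaluate(expr, symbols):
    # Stand-in for the assembler's own assemble-time expression evaluator.
    return eval(expr, {"__builtins__": {}}, symbols)


def find_block_end(lines, start):
    """Return the index of the '}' that closes a block whose body starts at start."""
    depth = 1
    for j in range(start, len(lines)):
        text = lines[j].strip()
        if text.endswith("{"):
            depth += 1
        elif text == "}":
            depth -= 1
            if depth == 0:
                return j
    raise SyntaxError("unterminated while block")


def expand(lines, symbols=None, limit=10_000):
    """Expand assemble-time assignments and while loops into plain lines."""
    symbols = {} if symbols is None else symbols
    out, i = [], 0
    while i < len(lines):
        text = lines[i].strip()
        if text.startswith("while") and text.endswith("{"):
            cond = text[len("while"):-1].strip()
            end = find_block_end(lines, i + 1)
            body = lines[i + 1:end]
            count = 0
            while evaluate(cond, symbols):
                count += 1
                if count > limit:
                    raise RuntimeError("assemble-time loop did not terminate")
                out.extend(expand(body, symbols, limit))
            i = end + 1
            continue
        assign = ASSIGN.match(text)
        if assign:
            # Assemble-time assignment: update the symbol, emit nothing.
            symbols[assign.group(1)] = evaluate(assign.group(2), symbols)
        elif text:
            # Ordinary line: splice in any $(...) values and emit it.
            out.append(SUBST.sub(
                lambda m: str(evaluate(m.group(1), symbols)), text))
        i += 1
    return out
```

Given a source consisting of 'i = 0', a 'while (i < 4) {' loop whose body
is 'byte $(i * i)' followed by 'i = i + 1', and a closing '}', the
expander emits four 'byte' lines holding the squares 0, 1, 4, and 9.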


I don't know if you're claiming the current version of C-AS to be a
high level assembler or whether you're attaching this term to the next
version (the meta-assembler version). Certainly if the macro/pattern
matching capabilities of the next version allow one to create
statements or declarations that are in line with the definitions of
high level assemblers given earlier, then C-AS would qualify.
However, I'll have to reserve judgement until I see the actual
implementation.


Of course, no one says that the definitions I've given in this post
are the final word on the definition of a "high level assembler." So
you're free to use your own definition of the term; we can choose to
disagree about the definition (I certainly disagree with IBM's
definition for their HLAsm product since I see very little in the way
of HL features in the documentation I've read for that product).


Randy Hyde
[Re long and short displacements in the linker, it's true, it's a pain
in the neck to do, so nobody does it. Back 20 years ago when every
byte was precious, early versions of Vax Unix had two sets of C object
libraries, one with two-byte external displacements and one with
four-byte. If your program fit in 64K, you could compile and link
two-byte, but if the linker told you it didn't fit, you'd have to trade
up to four byte. Now that memory is so cheap, nobody cares any more
except in the low-level embedded market where I get the impression
they still assemble the whole program in one chunk, or else set the
sizes explicitly. -John]


