Assembling span-dependent instructions (Anton Ertl)
Wed, 27 Jul 2022 16:32:50 GMT

          From comp.compilers

Related articles
Assembling span-dependent instructions (2022-07-27)
Re: Assembling span-dependent instructions (Kaz Kylheku) (2022-07-27)
Re: Assembling span-dependent instructions (2022-07-28)
Re: Assembling span-dependent instructions (2022-07-28)
RE: Assembling span-dependent instructions (Christopher F Clark) (2022-07-28)
Re: Assembling span-dependent instructions (gah4) (2022-07-29)
Re: Assembling span-dependent instructions (Kaz Kylheku) (2022-07-29)
[2 later articles]
| List of all articles for this month |

From: (Anton Ertl)
Newsgroups: comp.compilers,comp.arch
Date: Wed, 27 Jul 2022 16:32:50 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Injection-Info:; posting-host=""; logging-data="57987"; mail-complaints-to=""
Keywords: code, optimize
Posted-Date: 27 Jul 2022 13:57:17 EDT

Over the years I have seen several discussions of the topic of
selecting between long and short versions of instructions in
assembling and linking. After one such discussion in 2019 I wrote

and after the recent discussion (e.g., <tan76v$qnd$>) I
finally proofread it and created a publically visible link to it. To
save you from the throuble of switching to a web browser, here's the
contents in plain text.

Crossposted to comp.compilers and comp.arch; maybe you want to reduce
this on followups.

                                    Assembling Span-Dependent Instructions

Some instruction sets support encoding immediate operands with
different sizes. E.g., on AMD64 the assembly language statement

movl 8(%rdi),%rax

can be encoded as

8b 47 08

or as

8b 87 08 00 00 00

The first form can encode only offsets in the range -128..127, while
the latter can encode offsets in the range -2147483648..2147483647.

Architectures with fixed-size instructions tend to split operations
with large immediate operands into more instructions than operations
with small immediate operands, so they following also applies to them.

If the immediate operand is a constant in the assembly language source
code, it is straightforward to just select the smallest encoding for
which the operand fits.

However, if the immediate operand refers to labels in the assembly
program, e.g., for a branch instruction, things become more
complicated: the minimum size of the operand may depend on how the
program is assembled, and how the program is assembled depends on the
minimum size of the operand.

The common case is a relative branch to some label, with some code
between. In this case, making the code larger can only make the minimum
size of the operand larger, and this is relatively benign, as we will
see below.

However, one can also construct cases where making the code larger can
reduce the minimum size of the immediate operand, e.g.:

          movl foo+133-bar(%rdi),%eax

If you choose the large encoding, the offset is 127 and would seem to
fit in the small encoding. But if you choose the small encoding, the
offset is 130 and does not fit in the small encoding. So in this case,
only the large encoding is a valid solution.

In general, such expressions involving labels may depend on how other
instructions are encoded, which may depend on yet other instructions,
with possible dependency cycles. In general, as Szymanski proved in
1978, this problem is NP-complete.

Now, as soon as anybody mentions NP-completeness, many people become
very gullible; at least I see no other explanation for the amount of
misunderstandings propagated about this topic, which prompted me to
write this web page.

* Szymanski's algorithm

Szymanski himself mentions the approaches discussed below, then
describes a more sophisticated graph-based algorithm (read the paper
for details). For dealing with cases like the second movl instruction
above, he gives criteria for identifying whether an instruction may
suffer from this condition, and suggests that they should be
unconditionally compiled in the long encoding (and gas actually does
that, even if foo+133-bar is replaced with foo-bar, for which the
shorter encoding is sufficient in either case). Given that such cases
are exceedingly rare, that's a good approach.

* Start big and shrink

A simpler algorithm is to start by using the largest encoding of all
instructions, determine the values of the labels, then make those
instructions smaller that can be made smaller. This can be repeated
until a steady-state is reached, or it can be stopped earlier to reduce
assembling time.

This algorithm can fail if the program contains nasty instructions like
our second movl: after the first pass, it decides to use the smaller
encoding for the movl, but that does not work.

If you then want to make this instruction larger again, the algorithm
is no longer guaranteed to terminate (see the end of section 2 for an
example that starts small).

One solution is to identify the nasty instructions as described by
Szymanski, and always keep these big.

A disadvantage of this algorithm is that its steady state may be worse
than for the algorithm below.

* Start small and grow

Another simple algorithm is to start by using the smallest encoding of
all instructions, determine the values of the labels, then make those
instructions larger that need to be larger. Repeat until a steady-state
is reached. Note that you never must shrink an instruction again,
otherwise the algorithm is no longer guaranteed to terminate.

Compared to the start-big approach, this algorithm is simpler (no need
to identify nasty instructions, the algorithm grows them just like the
others) and can produce better results, but typically need one
additional pass.

Compared to Szymanski's algorithm, this algorithm is simpler (no need
to identify nasty instructions) and runs slower (one pass more).
Compared to Szymanski's algorithm with a gas-like imprecise
identification of nasty instructions, the algorithm may produce smaller
code; but given that such instructions do not really occur in real
code, that's not a practical advantage.

* Misunderstandings

Over the years, a number of misunderstandings of this topic have been
posted. Comparing a recent one[1] to the paper, I can imagine how the
paper spawned it: The paper presents the start-big algorithm as just
working, then presents a start-small-then-grow-or-shrink algorithm, and
an example that does not converge (i.e., it does not terminate). It
does not say that the same example will either not terminate or fail if
you start big. The paper also explains that the general problem is
NP-complete, and also that this is not practically relevant. So if you
do not think about it too much, you might think that start-big is the
safe choice, while start-small has problems with termination (and a few
decades later that is conflated with the NP-completeness).

* References

Thomas G. Szymanski: Assembling Code for Machines with Span-Dependent
Instructions, CACM 21(4), 1978, p. 300-308

[1] <>
        in the web as <>

- anton
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <>

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.