how to generate code for (a,b):=(b,a)

Dave Lloyd <Dave@occl-cam.demon.co.uk>
8 May 1997 21:31:08 -0400

          From comp.compilers

Related articles
how to generate code for (a,b):=(b,a) anton@mips.complang.tuwien.ac.at (1997-05-05)
Re: how to generate code for (a,b):=(b,a) chase@world.std.com (David Chase) (1997-05-08)
how to generate code for (a,b):=(b,a) Dave@occl-cam.demon.co.uk (Dave Lloyd) (1997-05-08)
Re: how to generate code for (a,b):=(b,a) bart@time.cirl.uoregon.edu (1997-05-08)
Re: how to generate code for (a,b):=(b,a) Robert.Harley@inria.fr (1997-05-08)
Re: how to generate code for (a,b):=(b,a) dlmoore@ix.netcom.com (David L Moore) (1997-05-08)
Re: how to generate code for (a,b):=(b,a) preston@cs.rice.edu (1997-05-08)
Re: how to generate code for (a,b):=(b,a) cliffc@risc.sps.mot.com (Cliff Click) (1997-05-12)
Re: how to generate code for (a,b):=(b,a) wilson@cs.utexas.edu (1997-05-12)
[15 later articles]
| List of all articles for this month |

From: Dave Lloyd <Dave@occl-cam.demon.co.uk>
Newsgroups: comp.compilers
Date: 8 May 1997 21:31:08 -0400
Organization: Compilers Central
References: 97-05-058
Keywords: code, optimize

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> In a code generator we have a problem of moving the contents of
> several registers to other registers, as if the copies happened
> simultaneously. This is equivalent to the problem of recoding the
> multi-assignment
>
> (a1, a2, a3, ...) := (b1, b2, b3, ...)
>
> into simple assignments, where the bs are not necessarily different
> from as or other bs (but ai<>aj for i<>j).
[snip]
> A simple algorithm for this problem is to have two passes: The first
> one copies any register, that is the source of a move to
> higher-numbered target register, to a free register and replaces the
> reference to that source register in the multi-assignment with a
> reference to the (once) free register. The second pass performs all
> the copies, starting with the lowest-numbered target register


I had to solve a similar problem a while back. As you mention later, the
main use for this is in preparing the registers before a procedure call.


My shuffle algorithm is somewhat different: We look at the permutation
implied by the assignments and identify cycles of moves e.g., (a, b) :=
(b, a) and chains e.g., (a, b) := (b, c). Chains can be moved from top
to bottom without further consideration, but a cycle requires an extra
register to close the loop.


Many processors have an exchange instruction and those that don't
usually provide an XOR instruction that can be used to synthesise an
exchange in three instructions BUT with the advantage of not requiring
an extra register (which reduces the register pressure). So for a
cycle of length 2 we exchange instead of introducing an extra
register. For longer cycles you usually lose with exchanges.


My implementation of this algorithm is further extended to allow
sources to be constants and simple loads (e.g., [ax +4]) as well as
preloaded registers. Otherwise you may find that a constant or load
are put in an arbtrary register before the shuffle and then moved
again during the shuffle.


> One way this problem does arise is in the following piece of C code:
>
> long bar(long,long,long);
>
> long foo(long a, long b, long c)
> {
> return bar(c,a,b);
> }
>
> BTW, I have compiled this on an Alpha under Digital "Unix" 4.0b with
> "cc -O5" and "gcc -O3". Both compilers generate 5 moves for this piece
> of code, i.e., one too many (of course, given that they generate 17
> and 16 instructions (respectively) overall, one additional move does
> not hurt that much.


With our Algol 68 compiler, the equivalent code generates (x86):
PROC foo (E:\tests\a68\perf\swaps.a68)[6]
              di := cx
              cx := bx
              bx := ax
              ax := di
              CALL bar
              RETURN


(Yes I know it should be just JMP foo with no return, one day soon I'll
tidy that one up).


Incidentally I have limited our compiler to passing only two values in
FP registers on the x86 as the shuffling problem is really horrendous
on that processor (no general register-register move, everything must
go through st0) and more registers often lead to an explosion of code
to get everything in place. Anyone have a good solution for this one?


Regards,
----------------------------------------------------------------------
Dave Lloyd mailto:Dave@occl-cam.demon.co.uk
Oxford and Cambridge Compilers Ltd http://www.occl-cam.demon.co.uk/
Cambridge, England http://www.chaos.org.uk/~dave/
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.