Re: compilers using MMX instructions in the generated code

bcombee@metrowerks.com (Ben Combee)
9 Jan 2000 22:47:47 -0500

          From comp.compilers


From: bcombee@metrowerks.com (Ben Combee)
Newsgroups: comp.compilers
Date: 9 Jan 2000 22:47:47 -0500
Organization: Metrowerks
References: 00-01-011
Keywords: code

Ramkishor wrote:
> Are there any compilers, which can use MMX instructions(Any SIMD
> instructions like 3DNow from AMD or VIS from SUN etc.) in the code
> generated by them?


The CodeWarrior x86 compiler has used MMX and 3DNow! instructions for
doing vector computations since the Pro 3 release. It cannot always
detect when they are legal and useful, but for simple loops where the
compiler can determine there is no aliasing between array references,
it will use the wider SIMD instructions.
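To see why aliasing matters, consider the following sketch. The function names are hypothetical and are not CodeWarrior's. `add_scalar` is the element-at-a-time loop that C semantics require. `add_blocked` mimics a 4-wide SIMD loop by loading four elements of each input before storing four results. When `a` overlaps `b`, the two functions give different answers, so a compiler may substitute the blocked form only after it has proved the arrays are disjoint.

```c
#include <stddef.h>

/* Element-at-a-time loop: what C requires, correct even if a overlaps b or c. */
void add_scalar(short *a, const short *b, const short *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (short)(b[i] + c[i]);
}

/* Hypothetical model of a 4-wide SIMD loop: all four loads of a block
   happen before any of its four stores. Equivalent to add_scalar only
   when a does not overlap b or c. */
void add_blocked(short *a, const short *b, const short *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        short tb[4], tc[4];
        for (int k = 0; k < 4; k++) {   /* "vector" loads */
            tb[k] = b[i + k];
            tc[k] = c[i + k];
        }
        for (int k = 0; k < 4; k++)     /* "vector" add and store */
            a[i + k] = (short)(tb[k] + tc[k]);
    }
    for (; i < n; i++)                  /* leftover elements */
        a[i] = (short)(b[i] + c[i]);
}
```

Take an all-zero 9-element array `x` and a `c` of all ones. After `add_scalar(x + 1, x, c, 8)`, `x[2] == 2`. Running `add_blocked(y + 1, y, c, 8)` on an identical array `y` leaves `y[2] == 1`.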


For example, the code


short int a[50], b[50], c[50];


void foo(void)
{
        int i;
        for (i = 0; i < 50; i++) a[i] = b[i] + c[i];
}


would be vectorized using the MMX PADDW instruction to add four 16-bit
integers at a time for 12 iterations (48 elements), followed by
ordinary scalar code to handle the two leftover additions.
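The transformation just described can be sketched in portable C. This is an illustration under stated assumptions, not the compiler's actual output. It uses `int16_t` in place of `short int`. `paddw_emulated` is a hypothetical helper that packs four 16-bit lanes into a `uint64_t` and adds them lane by lane with wraparound, which is what one PADDW does. The main loop runs 12 times over elements 0 to 47, and a scalar loop handles elements 48 and 49.

```c
#include <stdint.h>
#include <string.h>

#define N 50

int16_t a[N], b[N], c[N];

/* Hypothetical helper emulating one MMX PADDW: four independent 16-bit
   additions in a 64-bit word, each lane wrapping modulo 2^16. */
static uint64_t paddw_emulated(uint64_t x, uint64_t y)
{
    const uint64_t high = 0x8000800080008000ULL;  /* top bit of each lane */
    /* Summing the low 15 bits of each lane cannot carry into the next
       lane; XOR then supplies the correct top bit of every lane. */
    return ((x & ~high) + (y & ~high)) ^ ((x ^ y) & high);
}

/* Strip-mined foo(): 12 four-lane iterations cover elements 0..47,
   then a scalar loop handles the two leftovers (48 and 49). */
void foo_strip_mined(void)
{
    int i;
    for (i = 0; i + 4 <= N; i += 4) {
        uint64_t vb, vc, va;
        /* memcpy keeps each element in its own lane whatever the byte
           order, because the same mapping is used to load and store. */
        memcpy(&vb, &b[i], sizeof vb);   /* roughly movq mm0,[eax+_b]  */
        memcpy(&vc, &c[i], sizeof vc);
        va = paddw_emulated(vb, vc);     /* roughly paddw mm0,[eax+_c] */
        memcpy(&a[i], &va, sizeof va);   /* roughly movq [eax+_a],mm0  */
    }
    for (; i < N; i++)
        a[i] = (int16_t)(b[i] + c[i]);
}
```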


Here is a disassembly of this function compiled with our 2.3.2 release
at optimization level 4, targeting MMX/Pentium II.


name = _foo
offset = 0x00000000; type = 0x0020; class = 0x0002


00000000: 31 D2                 xor   edx,edx
00000002: 89 D0                 mov   eax,edx
00000004: D1 E0                 sal   eax,1h
00000006: 0F 6F 80 00 00 00 00  movq  mm0,qword ptr [eax+_b]
0000000D: 83 C2 04              add   edx,4
00000010: 0F FD 80 00 00 00 00  paddw mm0,qword ptr [eax+_c]
00000017: 0F 7F 80 00 00 00 00  movq  qword ptr [eax+_a],mm0
0000001E: 05 08 00 00 00        add   eax,8
00000023: 83 FA 2F              cmp   edx,47
00000026: 7C DE                 jl    $-32 ; --> 0x0006
00000028: 66 8B 0D 60 00 00 00  mov   cx,word ptr _b+96
0000002F: 66 03 0D 60 00 00 00  add   cx,word ptr _c+96
00000036: 66 89 0D 60 00 00 00  mov   word ptr _a+96,cx
0000003D: 66 A1 62 00 00 00     mov   ax,word ptr _b+98
00000043: 66 03 05 62 00 00 00  add   ax,word ptr _c+98
0000004A: 83 C2 02              add   edx,2
0000004D: 66 A3 62 00 00 00     mov   word ptr _a+98,ax
00000053: 0F 77                 emms
00000055: C3                    ret   near
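The loop bounds in the disassembly can be checked with a small trace of its control flow. This is a hypothetical sketch that models only the counters, not the data. `edx` counts elements by 4, `eax` counts bytes by 8, and the `cmp edx,47` / `jl` pair exits once `edx` reaches 48. That gives 12 vector iterations and leaves `eax` at 96. This is why the unrolled tail addresses `_b+96` and `_b+98`, which are elements 48 and 49.

```c
/* Hypothetical trace of the loop counters in the disassembled _foo.
   The body is entered unconditionally, as in the listing. */
int count_vector_iterations(unsigned *tail_byte_offset)
{
    int edx = 0, eax = 0, iterations = 0;   /* xor edx,edx ; eax = edx*2 */
    do {
        iterations++;        /* one movq/paddw/movq group = 4 elements */
        edx += 4;            /* add edx,4: element index */
        eax += 8;            /* add eax,8: byte offset (4 lanes * 2 bytes) */
    } while (edx < 47);      /* cmp edx,47 ; jl */
    *tail_byte_offset = (unsigned)eax;  /* 96 = byte offset of element 48 */
    return iterations;
}
```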


There are limitations. I used global arrays on purpose here: we do not
yet support the ISO C99 keyword "restrict", which would let you tell
the compiler that arrays b and c do not alias array a. So if you passed
the arrays in as parameters, we would not attempt the vectorization.
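For reference, here is a sketch of the parameterized version written with C99 `restrict`. `foo_params` is a hypothetical name. The qualifiers promise the compiler that `a`, `b` and `c` do not overlap during the call, which is the fact it can prove on its own for the global arrays above. According to the post, the 2.3.2 release did not support the keyword. Whether any given compiler vectorizes the loop is up to that compiler.

```c
#include <stddef.h>

/* Hypothetical parameterized foo(). restrict promises that a, b and c
   do not overlap for the duration of the call. Violating that promise
   is undefined behavior. */
void foo_params(short *restrict a, const short *restrict b,
                const short *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (short)(b[i] + c[i]);
}
```

Called as `foo_params(a, b, c, 50)` on three distinct arrays, it computes the same result as `foo()`. A call such as `foo_params(a + 1, a, c, 49)` breaks the `restrict` promise and has undefined behavior.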
--
Ben Combee <bcombee@metrowerks.com> -- x86/Win32/Linux/NetWare CompilerWarrior

