Re: Auto vectorization

Roland Leißa <>
Wed, 21 May 2008 08:17:57 -0700 (PDT)

          From comp.compilers

Related articles
Auto vectorization (2008-05-15)
Re: Auto vectorization (2008-05-20)
Re: Auto vectorization (Roland Leißa) (2008-05-21)
Re: Auto vectorization (Anton Lokhmotov) (2008-05-23)

From: Roland Leißa <>
Newsgroups: comp.compilers
Date: Wed, 21 May 2008 08:17:57 -0700 (PDT)
Organization: Compilers Central
References: 08-05-061
Keywords: optimize, parallel
Posted-Date: 22 May 2008 22:20:51 EDT


> 1-- How you define the profitability of auto-vectorization phase? Is
> it just the speed up? If we do not get any speed up over scalar code
> then there is no need to do auto-parallelization.

If the vectorized code is only as fast as the scalar code, use the
scalar code: the vectorized version usually consumes more memory. gcc
can also emit two versions of a loop together with a runtime check of
whether the loop is long enough for the vectorized version to be
worth the overhead. Otherwise the scalar version is taken. If you
know that your loop will never benefit from vectorization, it is
better to keep only the scalar version. So profitability can be
defined in terms of both speed-up and memory consumption (even if you
think you have enough memory, keep in mind that your cache is not
that big and cache misses are expensive).
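
To make the "two versions plus a runtime check" idea concrete, here is
a hand-written sketch in C. It is not what gcc actually generates. The
names add_scalar, add_blocked and add_versioned and the threshold
MIN_VECTOR_TRIP_COUNT are made up for illustration, and the "blocked"
loop only stands in for real SIMD code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values: a real compiler derives the threshold from its
   cost model; LANES = 4 mimics four floats per 128-bit SSE register. */
enum { MIN_VECTOR_TRIP_COUNT = 8, LANES = 4 };

/* Scalar version: the fallback for short loops. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Stand-in for the vectorized version: LANES elements per iteration,
   followed by a scalar epilogue for the leftover elements. */
static void add_blocked(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        dst[i + 0] = a[i + 0] + b[i + 0];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Runtime check that picks a version.
   Returns 1 if the blocked version ran, 0 if the scalar one did. */
int add_versioned(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    if (n >= MIN_VECTOR_TRIP_COUNT) {
        add_blocked(dst, a, b, n);
        return 1;
    }
    add_scalar(dst, a, b, n);
    return 0;
}
```

Note that both loop bodies end up in the binary, which is the extra
memory (and instruction cache) cost mentioned above.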

> 2--What are the phases or features in a compiler ( especially in the
> GCC) that control the quality of auto-vectorization?

This is very complicated. I have fiddled around with the
auto-vectorizer more than once, and it is hard to get good results.
Have a look
at this:
and this:

Some hints:

Use proper alignment (16-byte aligned data with SSE, for instance).
You can achieve this with __attribute__ ((aligned (16))). This is
_NOT_ guaranteed to work with dynamic memory. There you have to use
tricks or use posix_memalign(). If you are using C++ you can overload
operator new for your class so that new automatically uses
posix_memalign(). However, it seems to be very hard for gcc to prove
that memory is properly aligned. So use pragmas. Use -ftree-vectorize
and -ftree-vectorizer-verbose=5 to see whether all your effort is
worth the trouble. But in the end you can only be 100% sure what
happens if you look at the asm output (with -S).
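
The two alignment routes mentioned above might look like this in C. It
is a minimal sketch, assuming a POSIX system and gcc. The helper names
is_aligned, alloc_aligned_floats and static_table are made up for
illustration, as is the SSE_ALIGNMENT macro.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request the posix_memalign() declaration */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SSE_ALIGNMENT 16  /* bytes, i.e. one 128-bit SSE register */

/* Static storage: the gcc attribute takes care of the alignment. */
static float table[8] __attribute__ ((aligned (SSE_ALIGNMENT)));

/* Returns nonzero if p is a multiple of alignment. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Dynamic storage: the attribute does not help here, so ask the
   allocator directly. Returns NULL on failure; release with free(). */
float *alloc_aligned_floats(size_t count)
{
    void *p = NULL;
    if (posix_memalign(&p, SSE_ALIGNMENT, count * sizeof(float)) != 0)
        return NULL;
    return p;
}

/* Accessor so that callers can inspect the statically aligned array. */
const float *static_table(void)
{
    return table;
}
```

Even with both in place the compiler may still not be able to prove
the alignment at the point of use, so checking the vectorizer report
and the -S output remains necessary.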

I hope this helps.

Roland Leissa
