Looking for Program Transformation Examples

tseng@rice.edu (Chau-Wen Tseng)
Fri, 24 Apr 1992 03:49:42 GMT

          From comp.compilers

Related articles
Looking for Program Transformation Examples tseng@rice.edu (1992-04-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: tseng@rice.edu (Chau-Wen Tseng)
Keywords: optimize, question
Organization: Rice University
Date: Fri, 24 Apr 1992 03:49:42 GMT

Hello netters,


I am looking for example program kernels that can benefit from
the following compiler optimizations.


(1) Unimodular Loop Transformations


        Unimodular loop transformations use matrix calculations to guide
        some combination of loop interchange, skewing, and reversal to
        parallelize perfectly nested loops. A simple example
        is Successive-Over-Relaxation (SOR),


            do j = 2,N-1
                do i = 2,N-1
                    A(i,j) = F(...A(i-1,j),A(i,j-1),A(i+1,j),A(i,j+1)...)
                enddo
            enddo


        where loop skew and interchange can parallelize the inner loop:


            do i = 4,2*(N-1)
                doall j = max(2,i-N-1),min(N-1,i-2)
                    A(i-j,j) = F(...A(i-j-1,j),A(i-j,j-1),A(i-j+1,j),A(i-j,j+1)...)
                enddo
            enddo


        Another example comes from tridiagonal solves, where loop
        interchange can be used to move a parallel loop outwards.


            do j
                doall i
                    A(i,j) = F(...A(i,j-1)...)
                enddo
            enddo


        However, I can't seem to find much else. Are there common
        computations that can utilize unimodular loop transformations?




(2) Loop Distribution


        Loop distribution creates multiple loops from a single loop. It is
        pretty essential for vectorizing compilers. The most common use
        for loop distribution is to separate a sequential reduction from a
        parallel loop. For instance,


            do j
                A(j) = F(...)
                t = t + A(j)
            enddo


        can be distributed to yield the following loops:


            doall j
                A(j) = F(...)
            enddo
            do j
                t = t + A(j)
            enddo


        I'm having a difficult time discovering other cases where loop
        distribution can be used to increase parallelism.


I would appreciate pointers to computations that can benefit from
either of these techniques. I'll summarize responses, if I get any :-)


Thanks,


Chau-Wen Tseng
Grad Student
Rice University
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.