Prediction of local code modifications



Related articles
Prediction of local code modifications plfriko@yahoo.de (Tim Frink) (2008-03-27)
Re: Prediction of local code modifications gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-03-28)
Re: Prediction of local code modifications preston.briggs@gmail.com (preston.briggs@gmail.com) (2008-03-28)
Re: Prediction of local code modifications max@gustavus.edu (Max Hailperin) (2008-03-28)
Re: Prediction of local code modifications gah@ugcs.caltech.edu (glen herrmannsfeldt) (2008-03-29)
Re: Prediction of local code modifications preston.briggs@gmail.com (preston.briggs@gmail.com) (2008-03-29)
Re: Prediction of local code modifications plfriko@yahoo.de (Tim Frink) (2008-04-01)
[13 later articles]

From: Tim Frink <plfriko@yahoo.de>
Newsgroups: comp.compilers
Date: Thu, 27 Mar 2008 14:30:25 +0000 (UTC)
Organization: CS Department, University of Dortmund, Germany
Keywords: optimize, DSP, question
Posted-Date: 28 Mar 2008 00:01:02 EDT

Hi,


Maybe you have some ideas on how to cope with this problem:


I'm trying to optimize assembler code for a complex DSP, an in-order
superscalar processor with an integer and a load/store pipeline.
Instructions (which may be 16 or 32 bits wide) are fetched into a
64-bit fetch buffer before they are decoded. Fetches are therefore
aligned to 8-byte addresses, and if an instruction at a branch target
crosses an 8-byte boundary (misalignment), the processor stalls for
additional cycles (extra transfer-of-control penalties). The DSP also
has an instruction cache, which complicates things further: multiple
instructions are read from the cache at once and may span multiple
cache lines, again leading to extra cycles.
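
To make the boundary effect concrete, here is a minimal sketch of how
I think of it (my own simplified model, not the DSP's documented
behavior): I assume the stall occurs whenever the instruction at a
branch target straddles an 8-byte fetch block.

#include <stdint.h>
#include <stdio.h>

/* Sketch only: assume a branch target pays an extra fetch if the
   instruction at that address straddles an 8-byte fetch block. */
static int crosses_fetch_boundary(uint32_t addr, unsigned size_bytes)
{
    return (addr / 8) != ((addr + size_bytes - 1) / 8);
}

int main(void)
{
    /* a 32-bit instruction at 0x1006 spans two 8-byte fetch blocks */
    printf("%d\n", crosses_fetch_boundary(0x1006, 4));  /* 1: stall */
    printf("%d\n", crosses_fetch_boundary(0x1008, 4));  /* 0: fits  */
    return 0;
}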


My optimizations move basic blocks (selected by some cost functions)
from the slow main memory into a small but fast memory, allowing fast
access to these particular blocks. However, I have considerable
problems with the "optimized" code. The moved blocks benefit from the
faster memory, but because of the move the addresses of the
subsequent instructions obviously change. Sometimes adding a single
instruction, which shifts the addresses of the following code, is
enough to cause significant runtime changes. The reasons are newly
misaligned jump targets, differently loaded fetch buffers and thus a
different filling of the superscalar pipeline, which can have a
positive or negative effect on the total program runtime.
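
As a toy illustration of that ripple effect (invented block sizes, not
my real cost model): laying the blocks out back to back and counting
branch targets that no longer start on an 8-byte boundary shows how
removing a single block shifts every later address.

#include <stdio.h>

/* Toy layout: block size in bytes and a flag marking branch targets. */
struct block { unsigned size; int is_branch_target; };

/* Count branch targets that do not start on an 8-byte boundary when
   the blocks are laid out back to back; 'skip' marks a block moved
   away to the fast memory (-1 = none moved). */
static unsigned misaligned_targets(const struct block *b, int n, int skip)
{
    unsigned addr = 0, count = 0;
    for (int i = 0; i < n; i++) {
        if (i == skip)
            continue;
        if (b[i].is_branch_target && addr % 8 != 0)
            count++;
        addr += b[i].size;
    }
    return count;
}

int main(void)
{
    struct block prog[] = { {8,0}, {16,1}, {12,1}, {4,0}, {8,1} };
    int n = sizeof prog / sizeof prog[0];
    printf("original layout: %u misaligned targets\n",
           misaligned_targets(prog, n, -1));
    printf("block 3 moved:   %u misaligned targets\n",
           misaligned_targets(prog, n, 3));
    return 0;
}

In this made-up example, moving the small block out of the layout
misaligns a later branch target, which is exactly the kind of side
effect that can eat up the local gain.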


Thus, my problem is that I can achieve a local optimization for the
moved blocks, but the resulting global influence is not predictable;
it may undo the benefits and even degrade the overall runtime of the
program.


How do compiler developers cope with this problem? Are there any
approaches that allow predicting the influence of a local code
optimization on the global code performance of such complex
processors?


Regards,
Tim

