Re: Cache size restrictions obsolete for unrolling?

jgd@cix.compulink.co.uk
Sat, 10 Jan 2009 10:21:30 -0600

          From comp.compilers

Related articles
Cache size restrictions obsolete for unrolling? linuxkaffee_@_gmx.net (Stephan Ceram) (2009-01-07)
Re: Cache size restrictions obsolete for unrolling? harold.aptroot@gmail.com (Harold Aptroot) (2009-01-09)
Re: Cache size restrictions obsolete for unrolling? gneuner2@comcast.net (George Neuner) (2009-01-10)
Re: Cache size restrictions obsolete for unrolling? linuxkaffee_@_gmx.net (Stephan Ceram) (2009-01-10)
Re: Cache size restrictions obsolete for unrolling? jgd@cix.compulink.co.uk (2009-01-10)
Re: Cache size restrictions obsolete for unrolling? harold.aptroot@gmail.com (Harold Aptroot) (2009-01-10)

From: jgd@cix.compulink.co.uk
Newsgroups: comp.compilers
Date: Sat, 10 Jan 2009 10:21:30 -0600
Organization: Compilers Central
References: 09-01-010
Keywords: architecture, storage, DSP
Posted-Date: 10 Jan 2009 13:13:33 EST

linuxkaffee_@_gmx.net (Stephan Ceram) wrote:


> My feeling is that modern processors have sophisticated features (like
> prefetching, fast memories ...) that heavily help to hide/avoid
> instruction cache misses, thus they rarely occur even if a frequently
> executed loop exceeds the cache capacity.


This is likely to be very platform-dependent, but then tuning
performance up to the limit often is.


Some processors try to predict what's going to be fetched next and get
it in advance of need. If the processor in your example happens to do
that, and the pattern of execution is simple enough that it can get it
right - as a simply unrolled loop might well be - then *if* the
prefetch can keep up with the consumption of instructions in
execution, you would see this effect. There are probably other
situations that would do it too; this is merely the first one that
came to mind.
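
As a rough illustration of that "if", here is a toy model in Python. It
is only a sketch: it assumes straight-line code, a prefetcher that
always guesses the next address correctly, and the worst case where a
loop larger than the cache has to be streamed from memory on every
iteration. The function name and all the numbers are made up for the
example and do not describe any particular DSP.

```python
def stall_cycles_per_iteration(loop_bytes, cache_bytes,
                               consume_rate, prefetch_rate):
    """Toy estimate of fetch stalls per loop iteration.

    consume_rate and prefetch_rate are in bytes per cycle (hypothetical).
    """
    # A loop that fits in the cache stops missing after the first pass.
    if loop_bytes <= cache_bytes:
        return 0.0
    # Otherwise assume every byte is streamed in again each iteration.
    cycles_to_execute = loop_bytes / consume_rate
    cycles_to_fetch = loop_bytes / prefetch_rate
    # Stalls appear only when fetching takes longer than executing.
    return max(0.0, cycles_to_fetch - cycles_to_execute)


# Unrolled loop twice the size of the cache, prefetch keeps up: no stalls.
print(stall_cycles_per_iteration(8192, 4096, consume_rate=4, prefetch_rate=4))
# Same loop, prefetch at half the consumption rate: stalls every iteration.
print(stall_cycles_per_iteration(8192, 4096, consume_rate=4, prefetch_rate=2))
```

In this model the cache size stops mattering exactly when the prefetch
rate is at least the consumption rate, which is the condition above.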


You're seeing this with a DSP. I'd be really cautious about assuming
that this applies to "all modern processors", or even "a large range
of algorithms on this DSP". You might well find that anything that
increased the rate of instruction consumption - such as using a
simpler algorithm to do something different - or reduced the overall
effectiveness of prefetching, such as slightly slower bus timing on a
different processor board, would spoil your example.
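
To show how narrow the margin can be, here is a small Python sketch
comparing instruction supply with instruction demand. Everything in it
is an assumption for illustration: the helper name, the instruction
width, the bus width and the cycle counts are invented, not measured on
any real part.

```python
def prefetch_headroom(bytes_per_instr, instrs_per_cycle,
                      bus_bytes_per_transfer, cycles_per_transfer):
    """Ratio of prefetch supply to instruction demand (toy model).

    A value below 1.0 means the prefetcher cannot keep up.
    """
    demand = bytes_per_instr * instrs_per_cycle            # bytes per cycle
    supply = bus_bytes_per_transfer / cycles_per_transfer  # bytes per cycle
    return supply / demand


# Hypothetical baseline: prefetch exactly keeps up.
baseline = prefetch_headroom(4, 1, 8, 2)
# Code that retires two instructions per cycle instead of one.
faster_code = prefetch_headroom(4, 2, 8, 2)
# Same code on a board whose bus needs one more cycle per transfer.
slower_bus = prefetch_headroom(4, 1, 8, 3)

for name, ratio in [("baseline", baseline),
                    ("faster code", faster_code),
                    ("slower bus", slower_bus)]:
    status = "keeps up" if ratio >= 1.0 else "stalls"
    print(f"{name}: headroom {ratio:.2f} ({status})")
```

Either change pushes the ratio below 1.0 in this model, at which point
the loop that exceeds the cache starts paying for it.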


When I programmed something simpler back in the eighties, on an I/O
bus that didn't need to handshake provided you kept to the timing
specifications, some careful insertion of no-ops was required to keep
to them.
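
The arithmetic behind that kind of padding can be sketched in Python as
follows. This is a generic illustration, not the code or the hardware I
used then; the function name, the minimum gap and the cycle time are
all hypothetical.

```python
def nops_needed(min_gap_ns, cycle_ns, cycles_between_accesses, nop_cycles=1):
    """No-ops to insert so two bus accesses are at least min_gap_ns apart.

    All arguments are integers; the values used below are invented.
    """
    # Cycles the bus needs between accesses, rounded up.
    required_cycles = -(-min_gap_ns // cycle_ns)
    # Cycles still missing after the instructions already in between.
    shortfall = max(0, required_cycles - cycles_between_accesses)
    # Round up again in case a no-op takes more than one cycle.
    return -(-shortfall // nop_cycles)


# 1000 ns minimum gap, 250 ns per cycle, 2 cycles of real work in between.
print(nops_needed(1000, 250, 2))
```

Anything that changes the cycle time or the surrounding instructions
changes the count, which is why this sort of tuning does not travel
well between boards.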


--
John Dallman, jgd@cix.co.uk, HTML mail is treated as probable spam.


