Re: Compiler support for multicore architectures


From: Ray Dillinger <bear@sonic.net>
Newsgroups: comp.compilers
Date: Fri, 28 Nov 2008 19:39:20 -0800
Organization: Organized? Seems unlikely.
References: 08-11-086
Keywords: parallel
Posted-Date: 29 Nov 2008 13:53:50 EST

gaurav wrote:


> There has been a lot of discussion going on about compiler support
> for multicore architectures.
>
> I have a few basic questions.
>
> What different things can a compiler do to make multithreaded
> programs run better on multicore? Isn't that much more dependent on
> the threading library, the OS, and the programmer than on the
> compiler?


First, it depends on what kind of "multicore" the architecture
actually is. Multiprocessor machines with identical cores are
becoming more common, but so are multiprocessor machines where some
CPU cores are specialized (graphics coprocessors, digital signal
processors, etc.) and have different capabilities, efficiencies, and
instruction sets.


Ideally, multiprocessing on heterogeneous CPUs distributes each task
to the CPU best equipped to carry it out. In practice, we don't
really even have techniques to measure or talk about what kind of
computing capability a task requires or a particular CPU provides, so
we do this in a very ad hoc, static way, usually by linking in
libraries with routines predefined in machine code targeted at a
particular processor out of the several available on the machine.
Bringing practice in heterogeneous multiprocessing closer to the
ideal would be a really good place to start.
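

To make that concrete, here is a minimal C sketch of the kind of ad
hoc dispatch I mean: one hand-written policy picks a routine compiled
for a specialized core, with no general measure of fit between task
and processor. All the names here (dct_dsp, have_dsp, and so on) are
made up purely for illustration.

    #include <stdio.h>
    #include <stddef.h>

    /* Stubbed machine-specific implementations; in a real system
       each would be separately compiled machine code targeted at
       one of the processors available on the machine. */
    static void dct_generic(float *block, size_t n)
    { (void)block; (void)n; puts("generic CPU path"); }

    static void dct_dsp(float *block, size_t n)
    { (void)block; (void)n; puts("DSP path"); }

    /* Hypothetical capability probe; real code might query a
       driver or be configured at build time. */
    static int have_dsp(void) { return 0; }

    typedef void (*dct_fn)(float *block, size_t n);

    /* The ad hoc dispatch itself: one hand-written policy
       decision, not any general measure of what the task needs
       versus what each CPU provides. */
    static dct_fn select_dct(void)
    {
        return have_dsp() ? dct_dsp : dct_generic;
    }

    int main(void)
    {
        float block[8] = {0};
        select_dct()(block, 8);
        return 0;
    }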


Even in the simplified world of homogeneous multiprocessing, where
you are coding for CPU cores that are all alike, there are lots of
important questions, such as: what is the general architecture of the
multiprocessing system?


Most multiprocessing systems today are what is referred to as
"symmetric multiprocessing" (SMP) systems, where the cores are
identical and share memory for both data and programs. Different
CPUs are at different points in the instruction stream, possibly in
different programs, at the same time, and whenever one writes to
memory, the others are supposed to block on reads from that location
(possibly context switching so another process or thread can use the
CPU that would otherwise stall) until the location is stable and
reflects the update. This is conceptually the most straightforward
way to benefit from multiprocessing while capitalizing on the
existing infrastructure of monoprocessor code and programming
expertise, but it's pretty demanding in terms of cache coherency,
memory synchronization, and flat-out memory bandwidth, and chip
complexity therefore seems to scale exponentially (or perhaps only
polynomially, but badly either way) with the number of CPU cores.
This definitely puts a limit on it compared to other multiprocessing
architectures.
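

As a small illustration of the synchronization burden, here is a
sketch in C with POSIX threads of one shared location being published
safely: the reader blocks (and the OS may context switch to another
thread) until the writer signals that the value is stable. This is
just the textbook mutex-and-condition-variable pattern, not any
particular system's mechanism.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared location, guarded so a reader never sees a
       half-finished update. */
    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
    static int shared_value;
    static int published;        /* 0 until the write is complete */

    static void *writer(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;       /* the update itself */
        published = 1;           /* now it is stable */
        pthread_cond_signal(&ready);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *reader(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!published)       /* block until the update is done */
            pthread_cond_wait(&ready, &lock);
        printf("read %d\n", shared_value);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t r, w;
        pthread_create(&r, NULL, reader, NULL);
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(r, NULL);
        pthread_join(w, NULL);
        return 0;
    }

Multiply that locking, signaling, and cache traffic across every
shared location and every core, and the bandwidth and complexity
costs described above follow.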


The highest performance in parallel architectures is usually achieved
with SIMD, meaning the sequence of instructions is "broadcast" to all
CPUs while each CPU works on its own set of data in its own data
address space. With SIMD you rarely have to worry about when memory
writes made on one CPU become visible to other CPUs, because for the
most part what gets shared is instructions, and most programs don't
rewrite their own instructions. The beauty of it is that you need
only a single bus for distributing instructions, and each CPU can
have its own independent I/O channel to its own independent data
segments. This allows linear complexity in scaling and drastically
simplifies the memory architecture, while eliminating all but one of
the data bandwidth demands for instruction space. Pure SIMD systems
never need to block and context switch. But SIMD also has problems.
First, it's not suitable for general multiprocessing; on a modern OS
you have dozens, if not hundreds, of different programs that all want
to run at the same time, and a SIMD architecture runs one instruction
stream at a time. Other problems with SIMD include that most
programmers are not ready or able to make the conceptual leap to it,
and that development environments are lacking. Effective use of it
would be a good thing to look into for language design.
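

For a taste of the programming model, here is a sketch using the x86
SSE intrinsics. That's processor-level SIMD inside a single core
rather than a SIMD machine, but the idea is the same small-scale: one
ADD instruction operates on four data lanes at once. Compile with
-msse on GCC or Clang.

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics (x86) */

    int main(void)
    {
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 c = _mm_add_ps(a, b); /* four adds, one instruction */

        float out[4];
        _mm_storeu_ps(out, c);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }

The conceptual leap mentioned above is exactly this: the programmer
has to think of one instruction stream acting on many data lanes,
instead of many independent control flows.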


Another very interesting parallel architecture with good scaling
properties is the dataflow architecture. In a dataflow architecture,
each processor has its own unshared memory space and a set of ports
connected to other processors that it can read or write. A program
is distributed to some set of the processors, and they can all start
it, but control flow can then diverge as some processors branch one
way and others branch other ways, with the CPUs involved
communicating with one another. On some problems, such as databases,
spreadsheets, unification languages like Prolog, and cellular
simulations, dataflow architectures can exploit isomorphisms to the
problem space and achieve amazing speed. On others, they're
noticeably inferior to SMP architectures with implicitly shared
memory, particularly when CPUs are competing for scarce resources
like monitors, network connections, and so on.
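

Here is a toy dataflow sketch in C: two "processors" (modeled as Unix
processes with unshared memory) connected by a one-way port (a POSIX
pipe). Each node runs as soon as data arrives on its input port. The
doubling-then-summing pipeline is invented purely for illustration.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int port[2];
        pipe(port);                  /* the one-way port */

        if (fork() == 0) {           /* downstream node: summer */
            close(port[1]);
            int v, sum = 0;
            while (read(port[0], &v, sizeof v) == sizeof v)
                sum += v;            /* runs as each datum arrives */
            printf("sum = %d\n", sum);   /* prints 30 */
            _exit(0);
        }

        close(port[0]);              /* upstream node: doubler */
        for (int i = 1; i <= 5; i++) {
            int doubled = 2 * i;
            write(port[1], &doubled, sizeof doubled);
        }
        close(port[1]);              /* end of stream */
        wait(NULL);
        return 0;
    }

No memory is shared and no location ever needs to be synchronized;
all the coordination is in the reads and writes on the port, which is
what gives dataflow its scaling properties on problems that decompose
this way.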


Depending on what type of parallel architecture you're using, you can
benefit to different degrees from programming languages and paradigms,
programmers with different expertise, and different compiler
techniques to translate code to forms usable on these systems.


This is just opinion, but.... I think pure SMP is getting pushed
about as far as it can reasonably go at this point; the low-hanging
fruit in cache coherency, process blocking, context switching, and
memory synchronization has been picked, and recent performance
advances in hardware look like they're coming from various forms of
hybridization of SMP with dataflow and/or SIMD architectures. It's
time to start looking again at these more distributed architectures,
and of course at hybrids that strike some sort of balance between
these paradigms. Language designers and compiler builders are needed
to build tractable abstractions and effective translators for such
new machines.


                                                                Bear

