Re: About exploring TLS parallelism

Mayan Moudgill <>
Fri, 27 Mar 2009 07:20:14 -0400

          From comp.compilers


From: Mayan Moudgill <>
Newsgroups: comp.compilers
Date: Fri, 27 Mar 2009 07:20:14 -0400
Organization: Compilers Central
References: 09-03-098
Keywords: parallel, optimize
Posted-Date: 27 Mar 2009 07:53:47 EDT

wrote:
> As mentioned in some papers, TLS ( Thread-level speculative)
> parallelism is an important complementarity to the ILP.
> In my view, the TLS parallelism exploration is involved with those
> following topics:
> program partition policy;
> thread spawning policy;
> value/branch prediction, memory violation detect technologies, etc 
> those are used in TLS to help with maximize independent thread;
> cost model to drive the TLS considering many facet of overhead such as
> synchronization,communication, thread size, cost of misspeculation;
> mechanism for misspeculation, eg. thread restart, squash...
> I'm a green hand in this field and search for help.
> Could anyone point out the misunderstanding of TLS in my opinion and
> provide me with a clear picture of the TLS trend or recommend some
> good implementations/survey and open research project of TLS ?
> Which topic in the above is worthy of studying ?
> And, does research on TLS has a future or it just has been expired as
> too many works on it?
> Thanks all!
> [I'd think a lot of the VLIW trace scheduling work would be
> relevant here. -John]

It is interesting that "thread-level" parallelism is being considered as
complementary to ILP. My experience is that, for codes other than
embarrassingly-parallel loops, the level of parallelism available at the
local level is small, and that it can be fully consumed by most OoO
superscalar CPUs. So, one couldn't just do a localized splitting of
basic blocks into two threads.

Further, the work to spawn and join threads, even with hardware
assistance, is high enough that one really wants to be creating
medium-sized threads. For the hardware mechanisms we looked at, we
needed at least 50-100 instructions. So, a compiler will have to look
globally to identify opportunities to kick off new threads. And that's
hard - the alias/memory-carried dependence analysis in
particular. There turns out not to be much difference between thread
extraction and optimizations to identify task level parallelism. This
may change if it becomes cheaper
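
To make the thread-size argument concrete, here is a rough break-even
sketch in Python. The cost model, the function names and the numbers
(an IPC of 2 and 25 cycles of spawn/join overhead) are illustrative
assumptions, not figures from the hardware mechanisms mentioned above.

```python
def speedup(region_instrs, ipc, overhead_cycles, n_threads=2):
    """Estimated speedup from splitting a region evenly across threads.

    Assumes a perfect split, no dependences between the pieces, and a
    fixed spawn/join overhead paid once per split.
    """
    serial_cycles = region_instrs / ipc
    parallel_cycles = serial_cycles / n_threads + overhead_cycles
    return serial_cycles / parallel_cycles


def break_even_instrs(ipc, overhead_cycles, n_threads=2):
    """Region size (in instructions) at which the split just breaks even.

    From serial = serial / n + overhead, so serial = overhead * n / (n - 1).
    """
    return overhead_cycles * ipc * n_threads / (n_threads - 1)


if __name__ == "__main__":
    print(break_even_instrs(ipc=2.0, overhead_cycles=25))
    print(speedup(400, ipc=2.0, overhead_cycles=25))
```

Under these assumed numbers a region has to be well past the
break-even size before splitting pays off, which is the reason for
wanting medium-sized threads.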

As an aside - the only decent work I found on thread level parallelism
was done in a 4GL {somehow NOMAD sticks in my mind, but that doesn't
seem right} that had some interesting properties - among them, no
destructive updates. This greatly simplified the dependence analysis.
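
A small sketch of why the absence of destructive updates helps, in
Python. It assumes a toy representation where every statement is a
(target, sources) pair and every name is assigned exactly once; the
representation and the function names are hypothetical, not taken from
the 4GL in question. With single assignment there are no anti- or
output dependences, so two regions can only be tied together by one
reading a name that the other defines.

```python
def defs(region):
    """Names defined in a region (each name is assigned exactly once)."""
    return {target for target, _ in region}


def uses(region):
    """Names read anywhere in a region."""
    return set().union(*(sources for _, sources in region))


def independent(a, b):
    """True if neither region reads a name that the other defines.

    Only valid under the single-assignment assumption: with destructive
    updates one would also have to check write/write and read-then-write
    conflicts, plus aliasing.
    """
    return not (defs(a) & uses(b)) and not (defs(b) & uses(a))


r1 = [("x", {"a"}), ("y", {"x"})]
r2 = [("z", {"b"})]
r3 = [("w", {"y", "z"})]
print(independent(r1, r2), independent(r1, r3))
```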

The other successful approach to multithread "extraction" was done for
the implementation of a dataflow language {Id}; here, every instruction
was conceptually a parallel thread. For performance, the compiler tried
to sequence instructions together to form threads.
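
One simple way to picture "sequencing instructions into threads" is to
merge straight-line chains of a dataflow graph. The Python sketch
below is only an illustration of that idea under assumed names; it is
not the partitioning algorithm used by any Id compiler.

```python
def chain_threads(succs):
    """Group the nodes of a dataflow DAG into linear chains.

    succs maps each instruction to the list of instructions consuming
    its result; every instruction must appear as a key. An instruction
    is appended to its producer's chain only when it is the producer's
    sole consumer and the producer is its sole input.
    """
    preds = {n: [] for n in succs}
    for n, consumers in succs.items():
        for c in consumers:
            preds[c].append(n)

    def starts_chain(n):
        # A node starts a chain unless it can be glued to a unique producer.
        return len(preds[n]) != 1 or len(succs[preds[n][0]]) != 1

    threads = []
    for n in succs:
        if not starts_chain(n):
            continue
        chain = [n]
        while len(succs[chain[-1]]) == 1:
            nxt = succs[chain[-1]][0]
            if len(preds[nxt]) != 1:
                break
            chain.append(nxt)
        threads.append(chain)
    return threads


graph = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
print(chain_threads(graph))
```

Here "a" and "b" end up in one thread, while "c", "d" and "e" stay
separate because of the fan-out at "b" and the fan-in at "e".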

I haven't been following TLS; I guess the addition of the speculative
aspect is a reaction to the difficulty of the global analyses
required. So, what if, instead of speculative squashing, we had an
oracle that at compile-time would tell us whether two regions were
dependent? Would we still be able to find two independent regions of
reasonable size that could be executed in parallel?
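
The oracle question can be phrased as a small experiment. In the
Python sketch below, the region names, the sizes, the `dependent`
callback standing in for the oracle, and the 50-instruction threshold
are all assumptions made for illustration; the sketch also ignores
whether the two regions can legally be moved next to each other.

```python
from itertools import combinations


def parallel_candidates(sizes, dependent, min_size=50):
    """Pairs of regions that are large enough and independent.

    sizes:     maps region name -> instruction count
    dependent: oracle, dependent(a, b) -> True if a and b conflict
    min_size:  smallest region worth running as its own thread
    """
    big = [r for r, n in sizes.items() if n >= min_size]
    return [(a, b) for a, b in combinations(big, 2) if not dependent(a, b)]


sizes = {"A": 120, "B": 30, "C": 80, "D": 200}
known_deps = {frozenset(("A", "C"))}


def oracle(a, b):
    # Stand-in for a perfect compile-time dependence oracle.
    return frozenset((a, b)) in known_deps


print(parallel_candidates(sizes, oracle))
```

If a run like this over real programs returns few or no pairs even
with a perfect oracle, then speculation is not the limiting factor.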

My suggestion (if this is for a thesis) is to leave the compiler side
alone, and concentrate on the hardware. At least there, the problems are
