[sac-user] with-loop parallelization

Clemens Grelck c.grelck at uva.nl
Mon Jan 10 16:21:28 CET 2011

Hi David,

Juhasz David wrote:
> Hi Carl,
> Thanks for the links, they are useful papers, especially the newest one.
> In that paper you wrote that you have implemented different selectors and
> schedulers. As far as I can see, the compiler uses static scheduling by
> default. How can I choose which scheduler or selector will be used?

To be honest, the implementations of the different schedulers and task
selectors need revisiting following a complete re-implementation of the
multithreaded code generation over the last couple of years as a result
of other changes. I would discourage experimentation at this moment. :-(

> Another question is about nested with-loops. If a with-loop is 
> parallelized, there is no way to parallelize an inner with-loop, because 
> there is no idle worker thread. So the multi-threaded code is parallelized 
> at the outermost with-loop. One of my ideas is a tree-based
> parallelization for nested with-loops, but I am not sure that a complex
> technique like this is justified. However, even with the current
> (single with-loop) parallelization, in special cases it may in theory
> be better to parallelize an inner with-loop instead of the
> outermost one. Is it not possible to give the compiler any hint about
> which with-loop to parallelize?

Such a hint would be fairly easy to implement actually. I will make a
note for future releases of SAC, but at the moment we do not have such
a feature.

Re your tree-based parallelisation idea: The SAC compiler aggressively
resolves nested with-loops as far as possible to expose the full amount
of concurrency in a single with-loop.
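
To illustrate the general idea (this is not the SAC compiler's actual
generated code, just a minimal C sketch under my own assumptions): a
two-level loop nest can be treated as one flat index space, which a
static block scheduler then splits evenly among the workers. The names
range_t, static_block and fill_flat are hypothetical and chosen only
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [lo, hi) of flat indices assigned to one worker. */
typedef struct { size_t lo, hi; } range_t;

/* Static block scheduling (assumed scheme): split n iterations as evenly
 * as possible; the first n % workers workers get one extra iteration. */
range_t static_block(size_t n, size_t workers, size_t id)
{
    assert(workers > 0 && id < workers);
    size_t base = n / workers, extra = n % workers;
    range_t r;
    r.lo = id * base + (id < extra ? id : extra);
    r.hi = r.lo + base + (id < extra ? 1 : 0);
    return r;
}

/* One worker's share of a rows x cols loop nest, executed as a single
 * flat loop. The row and column are recovered from the flat index k. */
void fill_flat(double *a, size_t rows, size_t cols,
               size_t workers, size_t id)
{
    range_t r = static_block(rows * cols, workers, id);
    for (size_t k = r.lo; k < r.hi; k++) {
        size_t i = k / cols, j = k % cols;
        a[k] = (double)(i + j);
    }
}
```

Each worker would call fill_flat with its own id. Because the split is
over rows * cols rather than rows alone, even a nest with very few outer
iterations can keep all workers busy.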

So far, none of the real-world examples and benchmarks we have come across
has called into question the design decision to parallelise only a single
level of with-loops. It boils down to the question whether the outermost
with-loop provides sufficient concurrency with respect to the target
architecture. If you provide me with a realistic example where this is
not the case, I'm happy to look into it. Of course, it is easy to
construct examples that fool the SAC compiler, but are these really
representative of production code? That is my point.
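
As a rough way to check the "sufficient concurrency" question for a
given loop nest, one can compare the extent of the outermost level with
the number of worker threads. The following C sketch is only a
back-of-the-envelope model under the assumption of static scheduling
with at least one whole iteration per busy worker; the function names
busy_workers and outer_utilisation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of workers that receive at least one iteration when n
 * iterations are distributed statically among the given workers. */
size_t busy_workers(size_t n, size_t workers)
{
    assert(workers > 0);
    return n < workers ? n : workers;
}

/* Fraction of workers kept busy when only the outermost level,
 * with the given extent, is parallelised. */
double outer_utilisation(size_t outer, size_t workers)
{
    return (double)busy_workers(outer, workers) / (double)workers;
}
```

For instance, an outermost level of extent 2 on 8 workers gives a
utilisation of 0.25 in this model, whereas any outermost extent of 8 or
more gives 1.0. Load imbalance between iterations is not modelled.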

Also take into account the following: Any more flexible parallelisation
scheme (which I personally would love to see materialise) must not
create more overhead in the simple bread-and-butter examples of


Dr Clemens Grelck                                     Science Park 904
Universitair Docent                                  1098 XH Amsterdam
Universiteit van Amsterdam
Instituut voor Informatica                       T +31 (0) 20 525 8683
                                                  F +31 (0) 20 525 7490
Office C3.105                               www.science.uva.nl/~grelck
