[sac-user] with-loop parallelization

Robert Bernecky bernecky at snakeisland.com
Mon Jan 10 16:40:27 CET 2011


I agree with Clemens here:

The SISAL people started out with fine-grain parallelism
scheduling, but soon found that the amount of available
work did not meet their expectations (something a chap named
Amdahl had brought up earlier...), so they changed to
coarse-grain parallelization of the outer loops, with
better results. In a SIMD machine, parallelization
of already-optimized inner loops might make sense.
In SPMD/MPMD machines such as the multi-core boxes
we have available now, outer-loop parallelization
almost always performs better for real applications,
especially those in which we know little or nothing
about the application domain.
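
For reference, Amdahl's point can be stated as a bound on speedup.
The symbols are my own choice here: f is the fraction of run time
that can be parallelized and p is the number of processors.

```latex
S(p) = \frac{1}{(1 - f) + \frac{f}{p}} \le \frac{1}{1 - f}
```

If the fine-grain parallel work is only a small part of the run
time, f is small, and adding processors buys very little.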

As a crude example, consider a 1-D convolution:
inner-loop parallelization would give you a speedup of
at most shape(filter), whereas outer-loop parallelization
would give you a speedup of up to the number of rows in
the convolution matrix.
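
To make the two bounds in the convolution example concrete, here is
a minimal sketch in plain C. It is not SAC and not what the SAC
compiler generates. All names (conv_rows, conv1d_outer, nworkers,
outer_parallelism, inner_parallelism) are made up for illustration,
and the chunks are executed one after another rather than on real
threads, to keep the sketch free of any threading library.

```c
#include <assert.h>
#include <stddef.h>

/* Rows lo..hi-1 of a valid-mode 1-D convolution (correlation form):
 *   y[i] = sum over j < m of x[i + j] * w[j]
 * Each row i is independent of every other row. */
void conv_rows(const double *x, const double *w, size_t m,
               size_t lo, size_t hi, double *y)
{
    for (size_t i = lo; i < hi; i++) {      /* outer loop: one row per i */
        double acc = 0.0;
        for (size_t j = 0; j < m; j++) {    /* inner loop: m products, a reduction */
            acc += x[i + j] * w[j];
        }
        y[i] = acc;
    }
}

/* Static block scheduling of the outer loop over nworkers workers
 * (hypothetical parameter). The chunks run sequentially here; since
 * they write disjoint parts of y, each call could be handed to its
 * own worker thread without synchronization. */
void conv1d_outer(const double *x, size_t n, const double *w, size_t m,
                  size_t nworkers, double *y)
{
    assert(m >= 1 && n >= m && nworkers >= 1);
    size_t rows = n - m + 1;                /* rows of the convolution matrix */
    for (size_t t = 0; t < nworkers; t++) {
        size_t lo = rows * t / nworkers;
        size_t hi = rows * (t + 1) / nworkers;
        conv_rows(x, w, m, lo, hi, y);
    }
}

/* Upper bounds on the number of workers that can be kept busy. */
size_t outer_parallelism(size_t n, size_t m) { return n - m + 1; }
size_t inner_parallelism(size_t m)           { return m; }
```

For a signal of 1,000,000 elements and a 5-tap filter, the outer
loop offers 999,996 independent rows, while the inner loop offers
only 5 products per row, and those still have to be summed.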

From what you say, I get the impression you are attempting
to do both. If so, take Clemens' last paragraph to heart.
I'd be happy if we could get either of them to work well on
AKD arrays.

Bob

Clemens Grelck wrote:
> Hi David,
> 
> Juhasz David wrote:
>> Hi Carl,
>>
>> Thanks for the links, they are useful papers, especially the newest one.
>>
>> In that paper you wrote, you have implemented different selectors and
>> schedulers. As far as I can see, the compiler uses static scheduling by
>> default. How can I choose which scheduler or selector will be used?
> 
> To be honest, the implementations of the different schedulers and task
> selectors need some revisiting, following a complete re-implementation of
> the multi-threaded code generation over the last couple of years as a
> result of other changes. I would discourage experimentation at this
> moment. :-(
> 
>> Another question is about nested with-loops. If a with-loop is
>> parallelized, there is no way to parallelize an inner with-loop,
>> because there is no idle worker thread. So the multi-threaded code is
>> parallelized at the outermost with-loop. One of my ideas is a
>> tree-based parallelization for nested with-loops, but I am not sure
>> that a technique this complex is justified.
>> However, compared to the current (single with-loop) parallelization,
>> in special cases it may theoretically be better to parallelize an
>> inner with-loop instead of the outermost one. Is it not possible to
>> give the compiler a hint about which with-loop to parallelize?
> 
> Such a hint would be fairly easy to implement actually. I will make a
> note for future releases of SAC, but at the moment we do not have such
> a feature.
> 
> Re your tree-based parallelisation idea: The SAC compiler aggressively
> resolves nested with-loops as far as possible to expose the full amount
> of concurrency in a single with-loop.
> 
> So far, all real-world examples and benchmarks we have come across
> did not question the design to only parallelise a single level of
> with-loops.
> It boils down to the question whether the outermost with-loop provides
> sufficient concurrency wrt the target architecture. If you provide me with
> some realistic example where this is not the case I'm happy to look into
> this. Of course, it is easy to construct examples that fool the SAC
> compiler, but are these really representative of production code? That
> is my point.
> 
> Also take into account the following: Any more flexible parallelisation
> scheme (which I personally would love to see materialise) must not
> create more overhead in the simple bread-and-butter examples of
> parallelisation.
> 
>   Clemens
> 


