Optimizing Load Balance and Communication on Parallel Computers with Distributed Shared Memory

Rudolf Berrendorf

Abstract
To optimize programs for parallel computers with a distributed shared memory, two main problems need to be solved: load balance between the processors and minimization of interprocessor communication. Many known techniques solve either of the two problems, but to get efficient execution on highly parallel systems the optimization process has to take care of both problems. This article describes a new technique called data-driven scheduling, which can be used on sequentially iterated program regions (with parallelism inside) on parallel computers with a distributed shared memory. During the first execution of the program region, statistical data on execution times of tasks and memory access behaviour are gathered. Based on this data, a special graph is generated on which graph partitioning techniques are applied. The resulting partitioning is stored in a template and used in subsequent executions of the program region to efficiently schedule the parallel tasks of that region. Data-driven scheduling is integrated into the SVM-Fortran compiler. Performance results are shown for the Intel Paragon XP/S with the DSM extension ASVM and for the SGI Origin2000.
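The core idea — profile the tasks of an iterated region once, then compute a placement that balances measured load while keeping tasks near the memory they touch — can be sketched as follows. This is an illustrative Python sketch under assumed inputs (per-task costs and accessed page sets), using a simple greedy heuristic in place of the paper's graph partitioning; the function name, the `comm_weight` parameter, and the page model are hypothetical, not the SVM-Fortran implementation.

```python
def data_driven_schedule(costs, pages, nprocs, comm_weight=0.1):
    """Greedy placement sketch for data-driven scheduling.

    costs  : dict task -> execution time measured in the first iteration
    pages  : dict task -> set of memory pages the task accessed
    nprocs : number of processors
    Returns dict task -> processor index.
    """
    load = [0.0] * nprocs                 # accumulated work per processor
    pages_on = [set() for _ in range(nprocs)]  # pages already local to each
    placement = {}
    # Heaviest-first assignment is the classic greedy list-scheduling order.
    for task in sorted(costs, key=costs.get, reverse=True):
        def combined_cost(p):
            # Pages not yet on p would cause remote accesses / migration,
            # so penalize them in addition to the resulting load.
            remote = len(pages[task] - pages_on[p])
            return load[p] + costs[task] + comm_weight * remote
        best = min(range(nprocs), key=combined_cost)
        placement[task] = best
        load[best] += costs[task]
        pages_on[best] |= pages[task]
    return placement
```

In the setting the abstract describes, such a placement would be computed once after the profiled first execution, stored in a template, and reused for every subsequent execution of the same region, so the scheduling cost is amortized over all iterations.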
Contact
Rudolf Berrendorf
Central Institute for Applied Mathematics, Research Centre Juelich, D-52425 Juelich, Germany
r.berrendorf@fz-juelich.de