# Events

**Date:** 29.10.2019

**Time:** 18:30

**Place:** Particles Conference 2019, VI International Conference on Particle-Based Methods, Barcelona, Spain

*Godehard Sutmann*

Load imbalance is a common problem for parallel applications. It arises when the work load is inhomogeneously distributed in the global system setup, or when local inhomogeneities in the work density emerge as the number of processors increases. In fact, for many parallel applications that rely on domain decomposition as a parallel strategy, the processors define a spatial discretisation. Increasing the processor count increases the spatial resolution, which resolves density differences (often related to differences in work distribution) on a finer scale and consequently leads to runtime differences between the individual processors. Since differences in work load reduce parallel efficiency, various methods for improving the load balance between processors have been proposed [1, 2, 3, 4]. To redistribute work among the processors quickly, information about the workload has to be exchanged over the whole system. Convergence can be achieved either with global information, where all processors know the detailed work distribution, or with relaxation methods, where the load is adjusted iteratively through local information exchange. This iterative approach is closely analogous to relaxation methods applied to solve partial differential equations [5]. Accordingly, convergence slows down strongly for large processor counts. Therefore, we developed a scheme that combines ideas from multi-level relaxation methods with load balancing techniques to accelerate convergence towards a homogeneous work load distribution over a given set of processors. We consider algorithms based on an orthogonal recursive bisection approach, combined either with a relaxation approach or with an inverse cumulative function approach.
We describe how to partition the system of processors in geometrical space when global information is needed for the spatial tessellation, using a minimal amount of exchanged data, so that the scheme can be applied successfully during simulations while minimising the repartitioning overhead.
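
To make the combination of orthogonal recursive bisection with an inverse cumulative function more concrete, the following is a minimal serial sketch, not the scheme presented in the talk. It assumes that the work of every particle is known globally in one place, and it leaves out the multi-level relaxation and all inter-processor communication. The names `split_by_inverse_cdf`, `recursive_bisection` and `domain_loads` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def split_by_inverse_cdf(indices, points, work, axis, fraction):
    """Cut `indices` along `axis` where the cumulative work first reaches
    `fraction` of the total (an inverse lookup of the cumulative work)."""
    if not indices:
        return [], []
    # Order the particles along the cutting axis.
    order = sorted(indices, key=lambda i: points[i][axis])
    # Cumulative work function along that axis.
    cumulative = list(accumulate(work[i] for i in order))
    target = fraction * cumulative[-1]
    # First position whose cumulative work is >= target, counted inclusively.
    k = min(bisect_left(cumulative, target) + 1, len(order))
    return order[:k], order[k:]


def recursive_bisection(points, work, n_procs):
    """Return one list of particle indices per processor domain.
    Assumption: all work values are available globally (no relaxation)."""
    dim = len(points[0]) if points else 0

    def recurse(indices, n, axis):
        if n == 1:
            return [indices]
        n_left = n // 2
        # The left half receives work proportional to its share of processors.
        left, right = split_by_inverse_cdf(indices, points, work, axis, n_left / n)
        next_axis = (axis + 1) % dim if dim else 0
        return recurse(left, n_left, next_axis) + recurse(right, n - n_left, next_axis)

    return recurse(list(range(len(points))), n_procs, 0)


def domain_loads(domains, work):
    """Total work assigned to each domain."""
    return [sum(work[i] for i in domain) for domain in domains]


if __name__ == "__main__":
    # Hypothetical test system: a 4 x 4 grid of particles with unit work.
    grid = [(x, y) for x in range(4) for y in range(4)]
    unit_work = [1.0] * len(grid)
    print(domain_loads(recursive_bisection(grid, unit_work, 4), unit_work))
```

Each cut is placed where the cumulative work reaches the share of processors assigned to that side, so an inhomogeneous work density moves the cut positions rather than changing the number of domains. In a distributed setting, the sorted cumulative sums would have to be replaced by data exchanged between processors, which is where minimising the exchanged information becomes relevant.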