Auto-tuning performance of MPI parallel programs using resource management in container-based virtual cloud
Abstract
Load imbalance is one of the major obstacles to achieving optimal performance in High Performance Computing applications. Distributing the pieces of a problem across nodes in the hope of balancing execution time has limits, because performance depends not only on data size but also on many other dynamic factors. This paper describes an approach that uses adaptive resource management, enabled by container-based virtualization, to solve the load imbalance problem of MPI programs running in the cloud. Our techniques dynamically adjust the CPU resources allocated to MPI processes running as container instances, according to the current program execution state and system resource status. The resource allocation among MPI processes is adjusted at two levels: the intra-host level, which dynamically adjusts resources within a host, and the inter-host level, which migrates containers together with their MPI processes from one host to another. We have implemented our approach and evaluated it on the Amazon EC2 platform using real-world scientific benchmarks and applications. The results demonstrate that performance can be improved by up to 31% (15% on average) compared with the baseline.
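As a rough illustration of the intra-host adjustment described above, the following Python sketch divides a fixed CPU-share budget among the containers on one host in proportion to the compute time each MPI process needed in its most recent iteration, so that slower processes receive a larger share. It is a minimal sketch under stated assumptions and not the algorithm evaluated in this paper: the function name `rebalance_cpu_shares`, the proportional policy, the budget of 4096 shares, and the floor of 2 shares are all hypothetical choices made for the example.

```python
def rebalance_cpu_shares(compute_times, total_shares=4096, min_share=2):
    """Split a CPU-share budget among containers on one host.

    compute_times maps a container name to the compute time (seconds) its
    MPI process spent in the last iteration. All names and default values
    here are assumptions for illustration, not taken from the paper.
    """
    if not compute_times:
        return {}
    total_time = sum(compute_times.values())
    if total_time <= 0:
        # No usable measurements yet: fall back to an equal split.
        equal = total_shares // len(compute_times)
        return {name: max(min_share, equal) for name in compute_times}
    # Proportional policy: a process that took longer gets more CPU weight.
    return {
        name: max(min_share, round(total_shares * t / total_time))
        for name, t in compute_times.items()
    }


# Hypothetical measurements: rank1 took three times as long as rank0.
shares = rebalance_cpu_shares({"rank0": 1.0, "rank1": 3.0})
print(shares)
```

In a real deployment the resulting values would then have to be applied through the container runtime's CPU-weight control, and the inter-host level (container migration) would be needed once a single host can no longer absorb the imbalance; neither step is shown here.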