Scalable MPI-3.0 RMA on the Blue Gene/Q Supercomputer
Abstract
The MPI Forum has ratified extensions to MPI RMA that add a flexible, high-performance passive-target synchronization mechanism and new calls for window allocation and for atomic operations executed on remote windows. In this paper, we describe an implementation of the new MPI-3.0 RMA interface on the Blue Gene/Q machine and present performance results. Our implementation takes advantage of the one-sided RDMA get and put operations available on BG/Q. We use several microbenchmarks to show the performance improvements of MPI-3.0 RMA over MPI-2.2 RMA, and we use 2D stencil and 3D FFT benchmarks to compare MPI-3.0 RMA with MPI point-to-point communication. The microbenchmark results show that MPI-3.0 RMA has lower latency than MPI-2.2 RMA, while the application benchmarks show that MPI-3.0 RMA performs comparably to MPI point-to-point communication.