NUMA-aware algorithms: The case of data shuffling
Abstract
In recent years, a new breed of non-uniform memory access (NUMA) systems has emerged: multi-socket servers of multi-cores. This paper makes the case that data management systems need to employ designs that take into consideration the characteristics of modern NUMA hardware. To prove our point, we focus on a primitive that is used as the building block of numerous data management operations: data shuffling. We perform a comparison of different data shuffling algorithms and show that a naïve data shuffling algorithm can be up to 3× slower than the highest performing, NUMA-aware one. To achieve the highest performance, we employ a combination of thread binding, NUMA-aware thread allocation, and relaxed global coordination among threads. The importance of such NUMA-aware algorithm designs will only increase, as future server systems are expected to feature ever larger numbers of sockets and increasingly complicated memory subsystems.