Aditya Malik, Nalini Ratha, et al.
CAI 2024
In recent years, a new breed of non-uniform memory access (NUMA) systems has emerged: multi-socket servers of multi-cores. This paper makes the case that data management systems need to employ designs that take into consideration the characteristics of modern NUMA hardware. To prove our point, we focus on a primitive that is used as the building block of numerous data management operations: data shuffling. We perform a comparison of different data shuffling algorithms and show that a naïve data shuffling algorithm can be up to 3× slower than the highest performing, NUMA-aware one. To achieve the highest performance, we employ a combination of thread binding, NUMA-aware thread allocation, and relaxed global coordination among threads. The importance of such NUMA-aware algorithm designs will only increase, as future server systems are expected to feature ever larger numbers of sockets and increasingly complicated memory subsystems.
Aditya Malik, Nalini Ratha, et al.
CAI 2024
Leonid Karlinsky, Joseph Shtok, et al.
CVPR 2019
Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023