Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
The increasing energy demands of AI/ML workloads in data centers necessitate efficient scheduling solutions. Using NVIDIA's multi-instance GPU (MIG) technology, which allows partitioning a GPU into multiple slices, we propose a scheduling framework that minimizes energy consumption and job tardiness. We first evaluate four heuristic scheduling algorithms on fixed MIG configurations, identifying a promising one. Next, we develop a reinforcement learning (RL)-based approach to dynamically repartition the GPU according to the diurnal changes in workload patterns. This approach outperforms no partitioning by 68%, static partitioning by 31%, and twice-daily repartitioning by 26%, according to a joint energy-tardiness metric.
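As a rough illustration of how a joint energy-tardiness metric could be computed, the sketch below uses the standard definition of tardiness (how far past its deadline a job finishes, never negative) and combines it with energy through a weighted sum. The abstract does not define the metric, so the weighted-sum form, the `alpha` weight, the units, and all names (`Job`, `joint_cost`, `relative_improvement`) are assumptions made for this example, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Job:
    completion: float  # time the job finished (assumed unit: hours)
    deadline: float    # time the job was due (assumed unit: hours)
    energy: float      # energy attributed to the job (assumed unit: kWh)


def tardiness(job: Job) -> float:
    """Standard tardiness: lateness past the deadline, clipped at zero."""
    return max(0.0, job.completion - job.deadline)


def joint_cost(jobs: Iterable[Job], alpha: float = 0.5) -> float:
    """Hypothetical joint metric: weighted sum of total energy and total tardiness.

    alpha is an assumed trade-off weight; lower cost is better.
    """
    jobs = list(jobs)
    total_energy = sum(j.energy for j in jobs)
    total_tardiness = sum(tardiness(j) for j in jobs)
    return alpha * total_energy + (1.0 - alpha) * total_tardiness


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fraction by which the candidate policy lowers the baseline cost."""
    return (baseline - candidate) / baseline
```

Under this reading, a statement such as "outperforms no partitioning by 68%" would correspond to `relative_improvement(cost_no_partitioning, cost_rl)` being about 0.68, where both costs come from `joint_cost` on the same workload.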
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Miao Guo, Yong Tao Pei, et al.
WCITS 2011