Shixuan Zhao, Zhongshu Gu, et al.
CCS 2025
Optimizing GPU data paths is critical for application performance, yet it is fundamentally limited by the host CPU’s role as a data-plane bottleneck. This paper presents a DPU-based framework that addresses this limitation through two tightly coupled mechanisms: real-time diagnostics and on-demand CPU control. Real-time diagnostics relies on a GPU-monitored pressure metric that combines queue occupancy with traffic rate differentials to accurately assess the state of the data path. This diagnostic signal drives on-demand CPU control: the CPU is invoked only when high pressure is detected, at which point it formulates and offloads control policies without participating in routine data forwarding. The integrated approach yields a 26% latency reduction over kernel stacks and a 10% increase in effective GPU memory utilization.
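To make the pressure-driven control loop concrete, the following is a minimal sketch of how a pressure score might combine queue occupancy with a traffic rate differential and gate CPU involvement. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the names `PathSample`, `pressure`, and `needs_cpu`, the linear weighting, and the 0.7 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PathSample:
    """One hypothetical observation of a data path queue."""
    queue_depth: int      # entries currently queued
    queue_capacity: int   # maximum entries the queue can hold
    ingress_rate: float   # arrival rate, e.g. packets per second
    egress_rate: float    # drain rate, same unit as ingress_rate


def pressure(sample: PathSample, occupancy_weight: float = 0.5) -> float:
    """Return an illustrative pressure score in [0, 1].

    The linear blend and its weight are assumptions, not the paper's metric.
    """
    occupancy = sample.queue_depth / sample.queue_capacity
    # Rate differential: fraction of arriving traffic that is not being drained.
    if sample.ingress_rate > 0:
        excess = max(0.0, sample.ingress_rate - sample.egress_rate) / sample.ingress_rate
    else:
        excess = 0.0
    return occupancy_weight * occupancy + (1.0 - occupancy_weight) * excess


def needs_cpu(sample: PathSample, threshold: float = 0.7) -> bool:
    """Invoke the CPU only when pressure reaches the (assumed) threshold."""
    return pressure(sample) >= threshold


# A nearly empty queue with balanced rates stays on the fast path;
# a nearly full queue that drains slower than it fills escalates to the CPU.
calm = PathSample(queue_depth=10, queue_capacity=100, ingress_rate=1000.0, egress_rate=1000.0)
hot = PathSample(queue_depth=90, queue_capacity=100, ingress_rate=1000.0, egress_rate=400.0)
```

With these assumed parameters, `needs_cpu(calm)` is false and `needs_cpu(hot)` is true, which mirrors the idea that the CPU sets policy only under high pressure and stays out of routine forwarding.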
Ilias Iliadis
International Journal On Advances In Networks And Services
Juan Miguel De Haro, Rubén Cano, et al.
IPDPS 2022
Alessandro Pomponio
Kubecon + CloudNativeCon NA 2025