Workshop paper

DPU-based Optimization of GPU Data Paths through Real-time Diagnostics and On-Demand Control

Abstract

Optimizing GPU data paths is critical for application performance, yet it is fundamentally limited by the host CPU’s role as a data-plane bottleneck. This paper presents a DPU-based framework that addresses this limitation through two tightly coupled mechanisms: real-time diagnostics and on-demand CPU control. For real-time diagnostics, we employ a GPU-monitored pressure metric that combines queue occupancy with traffic rate differentials to assess the state of the data path. This diagnostic signal enables on-demand CPU control: the CPU is invoked only when high pressure is detected, so it can formulate and offload control policies without participating in routine data forwarding. The integrated approach reduces latency by 26% relative to kernel stacks and increases effective GPU memory utilization by 10%.
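To make the pressure metric and the on-demand trigger concrete, the following is a minimal sketch rather than the paper's actual formulation. The abstract states only that the metric combines queue occupancy with traffic rate differentials and that the CPU is invoked under high pressure. The weighted-sum form, the normalization of the rate differential, the weights, the threshold value, and all names (PathSample, pressure, needs_cpu) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathSample:
    """One observation of a data path queue (hypothetical structure)."""
    queue_depth: int      # entries currently queued
    queue_capacity: int   # maximum entries the queue can hold (> 0)
    ingress_rate: float   # arrival rate, e.g. packets per second
    egress_rate: float    # drain rate, same unit as ingress_rate


def pressure(sample: PathSample, w_queue: float = 0.5, w_rate: float = 0.5) -> float:
    """Combine queue occupancy with a normalized rate differential.

    Assumed form: a weighted sum of two terms, each in [0, 1].
    """
    occupancy = sample.queue_depth / sample.queue_capacity
    if sample.ingress_rate > 0:
        # Count the differential only when arrivals outpace the drain rate.
        surplus = sample.ingress_rate - sample.egress_rate
        rate_diff = max(0.0, surplus / sample.ingress_rate)
    else:
        rate_diff = 0.0
    return w_queue * occupancy + w_rate * rate_diff


def needs_cpu(sample: PathSample, threshold: float = 0.7) -> bool:
    """Return True when pressure is high enough to invoke the CPU (assumed threshold)."""
    return pressure(sample) >= threshold


calm = PathSample(queue_depth=10, queue_capacity=100,
                  ingress_rate=1000.0, egress_rate=1000.0)
congested = PathSample(queue_depth=90, queue_capacity=100,
                       ingress_rate=1000.0, egress_rate=200.0)
print(needs_cpu(calm), needs_cpu(congested))
```

In this sketch the routine path never touches the CPU: only a sample whose pressure crosses the threshold would trigger policy formulation, which matches the on-demand control idea at a conceptual level.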