Marcelo Amaral, Tatsuhiro Chiba, et al.
CLOUD 2022
AI systems are often deployed as bare-metal servers in on-premises environments. While bare-metal deployment is performant, it is inflexible in a multi-tenant environment. Instead, public clouds and some private clouds use virtual machines (VMs) to securely partition system resources and interconnects among multiple customers. Major public cloud providers provision AI VM systems using KVM-derived hypervisors (AWS Nitro, GCE KVM, etc.). However, their changes and configurations are not published, so others cannot reproduce the same performance on emerging AI systems using the open-source virtualization stack (KVM/QEMU). In this technical talk, we discuss the optimizations required to achieve near-bare-metal performance: enabling GPU passthrough, 100GbE RoCE over virtual functions, and GPUDirect RDMA (GDR) inside VMs. These optimizations include hardware configuration changes to expose topology visibility to the VM, firmware changes, virtual machine configurations that faithfully represent the AI system's capabilities inside the VM, and AI training configurations.
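The kind of VM configuration the talk describes can be sketched as a libvirt domain fragment combining VFIO GPU passthrough, an SR-IOV NIC virtual function, and topology-aware CPU/NUMA pinning. This is a minimal illustrative sketch, not the authors' published settings: all PCI addresses, vCPU counts, and memory sizes below are hypothetical.

```xml
<!-- Illustrative libvirt fragment (hypothetical values throughout). -->
<domain type='kvm'>
  <name>ai-train-vm</name>
  <!-- Pin vCPUs to host cores on the same NUMA node as the GPU and NIC,
       so the guest scheduler sees a topology that matches the hardware. -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
  </cputune>
  <!-- Expose the host CPU model and a faithful core/NUMA layout to the guest. -->
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='8' threads='1'/>
    <numa>
      <cell id='0' cpus='0-7' memory='64' unit='GiB'/>
    </numa>
  </cpu>
  <devices>
    <!-- GPU passthrough via VFIO (hypothetical host PCI address). -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <!-- 100GbE RoCE SR-IOV virtual function (hypothetical host PCI address). -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x5e' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
  </devices>
</domain>
```

Passing both the GPU and the NIC VF into the same VM, with pinning that keeps them on one NUMA node, is what makes GPUDirect RDMA between them feasible inside the guest.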
Pranjal Gupta, Karan Bhukar, et al.
ICPE 2025
Abhishek Malvankar, Olivier Tardieu
KubeCon EU 2024
Darya Kaviani, Sijun Tan, et al.
RWC 2025