A work-stealing scheduler for X10's task parallelism with suspension. Olivier Tardieu, Haichuan Wang, et al. 2012. PPoPP 2012.
Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing. Abhishek Malvankar, Olivier Tardieu. 2024. KubeCon EU 2024.
Incremental GPU Slicing in Action. Abhishek Malvankar, Olivier Tardieu. 2024. CNCF-hosted Co-located Events North America 2024.
Training Foundation Model Workloads on Kubernetes at Scale With MCAD. Olivier Tardieu, Abhishek Malvankar. 2023. K8SAIHPCDAY 2023.
GPU Optimizations for Efficient and Cost-Effective Access to Diverse Large Language Models in Research Cluster. Chen Wang, Yue Zhu, et al. 2024. MLSys 2024.
Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. 2024. EuroSys 2024.
META: Middleware for Events, Transactions, and Analytics. Matthew Arnold, David Grove, et al. 2016. IBM J. Res. Dev.