Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction. Haoran Qiu, Weichao Mao, et al. 2024. ASPLOS 2024.
Dynamic Alert Suppression Policy for Noise Reduction in AIOps. Karan Bhukar, Harshit Kumar, et al. 2024. ICSE 2024.
Optimizing IT FinOps and Sustainability through Unsupervised Workload Characterization. Xi Yang, Rohan R. Arora, et al. 2024. IAAI 2024.
Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing. Abhishek Malvankar, Olivier Tardieu. 2024. KubeCon EU 2024.
CASPIAN: A Carbon-Optimized Multi-Cluster Job Scheduler. Tayebeh Bahreini, Asser Tantawi. 2024. KubeCon EU 2024.
Trimaran: Load-Aware Scheduling for Power Efficiency and Performance Stability. Asser Tantawi, Chen Wang. 2024. KubeCon EU 2024.
Designing a Lightweight Network Observability agent for Cloud Applications. Pravein Govindan Kannan, Shachee Mishra Gupta, et al. 2024. PAM 2024.
A 12nm Linux-SMP-Capable RISC-V SoC with 14 Accelerator Types, Distributed Hardware Power Management and Flexible NoC-Based Data Orchestration. Maico Cassel Dos Santos, Tianyu Jia, et al. 2024. ISSCC 2024.