Conference paper — "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference." Pol G. Recasens, Ferran Agullo, et al. CLOUD 2025.
Workshop paper — "Towards Pareto Optimal Throughput in Small Language Model Serving." Pol G. Recasens, Yue Zhu, et al. EuroSys 2024.
Paper — "The Bottlenecks of AI: Challenges for Embedded and Real-Time Research in a Data-Centric Age." Tarek Abdelzaher, Yigong Hu, et al. Real-Time Systems.