Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference. Yue Zhu, Hao Yu, et al. CLOUD 2025.
Securing AI Inference in the Cloud: Is CPU-GPU Confidential Computing Ready? Apoorve Mohan, Mengmei Ye, et al. CLOUD 2024.