Power-aware Deep Learning Model Serving with µ-Serve
Haoran Qiu, Weichao Mao, et al.
USENIX ATC 2024
Internet-scale web applications are becoming increasingly storage-intensive and rely heavily on in-memory object caching to attain required I/O performance. We argue that the emerging serverless computing paradigm provides a well-suited, cost-effective platform for object caching. We present INFINICACHE, a first-of-its-kind in-memory object caching system that is completely built and deployed atop ephemeral serverless functions. INFINICACHE exploits and orchestrates serverless functions' memory resources to enable elastic pay-per-use caching. INFINICACHE's design combines erasure coding, intelligent billed duration control, and an efficient data backup mechanism to maximize data availability and cost effectiveness while balancing the risk of losing cached state and performance. We implement INFINICACHE on AWS Lambda and show that it: (1) achieves 31-96× tenant-side cost savings compared to AWS ElastiCache for a large-object-only production workload, (2) can effectively provide 95.4% data availability for each one-hour window, and (3) enables performance comparable to that of a typical in-memory cache.
Runyu Jin, Paul Muench, et al.
ICPE 2024
Shiqiang Wang, Mingyue Ji
NeurIPS 2022
Bo Wen, Yan Koyfman, et al.
Middleware/WOC 2022