INFINICACHE: Exploiting ephemeral serverless functions to build a cost-effective memory cache

Ao Wang; Jingyuan Zhang; Xiaolong Ma; Ali Anwar; Lukas Rupprecht; Dimitrios Skourtis; Vasily Tarasov; Feng Yan; Yue Cheng

FAST 2020

Conference paper

24 Feb 2020


Abstract

Internet-scale web applications are becoming increasingly storage-intensive and rely heavily on in-memory object caching to attain the required I/O performance. We argue that the emerging serverless computing paradigm provides a well-suited, cost-effective platform for object caching. We present INFINICACHE, a first-of-its-kind in-memory object caching system that is built and deployed entirely atop ephemeral serverless functions. INFINICACHE exploits and orchestrates serverless functions' memory resources to enable elastic pay-per-use caching. INFINICACHE's design combines erasure coding, intelligent billed duration control, and an efficient data backup mechanism to maximize data availability and cost-effectiveness while balancing the risk of losing cached state against performance. We implement INFINICACHE on AWS Lambda and show that it: (1) achieves 31–96× tenant-side cost savings compared to AWS ElastiCache for a large-object-only production workload, (2) can effectively provide 95% data availability for each one-hour window, and (3) delivers performance comparable to that of a typical in-memory cache.
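To make the role of erasure coding concrete, here is a minimal sketch of how an object might be split into chunks so that it survives the loss of one chunk holder (for example, one reclaimed function instance). The abstract does not specify the coding scheme, so this sketch substitutes single XOR parity, the simplest erasure code, purely for illustration. The names `encode`, `decode`, and `xor`, and the chunk count used, are hypothetical and not taken from INFINICACHE.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(obj: bytes, d: int) -> list:
    """Split obj into d equal-size data chunks plus one XOR parity chunk.

    Assumption: single parity stands in for whatever code a real system uses.
    """
    chunk_len = -(-len(obj) // d)  # ceiling division
    padded = obj.ljust(chunk_len * d, b"\0")  # zero-pad to a multiple of d
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(d)]
    return chunks + [reduce(xor, chunks)]


def decode(chunks: list, size: int) -> bytes:
    """Rebuild the object when at most one chunk (marked None) is missing."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    chunks = list(chunks)
    if missing:
        # XOR of all surviving chunks equals the lost chunk.
        present = [c for c in chunks if c is not None]
        chunks[missing[0]] = reduce(xor, present)
    # Drop the parity chunk and strip the zero padding.
    return b"".join(chunks[:-1])[:size]


if __name__ == "__main__":
    data = b"large cached object"
    pieces = encode(data, 4)   # 4 data chunks + 1 parity chunk
    pieces[2] = None           # simulate one chunk holder being reclaimed
    assert decode(pieces, len(data)) == data
```

In this sketch each chunk would live in a different function instance, so losing any single instance does not lose the object. A code with more parity chunks would tolerate more simultaneous losses at the cost of extra memory.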

Related

Conference paper

Optimizing GPU Multiplexing for Efficient and Cost-Effective Access to Diverse Large Language Models in GPU Clusters

Yue Zhu, Chen Wang, et al.

MASCOTS 2024

Conference paper

AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems

Haoran Qiu, Weichao Mao, et al.

USENIX ATC 2023

Talk

How to Deploy a High-performance Distributed AI Training Cluster with NVIDIA GPUs and KVM

Apoorve Mohan, Matthew Sheard

NVIDIA GTC 2022

Conference paper

Wukong: A scalable and locality-enhanced framework for serverless parallel computing

Benjamin Carver, Jingyuan Zhang, et al.

SoCC 2020
