Haoran Qiu, Weichao Mao, et al.
USENIX ATC 2023
AIOps can provide essential value for data lakehouses, which pose complex operational challenges for Site Reliability Engineers (SREs). This paper argues that the unified approach of data lakehouses creates a unique opportunity for unified data resiliency management. We focus on AIOps for disaster recovery and backup/restore, and specifically on managing lakehouse hardware resources so that the Recovery Point Objectives (RPOs) for lakehouse data are met with a high degree of accuracy. The goal is to warn an SRE of an impending RPO violation and to recommend how much hardware to add, and by when, to avoid it. We claim that AIOps can achieve this goal with an ensemble of machine learning and time series analysis.
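To make the warning-and-recommendation idea concrete, here is a minimal sketch under several assumptions that do not come from the paper. A single least-squares linear trend stands in for the ML/time-series ensemble. One backup runs per RPO window and must copy that window's changed data before the window ends. Backup throughput scales linearly with node count. All names (`predict_rpo_violation`, `RpoWarning`, `gb_per_hour_per_node`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def fit_linear_trend(values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of values[t] ~ intercept + slope * t."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return v_mean - slope * t_mean, slope


@dataclass
class RpoWarning:
    windows_until_violation: int  # first future backup window predicted to miss the RPO
    extra_nodes: int  # nodes to add before then so the rest of the horizon is covered


def predict_rpo_violation(
    changed_gb_per_window: Sequence[float],
    rpo_hours: float,
    gb_per_hour_per_node: float,
    nodes: int,
    horizon_windows: int = 30,
) -> Optional[RpoWarning]:
    """Warn if forecast changed data will exceed what `nodes` can back up within one RPO window."""
    intercept, slope = fit_linear_trend(changed_gb_per_window)
    last_t = len(changed_gb_per_window) - 1
    # Assumption: data a backup can copy in one window scales linearly with node count.
    capacity_gb = nodes * gb_per_hour_per_node * rpo_hours
    forecasts = [intercept + slope * (last_t + d) for d in range(1, horizon_windows + 1)]
    for d, gb in enumerate(forecasts, start=1):
        if gb > capacity_gb:
            # Size the recommendation for the largest forecast from the violation onward.
            peak_gb = max(forecasts[d - 1:])
            needed = math.ceil(peak_gb / (gb_per_hour_per_node * rpo_hours))
            return RpoWarning(windows_until_violation=d, extra_nodes=needed - nodes)
    return None  # no violation predicted within the horizon
```

With changed data growing by 10 GB per one-hour window, `predict_rpo_violation([100, 110, 120, 130, 140], rpo_hours=1, gb_per_hour_per_node=50, nodes=3, horizon_windows=10)` returns `RpoWarning(windows_until_violation=2, extra_nodes=2)`. In other words, the SRE is told to add two nodes before the second upcoming window. A production system would replace the single trend with the kind of ensemble the abstract describes and would attach uncertainty bounds to the forecast.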
Apoorve Mohan, Matthew Sheard
NVIDIA GTC 2022
Runyu Jin, Paul Muench, et al.
FAST 2024
Marcelo Amaral, Tatsuhiro Chiba, et al.
CLOUD 2022