Orchestra: Guaranteeing Performance SLAs for Cloud Applications by Avoiding Resource Storms
Abstract
This paper presents Orchestra, a cloud-specific framework for managing both foreground applications (e.g., Web, DBMS) and background services (e.g., backups, security checks, batch jobs) in user space. Orchestra is designed to address "resource storms" caused by the sudden execution of background services on cloud instances. Resource storms significantly degrade the performance of foreground applications by competing with them for shared resources, resulting in frequent SLA violations and poor user experience. Orchestra takes an online approach: it uses lightweight monitoring to build performance models for multiple cloud applications on the fly, and then optimizes the allocation of shared resources to meet SLAs. We evaluate Orchestra on a production cloud (Amazon EC2) with a diverse range of SLA requirements. The experimental results show that Orchestra successfully guarantees the foreground application's performance, meeting its SLA targets at all times. Moreover, Orchestra limits the performance penalty on background services through proper allocation of the shared resources.
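The monitor, model, and allocate loop described above can be illustrated with a minimal sketch. It is not Orchestra's actual implementation: the model form (foreground latency = a + b / share), the single CPU-share resource, and all names (`OnlineLatencyModel`, `allocate`, `min_share`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineLatencyModel:
    """Hypothetical online model: latency_ms = a + b / foreground_share."""
    samples: list = field(default_factory=list)

    def observe(self, share: float, latency_ms: float) -> None:
        # Store the monitored sample in the transformed space x = 1 / share.
        self.samples.append((1.0 / share, latency_ms))

    def fit(self) -> tuple:
        # Ordinary least squares for y = a + b * x over all samples so far.
        n = len(self.samples)
        if n < 2:
            raise ValueError("need at least two samples")
        mean_x = sum(x for x, _ in self.samples) / n
        mean_y = sum(y for _, y in self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.samples)
        if sxx == 0:
            raise ValueError("need samples at two distinct shares")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.samples)
        b = sxy / sxx
        a = mean_y - b * mean_x
        return a, b


def allocate(model: OnlineLatencyModel, sla_ms: float,
             min_share: float = 0.05) -> tuple:
    """Return (foreground_share, background_share) as fractions of one resource.

    Picks the smallest foreground share predicted to meet the SLA so that
    the background service keeps as much of the resource as possible.
    """
    a, b = model.fit()
    if b <= 0 or sla_ms <= a:
        # Model is unreliable or the SLA is predicted to be infeasible:
        # be conservative and give the foreground everything.
        return 1.0, 0.0
    fg = min(1.0, max(min_share, b / (sla_ms - a)))
    return fg, 1.0 - fg


# Usage with synthetic samples that follow latency = 20 + 40 / share.
model = OnlineLatencyModel()
for share, latency in [(0.2, 220.0), (0.5, 100.0), (0.8, 70.0)]:
    model.observe(share, latency)
fg_share, bg_share = allocate(model, sla_ms=120.0)
```

In this sketch the background service receives whatever share remains once the foreground's predicted SLA requirement is covered, which mirrors the stated goal of protecting the foreground while minimizing the background's penalty.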