Publication
ICAC 2005
Conference paper

Automated and adaptive threshold setting: Enabling technology for autonomy and self-management

Abstract

Threshold violations reported for system components signal undesirable conditions in the system. In complex computer systems, characterized by dynamically changing workload patterns and evolving business goals, pre-computed thresholds on the operational values of the performance metrics of individual system components are not available. This paper focuses on a fundamental enabling technology for performance management: automatic computation and adaptation of statistically meaningful performance thresholds for system components. We formally define the problem of adaptive threshold setting with controllable threshold accuracy and propose a novel algorithm for solving it. Given a set of Service Level Objectives (SLOs) for the applications executing in the system, our algorithm continually adapts the per-component performance thresholds to the observed SLO violations. The purpose of this continual adaptation is to control the average numbers of false-positive and false-negative alarms and thereby improve the efficacy of threshold-based management. We implemented the proposed algorithm and applied it to a relatively simple, albeit non-trivial, storage system. In our experiments we achieved a positive predictive value of 92% and a negative predictive value of 93% for component-level performance thresholds.
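The abstract reports its results as positive and negative predictive values, so a small sketch may help readers unfamiliar with those measures. The code below is an illustration only, not the algorithm from the paper: it assumes a single component metric where an alarm is raised when the value exceeds the threshold, treats an observed SLO violation as ground truth, and uses a naive fixed-step update. The names `Confusion`, `score_threshold`, `adapt_threshold`, and the `step` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Confusion:
    """Counts of alarm outcomes measured against observed SLO violations."""
    tp: int = 0  # alarm raised, SLO violated
    fp: int = 0  # alarm raised, SLO met (false positive)
    tn: int = 0  # no alarm, SLO met
    fn: int = 0  # no alarm, SLO violated (false negative)

    def ppv(self) -> Optional[float]:
        """Positive predictive value: fraction of alarms that were real."""
        alarms = self.tp + self.fp
        return self.tp / alarms if alarms else None

    def npv(self) -> Optional[float]:
        """Negative predictive value: fraction of non-alarms that were right."""
        quiet = self.tn + self.fn
        return self.tn / quiet if quiet else None


def score_threshold(values: Sequence[float],
                    slo_violated: Sequence[bool],
                    threshold: float) -> Confusion:
    """Score a fixed threshold against paired (metric, SLO outcome) samples."""
    c = Confusion()
    for value, violated in zip(values, slo_violated):
        alarm = value > threshold  # assumption: higher metric means worse
        if alarm and violated:
            c.tp += 1
        elif alarm:
            c.fp += 1
        elif violated:
            c.fn += 1
        else:
            c.tn += 1
    return c


def adapt_threshold(threshold: float, value: float, violated: bool,
                    step: float = 1.0) -> float:
    """Naive illustrative update (an assumption, not the paper's rule):
    raise the threshold after a false positive, lower it after a
    false negative, and leave it unchanged otherwise."""
    alarm = value > threshold
    if alarm and not violated:
        return threshold + step
    if violated and not alarm:
        return threshold - step
    return threshold
```

With made-up latency samples `[10, 12, 30, 35, 11, 40, 28, 9]`, SLO outcomes `[False, False, True, True, False, True, False, False]`, and a threshold of 25, `score_threshold` counts three true alarms and one false alarm, giving a positive predictive value of 0.75 and a negative predictive value of 1.0. These numbers are invented for the example and are unrelated to the 92% and 93% reported in the paper.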
