Srikar Tati, Bongjun Ko, et al.
IEEE TPDS
In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with few false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC achieves this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives. © 2012 IEEE.
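The abstract does not spell out the steps of CMC, so the following is only a minimal sketch of a generic greedy max-coverage diagnosis loop, not the paper's algorithm. It assumes two kinds of observations, here called failed and working paths, and assumes that any component lying on a working path is healthy. The function name `greedy_max_coverage`, the component and path labels, and the pruning rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def greedy_max_coverage(
    candidates: Dict[str, Set[str]],
    failed: Set[str],
    working: Set[str],
) -> List[str]:
    """Return a hypothesis list of components that explains the failed paths.

    candidates maps a component name to the set of paths that traverse it.
    Illustrative sketch only; this is not the published CMC algorithm.
    """
    # Assumed pruning rule: a component on any working path is treated as healthy.
    viable = {
        comp: paths & failed
        for comp, paths in candidates.items()
        if not (paths & working)
    }
    uncovered = set(failed)
    hypothesis: List[str] = []
    while uncovered:
        # Greedy step: pick the component explaining the most unexplained failures.
        # Sorting the names makes tie-breaking deterministic.
        best = max(
            sorted(viable),
            key=lambda comp: len(viable[comp] & uncovered),
            default=None,
        )
        if best is None or not (viable[best] & uncovered):
            break  # the remaining failures cannot be explained by any candidate
        hypothesis.append(best)
        uncovered -= viable[best]
        del viable[best]
    return hypothesis


if __name__ == "__main__":
    topology = {
        "link_a": {"p1", "p2"},
        "link_b": {"p2", "p3"},
        "link_c": {"p3", "p4"},
    }
    print(greedy_max_coverage(topology, failed={"p1", "p2", "p3"}, working={"p4"}))
```

In this toy input, `link_c` is ruled out because it lies on the working path `p4`, so the loop has to explain `p3` with `link_b` instead. That illustrates, in a simplified way, how observations of working paths can shrink the hypothesis space before the greedy selection runs.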
Wei Gao, Guohong Cao, et al.
ICDCS 2012
Jing Zhao, Xiaomei Zhang, et al.
MASS 2014
Li Qiu, Liang Ma, et al.
ICDCS 2019