Wei Zhang, Jinho Hwang, et al.
CoNEXT 2016
Like any other distributed system, cloud management stacks such as OpenStack are susceptible to faults whose root cause is often hard to diagnose and may take hours or days to fix. We present GRETEL, a system that leverages nonintrusive system monitoring to expedite root cause analysis of both operational and performance faults manifesting in OpenStack operations. GRETEL uses unique operational fingerprints to quickly identify faulty operations at runtime. GRETEL is accurate in its diagnosis, achieving >98% precision in identifying the faulty operation with very few false positives, even under stress. GRETEL is lightweight and orders of magnitude faster than prior work, sustaining a throughput of ~77 Mbps.
Sukrit Kalra, Ayush Goel, et al.
FSE 2016
Kshiteej Mahajan, Rishabh Poddar, et al.
DSN 2016
Ali Munir, Ting He, et al.
CoNEXT 2016