Diagnostic Analysis: Directional Relation Graph
Abstract
System administrators can employ various diagnostic tests to identify failures in high-performance computing systems, but manual analysis of the results can be time-consuming. Moreover, executing these tests occupies system resources, and an individual diagnostic result represents only the instantaneous state of the system. In this paper, we propose the use of a directional relation graph to summarize and visualize diagnostic results over time. The graph visually represents how frequently each test fails and how failures relate to one another within a specific time range. We demonstrate the directional relation graph using diagnostic results obtained during the execution of synthetic anomalies. Furthermore, we discuss how graph analysis of relations among failures can narrow the suite of tests and thereby reduce overall test time.
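To make the idea concrete, the following is a minimal sketch of how such a graph might be assembled from timestamped diagnostic results. The abstract does not define how a relation between two failures is computed, so the edge weight used here (the fraction of runs in which test A failed where test B also failed) is an assumption made for illustration only, as are the function name `build_relation_graph`, the input layout, and the test names.

```python
from collections import Counter
from itertools import permutations


def build_relation_graph(snapshots, start, end):
    """Summarize diagnostic results that fall within [start, end].

    snapshots: iterable of (timestamp, set_of_failed_test_names).
    Returns (failure_counts, edges), where edges maps a directed pair
    (a, b) to the fraction of runs with a failing in which b also failed.
    This edge definition is an illustrative assumption, not the paper's.
    """
    failure_counts = Counter()
    pair_counts = Counter()
    for timestamp, failed in snapshots:
        if not (start <= timestamp <= end):
            continue  # outside the requested time range
        failure_counts.update(failed)
        # Count every ordered pair of tests that failed in the same run.
        pair_counts.update(permutations(sorted(failed), 2))
    edges = {
        (a, b): pair_counts[(a, b)] / failure_counts[a]
        for (a, b) in pair_counts
    }
    return dict(failure_counts), edges


# Hypothetical diagnostic results: (timestamp, tests that failed).
snapshots = [
    (1, {"mem", "net"}),
    (2, {"mem"}),
    (3, {"mem", "net"}),
    (4, {"disk"}),
    (10, {"net"}),  # falls outside the window used below
]
counts, edges = build_relation_graph(snapshots, start=1, end=5)
```

Under this assumed definition, an edge from `net` to `mem` with weight 1.0 would mean that `mem` failed in every run where `net` failed, which is the kind of relation that could suggest one test is redundant given the other.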