Towards establishing causality between change and incident
Abstract
It is common knowledge in the IT service domain that changes to system configuration are responsible for a major portion of the incidents that result in client outages. However, it is typically very difficult to establish a relationship between changes and incidents: under tremendous time pressure to implement changes and resolve incidents quickly, proper documentation takes lower priority both at change creation time and during incident management. As a result, it is often not possible to leverage historical data for retrospective analysis that identifies emerging trends linking changes to incidents, or to build predictive models for proactive incident prevention at change creation time. In this paper, we present an approach for establishing causality between changes and incidents through an ensemble of statistical, data classification, and natural language processing techniques. We demonstrate our approach with a real-world example.