Outlier Impact Characterization for Time Series Data
Abstract
For time series data, certain types of outliers are intrinsically more harmful for parameter estimation and future predictions than others, irrespective of their frequency. In this paper, for the first time, we study the characteristics of such outliers through the lens of the influence functional from robust statistics. In particular, we consider the input time series as a contaminated process, with the recurring outliers generated from an unknown contaminating process. We then leverage the influence functional to understand the impact of the contaminating process on parameter estimation. The influence functional yields a multi-dimensional vector that measures the sensitivity of the predictive model to the contaminating process, which can be challenging to interpret, especially for models with a large number of parameters. To this end, we further propose a comprehensive single-valued metric (the SIF) to measure outlier impacts on future predictions. It provides a quantitative measure of outlier impact that can be used in a variety of scenarios, such as evaluating outlier detection methods and creating more harmful outliers. Empirical results on multiple real data sets demonstrate the effectiveness of the proposed SIF metric.
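To make the idea of measuring an estimator's sensitivity to a contaminating process concrete, the following is a minimal sketch and not the SIF metric proposed in the paper. It assumes an AR(1) model fitted by least squares, an additive outlier process that fires independently with probability `gamma`, and a finite-difference approximation (estimate on contaminated data minus estimate on clean data, divided by `gamma`) in place of the limiting influence functional. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_ar1(n, phi, rng):
    """Simulate a zero-mean AR(1) series x_t = phi * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    x = [0.0] * n
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.gauss(0.0, 1.0)
    return x


def contaminate(x, gamma, magnitude, rng):
    """Add an outlier of size +/- magnitude to each point with probability gamma.

    This additive, independent contamination is an assumption of the sketch;
    the paper treats the contaminating process as unknown.
    """
    y = []
    for value in x:
        if rng.random() < gamma:
            value += magnitude * rng.choice((-1.0, 1.0))
        y.append(value)
    return y


def fit_ar1(series):
    """Least-squares estimate of the AR(1) coefficient (no intercept)."""
    num = sum(a * b for a, b in zip(series[1:], series[:-1]))
    den = sum(a * a for a in series[:-1])
    return num / den


def empirical_influence(x, gamma, magnitude, rng):
    """Finite-difference sensitivity of the AR(1) estimate to contamination."""
    clean_estimate = fit_ar1(x)
    dirty_estimate = fit_ar1(contaminate(x, gamma, magnitude, rng))
    return (dirty_estimate - clean_estimate) / gamma


if __name__ == "__main__":
    rng = random.Random(0)
    x = simulate_ar1(5000, 0.8, rng)
    print("clean estimate:", fit_ar1(x))
    print("empirical influence:", empirical_influence(x, 0.05, 10.0, rng))
```

In this toy setting the parameter is a scalar, so the sensitivity is a single number. For a model with many parameters the same construction would produce one value per parameter, which is the interpretability problem that motivates a single-valued summary.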