A. Gupta, R. Gross, et al.
SPIE Advances in Semiconductors and Superconductors 1990
Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude. © 2008 Springer Science+Business Media, LLC.
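The distance-based definition in the abstract can be illustrated with a minimal brute-force sketch. It assumes that a point's outlier score is its Euclidean distance to its k-th nearest neighbor and that the task is to report the n highest-scoring points. The abstract does not fix the score, so this is one common choice. The sketch also applies a simple cutoff-pruning rule of the kind used by earlier nested-loop algorithms. It is not RBRP: the binning and re-projection that give RBRP its scaling behavior are not shown. The names `top_n_outliers` and `euclidean` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_outliers(data: List[Point], k: int, n: int) -> List[Tuple[float, int]]:
    """Hypothetical helper: return the n points with the largest k-th
    nearest-neighbor distance as (score, index) pairs, highest score first.
    Brute-force nested loop with cutoff pruning (assumed definition, not RBRP)."""
    top: List[Tuple[float, int]] = []  # min-heap of (score, index), at most n entries
    cutoff = 0.0  # weakest score among the current top n (0.0 until n are found)
    for i, p in enumerate(data):
        neighbors: List[float] = []  # max-heap via negated distances: k closest so far
        pruned = False
        for j, q in enumerate(data):
            if i == j:
                continue
            d = euclidean(p, q)
            if len(neighbors) < k:
                heapq.heappush(neighbors, -d)
            elif d < -neighbors[0]:
                heapq.heapreplace(neighbors, -d)
            # Pruning: the k-th NN distance only shrinks as more points are seen,
            # so once it falls below the cutoff, p cannot enter the top n.
            if len(neighbors) == k and -neighbors[0] < cutoff:
                pruned = True
                break
        if pruned or len(neighbors) < k:
            continue
        score = -neighbors[0]  # distance to the k-th nearest neighbor
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_n_outliers(pts, k=2, n=1))  # the isolated point (10, 10) is reported
```

Each point is still compared against up to all others, so the worst case remains quadratic. The pruning only helps once strong outliers have raised the cutoff, which is why algorithms targeting large, high-dimensional data add indexing or data reorganization on top of this basic loop.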
M.J. Slattery, Joan L. Mitchell
IBM J. Res. Dev.
Preeti Malakar, Thomas George, et al.
SC 2012
Raymond F. Boyce, Donald D. Chamberlin, et al.
CACM