Large Language Models can Become Strong Self-Detoxifiers
Irene Ko, Pin-Yu Chen, et al.
ICLR 2025
Abnormal testing data can severely reduce model performance if not processed properly. In this paper, we propose a preprocessing system to handle several commonly encountered types of abnormal testing data. The system consists of an aberrant data detector and an aberrant data corrector. The aberrant data detector classifies the type of incoming data; based on that type, the aberrant data corrector takes different actions to amend the testing data. Users can then apply their preferred prediction methods to the corrected testing data. Specifically, corrupted and adversarial images are used as examples of abnormal data. We show that corrupted data can be reconstructed through a Gaussian locally linear mappings method, and that prediction performance on adversarial samples can be improved by using nearest neighbors as a surrogate. We compare the proposed aberrant data detector and corrector with existing, well-recognized alternatives; these alternatives were published individually and do not combine the two components into a single preprocessing system. The numerical results show that our proposed components are competitive even when used on their own. The proposed system is a generic method that can be applied to different downstream predictive models. We use three existing prediction methods to illustrate the general applicability of the proposed system and its ability to improve prediction efficacy.
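Below is a minimal Python sketch of the detect-then-correct pipeline described in this abstract, shown only to make the data flow concrete. All class and function names (AberrantDataDetector, AberrantDataCorrector, classify_fn, predict_with_preprocessing) are illustrative assumptions, and the k-nearest-neighbor averaging step is a simple stand-in for the paper's nearest-neighbor surrogate; it is not the authors' implementation.

```python
# Illustrative sketch of a detect -> correct -> predict preprocessing pipeline.
# Names and the correction rules are assumptions, not the paper's actual code.
import numpy as np
from sklearn.neighbors import NearestNeighbors


class AberrantDataDetector:
    """Classifies an incoming test sample as 'clean', 'corrupted', or 'adversarial'."""

    def __init__(self, classify_fn):
        self.classify_fn = classify_fn  # user-supplied rule or trained classifier

    def detect(self, x):
        return self.classify_fn(x)


class AberrantDataCorrector:
    """Amends a test sample according to the detected abnormality type."""

    def __init__(self, reference_data, k=5):
        # Clean reference samples used to build the nearest-neighbor surrogate.
        self.reference_data = np.asarray(reference_data)
        self.nn = NearestNeighbors(n_neighbors=k).fit(self.reference_data)

    def correct(self, x, data_type):
        if data_type == "adversarial":
            # Replace the suspected adversarial sample with the mean of its
            # nearest clean neighbors (a simple surrogate, for illustration).
            _, idx = self.nn.kneighbors(x.reshape(1, -1))
            return self.reference_data[idx[0]].mean(axis=0)
        if data_type == "corrupted":
            # Placeholder: the paper reconstructs corrupted data with a
            # Gaussian locally linear mappings method, which would go here.
            return x
        return x  # clean data passes through unchanged


def predict_with_preprocessing(x, detector, corrector, predictor):
    """Generic usage: detect, correct, then apply any downstream predictor."""
    data_type = detector.detect(x)
    x_fixed = corrector.correct(x, data_type)
    return predictor(x_fixed)
```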
Shengyun Peng, Pin-Yu Chen, et al.
NeurIPS 2025
Tsui-Wei Weng, Huan Zhang, et al.
GlobalSIP 2018
Yihao Xue, Siddharth Joshi, et al.
ICML 2023