Large Language Models can Become Strong Self-Detoxifiers
Irene Ko, Pin-Yu Chen, et al.
ICLR 2025
Abnormal testing data can severely reduce model performance if not processed properly. In this paper, we propose a preprocessing system to handle several commonly encountered types of abnormal testing data. The system consists of an aberrant data detector and an aberrant data corrector. The aberrant data detector classifies the type of incoming data; based on that type, the aberrant data corrector takes different actions to amend the testing data. Users can then apply their preferred prediction methods to the corrected testing data. Specifically, corrupted and adversarial images are used as examples of abnormal data. We show that corrupted data can be reconstructed through a Gaussian locally linear mappings method, and that prediction performance on adversarial samples can be improved by using nearest neighbors as a surrogate. We compare the proposed aberrant data detector and corrector with existing, well-recognized alternatives; these alternatives were published individually and do not combine the two components into a single preprocessing system. The numerical results show that our proposed components are competitive even when used on their own. The proposed system is a generic method that can be applied to different downstream predictive models. We use three existing prediction methods to illustrate the general applicability of the proposed system and its ability to improve prediction efficacy.
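Below is a minimal Python sketch of the detect-then-correct pipeline described in this abstract, shown only to make the data flow concrete. All class and function names (AberrantDataDetector, AberrantDataCorrector, classify_fn, predict_with_preprocessing) are illustrative assumptions, and the k-nearest-neighbor averaging step is a simple stand-in for the paper's nearest-neighbor surrogate; it is not the authors' implementation.

```python
# Illustrative sketch of a detect -> correct -> predict preprocessing pipeline.
# Names and the correction rules are assumptions, not the paper's actual code.
import numpy as np
from sklearn.neighbors import NearestNeighbors


class AberrantDataDetector:
    """Classifies an incoming test sample as 'clean', 'corrupted', or 'adversarial'."""

    def __init__(self, classify_fn):
        self.classify_fn = classify_fn  # user-supplied rule or trained classifier

    def detect(self, x):
        return self.classify_fn(x)


class AberrantDataCorrector:
    """Amends a test sample according to the detected abnormality type."""

    def __init__(self, reference_data, k=5):
        # Clean reference samples used to build the nearest-neighbor surrogate.
        self.reference_data = np.asarray(reference_data)
        self.nn = NearestNeighbors(n_neighbors=k).fit(self.reference_data)

    def correct(self, x, data_type):
        if data_type == "adversarial":
            # Replace the suspected adversarial sample with the mean of its
            # nearest clean neighbors (a simple surrogate, for illustration).
            _, idx = self.nn.kneighbors(x.reshape(1, -1))
            return self.reference_data[idx[0]].mean(axis=0)
        if data_type == "corrupted":
            # Placeholder: the paper reconstructs corrupted data with a
            # Gaussian locally linear mappings method, which would go here.
            return x
        return x  # clean data passes through unchanged


def predict_with_preprocessing(x, detector, corrector, predictor):
    """Generic usage: detect, correct, then apply any downstream predictor."""
    data_type = detector.detect(x)
    x_fixed = corrector.correct(x, data_type)
    return predictor(x_fixed)
```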
Shengyun Peng, Pin-Yu Chen, et al.
NeurIPS 2025
Tsui-Wei Weng, Huan Zhang, et al.
GlobalSIP 2018
Yihao Xue, Siddharth Joshi, et al.
ICML 2023