Publication
PETS 2024
Poster

Privacy-Preserving Verification of Preprocessing in Machine Learning Models

Abstract

Machine learning models, while transformative, face verification challenges, especially when trained on sensitive datasets that cannot be made public due to privacy concerns. Proper preprocessing is crucial for the accuracy and reliability of these models. This work proposes a framework for verifying claims about models trained on such datasets, focusing on the correctness of preprocessing steps. We use model explainers such as LIME and SHAP to quantify the closeness of different models by comparing the cosine distances of their explanations. This approach allows a verifier to determine whether proper preprocessing steps were applied to the training set and whether the resulting model is accurate. Our scheme employs two methods: a classifier derived from model-explainer outputs, and a threshold-based verifier using cosine distances computed from these explanations. Evaluations on several datasets show that the scheme accurately identifies slight errors and improper preprocessing introduced during model generation. Additionally, the scheme protects the original dataset records against inference attacks by sharing only a differentially-private version of the dataset for verification. This work emphasizes the importance of verification in machine learning and promotes transparency and trust in AI research, especially when sensitive data is involved.
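The threshold-based verifier described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the per-sample attribution vectors, and the acceptance threshold are all hypothetical, and in practice the attributions would come from an explainer such as LIME or SHAP rather than being supplied directly.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two
    feature-attribution vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def verify_model(claimed_attrs, reference_attrs, threshold=0.1):
    """Accept the claimed model if the mean cosine distance between
    its per-sample explanations and those of a reference model stays
    below a threshold (0.1 here is an arbitrary illustrative value)."""
    dists = [cosine_distance(u, v)
             for u, v in zip(claimed_attrs, reference_attrs)]
    return sum(dists) / len(dists) <= threshold
```

For example, identical attribution vectors yield a distance of 0 and are accepted, while orthogonal attributions yield a distance of 1 and are rejected.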
