An evaluation of error confidence interval estimation methods
Abstract
Reporting the accuracy of pattern recognition systems (e.g., biometric identification systems) is a controversial and perhaps not well understood issue [5, 7]. This work focuses on research issues related to the oft-used confidence interval metric for performance evaluation. Using a biometric (fingerprint) authentication system, we estimate the False Reject Rates and False Accept Rates of the system on a real fingerprint dataset. We also estimate confidence intervals for these error rates using a number of parametric (e.g., see [7]) and non-parametric (e.g., bootstrapping [1, 3, 6]) methods. We assess the accuracy of the confidence intervals using an estimate-and-verify strategy applied to repeated random train/test splits of the dataset. Our experiments objectively verify the hypothesis that the traditional bootstrap and parametric estimation methods are not very effective at estimating the confidence intervals, and that the magnitude of interdependence among the data may be one reason for their ineffective estimates. Further, we demonstrate that resampling subsets of the data samples (inspired by the moving block bootstrap [4]) may be one way of replicating the interdependence among the data; bootstrapping methods using such subset resampling may indeed improve the accuracy of the estimates. Irrespective of the method of estimation, the results show that the (1 - α)100% confidence intervals empirically estimated from the training set capture a significantly smaller fraction than (1 - α) of the estimates obtained from the test set.
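To make the non-parametric estimation concrete, the following is a minimal sketch of a percentile-bootstrap confidence interval for an error rate such as the FRR. It is an illustration under simplifying assumptions, not the authors' exact procedure: the threshold, block size, and score model are hypothetical, and the `block` parameter only roughly mimics subset resampling in the spirit of the moving block bootstrap [4] by drawing contiguous runs of samples to preserve some interdependence.

```python
import random

def bootstrap_ci(scores, threshold, n_boot=1000, alpha=0.05, block=1):
    """Percentile-bootstrap (1 - alpha) confidence interval for an error rate.

    scores    : genuine match scores; an error (false reject) is a score
                below `threshold`.
    block     : size of the contiguous subset drawn per resampling step;
                block > 1 is a rough analogue of subset resampling that
                keeps neighboring (possibly dependent) samples together.
    """
    n = len(scores)
    rates = []
    for _ in range(n_boot):
        # Rebuild a sample of size n from randomly placed blocks.
        sample = []
        while len(sample) < n:
            start = random.randrange(n - block + 1)
            sample.extend(scores[start:start + block])
        sample = sample[:n]
        rates.append(sum(s < threshold for s in sample) / n)
    rates.sort()
    lo = rates[int((alpha / 2) * n_boot)]
    hi = rates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

With `block=1` this reduces to the traditional i.i.d. bootstrap; increasing `block` trades some resampling variety for better replication of within-subset dependence, which is the trade-off the abstract points to.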