Confidence Bands for ROC Curves: Methods and an Empirical Study
Appears in Proceedings of the First Workshop on ROC Analysis in AI (ROCAI-2004) at ECAI-2004, August 2004.
Abstract
In this paper we study techniques for generating and evaluating
confidence bands on ROC curves. ROC analysis is rapidly becoming a
common evaluation tool in machine learning, although work on
evaluating ROC curves has thus far been limited to studying the area
under the curve (AUC) or to generating one-dimensional confidence
intervals by fixing one variable: either the false-positive rate or
the threshold on the classification scoring function.
Researchers in the medical field have long been using ROC curves and
have many well-studied methods for analyzing such curves, including
generating confidence intervals as well as simultaneous confidence
bands. In this paper we introduce these techniques to the machine
learning community and show their empirical fitness on the Covertype
data set---a standard machine learning benchmark from the UCI
repository. We show that some of these methods work remarkably well,
that others are too loose, and that existing machine learning methods
for generating one-dimensional confidence intervals do not translate
well to generating simultaneous bands: their bands are too tight.
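
The distinction between one-dimensional intervals and simultaneous bands can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper: it assumes a stratified percentile bootstrap, synthetic Gaussian scores in place of Covertype, and hypothetical helper names (`tpr_at_fpr`, `pointwise_band`). It builds a 95% interval on the true-positive rate at each fixed false-positive rate, then measures how many resampled curves lie inside all of those intervals at once.

```python
import numpy as np

def tpr_at_fpr(scores, labels, fpr_grid):
    """Empirical ROC curve read off at fixed false-positive rates (ties ignored)."""
    order = np.argsort(-scores, kind="stable")       # sort by descending score
    y = labels[order]
    tps = np.cumsum(y)                               # true positives at each cut
    fps = np.cumsum(1 - y)                           # false positives at each cut
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    # Last ROC point whose FPR does not exceed each grid value.
    idx = np.searchsorted(fpr, fpr_grid, side="right") - 1
    return tpr[idx]

def pointwise_band(scores, labels, fpr_grid, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap interval on TPR, computed separately at each fixed FPR."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    curves = np.empty((n_boot, len(fpr_grid)))
    for b in range(n_boot):
        # Stratified resampling keeps both classes present in every replicate.
        idx = np.concatenate((rng.choice(pos, pos.size), rng.choice(neg, neg.size)))
        curves[b] = tpr_at_fpr(scores[idx], labels[idx], fpr_grid)
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    return lower, upper, curves

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(42)
labels = np.concatenate((np.ones(200, dtype=int), np.zeros(200, dtype=int)))
scores = np.concatenate((rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.0, 200)))
fpr_grid = np.linspace(0.05, 0.95, 19)

lower, upper, curves = pointwise_band(scores, labels, fpr_grid)
inside = (curves >= lower) & (curves <= upper)
pointwise_coverage = inside.mean(axis=0)         # coverage at each FPR separately
simultaneous_coverage = inside.all(axis=1).mean()  # whole curve inside the band

print("min pointwise coverage:", pointwise_coverage.min())
print("simultaneous coverage: ", simultaneous_coverage)
```

By construction the simultaneous figure cannot exceed the smallest pointwise figure, because a curve that stays inside the band everywhere is also inside it at each individual false-positive rate. In this sense, stacking one-dimensional intervals produces a band that is too tight to serve as a simultaneous band at the nominal level.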