This thesis presents novel, robust, analytic and algorithmic methods for calculating Bayesian
posterior intervals of receiver operating characteristic (ROC) curves and confusion
matrices used for the evaluation of intelligent medical systems tested with small amounts
of data.
Intelligent medical systems are potentially important in encapsulating rare and valuable
medical expertise and making it more widely available. The evaluation of intelligent medical
systems must make sure that such systems are safe and cost-effective. To ensure systems
are safe and perform at expert level, they must be tested against human experts. Human
experts are rare and busy, which often severely restricts the number of test cases that may
be used for comparison.
The performance of a human expert or a machine can be represented objectively by ROC
curves or confusion matrices. ROC curves and confusion matrices are complex representations,
and it is sometimes convenient to summarise them as a single value. For
ROC curves, this is the Area Under the Curve (AUC); for confusion matrices, it is the
kappa or weighted kappa statistic. While there is extensive literature on the statistics
of ROC curves and confusion matrices, the established methods are not applicable to the
measurement of intelligent systems tested with small data samples, particularly when the
AUC or kappa statistic is high.
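As an illustration of the two summary values named above, the following is a minimal sketch in Python of the standard nonparametric (Mann-Whitney) AUC estimate and of unweighted Cohen's kappa. It is not taken from the thesis; the function names `auc_mann_whitney` and `cohen_kappa` and the example data are hypothetical.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """Nonparametric AUC: the fraction of (positive, negative) pairs in which
    the positive case scores higher; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def cohen_kappa(matrix):
    """Unweighted Cohen's kappa for a square confusion matrix
    (rows: one rater or the reference, columns: the other rater or the system)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement from the row and column marginals.
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical small test set: three diseased and three healthy cases.
auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
kappa = cohen_kappa([[4, 1], [1, 4]])
```

Both functions return point estimates only. With so few cases, the uncertainty around them is large, which is the situation the thesis addresses.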
A fundamental Bayesian study has been carried out, and new methods devised, to provide
better statistical measures for ROC curves and confusion matrices at low sample sizes.
They enable exact Bayesian posterior intervals to be produced for: (1) the individual points
on a ROC curve; (2) comparison between matching points on two uncorrelated curves;
(3) the AUC of a ROC curve, using both parametric and nonparametric assumptions; (4)
the parameters of a parametric ROC curve; and (5) the weight of a weighted confusion
matrix.
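To make item (1) concrete, the following is a minimal sketch of a Bayesian posterior interval for one coordinate of a single ROC point (sensitivity or specificity) at a small sample size. It uses the textbook conjugate Beta-binomial model with a uniform prior and a Monte Carlo approximation of the quantiles. This is an assumption made for illustration; the abstract does not describe the thesis's own exact methods, and this sketch is not one of them. The function name `beta_posterior_interval` and its parameters are hypothetical.

```python
import random


def beta_posterior_interval(successes, trials, level=0.95,
                            prior_a=1.0, prior_b=1.0,
                            draws=20000, seed=0):
    """Equal-tailed posterior interval for a binomial proportion.

    Assumes a Beta(prior_a, prior_b) prior (uniform by default), so the
    posterior is Beta(prior_a + successes, prior_b + failures). The quantiles
    are approximated by sampling, so the result is not exact.
    """
    rng = random.Random(seed)  # fixed seed for reproducible output
    a = prior_a + successes
    b = prior_b + (trials - successes)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical test set: 9 of 10 diseased cases detected (sensitivity),
# 8 of 10 healthy cases correctly cleared (specificity).
sens_interval = beta_posterior_interval(9, 10)
spec_interval = beta_posterior_interval(8, 10)
```

The two intervals bound the vertical coordinate (sensitivity) and, via one minus specificity, the horizontal coordinate of the ROC point. With only ten cases per class, both intervals are wide even though the observed rates are high.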
These new methods have been implemented in software to provide a powerful and accurate
tool for developers and evaluators of intelligent medical systems in particular, and to a
much wider audience using ROC curves and confusion matrices in general. This should
enhance the ability to prove intelligent medical systems safe and effective and should lead
to their widespread deployment.
The mathematical and computational methods developed in this thesis should also provide
the basis for future research into determination of posterior intervals for other statistics
at small sample sizes.
Date of Award | 2002 |
Original language | English |
Awarding Institution | |
Evaluation of Intelligent Medical Systems
Tilbury, J. B. (Author). 2002
Student thesis: PhD