Numerous detection systems identify an input by comparing it with a large set of known examples. Such systems are known as classifiers. A variety of different techniques are available for use in classifiers. Several of these techniques involve using a set of known examples to train the classifier to discriminate between inputs that are of interest and those that are not.
One detection system of this type is a neural network. In a neural network, the set of known examples is used to train the network, and unknown objects are then processed by the trained network to determine whether or not they are of interest. See, for example, D. A. Forsyth et al., Computer Vision: A Modern Approach, ch. 22 (Prentice Hall, 2003), which is incorporated by reference herein.
Inevitably, these detection systems are involved in a tradeoff between sensitivity, or the fraction of true positives detected, and specificity, which is one minus the fraction of false positives detected. This sensitivity/specificity tradeoff is often depicted in the detection system's receiver operating characteristic (ROC) curve such as that shown in FIG. 1. The ROC curve is a plot 100 of the fraction of true positives detected (TPF) as measured on the ordinate or y axis versus the fraction of false positives detected (FPF) as measured on the abscissa or x axis. As the fraction of true positives detected (or sensitivity) increases, so does the fraction of false positives detected, thereby decreasing the specificity. The determination of each fraction is discussed below.
In a relatively simple detection system, the detection process is binary. The data analyzed by the detection system can be classified into two groups: one group relates to a set of inputs that are being sought by the detection system and the other group relates to everything else, namely, a set of inputs that are not being sought by the detection system. In some cases, the detection system operates by generating a numerical score for each input and comparing that score with a threshold value developed from a set of training examples. Each input is assigned to one of the two groups depending on whether the input has a score above or below the threshold. For example, those inputs with scores above the threshold may then be the subject of further investigation while those below the threshold will be ignored.
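The binary assignment described above can be sketched as follows. This is only an illustrative sketch: the function name `classify`, the example scores and the threshold value are hypothetical, and treating a score exactly equal to the threshold as "below" is an assumption not stated in the description.

```python
def classify(scores, threshold):
    """Flag each input whose score is above the threshold.

    Assumption: a score exactly equal to the threshold is treated as
    not flagged; the description only distinguishes above from below.
    """
    return [score > threshold for score in scores]


# Hypothetical scores and threshold, chosen only for illustration.
scores = [0.2, 0.7, 0.5, 0.9]
flags = classify(scores, threshold=0.6)

# Flagged inputs would be the subject of further investigation;
# the remaining inputs would be ignored.
for score, flagged in zip(scores, flags):
    print(score, "investigate" if flagged else "ignore")
```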
Typically, the scores of the members of the two groups overlap so that some inputs that are being sought by the detection system have scores that are in the same range as the scores of inputs that are not being sought by the detection system. This situation is depicted in FIG. 2 which is a plot of numbers of inputs versus score for the inputs being sought and for the inputs not being sought. Envelope 210 depicts the distribution of the number of inputs being sought versus score and envelope 230 depicts the distribution of the number of inputs not being sought versus score.
If the threshold (TH) is set in the region where the scores of the two groups overlap, some inputs that are not being sought will be classified with those being sought. Such inputs are called false positives (FP) and are identified by region 240 in envelope 230 in FIG. 2. The remaining inputs in envelope 230, which are not being sought, are referred to as true negatives (TN). Similarly, some inputs that are being sought will be classified with those not being sought. Such inputs are called false negatives (FN) and are identified by region 220 in envelope 210 in FIG. 2. The remaining inputs in envelope 210, which are being sought, are called true positives (TP). The fraction of true positives detected that is measured on the y axis of FIG. 1 is the number of true positives detected divided by the total number of inputs under envelope 210, or #TP/(#TP+#FN). The fraction of false positives detected that is measured on the x axis of FIG. 1 is the number of false positives detected divided by the total number of inputs under envelope 230, or #FP/(#FP+#TN). The fraction of true positives detected is also the probability of detecting a true positive, and the fraction of false positives detected is also the probability of detecting a false positive.
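The counts and fractions defined above can be illustrated with a short sketch. The function names, the example scores, the labels and the threshold are all hypothetical and are chosen only so that the two groups overlap; inputs scoring above the threshold are assumed to be the ones classified as being sought.

```python
def confusion_counts(scores, sought, threshold):
    """Count TP, FP, TN and FN for one threshold.

    `sought[i]` is True when input i actually belongs to the group
    being sought (envelope 210) and False otherwise (envelope 230).
    """
    tp = fp = tn = fn = 0
    for score, is_sought in zip(scores, sought):
        flagged = score > threshold  # assumption: above threshold means flagged
        if flagged and is_sought:
            tp += 1
        elif flagged and not is_sought:
            fp += 1
        elif is_sought:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def detection_fractions(tp, fp, tn, fn):
    """Return (TPF, FPF) = (#TP/(#TP+#FN), #FP/(#FP+#TN))."""
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical data: four inputs not being sought, four being sought,
# with overlapping scores (0.6 is not sought, 0.5 is sought).
scores = [0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]
sought = [False, False, False, False, True, True, True, True]

counts = confusion_counts(scores, sought, threshold=0.55)
tpf, fpf = detection_fractions(*counts)
print(counts, tpf, fpf)
```

With these example values, one input not being sought scores above the threshold (a false positive) and one input being sought scores below it (a false negative).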
As will be apparent, the location of the threshold has a substantial impact on the numbers of true positives, true negatives, false positives and false negatives. If the threshold is shifted so as to make more stringent the test for identification of an input being sought, both the number of true positives and the number of false positives identified will be reduced. As shown in FIG. 2, this is represented by a shift of the threshold to position A. Conversely, if the threshold is shifted so as to relax the test for identification of an input being sought, both the number of true positives identified and the number of false positives identified will be increased. This is represented in FIG. 2 by a shift of the threshold to position B. Making the identification test more stringent also reduces the fractions of true positives detected and false positives detected, since the denominators of these fractions are unchanged. The operating point of the detection system thereby shifts nearer the bottom left hand corner of the ROC curve of FIG. 1. Conversely, relaxing the identification test increases the fractions of true positives detected and false positives detected and shifts the operating point of the detection system nearer the upper right hand corner of the ROC curve of FIG. 1.
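The effect of shifting the threshold on the operating point can be sketched as follows. The function name `roc_point`, the example data and the three threshold values standing in for position A, the original threshold and position B are hypothetical; the sketch assumes that higher scores indicate inputs being sought, so that a higher threshold is the more stringent test.

```python
def roc_point(scores, sought, threshold):
    """Return the operating point (FPF, TPF) for one threshold."""
    tp = sum(1 for s, y in zip(scores, sought) if y and s > threshold)
    fp = sum(1 for s, y in zip(scores, sought) if not y and s > threshold)
    n_sought = sum(1 for y in sought if y)   # #TP + #FN, fixed by the data
    n_other = len(sought) - n_sought         # #FP + #TN, fixed by the data
    return fp / n_other, tp / n_sought


# Hypothetical data with overlapping score ranges for the two groups.
scores = [0.1, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]
sought = [False, False, False, False, True, True, True, True]

stringent = roc_point(scores, sought, 0.75)  # stands in for position A
nominal = roc_point(scores, sought, 0.55)    # stands in for threshold TH
relaxed = roc_point(scores, sought, 0.35)    # stands in for position B

# Each tuple is (FPF, TPF); both fractions rise as the test is relaxed.
print(stringent, nominal, relaxed)
```

Because the denominators depend only on the data and not on the threshold, the three operating points move monotonically from the bottom left toward the upper right of the ROC plot as the threshold is lowered.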
In the medical arts, the trade-off between sensitivity and specificity that is represented by the ROC curve is always a concern. If the detection system is not sensitive enough, it may report too few true positives (i.e., produce more false negatives), and each false negative typically represents a missed opportunity to detect some sort of problem that may well be life-threatening. On the other hand, if the detection system is not specific enough, it may report too many false positives, which typically will result in the performance of additional medical procedures to establish the true nature of each false positive and, in many cases, considerable emotional stress on the part of the patient. Faced with this trade-off, the medical practitioner is usually forced to set the threshold of his/her detection system by trial-and-error at some value that assures the detection of significant numbers of true positives at the cost of some false positives.