Training, or developing, a classifier is the process of producing a rule that can be used to classify examples, including examples not used in training. For example, a classifier may be developed to determine which medicine to prescribe to a cancer patient. The classifier may be developed based on training examples, each consisting of a set of medical data about a patient and the drug that was most effective for that patient. Then the classifier may be applied to a set of working examples, each consisting of a set of medical data about a patient for whom no drugs have been tried. (A “class” in this scenario is a set of patients for whom the same drug offers the most benefit.) The goals are (1) to train a classifier that effectively determines which drug to administer to which patient and (2) to evaluate how effective that classifier is likely to be. This second goal is called validation, or producing an error bound for a classifier. Validation is the focus of this invention.
Now we discuss prior art. An error bound based on VC dimension (Vapnik and Chervonenkis 1971; Vapnik 1998) uses uniform bounds over the largest number of assignments that a class of classifiers can produce, based on worst-case arrangements of training and working examples. However, as the number of training examples grows, the probability that training error is a good approximation of working error becomes so great that the VC error bound succeeds in spite of using uniform bounds based on worst-case assumptions about examples. Also, VC bounds are easy to compute for any number of examples, assuming the VC dimension of the class of classifiers is known. This makes VC bounds useful and convenient for large data sets, i.e., data sets having thousands of examples. However, VC error bounds have some drawbacks: they are ineffective for smaller data sets, and they do not apply to some classifiers, such as nearest neighbor classifiers.
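To illustrate how a VC bound behaves as the number of examples grows, the following is a minimal sketch that evaluates one commonly quoted form of the bound: with probability at least 1 - delta, working error is at most training error plus sqrt((d(ln(2n/d) + 1) + ln(4/delta)) / n), where n is the number of training examples and d is the VC dimension. The choice of this particular form, the function name vc_error_bound, and the sample values are assumptions made for illustration; other forms of the bound with different constants exist.

```python
import math


def vc_error_bound(train_error, n, vc_dim, delta=0.05):
    """Upper bound on working error under one commonly quoted VC bound.

    Assumed form (illustrative, not taken from the text above):
        error <= train_error + sqrt((d * (ln(2n/d) + 1) + ln(4/delta)) / n)
    The result is capped at 1.0, since an error rate cannot exceed 1.
    """
    if n <= 0 or vc_dim <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need n > 0, vc_dim > 0, and 0 < delta < 1")
    if n < vc_dim:
        # This form is normally stated for n >= d; treat fewer examples as uninformative.
        return 1.0
    complexity = (vc_dim * (math.log(2 * n / vc_dim) + 1) + math.log(4 / delta)) / n
    return min(1.0, train_error + math.sqrt(complexity))


if __name__ == "__main__":
    # Hypothetical setting: training error 0.10, VC dimension 10.
    for n in (20, 100, 1_000, 100_000):
        print(n, round(vc_error_bound(0.10, n, 10), 3))
```

Under these assumed values the bound is uninformative (capped at 1.0) for 20 examples and approaches the training error only once the number of examples reaches the thousands, which matches the drawback for smaller data sets noted above.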
Transductive inference (Vapnik 1998) is a training method that uses information provided by the inputs of working examples in addition to information provided by training examples. The idea is to develop the best classifier for the inputs of the specific working examples at hand, rather than to develop a classifier that is good for general inputs and then apply it to the working examples. Transductive inference improves on general VC bounds by using the actual working example inputs, instead of a worst-case arrangement of inputs, to find the number of different assignments that the classifiers in each class of classifiers under consideration can produce. The resulting bounds are then used to select among classes of classifiers, mediating a tradeoff between small classes, which are more likely to have good generalization, and large classes, which are more likely to capture the dynamics of the training data.
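The difference between counting assignments on actual inputs and counting them for a worst-case arrangement can be sketched with a deliberately simple, hypothetical class of classifiers: one-dimensional threshold rules that assign label 1 when an input is at least t and label 0 otherwise. The class, the function names, and the sample inputs below are assumptions chosen for illustration and are not drawn from the cited work.

```python
def candidate_thresholds(inputs):
    """Thresholds that realize every distinct behavior of the rule x >= t on `inputs`."""
    if not inputs:
        return [0.0]
    values = sorted(set(inputs))
    # One threshold at each distinct value, plus one above the maximum (all labels 0).
    return values + [values[-1] + 1.0]


def count_assignments(inputs):
    """Number of distinct label vectors the threshold class produces on these inputs."""
    assignments = {
        tuple(int(x >= t) for x in inputs) for t in candidate_thresholds(inputs)
    }
    return len(assignments)


def worst_case_assignments(n):
    """Largest count for this class over any n inputs (reached when all inputs differ)."""
    return n + 1


if __name__ == "__main__":
    # Hypothetical working inputs with repeated values.
    working_inputs = [0.2, 0.2, 0.5, 0.5, 0.5, 0.9]
    print("actual:", count_assignments(working_inputs))
    print("worst case:", worst_case_assignments(len(working_inputs)))
```

Because the sample inputs contain repeated values, the class produces fewer distinct assignments on them than the worst case allows, and a bound built from the actual count is correspondingly tighter.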