Pattern classification is well known in a number of real-world applications, such as speech recognition, vehicle seat occupant classification, data mining, risk prediction, and diagnosis classification. The primary goal of a pattern classifier is to assign a test pattern to one or more classes of a predefined set of classes. The test pattern may be considered as a vector of features or, more precisely, of numbers quantifying these features. A statistical classifier computes the conditional probability of each class for a given input pattern (hereinafter also referred to as “class membership probability”). The deviation of such a class membership probability from 1 is often interpreted as the risk of a false classification.
A challenge in pattern classification is the reduction of misclassifications. As a first approach to this problem, it is known to provide the classifier with a “reject” option. A classifier may exercise the reject option whenever none of the conditional probabilities of the different classes for a given input pattern exceeds a required minimum threshold. Otherwise, the classifier assigns the input pattern to the class with the highest conditional probability. As a consequence, a test pattern close to a decision border implicitly defined by the classifier is prone to be rejected, while a test pattern far away from the border will be assigned to a class. For a detailed description of this technique, the interested reader is referred to the article “On Optimum Recognition Error and Reject Tradeoff” by C. K. Chow, IEEE Transactions on Information Theory, Vol. IT-16, No. 1, January 1970.
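The reject option described above can be illustrated with a minimal Python sketch. The function name `classify_with_reject`, its parameters, and the use of `None` to signal a rejection are assumptions made for illustration only; they are not taken from the cited article.

```python
from typing import Optional, Sequence


def classify_with_reject(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable class, or None to reject.

    probs     -- class membership probabilities for one test pattern
    threshold -- required minimum probability (hypothetical parameter name)
    """
    # Find the class with the highest conditional probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Reject when no class probability exceeds the threshold.
    if probs[best] <= threshold:
        return None
    return best


# A pattern near the decision border: neither class is clearly favoured.
near_border = classify_with_reject([0.55, 0.45], threshold=0.8)
# A pattern far from the border: one class clearly dominates.
far_from_border = classify_with_reject([0.95, 0.05], threshold=0.8)
```

With these illustrative values, the first pattern is rejected and the second is assigned to the class with index 0.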
Another aspect of the misclassification problem is the estimation of the uncertainty of the class membership probability. A classifier is usually trained, during a training process, by means of training patterns. These training patterns are preferably chosen according to the different types (classes) of situations the classifier shall be able to distinguish. The class membership probabilities of a test pattern to be classified are based upon the training patterns used in the training process. Ideally, one would prepare the classifier for all types of situations that can occur. In real-world applications, this is most often impossible to achieve, e.g. because of “unforeseeable” situations or limited resources. As a result, the feature space, i.e. the space spanned by all possible patterns, is not homogeneously populated with training patterns. Intuitively, the uncertainty of a class membership probability output by the classifier in response to a given test pattern will be small if the density of training patterns around the test pattern is high. Likewise, the uncertainty will be high if the density of training patterns around the test pattern is low. The idea behind this approach is explained in detail in U.S. Pat. No. 5,335,291 (Kramer et al.), which describes a neural network that takes into account the local amount of training data in the vicinity of the test pattern to be classified in order to verify that the classification is reliable. The goodness of the neural network output is expressed as a confidence interval.
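The intuition that uncertainty depends on the local density of training patterns can be sketched as follows. This is a simplified illustration, not the method of U.S. Pat. No. 5,335,291: it merely counts the training patterns within a fixed Euclidean distance of the test pattern. The names `local_count` and `is_reliable`, and the parameters `radius` and `min_neighbors`, are hypothetical.

```python
import math
from typing import Sequence

Pattern = Sequence[float]


def local_count(test: Pattern, training: Sequence[Pattern], radius: float) -> int:
    """Count training patterns within `radius` (Euclidean) of the test pattern."""
    return sum(1 for p in training if math.dist(test, p) <= radius)


def is_reliable(test: Pattern, training: Sequence[Pattern],
                radius: float, min_neighbors: int) -> bool:
    """Treat a classification as reliable only if enough training patterns
    lie near the test pattern (assumed heuristic, for illustration)."""
    return local_count(test, training, radius) >= min_neighbors


# Illustrative training set: three patterns clustered near the origin
# and a single isolated pattern.
training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

dense_region = is_reliable((0.0, 0.0), training, radius=0.5, min_neighbors=3)
sparse_region = is_reliable((3.0, 3.0), training, radius=0.5, min_neighbors=3)
```

In this sketch, a test pattern in the densely populated region is treated as reliable, whereas one in an unpopulated region is not. A practical system would more likely derive a continuous uncertainty measure, such as a confidence interval, than a binary flag.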
A classifier that provides the certainty (or the uncertainty) of a class membership probability is attractive in a safety-critical context, such as vehicle seat occupant classification or diagnosis classification, since it allows labelling a test pattern as “unknown” and/or exercising the reject option if the uncertainty of the class membership probability is too high.