Pattern classification is used in many practical applications, such as visual pattern recognition and speech recognition. In pattern classification, pertinent features or attributes of a measured signal are identified, and information about these features is extracted. Features can include shape, color, texture, motion, and depth for visual signals, and pitch and amplitude for audio signals. These features are then associated or correlated with feature vectors. A large number of pattern classification systems are known. A small set of examples is described in U.S. Pat. No. 6,058,205, “System and method for partitioning the feature space of a classifier in a pattern classification system,” issued to Bahl, et al. on May 2, 2000; U.S. Pat. No. 5,870,729, “Self-organizing neural network for pattern classification,” issued to Toda on Feb. 9, 1999; U.S. Pat. No. 5,664,068, “Method and apparatus for pattern classification using distributed adaptive fuzzy windows,” issued to Huang, et al., on Sep. 2, 1997; U.S. Pat. No. 5,505,057, “Pattern classification system,” issued to Sato, et al., on Apr. 9, 1996; U.S. Pat. No. 5,337,371, “Pattern classification system,” issued to Sato, et al., on Aug. 9, 1994; U.S. Pat. No. 5,181,259, “General method of pattern classification using the two domain theory,” issued to Rorvig on Jan. 19, 1993; U.S. Pat. No. 5,060,277, “Pattern classification means using feature vector regions preconstructed from reference data,” issued to Bokser on Oct. 22, 1991; and U.S. Pat. No. 4,773,099, “Pattern classification means for use in a pattern recognition system,” issued to Bokser on Sep. 20, 1988.
In pattern classification, it is generally required to obtain class probabilities for a particular feature vector in order to determine information such as the number of occurrences of a particular feature in a signal, and the time and place of each occurrence of the feature. In many applications, this is done by modeling the marginal density of the feature space of a classifier and characterizing each class with a model. The class probabilities of the particular feature vector are then determined using the model for each class.
Pattern classification methods can be broadly divided into two categories. The first category requires explicit class-conditional probability values of the signal being classified, and the second does not. The first category is sometimes referred to as the sampling approach, while the second is referred to as the diagnostic paradigm.
Methods in the second category, i.e., methods that do not require explicit determination of class-conditional probability values, typically determine discriminant functions of the signal being classified, and classify the signal based on the values taken by these functions. The functions used may be diverse, ranging from simple linear functions to complex structures such as classification and regression trees. These can be referred to as discriminant-based methods.
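As an illustration only, a discriminant-based classifier with a simple linear function might be sketched as follows. The function names, the weights `W`, and the bias `B` are hypothetical values chosen for the example and do not come from any of the cited systems; in practice the weights would be learned from training data.

```python
from typing import Sequence


def linear_discriminant(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Return the value of a linear discriminant function g(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Assign class 1 when g(x) > 0 and class 0 otherwise.

    No class-conditional probability is computed at any point; the decision
    depends only on the value taken by the discriminant function.
    """
    return 1 if linear_discriminant(x, w, b) > 0.0 else 0


# Hypothetical weights and bias, assumed here purely for illustration.
W = [1.0, -1.0]
B = 0.0

label_a = classify([2.0, 0.5], W, B)  # g(x) = 1.5, positive side of the boundary
label_b = classify([0.5, 2.0], W, B)  # g(x) = -1.5, negative side of the boundary
```

The sketch shows the defining property of this category: the feature vector is mapped directly to a decision, with no intermediate probability model.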
Methods in the first category require explicit representations of the probability distributions of classes. These distributions are usually estimated either using non-parametric kernel methods, e.g., Parzen windows, or parametric methods that assume specific parametric forms for the distributions, e.g., Gaussian mixtures. Class-conditional probabilities are used to estimate a posteriori class probabilities, which form the basis for classification. These methods can be referred to as distribution-based methods.
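A minimal sketch of a distribution-based classifier for one-dimensional data is given below, assuming a single Gaussian per class for the parametric case and a Gaussian kernel for the Parzen window case. The sample values, the class labels `"a"` and `"b"`, the equal priors, and the bandwidth `h` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gaussian(samples: List[float]) -> Tuple[float, float]:
    """Parametric estimate: maximum-likelihood mean and variance of the samples."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var


def parzen_pdf(x: float, samples: List[float], h: float) -> float:
    """Non-parametric estimate: average of Gaussian kernels of width h on each sample."""
    return sum(gaussian_pdf(x, s, h * h) for s in samples) / len(samples)


def posteriors(likelihoods: Dict[str, float], priors: Dict[str, float]) -> Dict[str, float]:
    """A posteriori class probabilities from class-conditional values via Bayes' rule."""
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}


# Hypothetical training samples and equal priors, assumed for illustration.
train = {"a": [0.0, 1.0, 2.0], "b": [4.0, 5.0, 6.0]}
priors = {"a": 0.5, "b": 0.5}
x = 1.5  # feature value to classify

# Parametric route: fit one Gaussian per class, then apply Bayes' rule.
params = {c: fit_gaussian(s) for c, s in train.items()}
post_parametric = posteriors({c: gaussian_pdf(x, m, v) for c, (m, v) in params.items()}, priors)

# Non-parametric route: Parzen window estimate with an assumed bandwidth h = 1.0.
post_parzen = posteriors({c: parzen_pdf(x, s, 1.0) for c, s in train.items()}, priors)

# Classification picks the class with the largest a posteriori probability.
decision = max(post_parametric, key=post_parametric.get)
```

Both routes end in the same step: the class-conditional values are combined with the priors, and the class with the largest a posteriori probability is selected.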
The dichotomy between the two categories of methods is not complete. Methods that use explicit representations of class probability distributions are effectively based on discriminant functions. For instance, the classification rule of a distribution-based two-class classifier compares the ratio of the a posteriori probabilities of the two classes against a threshold. In that case, the ratio is the discriminant function. Multi-class classification can be expressed similarly as the successive application of a series of such two-class discriminants.
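The two-class rule described above can be written out as follows. The symbols are assumptions introduced for this illustration: x denotes the feature vector, C_1 and C_2 the two classes, p(x | C_i) the class-conditional density, P(C_i) the a priori class probability, and τ the threshold.

```latex
g(x) \;=\; \frac{P(C_1 \mid x)}{P(C_2 \mid x)}
     \;=\; \frac{p(x \mid C_1)\, P(C_1)}{p(x \mid C_2)\, P(C_2)},
\qquad
\text{decide } C_1 \text{ if } g(x) > \tau, \text{ otherwise } C_2 .
```

The second equality follows from Bayes' rule, because the marginal density p(x) cancels in the ratio. With τ = 1, the rule selects the class with the larger a posteriori probability.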
In order to impart conceptual clarity to the subject matter of the present invention, the distinct categorization of pattern classification methods is maintained.
Distribution-based classifiers are widely used for classification tasks in diverse disciplines, and are particularly useful in classifying real-valued data. However, the performance of these classifiers depends on obtaining good estimates of the class-conditional distributions of the various classes. While it is relatively easy to determine the best set of parameters for a given parametric model of distributions, determining the most appropriate parametric form is frequently a difficult problem. Inaccurate models can lead to reduced classification accuracy.
Therefore, it is desired to improve the performance of distribution-based classifiers when the assumed parametric form of the class-conditional distributions is inaccurate.