The design of statistical pattern recognition systems is important for a wide variety of statistical classification problems including, but not limited to: seismic signal analysis for geophysical exploration, radar signal analysis for weather radar systems and military applications, analysis of biomedical signals for medical and physiological applications, classification of objects in images, optical character recognition, speech recognition, handwriting recognition, face recognition, and fingerprint classification.
The statistical pattern recognition problem involves classifying a pattern into one of several classes by processing features associated with the pattern, wherein a pattern is determined by numerical features that have been extracted from a digital signal associated with a problem such as those outlined above. Numerical features can be extracted from a variety of digital signals, e.g., seismic signals, radar signals, speech signals, biomedical signals, images of objects, hyperspectral images or multispectral images. For a given type of digital signal, thousands of numerical features are available, wherein the numerical features are extracted by computer-implemented methods.
An important attribute of statistical pattern recognition systems involves learning from a set of training patterns, wherein a training pattern is represented by a d-dimensional vector of numerical features. Given a set of training patterns from each pattern class, the primary objective is to determine decision boundaries in a corresponding feature space that separate patterns belonging to different pattern classes. In the statistical decision theoretic approach, the decision boundaries are determined by the probability distributions of the feature vectors belonging to each pattern class, wherein the probability distributions determine the structure of a discriminant function and the probability distributions must be specified or learned.
In the discriminant analysis-based approach, a parametric form of the decision boundary is specified, e.g., a linear or quadratic form, and the best decision boundary of the specified form is found based on the classification of the training patterns. For example, support vector machines learn decision boundaries from training patterns, wherein the capacity of a linear or nonlinear decision boundary is regulated by a geometric margin of separation between a pair of margin hyperplanes.
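As a minimal illustration of finding a decision boundary of a specified linear form from the classification of training patterns, the following sketch fits a linear discriminant function D(x) = w·x + b. It assumes the classic perceptron update rule as a simple stand-in; it is not a support vector machine and does not regulate a margin. The function names and the training patterns are hypothetical and chosen only for illustration.

```python
# Sketch only: perceptron rule used as a stand-in for learning a linear
# decision boundary from labeled training patterns (labels are +1 or -1).

def train_perceptron(patterns, labels, epochs=100):
    """Fit w and b of D(x) = w.x + b; stops early once no pattern is misclassified."""
    w = [0.0] * len(patterns[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(patterns, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def classify(w, b, x):
    """Return +1 or -1 according to the sign of the discriminant function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable 2-dimensional training patterns.
patterns = [(2.0, 2.0), (3.0, 1.5), (2.5, 3.0),
            (-1.0, -1.5), (-2.0, -0.5), (-1.5, -2.5)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(patterns, labels)
predictions = [classify(w, b, x) for x in patterns]
```

The perceptron rule only guarantees some separating boundary when the training patterns are linearly separable; it makes no claim about margin or probability of error.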
The computer-implemented design of a discriminant function of a classification system involves two fundamental problems: (1) the design of numerical features of the objects being classified for the different classes of objects, and (2) the computer-implemented design of the discriminant function of the classification system.
For M classes of feature vectors, the feature space of a classification system is composed of M regions of feature vectors, wherein each region contains feature vectors that belong to one of the M classes. The design of a computer-implemented discriminant function involves designing a computer-implemented method that uses feature vectors to determine discriminant functions which generate decision boundaries that divide feature spaces into M suitable regions, wherein a suitable criterion is necessary to determine the best possible partitioning for a given feature space.
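The partitioning of a feature space into M regions can be sketched as follows, assuming one linear discriminant function per class and assigning each feature vector to the class whose discriminant is largest. The nearest-mean criterion used here is an assumption made for illustration, as are the class means and function names; it is one possible criterion, not necessarily a suitable one for a given feature space.

```python
# Sketch only: M linear discriminants g_k(x) = m_k.x - 0.5*|m_k|^2 built from
# hypothetical class means; the largest discriminant selects the region.

def nearest_mean_discriminants(class_means):
    """Return one linear discriminant function per class mean."""
    def make(mean):
        offset = -0.5 * sum(m * m for m in mean)
        return lambda x: sum(m * xi for m, xi in zip(mean, x)) + offset
    return [make(mean) for mean in class_means]


def assign_region(x, discriminants):
    """Return the index of the region (class) whose discriminant is largest."""
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical means for M = 3 classes in a 2-dimensional feature space.
class_means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
discriminants = nearest_mean_discriminants(class_means)

feature_vectors = [(0.5, 0.5), (3.5, 0.5), (0.2, 3.0)]
regions = [assign_region(x, discriminants) for x in feature_vectors]
```

The decision boundaries are the sets of points where two discriminants tie, so the M regions are implied by the discriminant functions rather than stored explicitly.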
The no-free-lunch theorem for supervised learning demonstrates that there is a cost associated with using machine learning algorithms to determine discriminant functions of classification systems. Criteria of performance for a classification system must be chosen, and a class of acceptable classification systems must be defined in terms of constraints on design and costs. Finally, the classification system within the specified class that is best in terms of the selected criteria can be determined by an extremum of an objective function of an optimization problem that satisfies the criteria of performance and the constraints on the design and costs.
Suppose that a theoretical model of a discriminant function of a classification system can be devised from first principles, wherein the structure and the properties of the theoretical model satisfy certain geometric and statistical criteria. The no-free-lunch theorem for supervised learning suggests that the best parametric model of the classification system matches the theoretical model, wherein the structure and the properties of the parametric model are determined by geometric and statistical criteria satisfied by the theoretical model.
What is desired is to: (1) devise a theoretical model of a discriminant function of a binary classification system, wherein the discriminant function of the binary classification system exhibits certain geometric and statistical properties and is represented by a geometric and statistical structure that satisfies certain geometric and statistical criteria; (2) devise a parametric model of a discriminant function of a binary classification system that matches the theoretical model, wherein the structure and the properties of the parametric model satisfy fundamental geometric and statistical criteria of the theoretical model, and wherein the discriminant function is represented by a geometric and statistical structure that matches the structure exhibited by the theoretical model and also exhibits fundamental geometric and statistical properties of the theoretical model; and (3) discover or devise an algorithm for which criteria of performance satisfy fundamental geometric and statistical criteria of the theoretical model of a discriminant function of a binary classification system, wherein a class of discriminant functions of binary classification systems is defined in terms of an objective function of an optimization problem that satisfies fundamental geometric and statistical conditions and costs.
In particular, it would be advantageous to devise a computer-implemented method for using feature vectors and machine learning algorithms to determine a discriminant function of a minimum risk linear classification system that classifies the feature vectors into two classes, wherein the feature vectors have been extracted from digital signals such as seismic signals, radar signals, speech signals, biomedical signals, fingerprint images, hyperspectral images, multispectral images or images of objects, and wherein the minimum risk linear classification system exhibits the minimum probability of error for classifying the feature vectors into the two classes.
Further, it would be advantageous if discriminant functions of minimum risk linear classification systems could be combined additively, wherein M ensembles of M−1 discriminant functions of M−1 minimum risk linear classification systems determine a discriminant function of an M-class minimum risk linear classification system that classifies feature vectors into M classes. It would also be advantageous to devise a method that determines a fused discriminant function of a fused minimum risk linear classification system that classifies different types of feature vectors into two classes, wherein different types of feature vectors have different numbers of vector components and may be extracted from different types of digital signals. Further, it would be advantageous to extend the method to M classes of feature vectors. Finally, it would be advantageous to devise a method that uses a discriminant function of a minimum risk linear classification system to determine a classification error rate and a measure of overlap between distributions of feature vectors for two classes of feature vectors, wherein the distributions of feature vectors have similar covariance matrices. A similar method could be used to determine if distributions of two collections of feature vectors are homogeneous distributions.
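One possible reading of the additive combination described above can be sketched as follows: for each of the M classes, an ensemble sums M−1 binary linear discriminant functions, each of which separates that class from one other class, and a feature vector is assigned to the class whose ensemble output is largest. This interpretation is an assumption. The pairwise discriminants below are built from hypothetical class means purely as placeholders and are not minimum risk linear classification systems; all names are illustrative.

```python
# Sketch only: M ensembles, each the sum of M-1 pairwise linear discriminants.
# The pairwise discriminants are simple mean-difference placeholders.

def pairwise_discriminant(mean_i, mean_j):
    """Binary linear discriminant that is positive on the class-i side."""
    w = [a - b for a, b in zip(mean_i, mean_j)]
    bias = -0.5 * (sum(a * a for a in mean_i) - sum(b * b for b in mean_j))
    return lambda x: sum(wk * xk for wk, xk in zip(w, x)) + bias


def build_ensembles(class_means):
    """For each class i, collect the M-1 discriminants of class i versus class j."""
    m = len(class_means)
    return [
        [pairwise_discriminant(class_means[i], class_means[j])
         for j in range(m) if j != i]
        for i in range(m)
    ]


def classify_m_class(x, ensembles):
    """Sum each ensemble additively and return (winning class index, all sums)."""
    totals = [sum(d(x) for d in ensemble) for ensemble in ensembles]
    winner = max(range(len(totals)), key=lambda i: totals[i])
    return winner, totals


# Hypothetical means for M = 3 classes in a 2-dimensional feature space.
class_means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
ensembles = build_ensembles(class_means)
label, totals = classify_m_class((3.5, 0.5), ensembles)
```

Each ensemble here has exactly M−1 members, so the structure matches the count given in the text even though the individual discriminants are only stand-ins.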