Discriminant feature extraction plays a central role in recognition and classification. Principal component analysis (PCA) is a classic linear method for unsupervised feature extraction. PCA learns a subspace in which the maximum variance of the training samples is preserved. More specifically, PCA is mathematically defined as an orthogonal linear transformation that maps the given data to a new coordinate system such that the greatest variance by any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. PCA is theoretically the optimal linear transform for the given data in the least-squares sense.
To facilitate explanation of various techniques, consider face recognition, where data are presented in the form of image data. The ability to perform face recognition can be tested according to standards of the Face Recognition Grand Challenge (FRGC). For example, an FRGC version 2.0 test consists of three components: (i) a data set of images of a person (i.e., a face); (ii) a Biometric Experimentation Environment (BEE) distribution that includes all the data sets for performing and scoring trials; and (iii) a set of baseline algorithms for performing trials. With all three components, it is possible to run trials that proceed from processing raw images to producing Receiver Operating Characteristics (ROCs), and performance can be judged based on the ROCs.
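As a rough illustration of how an ROC can be derived from trial output, the following sketch sweeps a decision threshold over similarity scores and records the false accept rate and the verification rate at each threshold. The function name `roc_points` and the toy scores are assumptions made for illustration only; they are not part of the FRGC data or the BEE distribution.

```python
def roc_points(genuine, impostor):
    """Return (false_accept_rate, verification_rate) pairs, one per threshold.

    genuine:  similarity scores for same-person comparisons
    impostor: similarity scores for different-person comparisons
    A comparison is accepted when its score is >= the threshold.
    """
    # Sweep thresholds from strictest (highest score) to most permissive.
    thresholds = sorted(set(genuine) | set(impostor), reverse=True)
    points = []
    for t in thresholds:
        far = sum(s >= t for s in impostor) / len(impostor)
        vr = sum(s >= t for s in genuine) / len(genuine)
        points.append((far, vr))
    return points


# Toy scores (hypothetical, for illustration only).
genuine_scores = [0.9, 0.8, 0.6]
impostor_scores = [0.7, 0.4, 0.2]
curve = roc_points(genuine_scores, impostor_scores)
```

A curve that rises toward a verification rate of 1 while the false accept rate is still small indicates better performance.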
A conventional approach involves so-called “eigenfaces”, which are a set of eigenvectors used in the computer vision problem of human face recognition. To explain an eigenvector, consider that a linear transformation may operate on a vector to change it, for example, by changing its magnitude and its direction. An eigenvector of a given linear transformation is a non-zero vector which is multiplied by a constant called the eigenvalue as a result of that transformation. The direction of the eigenvector is either unchanged by that transformation (for positive eigenvalues) or reversed (for negative eigenvalues). In general, linear transformations of a vector space, such as rotation, reflection, stretching, compression, shear or any combination of these, may be visualized by the effect they produce on vectors. In other words, linear transformations are linear vector functions. Eigenfaces, which are a set of eigenvectors, are derived from the covariance matrix of a probability distribution of a high-dimensional vector space of possible faces of human beings.
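The eigenvector property described above can be checked numerically. The sketch below applies a small symmetric matrix, chosen arbitrarily for illustration, to two of its eigenvectors and to one vector that is not an eigenvector. The helper name `transform` is an assumption of this sketch.

```python
def transform(matrix, vector):
    """Apply a linear transformation (given as a matrix) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Illustrative transformation; its eigenvalues are 3 and -1.
A = [[1.0, 2.0],
     [2.0, 1.0]]

v1 = [1.0, 1.0]    # eigenvector with eigenvalue 3: direction unchanged
v2 = [1.0, -1.0]   # eigenvector with eigenvalue -1: direction reversed
w = [1.0, 0.0]     # not an eigenvector: direction changes

Av1 = transform(A, v1)  # [3.0, 3.0], which is 3 * v1
Av2 = transform(A, v2)  # [-1.0, 1.0], which is -1 * v2
Aw = transform(A, w)    # [1.0, 2.0], not a multiple of w
```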
To generate a set of eigenfaces, a large set of digitized images of human faces, taken under similar lighting conditions, can be normalized to line up the eyes and mouths. The images can then be resampled at the same pixel resolution. Eigenfaces can be extracted from the image data by PCA. For example, the following steps can convert a set of face images into eigenfaces: (i) prepare a training set “T”; (ii) subtract the mean, where the average matrix “A” is calculated and subtracted from each original image in “T” and the results are stored in a variable “S”; (iii) calculate the covariance matrix of “S”; (iv) calculate the eigenvectors and eigenvalues of the covariance matrix; and (v) choose the principal components.
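Steps (i) through (v) can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the images are tiny flattened vectors, the function names are hypothetical, and power iteration with deflation is swapped in for a full eigendecomposition so that no numerical library is required. For real images a numerical library would normally be used instead.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of a
    symmetric positive semi-definite matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]          # fixed, non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                          # no variance left
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v


def eigenfaces(T, k):
    """Return the average image A and the k leading (eigenvalue, eigenvector)
    pairs of the covariance matrix of the training set T."""
    n, d = len(T), len(T[0])
    # Step (ii): compute the average A and subtract it from every image.
    A = [sum(column) / n for column in zip(*T)]
    S = [[x - a for x, a in zip(image, A)] for image in T]
    # Step (iii): sample covariance matrix of the centered images.
    C = [[sum(s[i] * s[j] for s in S) / (n - 1) for j in range(d)]
         for i in range(d)]
    # Steps (iv)-(v): extract the k leading eigenpairs one at a time.
    components = []
    for _ in range(k):
        eigenvalue, v = top_eigenpair(C)
        components.append((eigenvalue, v))
        # Deflation: remove the direction just found before the next pass.
        C = [[C[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
             for i in range(d)]
    return A, components


# Step (i): a toy training set of four flattened 3-pixel "images".
T = [[1.0, 2.0, 0.0],
     [2.0, 4.0, 1.0],
     [3.0, 6.0, 0.0],
     [4.0, 8.0, 1.0]]
A, components = eigenfaces(T, k=2)
```

Each returned eigenvector has one entry per pixel, so it can be reshaped to the image dimensions and viewed as an eigenface.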
Step (iv) yields a large number of eigenfaces and, in general, far fewer are needed. To reduce the number, one can select those that have the largest eigenvalues. For instance, a set of 100 pixel by 100 pixel images will create 10,000 eigenvectors. Since most individuals can be identified using a set of between 100 and 150 eigenfaces, most of the 10,000 eigenvectors can be discarded.
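One common way to decide how many eigenvectors to keep is to retain the smallest number whose eigenvalues account for a chosen fraction of the total variance. The sketch below illustrates that rule; the function name `components_to_keep`, the variance fraction, and the toy eigenvalues are assumptions for illustration and are not prescribed by the foregoing discussion.

```python
def components_to_keep(eigenvalues, target=0.95):
    """Smallest number of leading components whose eigenvalues account for
    at least `target` (a fraction between 0 and 1) of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)   # largest eigenvalues first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running >= target * total:
            return k
    return len(ordered)


# Toy eigenvalue spectrum (hypothetical values).
spectrum = [50.0, 30.0, 10.0, 5.0, 3.0, 2.0]
k = components_to_keep(spectrum, target=0.85)   # keeps the 3 largest
```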
In a typical example, the eigenfaces created will appear as light and dark areas that are arranged in a specific pattern. This pattern represents how different features of a face can be singled out to be evaluated and scored. Often patterns exist to evaluate symmetry, style of facial hair, hairline position, nose size or mouth size. Other eigenfaces can have patterns that are less simple to identify and the image of the eigenface may look very little like a face.
Techniques used in creating eigenfaces may find use outside the realm of facial recognition. For example, the foregoing technique has also been used for handwriting analysis, lip reading, voice recognition, sign language/hand gestures and medical imaging. Therefore, some prefer the term “eigenimage” to “eigenface”.
As mentioned, the so-called eigenfaces method for face recognition applies PCA to learn an optimal linear subspace of facial structures. PCA also plays a fundamental role in face sketch recognition. Locality Preserving Projections (LPP) is another typical approach for unsupervised feature extraction. LPP is the linearization of Laplacian Eigenmaps, which can find underlying clusters of samples. LPP has shown advantages in image indexing and face recognition.
The “Laplacian faces” face recognition method is based on the combination of PCA and LPP, in the sense that LPP is performed in the PCA-transformed feature space. However, unsupervised learning cannot properly model the underlying structures and characteristics of different classes.
Discriminant features are often obtained by class-supervised learning. Linear discriminant analysis (LDA) is the traditional approach to learning discriminant subspaces, in which the between-class scatter of samples is maximized and the within-class scatter is minimized at the same time. The so-called Fisherfaces algorithm and many variants of LDA have shown good performance in face recognition in complex scenarios.
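For the special case of two classes, the direction that maximizes between-class scatter relative to within-class scatter has a well-known closed form: it is proportional to the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below illustrates this for two-dimensional samples. The function names and the toy samples are assumptions for illustration; this is not the Fisherfaces algorithm itself.

```python
def mean(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, center):
    """2x2 scatter matrix of the points around the given center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - center[0], p[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class discriminant direction w = Sw^-1 (m_a - m_b)."""
    m_a, m_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]   # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Two elongated, parallel classes (hypothetical samples).
class_a = [[1.0, 1.2], [2.0, 1.8], [3.0, 3.2], [4.0, 3.8]]
class_b = [[2.0, 0.2], [3.0, 0.8], [4.0, 2.2], [5.0, 2.8]]

w = fisher_direction(class_a, class_b)
proj_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
proj_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
```

Projecting the samples onto `w` gives one-dimensional features in which the two toy classes no longer overlap.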
By defining representations of intra-personal and extra-personal differences, Bayesian face recognition offers another way to explore discriminant features via a probabilistic similarity measure. In one study, LDA and Bayesian faces were unified in a more general form that reflects their inherent connection.
The LDA algorithm has the advantages of being reasonable in principle and simple in form. The conventional LDA algorithm is formulated as the ratio of the between-class scatter to the within-class scatter, both of which are represented by norms measured with Euclidean metrics. There is thus an underlying assumption that LDA works in Euclidean spaces. However, in computer vision there are many scenarios where sample spaces are non-Euclidean. For instance, distances between feature vectors yielded by histograms cannot be properly measured by Euclidean norms. In such cases, non-Euclidean measures are usually applied, such as the chi-square statistic, the log-likelihood statistic, and the histogram intersection. The primary formulation of LDA does not hold in non-Euclidean spaces. As a consequence, LDA fails to find the optimal discriminant subspace.
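To make the non-Euclidean measures named above concrete, the following sketch computes each of them for two normalized histograms. The exact forms used here are common conventions and are assumptions of this sketch, since definitions vary (for example, some authors include a factor of one half in the chi-square statistic). The function names and the toy histograms are likewise illustrative.

```python
import math


def chi_square(p, q):
    """Chi-square statistic: sum of (p_i - q_i)^2 / (p_i + q_i),
    skipping bins that are empty in both histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def log_likelihood(p, q):
    """Log-likelihood statistic: -sum of p_i * log(q_i).
    Bins with p_i == 0 contribute nothing; q_i must be positive elsewhere."""
    return -sum(a * math.log(b) for a, b in zip(p, q) if a > 0)


def histogram_intersection(p, q):
    """Histogram intersection: sum of min(p_i, q_i). This is a similarity,
    so larger values mean the histograms are more alike."""
    return sum(min(a, b) for a, b in zip(p, q))


# Two normalized 4-bin histograms (hypothetical values).
h1 = [0.5, 0.3, 0.2, 0.0]
h2 = [0.4, 0.4, 0.1, 0.1]

chi = chi_square(h1, h2)
ll = log_likelihood(h1, h2)
inter = histogram_intersection(h1, h2)
```

Unlike a Euclidean norm, the chi-square statistic weights each bin difference by the bin mass, and the log-likelihood statistic is not symmetric in its arguments.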
As described herein, various exemplary techniques can be applied to high dimensional spaces that may have non-Euclidean metrics. While the foregoing discussion mentions face recognition, various exemplary techniques can be applied in areas other than face recognition and in areas where data are other than “image” data.