Appearance-based face recognition is often formulated as a problem of comparing labeled example images with unlabeled probe images. Viewed in terms of conventional machine learning, the dimensionality of the data is very high, the number of examples is very small, and the data are corrupted by large confounding influences such as changes in lighting and pose. As a result, conventional techniques such as nearest-neighbor classification perform poorly.
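To make the nearest-neighbor baseline concrete, here is a minimal sketch. It assumes each image has already been flattened into a vector and that plain Euclidean distance is used. The function name `nearest_neighbor_identify` and the toy gallery are hypothetical. Because every pixel counts equally, a lighting or pose change can move a probe closer to another identity's gallery image than to its own.

```python
import numpy as np


def nearest_neighbor_identify(gallery, labels, probe):
    """Return the label of the gallery vector closest (Euclidean) to the probe.

    gallery: (n, d) array, one flattened labeled example image per row
    labels:  length-n sequence of identity labels, aligned with gallery rows
    probe:   (d,) flattened unlabeled probe image
    """
    # Distance from the probe to every labeled example, computed row-wise.
    distances = np.linalg.norm(gallery - probe, axis=1)
    return labels[int(np.argmin(distances))]


# Toy usage: three 2-D "images" with known identities.
gallery = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
labels = ["alice", "bob", "carol"]
print(nearest_neighbor_identify(gallery, labels, np.array([9.0, 9.0])))  # bob
```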
A predominant conventional solution is to find a projective embedding of the original data into a lower-dimensional space that preserves discriminant information and discards confounding information. Such solutions must address three challenges: high dimensionality, learning capacity, and generalization ability. Learning capacity, sometimes called inductive bias or discriminant ability, is the capacity of an algorithm to represent arbitrary class boundaries. Generalization ability is a measure of the expected error on data outside the training set, which can be characterized, e.g., by the classification margin. While tradeoffs among these factors arise in any practical machine learning approach, face recognition makes them especially severe.
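One widely used instance of such a projective embedding is Fisher's linear discriminant analysis (LDA). LDA seeks directions that separate the class means while compressing within-class scatter. The sketch below only illustrates that idea and is not presented as the method of this document. The function name `fisher_projection` and the ridge term `reg` are assumptions. `reg` is added so that the within-class scatter `S_w` stays invertible.

```python
import numpy as np


def fisher_projection(X, y, k, reg=1e-3):
    """Learn a d x k linear projection W using Fisher's criterion (a sketch).

    X:   (n, d) data matrix, one vectorized image per row
    y:   (n,) class labels
    k:   target dimension (at most C - 1 useful directions for C classes)
    reg: ridge added to S_w so it can be inverted (an assumption)
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Leading eigenvectors of S_w^{-1} S_b span the most discriminative directions.
    M = np.linalg.solve(S_w + reg * np.eye(d), S_b)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real


# Toy data: the classes differ only along the first axis. The second axis
# carries within-class ("confounding") variation.
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, -1.0],
              [5.0, 0.0], [5.0, 1.0], [5.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
W = fisher_projection(X, y, k=1)
Z = X @ W  # one-dimensional embedding of the six samples
```

In this toy data the learned direction keeps the first coordinate and discards the second. Each class therefore collapses to a single point in the embedding. `S_w` is singular here, because no within-class variation occurs along the first axis, so the ridge term is what makes the solve possible.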
Conventional face recognition technologies can be divided into two classes: biometric-based methods and learning-based methods. Biometric-based methods match invariant geometric facial metrics, such as the relative distances between the eyes and nose. Learning-based methods use machine learning techniques to extract discriminant facial features for recognition.
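As a rough illustration of a biometric-based feature, the following sketch turns a few 2-D landmark positions into distance ratios. Each ratio is normalized by the interocular distance, so the features do not change when the face is uniformly scaled. The following are all hypothetical:

- the landmark names
- the choice of ratios
- the functions `geometric_features` and `match_distance`

Detecting the landmarks themselves is assumed to happen elsewhere.

```python
import math


def geometric_features(landmarks):
    """Scale-invariant facial metrics from 2-D landmark coordinates.

    landmarks: dict with hypothetical keys 'left_eye', 'right_eye',
    'nose_tip', 'mouth_center', each mapped to an (x, y) tuple.
    """
    def dist(a, b):
        return math.dist(landmarks[a], landmarks[b])

    # Normalizing by the interocular distance removes overall image scale.
    interocular = dist("left_eye", "right_eye")
    return [
        dist("left_eye", "nose_tip") / interocular,
        dist("right_eye", "nose_tip") / interocular,
        dist("nose_tip", "mouth_center") / interocular,
    ]


def match_distance(features_a, features_b):
    """Euclidean distance between two feature vectors (smaller means more similar)."""
    return math.dist(features_a, features_b)


face = {"left_eye": (0, 0), "right_eye": (4, 0),
        "nose_tip": (2, 3), "mouth_center": (2, 5)}
# The same face at twice the image scale.
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}
print(match_distance(geometric_features(face), geometric_features(face_2x)))
```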
In general, complex models with more parameters (e.g., neural networks) have higher learning capacity but are prone to overfitting and thus have low generalization ability. When available, a large quantity of diverse training data can be used to better constrain the parameters. Simpler models with fewer parameters tend to generalize better but have limited learning capacity. How best to balance this tradeoff, especially for high-dimensional visual data, remains an open problem.
Many discriminant learning methods treat image data as vectors. These approaches have difficulty with high dimensionality, a matter made worse when only a small set of training data is available. Many conventional methods involve solving an eigenvalue problem in the high-dimensional input vector space (e.g., 1024 dimensions for 32×32 pixel images). Computing an eigendecomposition in such high dimensions is not only computationally intensive but also prone to numerical difficulties, in which the best discriminative projections may be discarded. Vector-based representations also ignore the spatial structure of image data, which can be very useful for visual recognition.
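The dimensionality figures above can be checked directly with a small sketch. It assumes NumPy and uses synthetic random images in place of real faces. Twenty 32×32 images are flattened into 1024-dimensional vectors. The resulting 1024×1024 scatter matrix then has rank at most 19, so an eigenproblem built on it is singular when samples are this scarce. The sketch also shows one way vectorization loses spatial structure: vertically adjacent pixels end up 32 positions apart. The helper names `vectorize` and `scatter_rank` are hypothetical.

```python
import numpy as np


def vectorize(images):
    """Flatten an (n, h, w) stack of images into an (n, h*w) matrix (row-major)."""
    n, h, w = images.shape
    return images.reshape(n, h * w)


def scatter_rank(X):
    """Numerical rank of the total scatter matrix of the rows of X."""
    centered = X - X.mean(axis=0)
    scatter = centered.T @ centered  # (d, d), here 1024 x 1024
    return np.linalg.matrix_rank(scatter)


rng = np.random.default_rng(0)
images = rng.random((20, 32, 32))  # synthetic stand-ins for 20 face images
X = vectorize(images)
print(X.shape)          # (20, 1024)
print(scatter_rank(X))  # at most 20 - 1 = 19, far below 1024

# Row-major flattening maps pixel (r, c) to index r * 32 + c. The vertical
# neighbors (5, 7) and (6, 7) therefore land 32 positions apart in the vector.
idx_a, idx_b = 5 * 32 + 7, 6 * 32 + 7
print(idx_b - idx_a)    # 32
```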