It is not an easy task to estimate the age of a person solely from his/her facial appearance. The notion of physical age of people is well defined, and there is some general way of a person's facial appearance is affected by age. However, there is a great deal of ambiguity in the recognition of age by facial appearance, and the recognition is also subjective and error-prone.
The age recognition can be solved by fundamentally the same approach typically used in face recognition: the supervised learning technique. To train a supervised learning machine to recognize age, it is necessary to have a training set of facial images along with annotated ages. However, it is hard to have a face dataset with a reliable age annotation. Because of the age-appearance ambiguity, the human annotator will make subjective judgment of the age based on his/her experience. As a result, the trained classifier will attain the same degree of ambiguity.
The main idea behind the present method is that it is much easier to judge whether one person is older than the other than to determine individual age. It is also much easier to judge whether two people belong to the same age group or not than to estimate actual ages. The determined ‘relative age’ is also more accurate and meaningful when the pair belongs to the same demographics group or when their facial appearance is similar.
Based on these observations, we train learning machines to estimate the relative age of a pair of images and the facial similarity (in terms of the face-based class membership) between the images in the pairs. We call the pair a ‘pairwise facial image’, and regard it as a single data entity. Manual annotation is performed on the pairwise facial images to determine the relative ages; the pairwise facial images along with the relative ages comprise the training data. Given an input query facial image, it is paired with a number of reference facial images, whose ages are known, to form pairwise facial images. These images are fed to the trained learning machine(s) to estimate the relative ages between the input face and the reference faces. The age of the input face is estimated based on these comparisons to the reference faces (the relative ages).
There have been prior attempts for doing demographics classification based on facial images of people.
In U.S. Pat. No. 5,781,650 of Lobo, et al. (hereinafter Lobo), the problem of age classification is handled by focusing on local features that are relevant to aging. The approach is both local feature based and also per-image classification. While Lobo aims to solve the same problem as the present invention do, the approach is vastly different. The proposed invention makes use of holistic image feature, and compares the pair of facial images to estimate the relative age.
U.S. Pat. No. 6,990,217 of Moghaddam, et al. (hereinafter Moghaddam) proposes to employ SVM to find the optimal separating hyperplane in feature space to solve the gender recognition problem. This is a typical approach to solve the demographics recognition problem, by estimating the direct relation from facial images to demographics labels (such as male, female, etc). While the age estimation problem can be solved in the same manner, the success of the approach still depends on the reliability of the provided age labels of the training data. The proposed invention solves the issue by using an implicit relation among the data—relative age measure, which is more accurate and reliable. Unlike Moghaddam, the proposed invention also makes use of other class information; it proposes the use of other face-based class information (such as demographics classes or appearance-based clusters) to make the age estimation problem more specialized. In U.S. Pat. No. 7,848,548 of Moon, et al. (hereinafter Moon), a comprehensive approach to perform demographics classification from tracked facial images has been introduced. The method to carry out the demographics classification, including the age classification, also utilizes conventional machine learning approach to find a mapping from the facial image data to the class labels. The present invention introduces a notion of the relative age of a pairwise image, where similar machine learning approach is used to find the mapping from the pairwise facial image to the relations, instead of the mapping from the set of single images to the set of labels.
There have been prior attempts for utilizing the pairwise relation among data to represent the structure in data, more specifically, for the purpose of clustering or classifying data.
Learning Visual Similarity Measures for Comparing Never Seen Objects, IEEE Conference on Computer Vision & Pattern Recognition 2007, of Nowak and Jurie (hereinafter Nowak) handles the problem of object recognition by using pairwise local feature similarity measure. While the fundamental ideas of the method—of using the relative measure of visual similarity—is shared by the proposed invention, Nowak mainly concerns the problem of generic object recognition, not the age estimation. Their use of local feature comparison is very different from the holistic facial image pair learning of the disclosed invention; the proposed invention aims to solve the age estimation by employing the pairwise annotation and training. Enhancing Image and Video Retrieval: Learning via Equivalence Constraints, IEEE Conference on Computer Vision & Pattern Recognition 2003, of Hertz, Shental, Bar-Hillel, and Weinshall (hereinafter Hertz) introduces a framework using the equivalence relation among data for the processing of visual data. The method is used to handle clustering and classification of facial or video data. While the method shares the same principle of exploiting the relation among data, the present invention specifically makes use of the age order information between facial images; while it is hard to determine actual ages by facial appearance, it is much easier and more reliable to determine which face is older/younger between the two. The present invention employs pairwise training for actual classification. Except for the shared fundamental concept, the method is very different from the disclosed invention in terms of application and method of classification. U.S. Pat. No. 6,453,246 of Agrafiotis, et al. (hereinafter Agrafiotis) introduces a method to build or refine data representation in multi-dimensional space from random, partial, or human observed pairwise relation among data points. The method also shares a common principle (of using pairwise relation) with the present invention; However, Agrafiotis proposes a way to represent and clean up data using any available observation of pairwise relations while the present invention proposes a way to exploit the observable pairwise relation to estimate ages from facial image data.
There have been prior attempts for finding class information of data by utilizing another class information or the data attributes in another dimension.
The present invention employs a class determination method similar to U.S. Pat. No. 5,537,488 of Menon, et al. (hereinafter Menon) for using the face-based class similarity score. However, the present invention simply utilizes the class-determination to weight the relative age between a pair of faces according to the confidence level. U.S. Pat. Pub. No. 20020169730 of Lazaridis (hereinafter Lazaridis) proposed approaches to identifying one or more latent classes among data by utilizing the class information or data attributes in another dimension. To extract more reliable relative age information (class information or in another dimension), the present invention makes use of the class similarity score (class information or data characteristics in another dimension). The present invention shares its very broad framework with Lazaridis; it proposes a novel approach to utilize the relation among the data to combine the class information to extract age information, using the fact that the age comparison is more meaningful within the same class. U.S. Pat. Pub. No. 20030210808 of Chen, et al. (hereinafter Chen) introduced a facial image clustering method where the clustering is based on the similarity score from face recognizer. The present invention utilizes one of such methods to compute the similarity score, to weight the relative age estimation; however, the notion of similarity score in the present invention is broader than this particular method. It can be continuous similarity scores, or class memberships.
In summary, while there have been prior attempts to solve the problem of age estimation (or, more general demographics classification), to find structure in data by utilizing the pairwise relation, and to find the structure of data in one dimension by exploiting the feature in another dimension, the present invention proposes a novel comprehensive approach to solve the problem of age estimation. It utilizes the age relation (relative age) between the pair of facial images (pairwise facial image), and the ease of annotating the age relation. It also employs the pairwise facial image training to find the mapping from the pairwise data to the set of relations. Other facial class information (face-based class similarity) is also used to achieve more reliable age estimation.