A hurdle encountered in human face recognition is how to deal with variations in facial images caused by changes in facial pose. While desirable, if limited, recognition results have been obtained with frontal facial images, recognition performance degrades quickly as facial pose varies. Accordingly, accurate measurement of facial pose may facilitate facial recognition if some form of data from which the facial image may be generated, such as 3D models of the face or sampled facial images across poses, is available.
Intuitively, the pose of an object can be estimated by comparing the positions of its salient features. For a human face, the eyes, eyebrows and mouth, for example, are usually visible and prominent. While the global shape of the human face varies widely from person to person according to age, gender and hairstyle, the size and shape of these facial features generally vary within predictable ranges. As a result, these and other features may be used to render distinct gradient signatures in images that are distinguishable from one another.
Prior art methods of head pose estimation, while advantageously employing learning approaches, have met with limited success. In particular, a method disclosed by N. Krüger, M. Pötzsch, and C. von der Malsburg in an article entitled “Determination of face position and pose with a learned representation based on labeled graphs”, which appeared in Image and Vision Computing, Vol. 15, pp. 665-673, 1997, estimates position and pose by matching the facial image to a learned representation of bunch graphs. In addition, a method disclosed by Y. Li, S. Gong, and H. Liddell in an article entitled “Support vector regression and classification based multi-view face detection and recognition”, which was presented at FG2000, employed Support Vector Regression (SVR) learning on the PCA subspace of the outputs from directional Sobel filters. Lastly, S. Li, J. Yan, X. W. Hou, Z. Y. Li and H. Zhang disclose a method that uses two stages of support vector learning, first training an array of SVRs to produce desired output signatures and then training the mapping between the signatures and the facial poses, in “Learning Low Dimensional Invariant Signatures of 3-D Object under Varying View and Illumination from 2-D Appearances,” which appeared in ICCV 2001.
Despite such progress, more efficient and expedient approaches are required. It is therefore an object of the present invention to provide a method of determining the pose of a human head in natural scenes, such that the method may facilitate the development of human recognition systems.