1. Field of the Invention
The present invention relates in general to object detection and tracking, and in particular to a system and method for estimating the orientation of an object.
2. Related Art
Facial gaze, i.e., the orientation of a person's head, gives cues about a person's intent, emotion, and focus of attention. As such, head orientation can play an important role in vision-based interfaces, where it can provide evidence of user action and lead to more detailed analysis of the face. A substantial part of facial image processing is concerned with determination of head pose. There are techniques based on tracking blobs of color, tracking particular facial features, tracking point features, following optic flow, and fitting textures.
Although many of these systems can be used for applications such as graphical avatar puppeteering and hands-free cursor control, they have constraints that limit them for other applications. For example, many of these systems are based on tracking image features or computing dense optic flow, and therefore require high-resolution images of the subject to succeed. Many systems also require significant restrictions on operation, such as per-user initialization, stable illumination conditions, or approximately frontal facial poses.
Some systems have attempted alternative approaches to overcome some of these limitations. One such system builds an ellipsoidal texture model of the head and determines pose by matching model projections to live images. This avoids dependency on high-resolution images and tracks the full range of orientations, but nevertheless requires initialization for each subject and static illumination conditions. Another system uses Gabor wavelet transforms. These systems take an "eigenface" approach to construct a linear image space of poses and use PCA-based techniques for representing pose changes. Because of the known limitations of PCA-based techniques for representing pose changes, it is not clear whether this system generalizes well to recovering more than the single rotational parameter that they consider. Yet another technique develops an example-based system which trains a neural network from example poses. However, pose estimation is treated as a brute-force recognition task and does not take advantage of known geometry. Lastly, another system uses elastic bunch graphs of wavelet feature vectors to determine head pose. Although this technique is relatively insensitive to person and illumination, it depends on good resolution.
Therefore, what is needed is a system for coarse head-orientation estimation that is insensitive to skin color, to glasses or facial hair, and to other common variations in facial appearance. What is also needed is a head orientation system that can handle large variations in illumination as well as side and back views. What is additionally needed is a head orientation system that works under a significant range of image scales and resolutions and does not require per-user initialization. Whatever the merits of the above-mentioned systems and methods, they do not achieve the benefits of the present invention.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention is embodied in a system and method for automatically estimating the orientation or pose of an object, such as a human head, from any viewpoint. The system is insensitive to illumination changes and is applicable to people having varying appearances without requiring initialization for each user. In addition, the system and method of the present invention operate over a wide range of head orientations, including side and back views.
In general, the present invention is a system for estimating the orientation of an object and includes training and pose estimation modules. The training module receives training data, extracts unique features of the data, projects the features onto corresponding points of a model, and determines a probability density function estimation for each model point to produce a trained model. The pose estimation module receives the trained model and an input object, extracts unique input features of the input object, projects the input features onto points of the trained model, and determines the orientation of the input object that is most likely given the features extracted from the input object. In other words, the training module uses known head poses for generating observations of the different types of head poses, and the pose estimation module receives actual head poses of a subject and uses the training observations to estimate the actual head pose.
Specifically, the training module first builds a model of the head, such as a 3D ellipsoidal model, where points on the model maintain probabilistic information about local head characteristics. Any suitable characteristic can be used, but feature vectors based on edge density are preferred. Data is then collected for each point on the model by extracting local features from previously given annotated training images and then projecting these features onto the model. Each model point then learns a probability density function from the training observations. Once training is complete, the pose estimation module is then able to process new input images. The pose estimation module extracts features from input images, back projects these extracted features onto the model, and then finds the pose that is most likely given the current observation, preferably by using the maximum a posteriori criterion.
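By way of illustration only, the training and pose estimation steps described above can be sketched in a heavily simplified form. The sketch below is not the disclosed implementation; it makes several assumptions that do not come from the specification: the head is reduced to a ring of model points around a single rotation axis (yaw only), each feature is a single scalar standing in for an edge-density feature vector, each model point learns a Gaussian density, the projection ignores foreshortening, and the prior over poses is uniform unless one is supplied. All names (`N_POINTS`, `visible_points`, `train`, `estimate_pose`, `demo`) are hypothetical.

```python
import math
import random

N_POINTS = 36    # model points around the head (assumed resolution)
N_VISIBLE = 18   # model points visible in any single view (assumed: front half)


def visible_points(pose):
    """Model point indices seen at image slots 0..N_VISIBLE-1 for a yaw bin."""
    return [(pose + k) % N_POINTS for k in range(N_VISIBLE)]


def train(samples):
    """Learn a Gaussian (mean, variance) per model point.

    samples: list of (pose, features), where features[k] is the scalar
    feature extracted at image slot k of an annotated training image.
    """
    sums = [0.0] * N_POINTS
    sq_sums = [0.0] * N_POINTS
    counts = [0] * N_POINTS
    for pose, features in samples:
        # Project each image feature onto the model point it came from.
        for point, f in zip(visible_points(pose), features):
            sums[point] += f
            sq_sums[point] += f * f
            counts[point] += 1
    model = []
    for s, q, n in zip(sums, sq_sums, counts):
        if n == 0:
            model.append((0.0, 1.0))  # no data: broad fallback density
            continue
        mean = s / n
        var = max(q / n - mean * mean, 1e-3)  # floor keeps the density finite
        model.append((mean, var))
    return model


def log_gauss(x, mean, var):
    """Log of a 1-D Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def estimate_pose(model, features, log_prior=None):
    """Return the yaw bin maximizing log-likelihood plus log-prior (MAP)."""
    best_pose, best_score = None, -math.inf
    for pose in range(N_POINTS):
        score = 0.0 if log_prior is None else log_prior[pose]
        # Back-project the observed features under this candidate pose.
        for point, f in zip(visible_points(pose), features):
            mean, var = model[point]
            score += log_gauss(f, mean, var)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose


def demo(seed=0, true_pose=7, noise=0.05):
    """Train on synthetic data, then estimate one held-out pose."""
    rng = random.Random(seed)
    texture = [rng.random() for _ in range(N_POINTS)]  # synthetic appearance

    def observe(pose):
        return [texture[p] + rng.gauss(0.0, noise) for p in visible_points(pose)]

    samples = [(pose, observe(pose)) for pose in range(N_POINTS) for _ in range(5)]
    model = train(samples)
    return true_pose, estimate_pose(model, observe(true_pose))


if __name__ == "__main__":
    print(demo())
```

In this sketch the per-point densities are treated as independent, so the score for a candidate pose is a sum of log-densities; with a uniform prior the maximum a posteriori estimate coincides with the maximum likelihood estimate. A non-uniform `log_prior` could, for example, favor poses near the previous frame's estimate.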
The present invention as well as a more complete understanding thereof will be made apparent from a study of the following detailed description of the invention in connection with the accompanying drawings and appended claims.