Face tracking techniques are used to find features on a user's face, such as eyes, mouth, nose, and so forth. A camera captures frames containing the face, and the techniques examine the face to find the facial features. With some techniques, various two-dimensional (2D) feature points are first found on the face. Once the 2D feature points have been found, 3D shape parameters can then be found. These 3D shape parameters may include the 3D shape of the user's head, the 3D head pose, and the 3D positions of the 2D points observed in the image frame.
One application of these face tracking techniques is to take the frames containing the user's face and model the facial expressions of the user. In other words, the way in which the user moves his face, such as lifting his eyebrows, smiling, and moving his mouth, is modeled. This modeling involves converting what is seen in the frames captured by the camera into movements of the face. This face model can be used to animate the user's face in a computing environment or to deduce the meaning of a user's facial expression.
One difficulty, however, is that different people have different facial features. For example, some people have larger mouths than others, some have eyes that are far apart, and some have eyes that are close together. If the actual shape of the user's head is not taken into account, then the facial features of the user can easily be misread.
A 2D face alignment approach typically uses a face tracking system to obtain the 2D feature points from the captured frames. This system fits a model to the user's face to find its feature points. The model includes a base shape (or neutral face) plus a linear combination of deformations that represent head shape variations and facial expressions of the face.
Typical face tracking systems use a calibration step to compute the base shape and head shape deformation vectors that represent a user's specific head. The process involves asking the user to keep still, stare at the camera in a certain position, and hold a countenance that is devoid of expression. If the captured image of the user's face shows any expression, the user may be asked again to provide an expressionless image of his face. This process can be burdensome to the user.
One type of 2D face alignment approach is called an active appearance model (AAM). An AAM is a computer vision technique for tracking facial features in two dimensions. The AAM technique matches a statistical model of the shape and appearance of an object to a new image. The AAM is widely used for face recognition and tracking and for medical image interpretation. The AAM technique uses the difference between the current estimate of appearance and the target image to drive an optimization process.
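The idea that a residual between the current appearance estimate and the target image drives the optimization can be illustrated with a deliberately tiny toy. The sketch below is not an AAM: it assumes a one-dimensional "image", a single parameter (a horizontal shift `t`), and a Gaussian bump as the appearance template, and it uses a plain Gauss-Newton update. The function names `bump`, `bump_slope`, and `fit_shift` are hypothetical. A real AAM has many shape and appearance parameters, but the loop structure (render estimate, subtract target, update parameters from the residual) is the same in spirit.

```python
import math
from typing import List


def bump(x: float) -> float:
    """Toy 1D 'appearance' template (hypothetical): a Gaussian bump."""
    return math.exp(-x * x)


def bump_slope(x: float) -> float:
    """Derivative of bump() with respect to x."""
    return -2.0 * x * math.exp(-x * x)


def fit_shift(xs: List[float], target: List[float], t0: float = 0.0,
              max_iter: int = 50, tol: float = 1e-10) -> float:
    """Estimate the shift t so that bump(x - t) matches the target samples."""
    t = t0
    for _ in range(max_iter):
        # Residual: current appearance estimate minus the target image.
        r = [bump(x - t) - y for x, y in zip(xs, target)]
        # Jacobian of the estimate w.r.t. t: d/dt bump(x - t) = -bump'(x - t).
        jac = [-bump_slope(x - t) for x in xs]
        jr = sum(j * ri for j, ri in zip(jac, r))
        jj = sum(j * j for j in jac)
        step = jr / jj
        t -= step  # Gauss-Newton update driven by the residual
        if abs(step) < tol:
            break
    return t


xs = [-3.0 + 0.1 * k for k in range(61)]
target = [bump(x - 0.7) for x in xs]  # "target image": bump shifted by 0.7
estimate = fit_shift(xs, target)
print(round(estimate, 6))  # 0.7
```

Starting from `t = 0`, the residual pulls the estimate toward the true shift within a few iterations; the same principle scales to the multi-parameter fitting an AAM performs.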
In order to improve performance, the AAM technique can be constrained by a 3D face and head model. In this situation, the 3D face mask is represented as a linear combination of a neutral mask, face shape deformations, and facial feature deformations (for example, mouth movements). This representation is given mathematically as:
$$S_{\mathrm{3DMask}} = S_0 + \sum_{j=1}^{R} SU_j\, s_j + \sum_{j=1}^{Q} AU_j\, a_j \qquad (1)$$

where S_0 is the neutral 3D mask; s_j and SU_j are the j-th shape unit coefficient (SUs, or 3D head shape parameters) and its corresponding shape deformation basis vector; and a_j and AU_j are the j-th animation unit coefficient (AUs, or facial expression parameters) and its corresponding animation deformation basis vector.
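Equation (1) can be evaluated directly once a mask is stored as a flat list of vertex coordinates (x1, y1, z1, x2, y2, z2, ...). The sketch below assumes that flat layout and uses a hypothetical function name, `mask_3d`; it simply adds each basis vector scaled by its coefficient to the neutral mask S_0.

```python
from typing import List, Sequence


def mask_3d(s0: Sequence[float],
            shape_basis: Sequence[Sequence[float]],   # SU_j, j = 1..R
            shape_coeffs: Sequence[float],            # s_j
            anim_basis: Sequence[Sequence[float]],    # AU_j, j = 1..Q
            anim_coeffs: Sequence[float]) -> List[float]:
    """Evaluate S_3DMask = S_0 + sum_j SU_j s_j + sum_j AU_j a_j (hypothetical helper)."""
    if len(shape_basis) != len(shape_coeffs) or len(anim_basis) != len(anim_coeffs):
        raise ValueError("each basis vector needs exactly one coefficient")
    result = list(s0)  # start from the neutral mask
    for basis, coeffs in ((shape_basis, shape_coeffs), (anim_basis, anim_coeffs)):
        for vec, c in zip(basis, coeffs):
            if len(vec) != len(result):
                raise ValueError("basis vector length must match S_0")
            for k, v in enumerate(vec):
                result[k] += c * v  # add the scaled deformation
    return result


# Toy mask with two vertices (the mouth corners), flattened as x, y, z per vertex.
s0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
su_wider_mouth = [-1.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # hypothetical shape unit
au_raise_corner = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]   # hypothetical animation unit
mask = mask_3d(s0, [su_wider_mouth], [0.5], [au_raise_corner], [0.2])
print(mask)  # [-0.5, 0.0, 0.0, 1.5, 0.2, 0.0]
```

Here the shape unit coefficient s_1 = 0.5 widens the mouth (a person-specific trait), while the animation unit coefficient a_1 = 0.2 raises one corner (an expression).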
Equation (1) is a linear combination of the "average" 3D mask for the entire human race and deformation vectors that are learned statistically, for example with Principal Component Analysis. This is based on the theory that any face can be represented as a linear combination of an average face plus some deformation basis vectors. The term S_3DMask represents a particular facial expression for a particular person. It is the sum of the average human face shape plus shape deformations, animation (expression) deformations, or both.
Shape units (or 3D head shape parameters) and shape basis vectors represent variations in human head shapes. Animation units (or facial expression parameters) and animation basis vectors represent facial movements. The neutral 3D mask and its shape and animation deformation basis vectors are known and constant. They can be created manually by an artist to represent a set of 3D faces, or they may be learned statistically from a training set by using algorithms such as Principal Component Analysis.
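As a rough illustration of the statistical route, the sketch below estimates the mean mask (a stand-in for S_0) and the first principal deformation from a set of training masks. It assumes masks are flattened coordinate lists, uses a hypothetical function name, `mean_and_top_component`, and finds only the top principal component by power iteration on the sample covariance matrix; a production pipeline would normally use an SVD routine and keep several components.

```python
import math
import random
from typing import List, Sequence, Tuple


def mean_and_top_component(samples: Sequence[Sequence[float]],
                           iterations: int = 200,
                           seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the mean mask and the unit-length first principal deformation."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two training masks")
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    centered = [[s[k] - mean[k] for k in range(dim)] for s in samples]
    # Sample covariance C = X^T X / (n - 1) of the centered masks.
    cov = [[sum(x[r] * x[c] for x in centered) / (n - 1) for c in range(dim)]
           for r in range(dim)]
    # Power iteration: repeatedly applying C converges to its top eigenvector.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[r][c] * v[c] for c in range(dim)) for r in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("training masks show no variation")
        v = [x / norm for x in w]
    return mean, v


# Toy training set: two mouth-corner vertices whose spacing varies per person.
training = [[-1.0 - t, 0.0, 0.0, 1.0 + t, 0.0, 0.0]
            for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, component = mean_and_top_component(training)
print(mean)  # [-1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The recovered component points along (-1, 0, 0, 1, 0, 0) up to sign and scale, i.e. a "mouth width" deformation, which is the kind of vector that would serve as a shape basis vector SU_j in Equation (1).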
A 2D to 3D AAM may use a 3D model to constrain energy minimization, to produce realistic results for human faces, and to produce 3D tracking parameters (such as the 3D head pose, 3D head shape parameters (SUs), and 3D facial expression parameters (AUs)). This fitting process determines a set of SUs, AUs, and the 3D head pose in addition to the 2D face parameters. Unfortunately, some 3D head shape and animation basis vectors may not be orthogonal and may be correlated with head pose changes.
For example, moving the head up or down (also known as "pitching") may be explained by a small change in the pitch angle combined with moving the eyebrows or mouth up or down, or it may be explained by changes in the head pose alone. This may lead to non-unique and incorrect results when the 2D to 3D AAM computes both the 2D and 3D (SU, AU, head pose) parameters in one combined energy minimization. Incorrectly computed 3D shape parameters are fed back to the AAM, which can contribute to poor 2D fitting on subsequent video frames.
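The ambiguity can be made concrete with a linearized toy model. The sketch below assumes, purely for illustration, that a small pitch change and an eyebrow-raising AU both shift three eyebrow landmarks upward by proportional amounts; the effect values and the helper names `predict` and `normal_matrix_det` are hypothetical. When the two effect vectors are parallel, very different parameter combinations reproduce the same observation, and the least-squares normal matrix is singular, so the combined minimization has no unique answer. With real, nearly parallel basis vectors the matrix is ill-conditioned rather than exactly singular, which produces the same instability in practice.

```python
from typing import List, Sequence


def predict(pitch: float, brow_au: float,
            pitch_effect: Sequence[float],
            brow_effect: Sequence[float]) -> List[float]:
    """Linearized model: landmark displacement = pitch * dP + brow_au * dB."""
    return [pitch * p + brow_au * b for p, b in zip(pitch_effect, brow_effect)]


def normal_matrix_det(pitch_effect: Sequence[float],
                      brow_effect: Sequence[float]) -> float:
    """Determinant of the 2x2 normal matrix J^T J for the (pitch, AU) fit."""
    pp = sum(p * p for p in pitch_effect)
    pb = sum(p * b for p, b in zip(pitch_effect, brow_effect))
    bb = sum(b * b for b in brow_effect)
    return pp * bb - pb * pb


# Hypothetical vertical displacements of three eyebrow landmarks per unit change.
pitch_effect = [2.0, 2.0, 2.0]   # pitching moves all brow points together
brow_effect = [1.0, 1.0, 1.0]    # raising the brows also moves them together

a = predict(0.5, 0.0, pitch_effect, brow_effect)    # pure pitch
b = predict(0.0, 1.0, pitch_effect, brow_effect)    # pure eyebrow raise
c = predict(0.25, 0.5, pitch_effect, brow_effect)   # a mix of both
print(a, b, c)  # [1.0, 1.0, 1.0] [1.0, 1.0, 1.0] [1.0, 1.0, 1.0]
print(normal_matrix_det(pitch_effect, brow_effect))  # 0.0
```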
This problem can be greatly reduced if the correct 3D head and face shape parameters (SUs and scale) are known beforehand. In this case, the AAM fitting process uses fixed face shape parameters and computes only 2D face parameters, 3D head pose and 3D facial expression parameters.
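With the shape units known in advance, the person-specific neutral face S_0 + sum_j SU_j s_j becomes a constant, and only the AU coefficients (plus pose) remain to be estimated. The sketch below shows the AU part as a linear least-squares problem solved through its normal equations. It is a simplification under stated assumptions: it works directly on 3D mask coordinates and ignores head pose and camera projection, which a real 2D to 3D AAM would include. The helper names `solve` and `fit_aus_with_fixed_shape` are hypothetical.

```python
from typing import List, Sequence


def solve(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular normal matrix: AUs are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_aus_with_fixed_shape(observed: Sequence[float], s0: Sequence[float],
                             shape_basis: Sequence[Sequence[float]],
                             shape_coeffs: Sequence[float],
                             anim_basis: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares AU coefficients a_j, holding the SUs s_j fixed (pose ignored)."""
    dim = len(s0)
    # Person-specific neutral face: S_0 + sum_j SU_j s_j, constant once SUs are known.
    neutral = list(s0)
    for vec, c in zip(shape_basis, shape_coeffs):
        for k in range(dim):
            neutral[k] += c * vec[k]
    residual = [observed[k] - neutral[k] for k in range(dim)]
    q = len(anim_basis)
    # Normal equations (A^T A) a = A^T r, with the AU vectors as columns of A.
    normal = [[sum(anim_basis[i][k] * anim_basis[j][k] for k in range(dim))
               for j in range(q)] for i in range(q)]
    rhs = [sum(anim_basis[i][k] * residual[k] for k in range(dim)) for i in range(q)]
    return solve(normal, rhs)


s0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
su = [[-1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]          # hypothetical shape unit
aus = [[0.0, 0.0, 0.0, 0.0, 1.0, 0.0],          # hypothetical AU: raise right corner
       [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]          # hypothetical AU: raise left corner
observed = [-0.5, 0.3, 0.0, 1.5, 0.2, 0.0]      # neutral(s=0.5) + 0.2*AU1 + 0.3*AU2
print(fit_aus_with_fixed_shape(observed, s0, su, [0.5], aus))  # [0.2, 0.3]
```

Because the shape terms are removed before fitting, the solver never has to trade shape units against expressions or pose, which is the source of the non-uniqueness described above.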