This disclosure relates generally to recognizing facial positioning and alignment in an image, and more specifically to localizing positions of facial anchor points on a face.
Face alignment localizes facial anchor points on a face in an image. Facial anchor points are contour points around facial features such as the eyes, nose, mouth, and jaw line. Features (e.g., shape and texture) extracted from the localized facial anchor points provide fundamental information for many face processing applications, such as face tracking, face modeling, face recognition, facial expression analysis, and face synthesis. A number of different approaches exist for face alignment. Examples of these approaches include a cascade of gradient boosted decision trees (GBDTs), a cascade of Gaussian process regression trees, or other cascade learning frameworks. However, many of these approaches currently suffer from drawbacks.
One drawback is that these approaches drive up computational cost and power consumption. For example, some approaches need to calculate, for each facial anchor point, a transformation matrix between the current coordinates defined in an image and the coordinates defined in a default shape. This calculation is computationally intensive. In another example, prediction models generated by these approaches may be too large to be stored on a mobile device or quickly downloaded, which can prevent required updates from being delivered.
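As an illustration of the kind of alignment calculation referred to above, the following minimal sketch fits a least-squares similarity transform (scale, rotation, and translation) that maps anchor points in image coordinates onto a default shape. It is an assumption-laden example, not the method of any particular approach: the function names `fit_similarity` and `apply_similarity` are hypothetical, and the transform is encoded as a complex multiplier and offset, which is equivalent to a 2x2 scaled rotation matrix plus a translation vector.

```python
def fit_similarity(current, default):
    """Fit q ~ a * p + t in the least-squares sense.

    `current` and `default` are equal-length lists of (x, y) tuples.
    Returns (a, t) as complex numbers: `a` encodes scale and rotation,
    `t` encodes translation. (Hypothetical helper for illustration.)
    """
    n = len(current)
    p = [complex(x, y) for x, y in current]
    q = [complex(x, y) for x, y in default]

    # Center both point sets on their means.
    p_mean = sum(p) / n
    q_mean = sum(q) / n

    # Closed-form least-squares solution for the complex multiplier.
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den

    # Translation that maps the current mean onto the default mean.
    t = q_mean - a * p_mean
    return a, t


def apply_similarity(a, t, point):
    """Map a single (x, y) point through the fitted transform."""
    z = a * complex(point[0], point[1]) + t
    return (z.real, z.imag)
```

Even in this simplified form, each fit requires a pass over all anchor points, and a cascade that repeats the fit at every level and for every image multiplies that cost accordingly.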
Another drawback is that these approaches may not provide accurate prediction models. For example, these approaches train prediction models globally without considering large variations in facial pose, facial lighting, facial expression, and occlusion. In another example, over-fitting occurs when there is a discrepancy between learning rate and prediction. Without considering variations across different levels of a cascade, these approaches apply a single global learning factor (also referred to as a shrinkage factor) to all levels of the cascade to reduce over-fitting. This may result in inaccurate shape prediction.
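To make the notion of a global learning factor concrete, the following toy sketch applies one shrinkage value to the shape increment predicted at every level of a cascade. It is only an illustration under stated assumptions: `run_cascade` and `toward_target` are hypothetical names, the shape is a flat list of coordinates, and each level is a stand-in function rather than a trained regressor.

```python
def run_cascade(initial_shape, levels, shrinkage=0.1):
    """Refine a shape through a cascade using one global shrinkage factor.

    `levels` is a list of callables; each takes the current shape (a flat
    list of floats) and returns a predicted increment of the same length.
    The same `shrinkage` value scales the increment at every level.
    """
    shape = list(initial_shape)
    for predict_increment in levels:
        delta = predict_increment(shape)
        shape = [s + shrinkage * d for s, d in zip(shape, delta)]
    return shape


def toward_target(target):
    """Build a stand-in level that predicts the exact residual to `target`."""
    def predict_increment(shape):
        return [t - s for t, s in zip(target, shape)]
    return predict_increment
```

With a target coordinate of 10, a starting value of 0, three such levels, and a shrinkage of 0.5, the estimate moves to 5, then 7.5, then 8.75. Because the factor is the same at every level, the step size cannot adapt to how much correction a given level actually needs.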