The problem addressed by the present invention is real-time, accurate facial feature detection and tracking in unconstrained images and videos. Algorithms for detecting landmarks in images exist; however, prior-art methods lack accuracy and robustness and can be too slow for real-time applications.
U.S. Pat. No. 8,121,347 (“System and Method for Detecting and Tracking Features in Images”) proposes forming clustered shape subspaces corresponding to a set of images, to account for shapes that change nonlinearly due to perspective projection and complex 3D movements. A traditional landmark localization algorithm is then applied to each of the clusters. The proposed method has several drawbacks: (1) it uses a model and therefore cannot represent asymmetric expressions well; (2) it is slow because it has to search for each landmark along the diagonal; (3) building the model requires clustering the data, so the method is prone to local minima; and (4) it is unclear whether richer and more discriminative features can be added. In contrast, the method of the present invention uses richer features and a new, improved algorithm for landmark localization.
U.S. Pat. No. 8,103,058 (“Detecting and Tracking Objects in Digital Images”) presents a solution for detecting and tracking objects in digital images that includes selecting, for each pixel under observation, a neighborhood of known size and form and reading the pixel values of that neighborhood. However, the described method does not work well for deformable objects such as faces.
The problem of facial feature detection and tracking can be formulated as a nonlinear least-squares problem. For nonlinear least-squares problems, second-order descent methods are generally accepted as the most robust, fast, and reliable approach to the nonlinear optimization of a general smooth function. In the context of facial feature detection and tracking, however, second-order descent methods have two main drawbacks: (1) the function might not be analytically differentiable (e.g., when histogram of oriented gradients (HoG) features are used), and numerical approximations of its derivatives are impractical; and (2) the Hessian might be large and might not be positive definite.
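The two drawbacks can be seen in a standard second-order (Newton) step. The formulation below is a minimal sketch under assumed notation that is not taken from the prior art cited above: x ∈ ℝ^p stacks the landmark coordinates, x_0 is an initial estimate, d(x) extracts the image patches at x, h(·) is a feature extractor such as HoG, φ_* = h(d(x_*)) are the features at the correct landmarks x_*, J_f is the gradient of f, J_h is the Jacobian of h∘d with respect to x, and H is the Hessian of f.

```latex
% Illustrative notation only (see the lead-in above); not the formulation of any cited patent.
\begin{align}
  f(\mathbf{x}_0 + \Delta\mathbf{x}) &= \bigl\lVert h\bigl(d(\mathbf{x}_0 + \Delta\mathbf{x})\bigr) - \boldsymbol{\phi}_* \bigr\rVert_2^2 \\
  % Second-order Taylor expansion around the initial estimate x_0
  f(\mathbf{x}_0 + \Delta\mathbf{x}) &\approx f(\mathbf{x}_0) + \mathbf{J}_f(\mathbf{x}_0)^\top \Delta\mathbf{x} + \tfrac{1}{2}\,\Delta\mathbf{x}^\top \mathbf{H}(\mathbf{x}_0)\,\Delta\mathbf{x} \\
  % Chain rule: the gradient of f requires the Jacobian of h(d(x))
  \mathbf{J}_f(\mathbf{x}_0) &= 2\,\mathbf{J}_h(\mathbf{x}_0)^\top \bigl(\boldsymbol{\phi}_0 - \boldsymbol{\phi}_*\bigr), \qquad \boldsymbol{\phi}_0 = h\bigl(d(\mathbf{x}_0)\bigr) \\
  % Setting the derivative of the expansion with respect to \Delta x to zero gives the Newton step
  \Delta\mathbf{x} &= -\mathbf{H}(\mathbf{x}_0)^{-1}\,\mathbf{J}_f(\mathbf{x}_0) = -2\,\mathbf{H}(\mathbf{x}_0)^{-1}\,\mathbf{J}_h(\mathbf{x}_0)^\top \bigl(\boldsymbol{\phi}_0 - \boldsymbol{\phi}_*\bigr)
\end{align}
```

Both J_h and H depend on derivatives of h, which a HoG extractor does not provide analytically (drawback 1). H is a p × p matrix that must be formed and inverted at every iteration, and when it is not positive definite the step Δx is not guaranteed to decrease f (drawback 2).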
It would therefore be desirable to provide a method of detecting and tracking facial features that addresses the identified deficiencies of the prior art methods.