Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.
The tracking of faces and facial features, such as a person's eyes, has attracted considerable interest over the past decade, as computers have become sufficiently powerful to enable practical solutions to this problem.
Two approaches have been proposed for solving this problem. The first is a geometric approach utilising three-dimensional point features in the face and geometric reasoning to derive the three-dimensional pose. The second is a non-linear optimisation of the parameters of an appearance model.
The first approach, using point features, has the advantage that it is deterministic. Non-iterative approaches provide a short and predictable time to calculate the solution, and have been popular for real-time systems. Edwards et al (U.S. Pat. No. 7,043,056) disclose a typical example of this methodology.
The second approach is an iterative, non-linear optimisation problem, which in general is computationally expensive. Trade-offs in convergence accuracy are required to achieve predictable computation times. Depending on the parameters of the appearance model, an advantage of this approach is that a better fidelity of tracking can be achieved. This is understood to be because the appearance of the observed object can be modelled and predicted more accurately than with the point feature approach. Cootes et al 2001 (T. Cootes, G. Edwards, C. Taylor, “Active appearance models”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001), Cootes et al 2002 (T. Cootes, G. Wheeler, K. Walker, C. Taylor, “View-based active appearance models”, Image and Vision Computing, 20:657-664, 2002) and Matthews et al (I. Matthews and S. Baker, “Active appearance models revisited”, International Journal of Computer Vision, Vol. 60, No. 2, November, 2004, pp. 135-164) disclose typical implementations of this methodology.
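By way of illustration only, the trade-off between convergence accuracy and predictable computation time can be sketched with a toy one-dimensional fit. The sketch below is not taken from any of the cited works: the Gaussian `template`, the single `shift` parameter, and the function `fit_shift` are hypothetical stand-ins for an appearance model and its parameters, and Gauss-Newton is assumed as the iterative optimiser. Capping the iteration count bounds the computation time but may leave the estimate short of the true value.

```python
import math

def template(x, shift):
    """Hypothetical 1-D appearance model: a Gaussian bump centred at `shift`."""
    return math.exp(-0.5 * (x - shift) ** 2)

def fit_shift(observed, xs, shift0, max_iters):
    """Gauss-Newton estimate of `shift`, stopped after at most `max_iters` steps.

    The fixed iteration cap gives a bounded, predictable run time at the
    cost of a possibly less accurate result.
    """
    shift = shift0
    for _ in range(max_iters):
        num = 0.0  # accumulates J^T r
        den = 0.0  # accumulates J^T J
        for x, obs in zip(xs, observed):
            pred = template(x, shift)
            jac = pred * (x - shift)  # derivative of pred with respect to shift
            res = obs - pred          # residual between observation and model
            num += jac * res
            den += jac * jac
        if den == 0.0:
            break
        shift += num / den
    return shift

# Synthetic observation generated from the model itself (noise-free assumption).
xs = [i * 0.25 for i in range(-20, 21)]
true_shift = 1.0
observed = [template(x, true_shift) for x in xs]

coarse = fit_shift(observed, xs, 0.0, max_iters=1)   # small, fixed time budget
fine = fit_shift(observed, xs, 0.0, max_iters=20)    # larger time budget

print("error after 1 iteration:  ", abs(coarse - true_shift))
print("error after 20 iterations:", abs(fine - true_shift))
```

In this toy setting the single-iteration estimate remains visibly short of the true shift, whereas the larger budget recovers it closely; a real appearance model has many more parameters and a correspondingly higher cost per iteration.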
De la Torre et al (F. De la Torre, M. Black, “Robust parametrized component analysis: theory and applications of 2D facial appearance models”, Computer Vision and Image Understanding 91 (2003) 53-71) disclose the use of a person-specific two-dimensional active appearance model, which is not capable of tracking a person in three dimensions.
Dornaika et al (F. Dornaika, J. Ahlberg “Face model adaptation using robust matching and active appearance models”, Proceedings of Sixth IEEE Workshop on Applications of Computer Vision, 2002. (WACV 2002), 3-7) and Hu et al (C. Hu, R. Feris, M. Turk, “Active Wavelet Networks for Face Alignment”, Proceedings of British Machine Vision Conference, Norwich, 2003) disclose splitting a rendered face into sub-features, but fail to model and exploit the overlapping nature of facial features. A typical example of such overlap, or occlusion, occurs when the face is seen in a semi-profile view, where the ridge of the nose forms an edge over the far cheek. Previous systems are limited to non-occluded views of features, and fail as soon as features start overlapping due to the projection of the facial features in the image.
It will be appreciated that the technical challenges associated with this problem are considerable, in particular since the human face exhibits high inter-individual variation and is a highly articulated object.
There is a need in the art for automatic tracking of human faces in video sequences.