Interpretation of a video sequence for human-machine interaction purposes is a problem often encountered in the image processing industry. Face tracking is an important aspect of the interpretation of video sequences, and can be classified as a high level problem. However, merely tracking the face is often not enough for certain types of applications. For example, in face recognition applications, the positions of the eyes and the mouth are needed to enable a recognition algorithm to operate. Another example where the positions of the eyes and the mouth are needed is an eyes/head control interface. In such an interface, the movement of the head, rather than a mouse for example, is used to control a cursor. Further examples include virtual reality applications and video games.
A number of techniques have been proposed for tracking facial features. The most common technique uses skin colour to detect and track a face segment. The eyes and the mouth are then detected inside this skin coloured segment. Eye blinking, valley detection, and pattern matching using principal component analysis are some of the techniques used to locate the eyes. Most of these techniques rely on feature detection performed on a frame-by-frame basis. Furthermore, skin detection based on colour is known to be highly unreliable under changing lighting conditions.
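The sensitivity of colour-based skin detection to lighting can be illustrated with a minimal sketch. The fixed RGB thresholds in `is_skin`, the helper `dim`, and the sample pixel value are assumptions chosen purely for illustration; they are not taken from any particular prior art technique.

```python
def is_skin(r, g, b):
    """Illustrative fixed-threshold RGB skin rule.

    The threshold values are assumptions for this sketch only.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def dim(pixel, percent):
    """Scale every channel to the given percentage, mimicking darker lighting."""
    return tuple(c * percent // 100 for c in pixel)


# Hypothetical skin-toned pixel under normal lighting.
pixel = (200, 140, 120)

classified_bright = is_skin(*pixel)         # pixel as captured
classified_dark = is_skin(*dim(pixel, 40))  # same surface at 40% brightness
```

In this sketch the same surface is accepted as skin under the original lighting and rejected once the scene is dimmed, because the fixed thresholds do not adapt to the overall brightness.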
Many techniques use luminance block matching in order to track the features. Such techniques give good results as long as the features remain visible throughout the video. If one or more features disappear, such techniques are unable to detect those features again when they reappear in the video.
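A minimal sketch of luminance block matching follows, assuming a sum of absolute differences (SAD) cost and a small search window around the previous feature position. The function names, the search radius, and the toy frames are hypothetical and serve only to illustrate the general idea.

```python
def sad(frame, template, top, left):
    """Sum of absolute luminance differences between the template and
    the frame block whose top-left corner is at (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_block(frame, template, prev_top, prev_left, radius=2):
    """Search a window around the previous position and return
    (cost, top, left) for the block with the lowest SAD."""
    h, w = len(template), len(template[0])
    top_lo = max(0, prev_top - radius)
    top_hi = min(len(frame) - h, prev_top + radius)
    left_lo = max(0, prev_left - radius)
    left_hi = min(len(frame[0]) - w, prev_left + radius)
    best = None
    for top in range(top_lo, top_hi + 1):
        for left in range(left_lo, left_hi + 1):
            cost = sad(frame, template, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


# Toy 6x6 luminance frame: a bright 2x2 "feature" on a dark background.
frame = [[10] * 6 for _ in range(6)]
for r in (3, 4):
    for c in (2, 3):
        frame[r][c] = 200
template = [[200, 200], [200, 200]]

# The feature was previously at (2, 1) and has moved to (3, 2).
found = match_block(frame, template, prev_top=2, prev_left=1)

# Next frame: the feature is no longer visible (for example, occluded).
blank = [[10] * 6 for _ in range(6)]
lost = match_block(blank, template, prev_top=found[1], prev_left=found[2])
```

While the feature is visible, the search returns its new position with a zero cost. Once the feature is absent, the search still returns some block, but with a large cost, and the tracker retains no reliable position from which to resume when the feature reappears.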
Other prior art techniques use deformable wireframe models and motion information to track the whole face and its features. Some of those techniques are able to track the features even in a side view of the face, but require a large amount of computational power.
Yet another technique is based on the “red eye” effect, which is often seen in photographs taken with a flash and is caused by the reflection of light from the retina of the eye. Because the reflection occurs only when the person is more or less facing the camera, this technique has only limited application.