Interpreting video sequences for human-machine interface purposes is a problem often encountered in the image processing industry. Face tracking is one of the most important aspects of such interpretation. It may be classified as a high-level problem, and it is often an important initial step in many other applications, including face recognition. Another application is content summarisation, in which an object-based description of the video content is compiled to support indexing, browsing and searching. Yet another application is active camera control, in which the parameters of a camera may be altered to optimise the filming of detected faces.
Typically, face tracking is divided into two separate steps. First, frames of the video sequence are analysed to detect the location of one or more faces. When a face is detected, that face is then tracked until it disappears.
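The two-step scheme can be sketched as a loop that searches each frame for faces while none are held, and otherwise follows the faces already found until they disappear. This is a minimal illustrative sketch, not a method taken from this document: `detect_then_track`, `detect` and `track` are hypothetical names, and the detector and tracker are passed in as callables because the text does not specify either.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

F = TypeVar("F")  # hypothetical frame type
T = TypeVar("T")  # hypothetical face representation (e.g. a bounding box)


def detect_then_track(
    frames: Iterable[F],
    detect: Callable[[F], Iterable[T]],
    track: Callable[[F, T], Optional[T]],
) -> List[List[T]]:
    """Return the list of faces held after each frame.

    detect(frame) returns the faces found in a frame (step 1).
    track(frame, face) returns the updated face, or None once it has disappeared (step 2).
    """
    tracked: List[T] = []
    history: List[List[T]] = []
    for frame in frames:
        if not tracked:
            # Step 1: nothing is being followed, so search the frame for faces.
            tracked = list(detect(frame))
        else:
            # Step 2: follow each known face and drop any that have disappeared.
            updated = (track(frame, face) for face in tracked)
            tracked = [face for face in updated if face is not None]
        history.append(list(tracked))
    return history
```

For example, modelling each frame as the set of visible face labels, `detect=sorted` and `track=lambda f, face: face if face in f else None` turn the frames `[set(), {"A"}, {"A"}, set()]` into `[[], ["A"], ["A"], []]`. In this literal reading, new faces are only searched for once every tracked face has gone; a practical system might also re-run detection periodically.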
A cue often used in the detection of faces is skin colour. Some known face detection and tracking methods label any detected object having the colour of skin as a face, and track such objects through time. More sophisticated techniques further analyse each skin-coloured object to determine whether it includes facial features, such as the eyes and mouth, in order to verify that the object is in fact a face. However, although these skin-colour techniques are fast, they are unreliable, because skin colour changes under different lighting conditions, which makes skin detection unstable.
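The instability of skin-colour detection can be illustrated with a fixed-threshold pixel classifier. The sketch below assumes one commonly cited explicit RGB skin rule; it is not the classifier of any particular method described here, and `is_skin`, `skin_mask` and `dim` are hypothetical names. It shows only the pixel-labelling step, and how a uniform change in illumination alone can flip a pixel's label.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def is_skin(pixel: Pixel) -> bool:
    """Assumed fixed-threshold RGB rule for skin under daylight illumination."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Label every pixel of an image (a list of rows) as skin or non-skin."""
    return [[is_skin(p) for p in row] for row in image]


def dim(pixel: Pixel, factor: float) -> Pixel:
    """Hypothetical model of weaker lighting: scale every channel by `factor`."""
    r, g, b = (int(c * factor) for c in pixel)
    return (r, g, b)
```

With this rule, the pixel `(200, 140, 110)` is classified as skin, but the same surface at 40% of the illumination, `dim((200, 140, 110), 0.4) == (80, 56, 44)`, fails the `r > 95` test and is no longer classified as skin, even though nothing about the object itself has changed.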
Other techniques use motion and shape as the main cues: whenever an elliptical contour is detected within a frame, the object is labelled as a face. These techniques thus use a very simple model of the face, namely an ellipse, and assume that the face moves through the video sequence. Consequently, static faces are not detected.
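The motion-and-shape approach can be sketched as follows, under stated assumptions. Motion is taken to be simple frame differencing, all moving pixels are treated as one region, and, in place of contour detection, the region's ellipticity is measured by comparing it with the filled ellipse having the same centroid and covariance (for a uniformly filled ellipse, the boundary lies at squared Mahalanobis distance 4 from the centroid). The names, the difference threshold of 25, the minimum size of 50 pixels and the overlap threshold of 0.9 are all hypothetical choices for illustration, not values from this document.

```python
import math
from typing import Callable, List, Set, Tuple

Frame = List[List[int]]  # grayscale intensities indexed as frame[y][x]
Point = Tuple[int, int]  # (x, y)


def motion_pixels(prev: Frame, curr: Frame, threshold: int = 25) -> Set[Point]:
    """Pixels whose intensity changed by more than `threshold` between frames."""
    return {
        (x, y)
        for y, (row_p, row_c) in enumerate(zip(prev, curr))
        for x, (p, c) in enumerate(zip(row_p, row_c))
        if abs(c - p) > threshold
    }


def ellipse_fit_iou(region: Set[Point]) -> float:
    """Intersection-over-union of a region and its moment-matched filled ellipse."""
    n = len(region)
    if n == 0:
        return 0.0
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n
    sxx = sum((x - cx) ** 2 for x, _ in region) / n
    syy = sum((y - cy) ** 2 for _, y in region) / n
    sxy = sum((x - cx) * (y - cy) for x, y in region) / n
    det = sxx * syy - sxy * sxy
    if det <= 0:
        return 0.0  # degenerate (line-like) region, not an ellipse
    # Inverse covariance matrix entries.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    rx, ry = 2 * math.sqrt(sxx), 2 * math.sqrt(syy)  # half-extents of the ellipse
    ellipse = {
        (x, y)
        for x in range(math.floor(cx - rx), math.ceil(cx + rx) + 1)
        for y in range(math.floor(cy - ry), math.ceil(cy + ry) + 1)
        if ixx * (x - cx) ** 2 + 2 * ixy * (x - cx) * (y - cy) + iyy * (y - cy) ** 2 <= 4.0
    }
    return len(region & ellipse) / len(region | ellipse)


def moving_ellipse_detected(prev: Frame, curr: Frame,
                            min_pixels: int = 50, iou_threshold: float = 0.9) -> bool:
    """Label the moving region a face if it is large enough and close to an ellipse."""
    region = motion_pixels(prev, curr)
    if len(region) < min_pixels:
        return False  # little or no motion: a static face is never found
    return ellipse_fit_iou(region) >= iou_threshold


def synthetic_frame(w: int, h: int, inside: Callable[[int, int], bool]) -> Frame:
    """Hypothetical demo helper: intensity 200 where inside(x, y) holds, else 0."""
    return [[200 if inside(x, y) else 0 for x in range(w)] for y in range(h)]
```

An ellipse appearing in an otherwise blank 64 by 64 frame is accepted, while a 30 by 20 rectangle is rejected because its overlap with the moment-matched ellipse is only around 0.83. Passing the same ellipse frame as both `prev` and `curr` produces no motion, so the sketch reproduces the limitation noted above: a face that does not move is never detected.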