Tracking faces in a video sequence is an important module for automated video surveillance. It is a precursor to many applications, such as video-based face recognition, tagging and tracking of faces, and multi-camera indexing. Face tracking in video is a long-studied problem that has been approached with many features, such as skin color and edge-based face structure features. Tracking faces poses a unique set of problems compared with tracking other objects of interest, such as people or cars.
Faces are approximately uniform in color, which makes it possible to track them using color as an appearance model. Many researchers have used features derived from the skin color of the face, such as a color histogram, for face tracking. Using face color as an appearance model provides invariance to head pose variations. However, face tracking using color is challenging when the background is of similar color or when the ambient illumination varies. Using edge information of faces as an appearance model has proved robust to illumination variations. However, out-of-plane variations of the face pose worsen 2D edge model matching. A generalized tracking algorithm that models the appearance using a mixture of Gaussians has also been used. This algorithm may be used for tracking a face with pose changes, typically in-plane pose changes. Other approaches may use an appearance model and embed the tracking in a particle filter framework.
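The trade-off of a color-based appearance model can be illustrated with a minimal sketch. It assumes a single color channel with values in 0 to 255, a normalized histogram as the model, and the Bhattacharyya coefficient as the similarity measure; the function names, the bin count, and the sample values are hypothetical and are not taken from any particular tracker. Reordering the pixels is used here only as a crude stand-in for a pose change, and a constant offset as a stand-in for an illumination change.

```python
import math

def channel_histogram(pixels, bins=8):
    """Normalized histogram of single-channel values in [0, 256)."""
    hist = [0.0] * bins
    for value in pixels:
        # Map the value to a bin index and clamp to the last bin.
        hist[min(int(value) * bins // 256, bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1.0 for identical histograms, 0.0 for no overlap."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical channel values sampled from a face region.
face = [100, 105, 110, 120, 125, 130, 135, 140]

# Same pixels in a different spatial arrangement (crude proxy for a pose change).
rearranged = list(reversed(face))

# Same pixels under a uniform brightness increase (crude proxy for illumination change).
brighter = [v + 60 for v in face]

model = channel_histogram(face)
pose_score = bhattacharyya(model, channel_histogram(rearranged))
illumination_score = bhattacharyya(model, channel_histogram(brighter))

print("score after rearrangement:", pose_score)
print("score after brightening:  ", illumination_score)
```

Because the histogram discards spatial layout, the rearranged region matches the model as well as the original does, while the brightness shift moves the values into different bins and lowers the score. This mirrors the pose invariance and illumination sensitivity described above.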
Two main approaches have been used to track faces in videos. In one approach, local features of the face are detected (or manually marked) and then tracked over time. This is useful if the orientation of the face needs to be computed along with the face position (as in Human Computer Interaction applications). The other approach utilizes global features of the face, such as a color histogram, that distinguish the face from the background.
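As a rough illustration of the global-feature approach, the sketch below searches a frame exhaustively for the window whose histogram best matches a face model. It is a toy example under stated assumptions: a synthetic single-channel frame, a square window of known size, and hypothetical helper names (`histogram`, `similarity`, `locate`). A practical tracker would typically restrict the search to a neighborhood of the previous position rather than scan the whole frame.

```python
import math

def histogram(values, bins=8):
    """Normalized histogram of single-channel values in [0, 256)."""
    hist = [0.0] * bins
    for v in values:
        hist[min(int(v) * bins // 256, bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def similarity(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def window_values(frame, top, left, size):
    """Flatten the size x size window whose top-left corner is (top, left)."""
    return [frame[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]

def locate(frame, model, size):
    """Return ((top, left), score) of the window that best matches the model."""
    best_score, best_pos = -1.0, None
    rows, cols = len(frame), len(frame[0])
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            candidate = histogram(window_values(frame, top, left, size))
            score = similarity(model, candidate)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos, best_score

# Synthetic 6x6 frame: dark background (value 20) with a
# 2x2 bright "face" whose top-left corner is at row 3, column 2.
frame = [[20] * 6 for _ in range(6)]
for r in range(3, 5):
    for c in range(2, 4):
        frame[r][c] = 200

# Hypothetical face model built from the known face values.
model = histogram([200, 200, 200, 200])
position, score = locate(frame, model, size=2)
print("best window:", position, "score:", score)
```

No individual facial feature is located here; the whole region is matched by a single descriptor, which is why this style of approach remains usable when the face is too small for local features to be identified reliably.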
In surveillance videos, multiple faces need to be tracked, with face sizes as small as 24×24 pixels, which makes it difficult to identify and track local features. The faces can undergo illumination changes (because of shadows and indoor lighting), partial occlusions, and large pose changes. The background may also be cluttered, depending on the setup. These challenges need to be overcome for effective face tracking in a surveillance setup.