The explosion of video media, such as video clips on the World Wide Web, digitized movies, recordings of television (TV) programs on personal video recorders, and home videos, has generated an increasing demand for video mining and video indexing. For example, semantic based video mining techniques, such as news abstraction, sports highlights detection, indexing, and retrieval, are commonly sought after by owners of the media. People often want to index the content of such video data, such as indexing the different characters, or cast of characters, in videos. By cast indexing, owners and viewers of the videos can discover and refer to characters in the videos. For example, a person who may desire to view a video on the World Wide Web may first determine who appears in the video, how frequently they appear, in which scenes they appear, with whom they appear, etc. In other words, indexing characters of the video may allow one to more efficiently browse video clips and other video media.
For detecting characters and cast indexing videos, the human face is usually an important visual cue, often more important than auxiliary cues such as voice or speech, and clothing. Automatic face detection and recognition techniques can be employed as main ways and means for cast indexing. However faces in videos, especially films, sitcoms, and home videos, usually have large variations of pose, expression and illumination which help explain why reliable face recognition is still a very challenging problem for computers.
To reduce the adverse effect of variations in image for video-based face recognition, a lot of methods have been attempted with varying degrees of success. Some people have applied affine warping and illumination correction for face images in an attempt to alleviate the adverse effects induced from pose and illumination variations. However, affine warping and illumination correction are unable to adequately handle out-of-plan face rotation. Others have attempted face recognition based on manifold analysis. Unfortunately, the manifolds of faces and relationships among them in real videos are too complex to be accurately characterized by simplified models. Although some people employ three-dimensional face models to enhance the video-based face recognition performance, three-dimensional face modeling techniques encounter difficulty when trying to accurately recover head pose parameters, even when using state-of-the-art registration techniques. Further, such three-dimensional face modeling techniques are often not practical for real-world applications. In a word, it is very hard to build a robust cast indexing system based only on face recognition techniques.