The present invention relates to a system for locating a human face within an image, and more particularly to a system suitable for real-time tracking of a human face in video sequences.
Numerous systems have been developed for the detection of a target within an input image. In particular, human face detection within an image is of considerable importance. Numerous devices benefit from automatic determination of whether an image (or video frame) contains a human face, and if so, where the human face is located in the image. Such devices may be, for example, a video phone or a human computer interface. A human computer interface identifies the location of a face, if any, identifies the particular face, and understands facial expressions and gestures.
Traditionally, face detection has been performed using correlation template based techniques which compute similarity measurements between a fixed target pattern and multiple candidate image locations. If any of the similarity measurements exceeds a threshold value then a "match" is declared, indicating that a face has been detected and identifying its location. Multiple correlation templates may be employed to detect major facial sub-features. A related technique is known as "view-based eigen-spaces," and defines a distance metric based on a parameterizable sub-space of the original image vector space. If the distance metric is below a threshold value then the system indicates that a face has been detected.
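The correlation template approach described above can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and uses normalized cross-correlation as the similarity measurement; the function names, the choice of similarity measure, and the threshold of 0.9 are illustrative assumptions and are not taken from any of the systems discussed here.

```python
import math

def normalized_correlation(patch, template):
    """Normalized cross-correlation of two equal-sized 2-D lists (range -1 to 1)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    # A flat patch or template has no pattern to correlate with.
    return num / den if den > 0 else 0.0

def match_template(image, template, threshold=0.9):
    """Slide the fixed template over every candidate location.

    Returns a list of (row, col, score) for locations whose similarity
    exceeds the threshold (hypothetical default: 0.9).
    """
    th, tw = len(template), len(template[0])
    matches = []
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = normalized_correlation(patch, template)
            if score > threshold:
                matches.append((y, x, score))
    return matches

# Toy example: a 2x2 pattern embedded in a 4x4 image.
template = [[1, 9],
            [9, 1]]
image = [[0, 0, 0, 0],
         [0, 1, 9, 0],
         [0, 9, 1, 0],
         [0, 0, 0, 0]]
matches = match_template(image, template)
```

Because the template is compared against every candidate location, the cost grows with the product of the image size and the template size, which is one reason such exhaustive searches are expensive.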
An alternative face detection technique involves using spatial image invariants which rely on compiling a set of image invariants particular to facial images. The input image is then scanned for positive occurrences of these invariants at all possible locations to identify human faces.
Yang et al., in a paper entitled "A Real-Time Face Tracker," discloses a real-time face tracking system. The system acquires a red-green-blue (RGB) image and filters it to obtain chromatic colors (r and g), known as "pure" colors, in the absence of brightness. The transformation of red-green-blue to chromatic colors is a transformation from a three dimensional space (RGB) to a two dimensional space (rg). The distribution of facial colors within the chromatic color space is primarily clustered in a small region. Yang et al. determined after a detailed analysis of skin-color distributions that the skin colors of different people under different lighting conditions have similar Gaussian distributions in the chromatic color space. To determine whether a particular red-green-blue pixel maps onto the region of the chromatic color space indicative of a facial color, Yang et al. teaches the use of a two-dimensional Gaussian model. Based on the results of the two-dimensional Gaussian model for each pixel within the RGB image, the facial region of the image is determined. Unfortunately, the two-dimensional Gaussian model is computationally intensive and thus unsuitable for inexpensive real-time systems. Moreover, the system taught by Yang et al. uses a simple tracking mechanism which results in the position of the tracked face being susceptible to jittering.
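The mapping from RGB to chromatic colors and the evaluation of a two-dimensional Gaussian model can be sketched as follows. The sketch assumes the common normalized-chromaticity definition r = R/(R+G+B) and g = G/(R+G+B). The mean and covariance values are placeholders chosen only for illustration; they are not the parameters reported by Yang et al., and the function names are hypothetical.

```python
import math

def rgb_to_rg(R, G, B):
    """Map an RGB pixel to chromatic (r, g) coordinates, discarding brightness."""
    total = R + G + B
    if total == 0:
        # Chromaticity is undefined for a black pixel; (0, 0) is an arbitrary convention.
        return (0.0, 0.0)
    return (R / total, G / total)

def gaussian_2d(r, g, mean, cov):
    """Density of a two-dimensional Gaussian at the point (r, g)."""
    dr, dg = r - mean[0], g - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    m = (d * dr * dr - (b + c) * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * m) / (2.0 * math.pi * math.sqrt(det))

# Placeholder model parameters (assumed for illustration, not measured values).
MEAN = (0.45, 0.30)
COV = ((0.002, 0.0),
       (0.0, 0.001))

def skin_likelihood(R, G, B):
    """Evaluate the placeholder Gaussian model for a single RGB pixel."""
    r, g = rgb_to_rg(R, G, B)
    return gaussian_2d(r, g, MEAN, COV)

# Two pixels with the same color but different brightness map to the same (r, g).
bright = rgb_to_rg(200, 100, 100)
dim = rgb_to_rg(100, 50, 50)
```

In this sketch every pixel requires a normalization and an exponential evaluation, which gives a sense of the per-pixel cost of such a model.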
Eleftheriadis et al., in a paper entitled "Automatic Face Location Detection and Tracking for Model-Assisted Coding of Video Teleconferencing Sequences at Low Bit-Rate," teaches a system for face location detection and tracking. The system is particularly designed for video data that includes head-and-shoulder sequences of people which are modeled as elliptical regions of interest. The system presumes that the outlines of people's heads are generally elliptical and have high temporal correlation from frame to frame. Based on this premise, the system calculates the difference between consecutive frames and thresholds the result to identify regions of significant movement, which are indicated as non-zero. Elliptical non-zero regions are located and identified as facial regions. Unfortunately, the system taught by Eleftheriadis et al. is computationally intensive and is not suitable for real-time applications. Moreover, shadows or partial occlusions of the person's face result in non-zero regions that are not elliptical, and therefore the system may fail to identify such regions as a face. In addition, if the person's face is oriented away from the camera then the resulting outline of the person's head will not be elliptical, and therefore the system may fail to identify the person's head. Also, if there is substantial movement within the background of the image, the facial region may be obscured.
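The frame-differencing and thresholding step described above can be sketched as follows. The sketch assumes grayscale frames stored as nested lists; the function names and the threshold value are hypothetical. It locates only the bounding box of the non-zero region and does not attempt the ellipse fitting, which is not detailed here.

```python
def motion_mask(prev_frame, curr_frame, threshold=20):
    """Mark pixels whose absolute inter-frame difference exceeds the threshold as 1."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]

def bounding_box(mask):
    """Return (top, left, bottom, right) of the non-zero pixels, or None if there are none."""
    points = [(y, x)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return (min(ys), min(xs), max(ys), max(xs))

# Toy example: two pixels change significantly between consecutive frames.
prev_frame = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 10, 10],
              [10, 90, 80],
              [10, 15, 10]]
mask = motion_mask(prev_frame, curr_frame)
box = bounding_box(mask)
```

Any moving object, including one in the background, produces non-zero pixels in such a mask, which is consistent with the background-movement limitation noted above.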
Hager et al., in a paper entitled "Real-Time Tracking of Image Regions with Changes in Geometry and Illumination," discloses a face tracking system that analyzes the brightness of an image within a window. The pattern of the brightness within the window is used to track the face between frames. The system taught by Hager et al. is sensitive to face orientation changes, and to partial occlusions and shadows which obscure the pattern of the image. In addition, the system is incapable of initially determining the position of the face(s).
What is desired, therefore, is a face tracking system that is insensitive to partial occlusions and shadows, insensitive to face orientation and/or scale changes, and insensitive to changes in lighting conditions, that is easy to calibrate, and that can determine the initial position of the face(s). In addition, the system should be computationally simple so that it is suitable for real-time applications.