This invention relates to a real-time face tracking technique which may be used in various applications, such as video communication, advanced human-computer interface, digital libraries, and object-based video coding.
A number of techniques are known for detecting areas of interest in an image, such as a face or other identified object of interest. Face detection is an area of particular interest, as face recognition has importance not only for image processing, but also for identification and security purposes, and for human-computer interface purposes. A human-computer interface not only identifies the location of a face, if a face is present, but may also identify the particular face, and may understand facial expressions and gestures.
Traditional face detection techniques incorporate a correlation template which computes similarity measurements between a fixed target pattern and multiple candidate image locations. If any part of the similarity measurement exceeds a threshold value, a “match” is declared, which indicates that a face has been located. Location information for the detected face is then provided. Multiple correlation templates may be used to detect major facial sub-features, such as eye shape, nose shape, etc. A related technique is view-based eigen-spaces, which defines a distance measurement based on a parameterizable sub-space of the original image vector space. A face is considered to be detected if the distance measurement is below a predetermined threshold value.
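The correlation-template approach described above may be illustrated by the following minimal sketch. It is not taken from any cited system: the function names, the use of normalized cross-correlation as the similarity measurement, and the threshold of 0.9 are all assumptions chosen for illustration. The image and template are small grayscale arrays given as lists of rows.

```python
import math


def normalized_correlation(patch, template):
    """Normalized cross-correlation of two equal-sized 2D lists (range -1..1)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t)
    )
    # A flat patch or flat template has no pattern to correlate; score it 0.
    return num / den if den > 0 else 0.0


def find_matches(image, template, threshold=0.9):
    """Slide the fixed template over every candidate location of the image.

    Returns (x, y, score) for each location whose similarity exceeds the
    threshold. The threshold value is a hypothetical choice.
    """
    th, tw = len(template), len(template[0])
    matches = []
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = normalized_correlation(patch, template)
            if score > threshold:
                matches.append((x, y, score))
    return matches
```

Because every candidate location is compared against the full template, the cost grows with the product of image size and template size, which is one reason such exhaustive matching is difficult to run in real time.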
Another face detection technique uses spatial image invariants which compile a set of image invariants peculiar to facial images. The input image is scanned for a positive occurrence of these invariants at all possible locations to determine if a human face is present at any location.
Examples of existing face tracking techniques include techniques that use (1) correlation between adjacent frames; (2) 2D/3D geometrical face models; and (3) illumination models. The existing face tracking algorithms have one or more of the following disadvantages: they (1) are sensitive to partial occlusions and shadows; (2) are sensitive to face orientation and/or scale changes; (3) are sensitive to lighting condition changes; (4) are computationally intensive, and therefore difficult to apply to real-time applications; and (5) may require initial positions of the faces.
Specific techniques for face tracking methodologies are disclosed in J. Yang and A. Waibel, Tracking human faces in real-time, Proc. IEEE Workshop on Applications of Computer Vision, 1996, which discusses a system that acquires a red-green-blue (RGB) image, and processes the image by filtration to generate a chromatic image (red and green) of pure colors in the absence of intensity, or brightness. This transformation from RGB to RG is a transformation from a three-dimensional space to a two-dimensional space. Distribution of facial colors within the chromatic color space is presumed to be clustered in a small region. The work describes a finding that skin color in chromatic color space has a similar Gaussian distribution, regardless of the skin color of an individual and regardless of lighting conditions. A two-dimensional Gaussian model is used to map the RGB pixel map onto a chromatic color space (r, g), which is indicative of facial color. Based on the results of the 2D Gaussian model, for each pixel within the RGB image, the facial region of the image is determined. The 2D Gaussian model is, however, computationally intensive, and therefore too expensive for real-time systems, in spite of the title of the paper. Additionally, the technique uses a very simple tracking mechanism, which may cause the tracked face to become “jittery” during processing.
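The reduction from RGB to a two-dimensional chromatic space, and the scoring of a pixel against a two-dimensional Gaussian model, may be sketched as follows. This is an illustration under stated assumptions, not the cited authors' implementation: the chromatic coordinates are assumed to be the usual intensity-normalized values r = R/(R+G+B) and g = G/(R+G+B), the function names are hypothetical, and any mean or covariance passed in is a placeholder that would in practice be estimated from sample skin pixels.

```python
import math


def to_chromatic(r, g, b):
    """Map an RGB pixel to intensity-normalized (r, g) coordinates.

    Assumes the common definition r = R/(R+G+B), g = G/(R+G+B).
    A black pixel has no defined chromaticity; (0.0, 0.0) is returned
    as an arbitrary convention.
    """
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)
    return (r / total, g / total)


def gaussian_skin_likelihood(rc, gc, mean, cov):
    """Unnormalized 2D Gaussian score exp(-0.5 * d^T C^-1 d), in (0, 1].

    `mean` is (mean_r, mean_g) and `cov` is a 2x2 covariance matrix given
    as ((a, b), (c, d)); both are hypothetical model parameters.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    dr, dg = rc - mean[0], gc - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    mahalanobis_sq = (d * dr * dr - (b + c) * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * mahalanobis_sq)
```

Scaling all three channels by the same factor leaves the chromatic coordinates unchanged, which is the sense in which the chromatic image is free of intensity. The per-pixel exponential and quadratic form indicate why evaluating such a model over a full frame is comparatively expensive.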
A. Eleftheriadis and A. Jacquin, Automatic face location detection and tracking for model-assisted coding of video teleconferencing sequences at low bit-rates, Signal Processing: Image Communication, No. 7, 1995, describe a system for face detection and tracking that is designed for video processing and is suitable for detecting “talking heads,” i.e., a head-and-shoulder shot, wherein the person in an image is modeled as an elliptical region of interest. The system presumes that an outline of a human head is generally elliptical and that there is a high temporal correlation from frame to frame. The system determines the difference between the positions of objects in consecutive frames and sets thresholds to identify regions of significant movement, which regions are indicated as non-zero. Regions that are both elliptical in shape and are indicated as non-zero are located and identified as non-zero regions.
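The frame-differencing and thresholding step of such a system may be sketched as follows. Only the marking of regions of significant movement as non-zero is shown; the subsequent search for elliptical regions is omitted. The function name and the threshold of 20 gray levels are assumptions made for illustration, and frames are given as lists of rows of grayscale values.

```python
def motion_mask(prev_frame, curr_frame, threshold=20):
    """Mark pixels whose change between consecutive frames is significant.

    Returns a binary mask: 1 (non-zero) where the absolute difference
    exceeds the threshold, 0 elsewhere. The threshold is hypothetical.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]
```

A mask of this kind is empty when the subject does not move between frames, which is a known limitation of purely difference-based detection.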
G. D. Hager and P. N. Belhumeur, Real-time tracking of image regions with changes in geometry and illumination, Proc. Computer Vision and Pattern Recognition, 1996, discuss a face-tracking system that defines a window and analyzes the brightness, or intensity, of an image in the window. The pattern of the brightness, or intensity, within the window is used to track the object in the window, such as a face, between frames of the sequence. This system is sensitive to face orientation and changes, including partial occlusions and shadows, both of which obscure the pattern of the image. In addition, the system is unable to initially locate the position of a face in an image.
U.S. Pat. No. 5,642,431 to Poggio et al., granted Jun. 24, 1997, for Network-based system and method for detection of faces and the like, discusses an imaging system that captures an image and classifies the captured image in accordance with patterns generated by a pattern prototype synthesizer.
U.S. Pat. No. 5,450,504 to Calia, granted Sep. 12, 1995, for Method for finding a most likely matching of a target facial image in a data base of facial images, discusses an identification system used to match a subject image with images stored in a data base.
U.S. Pat. No. 5,430,809 to Tomitaka, granted Jul. 4, 1995, for Human face tracking system, discusses a technique for identifying and tracking an object, such as a human face, in an image captured by a video camera.
U.S. Pat. No. 5,280,530 to Trew et al., granted Jan. 18, 1994, for Method and apparatus for tracking a moving object, discusses a technique for face tracking in a videophone application wherein the face is masked to form a template and divided into sub-templates. The next frame of video is analyzed to detect any displacement of the template and sub-templates. Displacements are used to determine affine transform coefficients, ultimately resulting in an updated template and mask.
U.S. Pat. No. 5,187,574 to Kosemura et al., granted Feb. 16, 1993, for Method for automatically adjusting field of view of television monitor system and apparatus for carrying out the same, discusses a security system wherein the size of the head and shoulders of a subject is manipulated to maintain a relatively constant representation on a video monitor.
U.S. Pat. No. 5,164,992 to Turk et al., granted Nov. 17, 1992, for Face recognition system, discusses comparison of members of a group with images in a stored database.
U.S. Pat. No. 5,103,484 to Stafford et al., granted Apr. 7, 1992, for Target aimpoint location, discusses a system in which an image is depicted in skeletal form for purposes of identification.
U.S. Pat. No. 4,991,223 to Bradley et al., granted Feb. 5, 1991, for Apparatus and Method for Recognizing image features using color elements, discusses evaluating a scanned video image to determine, pixel-by-pixel, whether a given pixel is a member of one of a number of image features of interest.
U.S. Pat. No. 4,975,969 to Tal, granted Dec. 4, 1990, for Method and apparatus for uniquely identifying individuals by particular physical characteristics and security system utilizing the same, discusses a system which compares distances between predetermined facial features on a subject, and compares the data to that stored in a data base.
U.S. Pat. No. 4,975,960 to Petajan, granted Dec. 4, 1990, for Electronic facial tracking and detection system and method and apparatus for automated speech recognition, discusses a system for determining speech patterns by analyzing facial movements.
A method for robust human face tracking in the presence of multiple facial images includes taking a frame from a color video sequence as a current input image; filtering the current input image to form a filtered image; estimating the locations and sizes of faces in the filtered image based on a projection histogram of the filtered image; estimating face motion in the filtered image; and outputting the location and size of the tracked faces within the filtered image.
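The use of a projection histogram to estimate location and size may be illustrated by the following sketch. It is an assumption-laden example rather than the claimed method: the filtered image is assumed to be a binary mask (1 for skin-colored pixels), the function names are hypothetical, and the histogram mean and standard deviation are used here as simple stand-ins for the location and size estimates.

```python
import math


def projection_histograms(mask):
    """Project a binary mask onto the x and y axes by counting set pixels."""
    hist_x = [sum(col) for col in zip(*mask)]  # one count per column
    hist_y = [sum(row) for row in mask]        # one count per row
    return hist_x, hist_y


def histogram_mean_std(hist):
    """Mean and standard deviation of bin positions weighted by bin counts."""
    total = sum(hist)
    if total == 0:
        return None  # no skin-colored pixels along this axis
    mean = sum(i * n for i, n in enumerate(hist)) / total
    var = sum(n * (i - mean) ** 2 for i, n in enumerate(hist)) / total
    return mean, math.sqrt(var)


def estimate_face(mask):
    """Estimate (center_x, center_y, spread_x, spread_y) from the mask."""
    hist_x, hist_y = projection_histograms(mask)
    stats_x = histogram_mean_std(hist_x)
    stats_y = histogram_mean_std(hist_y)
    if stats_x is None or stats_y is None:
        return None
    return stats_x[0], stats_y[0], stats_x[1], stats_y[1]
```

Working on two one-dimensional histograms rather than on the full two-dimensional image keeps the per-frame cost low. A plain mean and standard deviation are sensitive to outlying skin-colored pixels, so a practical system would likely need more robust statistics.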
It is an object of this invention to provide a face tracking technique that is computationally efficient and may be performed in real-time.
Another object of the invention is to provide a face tracking technique that is robust against partial occlusions and shadows.
A further object of the invention is to provide a face tracking technique that is invariant to facial orientation and changes in scale.
Another object of the invention is to provide a face tracking technique that is less sensitive to lighting condition changes.
Still another object of the invention is to provide a face tracking technique that does not require determination of the initial positions of the tracked faces.
An additional object of this invention is to develop a face tracking technique that is able to track a dominant face when more than one face or other skin-color-like objects occur in the scene.
These and other objects and advantages of the invention will become more fully apparent as the description which follows is read in conjunction with the drawings.