The invention relates generally to image pattern detection and recognition and more particularly to a system and a method for face detection.
Vast amounts of digital information, including images, are currently available via the Internet or other electronic databases. Unlike a text search, a content-based search of these databases for images containing a picture of a target object is a challenging task. The difficulty arises from the fact that the pictures of the target objects in the stored images, in general, are not identical. Moreover, the target object depicted in one image may have a different orientation and size from the identical target object depicted in another image.
Face detection technology is being advanced, in part, to assist in the development of an image retrieval system that can overcome the above-described difficulty. Face detection is a process of determining whether a picture of a human face is present in an input image and, if so, accurately determining the position(s) of the face(s) within the input image. A face detector is designed to scan the input image to detect human faces that may be depicted in the image, regardless of the size of the human faces. There are two prominent approaches to face detection: a “neural network-based” approach and an “eigenface” approach.
The neural network-based approach utilizes, as the name suggests, a neural network to detect a human face in an input image. The fundamental idea of the neural network-based approach is to design a neural network that accepts an N×M image block and outputs a binary answer indicating a positive or a negative detection of a human face within the image block. The neural network is trained using a large database of training image blocks. The training image blocks are a mixture of face images and non-face images. The training image blocks are typically preprocessed before being input to the neural network. The preprocessing may include removing the DC component in the image block and normalizing the image block. After the neural network has been trained, an input image can be analyzed by the neural network during an on-line detection procedure in which N×M image blocks of the input image are preprocessed in the same manner as the training image blocks.
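The preprocessing step described above can be illustrated with a minimal sketch. The function name `preprocess_block` is hypothetical, and the choice of unit variance as the normalization is an assumption; the text says only that the block is normalized.

```python
import math

def preprocess_block(block):
    """Remove the DC (mean) component of a 2-D block, then scale it.

    Assumption: "normalizing" is taken to mean scaling to unit variance.
    """
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    # Subtracting the mean removes the DC component.
    centered = [[p - mean for p in row] for row in block]
    energy = sum(v * v for row in centered for v in row)
    std = math.sqrt(energy / len(pixels))
    if std == 0:
        # A perfectly flat block has no variation to normalize.
        return centered
    return [[v / std for v in row] for row in centered]
```

The same function would be applied to the training image blocks and to the blocks examined during on-line detection, so that both are presented to the network in the same form.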
A publication by Henry A. Rowley, Shumeet Baluja and Takeo Kanade, entitled “Neural Network-Based Face Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pages 23-28, January 1998, describes a neural network-based face detection system. The face detection system of Rowley et al. utilizes a modified version of the standard neural network-based face detection approach. In particular, the Rowley et al. face detection system uses the training image blocks that have been erroneously detected as containing a human face to further train the neural network in a “bootstrap” manner. The publication asserts that the bootstrapping technique reduces the number of training image blocks that are required to sufficiently train the neural network. In addition, the Rowley et al. face detection system includes neutralization of illumination differences in image blocks during the preprocessing procedure by removing the best-fit linear function from the image blocks.
The eigenface approach involves calculating the best linear basis, or principal eigenvector components, called “eigenfaces,” to approximate a set of training faces. These basis vectors are then used as convolution kernels for matched filtering to detect human faces in an input image. U.S. Pat. No. 5,710,833 to Moghaddam et al. describes an apparatus for detection and recognition of specific features in an image using an eigenface approach. The Moghaddam et al. apparatus utilizes all eigenvectors, not just the principal eigenvector components. The use of all eigenvectors is intended to increase the accuracy of the apparatus to detect complex multi-featured objects.
Although the conventional face detection systems operate well for their intended purposes, what is needed is a face detection system and a method of detecting faces that increase face detection performance in terms of speed and accuracy.
A face detection system and a method of pre-filtering an input image for face detection utilize a candidate selector that selects candidate regions of the input image that potentially contain a picture of a human face. The candidate selector operates in series with an associated face detector that verifies whether the candidate regions do contain a human face. The pre-filtering operation performed by the candidate selector screens out much of the input image as regions that do not contain a human face. Since only the candidate regions are then processed by the face detector, the operation of the candidate selector reduces the amount of computational processing that must be performed by the face detector.
In the preferred embodiment, the candidate selector includes a linear matched filter and a non-linear filter that operate in series to select the candidate regions of the input image. The linear matched filter operates to select image regions that have highly similar image patterns when compared to a face template. The linear matched filter includes a linear correlator and a processing module. The linear correlator performs a linear correlation on the input image using a filtering kernel to derive a correlation image. The filtering kernel is a numerical representation of the face template. The filtering kernel is calculated during a training period, or a non-face detecting period, by a filtering-kernel generator. Preferably, the linear correlation is performed in the discrete cosine transform (DCT) domain, but other approaches are available. The correlation image is then examined by the processing module. The processing module is configured to select temporary candidate regions of the input image using a decision rule. The decision rule dictates that only image regions that are positioned about a local maximum in the correlation image and have pixel correlation values that are greater than a threshold correlation value are to be selected. The temporary candidate regions are then transmitted to the non-linear filter.
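The decision rule of the linear matched filter can be sketched as follows. This is an illustration under stated assumptions, not the preferred embodiment: the correlation is computed directly in the spatial domain rather than in the DCT domain, a "local maximum" is taken over a 3×3 neighbourhood, and the names `correlate` and `select_temporary_candidates` are hypothetical.

```python
def correlate(image, kernel):
    """Valid-mode linear correlation of a 2-D image with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def select_temporary_candidates(corr, threshold):
    """Return (row, col) positions that are local maxima above threshold.

    Assumption: a position is a local maximum when no value in its 3x3
    neighbourhood is larger, so every point of a flat plateau is kept.
    """
    h, w = len(corr), len(corr[0])
    picks = []
    for y in range(h):
        for x in range(w):
            v = corr[y][x]
            if v <= threshold:
                continue
            neighbours = [
                corr[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v >= n for n in neighbours):
                picks.append((y, x))
    return picks
```

Each returned position identifies the top-left corner of a kernel-sized image region, which would then be passed to the non-linear filter as a temporary candidate region.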
The non-linear filter operates to determine whether the temporary candidate regions should be deemed as the candidate regions. The non-linear filter examines contrast values within certain regions of a temporary candidate region, seeking a contrast pattern that is characteristic of eyes of a human face. High contrast values at these regions equate to a likelihood that an image region contains a human face. In one embodiment, the non-linear filter includes three contrast calculators and a decision module. The contrast calculators compute contrast values for particular upper segments of an image region. The first contrast calculator computes a contrast value for an upper-half segment of the image region. The second contrast calculator computes contrast values for two upper quadrants of the image region. Thus, the first and second contrast calculators are dedicated to the top fifty percent of a temporary candidate region. The third contrast calculator computes contrast values for three adjacent segments that define a portion of the upper-half segment, e.g., the top thirty-three percent of the temporary candidate region. These contrast values are transmitted to the decision module. The contrast values computed by the second and third contrast calculators are compared to a threshold contrast value. If these values exceed the threshold contrast value, the image region is deemed to be a candidate region and is transmitted to the face detector. In an alternative configuration, the three contrast calculators may be embodied in a single contrast calculator.
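A minimal sketch of the three contrast calculators and the decision module follows. Several details are assumptions made for illustration: contrast is taken as the maximum minus the minimum pixel value in a segment, the three adjacent segments are laid side by side across the top third of the region, and the function names are hypothetical.

```python
def contrast(region, top, bottom, left, right):
    """Contrast of a rectangular segment, assumed to be max minus min."""
    values = [region[y][x]
              for y in range(top, bottom) for x in range(left, right)]
    return max(values) - min(values)

def passes_nonlinear_filter(region, threshold):
    """Return (accepted, upper_half_contrast) for a temporary candidate.

    Assumes the region is at least 2 rows high and 3 columns wide.
    """
    h, w = len(region), len(region[0])
    half = h // 2
    third_h = max(1, h // 3)

    # First calculator: the whole upper-half segment.
    upper_half = contrast(region, 0, half, 0, w)

    # Second calculator: the two upper quadrants.
    quadrants = [contrast(region, 0, half, 0, w // 2),
                 contrast(region, 0, half, w // 2, w)]

    # Third calculator: three adjacent segments across the top third.
    cuts = [0, w // 3, 2 * w // 3, w]
    thirds = [contrast(region, 0, third_h, cuts[k], cuts[k + 1])
              for k in range(3)]

    # Decision module: the second and third calculators' values must all
    # exceed the threshold contrast value, as stated in the text.
    accepted = all(c > threshold for c in quadrants + thirds)
    return accepted, upper_half
```

A region with strong light and dark variation across its upper rows passes the test, while a uniformly shaded region is screened out before it reaches the face detector.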
The face detector of the system may utilize a neural network-based approach, an eigenface approach or any other known technique to detect a human face in the candidate regions. In the preferred embodiment, the face detector is the face detection system of Rowley et al., utilizing the original face detection scheme. In the most preferred embodiment, the face detector is the face detection system of Rowley et al., utilizing a fast version of the original face detection scheme. The face detector operates to receive the candidate regions from the candidate selector and determine whether one or more of the candidate regions contain a human face. The determination by the face detector may be displayed on a display device in which a verified candidate region is identified by superimposing an outline of the region over the original input image.
In order to detect faces of different sizes, the face detection system includes an image scaler that modifies the scale of the input image. The image scaler receives the input image and sequentially transmits the input image in smaller scales to the candidate selector. The first transmitted input image may be the input image in the original scale. In the preferred embodiment, the image scaler decreases the scale of the input image by a factor of 1.2. However, other reductions may be utilized.
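The operation of the image scaler can be sketched as a generator that yields successively smaller copies of the input image. The resampling method (nearest neighbour) and the stopping size `min_size` are assumptions chosen for brevity; the text specifies only the preferred reduction factor of 1.2.

```python
def downscale(image, factor=1.2):
    """Shrink a 2-D image by `factor` using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    new_h, new_w = int(h / factor), int(w / factor)
    return [
        [image[min(h - 1, int(y * factor))][min(w - 1, int(x * factor))]
         for x in range(new_w)]
        for y in range(new_h)
    ]

def image_pyramid(image, factor=1.2, min_size=8):
    """Yield the image at its original scale, then at smaller scales.

    Stops once either dimension falls below `min_size` (hypothetical
    parameter, e.g. the side length of the filtering kernel).
    """
    current = image
    while len(current) >= min_size and len(current[0]) >= min_size:
        yield current
        current = downscale(current, factor)
```

Because the filtering kernel has a fixed size, searching each scaled copy for kernel-sized faces corresponds to searching the original image for progressively larger faces.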
The system may include a filtering-kernel generator that provides the filtering kernel for the linear matched filter of the candidate selector. The filtering-kernel generator is configured to calculate the filtering kernel from a large database of sample face images. The filtering-kernel generator does not operate during an on-line operation, i.e., a face detecting procedure. Instead, the calculation of the filtering kernel is performed during an off-line operation, i.e., a training procedure. The filtering kernel is calculated prior to the face detecting procedure.
The filtering-kernel generator includes an averaging unit, a DCT operator, a masker and an inverse discrete cosine transform (IDCT) operator. The calculation of the filtering kernel begins when the large database of sample face images of a fixed size is input to the averaging unit. Preferably, the face images are 8×8 pixel images. The averaging unit averages the face images to derive an averaged image and outputs the averaged image to the DCT operator. The DCT operator transforms the averaged image from the spatial domain to the DCT domain. The transformed image is then transmitted to the masker. The masker removes DC, illumination and noise frequency components from the transformed image. Next, the masked image is transformed back to the spatial domain by the IDCT operator. The resulting image is the filtering kernel. This filtering kernel is stored in a memory of the system, until requested by the linear matched filter. When used in a linear correlation, the filtering kernel also removes the components of the input image that are associated with the DC, illumination and noise influences.
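The kernel calculation described above (average, DCT, mask, IDCT) can be sketched in a few lines. The text does not state which DCT coefficients correspond to the DC, illumination and noise components, so the mask below is an assumption: coefficients with index sum u+v at or below `low` are treated as DC and illumination, and those at or above `high` as noise. All function names are hypothetical, and square images are assumed.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that coefficients = C x."""
    return [
        [math.sqrt((1 if k == 0 else 2) / n)
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    """2-D DCT of a square block: C X C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT of a square block: C^T Y C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def mask(coeffs, low=1, high=10):
    """Zero the assumed DC/illumination (u+v <= low) and noise
    (u+v >= high) coefficients; the cut-offs are illustrative only."""
    return [[0.0 if (u + v <= low or u + v >= high) else c
             for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]

def average(images):
    """Pixel-wise mean of equally sized images."""
    n = len(images)
    return [[sum(img[y][x] for img in images) / n
             for x in range(len(images[0][0]))]
            for y in range(len(images[0]))]

def make_filtering_kernel(faces):
    """Average the faces, transform, mask, and transform back."""
    return idct2(mask(dct2(average(faces))))
```

Because the DC coefficient is zeroed, the resulting kernel sums to zero, so adding a constant brightness offset to an image region does not change its correlation with the kernel.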
In an alternative embodiment, the training face images are first transformed from the spatial domain to the DCT domain, masked, transformed back to the spatial domain, and then averaged to derive the filtering kernel. In this alternative embodiment, the DCT operator initially receives the training images. The DCT operator then transforms each received training face image from the spatial domain to the DCT domain. Next, the masker discards the DC, illumination and noise components from the transformed face images. The IDCT operator transforms the masked face images back to the spatial domain. The face images are then averaged by the averaging unit to derive the filtering kernel.
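The relationship between the two orderings can be written out briefly. The notation is introduced here for illustration only: f_i denotes the i-th of N training face images, D the DCT, and M the masking step, which is assumed to zero a fixed set of coefficients for every image. Under that assumption D, M and the inverse transform are all linear, so in exact arithmetic averaging before or after them yields the same kernel K:

```latex
K_{\text{alt}}
  = \frac{1}{N}\sum_{i=1}^{N} D^{-1} M D\, f_i
  = D^{-1} M D \left( \frac{1}{N}\sum_{i=1}^{N} f_i \right)
  = K
```

The practical difference is therefore one of computational cost: the alternative embodiment applies the transforms N times rather than once.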
An advantage of the invention is that the candidate selector is not operatively limited to a particular type of face detector and may be used in conjunction with a variety of conventional face detectors. Another advantage is that the speed of face detection can be significantly increased by the use of the candidate selector, depending on the particular type of face detector utilized. Furthermore, the accuracy, in terms of mis-detections, of certain face detectors can be increased by the use of the candidate selector.