Contemporary human face detection algorithms usually use pattern matching to determine the location and size of a face. The process mainly includes sub-image capturing and sub-image face detection. Sub-image capturing uses overlapping windows of different sizes to divide the source image into a plurality of sub-images, as shown in FIG. 1. The table in FIG. 1 shows the number of sub-images generated for each window size when the source image is 320×240.
Because the computer cannot predict the location of a human face, the overlapping sampling approach captures sub-images sequentially, with adjacent sub-images separated by a plurality of pixels. For example, when the window size is 24×24 and the separation is 2 pixels, a 320×240 source image yields 149×109 sub-images of size 24×24. Because the computer cannot predict the size of a human face either, patterns of different scales are used to scan the entire image. For example, the window size is first set to 24×24 and is then gradually expanded by a factor of 1.2. The window sizes would therefore be 24×24, 28×28, 34×34, 41×41, 49×49, . . . , 213×213, and so on. Taking FIG. 1 as an example, sequential capturing with patterns of different scales generates a large amount of data for comparison.
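The arithmetic behind these figures can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the window side is assumed to be truncated to an integer after each scaling (which reproduces the sizes 24, 28, 34, 41, 49, . . . , 213 listed above), and the 2-pixel separation is assumed to stay the same at every scale.

```python
def window_sizes(base, factor, limit):
    """List the square window sides base * factor**k, truncated to integers,
    that do not exceed limit (assumption: truncation, not rounding)."""
    sizes = []
    k = 0
    while True:
        size = int(base * factor ** k)
        if size > limit:
            break
        sizes.append(size)
        k += 1
    return sizes


def grid_counts(width, height, window, step):
    """Number of window positions along each axis for a given pixel step."""
    if window > width or window > height:
        return (0, 0)
    cols = (width - window) // step + 1
    rows = (height - window) // step + 1
    return (cols, rows)


def total_subimages(width, height, base=24, factor=1.2, step=2):
    """Total number of sub-images over all scales (assumes a fixed step)."""
    total = 0
    for size in window_sizes(base, factor, min(width, height)):
        cols, rows = grid_counts(width, height, size, step)
        total += cols * rows
    return total
```

Under these assumptions, grid_counts(320, 240, 24, 2) returns (149, 109), and window_sizes(24, 1.2, 240) begins with 24, 28, 34, 41, 49 and ends with 213.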
Face verification of a sub-image is usually performed by capturing a plurality of features in the sub-image and analyzing them with a classifier. For example, Haar-like features in combination with a cascaded multiple-classifier algorithm are commonly used. A Haar-like feature is computed as the difference between the sums of the pixels in neighboring rectangles at specific locations within the detection window. FIG. 2 shows four common Haar-like feature types. The Adaboost (Adaptive Boosting) algorithm combines a plurality of weak classifiers to form a strong classifier.
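As an illustration of the rectangle-sum difference described above, the following sketch computes one two-rectangle Haar-like feature. It assumes a summed-area table (integral image) is used to obtain rectangle sums, which is a common technique but is not stated in the text above; the function names are hypothetical, and the feature shown (left half minus right half) is only an example and is not necessarily one of the four types in FIG. 2.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.
    img is a list of rows of grayscale pixel values."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Sum of the left half minus sum of the right half of the rectangle.
    Assumes w is even so that both halves have the same width."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right
```

Once the table is built, each rectangle sum needs only four lookups, so the cost of one feature does not depend on the window size.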
The cascading strategy is to design a plurality of Adaboost classifiers of different scales and then cascade them from low scale (using a small number of Haar-like features) to high scale (using a large number of Haar-like features), as shown in FIG. 3A and FIG. 3B. A plurality of features of different scales are captured from the sub-images (as shown in FIG. 3A) and are passed through n cascaded Adaboost classifiers (as shown in FIG. 3B). A sub-image is verified as a human face only when it has passed verification by all of the Adaboost classifiers. When the classifier of any scale determines that the sub-image is a non-face, no further verification by the Adaboost classifiers of subsequent higher scales is required.
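The early-rejection behavior of such a cascade can be sketched as below. This is a simplified illustration under stated assumptions: each weak classifier is assumed to be a threshold test on a single precomputed feature value, each stage is a weighted vote of its weak classifiers, and the data layout, names, and numeric values are hypothetical rather than taken from any trained detector.

```python
def stage_passes(features, stage):
    """Weighted vote of the weak classifiers in one stage.
    Each weak classifier is (feature_index, threshold, polarity, weight)
    and votes 'face' when polarity * value < polarity * threshold."""
    score = 0.0
    for index, threshold, polarity, weight in stage["weak"]:
        if polarity * features[index] < polarity * threshold:
            score += weight
    return score >= stage["threshold"]


def cascade_classify(features, stages):
    """Run the stages in order and stop at the first rejection.
    Returns (is_face, number_of_stages_evaluated)."""
    for evaluated, stage in enumerate(stages, start=1):
        if not stage_passes(features, stage):
            return (False, evaluated)
    return (True, len(stages))


# Hypothetical two-stage cascade: the first stage uses one feature,
# the second stage uses two.
EXAMPLE_STAGES = [
    {"weak": [(0, 10.0, 1, 1.0)], "threshold": 0.5},
    {"weak": [(0, 10.0, 1, 0.6), (1, 5.0, -1, 0.7)], "threshold": 1.0},
]
```

With EXAMPLE_STAGES, a sub-image rejected by the first stage is never evaluated by the second, which is where the cascade saves computation on the many non-face sub-images.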
Current human face detection speedup schemes are mostly based on skin color. These algorithms use a skin filter to divide the input image into skin and non-skin pixels and then perform face detection within the skin mask. For example, a U.S. patent disclosed a technique for face detection through the detection of eye locations. This technology uses eye modules of a plurality of scales to search the entire image sequentially, and then uses Haar-like based Adaboost classifiers to find the possible locations of the eyes. Through template matching, this technology first verifies the detected eye location candidates and then uses classifiers to analyze the detection result to find the final face region. Another skin-color-based speedup algorithm for face detection labels the non-skin regions and then uses a sequential search at different scales to verify all the sub-images. When a sub-image is found to have a low score, its neighboring sub-images are omitted, and all the results are merged to obtain the final face detection result.
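The division of an input image into skin and non-skin pixels can be illustrated with the sketch below. The fixed-threshold RGB rule and all numeric thresholds are illustrative assumptions of a kind often seen in skin-color filtering; they are not the filter used by any of the schemes described above, and the function names are hypothetical.

```python
def is_skin(r, g, b):
    """Heuristic RGB skin test; every threshold here is an assumption."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)


def skin_mask(image):
    """Map an image (rows of (r, g, b) tuples) to a boolean skin mask."""
    return [[is_skin(r, g, b) for (r, g, b) in row] for row in image]


def skin_ratio(mask, x, y, size):
    """Fraction of skin pixels in the size-by-size window at (x, y)."""
    hits = sum(mask[j][i]
               for j in range(y, y + size)
               for i in range(x, x + size))
    return hits / (size * size)
```

A detector could then skip any sub-image whose skin ratio falls below a chosen cutoff. Because the test relies on the three color channels, a filter of this kind has nothing to work with on a grayscale image.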
Other skin-based face detection speedup schemes use a wavelet transform to obtain features after labeling the non-skin regions, and then use T-shaped templates of different scales to sequentially search and verify all the sub-images. The results of all the sub-images are merged to obtain the final face detection result. Alternatively, after labeling the non-skin regions, these algorithms estimate the possible skin set for a single frame, assume that the maximum set is the human face region, and then use the skin color information obtained from that single frame for face tracking in the video image stream.
Basically, the skin-based face detection speedup schemes are time-consuming because they use templates of different sizes to match the entire image sequentially. In addition, the skin-based face detection speedup schemes are generally not suitable for grayscale images. How to solve these problems and provide a technique that is applicable to both color and grayscale images and that reduces unnecessary matching remains an important issue.