Face detection is an important research direction in the field of computer vision because of its wide range of potential applications, such as video surveillance, human-computer interaction, face recognition, security authentication, and face image database management. The task of face detection is to determine whether there are any faces within a given image and, if one or more faces are present, to return the location and extent of each face in the image.
Today, high definition (HD) cameras are an affordable commodity and are widely used in all types of applications, for instance video surveillance. Video analytics in the form of face detection has to keep pace with the high resolution output from these cameras, and thus the performance of face detection algorithms is critical to the overall performance of the analytics.
Face detection algorithms are commonly employed in smartphones and biometric devices to detect faces and later recognize them. Many smartphones today are equipped with a feature that unlocks the phone by matching the user's face. This application requires a fast face detection algorithm at its core. An exemplary output of a face detection engine is shown in FIG. 1.
A widely used face detection framework is essentially an Adaptive Boosting (AdaBoost) based cascaded classifier subsystem, and it has produced excellent accuracy with real-time performance. AdaBoost is a machine learning meta-algorithm which may be used in conjunction with many other types of learning algorithms to improve their performance. The real-time performance, however, depends directly on the resolution of the image/video frame: the higher the resolution, the more processing is required.
The general process of a face detection algorithm is shown in FIG. 2. The modules of any face detection algorithm include, but are not limited to:
Feature representation module: Any face detection system uses some form of feature representation which can identify facial features and correlate them in such a way that the overall output can be judged as a face or a non-face. Examples of feature representations are Local Binary Patterns (LBP) and Modified Census Transform (MCT). These are alternative representations (in place of pixel intensity) which usually have better invariance to illumination and to slight changes in pose or expression.
Classifier module: The classifier provides a way to correlate multiple features. Examples are the cascaded AdaBoost classifier and Support Vector Machines (SVM).
Search space generator module: Given an image/video frame, a face can be present at any “location” and at any “scale”. Thus the face detection logic has to search (using a sliding window approach) for the possibility of the face “at all locations” and “at all the scales”. This usually results in scanning of hundreds of thousands of windows even in a low resolution image.
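As a rough illustration of why the search space grows so quickly, the following sketch counts the windows a sliding window search would visit over all locations and scales. The base window size, scale step and stride are illustrative assumptions and are not values taken from this description.

```python
def count_windows(width, height, base=24, scale_step=1.25, stride=1):
    """Count sliding-window positions over all scales.

    base, scale_step and stride are assumed, illustrative defaults.
    """
    total = 0
    size = float(base)
    # Grow the square window until it no longer fits inside the image.
    while int(size) <= min(width, height):
        w = int(size)
        # Number of window positions along each axis at this scale.
        nx = (width - w) // stride + 1
        ny = (height - w) // stride + 1
        total += nx * ny
        size *= scale_step
    return total


if __name__ == "__main__":
    # A low resolution 320x240 frame, scanned at every pixel location.
    print(count_windows(320, 240))
    # The same frame scanned with a coarser grid of 6 pixels.
    print(count_windows(320, 240, stride=6))
```

Under these assumed defaults, a 320×240 frame already yields more than 100,000 windows at a stride of one pixel, and a coarser stride reduces that count sharply.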
There are also various algorithms, such as bounding box based algorithms, that try to identify a bounding box within which a face is likely to be detected. The face detection classifier then has to search only within this bounding box, which improves the speed of detection dramatically. The estimated bounding box and the face box are shown in FIG. 3.
However, it may be understood that a face will not always be found within the estimated bounding box. Moreover, the estimated bounding box might not be centered on the face.
The sliding window approach is the most common technique for generating the search space used in object detection. A classifier is evaluated at every location, and an object is detected when the classifier response is above a preset threshold. Cascades speed up the detection by rejecting the background quickly and spending more time on object-like regions. Even with cascades, however, scanning with fine grid spacing is still computationally expensive.
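The early rejection behavior of a cascade can be sketched as follows. This is a minimal illustration only: the stage score functions and thresholds are hypothetical placeholders, not trained AdaBoost stages.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (score_function, threshold). Both are hypothetical placeholders
# standing in for trained boosted stages.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_accepts(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one window."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            # Rejected: later (more expensive) stages are never evaluated.
            return False, i
    return True, len(stages)


# Toy stages: a cheap mean-intensity test followed by a contrast test.
toy_stages: List[Stage] = [
    (lambda w: sum(w) / len(w), 0.3),
    (lambda w: max(w) - min(w), 0.2),
]

if __name__ == "__main__":
    print(cascade_accepts([0.1, 0.1, 0.1], toy_stages))  # dark, flat window
    print(cascade_accepts([0.2, 0.9, 0.5], toy_stages))  # bright, textured window
```

Most background windows fail an early stage, so the average cost per window stays low even though the full cascade is comparatively expensive.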
One approach to increasing the scanning speed is to train a classifier with perturbed training data so that it can handle small shifts in the object location. However, this significantly increases the number of weak classifiers required in the overall model, since the training data will be noisy (unaligned/perturbed).
Another simple approach is to increase the grid spacing, which decreases the number of windows being evaluated. Unfortunately, as the grid spacing is increased, the number of detections decreases rapidly.
As shown in FIG. 4, in the graph (bottom line), we can see that as the grid spacing increases there is an exponential drop in the accuracy of the regular full face classifier.
A technique also exists that reduces the number of missed detections while increasing the grid spacing when the sliding window approach is used for object detection.
This technique trains a classifier (Cpatch) using a decision tree. The Cpatch classifier is evaluated on a regular grid, while the main classifier (Cobject) is placed at the location predicted by Cpatch. The left hand side (LHS) of FIG. 5 shows a sample face with different patch locations shown as different dashed rectangles. A patch is of size wp×hp, where wp is the width of the patch and hp is the height of the patch, and all the patches are given as input to the decision tree. The leaf nodes of the decision tree correspond to patches that have been identified. The right hand side (RHS) of FIG. 5 shows the patches identified at the leaf nodes and the corresponding offsets for the full face.
The core idea of this technique is to build the Cpatch classifier as a decision tree over very light-weight and simple features, such as pixel intensity values, and to use it as a pre-processing step. The actual Cobject classifier works only on the output from the Cpatch classifier. Thus, if the Cpatch classifier is able to remove the bulk of the windows, the Cobject classifier has relatively little work left to do, resulting in improved performance. The face bounding box for this faster face detection technique is shown in FIG. 5.
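The two-stage flow described above can be sketched as follows. This is only an illustration under stated assumptions: patch_predictor and object_classifier are hypothetical stand-in callables, and no decision tree is actually trained here.

```python
def two_stage_detect(image_w, image_h, win, grid, patch_predictor, object_classifier):
    """Evaluate a cheap patch predictor on a coarse grid, then run the main
    classifier only at the locations it predicts.

    patch_predictor(x, y) -> (dx, dy) offset to the predicted object, or None
    object_classifier(x, y) -> True if a win x win object starts at (x, y)
    Both callables are hypothetical stand-ins for Cpatch and Cobject.
    """
    detections = []
    visited = set()
    for y in range(0, image_h - win + 1, grid):
        for x in range(0, image_w - win + 1, grid):
            offset = patch_predictor(x, y)
            if offset is None:
                continue  # window removed by the cheap first stage
            # Place the main classifier at the predicted location,
            # clamped so the window stays inside the image.
            px = min(max(x + offset[0], 0), image_w - win)
            py = min(max(y + offset[1], 0), image_h - win)
            if (px, py) in visited:
                continue
            visited.add((px, py))
            if object_classifier(px, py):
                detections.append((px, py))
    return detections


if __name__ == "__main__":
    face = (13, 8)  # toy ground truth, deliberately off the 6-pixel grid

    def toy_patch(x, y):
        # Pretend the patch classifier recognizes shifts of up to 3 pixels.
        dx, dy = face[0] - x, face[1] - y
        return (dx, dy) if abs(dx) <= 3 and abs(dy) <= 3 else None

    def toy_object(x, y):
        return (x, y) == face

    print(two_stage_detect(64, 64, 24, 6, toy_patch, toy_object))
```

In this toy setup the face sits between grid points, so a main classifier evaluated only on the 6-pixel grid would never be placed on it, whereas the predicted offset moves the main classifier onto the face.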
There are other approaches which use skin color segmentation to speed up face detection algorithms. These techniques identify the portions of the image where skin color is found and then apply face detection only to those pockets/sub-windows.
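A minimal sketch of the skin color idea is given below. The RGB rule and its thresholds are an assumed, commonly cited heuristic used here only for illustration; the approaches referred to above may use different color spaces and rules.

```python
def is_skin(r, g, b):
    """Assumed RGB heuristic for skin-like pixels (thresholds are illustrative)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_bounding_box(image):
    """Return (x0, y0, x1, y1), inclusive, around all skin-like pixels, or None.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if is_skin(r, g, b):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no skin-like region: face detection can be skipped
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    blue, skin = (0, 0, 255), (200, 140, 120)
    img = [[blue] * 4 for _ in range(4)]
    for yy in (1, 2):
        for xx in (1, 2):
            img[yy][xx] = skin
    print(skin_bounding_box(img))
```

The face detector would then be applied only inside the returned box rather than over the whole frame.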
However, the technique discussed above results in a loss of accuracy. In FIG. 4, the lines show the data for an available face detection technique that uses the sliding window approach for object detection. It improves the accuracy, but the accuracy is still lower than desirable. For example, at 6×6 grid spacing the accuracy is shown to be about 80 percent (%), which is down by almost 15-18% from the peak. Even though all the disclosed techniques and the available techniques are used for accurate face detection, they still have a major drawback: a large amount of time is spent in the detection process, and they do not reduce the processing time while maintaining a high accuracy rate. Further, existing image processing or face detection algorithms require high-end processing and therefore advanced, high-end hardware, which involves higher cost. Furthermore, because these algorithms require high-end processing, central processing unit (CPU) usage also increases.
In view of the drawbacks and limitations discussed above, there exists a need for an efficient face detection technique that provides higher detection accuracy and shorter processing time, works on low-cost hardware, and has low CPU usage.