1. Field of the Invention
The invention relates to image analysis, and more particularly, to using a Gaussian Mixture Model (GMM) image analysis technique to perform object detection.
2. Description of the Related Art
Face detection is used in a variety of technical fields including, for example, security, military, multimedia database management, and entertainment. Extensive efforts have been made during the last couple of decades in these areas to improve face detection techniques. Face detection often must be performed before face recognition can be attempted. Therefore, the face detection process may be viewed as part of the face recognition process.
Current face detection techniques can generally be classified into three categories, namely, template-based methods, knowledge-based methods and appearance-based methods. In template-based methods, several templates are stored that describe either the entire face pattern or separate local facial features. The correlation between an input image and the stored templates is then evaluated in order to perform face detection. Knowledge-based methods utilize a set of rules to capture the relationships between facial features. Face detection is then performed by examining the input image areas based on the pre-defined rules of the rule set. Appearance-based methods use face models that are first learned using a number of exemplary face images. These learned models are then used to perform face detection.
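By way of illustration only, the correlation step of a template-based method may be sketched as follows. The sketch assumes grayscale images held as nested lists and uses normalized cross-correlation as the similarity measure; the function names `ncc` and `best_match` and the toy data are hypothetical and are not taken from any particular prior-art system.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A flat patch or template has no defined correlation; treat as no match.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best window."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)  # NCC lies in [-1, 1], so -2.0 is below any real score
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best

# Hypothetical toy data: a 2x2 "template" embedded in a 4x4 "image".
TEMPLATE = [[1, 9], [9, 1]]
IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 9, 0],
    [0, 9, 1, 0],
    [0, 0, 0, 0],
]
score, row, col = best_match(IMAGE, TEMPLATE)
```

In such a scheme, a window would typically be declared a detection when its score exceeds a chosen threshold, and the search would be repeated for each stored template.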
To date, appearance-based methods have achieved the best detection accuracy. The learning methods used in appearance-based methods include, for example, neural networks (NNs), support vector machines (SVMs), linear discriminant analysis (LDA), and Hidden Markov Models (HMMs). Face detection based on NNs is among the best methods currently available. However, existing NN systems generally work well only for frontal face poses, and do not effectively handle faces with multiple poses.
All of the methods described above only detect faces for a specific pose. Dedicated pose estimation is normally needed for the candidate region before the face detection can be performed. Face detection using the HMM technique has been proposed for the detection of side-view face poses. With this technique, two HMMs are trained to discriminate face profiles from non-face profiles.
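By way of illustration only, discrimination using two HMMs may be sketched as follows: an observation sequence is scored under each model with the forward algorithm, and the model yielding the higher likelihood determines the label. The sketch assumes discrete observation symbols; the model parameters, the names `log_likelihood` and `classify`, and the toy sequences are hypothetical and do not reproduce the proposed side-view technique.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p

def classify(obs, face_hmm, nonface_hmm):
    """Label a sequence by whichever HMM assigns it the higher likelihood."""
    face_score = log_likelihood(obs, *face_hmm)
    nonface_score = log_likelihood(obs, *nonface_hmm)
    return "face" if face_score > nonface_score else "non-face"

# Hypothetical models as (start, transition, emission) over symbols {0, 1}.
# FACE is left-to-right: state 0 favors symbol 0, state 1 favors symbol 1.
FACE = ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
# NONFACE is uninformative: every sequence of a given length is equally likely.
NONFACE = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])

label_a = classify([0, 0, 1, 1], FACE, NONFACE)
label_b = classify([1, 0, 1, 0], FACE, NONFACE)
```

With these toy parameters, the sequence that follows the left-to-right transition pattern of the first model is labeled a face, while the alternating sequence is not.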
Compared to many other pattern recognition methods, the power of the HMM technique lies in its ability to tolerate pattern variations as long as the same transition patterns exist between the same variations. However, the HMM technique has a disadvantage in that the state distribution and the transition probabilities are defined independently. Although the HMM technique provides limited support for controlling the state transition globally, it does not describe a complex pattern very well when the states and/or their transitions are controlled by certain common global settings and are thus not independent.
Another known technique for face detection uses a set of classifiers/recognizers to detect salient facial features, and then integrates the feature detection results based on their spatial configurations, such as location, scale and pose. This technique has the advantage of handling large variations that result from different facial appearances, poses, and illumination conditions. However, the salient facial features need to be labeled in order to train the facial feature recognizers. This introduces some drawbacks. First, only clearly defined features such as the nose, mouth, and eyes are suitable for manual labeling. For other features, such as hair and face areas, it is difficult to define boundaries and center positions by manual labeling. Second, even for salient features, manual labeling does not provide optimized segmentation. It is difficult to determine, for example, whether eye areas should also include eyebrows. In addition, manual labeling may include errors in determining the boundary or center of a salient feature. Third, as with other existing face detection systems, a different pattern needs to be learned for each pose, and knowledge learned from different poses generally cannot be shared.
Accordingly, a need exists for a technique for performing face detection that overcomes the shortcomings of the aforementioned approaches, such as being limited with respect to pose and requiring manual labeling of local features, and that is thus capable of robustly detecting complex patterns without being limited to certain poses and without requiring manual labeling of local features.