Human face detection continues to be a challenging problem in the field of computer/machine vision, due in part to the number of variations that can be caused by differing facial appearances, facial expressions, skin colors, lighting, etc.
Such variations result in a face data distribution that is highly nonlinear and complex in any space which is linear to the original image space. Moreover, in applications such as real-life surveillance and biometric processing, camera limitations and pose variations make the distribution of human faces in feature space more dispersed and complicated than that of frontal faces. Consequently, this further complicates the problem of robust face detection.
Frontal face detection has been studied for decades. As a result, there are many frontal face detection algorithms. By way of example, some conventional systems employ classifiers that are built based on a difference feature vector that is computed between a local image pattern and a distribution-based model. Some systems use detection techniques based on an over-complete wavelet representation of an object class. Here, for example, a dimensionality reduction can be performed to select the most important basis functions, and a trained Support Vector Machine (SVM) is then employed to generate a final prediction.
Some conventional systems utilize a network of linear units. The SNoW learning architecture, for example, is specifically tailored for learning in the presence of a very large number of binary features. In certain systems, fast frontal face detection has been shown to be possible by using a cascade of boosting classifiers built on an over-complete set of Haar-like features, an approach that integrates feature selection and classifier design in the same framework.
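By way of illustration only, the following minimal sketch shows how a single Haar-like feature of the kind mentioned above might be evaluated using an integral image (summed-area table), which is one common way such features are computed quickly. The function names (integral_image, rect_sum, two_rect_feature), the list-of-lists image representation, and the choice of a horizontal two-rectangle feature are assumptions made for this example and are not taken from any particular conventional system.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    img is a list of equal-length rows of pixel intensities (assumed layout).
    ii[r][c] holds the sum of img over rows 0..r-1 and columns 0..c-1.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical two-rectangle Haar-like feature: left half minus right half.

    width is assumed to be even so that both halves have the same area.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Example: a 4x4 image whose intensities increase from left to right.
image = [[4 * r + c for c in range(4)] for r in range(4)]
table = integral_image(image)
feature_value = two_rect_feature(table, 0, 0, 4, 4)
```

Because each rectangle sum costs only four lookups regardless of rectangle size, a very large (over-complete) set of such features can be evaluated cheaply, which is consistent with the speed reported for cascade-based frontal detectors.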
Most conventional non-frontal face detectors tend to use a view-based method, in which several face models are built, each describing faces in a given range of view. This is typically done to avoid explicit three-dimensional (3D) modeling. In one conventional system, the views of a face are partitioned into five channels, and a multi-view detector is developed by training separate detector networks for each view. There have also been studies of trajectories of faces in linear PCA feature spaces as they rotate, and SVMs have been used for multi-view face detection and pose estimation.
Other conventional systems have used multi-resolution information in different levels of a wavelet transform, wherein an array of two face detectors is implemented in a view-based framework. Here, for example, each detector can be constructed using statistics of products of histograms computed from examples of the respective view. Until now, this type of system appears to have achieved the best detection accuracy; however, it is often very slow due to its computational complexity.
To address the problem of slow detection speed, it has been proposed that a coarse-to-fine, simple-to-complex pyramid structure can be used to essentially combine the ideas of a boosting cascade and view-based methods. Although this approach improves the detection speed, it still has several problems. For example, the system computation cost is determined by the complexity and false alarm rates of the classifiers in the earlier stages. Also, because each boosting classifier works separately, the useful information between adjacent layers is discarded, which hampers the convergence of the training procedure. Furthermore, during the training process, more and more non-face samples collected by bootstrap procedures are introduced into the training set, which tends to increase the complexity of the classification. Indeed, it has been found that in certain systems the last stage pattern distribution between face and non-face can become so complicated that the patterns may not even be distinguished by Haar-like features.
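The dependence of computation cost on the earlier stages can be illustrated with a simple expected-cost calculation. The sketch below assumes a hypothetical model in which each stage has a fixed per-window evaluation cost and a fixed fraction of windows that it passes on to the next stage (for a stream dominated by non-face windows, this fraction is approximately the stage's false alarm rate). The function name and the numbers used are illustrative assumptions only, not measurements from any system.

```python
def expected_cost_per_window(stage_costs, pass_rates):
    """Expected evaluation cost of one window under a simple cascade model.

    stage_costs[i] is the (assumed) cost of running stage i on one window.
    pass_rates[i] is the (assumed) fraction of windows that stage i passes on.
    A window only pays for stage i if every earlier stage passed it.
    """
    if len(stage_costs) != len(pass_rates):
        raise ValueError("stage_costs and pass_rates must have the same length")
    total = 0.0
    reach_probability = 1.0  # probability that a window reaches the current stage
    for cost, rate in zip(stage_costs, pass_rates):
        total += reach_probability * cost
        reach_probability *= rate
    return total


# Illustrative numbers only: three stages of increasing complexity.
costs = [2.0, 10.0, 50.0]
baseline = expected_cost_per_window(costs, [0.5, 0.2, 0.1])
# The same cascade with a first stage that lets through more false alarms.
leaky_first_stage = expected_cost_per_window(costs, [0.9, 0.2, 0.1])
```

Under this model, every later stage's cost is multiplied by the pass rates of all stages before it, so a first stage with a high false alarm rate raises the total cost even when the later stages are unchanged.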
Additionally, view-based methods tend to suffer from the problems of high computational complexity and low detection precision.
Thus, there is a need for improved methods, apparatuses and systems for use in face detection.