Face detection is a computer technology that determines the locations and sizes of human faces in digital images.
The Viola-Jones face detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first face detection framework to provide competitive face detection rates in real time (Paul Viola and Michael J. Jones, Robust Real-Time Face Detection, 2001). In their paper, Viola and Jones reported face detection proceeding at 15 frames per second on a conventional desktop computer of 2001. The Viola-Jones framework is considered to be state-of-the-art in face detection.
For the sake of performance, Viola and Jones based their framework on Haar features. A Haar feature is defined by 2, 3 or 4 adjacent regions of two types, so-called “black” and “white” regions. The value of a Haar feature of a particular size at a particular location within an image is calculated by subtracting the sum of intensities of pixels belonging to black regions from the sum of intensities of pixels belonging to white regions. Viola and Jones suggested a data structure called the “integral image” that makes it possible to calculate a Haar feature of any size in constant time. Because a single Haar feature yields only a weak classifier, a large number of Haar features are needed to distinguish between faces and non-faces with accuracy. In the Viola-Jones face detection framework, classifiers are arranged in a cascade in which each successive classifier is trained only on samples that pass through the preceding classifiers. If a classifier rejects the sub-window under inspection, no further processing is performed on that sub-window and the next sub-window is examined.
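The integral image and the cascade's early rejection can be illustrated with a minimal Python sketch. It is not the implementation from the Viola and Jones paper. The function names, the zero-padded table layout, and the choice of a horizontal two-rectangle feature with the white region on the left are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Build a (h+1) x (w+1) summed-area table with a zero first row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle feature: white (left half) sum minus black (right half) sum.

    Assumes w is even. The cost does not depend on the rectangle size.
    """
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


# A stage is any callable that accepts or rejects a sub-window (hypothetical type).
Stage = Callable[[Image, int, int], bool]


def cascade_accepts(stages: Sequence[Stage], ii: Image, x: int, y: int) -> bool:
    """Run the stages in order and stop at the first rejection."""
    for stage in stages:
        if not stage(ii, x, y):
            return False  # rejected: later stages are never evaluated
    return True


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))       # 6 + 7 + 10 + 11 -> 34
print(haar_two_rect(ii, 0, 0, 4, 4))  # 60 - 76 -> -16
```

Once the table is built, every rectangle sum costs four lookups regardless of the rectangle's size, which is why a feature of any size can be evaluated in constant time.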
The Viola-Jones face detection framework has a relatively high false detection rate. A false detection is a region that the face detector reports as containing a face but that does not actually contain one.
A Local Binary Pattern (LBP) feature is a type of feature typically used for texture classification and facial recognition in computer vision. Local Binary Patterns were described by T. Ojala, M. Pietikainen, and D. Harwood in 1996 (T. Ojala, M. Pietikainen, and D. Harwood (1996), “A Comparative Study of Texture Measures with Classification Based on Feature Distributions”, Pattern Recognition, vol. 29, pp. 51-59). The LBP feature vector can be created by dividing the examined window into cells. Each pixel in a cell can be compared to each of its 8 neighbors (on its left-top, left-middle, left-bottom, right-top, etc.), visiting the neighbors along a clockwise or counter-clockwise circle. If the center pixel's value is greater than the neighboring pixel's value, a value of 1 is assigned for that neighbor; otherwise, a value of 0 is assigned. The eight resulting bits form an 8-digit binary number for the center pixel. A histogram is then computed over the cell of the frequency of each such number, that is, of each combination of which neighbors are smaller than the center pixel and which are not. The histogram can be normalized, and the normalized histograms of all the cells can be concatenated to give the feature vector for the window. The feature vector can be processed using a Support Vector Machine (SVM) or some other machine-learning algorithm to produce a classifier.
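The LBP steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The neighbor order (clockwise from the top-left), the bit order, the square non-overlapping cells, and the skipped one-pixel border are assumptions. The comparison follows the convention stated in the text, in which a bit is 1 when the center pixel is greater than the neighbor.

```python
from typing import List

Image = List[List[int]]

# Neighbor offsets (dy, dx), clockwise starting from the top-left neighbor.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, y: int, x: int) -> int:
    """8-bit code for the pixel at (y, x); the first neighbor is the top bit."""
    center = img[y][x]
    code = 0
    for dy, dx in NEIGHBORS:
        bit = 1 if center > img[y + dy][x + dx] else 0
        code = (code << 1) | bit
    return code


def cell_histogram(img: Image, y0: int, x0: int, size: int) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over one size x size cell."""
    hist = [0] * 256
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            hist[lbp_code(img, y, x)] += 1
    total = sum(hist)
    return [count / total for count in hist]


def lbp_feature_vector(img: Image, cell_size: int) -> List[float]:
    """Concatenate the normalized histograms of all complete cells.

    A one-pixel border is skipped so that every pixel has 8 neighbors.
    """
    h, w = len(img), len(img[0])
    features: List[float] = []
    for y0 in range(1, h - cell_size, cell_size):
        for x0 in range(1, w - cell_size, cell_size):
            features.extend(cell_histogram(img, y0, x0, cell_size))
    return features


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lbp_code(patch, 1, 1))  # bits 1,1,1,0,0,0,0,1 -> 225
```

With 8 neighbors each cell contributes 256 histogram bins, so a window divided into n cells yields a feature vector of length 256 × n.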
The term SVM (Support Vector Machine) refers to a set of related learning methods that analyze data and recognize patterns. An SVM can be used for classification and regression analysis. The standard SVM is a non-probabilistic binary linear classifier. For each input it receives, the SVM can predict which of two possible classes the input belongs to. An SVM training algorithm can receive a set of training examples, each marked as belonging to one of two categories, and can build a model that predicts into which category a new example falls. An SVM model can represent the examples as points in space, mapped so that the examples of the separate categories are divided by a gap. The wider the gap, the more reliable the results of the categorization. When a new example is received, it can be mapped into the same space and predicted to belong to a category based on which side of the gap it falls on. An SVM can construct a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. A good separation can be achieved by the hyperplane that has the largest distance to the nearest training data points of any class (called the functional margin). In general, the larger the margin, the lower the generalization error of the classifier.
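As a rough illustration of how a linear SVM separates two categories, the following Python sketch trains a separating hyperplane by subgradient descent on the regularized hinge loss. This is a simplification chosen for brevity; practical SVM libraries typically solve a quadratic optimization problem instead. The function names, the learning rate `lr`, the regularization weight `lam`, the epoch count, and the toy data are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def train_linear_svm(samples: Sequence[Vector], labels: Sequence[int],
                     lr: float = 0.1, lam: float = 0.01,
                     epochs: int = 100) -> Tuple[List[float], float]:
    """Fit w and b so that sign(w.x + b) matches the labels (+1 or -1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1:
                # Inside the margin or misclassified: move the hyperplane.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only shrink w, which widens the margin.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Category is decided by the side of the hyperplane x falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


def min_distance_to_hyperplane(w: Vector, b: float,
                               samples: Sequence[Vector],
                               labels: Sequence[int]) -> float:
    """Signed distance from the hyperplane to the nearest training point."""
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in zip(samples, labels))


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -4)))  # 1 -1
```

A positive value from `min_distance_to_hyperplane` means every training example lies on the correct side of the hyperplane, and a larger value corresponds to the wider gap described above.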