The Viola-Jones method is a well-known method for object detection in a digital image. It is one of the methods capable of detecting objects efficiently and in real time, by means of a sliding detection window.
It is generally used for detecting faces and persons, but can also be used for detecting other objects such as road vehicles or aircraft.
The Viola-Jones method is based on supervised learning. It therefore requires from several hundred to several million examples of the object to be detected in order to train a classifier. Once trained, this classifier is used to detect any presence of the object in an image by scanning the image exhaustively, at all positions and at all possible scales.
As a supervised learning method, it is divided into two stages: a classifier training stage, based on a large number of positive examples (objects of interest such as faces) and negative examples (for example, images labelled as not representing faces), followed by a detection stage in which this classifier is applied to unknown images.
The Viola-Jones method is an appearance-based approach, which consists of scanning the whole image with a “scan window” while calculating a number of characteristics in overlapping rectangular areas, or detection windows. Its distinctive feature is the use of very simple but very numerous characteristics.
The characteristics are a synthetic, informative representation computed from the pixel values. The Haar-like characteristics used by the Viola-Jones method are calculated by arithmetic operations, typically differences, on the sums of the pixel values over one or more rectangular areas.
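To make the idea of a Haar-like characteristic concrete, the following minimal Python sketch computes a two-rectangle characteristic by brute force: the sum of the left half of a window minus the sum of its right half. The function names, the list-of-rows image layout and the choice of this particular rectangle pattern are illustrative assumptions, not taken from the description of the method.

```python
from typing import List

Image = List[List[int]]  # image stored as a list of rows: img[y][x]


def rect_sum(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Brute-force sum of the pixel values in the w x h rectangle whose top-left pixel is (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def haar_two_rect(img: Image, x: int, y: int, w: int, h: int) -> int:
    """Hypothetical two-rectangle characteristic: sum of the left half of the
    w x h window minus the sum of its right half (w is assumed to be even)."""
    half = w // 2
    return rect_sum(img, x, y, half, h) - rect_sum(img, x + half, y, half, h)


# A window that is bright on the left and dark on the right gives a large positive value.
example = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
print(haar_two_rect(example, 0, 0, 4, 2))  # 36 - 4 = 32
```

Computed this way, the cost of each characteristic grows with the area of the rectangles, which is the cost that integral images remove.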
In order to calculate these characteristics on an image quickly and efficiently, the Viola-Jones method uses integral images.
An integral image, also known as a “summed area table”, is a representation in the form of a digital image of the same size as the original image, which contains at each of its points the sum of the values of the pixels located above and to the left of this point, the point itself included. More formally, the integral image ii is defined from the image i by:
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
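The definition above can be computed in a single pass over the image. The sketch below is a minimal pure-Python version, assuming a greyscale image stored as a list of rows (so the point (x, y) of the formula is img[y][x]); the function name integral_image is illustrative.

```python
from typing import List

Image = List[List[int]]  # img[y][x]: row y, column x


def integral_image(img: Image) -> Image:
    """Return ii with ii[y][x] equal to the sum of img[y'][x'] over all y' <= y and x' <= x."""
    height = len(img)
    width = len(img[0]) if height else 0
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of img[y][0..x]
        for x in range(width):
            row_sum += img[y][x]
            # add the running row sum to the integral value directly above
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


print(integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
```

Each pixel is visited once, so building the integral image costs time proportional to the number of pixels.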
With this lookup-table representation, the sum of the pixel values in any rectangular area can be calculated with only four accesses to the integral image, and the sums over two contiguous rectangular areas with only six accesses, since the two areas share two corners. The calculation therefore takes constant time, regardless of the size of the area.
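The four-access and six-access rules can be sketched as follows. This version assumes an integral image padded with an extra row and column of zeros (a common convention that removes the boundary cases), so that ii[y][x] holds the sum of all pixels in rows < y and columns < x; all function names are illustrative.

```python
from typing import List

Image = List[List[int]]  # img[y][x]


def padded_integral_image(img: Image) -> Image:
    """ii[y][x] = sum of img over rows < y and columns < x (zero first row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum over the w x h rectangle with top-left pixel (x, y): four accesses, whatever w and h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_diff(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Left w x h rectangle minus the adjacent right w x h rectangle.
    The two rectangles share two corners, so six accesses suffice."""
    a, b, c = ii[y][x], ii[y][x + w], ii[y][x + 2 * w]              # top corners
    d, e, f = ii[y + h][x], ii[y + h][x + w], ii[y + h][x + 2 * w]  # bottom corners
    left = e - b - d + a
    right = f - c - e + b
    return left - right


ii = padded_integral_image([[9, 9, 1, 1], [9, 9, 1, 1]])
print(rect_sum(ii, 0, 0, 4, 2))       # 40
print(two_rect_diff(ii, 0, 0, 2, 2))  # 36 - 4 = 32
```

The number of accesses in rect_sum and two_rect_diff does not depend on w or h, which is what makes evaluating very many characteristics affordable.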
In the detection phase, the whole image is scanned by shifting the detection window by a given step in the horizontal and/or vertical direction.
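The scan described above amounts to enumerating window positions on a regular grid. In this sketch, the square window size win_size, the step step and the classify callback (a stand-in for the trained classifier) are hypothetical parameters chosen for illustration.

```python
from typing import Callable, Iterator, List, Tuple


def window_positions(img_width: int, img_height: int, win_size: int,
                     step: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corner (x, y) of every win_size x win_size window that fits
    entirely inside the image, moving by `step` pixels horizontally and vertically."""
    for y in range(0, img_height - win_size + 1, step):
        for x in range(0, img_width - win_size + 1, step):
            yield x, y


def scan(img_width: int, img_height: int, win_size: int, step: int,
         classify: Callable[[int, int, int], bool]) -> List[Tuple[int, int]]:
    """Return the positions at which the (hypothetical) classifier reports the object."""
    return [(x, y) for x, y in window_positions(img_width, img_height, win_size, step)
            if classify(x, y, win_size)]


positions = list(window_positions(48, 32, 24, 8))
print(len(positions))  # 4 horizontal x 2 vertical = 8 windows
print(scan(48, 32, 24, 8, lambda x, y, s: (x, y) == (16, 8)))  # [(16, 8)]
```

A smaller step locates objects more precisely but multiplies the number of windows that the classifier has to evaluate.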
The Viola-Jones method is implemented using a pyramid for each image to be analysed. In image processing, a pyramid is a multiresolution representation of an image: it models the image at different resolutions, from the initial image down to a very coarse image. The image pyramid allows the Viola-Jones detector to work from the fine level of detail to the coarse level, so that objects of different sizes, at various distances, can be detected with a detection window of fixed size.
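A pyramid of the kind described above can be sketched as a list of level sizes obtained by repeatedly dividing the image dimensions by a scale factor, stopping once the detection window no longer fits. The default scale factor of 1.25, the function names and the nearest-neighbour downscaling (a simplification; practical pyramids usually smooth the image before subsampling) are assumptions of this sketch.

```python
from typing import List, Tuple

Image = List[List[int]]  # img[y][x]


def pyramid_levels(width: int, height: int, win_size: int,
                   scale: float = 1.25) -> List[Tuple[int, int, float]]:
    """Return (level_width, level_height, factor) for each level, from the full-resolution
    image down to the coarsest level that can still contain one detection window."""
    levels = []
    factor = 1.0
    while int(width / factor) >= win_size and int(height / factor) >= win_size:
        levels.append((int(width / factor), int(height / factor), factor))
        factor *= scale
    return levels


def downscale_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resampling of img to new_w x new_h pixels."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


print(pyramid_levels(96, 96, 24, scale=2.0))
# [(96, 96, 1.0), (48, 48, 2.0), (24, 24, 4.0)]
```

A 24 x 24 window on the level with factor 4.0 covers a 96 x 96 region of the original image, which is how a fixed-size window detects larger objects at coarser levels.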
A drawback of the Viola-Jones method concerns the bandwidth required on the interconnect bus that couples the processor, which notably calculates the integral image for each resolution, to the memory storing the representations of the image at the different resolutions once the multiresolution representation of the image has been computed.
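To give an order of magnitude for this drawback, the sketch below counts the bytes that would cross the bus per second if every pyramid level were read once from memory to compute its integral image. This traffic model, and the frame size, scale factor, pixel depth and frame rate passed to it, are all hypothetical; the model ignores, among other things, writing the integral images back, which usually need more bits per value than the original pixels.

```python
def pyramid_read_traffic(width: int, height: int, win_size: int, scale: float,
                         bytes_per_pixel: int, fps: float) -> float:
    """Bytes per second read if each pyramid level is fetched once per frame
    (hypothetical traffic model, for illustration only)."""
    total_pixels = 0
    factor = 1.0
    while int(width / factor) >= win_size and int(height / factor) >= win_size:
        total_pixels += int(width / factor) * int(height / factor)
        factor *= scale
    return total_pixels * bytes_per_pixel * fps


# Example: 96 x 96 image halved at each level, 1 byte per pixel, 1 frame per second.
print(pyramid_read_traffic(96, 96, 24, 2.0, 1, 1))  # 9216 + 2304 + 576 = 12096
```

Because the level sizes form a geometric series, the total number of pixels approaches W·H/(1 − 1/s²) for a scale factor s. With s = 1.25 this is about 2.8 times the size of the original image, so the pyramid alone nearly triples the data read compared with processing a single resolution.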