Face detection is a problem that arises in many video applications, in particular in image indexing systems and in video-telephony systems, where it is used to improve the quality of transmission.
Moreover, the detection of faces is a prerequisite for their automatic recognition. Since face recognition has progressed greatly in recent years, demand for detection keeps growing.
In a multimedia context, a technique coupling detection and recognition of faces would allow advanced queries on databases of personal photos, for example by making it possible to find all the photos in which a given person appears.
Such a technique would also be very useful to cinematographic or television companies for automatically archiving videos on the basis of fine-grained criteria.
More simply, a face detection algorithm makes it possible to classify personal photos into several categories: no face, one face, a few faces, etc.
Two types of approach exist for the face detection problem: model-based approaches and appearance-based approaches.
The model-based approaches seek to define the sought object in simple terms, such as silhouette, colour, variation of light, texture, etc. The drawback of these methods is that it is difficult to define an object in terms of rules. If the rules adopted are too strict, objects that are only slightly outside the norm are not detected. Conversely, if the rules are too vague, the system detects many unwanted objects.
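The trade-off between strict and vague rules can be illustrated with a minimal sketch. The rule used here (a height-to-width ratio test), the thresholds and the candidate regions are all hypothetical and are chosen only to show the effect; they are not taken from any actual detector.

```python
def looks_like_face(width, height, low, high):
    """Hypothetical rule: a region is a face if its height/width ratio lies in [low, high]."""
    ratio = height / width
    return low <= ratio <= high


# Hypothetical candidate regions, given as (width, height) in pixels.
candidates = {
    "typical face": (100, 130),             # ratio 1.30
    "slightly elongated face": (100, 155),  # ratio 1.55
    "lamp post": (100, 190),                # ratio 1.90
}

# Strict rule: misses the face that is only slightly outside the norm.
strict = {name: looks_like_face(w, h, 1.2, 1.4) for name, (w, h) in candidates.items()}

# Vague rule: accepts every face, but also an unwanted object.
loose = {name: looks_like_face(w, h, 1.0, 2.0) for name, (w, h) in candidates.items()}
```

With these assumed values, the strict rule rejects the slightly elongated face, while the vague rule accepts the lamp post.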
The appearance-based approaches, by contrast, rely on nonparametric decision methods (or classifiers) such as neural networks. The rules defining the object to be detected are not stated explicitly; they are learnt from a learning set. Measurements called descriptors (average colour, colours of the pixels, wavelet transform, etc.) are made on the images. The classifier then weights the descriptors of the learning set so as to define the object to be detected. The rules are then statistical averages over the descriptors.
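As a rough illustration of rules learnt as statistical averages over descriptors, the sketch below uses a nearest-mean classifier. This is a deliberately simple stand-in, not the neural network mentioned above, and the two-component descriptors and their values are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length descriptor vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(face_descriptors, non_face_descriptors):
    """'Learning' here is just averaging the descriptors of each class."""
    return mean_vector(face_descriptors), mean_vector(non_face_descriptors)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_face(descriptor, model):
    """Decide 'face' if the descriptor is closer to the face average than to the non-face average."""
    face_mean, non_face_mean = model
    return squared_distance(descriptor, face_mean) < squared_distance(descriptor, non_face_mean)


# Hypothetical learning set: each descriptor is [average redness, texture measure].
faces = [[0.8, 0.6], [0.9, 0.5]]
non_faces = [[0.1, 0.2], [0.2, 0.1]]
model = train(faces, non_faces)
```

Under these assumptions, `is_face([0.7, 0.5], model)` is true and `is_face([0.2, 0.2], model)` is false. No rule was written by hand; the decision follows entirely from the learning set.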
The principle of such solutions is to partition the input image into small pieces and to submit each of these pieces to a classifier, which decides whether or not the piece is a face. The problem is to decide what size these pieces should have. In an identity photo or in a group photo, the faces do not all have the same size. It is therefore necessary to perform a multi-resolution partitioning of every input image: each quarter of the image, then each eighth, etc. is submitted to the classifier. This is what makes such systems very costly in terms of calculation time.
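The cost of multi-resolution partitioning can be made concrete with a short sketch. The text does not specify the exact partitioning scheme, so the code assumes that the number of pieces doubles at each level (quarters, eighths, sixteenths, ...) by halving the width and the height alternately, and that partitioning stops at a hypothetical minimum piece size `min_size`.

```python
def partition(width, height, pieces):
    """Split a width x height image into `pieces` equal tiles (pieces: a power of two).

    Returns a list of (x, y, w, h) boxes. Alternating between column and
    row splits is an assumption of this sketch.
    """
    cols, rows = 1, 1
    split_columns = True
    n = pieces
    while n > 1:
        if split_columns:
            cols *= 2
        else:
            rows *= 2
        split_columns = not split_columns
        n //= 2
    tile_w, tile_h = width // cols, height // rows
    return [(c * tile_w, r * tile_h, tile_w, tile_h)
            for r in range(rows) for c in range(cols)]


def multi_resolution_pieces(width, height, min_size):
    """Collect all pieces from quarters downwards until a tile side drops below min_size."""
    result = []
    pieces = 4
    while True:
        boxes = partition(width, height, pieces)
        _, _, w, h = boxes[0]
        if w < min_size or h < min_size:
            break
        result.extend(boxes)
        pieces *= 2
    return result


# Every returned box would require one classifier call.
boxes = multi_resolution_pieces(256, 256, 64)
```

For a 256 x 256 image and a minimum piece size of 64 pixels, this already yields 4 + 8 + 16 = 28 pieces to classify. The count roughly doubles with each additional level, which is the source of the heavy calculation time.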