Colour digital images are increasingly being stored in multi-media databases, and utilised in various computer applications. In many such applications it is desirable to be able to detect the location of a face in a visual image as one step in a multi-step process. The multi-step process can include content-based image retrieval, personal identification or verification for use with automatic teller machines or security cameras, or automated interaction between humans and computational devices.
Various prior art face detection methods are known including eigenfaces, neural networks, clustering, feature identification and skin colour techniques. Each of these techniques has its strengths and weaknesses; however, one feature which they have in common is that they are either computationally intensive and therefore very slow, or fast but not sufficiently robust to detect faces.
The eigenface or eigenvector method is particularly suitable for face recognition and there is some tolerance for lighting variation; however, it does not cope with different viewpoints of faces and does not handle occlusion of various facial features (such as occurs if a person is wearing sunglasses). Also, it is not scale invariant.
The neural network approach utilises training based on a large number of face images and non-face images and has the advantages of being relatively simple to implement, providing some tolerance to the occlusion of facial features and some tolerance to lighting variation. It is also relatively easy to improve the detection rate by re-training the neural network using false detections. However, it is not scale invariant, does not cope with different viewpoints or orientation, and leads to an exhaustive process to locate faces on an image.
The clustering technique is somewhat similar to the eigenface approach. A pixel window (eg 20×20) is typically moved over the image, and the distances between the resulting test pattern and a prototype face image and a prototype non-face image are represented by a vector. The vector captures the similarities and differences between the test pattern and the face model. A neural network can then be trained to classify whether the vector represents a face or a non-face. While this method is robust, it does not cope with different scales, different viewpoints or orientations. It leads to an exhaustive approach to locate faces and relies upon assumed parameters.
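The window scan described above can be illustrated with a minimal Python sketch. It is not the prior art implementation: the function name `window_distance_vectors`, the use of flattened greyscale patches, and the choice of Euclidean distance are all assumptions made for illustration, and the classifier stage is omitted.

```python
import math

def window_distance_vectors(image, face_proto, nonface_proto, size):
    """Yield ((row, col), (d_face, d_nonface)) for every size x size window.

    image is a list of rows of grey levels; the prototypes are flattened
    size*size lists. Euclidean distance is an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            # Flatten the window into a test pattern.
            patch = [image[top + dy][left + dx]
                     for dy in range(size) for dx in range(size)]
            yield (top, left), (math.dist(patch, face_proto),
                                math.dist(patch, nonface_proto))

# Toy 3x3 image scanned with a 2x2 window (hypothetical values).
toy_image = [[0, 0, 0],
             [0, 9, 9],
             [0, 9, 9]]
vectors = dict(window_distance_vectors(toy_image, [9, 9, 9, 9], [0, 0, 0, 0], 2))
```

Every window position is visited and the window size is fixed, which is why such an approach is exhaustive and does not by itself cope with faces at other scales.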
The feature identification method is based upon searching for potential facial features or groups of facial features such as eyebrows, eyes, nose and mouth. The detection process involves identifying facial features and grouping these features into feature pairs, partial face groups, or face candidates. This process is advantageous in that it is relatively scale invariant, there is no exhaustive searching, it is able to handle the occlusion of some facial features and it is also able to handle different viewpoints and orientations. Its main disadvantages are that there are potentially many false detections and that its performance is very dependent upon the facial feature detection algorithms used.
The use of skin colour to detect human faces is described in a paper by Yang J and Waibel A (1995) “Tracking Human Faces in Real-Time” CMU-CS-95-210, School of Computer Science, Carnegie Mellon University. This proposal was based on the concept that the human visual system adapts to different levels of brightness and to different illumination sources, which implies that the human perception of colour is consistent within a wide range of environmental lighting conditions. It was therefore thought possible to remove brightness from the skin colour representation while preserving accurate, but low dimensional, colour information. As a consequence, in this prior art technique, the chromatic colour space was used. Chromatic colours (eg. r and g) can be derived from the RGB values as: r = R/(R+G+B) and g = G/(R+G+B).
These chromatic colours are known as “pure” colours in the absence of brightness.
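As a brief illustration of the formulas above, the following Python sketch converts a single RGB pixel to chromatic (r, g) values. The function name `to_chromatic` and the handling of a pure black pixel (where R+G+B is zero) are assumptions of this sketch, not part of the cited paper.

```python
def to_chromatic(R, G, B):
    """Return (r, g) with r = R/(R+G+B) and g = G/(R+G+B) for one pixel."""
    total = R + G + B
    if total == 0:
        # Assumption: map pure black to (0.0, 0.0) to avoid dividing by zero.
        return (0.0, 0.0)
    return (R / total, G / total)

# The same colour at two brightness levels gives the same chromatic values.
bright = to_chromatic(200, 100, 100)
dim = to_chromatic(100, 50, 50)
```

Here both `bright` and `dim` evaluate to (0.5, 0.25), showing how the normalisation discards brightness while keeping the colour proportions.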
Utilising this colour space, Yang and Waibel found that the skin colour distributions of different people, including people of different races, were clustered together. This means that the skin colours of different people are very close and that they differ mainly in intensity.
This prior art method first generated a skin colour distribution model using a set of example face images from which skin colour regions were manually selected. Then the test image was converted to the chromatic colour space. Next, each pixel in the test image (as converted) was compared to the distribution of the skin colour model. Finally, all skin colour pixels so detected were identified, and regions of adjacent skin colour pixels could then be considered potential face candidates.
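The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the method of the cited paper: the skin colour distribution is modelled as a two-dimensional Gaussian over (r, g), membership is decided by a squared Mahalanobis distance threshold (`max_dist2`, a hypothetical parameter), and adjacency means 4-connectivity. All function names are hypothetical.

```python
from collections import deque

def to_chromatic(pixel):
    R, G, B = pixel
    total = R + G + B
    return (R / total, G / total) if total else (0.0, 0.0)

def fit_skin_model(skin_pixels):
    """Fit mean and 2x2 covariance of (r, g) over manually selected skin pixels."""
    pts = [to_chromatic(p) for p in skin_pixels]
    n = len(pts)
    mr = sum(r for r, _ in pts) / n
    mg = sum(g for _, g in pts) / n
    eps = 1e-6  # Assumption: small ridge term keeps the covariance invertible.
    srr = sum((r - mr) ** 2 for r, _ in pts) / n + eps
    sgg = sum((g - mg) ** 2 for _, g in pts) / n + eps
    srg = sum((r - mr) * (g - mg) for r, g in pts) / n
    return (mr, mg), (srr, srg, sgg)

def is_skin(pixel, model, max_dist2=9.0):
    """Compare one pixel with the model via squared Mahalanobis distance."""
    (mr, mg), (srr, srg, sgg) = model
    r, g = to_chromatic(pixel)
    dr, dg = r - mr, g - mg
    det = srr * sgg - srg * srg
    dist2 = (sgg * dr * dr - 2 * srg * dr * dg + srr * dg * dg) / det
    return dist2 <= max_dist2

def skin_regions(image, model):
    """Group 4-connected skin pixels into regions of (row, col) positions."""
    rows, cols = len(image), len(image[0])
    mask = [[is_skin(image[y][x], model) for x in range(cols)] for y in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

# Toy data (hypothetical values): four skin samples and a 3x3 test image.
samples = [(200, 140, 120), (180, 120, 100), (220, 160, 140), (150, 100, 85)]
blue = (50, 80, 200)
model = fit_skin_model(samples)
test_image = [[samples[0], samples[1], blue],
              [blue, blue, blue],
              [blue, blue, samples[2]]]
regions = skin_regions(test_image, model)
```

In this toy image the two adjacent skin pixels in the top row form one candidate region and the isolated pixel in the bottom corner forms another; each region would then be passed on as a potential face candidate.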
This prior art method has the advantage that processing colour is much faster than processing individual facial features, that colour is substantially orientation invariant and that it is insensitive to the occlusions of some facial features. The system is also substantially viewpoint invariant and scale invariant. However, the method suffers from a number of disadvantages including that the colour representation of the face can be influenced by different lighting conditions, and that different cameras (eg. digital or film) can produce different colour values even for the same person in the same environment.
A further significant disadvantage of the prior art methods is that the skin colour model is not very discriminating (ie. selecting pixels on the basis of whether they are included in the skin colour distribution results in many non-skin colour pixels being included erroneously). It is also difficult to locate clusters or regions of skin colour pixels that can be considered as candidate faces.