Locating a human face in an image with a complex background, generally referred to as face detection, is an important initial step in many processes, including face recognition. After a face has been located, computer vision may be used to analyse that face, for example to interpret facial expressions. Such analysis may be used in various application areas, including the gathering of population and age statistics of patrons at entertainment/amusement parks, as well as television network viewer-rating studies. Computer vision with this capability can further have application in such fields as automated security/surveillance systems, demographic studies, safety monitoring systems, human interfaces to computers, and automated photography.
A first category of face detection includes those approaches where a face is located by identification of facial features, such as the mouth and eyes. Once these features are identified, the overall location of the face may be determined using facial geometric information.
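As a minimal sketch of how facial geometric information might yield an overall face location once the eyes and mouth have been identified, the following Python function derives a bounding box from three feature points. The function name and the two proportion factors are illustrative assumptions and are not taken from any particular prior-art method.

```python
import math


def face_box_from_features(left_eye, right_eye, mouth,
                           width_factor=2.0, height_factor=3.0):
    """Estimate an axis-aligned face box (x, y, width, height).

    Each feature is an (x, y) pixel coordinate. The width and height
    factors are assumed proportions chosen for illustration only.
    """
    # Midpoint between the eyes.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2.0,
               (left_eye[1] + right_eye[1]) / 2.0)
    # Two characteristic distances of the feature layout.
    eye_dist = math.dist(left_eye, right_eye)
    eye_to_mouth = math.dist(eye_mid, mouth)
    # Centre the box halfway between the eye line and the mouth.
    cx = (eye_mid[0] + mouth[0]) / 2.0
    cy = (eye_mid[1] + mouth[1]) / 2.0
    width = width_factor * eye_dist
    height = height_factor * eye_to_mouth
    return (cx - width / 2.0, cy - height / 2.0, width, height)
```

The sketch assumes an upright face; a rotated face would additionally require the angle of the line joining the eyes to be taken into account.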
A second category of face detection includes those approaches where the face is examined as a whole, generally using model-based vision techniques. Typically, the head of a person can be modelled as an ellipse with a ratio of 1.2 to 1.4 between its two principal axes. The fitting of the elliptic shape is usually performed after a skin colour detection step, thereby increasing the importance of segments in the image having the colour of skin. However, this technique is often not sufficient to detect faces because, in many instances, the neck and the face will be detected as a single segment, and that segment will not have an elliptical shape. Further, the image may include objects that are elliptical in shape and have the colour of human skin without being a human face.
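The axis-ratio test described above can be sketched as follows, assuming that a skin colour detection step has already produced a segment as a list of (x, y) pixel coordinates. The ratio is estimated here from the second-order moments of the segment, which is one possible choice rather than the fitting method of any specific prior-art system; all function names are hypothetical.

```python
import math


def axis_ratio(pixels):
    """Ratio of major to minor principal axis of a pixel segment."""
    n = len(pixels)
    if n < 2:
        raise ValueError("segment needs at least two pixels")
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    # Second-order central moments (covariance of the coordinates).
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major, minor = half_trace + root, half_trace - root
    if minor <= 0.0:
        return math.inf  # degenerate segment, e.g. a straight line
    # Variance scales with the square of the axis length.
    return math.sqrt(major / minor)


def looks_like_head(pixels, low=1.2, high=1.4):
    """True if the segment's axis ratio lies in the 1.2 to 1.4 range."""
    return low <= axis_ratio(pixels) <= high


def filled_ellipse(a, b):
    """Helper for experiments: pixels of an ellipse with semi-axes a, b."""
    return [(x, y)
            for x in range(-a, a + 1)
            for y in range(-b, b + 1)
            if (x / a) ** 2 + (y / b) ** 2 <= 1.0]
```

A segment in which the face and neck have merged, approximated for instance by `filled_ellipse(20, 40)`, falls outside the range and is rejected, whereas any skin-coloured object with a ratio inside the range is accepted whether or not it is a face. Both shortcomings noted above are therefore visible in this sketch.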
Eigenfaces and neural networks have also been used for face detection. The image is typically divided into multiple, possibly overlapping, sub-images. For each sub-image, an attempt is made to classify the sub-image as being either “a face” or “not a face”. This is done by attempting to match the sub-image with one of 4150 normalised canonical “face” patterns, which are used to synthesise six “face” pattern prototypes in a multi-dimensional image vector space. Such techniques have to deal with the difficult theoretical problem of how to define a non-face. Moreover, such techniques are not scale invariant, and do not cope with different viewpoints or orientations of the included faces. These problems can make locating faces in an image an exhaustive process.
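The division of an image into overlapping sub-images can be sketched as a sliding window that is repeated at several scales, repetition being one common way of compensating for a classifier that is not scale invariant. The window size, stride and scale values below are illustrative assumptions, and the face/non-face classifier itself is not shown.

```python
def sub_image_windows(width, height, window=20, stride=10,
                      scales=(1.0, 1.5, 2.0)):
    """Yield (x, y, size) for square, possibly overlapping sub-images.

    width, height: image dimensions in pixels.
    window, stride: base window side and step (assumed values).
    scales: multipliers applied to both window and stride.
    """
    for scale in scales:
        size = int(round(window * scale))
        if size > width or size > height:
            continue  # window no longer fits inside the image
        step = max(1, int(round(stride * scale)))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)


# Every yielded window would be passed to a face/non-face classifier.
candidates = list(sub_image_windows(640, 480))
```

Because every window at every scale must be classified, the number of candidates grows quickly with image size and with the number of scales, viewpoints and orientations to be covered.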