1. Field of the Invention
This invention relates to image processing, with possible applications in the field of object detection.
2. Description of the Prior Art
Many object detection algorithms, such as human face detection algorithms, have been proposed in the literature, including those using so-called eigenfaces, face template matching, deformable template matching or neural network classification. None of these is perfect, and each generally has associated advantages and disadvantages. None gives an absolutely reliable indication that an image contains a face; on the contrary, each makes a probabilistic assessment, derived from a mathematical analysis of the image, of whether the image has at least a certain likelihood of containing a face. Depending on their application, the algorithms generally have the threshold likelihood value set quite high, to try to avoid false detections of faces.
Face detection in video material, comprising a sequence of captured images, is a little more complicated than detecting a face in a still image. In particular, it is desirable that a face detected in one image of the sequence may be linked in some way to a detected face in another image of the sequence. Are they (probably) the same face or are they (probably) two different faces which chance to be in the same sequence of images?
One way of attempting to “track” faces through a sequence in this way is to check whether two faces in adjacent images have the same or very similar image positions. However, this approach can suffer problems because of the probabilistic nature of the face detection schemes. On the one hand, if the threshold likelihood (for a face detection to be made) is set high, there may be some images in the sequence where a face is present but is not detected by the algorithm, for example because the owner of that face turns his head to the side, or his face is partially obscured, or he scratches his nose, or for one of many other possible reasons. On the other hand, if the threshold likelihood value is set low, the proportion of false detections will increase and it is possible for an object which is not a face to be successfully tracked through a whole sequence of images. To address these problems, a composite tracking procedure has been proposed in copending application PCT/GB2003/005168, in which tracking decisions are based on a combination of face detection, colour matching and position prediction.
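By way of illustration only, the simple position-based linking described above, and its sensitivity to the threshold likelihood, might be sketched as follows. This is a minimal sketch and not the composite procedure of the copending application. The names Detection, link_by_position, threshold and max_shift are hypothetical, as is the use of a Euclidean distance between detection centres as the test for "very similar image positions".

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical detector output: centre position and face likelihood."""
    x: float
    y: float
    likelihood: float


Track = List[Tuple[int, Detection]]  # (image index, detection) pairs


def link_by_position(frames: List[List[Detection]],
                     threshold: float,
                     max_shift: float) -> List[Track]:
    """Link detections in adjacent images whose positions nearly coincide.

    A detection is accepted only if its likelihood reaches `threshold`.
    A track ends as soon as one image yields no accepted detection near it.
    """
    tracks: List[Track] = []
    open_ids: List[int] = []  # tracks that were extended in the previous image
    for index, detections in enumerate(frames):
        available = list(open_ids)
        next_open: List[int] = []
        for det in detections:
            if det.likelihood < threshold:
                continue  # below threshold: treated as "no face here"
            best_id, best_dist = None, max_shift
            for track_id in available:
                last = tracks[track_id][-1][1]
                dist = math.hypot(det.x - last.x, det.y - last.y)
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                tracks.append([])  # no nearby track: start a new one
                best_id = len(tracks) - 1
            else:
                available.remove(best_id)  # one detection per track per image
            tracks[best_id].append((index, det))
            next_open.append(best_id)
        open_ids = next_open
    return tracks


# Assumed example data: a real face whose likelihood dips in the middle image,
# and a non-face object with a steady but low likelihood.
frames = [
    [Detection(100, 100, 0.90), Detection(300, 50, 0.55)],
    [Detection(102, 101, 0.60), Detection(300, 51, 0.55)],
    [Detection(104, 102, 0.90), Detection(301, 51, 0.55)],
]
high = link_by_position(frames, threshold=0.8, max_shift=10.0)
low = link_by_position(frames, threshold=0.5, max_shift=10.0)
```

With the assumed data, the high threshold breaks the real face into two separate one-image tracks because the middle detection is rejected, whereas the low threshold tracks the non-face object through all three images alongside the real face.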
The aim of the tracking process is to generate a series of linked image regions which correspond to the same tracked face. The tracked series could be very long, however. For example, if a face is being tracked on a television programme such as a news programme or a chat show, there could be several minutes of video material within which that face is tracked. A similar volume of data could arise if a face is being tracked by a closed circuit television (CCTV) system or in the context of a video conferencing system.
Accordingly, it would be desirable to be able to select a representative face image, or group of face images, from within a tracked series.
A technique has been proposed for deriving a single “representative key stamp” image from a video clip. This derives a single image which is most like the other images in a clip. It has been proposed that similar techniques be applied to generate a representative image from a tracked series of face images.
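The idea of choosing the image "most like the other images" might be sketched as follows. This is only an illustrative sketch: the similarity measure of the proposed technique is not specified here, so a mean absolute pixel difference is assumed, and the names mean_abs_diff and most_representative are hypothetical. Each image is assumed to be a flat sequence of pixel values of equal length.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity measure: mean absolute pixel difference."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def most_representative(images: List[Sequence[float]]) -> int:
    """Return the index of the image with the smallest total dissimilarity
    to all the other images in the series."""
    if not images:
        raise ValueError("at least one image is required")
    best_index, best_total = 0, float("inf")
    for i, image in enumerate(images):
        total = sum(mean_abs_diff(image, other)
                    for j, other in enumerate(images) if j != i)
        if total < best_total:
            best_index, best_total = i, total
    return best_index


# Assumed example: three tiny four-pixel "images".
series = [
    [10, 10, 10, 10],
    [12, 12, 12, 12],
    [40, 40, 40, 40],
]
chosen = most_representative(series)
```

The comparison of every image with every other image grows with the square of the series length, which is one reason a long tracked series makes the choice of a representative image non-trivial.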