This invention relates to data processing systems and methods for performing face detection in a series of frames.
Whilst humans might take for granted the ability to readily identify faces in a picture, it can be a difficult problem for computers to solve. A significant effort has been made over recent years to improve the accuracy and reliability of automatic face detection. However, even the most powerful systems still fall far short of the performance of the human brain, and portable devices having low power processors are often unable to detect faces in images under a wide range of lighting conditions and face orientations.
The drive to improve automatic face detection on digital devices stems from the fact that being able to reliably detect faces in images and videos is enormously useful. For example, knowing the location of faces in an image allows manual or automatic tagging of images with the names of the people to whom the faces belong. Moreover, since the human brain is particularly sensitive to faces, knowing the location of faces in a video stream allows a video encoder to improve the perceived quality of an encoded video stream by preferentially encoding the areas of the video frames containing faces at a higher quality. Furthermore, if face detection can be performed in real time, the location of faces in a scene can be used by the autofocus system of a camera to help ensure that those areas of the image are in focus.
Many modern methods for performing automatic face detection are based on the Viola-Jones object detection framework, which breaks down face detection in a digital image into a series of processing steps, each of which is fast to perform at a digital processor. The Viola-Jones framework operates by applying binary classifiers to subwindows of an image, each subwindow being at a different location, scale or angle of rotation within the image so as to allow faces at different locations, or of different sizes and angles of rotation, to be detected. Each binary classifier performed on a subwindow of an image is made up of a cascaded set of strong classifiers of increasing complexity that are operated on the subwindow so as to detect whether the subwindow is likely to bound a face in the image. Only if all of the strong classifiers pass a subwindow is that subwindow treated as a match for the binary classifier (potentially subject to further processing). If any of the strong classifiers rejects the subwindow then no further processing is performed on that subwindow and processing moves on to the next subwindow. Further details of face detection performed according to the Viola-Jones framework can be found in the paper by P. Viola and M. Jones: “Robust real-time face detection”, International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
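By way of illustration only, the early-rejection behaviour of such a cascade might be sketched as follows. The sketch is not taken from the Viola-Jones paper: the function names (cascade_accepts, detect) are hypothetical, and the two toy stages are simple intensity tests standing in for trained strong classifiers built from Haar-like features. Only upright subwindows of a single size are scanned.

```python
from typing import Callable, List, Tuple

Subwindow = List[List[float]]          # rows of pixel intensities in [0, 1]
Stage = Callable[[Subwindow], bool]    # True = pass, False = reject


def cascade_accepts(subwindow: Subwindow, stages: List[Stage]) -> bool:
    """Return True only if every stage passes the subwindow.

    Stages are evaluated in order and evaluation stops at the first
    rejection, so later (more expensive) stages never run on subwindows
    that an earlier stage has already discarded.
    """
    for stage in stages:
        if not stage(subwindow):
            return False
    return True


# Toy stages (assumptions for illustration, NOT trained classifiers).
def stage_not_too_dark(subwindow: Subwindow) -> bool:
    pixels = [p for row in subwindow for p in row]
    return sum(pixels) / len(pixels) > 0.2


def stage_has_contrast(subwindow: Subwindow) -> bool:
    pixels = [p for row in subwindow for p in row]
    return max(pixels) - min(pixels) > 0.3


def detect(image: Subwindow, window: int, step: int,
           stages: List[Stage]) -> List[Tuple[int, int, int]]:
    """Slide a square window over the image and collect accepted subwindows.

    Returns (x, y, size) for each subwindow that passes every stage.
    """
    height, width = len(image), len(image[0])
    hits = []
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            subwindow = [row[x:x + window] for row in image[y:y + window]]
            if cascade_accepts(subwindow, stages):
                hits.append((x, y, window))
    return hits
```

In this sketch the cheapest test is placed first, mirroring the ordering by increasing complexity described above, so that most subwindows are discarded after very little work.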
The classifier operations performed according to the Viola-Jones object detection framework can be performed quickly at a digital processor and, even in portable devices, allow a basic level of real-time face detection to be performed. However, because the potential search space for an image is very large, it is difficult to reliably detect all of the faces in an image in real time using the Viola-Jones framework. The search space of an image may include subwindows having every possible combination of location, size and rotation in the image. In order to permit real-time detection of faces, the search space is typically narrowed significantly by ignoring the possible rotations of subwindows in the image, or by only looking at a narrow range of rotated subwindows. This means that only those faces that are at least substantially upright in the image are likely to be found.
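The growth of the search space can be made concrete with a rough count of candidate subwindows. The following is a minimal sketch under stated assumptions: the function name count_subwindows and its parameters (minimum window size, scale factor, pixel step and number of rotation angles) are hypothetical choices for illustration and are not values prescribed by the Viola-Jones framework.

```python
def count_subwindows(width: int, height: int, min_size: int = 24,
                     scale_factor: float = 1.25, step: int = 4,
                     rotations: int = 1) -> int:
    """Count square subwindows over all locations, sizes and rotations.

    For each window size (growing geometrically from min_size), the
    window is placed on a grid with the given pixel step. Every
    (location, size) pair is then counted once per rotation angle.
    """
    total = 0
    size = float(min_size)
    while int(size) <= min(width, height):
        s = int(size)
        positions_x = (width - s) // step + 1
        positions_y = (height - s) // step + 1
        total += positions_x * positions_y
        size *= scale_factor
    return total * rotations


if __name__ == "__main__":
    upright_only = count_subwindows(640, 480, rotations=1)
    with_rotations = count_subwindows(640, 480, rotations=12)
    print(upright_only, with_rotations)
```

Because the count scales linearly with the number of rotation angles considered, searching twelve angles costs twelve times as many cascade evaluations as searching upright subwindows alone, which is why rotations are commonly the first dimension to be dropped.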
There is therefore a need for improved apparatus for performing face detection that allows accurate and reliable real-time face detection at a portable device and permits the identification of faces that are not presented upright in the image.