Computers process graphic images in a digital format (e.g., photographs, still images from videos, and so on). Often, the goal of this processing is to locate objects of interest (e.g., faces) in an image. Given enough processing time (typically on a digital processor), a computer is capable of detecting most or all well-defined instances of an object in an image. One common goal for object detection is the detection of human faces, although computers can use object detection to detect various types of objects in an image. Detecting objects (e.g., faces) is useful for user interfaces, the scanning of image databases, teleconferencing, electronic processing of photographs, and other suitable areas. The appearance of objects, however, varies greatly across individuals, images, camera locations, and illuminations, which complicates detection.
There are a number of existing methods for detecting objects (e.g., faces) in images. Most prior art approaches for detecting objects (e.g., faces) in an image share a number of properties. First, the conventional object detector uses a learning algorithm based on a training data set that contains many examples of face and non-face image patches (a patch, or subwindow, is the smallest possible region that might contain a face, usually 16×16 or 20×20 pixels). One such learning algorithm is based on conventional neural network approaches. In a learning phase based on the training data set, the learning algorithm constructs a classification function that can label patches as either face or non-face.
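The learning phase described above can be illustrated with a minimal sketch. It assumes a single-layer perceptron as a simple stand-in for the conventional neural network approaches; the function name `train_perceptron`, its parameters, and the toy 2×2 patches are hypothetical and are not taken from any particular prior art detector.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn a classification function from labeled patches.

    samples: list of flattened patches (lists of pixel intensities).
    labels:  1 for "face", 0 for "non-face".
    Assumption: a perceptron stands in for a full neural network.
    """
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score > 0 else 0
            err = y - pred
            if err:
                # Move the decision boundary toward the misclassified example.
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err

    def classify(x):
        """Return True if the patch is labeled as a face."""
        return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

    return classify


# Toy training set (hypothetical): 2x2 patches flattened row by row.
face_like = [1, 1, 0, 0]      # bright top row, dark bottom row
non_face_like = [0, 0, 1, 1]  # dark top row, bright bottom row
classify = train_perceptron([face_like, non_face_like], [1, 0])
```

The returned `classify` function plays the role of the classification function: it takes a patch and labels it as face or non-face. A practical detector would train on many thousands of 16×16 or 20×20 patches rather than two toy examples.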
Second, in a conventional approach, an object detector uses a scanning process to enumerate all possible patches (subwindows) within a given image. Each image contains many independent patches, because every unique location and scale in the image can yield an independent patch. In practice, a 320 pixel×240 pixel image can produce approximately 50,000 patches (the number of patches scales quadratically with the scale of the image). The classification function is run against all such patches to detect the possible presence of an instance of the object in each patch. When an object detector, through one or more classification functions, detects an object (e.g., a face), the object detector records the location and scale of the patch for later output (e.g., reporting to an end-user of the computer).
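The scanning process can be sketched as follows. This is an illustrative sketch only: the base patch size, the scale factor of 1.25, and the rule that the step grows with the patch size are assumptions chosen for the example, and the names `enumerate_patches` and `scan` are hypothetical. The number of patches produced depends directly on these assumed parameters.

```python
def enumerate_patches(width, height, base=20, scale_factor=1.25, step=1.0):
    """Yield (x, y, size) for square patches at every location and scale.

    base:         side of the smallest patch in pixels (assumed value).
    scale_factor: ratio between successive patch sizes (assumed value).
    step:         shift in pixels at the base scale; it is scaled up
                  with the patch size (assumed rule).
    """
    size = float(base)
    while int(size) <= min(width, height):
        s = int(size)
        stride = max(1, int(round(step * size / base)))
        for y in range(0, height - s + 1, stride):
            for x in range(0, width - s + 1, stride):
                yield x, y, s
        size *= scale_factor


def scan(width, height, classify, **kwargs):
    """Run the classification function on every patch and record the
    location and scale of each patch in which an object is detected."""
    return [
        (x, y, s)
        for x, y, s in enumerate_patches(width, height, **kwargs)
        if classify(x, y, s)
    ]


# Usage with a placeholder classifier that accepts only the top-left patch.
hits = scan(4, 3, lambda x, y, s: x == 0 and y == 0, base=2, scale_factor=2.0)
```

Here `classify` receives only the patch coordinates to keep the sketch short; an actual detector would pass the pixel data of the patch to the classification function.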
To detect an object in a patch, many conventional prior art approaches work directly with intensity values (grayscale degree of lightness or darkness) of the pixels of the patches. In one prior art approach, the object detection software uses wavelet functions, such as Haar basis functions that evaluate boxes in a patch, to detect an object in a patch.
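As a rough illustration of evaluating boxes in a patch, the sketch below computes a two-box, Haar-like response directly from pixel intensity values: the sum of the left half of a region minus the sum of the right half. The specific feature layout and the names `box_sum` and `two_box_feature` are assumptions made for this example; the prior art approach referred to above may use different basis functions.

```python
def box_sum(img, x, y, w, h):
    """Sum the intensity values inside a w-by-h box whose top-left
    corner is at column x, row y. img is a list of rows."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_box_feature(img, x, y, w, h):
    """Haar-like response: left-half sum minus right-half sum.

    A large magnitude indicates a vertical light/dark edge in the region;
    a value near zero indicates roughly uniform intensity.
    """
    half = w // 2
    left = box_sum(img, x, y, half, h)
    right = box_sum(img, x + half, y, half, h)
    return left - right


# A 2x4 grayscale patch: dark on the left, bright on the right.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
response = two_box_feature(patch, 0, 0, 4, 2)  # 40 - 800 = -760
```

Summing each box pixel by pixel, as done here, costs time proportional to the box area for every feature and every patch.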