As computer systems are becoming more powerful, they are being used for increasingly computationally intensive tasks involving large images. One such task is “object detection.” The goal of object detection is to determine the presence and location of objects of a given type (such as faces) within one or more digital images. Typically, object detection begins by training a classifier (an object detector) to recognize the presence of the object of interest within a 2D window of a suitable aspect ratio. Then the trained detector is applied to a comprehensive set of subwindows to identify those subwindows that produce a detector response above a given acceptance threshold.
For example, the detector may be trained to determine whether a given 20×20 window of grayscale pixels represents a low-resolution frontal view of a human face. To determine whether a digital image contains a face, the detector can be applied to every 20×20 scan window in the image, covering a comprehensive set of positions, scales and orientations. Specifically, we can first check all 20×20 windows centered at each pixel, then scale the image down to, say, 90% of its original size and again check every 20×20 window. Note that applying the detector to scaled-down versions of the image lets us look for larger examples of the object, and applying it to scaled-up versions lets us look for smaller ones. Furthermore, we can also resample each scaled image at various angles and scan the result to search for the presence and location of the object at various angular orientations.
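The scan described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of any particular detector: the names `downscale`, `scan_windows` and `detect` are hypothetical, the image is a plain list of pixel rows, resampling is nearest-neighbour, only windows lying fully inside the image are visited, and the search over orientations is omitted for brevity.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def downscale(image: Image, factor: float) -> Image:
    """Nearest-neighbour resample of `image` to `factor` times its size."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = max(1, int(src_h * factor)), max(1, int(src_w * factor))
    return [
        [image[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


def scan_windows(image: Image, window: int = 20,
                 scale_step: float = 0.9) -> Iterator[Tuple[float, int, int, Image]]:
    """Yield (scale, left, top, patch) for every window at every scale."""
    scale, current = 1.0, image
    while len(current) >= window and len(current[0]) >= window:
        # Row by row, left to right: the traditional sequential order.
        for top in range(len(current) - window + 1):
            for left in range(len(current[0]) - window + 1):
                patch = [row[left:left + window]
                         for row in current[top:top + window]]
                yield scale, left, top, patch
        scale *= scale_step                 # e.g. 90% of the previous size
        current = downscale(image, scale)   # always resample the original


def detect(image: Image, detector: Callable[[Image], float], threshold: float,
           window: int = 20, scale_step: float = 0.9) -> List[Tuple[int, int, int]]:
    """Return (x, y, size) in original-image pixels for each accepted window."""
    hits = []
    for scale, left, top, patch in scan_windows(image, window, scale_step):
        if detector(patch) > threshold:     # response above acceptance threshold
            hits.append((round(left / scale), round(top / scale),
                         round(window / scale)))
    return hits
```

Dividing the window coordinates by the scale maps a hit found in a scaled-down image back to a correspondingly larger region of the original image, which is how a fixed 20×20 detector can report larger instances of the object.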
Traditional methods for object detection analyze the scan windows sequentially, which means that they fully analyze each window before proceeding to the next window. This typically involves scanning subwindows in each row, left-to-right, and each column, top-to-bottom. Furthermore, it can involve performing object detection for each scale, small-to-large, and each orientation. Consequently, object detection is very computationally intensive because it involves analyzing subwindows at every discrete position, scale and orientation. For example, a 1024×768 image analyzed at six scales and six orientations has well over 10 million subwindows.
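The order of magnitude of this count can be checked with a short calculation. The sketch below rests on assumptions not stated in the text: a 20×20 window, each scale being 90% of the previous one, only windows lying fully inside the image, and the same number of windows at every orientation. The function name `count_subwindows` is hypothetical.

```python
def count_subwindows(width: int, height: int, window: int = 20,
                     scales: int = 6, orientations: int = 6,
                     scale_step: float = 0.9) -> int:
    """Count scan windows over all scales and orientations (assumed model)."""
    total = 0
    for level in range(scales):
        factor = scale_step ** level
        w, h = int(width * factor), int(height * factor)
        if w < window or h < window:
            break  # image has become smaller than the scan window
        # Number of window positions that fit fully inside a w x h image.
        total += (w - window + 1) * (h - window + 1)
    # Assume every orientation contributes the same number of windows.
    return total * orientations


print(count_subwindows(1024, 768))
```

Under these assumptions the count comes to roughly 17 million windows, which is consistent with the figure of well over 10 million given above.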
Furthermore, traditional object detection systems are not suitable for a variety of contexts in which object detection is needed. For example, certain applications of face detection do not require finding all the faces in the image. For a “show images containing faces” search feature, it is sufficient to find a single face in an image to determine whether to show the image. Moreover, some applications can only devote a fixed amount of time to each image, such as video surveillance systems that provide real-time image analysis. These applications try to do the best they can while keeping up with the frame rate. Other applications can take more time, but need the best intermediate results, such as a computer-assisted person tagging system, in which the user can start correcting the tag assignments before the system has analyzed all images in full. Hence, in some cases comprehensive detection may take more time than the system can allow, and in other cases it is better for the system to spend more time in the hope of finding more instances of the object.
Unfortunately, the speed/detection rate tradeoff is hard-coded in traditional systems and cannot be changed dynamically. While it is possible to interrupt a traditional detection process and ask for all instances of the object found so far, the number of instances found (or the detection probability) is linear in the time spent. In other words, all things being equal, interrupting a traditional system after half of the time needed to fully analyze an image will on average detect 50% of the object occurrences.
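The linear relationship can be illustrated with a toy model. The sketch below assumes that every window costs the same time to analyze, that windows are visited strictly in index order, and that the objects are spread evenly over the scan order. The function `sequential_recall` and the sample numbers are hypothetical and serve only as an illustration.

```python
from typing import Set


def sequential_recall(object_windows: Set[int], total_windows: int,
                      budget: int) -> float:
    """Fraction of objects found when a sequential scan stops after `budget` windows."""
    found = 0
    for index in range(min(budget, total_windows)):
        if index in object_windows:  # the scan has reached a window with an object
            found += 1
    return found / len(object_windows)


# Ten objects spread evenly over 1000 windows (hypothetical example data).
objects = set(range(0, 1000, 100))
print(sequential_recall(objects, 1000, 500))   # half of the time budget
print(sequential_recall(objects, 1000, 1000))  # full scan
```

In this model, stopping after half of the windows finds half of the objects, and no reordering of the time budget can improve on that as long as the scan order is fixed in advance.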
Hence, what is needed is a method and an apparatus for detecting an object within an image without the above-described problems.