1. Field of the Invention
The present invention relates to an apparatus, a method, and a program for detecting an object from an image.
2. Description of the Related Art
A technique for detecting a specific object pattern from an image is extremely useful and can be used for detection of a human face, for example. This technique can be used in many fields such as teleconferences, man-machine interfaces, security, monitoring systems for tracking a human face, and image compression. Various methods to realize this technique for detecting a face in an image are mentioned in “Detecting Faces in Images: A Survey” written by M. H. Yang, D. J. Kriegman, and N. Ahuja, published in Institute of Electrical and Electronics Engineers (IEEE) Transactions on Pattern Analysis and Machine Intelligence (Trans. on PAMI), Volume 24, Number 1, Pages 34 to 58, issued in January, 2002.
In particular, a method based on the Boosting algorithm proposed by Viola et al., which is discussed in “Robust Real-time Object Detection” written by P. Viola and M. Jones, published in Proceedings of IEEE Workshop Statistical and Computational Theories of Vision (SCTV), issued in July, 2001, is widely utilized in face detection research due to its excellent execution speed and high detection accuracy. This method can speed up the face detection by connecting substantially identical small classification units (weak classifiers) in series and, when it is determined in the middle of the detection processing that the target region is not a face region, stopping the detection processing so that the subsequent detection processing is omitted.
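The early-termination idea described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of Viola et al.: the names `Stage` and `run_cascade`, the per-stage thresholds, and the toy scoring functions are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of one weak classifier:
# (scoring function applied to the window, minimum score needed to continue).
Stage = Tuple[Callable[[Sequence[float]], float], float]


def run_cascade(window: Sequence[float], stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Evaluate the stages in series and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            # Not a target: the remaining stages are skipped entirely.
            return False, index
    return True, len(stages)


# Toy stages operating on a flat list of pixel values (illustrative only).
TOY_STAGES: List[Stage] = [
    (lambda w: sum(w) / len(w), 0.2),   # enough average brightness
    (lambda w: max(w) - min(w), 0.3),   # enough contrast
    (lambda w: w[0] - w[-1], 0.0),      # brighter at the start than at the end
]
```

With these toy stages, a uniformly dark window is rejected after a single stage, so most of the work is spent only on windows that survive the early stages.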
The size of a window image, which is a partial image referred to by the weak classifiers, is determined relative to a detection target having a certain presumed size. However, the size of a detection target in an actual image may vary continuously (for example, when a digital camera captures images of the object while the distance between the camera and the object changes). Therefore, it is desirable to design an actual classifier (make a classifier learn) so that it can accept a change in the size of a detection target to some degree. Even when a classifier is designed to accept such a change in the size, however, it is extremely difficult to realize a classifier that can detect a target when, for example, the detection target is so large that a detection window contains only a part of it. Similarly, it is also extremely difficult to realize a classifier that can detect a target when the detection target is extremely small relative to a detection window.
To solve this problem, there are two possible approaches. One approach is to increase the window size (and to enlarge, for example, the reference area of the weak classifiers accordingly). The other approach is to fix the window size, change (especially, reduce) the magnification ratio of the input image, and set equally-sized windows on the several images generated with the changed (reduced) magnification ratios so that the classifier detects the target. In either approach, it is difficult to enlarge the window or reduce the image continuously, so it remains desirable that the classifier can accept a change in the size to some degree. Similarly, it is also desirable that the classifier can accept changes in the position and the shape to some degree.
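The second approach (fixed window, reduced images) can be sketched as follows. This is a minimal sketch under assumptions: the function name `pyramid_levels` and the constant reduction factor are hypothetical and are chosen only to show that the covered target sizes are discrete.

```python
from typing import List, Tuple


def pyramid_levels(width: int, height: int, window: int,
                   factor: float) -> List[Tuple[int, int, float]]:
    """List the reduced images on which a fixed-size window still fits.

    Each entry is (reduced_width, reduced_height, apparent_target_size),
    where apparent_target_size is the size in the original image that the
    fixed window corresponds to at that reduction level.
    """
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must be between 0 and 1 (a reduction)")
    levels = []
    scale = 1.0
    while True:
        w, h = int(width * scale), int(height * scale)
        if w < window or h < window:
            break  # the window no longer fits in the reduced image
        levels.append((w, h, window / scale))
        scale *= factor
    return levels
```

For a 640x480 image, a 24-pixel window, and a factor of 0.5, the levels correspond to apparent target sizes of 24, 48, 96, 192, and 384 pixels. A target of an intermediate size is detected only if the classifier tolerates some change in the size.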
However, a classifier capable of accepting changes in the size and the position tends to provide a plurality of detection determinations around what is actually a single detection target. This is because the target is still detected when its position is shifted horizontally or vertically in the image, or when its size differs as described above (due to the change in the window size or the reduction of the image size). Therefore, integration processing, which integrates the plurality of detection results to output one detection result for one detection target, is necessary. The integration processing is a kind of clustering processing for clustering detection results distributed in a space in which the axes represent the horizontal and vertical positions and the size. Alternatively, the integration processing may be performed by using a space additionally having, for example, an axis representing reliability (likelihood) of a detection result. Since the integration processing is necessary, each of the above-described detection results by the classifiers is considered to be an intermediate result, and it is desirable to configure a system which can store these intermediate results once and then perform the integration processing on them.
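One possible form of such integration processing is sketched below. This is an illustration under assumptions only: the `Detection` record, the greedy grouping rule, the tolerance `tol`, and the likelihood-weighted averaging are hypothetical choices, not a method stated in this description. Likelihoods are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    x: float           # horizontal position
    y: float           # vertical position
    size: float        # window size in the original image
    likelihood: float  # reliability of the detection (assumed > 0)


def is_close(a: Detection, b: Detection, tol: float) -> bool:
    """True when two detections are near each other on all three axes."""
    limit = tol * min(a.size, b.size)
    return (abs(a.x - b.x) <= limit and abs(a.y - b.y) <= limit
            and abs(a.size - b.size) <= limit)


def integrate(detections: List[Detection], tol: float = 0.3) -> List[Detection]:
    """Greedily cluster stored intermediate results and merge each cluster."""
    clusters: List[List[Detection]] = []
    for det in detections:
        for cluster in clusters:
            if is_close(cluster[0], det, tol):  # compare with the cluster seed
                cluster.append(det)
                break
        else:
            clusters.append([det])
    merged = []
    for cluster in clusters:
        total = sum(d.likelihood for d in cluster)
        merged.append(Detection(
            x=sum(d.x * d.likelihood for d in cluster) / total,
            y=sum(d.y * d.likelihood for d in cluster) / total,
            size=sum(d.size * d.likelihood for d in cluster) / total,
            likelihood=max(d.likelihood for d in cluster),
        ))
    return merged
```

Note that the sketch requires all intermediate results to be available before the merge, which is why they have to be stored first.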
As described above, the detection results of the classifiers, which are intermediate results, need to be stored once, and an issue here is to estimate how large a storage area should be prepared for them. However, the number of detection results actually output from an image subjected to detection cannot be determined before the detection processing is actually executed. A theoretical upper limit is the number (N) of windows settable in that image, but in many cases, the number of detection results obtained when detection is actually applied to an image is approximately 1 to 2% of N. Therefore, it is inefficient to prepare a large storage area capable of storing detection results corresponding to the theoretical upper limit.
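The gap between the theoretical upper limit and the typical case can be made concrete with a small calculation. The sketch below is illustrative only: the function name `window_count`, the one-pixel step, the 640x480 image, and the 24-pixel window are assumptions, and only a single image size (no reduced images) is counted.

```python
def window_count(width: int, height: int, window: int, step: int = 1) -> int:
    """Number of window positions settable in a width x height image."""
    if window > width or window > height:
        return 0
    columns = (width - window) // step + 1
    rows = (height - window) // step + 1
    return columns * rows


# Assumed example: 640x480 image, 24x24 window, one-pixel step.
N = window_count(640, 480, 24)
typical_low = N // 100       # about 1% of N
typical_high = N * 2 // 100  # about 2% of N
```

Under these assumptions N is 281,969, whereas 1 to 2% of N is only about 2,800 to 5,600 results, so a storage area sized for N would remain almost entirely unused in most cases.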
The size of the storage area may not be an important matter when the detection processing is realized by, for example, software running on a personal computer (PC), since a relatively high-capacity memory can be prepared in this case. However, in a case where the detection processing is realized by software or hardware embedded in a device, it is desirable to realize the detection processing with a predetermined (fixed) small storage capacity. Performing the detection processing with the predetermined small storage capacity, however, may exceptionally result in a shortage of the storage area depending on the content of an image. In this case, among all detection results, those overflowing the storage capacity have to be discarded, which raises the problem of determining which detection results should be discarded and how.
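The overflow situation can be illustrated with a deliberately naive fixed-capacity store. This sketch only demonstrates the problem: the class name `FixedResultBuffer` and its policy of discarding every result that arrives after the area is full are hypothetical and are not a policy proposed in this description.

```python
from typing import List, Tuple

# Hypothetical intermediate result: (x, y, size, likelihood).
Result = Tuple[int, int, int, float]


class FixedResultBuffer:
    """Store with a predetermined capacity; late arrivals are discarded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[Result] = []
        self.dropped = 0  # number of results lost to overflow

    def push(self, result: Result) -> bool:
        """Store the result if space remains; return False when discarded."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(result)
        return True
```

With a capacity of two, a third result is lost even if its likelihood is higher than that of the two results already stored, which shows why the choice of what to discard matters.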