1. Field of the Invention
The present invention relates to a method and an apparatus for detecting different objects in a digital image. The present invention also relates to a program therefor.
2. Description of the Related Art
Various methods have been proposed for detecting a predetermined object, such as a face, in a digital image, such as a general photograph, by using a computer or the like. One known detection method is template matching, which has been in use since comparatively early days. Another known method uses learning by so-called boosting, which has recently attracted attention (see U.S. Patent Application Publication No. 20020102024).
In a method using learning by boosting, a detector that can judge whether an image represents a predetermined object is prepared by causing the detector to learn characteristics of the predetermined object based on a plurality of sample images representing the predetermined object and a plurality of sample images that do not represent the predetermined object. Partial images are sequentially cut from a detection target image in which the predetermined object is to be detected, and the detector judges whether each of the partial images is an image representing the predetermined object. In this manner, the predetermined object is detected in the detection target image.
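The scanning step described above can be illustrated with a minimal sketch. It assumes a square window of fixed size, a fixed scanning step, and an image given as rows of pixel values; the names iter_partial_images and detect, and the detector passed in, are hypothetical and are not taken from the cited publication.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed layout)


def iter_partial_images(image: Image, size: int, step: int = 1) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, window) for each size x size partial image, row by row."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            window = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, window


def detect(image: Image, detector: Callable[[Image], bool], size: int, step: int = 1) -> List[Tuple[int, int]]:
    """Return the (top, left) positions of partial images the detector accepts."""
    return [
        (top, left)
        for top, left, window in iter_partial_images(image, size, step)
        if detector(window)
    ]


def toy_detector(window: Image) -> bool:
    """Hypothetical stand-in for a learned detector: accepts almost fully bright windows."""
    pixels = [p for row in window for p in row]
    return sum(pixels) / len(pixels) > 0.9


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
hits = detect(image, toy_detector, size=2)
```

In practice the detector argument would be the learned detector, and the scan would typically be repeated over several window sizes or image scales; that repetition is omitted here.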
The detector comprises a plurality of weak classifiers, each of which judges whether the image represents the predetermined object based on characteristic quantities of the image. These weak classifiers are selected, through the learning, from a plurality of candidate weak classifiers. Each of the weak classifiers has a specific algorithm for calculating the characteristic quantities. The judgment criterion of each weak classifier is based on a first histogram W1(x), which represents the relationship between values of the calculated characteristic quantities and their frequency values and is generated from sample images representing the detection target object, and on a second histogram W2(x), which represents the same relationship and is generated from sample images representing objects other than the detection target object, as shown in FIG. 9. The judgment criterion itself is a histogram represented by h(x)=(W1(x)−W2(x))/(W1(x)+W2(x)), as shown in FIG. 10. More specifically, when the characteristic quantities are calculated for an unknown input image, the probability of the input image being the detection target object can be estimated from the sign and the magnitude of the value of the histogram h(x) corresponding to the characteristic quantities. In the case where the value is positive, the probability of the input image being the detection target object becomes higher as the absolute value becomes larger. Conversely, in the case where the value is negative, the probability of the input image being an image representing an object other than the detection target object becomes higher as the absolute value becomes larger. Each of the weak classifiers calculates a score representing the probability of the input image being the detection target object, based on the histogram. By evaluating the scores calculated by the weak classifiers, whether the input image is an image of the detection target object can be judged.
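The histogram-based judgment criterion can be sketched as follows. The sketch assumes a single scalar characteristic quantity, a fixed number of equal-width bins, histograms normalized to relative frequencies, and a score of zero for bins that no sample falls into; it also assumes that the scores of the weak classifiers are evaluated by simple summation against a threshold. None of these choices is stated in the text above, and all function names are hypothetical.

```python
from typing import List, Sequence


def bin_index(value: float, n_bins: int, lo: float, hi: float) -> int:
    """Map a characteristic quantity to a bin, clamping values outside [lo, hi)."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def build_histogram(values: Sequence[float], n_bins: int, lo: float, hi: float) -> List[float]:
    """Relative-frequency histogram of characteristic quantities (normalization assumed)."""
    counts = [0.0] * n_bins
    for v in values:
        counts[bin_index(v, n_bins, lo, hi)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def judgment_histogram(w1: Sequence[float], w2: Sequence[float]) -> List[float]:
    """h(x) = (W1(x) - W2(x)) / (W1(x) + W2(x)); empty bins score 0 (assumption)."""
    return [(a - b) / (a + b) if a + b > 0 else 0.0 for a, b in zip(w1, w2)]


def weak_score(h: Sequence[float], value: float, lo: float, hi: float) -> float:
    """Score of one weak classifier for a characteristic quantity of an input image."""
    return h[bin_index(value, len(h), lo, hi)]


def is_target(scores: Sequence[float], threshold: float = 0.0) -> bool:
    """Evaluate the weak-classifier scores by summation (assumed evaluation rule)."""
    return sum(scores) > threshold


# Illustrative characteristic quantities, not real data.
target_values = [0.8, 0.9, 0.85, 0.6]      # from sample images of the target object
non_target_values = [0.1, 0.2, 0.15, 0.6]  # from sample images of other objects

W1 = build_histogram(target_values, n_bins=4, lo=0.0, hi=1.0)
W2 = build_histogram(non_target_values, n_bins=4, lo=0.0, hi=1.0)
h = judgment_histogram(W1, W2)
```

A bin that only target samples fall into receives a score of +1, a bin that only non-target samples fall into receives −1, and a bin populated equally by both receives 0, which matches the sign-and-magnitude reading of h(x) given above.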
A method of this type is effective for solving a 2-class problem, such as detection of a face by judging whether an image represents a face or a non-face object. In particular, the method using learning by boosting can achieve fast and high-performance detection and, together with techniques similar thereto, is widely used in various fields.
However, for detecting a plurality of objects in an image by using the above-described method of learning by boosting, images need to be classified into three or more types, and as many detectors as there are types are necessary. For example, in the case where a face in an arbitrary direction is to be detected in an image, the directions of the face need to be categorized and face detection needs to be carried out for each of the face directions. In the case where an occluded face and a face photographed in underexposure are also to be detected, face detection needs to be carried out separately for each of those cases as well. Consequently, the number of detectors is expected to increase, which makes learning and detection more time-consuming and therefore inefficient. Furthermore, a problem may arise from differences in judgment criteria among the detectors.