A template matching search method has long been used. In the template matching search method, when an object of a particular type shown (recorded) in an image is to be detected and the position of the object is to be determined, a template image showing only the object is prepared. Then, a search window is set in the image in question, which is considered to include a partial image representing the object, and a matching calculation with the template image is repeated. In this method, matching that simply compares pixel values of the template image with pixel values in the image in question yields a low level of accuracy. Therefore, in order to enhance the matching accuracy, a method has been developed in which gradient information with respect to surrounding pixels, second-order differential information, and the like are calculated in the template image and the image in question and are converted into a numerical value string, called a feature quantity, that can be easily used for matching. Further, a method has been developed that performs the matching calculation by using a classification dictionary learned with a statistical pattern classification technique. This classification dictionary is a dictionary for performing two-class classification between a positive class, which is the object class, and a negative class, which is the non-object class. More specifically, it is a memory storing the parameter group required for classification.
Statistical pattern classification techniques often used for object detection include a learning method called Boosting. In Boosting, relatively simple classifiers, called weak classifiers, are combined to generate a classifier having a high level of classification accuracy, called a strong classifier. Non-patent literature 1 discloses a method in which Boosting is used effectively for object detection. In this method, the weak classifier employs Decision-Stump processing, which simply performs threshold value processing on a single Haar wavelet-like feature quantity that can be extracted at high speed by using a Rectangle Filter. A Boosting method called AdaBoost is employed to generate the strong classifier.
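The combination of Decision-Stump weak classifiers into a strong classifier can be illustrated with a small sketch of discrete AdaBoost. It is written under stated assumptions and is not the implementation of non-patent literature 1: the feature quantities are assumed to be already extracted (the Rectangle Filter step is not shown), labels are +1 for the positive class and -1 for the negative class, and all function names are hypothetical.

```python
import math


def stump_predict(stump, x):
    """Threshold one feature quantity; polarity selects which side is +1."""
    j, t, polarity = stump
    return polarity if x[j] > t else -polarity


def train_stump(features, labels, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(features[0])):
        values = sorted(set(x[j] for x in features))
        # Candidate thresholds: one below the minimum, then midpoints.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(w for x, y, w in zip(features, labels, weights)
                          if stump_predict(stump, x) != y)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best


def adaboost(features, labels, rounds):
    """Return a strong classifier as a list of (alpha, stump) pairs."""
    n = len(features)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        err, stump = train_stump(features, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        strong.append((alpha, stump))
        # Increase the weight of misclassified samples, then normalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(features, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return strong


def strong_predict(strong, x):
    """Sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in strong)
    return 1 if score >= 0 else -1


# Toy data: one feature quantity per sample.
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=3)
```

Each round reweights the learning data so that the next weak classifier concentrates on samples the previous ones classified incorrectly.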
Non-patent literature 2 discloses a Boosting method called Real AdaBoost, which is an improvement on AdaBoost. In this method, instead of simple Decision-Stump processing, weak classification is performed on the basis of the occurrence probability of a numerical value representing a feature quantity in the positive class and the occurrence probability of that numerical value in the negative class. Therefore, as long as a large amount of learning data is available, highly accurate object detection can be realized.
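A weak classifier based on occurrence probabilities can be sketched as a lookup table over binned feature values. The following is one common formulation, given here as an assumption rather than as the exact method of non-patent literature 2: the weighted occurrence of each bin is accumulated separately for the positive and negative classes, and the output of a bin is half the logarithm of their ratio. The bin range, the smoothing constant `eps`, and the function names are hypothetical.

```python
import math


def bin_index(value, lo, hi, n_bins):
    """Map a feature value in [lo, hi] to one of n_bins equal-width bins."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return int((value - lo) / (hi - lo) * n_bins)


def train_real_weak(values, labels, weights, lo, hi, n_bins, eps=1e-6):
    """Build a per-bin output table from weighted class histograms.

    labels are +1 (positive class) or -1 (negative class). eps is an
    assumed smoothing term that keeps the logarithm finite for empty bins.
    """
    pos = [0.0] * n_bins
    neg = [0.0] * n_bins
    for v, y, w in zip(values, labels, weights):
        b = bin_index(v, lo, hi, n_bins)
        if y == 1:
            pos[b] += w
        else:
            neg[b] += w
    return [0.5 * math.log((p + eps) / (q + eps)) for p, q in zip(pos, neg)]


def real_weak_output(table, value, lo, hi):
    """Real-valued weak classifier output for one feature value."""
    return table[bin_index(value, lo, hi, len(table))]


# Toy data: low feature values are positive, high values are negative.
values = [0.1, 0.2, 0.8, 0.9]
labels = [1, 1, -1, -1]
weights = [0.25] * 4
table = train_real_weak(values, labels, weights, lo=0.0, hi=1.0, n_bins=2)
```

The reliance on per-bin histograms suggests why a large amount of learning data matters: sparsely populated bins give unreliable probability estimates.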
Non-patent literature 3 discloses a Boosting method called Gentle AdaBoost. In this method, the definition of the index (loss) optimized in the Boosting is made positive, and therefore, the learning is stabilized. Accordingly, a level of accuracy equal to or higher than that of Real AdaBoost can be obtained. When a method obtained by combining non-patent literature 1, non-patent literature 2, and non-patent literature 3 is used, a particular object such as a face or a vehicle can be detected from an image with a high degree of accuracy. However, only an object whose appearance does not change greatly can be detected. For example, only a front-facing face and the like can be detected.
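For comparison, a commonly described form of the Gentle AdaBoost weak classifier can be sketched as below. This is an illustrative assumption and is not confirmed to be the formulation of non-patent literature 3: each bin outputs the difference between the weighted positive and negative occurrence divided by their sum, so every output lies in the range [-1, 1], whereas a log-ratio output can grow very large for nearly pure bins. The bounded output is one commonly cited reason for the more stable learning. All names are hypothetical.

```python
def bin_index(value, lo, hi, n_bins):
    """Map a feature value in [lo, hi] to one of n_bins equal-width bins."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return int((value - lo) / (hi - lo) * n_bins)


def train_gentle_weak(values, labels, weights, lo, hi, n_bins):
    """Per-bin output (W+ - W-) / (W+ + W-), which is bounded by [-1, 1].

    labels are +1 (positive class) or -1 (negative class). Empty bins
    output 0.0, an assumed neutral value.
    """
    pos = [0.0] * n_bins
    neg = [0.0] * n_bins
    for v, y, w in zip(values, labels, weights):
        b = bin_index(v, lo, hi, n_bins)
        if y == 1:
            pos[b] += w
        else:
            neg[b] += w
    table = []
    for p, q in zip(pos, neg):
        total = p + q
        table.append((p - q) / total if total > 0 else 0.0)
    return table


# Toy data: bin 0 holds two positives and one negative, bin 1 one negative.
values = [0.1, 0.2, 0.3, 0.8]
labels = [1, 1, -1, -1]
weights = [0.25] * 4
table = train_gentle_weak(values, labels, weights, lo=0.0, hi=1.0, n_bins=2)
```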
In order to detect, from the image in question, an object whose appearance changes greatly, the following method is often used. In this method, a classification dictionary is prepared for each type of change in appearance, and detection processing is performed once for each dictionary. However, because the detection processing must be performed as many times as there are classification dictionaries, there is a problem in that the processing time increases. To solve this problem, non-patent literature 4 discloses a method that classifies the changes in appearance, treats them as separate categories (classes), and performs learning so that feature quantities and weak classifiers can be shared among the multiple classes. Because the weak classifiers are shared, the processing time does not simply increase in proportion to the number of classes. Therefore, according to this method, relatively fast processing can be realized. The classes for object detection as described above have a hierarchical multi-class configuration in which the positive class consists of multiple sub-classes and the negative class is a single class. In the hierarchical multi-class configuration, the positive classes must be distinguished from each other under the condition that the classification between the positive classes as a whole and the negative class is treated with the highest degree of importance. In non-patent literature 4, learning is performed so that each positive class can be classified against the negative class, the score of each positive class is output, and the hierarchical classes are classified as a result.
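The benefit of sharing weak classifiers among classes can be illustrated with the following sketch. It is a simplified assumption and not the learning method of non-patent literature 4: each shared weak classifier thresholds one feature quantity once and then adds a per-class output to the score of every positive class, and a window is assigned to the negative class when no positive class score reaches a hypothetical `reject_threshold`. The class names and parameter values are made up for illustration.

```python
def score_classes(weak_classifiers, features, class_names):
    """Accumulate one score per positive class.

    Each weak classifier is (feature_index, threshold, outputs), where
    outputs[name] = (value_if_above, value_if_below). The threshold test
    is evaluated once and reused for all classes.
    """
    scores = {name: 0.0 for name in class_names}
    for feature_index, threshold, outputs in weak_classifiers:
        above = features[feature_index] > threshold  # shared computation
        for name in class_names:
            out_above, out_below = outputs[name]
            scores[name] += out_above if above else out_below
    return scores


def classify(weak_classifiers, features, class_names, reject_threshold=0.0):
    """Return (label, scores); label is 'negative' if every score is low."""
    scores = score_classes(weak_classifiers, features, class_names)
    best = max(class_names, key=lambda name: scores[name])
    if scores[best] < reject_threshold:
        return "negative", scores
    return best, scores


# Hypothetical positive sub-classes and hand-set parameters.
class_names = ["frontal", "profile"]
weak_classifiers = [
    (0, 0.5, {"frontal": (1.0, -1.0), "profile": (0.5, -1.0)}),
    (1, 0.5, {"frontal": (-0.5, 0.5), "profile": (1.0, -1.0)}),
]
label, scores = classify(weak_classifiers, [0.9, 0.1], class_names)
```

In this sketch the number of feature evaluations equals the number of weak classifiers regardless of the number of classes; only the inexpensive per-class additions grow with the class count.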