1. Field of the Invention
The present invention relates to a recognition apparatus and a recognition method and, in particular, to a technique suitably used for detecting an object.
2. Description of the Related Art
A technique for automatically detecting a specific object pattern from an image has been applied to various fields such as image search, object detection, object recognition, and object tracking. In these fields, P. Viola and M. Jones, “Robust Real-time Object Detection,” SECOND INTERNATIONAL WORKSHOP ON STATISTICAL AND COMPUTATIONAL THEORIES OF VISION, Jul. 13, 2001 discusses a machine learning method referred to as AdaBoost. AdaBoost is a machine learning method based on the concept that a discriminator strong in discrimination performance (hereinafter referred to as a strong discriminator) is produced by combining a plurality of discriminators weak in discrimination performance (hereinafter referred to as weak discriminators).
AdaBoost cannot reflect the easiness of determination of an input pattern in its output value because each weak discriminator outputs a binary value of 0 or 1. On the other hand, Schapire, R. E. and Singer, Y.: Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning, pp. 297-336 (1999) discusses Real AdaBoost, which improves determination performance by having each weak discriminator output continuous values. The output of the strong discriminator in Real AdaBoost is represented by the following equation (1).
\[
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} h_t(x) \right) \tag{1}
\]
where H(x) is the output of the strong discriminator with respect to an input image x, sign is a function whose value is determined according to the sign of its argument, and h_t(x) is the output of the t-th weak discriminator. h_t(x) is expressed by the following equation (2) and is calculated based on the ratio between a probability density distribution W_+^j of correct answer images and a probability density distribution W_-^j of non-correct answer images.
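As a minimal illustration of equation (1), the strong discriminator simply sums the real-valued weak-discriminator outputs for one input and takes the sign of the total. The sketch below assumes the weak outputs h_t(x) have already been computed; the function name is hypothetical, not from the cited references.

```python
def strong_discriminator(weak_outputs):
    """Combine weak-discriminator outputs h_t(x) into H(x) per equation (1).

    weak_outputs: list of real-valued h_t(x), one per weak discriminator,
                  all evaluated on the same input image x.
    Returns +1 (input judged as the discrimination target) or -1 (not).
    """
    total = sum(weak_outputs)          # the summation over t = 1..T
    return 1 if total >= 0 else -1     # sign() of the summed score

# Example: three weak outputs whose sum (1.1) is positive -> H(x) = +1.
print(strong_discriminator([0.8, -0.3, 0.6]))
```

Because each h_t(x) is continuous, a single highly confident weak discriminator can outweigh several weakly opposing ones, which binary-output AdaBoost cannot express.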
\[
h_t(x) = \frac{1}{2} \ln \frac{W_+^j + \varepsilon}{W_-^j + \varepsilon} \tag{2}
\]
where ε is a very small coefficient that prevents the denominator from becoming zero. The correct answer image is a learning image of the discrimination target, and the non-correct answer image is a learning image other than the discrimination target. The probability density distributions W_+^j and W_-^j are represented by the following equations (3) and (4).
\[
W_+^j = \sum_{i \,:\, j \in J \,\wedge\, y_i = +1}^{n} D_t(i) \tag{3}
\]
\[
W_-^j = \sum_{i \,:\, j \in J \,\wedge\, y_i = -1}^{n} D_t(i) \tag{4}
\]
where n is the total number of learning images and j is the feature quantity output when the weak discriminator is applied to a learning image. J is the set of outputs j of the weak discriminator applied to the learning images, and i denotes an image ID. More specifically, the probability density distributions W_+^j and W_-^j are distributions in which the numbers of correct answer images and non-correct answer images outputting the feature quantity j are accumulated with the weight D_t(i) applied to each image. The weight D_t(i) has a relationship represented by the following equation (5).
\[
D_{t+1}(i) = D_t(i)\,\exp\!\left[-y_i h_t(x_i)\right] \tag{5}
\]
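Equations (2) through (4) can be sketched as a pair of weighted histograms over the quantized feature values, from which the weak-discriminator output for each bin follows directly. This is a toy sketch under the stated definitions; the function and variable names (and the ε value) are illustrative assumptions, not from the cited references.

```python
import math
from collections import defaultdict

EPS = 1e-10  # the small coefficient epsilon in equation (2)

def weak_output_table(feature_bins, labels, weights, eps=EPS):
    """Build h_t(j) for each feature-quantity bin j per equations (2)-(4).

    feature_bins[i]: quantized feature value j for learning image i
    labels[i]:       +1 for correct answer images, -1 for non-correct ones
    weights[i]:      current image weight D_t(i)
    """
    w_pos = defaultdict(float)  # W_+^j, equation (3)
    w_neg = defaultdict(float)  # W_-^j, equation (4)
    for j, y, d in zip(feature_bins, labels, weights):
        if y == +1:
            w_pos[j] += d
        else:
            w_neg[j] += d
    bins = set(w_pos) | set(w_neg)
    # Equation (2): half log-ratio of the smoothed weighted counts.
    return {j: 0.5 * math.log((w_pos[j] + eps) / (w_neg[j] + eps))
            for j in bins}

# Toy data: bin 0 holds only correct answer images, bin 1 only non-correct
# ones, all with equal weight D_t(i) = 0.25.
table = weak_output_table([0, 0, 1, 1], [+1, +1, -1, -1], [0.25] * 4)
print(table[0] > 0, table[1] < 0)
```

Bins dominated by correct answer images yield large positive outputs and bins dominated by non-correct ones yield large negative outputs, which is exactly the confidence-rated behavior that equation (2) encodes.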
When the image i is a correct answer image (y_i = +1), the greater h_t(x) is in the positive direction, the smaller the value of equation (5), and the greater h_t(x) is in the negative direction, the greater the value of equation (5). Conversely, when the image i is a non-correct answer image (y_i = −1), the greater h_t(x) is in the negative direction, the smaller the value of equation (5), and the greater h_t(x) is in the positive direction, the greater the value of equation (5). This means that, among the learning images, the weight becomes large for a difficult image for which a correct output cannot be produced, and small for an easy image for which a correct output can be produced. In other words, discrimination performance is improved by focusing on learning images that are difficult to discriminate and by having the weak discriminator output continuous values based on the probability density distributions.
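The weight-update behavior described above can be demonstrated numerically with equation (5); the helper name below is an illustrative assumption.

```python
import math

def update_weights(weights, labels, h_values):
    """Equation (5): D_{t+1}(i) = D_t(i) * exp(-y_i * h_t(x_i))."""
    return [d * math.exp(-y * h) for d, y, h in zip(weights, labels, h_values)]

# Two correct answer images (y = +1), both starting at weight 0.5:
#   image 0 is classified confidently right (h = +2.0) -> weight shrinks
#   image 1 is classified confidently wrong (h = -2.0) -> weight grows
new_w = update_weights([0.5, 0.5], [+1, +1], [+2.0, -2.0])
print(new_w[0] < 0.5 < new_w[1])
```

The misclassified image's weight grows by a factor of e^2 while the correctly classified one shrinks by the same factor, so subsequent weak discriminators concentrate on the difficult image.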
However, in the conventional method, even if the number of learning images corresponding to a feature quantity is far smaller than the total number of learning images, the weight becomes large if those images are difficult to discriminate, so that the output of the weak discriminator takes an extreme high or low value. A learning image satisfying the above conditions differs significantly in feature from the other correct answer images and often has a feature that is difficult to distinguish from a non-correct answer image. For this reason, the conventional method outputs a highly confident value for a feature that is actually difficult to discriminate, i.e., a feature that most probably yields an incorrect result.