1. Field of the Invention
The present invention relates to an information processing apparatus which executes classification of information by using reference information, an information processing method, and a storage medium.
2. Description of the Related Art
Conventionally, there has been proposed a technique of classifying and identifying information by using an ensemble of classification trees. This technique generates L (L is a constant equal to or more than two) classification trees, and obtains higher identification performance by using all of them.
Mustafa Ozuysal, Pascal Fua, and Vincent Lepetit, “Fast keypoint recognition in ten lines of code,” cvpr, pp. 1-8, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007 (hereinafter referred to as “Mustafa”) discloses a technique capable of identifying a plurality of types of images by applying a technique using an ensemble of classification trees to computer vision. The technique disclosed by Mustafa randomly generates N pieces of two-point location information, each representing a pair of reference locations used to refer to two portions of an image. These two reference points will be referred to as a reference point pair hereinafter, and a set of N reference point pairs will be referred to as a reference point pair string hereinafter. For a reference image, this technique compares the magnitudes of the image luminance values at the two locations of each reference point pair in a reference point pair string and expresses the string of comparison results as a bit string of 0s and 1s, thereby generating an N-bit binary code from one image and one reference point pair string. The technique generates N-bit binary codes in this manner for a plurality of reference images serving as identification references, and records the relationship between the binary codes and the types of reference images. This operation is equivalent to the learning of one classification tree. The technique executes learning based on the above processing for L reference point pair strings, each having different reference point pairs. That is, the technique learns the L classification trees.
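The binary code generation described above can be illustrated with a minimal Python sketch. This is not code from Mustafa; the function names (make_pair_string, binary_code, learn_tree), the row-major list-of-lists image layout, and the choice to store a list of image types per code are all assumptions made for illustration.

```python
import random

def make_pair_string(n_pairs, width, height, rng):
    """Randomly draw N reference point pairs ((x1, y1), (x2, y2))."""
    return [((rng.randrange(width), rng.randrange(height)),
             (rng.randrange(width), rng.randrange(height)))
            for _ in range(n_pairs)]

def binary_code(image, pair_string):
    """Compare luminance at each pair and pack the results into an N-bit integer.

    image is assumed to be a list of rows, so image[y][x] is a luminance value.
    The first pair in the string becomes the most significant bit.
    """
    code = 0
    for (x1, y1), (x2, y2) in pair_string:
        bit = 1 if image[y1][x1] > image[y2][x2] else 0
        code = (code << 1) | bit
    return code

def learn_tree(reference_images, pair_string):
    """Record which image types produce which binary code (one classification tree).

    reference_images is assumed to map an image type label to an image.
    """
    table = {}
    for label, image in reference_images.items():
        table.setdefault(binary_code(image, pair_string), []).append(label)
    return table

# Hand-built example: a 2x2 image and N = 3 reference point pairs.
image = [[10, 200],
         [50, 90]]
pairs = [((0, 0), (1, 0)), ((1, 0), (0, 1)), ((1, 1), (0, 0))]
code = binary_code(image, pairs)   # comparison results 0, 1, 1 -> 0b011
```

Repeating learn_tree for L independently generated reference point pair strings would yield the L lookup tables that play the role of the L classification trees.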
When identifying an image, this technique uses all the L classification trees for the input image. That is, the technique calculates an N-bit binary code from the input image in accordance with the locations of the N reference point pairs determined at the time of learning of each classification tree. The technique executes this operation for the L reference point pair strings to obtain L binary codes. The technique then determines the type of reference image with the highest likelihood as the final identification result by using the obtained L binary codes and the L classification trees obtained by pre-learning.
According to Mustafa, this technique generates many variations of the reference images and learns in advance, for each binary code that may be obtained from an input image, the probability indicating to which type of reference image the input image corresponds. The technique then obtains the probability corresponding to each type of reference image for each of the L binary codes acquired from the input image, and sets, as the final identification result, the type of reference image exhibiting the maximum product of the probabilities obtained from the L classification trees.
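The selection of the type with the maximum product of probabilities can be sketched as follows. The table layout (one dictionary per classification tree, mapping a binary code to per-type probabilities), the function name classify, and the small probability floor for codes never seen during learning are assumptions of this sketch. Summing logarithms is used in place of a direct product only to avoid numerical underflow; it selects the same type.

```python
import math

def classify(codes, tables, labels, floor=1e-6):
    """Return the label whose product of per-tree probabilities is largest.

    codes  : list of L binary codes computed from the input image
    tables : list of L dicts, each mapping code -> {label: probability}
    floor  : assumed fallback probability for unseen (code, label) entries
    """
    best_label, best_score = None, -math.inf
    for label in labels:
        score = 0.0
        for code, table in zip(codes, tables):
            p = table.get(code, {}).get(label, floor)
            score += math.log(max(p, floor))  # log of a product = sum of logs
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Two trees (L = 2) and two image types with made-up probabilities.
tables = [
    {3: {"A": 0.9, "B": 0.1}},
    {5: {"A": 0.4, "B": 0.6}},
]
result = classify([3, 5], tables, ["A", "B"])  # A: 0.9 * 0.4, B: 0.1 * 0.6
```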
The method disclosed by Mustafa uses, as a weak classifier, a classification tree for classification based on a simple feature amount, that is, a comparison between pixel values at two reference points, and obtains the final identification result by an ensemble of many weak classifiers. This method performs image identification by converting an input image into a binary code by comparing pixel values at reference point pairs of the image and referring to a dictionary table based on the binary code in processing at the time of identification. This eliminates the necessity to entirely scan a tree structure at the time of identification as in the case of a classical classification tree, and hence can execute identification processing at higher speed than processing based on a classical classification tree. In addition, the literature has reported that the identification accuracy of this method is sufficiently high.
It is assumed that an image to be identified is not identical to a reference image used for learning, and is an image obtained by adding noise and deformation to the reference image. An identifier is expected to be able to identify an input image as an image akin to one of such reference images even if the input image differs from the reference images. In consideration of this, when using the method disclosed by Mustafa, the larger the luminance differences on an image at the locations of reference point pairs, the better, for the following reason. Assume that a given classification tree is applied to a given image. In this case, as the luminance value difference at the reference point pair locations set for the classification tree decreases, the result of the magnitude comparison between luminance values becomes more likely to be reversed by noise. This increases the probability of an error in the image identification result obtained by using this classification tree.
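The effect of noise on the magnitude comparison can be illustrated with a small simulation. The additive Gaussian noise model, the base luminance of 128, the absence of clamping to a valid pixel range, and the function name flip_rate are all assumptions of this sketch, not part of the technique disclosed by Mustafa.

```python
import random

def flip_rate(luminance_diff, noise_sigma, trials=2000, seed=0):
    """Estimate how often noise reverses the comparison result of one pair.

    luminance_diff is assumed positive, so the noise-free comparison yields 1.
    Independent Gaussian noise is added to both points of the pair.
    """
    rng = random.Random(seed)
    a, b = 128.0 + luminance_diff, 128.0
    flips = 0
    for _ in range(trials):
        noisy_a = a + rng.gauss(0.0, noise_sigma)
        noisy_b = b + rng.gauss(0.0, noise_sigma)
        if not noisy_a > noisy_b:
            flips += 1  # the bit changed from 1 to 0
    return flips / trials

small = flip_rate(luminance_diff=2, noise_sigma=10)    # nearly equal luminances
large = flip_rate(luminance_diff=100, noise_sigma=10)  # well separated luminances
```

Under these assumptions, the pair with the small luminance difference has its bit reversed far more often than the pair with the large difference, which is the behavior the paragraph above describes.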
However, because the locations of two points where the luminance difference is large vary from one reference image to another, the proper reference point locations generally differ for each reference image. On the other hand, according to the prior art, the locations of reference point pairs are set for each classification tree, and the same reference point pair locations are used in common to classify all images. It is therefore impossible to perform learning while changing reference point pair locations for each reference image.
In order to avoid such an inconvenience, it is conceivable to select reference point pairs that exhibit luminance value differences that are as large as possible with respect to all reference images. However, it is quite possible that proper reference point pair locations differ among images. In particular, when there are many types of images to be identified, it is highly likely that no reference point pairs exhibit luminance value differences equal to or larger than a predetermined value with respect to all images.
It is therefore difficult to set proper reference point pair locations common to all images at the time of learning. As a consequence, even if the reference point pairs set to obtain a given classification tree are at locations suitable for the classification of several types of images, the locations are not suitable for the classification of several other types of images. In addition, according to the prior art, when a given type of image is input, using a classification tree based on reference point pairs unsuitable for the classification of that type of image will degrade the identification performance.
In consideration of the above problems, the present invention provides an information processing apparatus which selects a reference location pattern suitable for the classification of input information from a plurality of reference location patterns, an information processing method, and a program.