A technology is conventionally known that speeds up retrieval processing by relaxing its strictness when feature amount vectors, which represent features of data such as fingerprints, images, or voice, are used to retrieve similar data. As one example of such a technology, a method is known that reduces calculation cost by converting the feature amount vectors into binary strings while preserving the distance relationships between the feature amount vectors, and then calculating the Hamming distance between the binary strings.
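The saving comes from replacing a floating-point distance over many dimensions with an XOR followed by a bit count. The minimal sketch below assumes each binary string is packed into a Python integer; `pack_bits` and `hamming_distance` are hypothetical helper names used only for illustration, not part of any described implementation.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int (first bit is most significant)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hamming_distance(code_a, code_b):
    """Hamming distance of two packed binary codes: count the 1 bits in their XOR."""
    return bin(code_a ^ code_b).count("1")


# Usage: the two codes differ in the first and last positions.
a = pack_bits([1, 0, 1, 1])
b = pack_bits([0, 0, 1, 0])
print(hamming_distance(a, b))  # 2
```

On Python 3.10 and later, `int.bit_count()` can replace the `bin(...).count("1")` idiom.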
Further, LSH (Locality-Sensitive Hashing) is known as a method that converts feature amount vectors into binary strings while preserving the distance relationships between them. For example, an information processing device sets a plurality of hyperplanes that divide the feature amount vector space, and converts each feature amount vector into a binary string whose bits indicate, for each hyperplane, whether the inner product between the hyperplane's normal vector and the feature amount vector is positive or negative. That is, the information processing device uses the hyperplanes to divide the feature amount vector space into a plurality of areas and converts each feature amount vector into a binary string indicating which of the areas the feature amount vector belongs to.
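The sign-of-inner-product encoding can be sketched as follows. This sketch assumes the hyperplanes pass through the origin and draws their normal vectors from a standard Gaussian, as in random-hyperplane LSH; the function names are hypothetical. A random draw is only a starting point, since the learning methods described below adjust the hyperplanes rather than leaving them random.

```python
import random


def random_normals(num_bits, dim, seed=0):
    """Draw one normal vector per hyperplane; each hyperplane passes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def to_binary_string(vector, normals):
    """Bit k is 1 if the inner product with normal k is positive, else 0.

    A zero inner product (vector lying on the hyperplane) is mapped to 1;
    that tie-breaking rule is an arbitrary choice of this sketch.
    """
    bits = []
    for normal in normals:
        inner = sum(w * x for w, x in zip(normal, vector))
        bits.append(1 if inner >= 0 else 0)
    return bits


# Usage: three hand-picked hyperplanes in 2-D.
normals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(to_binary_string([2.0, -3.0], normals))  # [1, 0, 0]
```

Each bit records on which side of one hyperplane the vector lies, so the full string identifies the area of the space that contains it.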
When a label indicating similarity, such as an ID identifying the individual who registered the data, is attached to each piece of data, it is desirable to set hyperplanes that separate the data by label so that newly registered data can be classified easily. To this end, several technologies are available that select learning data pairs from the labeled feature amount vectors by a predetermined method and use the selected pairs to learn hyperplanes that divide the data by label.
For example, the information processing device randomly selects, from among the feature amount vectors to be classified, two feature amount vectors with the same label (hereinafter referred to as a “positive example pair”) and two feature amount vectors with different labels (hereinafter referred to as a “negative example pair”). The information processing device then repeatedly optimizes the hyperplanes so as to reduce the Hamming distance between positive example pairs and increase the Hamming distance between negative example pairs, thereby learning hyperplanes that classify the data by label.
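A minimal sketch of the random pair selection, assuming the labels are given as a list parallel to the feature amount vectors. Drawing a positive pair by first choosing a label uniformly and then two of its vectors is an assumption, and `pair_objective` is a hypothetical stand-in for the quantity the learning tries to reduce. The optimizer that actually adjusts the hyperplanes is not reproduced here.

```python
import random
from collections import defaultdict


def sample_random_pairs(labels, num_pairs, seed=0):
    """Randomly draw positive (same-label) and negative (different-label) index pairs."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # Only labels with at least two vectors can supply a positive pair.
    pos_labels = [lab for lab, idxs in by_label.items() if len(idxs) >= 2]
    if not pos_labels or len(by_label) < 2:
        raise ValueError("need a label with >= 2 vectors and at least 2 distinct labels")
    positives, negatives = [], []
    while len(positives) < num_pairs:
        # Assumption: pick a label uniformly, then two distinct vectors carrying it.
        i, j = rng.sample(by_label[rng.choice(pos_labels)], 2)
        positives.append((i, j))
    while len(negatives) < num_pairs:
        # Rejection sampling: draw two distinct indices until their labels differ.
        i, j = rng.sample(range(len(labels)), 2)
        if labels[i] != labels[j]:
            negatives.append((i, j))
    return positives, negatives


def hamming(code_a, code_b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(code_a, code_b))


def pair_objective(codes, positives, negatives):
    """Mean Hamming distance of positive pairs minus that of negative pairs.

    Lower is better: positives close together, negatives far apart.
    """
    pos = sum(hamming(codes[i], codes[j]) for i, j in positives) / len(positives)
    neg = sum(hamming(codes[i], codes[j]) for i, j in negatives) / len(negatives)
    return pos - neg


# Usage: score one set of binary codes against freshly sampled pairs.
labels = ["a", "a", "b", "b", "c"]
codes = [[0, 0], [0, 1], [1, 1], [1, 1], [1, 0]]
positives, negatives = sample_random_pairs(labels, num_pairs=3)
print(pair_objective(codes, positives, negatives))
```

An optimizer would re-encode the vectors with candidate hyperplanes and keep changes that lower this score.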
In another method, the information processing device randomly selects one reference vector. The information processing device then defines, as the positive example pair, the reference vector and the feature amount vector most similar to it among the feature amount vectors having the same label as the reference vector. Likewise, it defines, as the negative example pair, the reference vector and the feature amount vector most similar to it among the feature amount vectors having a label different from that of the reference vector. The information processing device then repeatedly optimizes the hyperplanes so as to reduce the Hamming distance between the positive example pair and increase the Hamming distance between the negative example pair.
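The reference-vector selection can be sketched as below. The text does not specify a similarity measure, so this sketch assumes Euclidean distance, and it only picks a reference whose label has another member and is not the only label present. `pick_reference` and `pairs_for_reference` are hypothetical names.

```python
import math
import random


def pick_reference(labels, seed=0):
    """Randomly pick a reference index that has at least one same-label partner
    and at least one vector with a different label."""
    rng = random.Random(seed)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    eligible = [i for i, lab in enumerate(labels)
                if 2 <= counts[lab] < len(labels)]
    if not eligible:
        raise ValueError("no vector has both a same-label and a different-label partner")
    return rng.choice(eligible)


def pairs_for_reference(vectors, labels, ref):
    """Return (positive pair, negative pair) built around the reference index.

    "Most similar" is taken to mean smallest Euclidean distance (an assumption).
    The reference must be eligible, e.g. chosen by pick_reference.
    """
    ref_vec = vectors[ref]
    same = [i for i in range(len(vectors)) if i != ref and labels[i] == labels[ref]]
    diff = [i for i in range(len(vectors)) if labels[i] != labels[ref]]
    nearest_same = min(same, key=lambda i: math.dist(ref_vec, vectors[i]))
    nearest_diff = min(diff, key=lambda i: math.dist(ref_vec, vectors[i]))
    return (ref, nearest_same), (ref, nearest_diff)


# Usage: the different-label vector at index 3 is closer to the reference
# than any same-label vector.
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
labels = ["a", "a", "a", "b", "b"]
print(pairs_for_reference(vectors, labels, ref=0))  # ((0, 1), (0, 3))
```

Because the negative partner is the nearest vector carrying another label, this selection tends to focus learning on the neighborhood of the label boundary around the reference.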
Non Patent Document 1: M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, "Locality-Sensitive Hashing Scheme Based on p-Stable Distributions," Proceedings of the Twentieth Annual Symposium on Computational Geometry (SCG 2004).
Non Patent Document 2: M. Norouzi and D. Fleet, "Minimal Loss Hashing for Compact Binary Codes," Proceedings of the 28th International Conference on Machine Learning (ICML 2011).
Non Patent Document 3: R. Gilad-Bachrach, A. Navot, and N. Tishby, "Margin Based Feature Selection: Theory and Algorithms," Proceedings of the 21st International Conference on Machine Learning (ICML 2004).
However, the above technologies for learning the hyperplanes select learning data pairs by a prescribed method regardless of the statistical properties of the data set, so the accuracy with which the hyperplanes classify the data may degrade.
That is, data sets to be classified differ in their statistical properties depending on the number of data items, the distribution of the data, the number of labels, and the like. Accordingly, the method suited to selecting adequate learning data pairs differs depending on the statistical properties of the data set to be classified. A technology that selects learning data pairs by a prescribed method irrespective of those properties may therefore select inadequate data pairs, and when inadequate data pairs are used to learn the hyperplanes, the accuracy with which the hyperplanes classify the data may degrade.