Conventionally, a technology is known for speeding up retrieval processing by relaxing strictness when feature vectors representing features of data, such as fingerprints, images, sounds, or the like, are used to retrieve similar data. As one example of such a technology, a method for reducing calculation cost is known in which the feature vectors are converted to binary strings while preserving a distance relationship between the feature vectors, and a Hamming distance is calculated between the binary strings.
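As a minimal sketch of the cost reduction described above, suppose the feature vectors have already been converted to fixed-length binary strings (represented here as Python integers used as bit strings); similarity can then be estimated by the Hamming distance, which needs only an XOR and a bit count rather than an arithmetic distance over real-valued vectors:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which the two binary strings differ."""
    return bin(a ^ b).count("1")

# Two 8-bit binary codes differing in two bit positions.
code_x = 0b10110100
code_y = 0b10010110
print(hamming_distance(code_x, code_y))  # → 2
```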
Further, as one example of a technique for converting the feature vectors to the binary strings while maintaining the distance relationship between the feature vectors, a technology called locality-sensitive hashing (LSH) is known. For example, an information processing apparatus sets a plurality of hyperplanes dividing a feature vector space, and converts each feature vector to a binary string in which each bit represents whether the inner product of the normal vector of a hyperplane and the feature vector is positive or negative. That is, the information processing apparatus uses the hyperplanes to divide the feature vector space into a plurality of areas, and converts a feature vector to a binary string representing which of the divided areas the feature vector belongs to.
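The hyperplane-based conversion described above can be sketched as follows, under the simplifying assumption that all hyperplanes pass through the origin and have randomly drawn normal vectors (the dimensions and the number of hyperplanes below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
num_planes, dim = 16, 64
# One normal vector per hyperplane; each hyperplane passes through the origin.
normals = rng.standard_normal((num_planes, dim))

def to_binary(feature: np.ndarray) -> np.ndarray:
    """Convert a feature vector to a binary string (0/1 array).

    Each bit records whether the inner product of a hyperplane's normal
    vector and the feature vector is positive, i.e. on which side of
    that hyperplane (in which divided area) the vector falls.
    """
    return (normals @ feature > 0).astype(np.uint8)

v = rng.standard_normal(dim)
w = v + 0.1 * rng.standard_normal(dim)   # a vector close to v
# Nearby vectors tend to fall on the same side of most hyperplanes,
# so their binary strings tend to have a small Hamming distance.
print(int(np.sum(to_binary(v) != to_binary(w))))
```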
Here, when a label representing similarity, such as an ID identifying the person who registered the data, is attached to each piece of data, a set of hyperplanes that classifies the data according to the labels is preferably set so as to facilitate classification of new data to be registered. Therefore, a technology is known in which data pairs to which the same label is attached and data pairs to which different labels are attached are used to learn a set of hyperplanes for classifying the data according to the labels.
For example, an information processing apparatus selects, from the feature vectors to be classified, two feature vectors to which the same label is attached (hereinafter described as a positive example pair) and two feature vectors to which different labels are attached (hereinafter described as a negative example pair). Then, the information processing apparatus repeatedly optimizes a set of hyperplanes so that the positive example pair has a small Hamming distance and the negative example pair has a large Hamming distance, thereby learning a set of hyperplanes for classifying the data according to the labels.
    Patent Document 1: Japanese Laid-open Patent Publication No. 2006-252333
    Patent Document 2: Japanese Laid-open Patent Publication No. 2010-061176
    Patent Document 3: Japanese Laid-open Patent Publication No. 2007-004458
    Non Patent Document 1: M. Datar, N. Immorlica, P. Indyk, V. S. Mirrokni: Locality-Sensitive Hashing Scheme Based on p-Stable Distributions, Proceedings of the Twentieth Annual Symposium on Computational Geometry (SCG 2004)
    Non Patent Document 2: M. Norouzi and D. Fleet: Minimal Loss Hashing for Compact Binary Codes, Proceedings of the 28th International Conference on Machine Learning (ICML '11) (2011)
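The pair-based learning described above can be illustrated with the following crude sketch. It is not the actual optimization used in the documents cited above (e.g., minimal loss hashing); instead it scores candidate sets of hyperplanes so that positive example pairs get small Hamming distances and negative example pairs get large ones, and keeps the best-scoring candidate found by random search. All data and dimensions below are synthetic assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, num_planes = 8, 12

def binarize(planes: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Binary string of x under a given set of hyperplanes."""
    return (planes @ x > 0).astype(np.uint8)

def score(planes, pos_pairs, neg_pairs) -> int:
    """Larger is better: negative pairs far apart, positive pairs close."""
    d = lambda a, b: int(np.sum(binarize(planes, a) != binarize(planes, b)))
    return (sum(d(a, b) for a, b in neg_pairs)
            - sum(d(a, b) for a, b in pos_pairs))

# Synthetic labeled data: two clusters standing in for two labels.
c0, c1 = rng.standard_normal(dim), rng.standard_normal(dim) + 3.0
pos_pairs = [(c0 + 0.1 * rng.standard_normal(dim),
              c0 + 0.1 * rng.standard_normal(dim)) for _ in range(5)]
neg_pairs = [(c0 + 0.1 * rng.standard_normal(dim),
              c1 + 0.1 * rng.standard_normal(dim)) for _ in range(5)]

# Repeated optimization over candidate sets of hyperplanes
# (random search as a simple stand-in for a real optimizer).
best_planes, best_score = None, -np.inf
for _ in range(200):
    candidate = rng.standard_normal((num_planes, dim))
    s = score(candidate, pos_pairs, neg_pairs)
    if s > best_score:
        best_planes, best_score = candidate, s
```

Note that each candidate is scored against every pair, so the amount of calculation grows with both the number of hyperplanes and the number of pairs, which is the difficulty discussed next.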
Here, in order to improve the accuracy of the distance relationship upon conversion to the binary strings, it suffices to increase the number of hyperplanes to be set. However, in the technology for learning the set of hyperplanes, when the number of hyperplanes is increased, the amount of calculation for optimizing the set of hyperplanes increases, so that learning a set of hyperplanes for classifying the data according to the labels is disadvantageously made difficult.
On the other hand, when a plurality of hyperplanes are optimized individually, hyperplanes that divide the feature vector space in a similar manner are apt to be learned, and the accuracy of the distance relationship deteriorates.