Conventionally, there are known techniques for searching multiple pieces of data registered in a database for pieces of data whose similarity or relevance to inputted query data satisfies a predetermined condition. One example of such techniques is a technique called neighborhood search, in which the levels of similarity or relevance between pieces of data are represented as distances in a multidimensional feature quantity vector space, and either the pieces of data whose distances from the query data do not exceed a threshold or a predetermined number of pieces of data that are the closest to the query data are selected.
FIG. 8 is a diagram for illustrating a conventional neighborhood search. For example, an information processing apparatus that executes neighborhood search stores, as indicated with white circles in FIG. 8, the feature quantity vectors of pieces of data to be searched. Then, upon acquiring query data indicated by (A) in FIG. 8, the information processing apparatus calculates the distance between the query data and each of the feature quantity vectors, and sets, as neighboring data of the query data, pieces of data that fall within a predetermined range in terms of distance from the query data as indicated by (B) in FIG. 8.
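The two selection rules described above (a distance threshold, or a fixed number of closest pieces of data) can be sketched as follows. This is a minimal illustration only: the Euclidean distance, the function names `range_search` and `k_nearest`, and the sample vectors are assumptions made for the example and do not appear in FIG. 8.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature quantity vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_search(vectors, query, threshold):
    """Indices of vectors whose distance from the query does not exceed threshold."""
    return [i for i, v in enumerate(vectors) if euclidean(v, query) <= threshold]

def k_nearest(vectors, query, k):
    """Indices of the k vectors closest to the query, nearest first."""
    order = sorted(range(len(vectors)), key=lambda i: euclidean(vectors[i], query))
    return order[:k]

# Hypothetical registered data (the white circles) and query data (A).
vectors = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (1.5, 0.5)]
query = (1.0, 0.5)

within_range = range_search(vectors, query, threshold=1.0)
three_closest = k_nearest(vectors, query, k=3)
```

Both functions compute the distance to every registered vector, which is the exhaustive calculation whose cost grows with the number of pieces of data.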
Here, in a case where a database has a large number of pieces of data registered therein, calculating the distances between all of the pieces of data and the query data increases the calculation cost of neighborhood search. Therefore, there are other known techniques that aim to reduce this calculation cost by narrowing down the data to be searched, either by creating indices for the feature quantity vector space or by using indices based on distances from particular feature quantity vectors. However, these techniques are not capable of reducing the calculation cost in a case where each feature quantity vector has a large number of dimensions.
Hence, there are other known techniques in which, in order to reduce the calculation cost of neighborhood search, search processing is sped up by relaxing the strictness required of a search result and acquiring a set of data that is approximately similar to the query data. For example, matching search between binary strings and calculation of Hamming distances can be performed faster than calculation of distances between vectors. Accordingly, there are known techniques that reduce the calculation cost as follows: feature quantity vectors are converted into binary strings in such a manner that distance-based relations between the feature quantity vectors are maintained, and matching search or calculation of Hamming distances is then performed against a binary string obtained by converting the query data.
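As an illustration of why such comparisons are inexpensive, the Hamming distance between two binary strings can be obtained with one exclusive-OR followed by a count of the set bits. The sketch below assumes that each binary string is held as a Python integer; the function name `hamming_distance` is chosen for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary strings differ."""
    # XOR leaves a 1 in every position where the two strings disagree.
    return bin(a ^ b).count("1")

# 1011 and 1101 differ in the two middle positions.
example_distance = hamming_distance(0b1011, 0b1101)

# Identical strings have distance 0, which corresponds to matching search.
exact_match = hamming_distance(0b11, 0b11) == 0
```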
One example of a method for converting such feature quantity vectors into binary strings is described below. FIG. 9 is a diagram for illustrating search processing using binarization. Note that the example illustrated in FIG. 9 describes a method for converting the feature quantity vectors, indicated by white circles in FIG. 9, into 2-digit binary strings.
For example, an information processing apparatus stores feature quantity vectors indicated with white circles in FIG. 9. Here, the information processing apparatus applies a projective function thereto, thereby setting the first digits of binary strings to “1” for feature quantity vectors included in the area above a dotted line in FIG. 9, and to “0” for feature quantity vectors included in the area below the dotted line. Further, the information processing apparatus sets the second digits of binary strings to “1” for feature quantity vectors included in the area to the right of a solid line in FIG. 9, and to “0” for feature quantity vectors included in the area to the left of the solid line.
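The digit assignment described above can be sketched as threshold processing on linear projections. The concrete boundaries are assumptions made for this example only: the dotted line is taken to be the horizontal axis and the solid line the vertical axis, each written as a pair (w, b) so that a digit is "1" when w·x + b > 0. The name `to_binary_string` and the sample vectors are likewise hypothetical.

```python
def to_binary_string(vector, hyperplanes):
    """Convert a feature quantity vector into a binary string, one digit per boundary."""
    digits = []
    for w, b in hyperplanes:
        projection = sum(wi * xi for wi, xi in zip(w, vector)) + b
        # Threshold processing: the sign of the projection decides the digit.
        digits.append("1" if projection > 0 else "0")
    return "".join(digits)

# Assumed boundaries: first digit is 1 above the dotted line (y > 0),
# second digit is 1 to the right of the solid line (x > 0).
HYPERPLANES = [((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0)]

# Hypothetical feature quantity vectors, one per region.
vectors = [(-1.0, 2.0), (2.0, 1.0), (-2.0, -1.0), (1.0, -3.0)]
codes = [to_binary_string(v, HYPERPLANES) for v in vectors]
```

With these assumed boundaries, a vector in the upper-right region is converted into "11" and a vector in the lower-left region into "00".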
As a result, each of the feature quantity vectors is converted into any one of “01,” “11,” “00” and “10.” Then, as indicated by (C) in FIG. 9, in a case where the binary string obtained by converting the query data is “11,” the information processing apparatus sets, as neighboring data of the query data, the feature quantity vectors converted into binary strings having Hamming distances of “0” from that string, that is, those converted into the binary string “11.”
Patent Document 1: Japanese Patent No. 2815045
Patent Document 2: Japanese Laid-open Patent Publication No. 2006-277407
Patent Document 3: Japanese Laid-open Patent Publication No. 2007-249339
Non-patent Document 1: Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S., “Locality-sensitive Hashing Scheme Based on P-stable Distributions,” Proceedings of the Twentieth Annual Symposium on Computational Geometry (SCG '04), pp. 253-262, 2004
Non-patent Document 2: Charikar, M., “Similarity estimation techniques from rounding algorithms,” Proceedings of the 34th Symposium on Theory of Computing (STOC '02), pp. 380-388, 2002
Non-patent Document 3: Weiss, Y., Torralba, A., and Fergus, R., “Spectral Hashing,” Advances in Neural Information Processing Systems 21 (NIPS '08), 2008
Non-patent Document 4: Kulis, B. and Darrell, T., “Learning to Hash with Binary Reconstructive Embeddings,” Advances in Neural Information Processing Systems 22 (NIPS '09), 2009
Non-patent Document 5: Norouzi, M. and Fleet, D., “Minimal Loss Hashing for Compact Binary Codes,” Proceedings of the 28th International Conference on Machine Learning (ICML '11), 2011
Non-patent Document 6: Lowe, D. G., “Distinctive Image Features from Scale-invariant Keypoints,” International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004
Non-patent Document 7: Bay, H., Ess, A., Tuytelaars, T., and Gool, L. V., “SURF: Speeded Up Robust Features,” Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008
Non-patent Document 8: Kira, K. and Rendell, L. A., “A Practical Approach to Feature Selection,” Proceedings of the 9th International Workshop on Machine Learning, pp. 249-256, 1992
Non-patent Document 9: Gilad-Bachrach, R., Navot, A., and Tishby, N., “Margin Based Feature Selection - Theory and Algorithms,” Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 43-50, 2004
Here, in order to maintain the accuracy of search when the feature quantity vectors are converted into binary strings, it is important that the conversion maintain the distance-based relations among the original feature quantity vectors. However, the above described technique uses threshold processing to convert feature quantity vectors, which are continuous values, into non-continuous binary strings. Therefore, the technique has a problem in that it is difficult to optimize conversion functions that convert feature quantity vectors into binary strings while maintaining the distance-based relations among the feature quantity vectors.
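One way to see this difficulty, offered here only as an illustrative sketch and not as an analysis taken from the cited documents, is that a digit produced by threshold processing is a step function of the parameters of the conversion function. Its numerical gradient is therefore zero away from the threshold, and the digit changes only by an abrupt jump at the threshold. The one-dimensional conversion `bit`, its parameters `w` and `b`, and the sample values are assumptions made for the example.

```python
def bit(w, b, x):
    """One binary digit obtained by thresholding the projection w * x + b."""
    return 1 if w * x + b > 0 else 0

def numerical_gradient(f, p, eps=1e-6):
    """Central finite-difference estimate of df/dp."""
    return (f(p + eps) - f(p - eps)) / (2 * eps)

x, b = 2.0, -1.0              # hypothetical input value and offset

def digit_for(w):
    """The digit as a function of the parameter w alone."""
    return bit(w, b, x)

# The digit flips where w * 2.0 - 1.0 = 0, that is, at w = 0.5.
# On either side of that point the gradient gives no direction to move in.
gradients = [numerical_gradient(digit_for, w) for w in (0.1, 0.4, 0.6, 1.0)]

# Across the threshold the output jumps from 0 to 1 without passing through
# intermediate values.
jump = digit_for(0.6) - digit_for(0.4)
```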