As illustrated in PTL 1 to 10 listed below, various similarity search techniques have been proposed. Such similarity searches are often performed on multidimensional or high-dimensional data, such as image feature amount data. For example, PTL 1, 3 and 6 propose similar-image search methods. PTL 2 proposes a technique that searches for similar data by using a database in which links are set among data so that one datum can be traced to another. PTL 5 proposes a technique that hierarchically classifies an arbitrary set of images. PTL 7 proposes a technique that searches a set of high-dimensional feature vectors for a feature vector similar to a query feature vector. PTL 9 proposes a technique that uses a hash function to classify learning patterns into buckets corresponding to hash values, and then retrieves, from the learning patterns belonging to the bucket corresponding to the hash value of an input pattern, the learning pattern most similar to that input pattern. PTL 10 proposes a data matching method that, by specifying conditions, extracts desired data from multidimensional data in which a plurality of feature amounts can be expressed as vectors. Note that hereinafter, “high-dimensional” and “multidimensional” are used interchangeably.
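The bucket-based retrieval attributed to PTL 9 can be pictured with the following minimal sketch. It assumes random-hyperplane (sign) hashing and Euclidean distance as the similarity measure; PTL 9's actual hash function is not described here, and the function names (`make_hyperplanes`, `hash_vector`, `build_buckets`, `search`) are hypothetical. Only the patterns that share the input pattern's hash value are compared, which is what makes the search cheaper than a full scan.

```python
from collections import defaultdict
from math import dist
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes through the origin (an assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    code = 0
    for plane in planes:
        side = sum(a * b for a, b in zip(vec, plane)) >= 0
        code = (code << 1) | int(side)
    return code


def build_buckets(patterns, planes):
    """Classify (label, vector) learning patterns into buckets keyed by hash value."""
    buckets = defaultdict(list)
    for label, vec in patterns:
        buckets[hash_vector(vec, planes)].append((label, vec))
    return buckets


def search(query, buckets, planes):
    """Return the label of the closest pattern in the query's bucket, or None."""
    candidates = buckets.get(hash_vector(query, planes), [])
    if not candidates:
        return None
    label, _ = min(candidates, key=lambda item: dist(item[1], query))
    return label


patterns = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [-1.0, 0.0])]
planes = [[1.0, 0.0]]  # a single hand-picked hyperplane: the sign of x
buckets = build_buckets(patterns, planes)
print(search([0.8, 0.3], buckets, planes))  # b
```

The trade-off is that a truly similar pattern falling into a different bucket is never examined; using more hash bits shrinks the buckets but raises that risk.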
In the similarity search described above, the similarity degree between object data is calculated by using a similarity degree function or the like. For example, a feature amount datum of an image is expressed as a multidimensional numerical vector, and the similarity degree between the feature amount data being compared is calculated by the similarity degree function. PTL 4 proposes a technique that, for every feature amount in a database, calculates the similarity degree between that feature amount and each of the other feature amounts, stores the ID information of the f(x) most similar feature amounts in descending order of similarity degree together with their ranks, and then searches the stored contents to find similar feature amounts.
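A similarity degree function and a precomputed ranking table of the kind described for PTL 4 might look like the following minimal sketch. It assumes cosine similarity as the similarity degree function and uses a fixed `k` in place of the f(x) count; the names `cosine_similarity`, `build_neighbor_table` and `similar_to` are hypothetical and are not taken from PTL 4.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Similarity degree function (assumed here to be cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_neighbor_table(features, k):
    """For each ID, store the k most similar other IDs with their rank and score."""
    table = {}
    for i, fi in features.items():
        scored = [(cosine_similarity(fi, fj), j) for j, fj in features.items() if j != i]
        # Descending similarity; ties broken by ID so the order is deterministic.
        scored.sort(key=lambda s: (-s[0], s[1]))
        table[i] = [(rank, j, sim) for rank, (sim, j) in enumerate(scored[:k], start=1)]
    return table


def similar_to(table, feature_id):
    """Answer a query from the stored contents instead of recomputing similarities."""
    return [j for _, j, _ in table[feature_id]]


features = {"img1": [1.0, 0.0], "img2": [0.9, 0.1], "img3": [0.0, 1.0]}
table = build_neighbor_table(features, k=2)
print(similar_to(table, "img1"))  # ['img2', 'img3']
```

Building the table costs one similarity evaluation per pair of feature amounts, so this approach shifts the work to an offline step in exchange for lookups that need no similarity computation at query time.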
Moreover, search can be made faster by constructing an index over the object data and performing the similarity search with that index. The R-tree is a known index generation technique for multidimensional data (refer to NPL 1). Furthermore, PTL 8 proposes a technique that divides a feature vector space into a plurality of approximation regions and generates an indexing tree in which the approximation regions are hierarchized according to their density or sparsity. Note that NPL 2, 3 and 4, listed below, will be discussed later.
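Index-based speed-ups of the R-tree kind rest on grouping data under minimum bounding rectangles (MBRs) and skipping any rectangle that cannot contain a closer point than the best one found so far. The following minimal sketch shows only that pruning idea. It is a single-level index whose groups are supplied by the caller, with no insertion or node splitting, so it is neither a full R-tree nor the approximation-region tree of PTL 8. The names `mbr`, `mindist`, `build_index` and `nearest` are hypothetical.

```python
from math import dist, inf, sqrt


def mbr(points):
    """Minimum bounding rectangle of a group of points: (lower corner, upper corner)."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper


def mindist(q, box):
    """Euclidean distance from q to the closest point of box (0 if q is inside)."""
    lower, upper = box
    return sqrt(sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in zip(q, lower, upper)))


def build_index(groups):
    """One-level index: each entry pairs a group's MBR with its points."""
    return [(mbr(g), g) for g in groups]


def nearest(q, index):
    """Branch-and-bound nearest neighbour; returns (point, distance, groups scanned)."""
    best, best_d, scanned = None, inf, 0
    for box, points in sorted(index, key=lambda entry: mindist(q, entry[0])):
        if mindist(q, box) > best_d:
            break  # boxes are sorted by mindist, so every remaining box is farther too
        scanned += 1
        for p in points:
            d = dist(q, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d, scanned


index = build_index([[(0, 0), (1, 1)], [(10, 10), (11, 11)]])
print(nearest((0.2, 0.1), index))
```

Here the query finds (0, 0) after scanning only the first group, because the second rectangle's mindist already exceeds the distance to that point. A real R-tree applies the same test recursively at every level of its hierarchy.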