As disclosed in Patent Documents 1 to 10 mentioned below, various similarity retrieval methods have been proposed. Such similarity retrieval is often performed on multi-dimensional or high-dimensional data, such as feature value data of an image. For example, Patent Documents 1, 3, and 6 propose similar image retrieval methods. Patent Document 2 proposes a method of retrieving similar data by using a database in which links for tracing from one piece of data to another are set between pieces of data. Patent Document 5 proposes a method of hierarchically classifying an arbitrary image set. Patent Document 7 proposes a method of retrieving a feature vector similar to a query feature vector from a set of high-dimensional feature vectors. Patent Document 9 proposes a method of classifying learning patterns into buckets corresponding to hash values by using a hash function, and searching, among the learning patterns belonging to the bucket corresponding to the hash value of an input pattern, for the learning pattern most similar to the input pattern. Patent Document 10 proposes a data matching method of extracting desired data by designating a condition with respect to multi-dimensional data in which a plurality of feature values can be expressed by vectors. Note that the terms “high-dimensional” and “multi-dimensional” are used below without any particular distinction between them.
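The bucket-based search attributed to Patent Document 9 above can be illustrated with a minimal sketch. It is not the method of that document: the hash function (quantizing each coordinate onto a grid of width `cell`), the Euclidean distance, and the names `hash_key`, `build_buckets`, and `nearest_in_bucket` are all assumptions introduced here for illustration only.

```python
from collections import defaultdict
import math


def hash_key(vec, cell=1.0):
    # Hypothetical hash function: quantize each coordinate onto a grid
    # of width `cell`, so nearby vectors tend to share a key.
    return tuple(math.floor(x / cell) for x in vec)


def build_buckets(patterns, cell=1.0):
    # Classify every learning pattern into the bucket for its hash value.
    buckets = defaultdict(list)
    for idx, vec in enumerate(patterns):
        buckets[hash_key(vec, cell)].append(idx)
    return buckets


def nearest_in_bucket(query, patterns, buckets, cell=1.0):
    # Compare the input pattern only against patterns in its own bucket.
    candidates = buckets.get(hash_key(query, cell), [])
    if not candidates:
        return None  # no learning pattern shares the query's hash value
    # Assumption: "most similar" means smallest Euclidean distance.
    return min(candidates, key=lambda i: math.dist(query, patterns[i]))
```

Because only one bucket is examined, a pattern lying just across a bucket boundary is not found even when it is the closest one overall; the sketch trades this accuracy for a smaller number of comparisons.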
In such similarity retrieval, a degree of similarity between pieces of target data is calculated using a similarity degree function or the like. For example, feature value data of an image is expressed as a multi-dimensional numerical vector, and the degree of similarity between the pieces of feature value data to be compared is calculated by a similarity degree function. Patent Document 4 proposes a method of retrieving a similar feature value by calculating, for every feature value within a database, the degree of similarity to the other feature values, storing the ID information of the top f(x) feature values in descending order of the degree of similarity, and searching the stored contents.
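As an illustration of the calculation described above, the following minimal sketch ranks feature vectors by a similarity degree function and stores, for each item, the IDs of its most similar items in descending order of similarity. The choice of cosine similarity, the fixed count `k` standing in for the stored number of items, and the names `cosine_similarity`, `top_k_similar`, and `precompute_neighbors` are assumptions made for this example, not details taken from Patent Document 4.

```python
import heapq
import math


def cosine_similarity(a, b):
    # Assumed similarity degree function: cosine of the angle between a and b.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k):
    # Score every stored vector against the query and return the IDs of
    # the k highest-scoring items, most similar first.
    scored = ((cosine_similarity(query, vec), item_id)
              for item_id, vec in database.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def precompute_neighbors(database, k):
    # For each item, store the IDs of its k most similar other items so
    # that a later retrieval is a table lookup rather than a full scan.
    table = {}
    for item_id, vec in database.items():
        others = {oid: v for oid, v in database.items() if oid != item_id}
        table[item_id] = top_k_similar(vec, others, k)
    return table
```

The precomputation compares every pair of items, so its cost grows quadratically with the size of the database.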
In addition, an index is constructed for the target data, and the similarity retrieval is performed using this index, thereby speeding up the retrieval. The R-tree is known as a method of generating an index of multi-dimensional data (see Non-Patent Document 1). Furthermore, Patent Document 8 proposes a method of dividing a feature vector space into a plurality of approximate regions, thereby generating an indexing tree in which the approximate regions are hierarchized in accordance with how dense or sparse they are. Note that Non-Patent Documents 2, 3, and 4 will be described later.