1. Field of the Invention
The present invention relates to an optimal high-speed multi-resolution retrieval method for a large capacity database, and more particularly to a technique for deriving an inequality that allows accurate and rapid retrieval of desired information from a database, and for implementing optimal high-speed information retrieval using the derived inequality.
2. Description of the Related Art
In order to find the best match to a query based on a similarity measure, an exhaustive search must compare the query with every item of data contained in a database. However, straightforward exhaustive search algorithms require a large quantity of calculation. Thus, a variety of high-speed search algorithms have recently been proposed.
Berman and Shapiro have proposed using the triangle inequality to remove, from the retrieval procedure, candidates that cannot be the best match. To reduce the additional calculation quantity, they have also proposed simultaneously using diverse distance measures and representative data called “key data”. However, the retrieval speed of this method varies considerably depending on the “key data”, and its speed performance is insufficient for large capacity databases.
Recently, Berman and Shapiro have also proposed applying a data structure called a “Triangle Trie” to improve performance. In this method, however, there is still a problem in that the retrieval speed is considerably influenced by the tree depth and the threshold value of the “key data”.
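By way of illustration only, the pruning idea described above may be sketched in Python as follows. The sketch assumes a Euclidean distance and a small list of key data; the function names (build_key_table, search_with_pruning) and the sample values are hypothetical and are not taken from Berman and Shapiro. For any metric d, the triangle inequality gives |d(Q, K) - d(X, K)| <= d(Q, X), so a candidate X whose lower bound is not smaller than the best distance found so far can be skipped without computing d(Q, X).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_key_table(data, keys, dist=euclidean):
    """Precompute, offline, the distance from every data item to every key."""
    return [[dist(x, k) for k in keys] for x in data]


def search_with_pruning(query, data, keys, table, dist=euclidean):
    """Return (best index, best distance, number of full distances computed)."""
    q_to_keys = [dist(query, k) for k in keys]
    best_idx, best_dist, computed = -1, math.inf, 0
    for i, x in enumerate(data):
        # Lower bound on d(query, x) from the triangle inequality.
        lower = max(abs(qk - xk) for qk, xk in zip(q_to_keys, table[i]))
        if lower >= best_dist:
            continue  # x cannot be strictly better than the current best
        d = dist(query, x)
        computed += 1
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, computed


# Hypothetical sample data: one key and five two-dimensional items.
data = [(0, 0), (10, 0), (0, 10), (10, 10), (1, 1)]
keys = [(0, 0)]
table = build_key_table(data, keys)
best_idx, best_dist, computed = search_with_pruning((0.9, 0.9), data, keys, table)
```

In this sample, only some of the five items require a full distance computation. How many candidates are pruned depends on where the keys lie relative to the query and the data, which is consistent with the sensitivity to “key data” noted above.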
Meanwhile, Krishnamachari and Mottaleb have proposed a cluster-based indexing algorithm in which diverse data contained in a database are partitioned into clusters in such a fashion that each cluster contains data having similar features, in accordance with an architectural clustering scheme.
In accordance with the cluster-based indexing algorithm, it is possible to remarkably reduce the quantity of calculation because query data is not compared with all data contained in a database, but compared with a part of the data in a retrieval procedure in accordance with the clustering scheme.
In particular, the cluster-based indexing algorithm is suitable for large capacity databases in that the number of comparisons to obtain a desired retrieval accuracy is not linearly proportional to the capacity of the database.
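By way of illustration only, a cluster-based search of the kind described above may be sketched in Python as follows. The sketch assumes that the clusters have already been formed, that each cluster is represented by its centroid, and that the distance is Euclidean; the function names and the sample data are hypothetical and do not reproduce the algorithm of Krishnamachari and Mottaleb. The query is compared with the cluster centers first, and then only with the elements of the nearest cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cluster_search(query, clusters, dist=euclidean):
    """Search only the cluster whose center is nearest to the query.

    Returns (best element found, number of distance comparisons made).
    """
    centers = [centroid(c) for c in clusters]  # normally precomputed offline
    comparisons = len(centers)
    nearest = min(range(len(clusters)), key=lambda j: dist(query, centers[j]))
    best = min(clusters[nearest], key=lambda x: dist(query, x))
    comparisons += len(clusters[nearest])
    return best, comparisons


# Hypothetical sample data: three clusters of three elements each.
clusters = [
    [(0, 0), (1, 0), (0, 1)],
    [(10, 10), (11, 10), (10, 11)],
    [(20, 0), (21, 0), (20, 1)],
]
best, comparisons = cluster_search((10.2, 10.1), clusters)
total = sum(len(c) for c in clusters)
```

In this sample, the query is compared with three centers and three elements, whereas an exhaustive search would compare it with all nine elements.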
FIG. 1 is a schematic diagram illustrating problems involved in conventional cluster-based search algorithms.
Referring to FIG. 1, the second cluster is selected as a candidate because its center C2 is nearest to the query Q. In accordance with the illustrated search algorithm, an element X2 in the second cluster is selected as the best match, based on the distance of each element belonging to the second cluster from the query Q. However, the actual best match is the element X8 of the first cluster.
The reason why such a problem occurs is that the center of the cluster to which the actual best match belongs is not always the center nearest to the query Q. To address this problem, a method for simultaneously searching several nearby clusters has been proposed. However, this method inherently cannot ensure an optimal retrieval.
Also, the conventional cluster-based search algorithms, which cannot ensure an optimal retrieval, have a drawback in that their retrieval speed is not sufficiently rapid when a satisfactory retrieval accuracy is to be obtained.
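By way of illustration only, the problem described above may be reproduced with the following Python sketch. The coordinates are hypothetical and are not those of FIG. 1; they are chosen so that a widely spread first cluster contains the element nearest to the query, while the center of a compact second cluster lies nearer to the query. The function names and the parameter n_probe (the number of nearest clusters searched) are likewise assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def probe_nearest_clusters(query, clusters, n_probe=1):
    """Search only the n_probe clusters whose centers are nearest to the query."""
    order = sorted(range(len(clusters)),
                   key=lambda j: dist(query, centroid(clusters[j])))
    candidates = [x for j in order[:n_probe] for x in clusters[j]]
    return min(candidates, key=lambda x: dist(query, x))


def exhaustive(query, clusters):
    """Reference result: compare the query with every element."""
    return min((x for c in clusters for x in c), key=lambda x: dist(query, x))


# Hypothetical data: a spread-out first cluster and a compact second cluster.
first_cluster = [(0, 0), (0, 4), (4, 0), (4, 4)]   # center (2, 2)
second_cluster = [(6, 7), (7, 6), (7, 7)]          # center near (6.67, 6.67)
clusters = [first_cluster, second_cluster]
query = (5, 5)

single = probe_nearest_clusters(query, clusters, n_probe=1)
double = probe_nearest_clusters(query, clusters, n_probe=2)
truth = exhaustive(query, clusters)
```

In this sample, searching only the cluster with the nearest center returns an element of the second cluster, although the actual best match belongs to the first cluster. Searching two clusters recovers the best match here only because the sample contains two clusters in total; with a fixed n_probe smaller than the number of clusters, no such guarantee exists.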