1. Field of the Invention
The present invention relates to a retrieval system, a retrieval method, and a computer readable recording medium that records a retrieval program, and particularly relates to a retrieval system that executes pattern retrieval, a retrieval method of executing pattern retrieval, and a computer readable recording medium that records a retrieval program that instructs a computer to execute pattern retrieval.
2. Description of the Related Art
Pattern retrieval and structure retrieval technology refers to technology for using a computer to retrieve patterns such as characters, voices and images, and structures such as compound molecular structures and RNA secondary structures. As computerized society has become more diversified and complicated in recent years, demand for higher-speed and higher-precision retrieval has been growing.
In addition to the retrieval of a completely coincident pattern or structure, the retrieval of the most similar pattern or structure may also be demanded.
For example, in the field of information chemistry, the structure-activity issue, namely that compounds with similar structures may have similar properties, has long been researched, and in such a case there is a strong demand for efficiently classifying millions of compound molecular structures and efficiently retrieving similar structures.
In the field of pattern retrieval, a round robin retrieval method is known in which distance or similarity between patterns is defined using their features, and an unknown input pattern (hereinafter called a retrieval pattern) is estimated by comparing it with all the patterns to be searched (hereinafter called learned patterns) using that distance or similarity.
However, because calculating distance or similarity between patterns is computationally expensive, the round robin retrieval method has the serious defect that comparison with a large number of learned patterns requires considerable time.
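The round robin retrieval method described above can be sketched as follows. This is only an illustrative sketch: the function name `round_robin_search`, the representation of patterns as numeric tuples, and the use of Euclidean distance (`math.dist`) are assumptions made for the example and are not specified in the text.

```python
import math

def round_robin_search(query, learned, distance=math.dist):
    """Compare the retrieval pattern with every learned pattern.

    Returns (index, distance) of the nearest learned pattern.
    One distance computation is made per learned pattern, which is
    why the cost grows with the number of learned patterns.
    """
    best_index, best_dist = -1, math.inf
    for i, pattern in enumerate(learned):
        d = distance(query, pattern)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist
```

Because every learned pattern is visited, the result is always the true nearest pattern under the chosen distance; the drawback is purely the amount of computation.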
Therefore, a rough classification retrieval method has heretofore been widely used, in which distances between learned patterns are calculated beforehand, the learned patterns are classified into a number of clusters, and retrieval is performed using those clusters.
For example, in Japanese Published Unexamined Patent Application No. Hei 6-251156, learned patterns are classified into a number of clusters using the distance between the patterns, and the nearest cluster is acquired by comparing a retrieval pattern with the representative of each cluster while converting the feature of the retrieval pattern.
The retrieval pattern is then estimated by comparing it with all the learned patterns that belong to the acquired cluster.
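The two-stage procedure of the rough classification retrieval method can be sketched as follows. The sketch assumes that clustering has already been done and that each cluster is given as a `(representative, members)` pair of numeric tuples; the function name, this data layout and the Euclidean distance are assumptions for illustration, and the feature conversion mentioned in the cited application is not modeled.

```python
import math

def rough_classification_search(query, clusters, distance=math.dist):
    """Two-stage search over pre-built clusters.

    clusters: list of (representative, members) pairs.
    Stage 1 picks the cluster whose representative is nearest to the query.
    Stage 2 compares the query only with the members of that cluster.
    Returns (pattern, distance) for the nearest member found.
    """
    # Stage 1: one distance computation per cluster representative.
    _, members = min(clusters, key=lambda c: distance(query, c[0]))
    # Stage 2: exhaustive comparison restricted to the chosen cluster.
    best = min(members, key=lambda p: distance(query, p))
    return best, distance(query, best)
```

Only the members of one cluster are compared, so the number of distance computations is far smaller than in an exhaustive comparison, but patterns in all other clusters are never examined.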
Meanwhile, in the field of information chemistry, when a new compound is synthesized and a molecular structure similar or partially similar to the designed molecular structure is to be retrieved, questions arise as to how distance or similarity between compound molecular structures should be defined, how a compound molecular structure should be represented, how compounds should be classified, and how a similar structure should be retrieved.
In Japanese Published Unexamined Patent Application No. Hei 7-28844, the distance between structures is calculated by representing the solid structure of a substance as a point set and superimposing two solid structures. A similar structure is retrieved by narrowing down candidate structures according to geometric relationships, thereby reducing the retrieval range.
However, the above conventional rough classification retrieval method has the problem that, although retrieval is fast, retrieval precision is poor.
FIGS. 18A and 18B illustrate the precision problem of the rough classification retrieval method. FIG. 18A shows a case in which clusters overlap, and FIG. 18B shows a case in which clusters do not overlap.
Suppose that, in the retrieval shown in FIG. 18A, cluster A is first acquired as the cluster whose representative is at the minimum distance from an input pattern q (that is, D1 is less than D2, as shown in FIG. 18A).
The distance between the input pattern q and each of the plural learned patterns in cluster A is then calculated and compared, and the learned pattern a located at the minimum distance d1 is acquired as the similar pattern.
However, although the distance between the input pattern q and the cluster representative is larger for cluster B than for cluster A, the distance d2 from the input pattern q to a learned pattern b in cluster B is smaller than the distance d1. That is, the learned pattern b is actually the pattern most similar to the input pattern q.
Likewise, in the case of FIG. 18B, the distance D1 between the retrieval pattern q and the representative of cluster A is smaller than the distance D2 between the retrieval pattern q and the representative of cluster B; however, the pattern most similar to the retrieval pattern q is not the learned pattern a but the learned pattern b in cluster B (d2 is less than d1).
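The situation in which D1 is less than D2 while d2 is less than d1 can be reproduced with a small numerical example. The two-dimensional coordinates below are hypothetical and are not taken from FIGS. 18A and 18B; taking the centroid of each cluster as its representative is likewise an assumption made only for this illustration.

```python
import math

# Hypothetical coordinates chosen for illustration only.
q = (2.0, 0.0)
cluster_a = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
cluster_b = [(2.5, 0.0), (7.5, 0.0), (5.0, 1.0), (5.0, -1.0)]

def centroid(points):
    """Mean of the points, used here as the cluster representative."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

rep_a, rep_b = centroid(cluster_a), centroid(cluster_b)

# Distances from q to the cluster representatives.
D1, D2 = math.dist(q, rep_a), math.dist(q, rep_b)
# Distances from q to the nearest learned pattern in each cluster.
d1 = min(math.dist(q, p) for p in cluster_a)
d2 = min(math.dist(q, p) for p in cluster_b)

rough_method_selects_a = D1 < D2   # cluster A is chosen in the first stage
true_nearest_is_in_b = d2 < d1     # yet the nearest pattern lies in cluster B
```

With these values the representative of cluster A is nearer to q, so only cluster A is examined, while the learned pattern nearest to q belongs to cluster B and is never compared.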
Therefore, retrieval precision in the rough classification retrieval method strongly depends upon the definition and calculation method of the distance between learned patterns, the method of representing the learned patterns themselves, and the method of classifying the learned patterns into clusters, and reliable retrieval of the most similar pattern is not guaranteed.
Also, the above related art attempts to enhance precision by converting the feature of an input pattern or by utilizing the characteristics of the solid structure of a substance; in either case, however, since retrieval precision depends upon the distance, the representation method and the classification method of a pattern, the reliability of retrieval precision has not been sufficiently high.
The present invention has been made in view of these points and provides a retrieval system that executes high-speed and high-precision retrieval without depending upon the distance, the representation method or the classification method.
The present invention also provides a retrieval method that enables high-speed and high-precision retrieval without depending upon the distance, the representation method or the classification method.
Further, the present invention provides a computer readable recording medium that records a retrieval program for executing high-speed and high-precision retrieval without depending upon the distance, the representation method or the classification method.
In order to solve the above problems, the retrieval system has a retrieval dictionary generation unit that classifies learned patterns into plural clusters and generates a retrieval dictionary using the clusters, and a nearest cluster detector that utilizes the retrieval dictionary to detect, from among the clusters in a multidimensional space, the cluster nearest to an input retrieval pattern, based on the clusters located in the space between a pair of spheres whose radii are respectively smaller and larger than the distance from a central cluster, which is located near the center of the multidimensional space, to the retrieval pattern. The system also has a learned pattern detector that compares each of the learned patterns belonging to the nearest cluster with the retrieval pattern and detects a learned pattern at a predetermined distance from the retrieval pattern, a retrieval range decision unit that decides a retrieval range using the learned pattern detected by the learned pattern detector and the retrieval dictionary, and a retrieval unit that retrieves the retrieval pattern from among the learned patterns in the retrieval range.
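One possible reading of how these units could cooperate is sketched below. The text above does not specify the contents of the retrieval dictionary, the width of the space between the two spheres, or how the retrieval range is decided, so the following are all assumptions made for illustration: the dictionary stores the distance of every cluster representative and every learned pattern from a central point `center`; the shell half-width `shell_width` is a free parameter; all clusters are used if the shell is empty; the distance is Euclidean; and the retrieval range is derived from the triangle inequality, under which any learned pattern closer to the retrieval pattern than a distance d must lie at a distance from the center between r - d and r + d, where r is the distance from the center to the retrieval pattern.

```python
import bisect
import math

def build_dictionary(clusters, center):
    """Hypothetical retrieval dictionary.

    clusters: list of (representative, members) pairs of numeric tuples.
    Stores each cluster with its representative's distance from center,
    and all learned patterns sorted by their distance from center.
    """
    entries = [(math.dist(center, rep), rep, members) for rep, members in clusters]
    patterns = sorted((math.dist(center, p), p) for _, members in clusters for p in members)
    return {"center": center, "clusters": entries, "patterns": patterns}

def retrieve(query, dictionary, shell_width):
    """Return the learned pattern nearest to query (sketch, see assumptions)."""
    r = math.dist(dictionary["center"], query)

    # Nearest cluster detection: consider clusters whose representatives lie
    # between the spheres of radius r - shell_width and r + shell_width.
    shell = [c for c in dictionary["clusters"] if abs(c[0] - r) <= shell_width]
    candidates = shell or dictionary["clusters"]  # fallback is an assumption
    _, _, members = min(candidates, key=lambda c: math.dist(query, c[1]))

    # Learned pattern detection: nearest member of the nearest cluster.
    best = min(members, key=lambda p: math.dist(query, p))
    d = math.dist(query, best)

    # Retrieval range decision: by the triangle inequality, a closer pattern
    # can only have a distance from the center within [r - d, r + d].
    keys = [k for k, _ in dictionary["patterns"]]
    lo = bisect.bisect_left(keys, r - d)
    hi = bisect.bisect_right(keys, r + d)
    in_range = [p for _, p in dictionary["patterns"][lo:hi]]

    # Retrieval: exhaustive comparison restricted to the retrieval range.
    return min(in_range + [best], key=lambda p: math.dist(query, p))
```

Under these assumptions the final comparison is limited to the learned patterns in the retrieval range, yet the pattern returned is the same one an exhaustive comparison would find, because the range cannot exclude any learned pattern closer than the one already detected.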