1. Field of the Invention
The present invention relates to a retrieval system, a retrieval method, and a computer-readable recording medium storing a retrieval program. More particularly, it relates to a retrieval system that performs pattern retrieval, a method of performing pattern retrieval, and a computer-readable recording medium storing a retrieval program that causes a computer to perform pattern retrieval.
2. Description of the Related Art
Pattern retrieval and structure retrieval refer to technology for using a computer to retrieve patterns, such as characters, speech, and images, and structures, such as compound molecular structures and RNA secondary structures. As society has become more computerized, diversified, and complex in recent years, the demand for faster and more precise retrieval has grown.
In addition to retrieving a pattern or structure that matches exactly, it is often necessary to retrieve the most similar pattern or structure.
For example, in the field of information chemistry (chemoinformatics), structure-activity relationships, that is, the principle that compounds with similar structures tend to have similar properties, have long been studied. In such work there is a strong demand for efficiently classifying millions of compound molecular structures and for efficiently retrieving similar structures.
In the field of pattern retrieval, a round-robin retrieval method is known. In this method, a distance or similarity between patterns is defined in terms of their features, and an input unknown pattern (hereinafter called a retrieval pattern) is estimated by comparing it, using that distance or similarity, with every stored pattern (hereinafter called a learned pattern).
However, because computing the distance or similarity between patterns is costly, the round-robin method has the serious drawback that comparing the retrieval pattern with a large number of learned patterns takes considerable time.
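For illustration only, the round-robin method amounts to an exhaustive nearest-neighbour search. The Python sketch below assumes that patterns are feature vectors compared by Euclidean distance (the text does not fix a distance measure), and the function name `round_robin_retrieve` is hypothetical. It also counts distance evaluations to make the cost concrete: one evaluation per learned pattern, for every query.

```python
import math
from typing import Sequence, Tuple

Pattern = Tuple[float, ...]


def round_robin_retrieve(query: Pattern, learned: Sequence[Pattern]) -> Tuple[int, float, int]:
    """Compare the retrieval pattern with every learned pattern (assumed Euclidean distance).

    Returns (index of the nearest learned pattern, its distance, number of distance evaluations).
    """
    best_index, best_dist = -1, math.inf
    evaluations = 0
    for i, p in enumerate(learned):
        d = math.dist(query, p)  # one costly comparison per learned pattern
        evaluations += 1
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist, evaluations
```

For example, `round_robin_retrieve((0.9, 1.2), [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)])` selects index 2 after three evaluations; the evaluation count grows linearly with the number of learned patterns.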
For this reason, a rough classification retrieval method has been widely used, in which the distances between learned patterns are calculated in advance, the learned patterns are classified into a number of clusters, and retrieval is then performed using those clusters.
For example, in Japanese Published Unexamined Patent Application No. Hei 6-251156, learned patterns are classified into clusters according to the distances between them, and the nearest cluster is found by comparing the retrieval pattern, whose features have been converted, with the representative of each cluster.
The retrieval pattern is then estimated by comparing it with every learned pattern belonging to that nearest cluster.
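The two-stage idea can be sketched as follows, under simplifying assumptions that are not taken from the cited application: patterns are Euclidean feature vectors, cluster representatives are supplied in advance (for example by a clustering algorithm not shown), each learned pattern is assigned to its nearest representative, and the feature conversion step is omitted. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[float, ...]


def build_clusters(learned: Sequence[Pattern], reps: Sequence[Pattern]) -> Dict[int, List[int]]:
    """Offline step: assign each learned pattern to the cluster of its nearest representative."""
    clusters: Dict[int, List[int]] = {k: [] for k in range(len(reps))}
    for i, p in enumerate(learned):
        k = min(range(len(reps)), key=lambda j: math.dist(p, reps[j]))
        clusters[k].append(i)
    return clusters


def rough_classification_retrieve(
    query: Pattern,
    learned: Sequence[Pattern],
    reps: Sequence[Pattern],
    clusters: Dict[int, List[int]],
) -> Tuple[int, float]:
    """Online step: pick the nearest representative, then search only that cluster."""
    nearest = min(range(len(reps)), key=lambda j: math.dist(query, reps[j]))
    best_index, best_dist = -1, math.inf
    for i in clusters[nearest]:
        d = math.dist(query, learned[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist
```

Only the members of one cluster are compared with the query, which is where the speed-up comes from; if the selected cluster happens to be empty, the function returns `(-1, inf)`.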
In information chemistry, meanwhile, retrieving molecular structures that are similar, or partially similar, to a designed molecular structure when a new compound is to be synthesized raises several questions: how the distance or similarity between compound molecular structures should be defined, how compound molecular structures should be represented, how compounds should be classified, and how similar structures should be retrieved.
In Japanese Published Unexamined Patent Application No. Hei 7-28844, the three-dimensional structure of a substance is represented as a set of points, and the distance between two structures is calculated by superimposing them. Similar structures are retrieved by narrowing down candidate structures on the basis of their geometric relationships, thereby reducing the range to be searched.
However, the conventional rough classification retrieval method described above has the problem that, although retrieval is fast, retrieval precision is poor.
FIGS. 18A and 18B illustrate this precision problem of the rough classification retrieval method. FIG. 18A shows a case in which the clusters overlap, and FIG. 18B shows a case in which they do not.
Suppose that, in the retrieval shown in FIG. 18A, cluster A is selected first because the distance between the input pattern q and its representative is the minimum (that is, D1 < D2, as shown in FIG. 18A).
The distance between the input pattern q and each of the learned patterns in cluster A is then computed, and the learned pattern a, located at the minimum distance d1, is returned as the most similar pattern.
However, although the representative of cluster B is farther from the input pattern q than the representative of cluster A, the distance d2 between q and the learned pattern b in cluster B is smaller than d1. In other words, the learned pattern b is actually the pattern most similar to the input pattern q.
Similarly, in FIG. 18B, the distance D1 between the retrieval pattern q and the representative of cluster A is smaller than the distance D2 between q and the representative of cluster B; however, the pattern most similar to q is not the learned pattern a but the learned pattern b in cluster B (d2 < d1).
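The situation in FIG. 18B can be reproduced with a small numeric example. The coordinates below are hypothetical values chosen so that D1 = 3 < D2 = 4 while d2 = 1.5 < d1 = 2, with each representative taken as the mean of its cluster's members and Euclidean distance assumed. Rough classification selects cluster A and returns a, whereas an exhaustive comparison returns b.

```python
import math

# Hypothetical coordinates mirroring FIG. 18B; each representative is the mean of its members.
q = (0.0, 0.0)
clusters = {
    "A": {"rep": (3.0, 0.0), "members": {"a": (2.0, 0.0), "a2": (4.0, 0.0)}},
    "B": {"rep": (0.0, 4.0), "members": {"b": (0.0, 1.5), "b2": (0.0, 6.5)}},
}


def rough_result(query, clusters):
    """Nearest representative first (D1 = 3 < D2 = 4), then nearest member of that cluster."""
    name = min(clusters, key=lambda c: math.dist(query, clusters[c]["rep"]))
    members = clusters[name]["members"]
    label = min(members, key=lambda m: math.dist(query, members[m]))
    return name, label, math.dist(query, members[label])


def exhaustive_result(query, clusters):
    """Compare the query with every learned pattern in every cluster."""
    candidates = [
        (math.dist(query, p), name, label)
        for name, cluster in clusters.items()
        for label, p in cluster["members"].items()
    ]
    d, name, label = min(candidates)
    return name, label, d


print(rough_result(q, clusters))       # ('A', 'a', 2.0)
print(exhaustive_result(q, clusters))  # ('B', 'b', 1.5)
```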
Thus, the retrieval precision of the rough classification retrieval method depends strongly on how the distance between learned patterns is defined and calculated, how the learned patterns themselves are represented, and how they are classified into clusters, and retrieval of the most similar pattern is not guaranteed.
The related art described above attempts to improve precision by converting the features of the input pattern or by exploiting characteristics of the three-dimensional structure of a substance. In either case, however, retrieval precision still depends on the distance measure, the representation method, and the classification method used for the patterns, so the reliability of the retrieval precision cannot be considered sufficiently high.
The present invention has been made in view of these problems, and an object thereof is to provide a retrieval system that performs high-speed, high-precision retrieval independently of the distance measure, representation method, and classification method used for the patterns.
Another object of the present invention is to provide a retrieval method that enables high-speed, high-precision retrieval independently of the distance measure, representation method, and classification method used for the patterns.
A further object of the present invention is to provide a computer-readable recording medium storing a retrieval program for performing high-speed, high-precision retrieval independently of the distance measure, representation method, and classification method used for the patterns.
To solve the above problems, the present invention provides a retrieval system that performs pattern retrieval, comprising: a retrieval dictionary generation unit that classifies learned patterns into a plurality of clusters and generates a retrieval dictionary from the clusters; a nearest cluster detector that detects, from among the clusters stored in the retrieval dictionary, the cluster nearest to an input retrieval pattern; a learned pattern detector that compares the retrieval pattern with every learned pattern belonging to the nearest cluster and detects a learned pattern located at a predetermined distance from the retrieval pattern; a retrieval range decision unit that decides a retrieval range using the learned pattern detected by the learned pattern detector and the retrieval dictionary; and a retrieval unit that retrieves the retrieval pattern from among all the learned patterns belonging to the retrieval range.
In this system, the retrieval dictionary generation unit classifies the learned patterns into a plurality of clusters and generates the retrieval dictionary from them. The nearest cluster detector detects, from among the clusters stored in the retrieval dictionary, the cluster nearest to the input retrieval pattern. The learned pattern detector compares the retrieval pattern with every learned pattern belonging to the nearest cluster and detects a learned pattern located at the predetermined distance from the retrieval pattern. The retrieval range decision unit decides the retrieval range using the learned pattern detected by the learned pattern detector and the retrieval information. The retrieval unit retrieves the retrieval pattern from among all the learned patterns in the retrieval range.
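The text does not specify how the retrieval range decision unit uses the retrieval dictionary. One way to realize the described flow, offered here purely as an assumption and not as the patented method, is triangle-inequality pruning. The hypothetical dictionary stores each cluster's representative and radius (largest member distance); the best match in the nearest cluster supplies a bound d, interpreted here as the "predetermined distance"; and a cluster is excluded when its representative lies farther than its radius plus d from the query, because no member can then be closer than d. With a true metric distance, this sketch returns the same nearest pattern as the round-robin method while skipping distant clusters. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[float, ...]


def build_dictionary(learned: Sequence[Pattern], reps: Sequence[Pattern]) -> List[dict]:
    """Hypothetical retrieval dictionary: per cluster, its representative,
    member indices, and radius (largest distance from representative to a member)."""
    members: Dict[int, List[int]] = {k: [] for k in range(len(reps))}
    for i, p in enumerate(learned):
        k = min(range(len(reps)), key=lambda j: math.dist(p, reps[j]))
        members[k].append(i)
    dictionary = []
    for k, rep in enumerate(reps):
        radius = max((math.dist(rep, learned[i]) for i in members[k]), default=0.0)
        dictionary.append({"rep": rep, "members": members[k], "radius": radius})
    return dictionary


def retrieve(query: Pattern, learned: Sequence[Pattern], dictionary: List[dict]) -> Tuple[int, float, int]:
    """Return (index of nearest learned pattern, its distance, number of clusters searched)."""
    nonempty = [e for e in dictionary if e["members"]]
    # Nearest cluster detector: representative closest to the query.
    nearest = min(nonempty, key=lambda e: math.dist(query, e["rep"]))
    # Learned pattern detector: best match inside the nearest cluster gives the bound d.
    d = min(math.dist(query, learned[i]) for i in nearest["members"])
    # Retrieval range decision (assumed rule): by the triangle inequality a cluster
    # can hold a pattern within d only if dist(query, rep) - radius <= d.
    in_range = [nearest] + [
        e for e in nonempty
        if e is not nearest and math.dist(query, e["rep"]) - e["radius"] <= d
    ]
    # Retrieval unit: exhaustive comparison restricted to the retrieval range.
    best_dist, best_index = min(
        (math.dist(query, learned[i]), i) for e in in_range for i in e["members"]
    )
    return best_index, best_dist, len(in_range)
```

With FIG. 18B-style data (representatives (3, 0) and (0, 4), query at the origin), cluster B passes the test because 4 - 2.5 = 1.5 <= 2, so pattern b is found; a cluster far from the query fails the test and is never searched.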