The present invention is related to a resemblance retrieval apparatus for retrieving stored subject data resembling subject data, with respect to a designated retrieval subject, from a saved data group, and also to a recording medium for recording such a resemblance retrieval program.
Conventional resemblance retrieval apparatus are described in, for instance, the publication "Incremental Instance-based Learning of Independent and Graded Concept Descriptions", D. Aha, Proceedings of the Sixth International Workshop on Machine Learning, 1987, and another publication, "A Nearest Hyperrectangle Learning Method", S. Salzberg, Machine Learning, 6, pp. 251-276, 1991. FIG. 8 is a schematic block diagram showing an example of such a conventional resemblance retrieval apparatus.
In this drawing, reference numeral 1 is a subject image designating unit for designating an image of a retrieval subject; reference numeral 2 is a feature quantity extracting unit for extracting a feature quantity which quantitatively indicates the feature of the subject image designated by the subject image designating unit 1; reference numeral 4 shows an attribute input unit for inputting an attribute of the subject, other than the feature quantity related to the subject image; and reference numeral 5 represents a subject vector data forming unit for forming subject vector data in which both the feature quantity extracted by the feature quantity extracting unit 2 and the attribute input by the attribute input unit 4 are used as vector structural elements. Reference numeral 7 denotes a vector database storing a plurality of vector data formed using the feature quantities and the attributes as vector elements; reference numeral 8 is an image database for storing a plurality of images corresponding to respective subjects; reference numeral 13 is a weight vector given to each of the vector elements of the vector data to calculate the resemblance degree in a resemblance retrieval engine; and reference numeral 10 represents the resemblance retrieval engine for seeking vector data resembling the subject vector data formed in the subject vector data forming unit 5 from the plurality of vector data stored in the vector database 7.
Further, reference numeral 11 shows a retrieval result display unit for displaying the resemblance vector data retrieved by the resemblance retrieval engine 10, and also an image corresponding to this vector data; reference numeral 14 indicates an answer instructing unit for determining whether both the resemblance vector data designated by the resemblance retrieval engine 10 and the image are correct; reference numeral 15 shows a weight vector updating unit for updating the weight vector 13 based on the result determined by the answer instructing unit 14; and reference numeral 12 indicates a new data adding unit for newly entering vector data and images in the vector database 7 and the image database 8, respectively.
The operation will now be explained. For stored medical information, such as electronic medical diagnostic data and a medical image database, and for stored design information, such as design drawings, the following resemblance retrieval technique for vector data is applied when data suitable for a new purpose are to be selected. The stored data are rearranged as vector data and held in the database. Then, the degree of resemblance is calculated between the sought vector data, which express the new purpose, and each of the vector data saved in the database. The data in the database that most closely resemble the sought vector data are found.
One example of such a purpose is aiding diagnoses of pathological tissue. In such a case, a stored pathological tissue image resembling the pathological tissue image under examination is retrieved. Pathological tissue diagnosis is the diagnosis of a disease by observing biological tissue. It is mainly carried out to determine whether a tumor must be removed and to determine the sort of tumor.
FIG. 9 is a flow chart for describing the operation of the conventional resemblance retrieval apparatus. First, the subject image designating unit 1 designates a subject image for which a resemblance retrieval is to be performed, for example, a pathological tissue image to be examined (step ST1). Next, in the feature quantity extracting unit 2, a feature quantity for quantitatively expressing a feature of the designated subject image is extracted from the subject image (step ST2). Subsequently, in the attribute input unit 4, an attribute of the subject image designated by the subject image designating unit 1 is input (step ST4). Examples of attributes of the subject image include patient name, patient ID, image ID, dimension of tumor, age of the patient, diagnosis title, and the like. It should be noted that since the diagnosis title is not yet determined at this stage, no diagnosis title is input. Subject vector data are produced using both the feature quantity extracted by the feature quantity extracting unit 2 and the attribute input into the attribute input unit 4 as vector elements (step ST101). FIG. 10 shows an example of subject vector data. Vector data having a high degree of resemblance to the subject vector data are retrieved from the vector database 7 by the resemblance retrieval engine 10, employing the weight vector 13 (step ST102).
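Step ST101 can be illustrated with a minimal Python sketch. The function name `form_subject_vector`, the attribute keys in `ATTRIBUTE_ORDER`, and the sample values are hypothetical and chosen only for illustration; the source does not specify the element layout beyond stating that feature quantities and attributes are both used as vector elements and that the diagnosis title is not yet input.

```python
# Hypothetical attribute layout; the source lists these attributes only as examples.
ATTRIBUTE_ORDER = ("patient_id", "image_id", "tumor_size", "age", "diagnosis_title")


def form_subject_vector(features, attributes):
    """Concatenate extracted feature quantities and input attributes (step ST101)."""
    vector = [float(f) for f in features]
    for name in ATTRIBUTE_ORDER:
        # An attribute that has not been input, such as the diagnosis title
        # at this stage, is stored as None.
        vector.append(attributes.get(name))
    return vector


subject = form_subject_vector(
    [0.42, 13.0],  # illustrative feature quantities from the extracting unit
    {"patient_id": "P001", "image_id": "I17", "tumor_size": 2.5, "age": 54},
)
```

In this sketch the last element of `subject` stays `None` until the user decides the diagnosis title.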
In other words, assuming that the dimension (namely, the number of elements) of the vector data is selected to be "n", the subject vector data is X=(x1, x2, . . . , xn), and vector data stored in the vector database is Y=(y1, y2, . . . , yn). These data are used to calculate a degree of resemblance between the subject vector data and database vector data. The weight vector is W=(w1, w2, . . . , wn) and a resemblance degree sim(X, Y) between the vector data X and the vector data Y is calculated based on the following formula: ##EQU1##
where $\delta(x_i, y_i)$ is defined as:

$$
\delta(x_i, y_i) =
\begin{cases}
\dfrac{x_i - y_i}{\text{section width of attribute } i}, & \text{when attribute } i \text{ has a continuous value} \\[2ex]
0, & \text{when attribute } i \text{ has a discrete value and } x_i = y_i \\[1ex]
1, & \text{when attribute } i \text{ has a discrete value and } x_i \neq y_i
\end{cases}
\tag{2}
$$
(It should be noted that the section width of the attribute "i" is equal to the absolute value of the difference between the maximum value of the attribute "i" and the minimum value thereof.)
In other words, the resemblance degree sim(X, Y) is equal to the weighted distance between X and Y with its sign inverted.
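The resemblance degree can be sketched in Python as follows. The per-element difference `delta` follows the definition given in the text. The exact form of the weighted distance is not reproduced in the text, so the Euclidean-style combination used in `sim` (square root of the sum of squared weighted differences) is an assumption made only for illustration, as are the function names, the zero-width guard, and the sample values.

```python
import math


def delta(x, y, continuous, width):
    """Per-element difference as defined in the text."""
    if continuous:
        # Guard for a zero section width is an added assumption, not from the text.
        return (x - y) / width if width else 0.0
    return 0.0 if x == y else 1.0


def sim(x, y, w, continuous, widths):
    """Negated weighted distance between X and Y.

    ASSUMPTION: a Euclidean-style distance is used here; the source only
    states that sim(X, Y) is the weighted distance with its sign inverted.
    """
    total = 0.0
    for xi, yi, wi, is_cont, width in zip(x, y, w, continuous, widths):
        total += (wi * delta(xi, yi, is_cont, width)) ** 2
    return -math.sqrt(total)


# Illustrative vectors: two continuous elements and one discrete element.
X = [0.2, 50.0, "A"]
Y = [0.6, 40.0, "B"]
W = [1.0, 1.0, 1.0]
CONTINUOUS = [True, True, False]
WIDTHS = [1.0, 100.0, None]  # section width = max - min of each continuous attribute

score = sim(X, Y, W, CONTINUOUS, WIDTHS)
```

Under this assumption, identical vectors give a resemblance degree of zero, and every other pair gives a negative value, so a larger value means a closer resemblance.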
As previously described, when all of the vector data stored in the vector database 7 are set as Y, the degree of resemblance sim(X, Y) between these vector data and subject vector data is calculated. The maximum, or highest, degree of resemblance is selected to retrieve the vector data most closely resembling the subject vector data. When several vector data have the same maximum degree of resemblance, any one of these vector data may be selected. For instance, the first selected vector data may be employed or a selection from these vector data having the maximum degree of resemblance may be made at random.
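The selection described above can be sketched as a ranking over the whole database. In this Python sketch, `retrieve` and `toy_similarity` are hypothetical names; the similarity function is passed in as a parameter, and the toy function shown is only a stand-in for the resemblance degree. Ties are resolved by taking the vector stored first, which is one of the options the text permits.

```python
def retrieve(subject, database, similarity, k=1):
    """Return indices of the k stored vectors most resembling the subject."""
    scored = [(similarity(subject, y), idx) for idx, y in enumerate(database)]
    # Highest resemblance first; for equal resemblance, the earlier index wins.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def toy_similarity(x, y):
    """Stand-in similarity: negated sum of absolute differences (illustrative only)."""
    return -sum(abs(a - b) for a, b in zip(x, y))


database = [[1.0, 2.0], [0.0, 0.0], [1.0, 2.0]]  # entries 0 and 2 tie
best = retrieve([1.0, 2.1], database, toy_similarity)
ranking = retrieve([1.0, 2.1], database, toy_similarity, k=3)
```

Passing a larger `k` yields a list ordered by degree of resemblance.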
In the retrieval result display unit 11, both a portion of the attributes of the retrieved resemblance vector data and an image corresponding to the resemblance vector data, among the images in the image database 8, are displayed (step ST103). FIG. 11 represents an example of a display screen in which six sets of resemblance images, including images, patient IDs, and diagnosis titles are displayed in the order of degree of resemblance. A user compares the subject image with the images displayed as a retrieval result. Then, the user determines which retrieved image truly resembles the subject image. There is a high possibility that the subject image is relevant to the diagnosis title corresponding to this resemblance image. As a result, display of the resemblance vector data and the resemblance image with respect to the subject image may give very important reference information, aiding a pathological tissue diagnosis.
Next, the user makes a decision concerning the diagnosis title with reference to the images displayed on the retrieval result display unit 11. If this diagnosis title is coincident with the diagnosis title of the resemblance vector data having the highest degree of resemblance, then the user chooses "CORRECT" in the answer instructing unit 14. Conversely, if this diagnosis title is not coincident with the diagnosis title of the resemblance vector data having the highest degree of resemblance, then the user chooses "INCORRECT" in the answer instructing unit 14 (step ST104).
Subsequently, the weight vector 13 is updated in the weight vector updating unit 15 based upon the vector data selected as the most closely resembling vector data by the resemblance retrieval engine 10, the subject vector data, and the user's designation, via the answer instructing unit 14, as to whether this retrieval result is "CORRECT" (step ST105).
Next, the subject vector data, with its diagnosis title attribute set to the diagnosis title determined by the user, is added to the vector database 7, and the subject image is added to the image database 8 by the new data adding unit 12 (step ST106).
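The text names the inputs to the weight update (the retrieved vector data, the subject vector data, and the CORRECT/INCORRECT answer) but does not state the update rule itself. The following Python sketch is therefore purely hypothetical: the function `update_weights`, the `rate` parameter, and the rule of rewarding agreeing elements after a correct retrieval and penalizing them after an incorrect one are all assumptions chosen only to show how those three inputs could drive an update.

```python
def update_weights(weights, deltas, correct, rate=0.1):
    """Hypothetical weight update; NOT the rule of the cited apparatus.

    weights: current weight vector
    deltas:  per-element differences between subject and retrieved vector
    correct: True if the user answered "CORRECT", False for "INCORRECT"
    """
    updated = []
    for w, d in zip(weights, deltas):
        # 1.0 when the elements match, 0.0 when they differ maximally.
        agreement = 1.0 - min(abs(d), 1.0)
        # Positive step for agreeing elements, negative for differing ones.
        step = rate * (agreement - 0.5)
        w = w + step if correct else w - step
        updated.append(max(w, 0.0))  # keep weights non-negative (assumption)
    return updated


after_correct = update_weights([1.0, 1.0], [0.0, 1.0], correct=True)
after_incorrect = update_weights([1.0, 1.0], [0.0, 1.0], correct=False)
```

With this illustrative rule, an element that matched in a correct retrieval gains weight, while the same element loses weight if the retrieval was judged incorrect.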
Since the conventional resemblance retrieval apparatus is arranged as described, when this resemblance retrieval apparatus is utilized in pathological tissue diagnosis, there is a risk that an image with the correct diagnosis title may be dropped from the images displayed as potentially resembling images. For example, when the resemblance retrieval results are displayed as shown in FIG. 11, resemblance images having the diagnosis titles of "tumor 1", "tumor 2", "tumor 3", and "tumor 7" are displayed. There is no problem when the correct diagnosis title of the subject image is one of these displayed diagnosis titles. However, if the correct diagnosis title of the subject image is a diagnosis title that is not displayed, for example, "tumor 4" or "tumor 5", there is a risk that the user may make a mistaken diagnosis because the user cannot recognize the correct diagnosis title.
There is another problem. Since only one sort of weight vector is used, it is practically difficult to greatly improve resemblance retrieval precision even when the weight vector is optimized. For example, when this conventional resemblance retrieval apparatus is used in pathological tissue diagnosis, the optimum weight vector for retrieving a resemblance image from images with one diagnosis title, such as "tumor 1", generally differs from the optimum weight vector for retrieving a resemblance image from images with another diagnosis title, such as "tumor 2". The weights differ because the degree of importance of the respective elements of the feature quantities, when the degrees of resemblance are measured, is different for "tumor 1" and "tumor 2". The optimum weight vector for retrieving a resemblance image from combined images having different diagnosis titles, such as "tumor 1" and "tumor 2", becomes an intermediate weight vector between the optimum weight vector for "tumor 1" and the optimum weight vector for "tumor 2". As a result, for example, the resemblance retrieval precision obtained when a resemblance image is retrieved from the images of "tumor 1" using the intermediate weight vector is lower than the precision obtained using the optimum weight vector for "tumor 1". As described above, when resemblance image retrieval employs one sort of weight vector for all of the images, covering all of the diagnosis titles, it is difficult to achieve the same effect as when the weight vector is optimized for each diagnosis title.
Also, when only feature quantities extracted from the image are used as the image-related portion of the subject vector data, a feature quantity that is difficult to extract cannot be used. Furthermore, a feature of an image that is determined by the user's own judgment cannot be utilized in the resemblance image retrieval operation.