(1) Field of the Invention
The present invention relates to a rule discovery program, a rule discovery process, and a rule discovery apparatus for discovering a relationship between a plurality of information items, and in particular a relationship between multimedia data items and text data items respectively associated with the multimedia data items.
(2) Description of the Related Art
Currently, situation analysis based on multimedia data is used in various fields. For example, whether a component (part) used in manufacturing is satisfactory or unsatisfactory can be determined by capturing and analyzing an image of the component. Situation estimation based on multimedia data of this kind (e.g., diagnosis of diseases) is currently used in a wide range of fields, including analysis of static and moving images, analysis of volume data, analysis of time-series information, fluid analysis, performance analysis of mechanical components, diagnosis from medical images, clarification of brain functions, and market analysis.
When situation analysis is performed based on multimedia data, the multimedia data can be analyzed more efficiently if the relationship between a property of the object represented by the multimedia data and the information conveyed by the multimedia data is expressed as a rule (scientific rule). For example, when a photographic image of a component is available and it is known which region of the image should be examined in determining whether the component is satisfactory or unsatisfactory, that determination can easily be made from the photographic image.
Therefore, a device (e.g., an image mining device) that supports the discovery of knowledge about the relationship between multimedia data and text data representing a property of the object is needed. Such a device must determine the portion of the multimedia data that has a strong correlation with the text data (which indicates, for example, whether the component is satisfactory or unsatisfactory).
In this case, a feature portion of the multimedia data can be taken as the portion having a strong correlation with the text data. Many methods are known for extracting a predetermined feature (e.g., an image feature when the multimedia data represents an image) from multimedia data. However, because there is a myriad of image features, ranging from relatively general features such as color to field-specific features such as the shapes of particular portions of an image, it is difficult to designate in advance an appropriate image feature (one having a strong correlation with the text data).
In view of the above circumstances, a method has been proposed that supports an operator in extracting image features by visual observation, in processing for discovering knowledge (rules) about the relationship between image features and text features from a plurality of pairs of image data items and character data items (text data items) associated with the image data items. In this method, an association rule indicating a relationship between images and texts (i.e., a rule indicating the strength of association between events) can be set, and an evaluation result of the association rule can be displayed (as disclosed in, for example, Japanese Unexamined Patent Publication No. 2003-67401).
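Association rules are conventionally evaluated with measures such as support and confidence. The following Python sketch computes both for a rule of the form "image feature => text feature" over a few hypothetical data items. The item labels (`dark_spot`, `defective`, `good`), the choice of these two measures, and the function name `rule_metrics` are illustrative assumptions; they are not taken from Japanese Unexamined Patent Publication No. 2003-67401.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the association rule antecedent => consequent.

    records: list of sets, one per data item, holding that item's
    (hypothetical) image-feature labels and text-feature labels.
    """
    n = len(records)
    # Items containing every antecedent label.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Items containing every antecedent label and every consequent label.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical data: an operator-assigned image feature plus a text label per item.
records = [
    {"dark_spot", "defective"},
    {"dark_spot", "defective"},
    {"dark_spot", "good"},
    {"good"},
    {"defective"},
]
support, confidence = rule_metrics(records, {"dark_spot"}, {"defective"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here the rule holds for two of the five items (support 0.40) and for two of the three items showing the image feature (confidence about 0.67). Note that the image-feature labels themselves must still be supplied by a person, which is the limitation discussed next.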
However, in the technique disclosed in Japanese Unexamined Patent Publication No. 2003-67401, image features are determined manually, which causes the following problems.
(a) Labor costs increase.
(b) Discovered rules may depend on an individual's subjective viewpoint.
(c) Rules that are difficult to discover may be overlooked.
To solve the above problems, Japanese Patent Application No. 2003-433233, filed by the assignee of the present application, discloses a technique in which wavelet transformation is performed on an image, and the coefficients from which a feature of the text data can be determined are extracted from among the coefficients generated by the wavelet transformation.
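As a rough illustration of this kind of per-coefficient screening, the following Python sketch applies a multi-level Haar wavelet transform to each data item and keeps the coefficients whose Pearson correlation with a binary text label reaches a threshold. The 1-D signals (standing in for images), the Haar basis, Pearson correlation as the selection measure, the threshold of 0.8, and the helper names `haar_1d`, `pearson`, and `screen_coefficients` are all assumptions for illustration; this is not the method of the cited application.

```python
from statistics import mean, pstdev


def haar_1d(signal):
    """Multi-level 1-D Haar transform using pairwise averages and half-differences.

    Returns [overall average, coarsest detail, ..., finest details].
    The length of `signal` must be a power of two.
    """
    data = [float(x) for x in signal]
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    length = n
    while length > 1:
        half = length // 2
        avgs = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
        data[:length] = avgs + diffs  # averages first, details after
        length = half
    return data


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def screen_coefficients(signals, labels, threshold=0.8):
    """Return (index, r) for each coefficient whose |r| with the labels >= threshold."""
    transformed = [haar_1d(s) for s in signals]
    selected = []
    for k in range(len(transformed[0])):
        column = [t[k] for t in transformed]  # coefficient k across all data items
        r = pearson(column, labels)
        if abs(r) >= threshold:
            selected.append((k, r))
    return selected


# Hypothetical data: four 1-D "images" and a binary text label (1 = unsatisfactory).
signals = [[1, 1, 5, 1], [1, 1, 1, 1], [2, 2, 6, 2], [2, 2, 2, 2]]
labels = [1, 0, 1, 0]
print(screen_coefficients(signals, labels))
```

With this hypothetical data only coefficients 1 and 3 are selected. Each coefficient is tested in isolation, so a correlation that appears only when several coefficients are considered together is never examined.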
Nevertheless, the technique disclosed in Japanese Patent Application No. 2003-433233 analyzes only the relationship between each individual coefficient and the text data. Therefore, even when the text data is strongly correlated with a combination of data portions located at a plurality of separate positions in the image, no rule indicating that correlation can be extracted. For example, in some cases, even when neither the data in a region A nor the data in a region B alone shows a significant correlation with the text data, the sum of the data in the regions A and B can be strongly correlated with the text data.
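The region A and region B case can be made concrete with synthetic numbers. In the following Python sketch, each region's value is the text label plus or minus a large shared component that is itself uncorrelated with the label. Pearson correlation as the measure, the four synthetic data items, and the `pearson` helper are assumptions for illustration only.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


# Hypothetical binary text label (1 = unsatisfactory) for four data items.
label = [1, 0, 1, 0]
# Hypothetical large component shared by both regions, uncorrelated with the label.
noise = [5, 5, -5, -5]
region_a = [y + n for y, n in zip(label, noise)]  # [6, 5, -4, -5]
region_b = [y - n for y, n in zip(label, noise)]  # [-4, -5, 6, 5]
region_sum = [a + b for a, b in zip(region_a, region_b)]  # equals 2 * label

r_a = pearson(region_a, label)
r_b = pearson(region_b, label)
r_sum = pearson(region_sum, label)
print(f"r(A)={r_a:.3f}  r(B)={r_b:.3f}  r(A+B)={r_sum:.3f}")
```

Each region alone correlates with the label at only about 0.10, because the shared component dominates its variance, whereas in the sum that component cancels and the correlation is 1. Screening regions (or coefficients) one at a time would discard both.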
In addition, each multimedia data item is generally constituted by a plurality of data elements (for example, the brightness of each pixel of an image). Therefore, if each data element is treated as an individual variable, the number of possible combinations of the variables becomes extremely large. Consequently, it is difficult to accurately determine, in real time, the relationship between the text data and these combinations by exhaustively examining every one of them.
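To give a sense of the scale involved, the following Python sketch counts candidate combinations of data elements, assuming a hypothetical 64 × 64-pixel grayscale image whose 4,096 pixel brightnesses are each treated as one variable. The image size, the cap on combination size, and the function name `count_combinations` are illustrative assumptions.

```python
import math


def count_combinations(n_elements, max_size):
    """Number of non-empty subsets of n_elements variables with at most max_size members."""
    return sum(math.comb(n_elements, k) for k in range(1, max_size + 1))


# Hypothetical 64 x 64 grayscale image: each pixel brightness is one variable.
n_pixels = 64 * 64
for size in (1, 2, 3):
    print(f"subsets of size <= {size}: {count_combinations(n_pixels, size):,}")

# All non-empty subsets number 2**n - 1, far too many to enumerate.
print(f"all non-empty subsets: a {len(str(2 ** n_pixels - 1))}-digit number")
```

Even pairs of pixels alone give 8,386,560 candidates for such a small image, and the count of all subsets has more than a thousand decimal digits, so exhaustive examination is impractical.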