In recent years, with the introduction of information technology (IT) in various fields, the digitization of data on materials in the natural world, social phenomena, human behavior, and the like has progressed. Against this background, data mining techniques that detect frequently appearing patterns in large amounts of accumulated data and put the detected patterns to effective use for business and scientific purposes are attracting attention.
The following documents are considered:
Non-patent Document 1
Alberts, B., Bray, D., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P., Translation Supervisors: Nakamura Keiko, Fujiyama Asao and Matubara Kenichi. Essential Cell Biology. Nankodo.
Non-patent Document 2
Asai Tatuya, Abe Kenji, Kawazoe Shinji, Hiroki Arimura, and Setuo Arikawa. Efficient search for partial structure pattern for semistructured data mining. Technical Report from Data Engineering Technical Group in the Institute of Electronics, Information and Communication Engineers, Vol. 101, No. 342, 1-8.
Non-patent Document 3
Cook, D. J., & Holder, L. B. (1994). Substructure Discovery Using Minimum Description Length and Background Knowledge. Journal of Artificial Intelligence Research, Vol. 1, (pp. 231-255).
Non-patent Document 4
Dehaspe, L., Toivonen, H., & King, R. D. (1998). Finding frequent substructures in chemical compounds. Proc. of the 4th KDD, (pp. 30-36).
Non-patent Document 5
De Raedt, L., & Kramer, S. (2001). The Levelwise Version Space Algorithm and its Application to Molecular Fragment Finding. Proc. of the 17th IJCAI, (pp. 853-859).
Non-patent Document 6
AIDS Antiviral Screen, http://dtp.nci.nih.gov/docs/aids/aids_data.html
Non-patent Document 7
Inokuchi, A., Washio, T., & Motoda, H. (2000). An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. Proc. of the 4th PKDD, (pp. 12-23).
Non-patent Document 8
Inokuchi, A., Washio, T., Nishimura, Y., & Motoda, H. A Fast Algorithm for Mining Frequent Connected Subgraphs. IBM Research Report, RT0448 (February, 2002).
Non-patent Document 9
Inokuchi, Akihiro, Washio Takashi, Nishimura Yoshio, and Motoda Hiroshi. Method of extracting connected frequent graphs from graph-structured data. The 16th Annual Conference of the Japanese Society for Artificial Intelligence, 1 A3-03, (2002).
Non-patent Document 10
Inokuchi, Akihiro, Washio Takashi, Nishimura Yoshio, and Motoda Hiroshi. Data mining on HIV data. The 58th Special Interest Group on Knowledge Base System, (2002).
Non-patent Document 11
Kramer, S., De Raedt, L., & Helma, C. (2001). Molecular Feature Mining in HIV Data. Proc. of the 7th International Conference on Knowledge Discovery and Data Mining, (pp. 136-143).
Non-patent Document 12
Kuramochi, M., & Karypis, G. (2001). Frequent Subgraph Discovery. Proc. of the 1st ICDM.
Non-patent Document 13
Kuramochi, M., & Karypis, G. Discovering Frequent Geometric Subgraphs. Technical Report 02-024, 2002.
Non-patent Document 14
Matsuda, T., Horiuchi, T., Motoda, H., & Washio, T. (2000). Extension of Graph-Based Induction for General Graph Structured Data. Proc. of the 4th PAKDD, (pp. 420-431).
Non-patent Document 15
Matsumoto Takatoshi and Tanabe Kazutoshi. Prediction of Carcinogenicity of Chlorine-containing Organic Compound by Neural Network. JCPE Journal, Vol. 11, No. 1, 29-34 (1999)
Non-patent Document 16
Matsuzawa, H., & Fukuda, T., Mining Structured Association Patterns from Databases. Proc. of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining.
Non-patent Document 17
Miyahara, T., Uchida, T., Shoudai, T., Kuboyama, T., Takahashi, K., & Ueda, H. (2001). Discovery of Frequent Tree Structured Patterns in Semistructured Data. Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining, (pp. 1-10).
Non-patent Document 18
Morimoto Yasuhiko. Algorithm for counting frequent sets from spatial database. The 2nd Data Mining Workshop, pp. 1-10.
Non-patent Document 19
Morishita, S., & Sese, J. (2000). Traversing Lattice Itemset with Statistical Metric Pruning. Proc. of PODS 2000.
Non-patent Document 20
Motoda, H., & Yoshida, K. (1997). Machine Learning Techniques to Make Computers Easier to Use. Proc. of the 15th IJCAI, Vol. 2, (pp. 1622-1631).
Non-patent Document 21
Wang, X., Wang, J., Shasha, D., Shapiro, B., Dikshitulu, S., Rigoutsos, I., & Zhang, K. Automated Discovery of Active Motifs in Three Dimensional Molecules. Proc. of the 3rd International Conference on KDD. pp. 89-95. (1997)
Non-patent Document 22
Wang, X., Wang, J., Shasha, D., Shapiro, B., Rigoutsos, I., & Zhang, K. Finding Patterns in Three-dimensional Graphs: Algorithms and Applications to Scientific Data Mining. IEEE Transactions on Knowledge and Data Engineering, Vol. 14 No. 4 pp. 731-749. (2002)
Non-patent Document 23
Yoshida, K., & Motoda, H. (1995). CLIP: Concept Learning from Inference Patterns. Artificial Intelligence, Vol. 75, No. 1, pp. 63-92.
Non-patent Document 24
Zaki, M. Efficiently Mining Frequent Trees in a Forest. Proc. of the 8th International Conference on KDD.
A method of detecting a frequently appearing pattern from relations stored in a relational table, or from a typical log such as POS transactions, has been proposed (see non-patent document 18).
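As an illustration of what detecting a frequently appearing pattern from a transaction log means, the following is a minimal sketch in Python. It is not the method of any cited document. The function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the toy transactions are all assumptions introduced here for illustration. The sketch simply counts, by brute force, how many transactions contain each small set of items and keeps the sets that reach a minimum count.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Hypothetical helper: brute-force counting, suitable only for small examples.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy POS-style log (assumed data, not taken from any cited document).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent_at_3 = frequent_itemsets(transactions, min_support=3)
frequent_at_2 = frequent_itemsets(transactions, min_support=2)
```

With a minimum count of 3, only the single items survive in this toy log; lowering the minimum count to 2 also admits every pair of items.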
A method of detecting a frequently appearing pattern from graph-structured or tree-structured data, and not only from a typical log, has also been proposed (see non-patent documents 4, 5, 7, 8, 9, and 12 for techniques for data mining on graph-structured data, and non-patent documents 2, 16, and 24 for techniques for data mining on tree-structured data).
A data mining technique of detecting a frequently appearing pattern from tree-structured or graph-structured data can find applications in various fields, e.g., pattern detection from the molecular structure of chemical materials, from the results of syntax analysis of a natural language, and from the modification (dependency) structure of words in a natural language.
For additional background, see other related non-patent documents 1, 3, 6, 10, 11, 13, 14, 15, 17, 18, 19, 20, 21, 22, and 23.
The present invention addresses problems with the techniques described above. The conventional techniques are directed to detecting a single frequently appearing pattern in a group of data satisfying a predetermined condition. They yield, for example, a finding that data including a given frequently appearing pattern tends to satisfy the predetermined condition. In some cases, however, a more suitable finding is required, depending on the kind of data to be processed and other factors.
For example, in the field of chemistry, novel chemical materials are synthesized one after another for use as chemicals that support people's daily lives and health. At the same time, the side effects of such chemical materials are a concern. There is therefore a need to evaluate the hazardousness of chemical materials, e.g., their degradability and accumulability under natural environmental conditions, including the air, water, and soil, and their accumulability, condensability, etc., inside living organisms. However, experimental evaluation of the hazardousness of chemical materials takes many years and is costly.
If the effectiveness and hazardousness of chemical materials can be evaluated without experiments, the time and cost can be greatly reduced (see non-patent document 15). The conventional data mining techniques make it possible to detect, separately, patterns considered to be factors in the effectiveness of a chemical material and patterns considered to be factors in its hazardousness. With any of the conventional techniques, however, it is difficult to perform detection suitably under a predetermined combination of conditions, e.g., to detect a chemical material that has a certain degree of effectiveness and a low degree of hazardousness.
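To make the notion of a combination of conditions concrete, the following is a minimal sketch in Python. It is not the method of the present invention and not a method of any cited document. It assumes, purely for illustration, that each chemical material has already been reduced to a set of hypothetical fragment labels, so it ignores graph structure entirely. The names `support` and `select_patterns`, the thresholds, and the toy data are likewise assumptions. The sketch keeps the fragments that appear often among materials labeled effective and rarely among materials labeled hazardous.

```python
def support(pattern, records):
    """Fraction of records whose fragment set contains the pattern."""
    if not records:
        return 0.0
    return sum(1 for fragments in records if pattern in fragments) / len(records)


def select_patterns(effective, hazardous, min_effective, max_hazardous):
    """Return fragments frequent in the effective group and rare in the hazardous group.

    Hypothetical helper: candidates are the fragments seen in the effective group.
    """
    candidates = set().union(*effective) if effective else set()
    return sorted(
        p
        for p in candidates
        if support(p, effective) >= min_effective
        and support(p, hazardous) <= max_hazardous
    )


# Toy data (assumed): each set lists fragment labels of one material.
effective = [{"OH", "C=O"}, {"OH", "Cl"}, {"OH", "C=O", "Cl"}]
hazardous = [{"Cl"}, {"Cl", "C=O"}, {"OH", "Cl"}]

selected = select_patterns(effective, hazardous, min_effective=0.6, max_hazardous=0.4)
```

In this toy data the fragment "Cl" is frequent in both groups and is therefore rejected, while "OH" and "C=O" satisfy both conditions. Real molecular data would require matching subgraphs rather than testing set membership, which is where the difficulty noted above arises.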