1. Field of the Invention
The present invention relates to a frequent pattern mining apparatus, frequent pattern mining method, and program and recording medium therefor. More particularly, it relates to a frequent pattern mining apparatus, frequent pattern mining method, and program and recording medium therefor which mine frequent subgraphs contained in many of the objects having predetermined characteristics, based on analysis results indicating whether each analyzed object, represented by a graph structure, has the predetermined characteristics.
2. Background Art
Recently, with the introduction of IT (Information Technology) into various fields, data on substances found in nature, social phenomena, human behavior, etc. has been converted into electronic form. Against this background, attention has been paid to data mining techniques which detect frequent patterns in large volumes of stored data and put the detected patterns to effective use in business and science. Methods have been proposed which detect frequent patterns in fixed-format logs, such as relations or POS transactions stored in relational tables (see Non-patent Document 10). One example of such frequent pattern detection methods is basket analysis (see Non-patent Document 2). Proposed basket analysis methods include those which mine association rules or frequent item sets when each item contained in a transaction has a conceptual hierarchy (see Non-patent Documents 5 and 11).
On the other hand, methods have been proposed which detect frequent patterns not only in fixed-format logs, but also in graph- or tree-structured data. Data mining techniques for graph-structured data include WARMR (see Non-patent Document 3), AGM (see Non-patent Documents 6 and 7), FSG (see Non-patent Document 8), MolFea (see Non-patent Document 4), etc. Data mining techniques for tree-structured data include the method of Non-patent Document 9, FREQT (see Non-patent Document 1), TreeMiner (see Non-patent Document 12), etc. Also, Non-patent Document 13 proposes a method for mining frequent patterns in sequential data when a conceptual hierarchy is provided for each vertex.
Data mining techniques for detecting frequent patterns in graph-structured or tree-structured data can be applied to various fields including molecular structures of chemical substances, results of natural language parsing, and modification structures in a natural language.
Data mining techniques are described generally in, for example:
1. Tatsuya Asai, Kenji Abe, Shinji Kawasoe, Hiroki Arimura, Hiroshi Sakamoto, Setsuo Arikawa: Efficient Substructure Discovery from Large Semi-structured Data. Proc. of the Second SIAM International Conference on Data Mining (SDM2002), pp. 158-174, 2002.
2. Agrawal, R., & Srikant, R.: Fast Algorithm for Mining Association Rules in Large Databases. Proc. of the 20th VLDB, pp. 487-499, 1994.
3. Dehaspe, L., Toivonen, H., & King, R. D.: Finding frequent substructures in chemical compounds. Proc. of the 4th KDD, pp. 30-36, 1998.
4. De Raedt, L., & Kramer, S.: The Levelwise Version Space Algorithm and its Application to Molecular Fragment Finding. Proc. of the 17th IJCAI, pp. 853-859, 2001.
5. Han, J., & Fu, Y.: Discovery of Multiple-Level Association Rules from Large Databases. Proc. of VLDB conference, pp. 420-431, 1995.
6. Inokuchi, A., Washio, T., & Motoda, H.: An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. Proc. of the 4th PKDD, pp. 12-23, 2000.
7. Inokuchi, A., Washio, T., Nishimura, Y., & Motoda, H.: A Fast Algorithm for Mining Frequent Connected Subgraphs. IBM Research Report, RT0448, February, 2002.
8. Kuramochi, M., & Karypis, G.: Frequent Subgraph Discovery. Proc. of the 1st ICDM, 2001.
9. Matsuzawa, H., & Fukuda, T.: Mining Structured Association Patterns from Databases. Proc. of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining.
10. Morimoto, Y.: Algorithm for Counting Frequent Sets in a Space Database. 2nd Data Mining Workshop, pp. 1-10.
11. Srikant, R., & Agrawal, R.: Mining Generalized Association Rules. Proc. of VLDB conference, pp. 407-419, 1995.
12. Zaki, M.: Efficiently Mining Frequent Trees in a Forest. Proc. of the 8th International Conference on KDD.
13. Srikant, R., & Agrawal, R.: Mining Sequential Patterns: Generalizations And Performance Improvements. Proc. 5th Int. Conf. Extending Database Technology, pp. 3-17, 1996.
The graph mining and tree mining described above are techniques which extend the targets of basket analysis to graph-structured data. Specifically, the vertices and edges of a graph are made to correspond to items, while vertex labels and edge labels are made to correspond to item types. When frequent patterns are mined with a conceptual hierarchy introduced into the labels, the following problem arises.
Graph data, in which two or more vertices or edges may carry the same label, contains far more frequent patterns and candidates than an item set. Furthermore, if patterns that mix labels represented by superordinate concepts with labels represented by subordinate concepts are generated as candidates for frequent patterns, many candidates that match the same subgraph are generated, each obtained by replacing a different vertex label or edge label with a superordinate concept. The number of frequent pattern candidates thus becomes so large that the mining is impractical to carry out.
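The growth in candidates described above can be illustrated with a minimal sketch. The label taxonomy, the function names, and the example path subgraph below are hypothetical and are not taken from the cited documents. The sketch only counts the ways in which each label of a single subgraph can be kept or replaced with a superordinate concept, and it ignores duplicates that arise from graph symmetry.

```python
from itertools import product

# Hypothetical label taxonomy (child -> parent); an assumption for illustration.
PARENT = {
    "C": "atom",
    "N": "atom",
    "O": "atom",
    "single": "bond",
    "double": "bond",
}

def generalizations(label):
    """Return the label followed by all of its ancestors in PARENT."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def generalized_candidates(labels):
    """Enumerate every way to keep each label or replace it with an ancestor."""
    return list(product(*(generalizations(label) for label in labels)))

# A path subgraph C-N-O, written as alternating vertex and edge labels.
labels = ["C", "single", "N", "double", "O"]
candidates = generalized_candidates(labels)

# Each of the 5 labels has 2 choices (itself or its parent), giving 2**5 candidates.
print(len(candidates))  # 32
```

Even with a one-level hierarchy, a single subgraph having three vertices and two edges yields 32 label assignments that all match it. The count is the product, over all labels, of one plus the number of ancestors of each label, so it grows exponentially with the size of the subgraph.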
Thus, a need exists to provide a frequent pattern mining apparatus, frequent pattern mining method, and program and recording medium therefor which can solve the above problems.