1. Field of the Invention
The present invention relates to a method and a system for processing data having a graph structure, and a program therefor. The present invention particularly relates to a method and a system that effectively support a technique for efficiently extracting, from a database, a graph having a support level that is equal to or greater than a minimum support level designated by a user, and a program therefor.
2. Related Art
In Japanese Unexamined Patent Publication No. Hei 09-297686, for example, a basket analysis technique is disclosed whereby a regularity present between attributes included in data is extracted from a relational database as an association rule. This technique is used to establish an association between the products that a customer in a retail store places in a single shopping basket, and is also called a simultaneous purchase analysis. For example, the association rule that a customer who buys bread also purchases milk at the same time is expressed using the form “bread→milk”. Association rules are used to prepare a marketing strategy.
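As an illustration of how a rule such as “bread→milk” can be evaluated against basket data, the following minimal sketch computes the two measures conventionally used for association rules, support and confidence. The basket contents and the function names `support` and `confidence` are hypothetical and are not taken from the cited publication.

```python
# Hypothetical toy data: each frozenset is the content of one shopping basket.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread"}),
    frozenset({"milk", "eggs"}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(antecedent | consequent, baskets) / base


bread = frozenset({"bread"})
milk = frozenset({"milk"})
rule_support = support(bread | milk, baskets)      # 2 of 4 baskets
rule_confidence = confidence(bread, milk, baskets)  # 2 of the 3 bread baskets
```

In this toy data the rule “bread→milk” holds in two of the four baskets, and in two of the three baskets that contain bread.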
According to the basket analysis technique, all possible product combinations are enumerated as association rule candidates, and actual product purchase databases are searched to establish the validity of the combinations. However, employing this method would mean that the examination process has to be performed even for relatively meaningless association rules, including those that would be applicable only to individual customers and that would contribute little or nothing to a marketing strategy discussion. For example, if in a retail store there were 10,000 different types of products, there would be 2^10000 (2 to the power of 10,000) possible product combinations. Since these combinations would include meaningless ones such as are described above, evaluating all product combinations would be neither efficient nor practical because of the enormous amount of time that would be required.
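To give a sense of the scale involved, the short calculation below counts the decimal digits of 2^10000, the number of subsets that can be formed from 10,000 product types. The variable names are illustrative only.

```python
n_products = 10_000

# Each product is either in a combination or not, so there are 2**n subsets.
n_combinations = 2 ** n_products

# Number of decimal digits in that count.
digits = len(str(n_combinations))
print(digits)  # 3011
```

A count with more than three thousand digits cannot be enumerated exhaustively in any practical amount of time, which is why pruning is required.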
According to the technique in the above publication, pruning is performed using a support level and a confidence level as references. Further, an Apriori-based algorithm has been proposed that uses a property of the support level to quickly extract frequently appearing product combinations. An example Apriori-based algorithm is described in “Fast Algorithms for Mining Association Rules”, R. Agrawal and R. Srikant, in Proceedings of the 20th VLDB Conference, pp. 487-499, 1994 (reference 1). The method described in reference 1 is used not only in the preparation of supermarket marketing strategies, but also for various other data processes, including those for factory quality control and for the extraction of knowledge from massive libraries of patient charts.
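The property of the support level exploited by Apriori-based algorithms is that a combination can be frequent only if every one of its sub-combinations is frequent. The following is a minimal sketch of a level-wise search built on that property. It is a simplified illustration and not the implementation of reference 1: the function name `apriori`, the candidate-generation step, and the sample data are assumptions made for this example.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    candidates = [frozenset([item]) for item in items]  # level 1 candidates
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate against the database and keep the frequent ones.
        level = {}
        for c in candidates:
            sup = sum(1 for b in baskets if c <= b) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}

        # Prune step: discard a candidate if any (k-1)-subset is not frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical toy data.
example_baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
]
result = apriori(example_baskets, min_support=0.5)
```

With this data, the three-item candidate {bread, milk, butter} is pruned without being counted against the database, because its subset {milk, butter} is not frequent.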
The method described in the above publication, or in reference 1, constitutes a method for quickly extracting sub-sets that frequently appear in multiple sets. However, when the data are provided as multiple graph structures rather than as multiple sets, this method cannot be used to extract the sub-structures that appear frequently in those graph structures. In such cases, a method is required by which graph patterns can be efficiently extracted from databases having graph structures. An example method is an AGM algorithm (an Apriori-based Graph Mining algorithm) proposed by the present inventor. AGM algorithms are also described in “Application of the Method to Drive Frequent Graphs to Chlorinated Hydrocarbons”, Akihiro Inokuchi, Takashi Washio and Hiroshi Motoda, the 39th SIG-FAI, Japanese Society for Artificial Intelligence, 1999. No. 6, pp. 1052-1063, 1994 (reference 2), “Applying Algebraic Mining Method of Graph Substructures to Mutagenesis Data Analysis”, A. Inokuchi, T. Washio, T. Okada and H. Motoda, Proc. of International Workshop KDD Challenge on Real-world Data, pp. 41-46, PAKDD-2000 (2000) (reference 3), “An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data”, A. Inokuchi, T. Washio and H. Motoda, Proc. of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 13-20 (2000) (reference 4), or “Fast and Complete Mining Method for Frequent Graph Patterns”, Akihiro Inokuchi, Takashi Washio and Hiroshi Motoda, Journal of Japanese Society for Artificial Intelligence, Vol. 15, No. 6, pp. 1052-1063, 1994 (reference 5).
An AGM algorithm can be employed to obtain an association existing between a chemical structure and physiological activity. For example, in reference 3, when data are provided that describe multiple nitro-organic compounds and their mutagenic activities, mutagenicity being a factor that promotes cancer, an association rule that may amplify mutagenic activity and an association rule that may suppress it are extracted.
According to the AGM algorithm method, efficiency is improved by pruning, at an early stage, those portions of the search space for which a search is unnecessary. However, AGM algorithms extract frequently appearing graph patterns efficiently only in comparison with a method in which all possible combinations are searched, and depending on the support level that is set, the calculation time can still be enormous.