Computer systems generate logfiles to record the results of the various data processing operations that they perform. Some of those logfiles indicate a relationship between two entities. For example, communication logs of servers may contain records of data communication performed between a source server and a destination server, and bank transaction logs may indicate specific transactions performed between two accounts. On the basis of such relationship-indicative log records, a computer may generate a graph to represent processing activities that a plurality of entities in the system conducted during a specific time period. This graph is formed from nodes representing individual elements of the system and edges (i.e., connections between nodes) describing relationships between the elements.
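The graph generation described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes each log record has already been parsed into a (source, destination) pair, and the function name `build_graph` and the server names are hypothetical.

```python
def build_graph(records):
    """Build a directed graph from (source, destination) log records.

    Returns a (nodes, edges) pair. Repeated records collapse into a
    single edge, since only the existence of a relationship is kept.
    """
    nodes = set()
    edges = set()
    for source, destination in records:
        nodes.add(source)
        nodes.add(destination)
        edges.add((source, destination))
    return nodes, edges


# Hypothetical communication log for one time period.
records = [
    ("srv-a", "srv-b"),
    ("srv-b", "srv-c"),
    ("srv-a", "srv-b"),  # duplicate record, yields no new edge
]
nodes, edges = build_graph(records)
```

One such graph would be generated per time period (for example, per day), so that the graphs of different periods can be compared with one another.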
The above-described logs may record security problems, such as attacks in network communication and malicious activities in bank transactions. Graphs generated from the logs of days on which such fraudulent events occurred may exhibit a characteristic pattern of nodes and edges. The graphs generated from these logs are thus searched for a subgraph that appears frequently across multiple graphs. The discovered subgraph is called a “frequent graph” and is used to detect another fraudulent event in later logs. That is, if a known frequent graph is seen in a graph generated from a new log, it suggests that the system has likely encountered a similar fraudulent event.
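The notion of frequency used above can be sketched as a count of how many graphs contain a given pattern. The sketch below makes a strong simplifying assumption that is not part of the described methods: nodes carry fixed identities, so that containment reduces to edge-set inclusion rather than general subgraph matching. The names `support` and `is_frequent` and the sample data are hypothetical.

```python
def support(pattern_edges, graphs):
    """Count the graphs whose edge set contains every edge of the pattern.

    Assumption: node identities are fixed, so containment is plain set
    inclusion. General subgraph matching is considerably harder.
    """
    pattern = set(pattern_edges)
    return sum(1 for edges in graphs if pattern <= set(edges))


def is_frequent(pattern_edges, graphs, threshold):
    """A pattern is treated as frequent while its support exceeds the threshold."""
    return support(pattern_edges, graphs) > threshold


# Hypothetical edge sets of graphs generated from three days of logs.
daily_graphs = [
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c")},
    {("a", "d")},
]
pattern = {("a", "b"), ("b", "c")}
```

With these sample graphs, the pattern is contained in two of the three, so it counts as frequent for a threshold of 1 but not for a threshold of 2.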
Several techniques have been proposed to discover frequent graphs in a large number of graphs. For example, one proposed method enumerates frequent graphs of relatively small size and grows them by adding nodes until their frequencies fall to or below a threshold. Another proposed method extracts frequent graphs by repetitively consolidating a frequent pair of adjacent nodes into one node. See, for example, the following documents:
Japanese Laid-open Patent Publication No. 2005-63277
Japanese Laid-open Patent Publication No. 2014-225117
Felipe Llinares-Lopez, Mahito Sugiyama, Laetitia Papaxanthos, Karsten M. Borgwardt, “Fast and Memory-Efficient Significant Pattern Mining via Permutation Testing,” KDD′15 Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2015, Pages 725-734.
The problem addressed by the conventional frequent graph discovery methods is non-deterministic polynomial time complete (NP-complete), and the number of possible combinations increases explosively as the number of nodes grows. For example, the aforementioned method of adding nodes to small frequent graphs has to compare the subgraph in question against all the other possible subgraphs each time a node is added. Such graph matching problems are NP-complete and run into a combinatorial explosion. In fact, the noted method is only suitable for graphs composed of up to several hundred nodes or so.
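The combinatorial explosion noted above can be illustrated with a brute-force matcher. This sketch is not one of the cited methods; it is a naive check, under the assumption of directed and unlabeled graphs, that tries every injective mapping of pattern nodes onto graph nodes. The function names are hypothetical, and `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import perm


def contains_pattern(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Return True if some injective node mapping sends every pattern
    edge onto an edge of the graph (brute-force subgraph matching)."""
    pattern_nodes = list(pattern_nodes)
    graph_edges = set(graph_edges)
    # Each permutation is one candidate assignment of graph nodes
    # to the pattern nodes, in order.
    for candidate in permutations(graph_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all((mapping[u], mapping[v]) in graph_edges for u, v in pattern_edges):
            return True
    return False


def candidate_mappings(graph_size, pattern_size):
    """Number of injective mappings the brute-force search may have to try."""
    return perm(graph_size, pattern_size)
```

In the worst case the search visits every candidate mapping. For a 5-node pattern in a 300-node graph, `candidate_mappings(300, 5)` already exceeds 10^12, and the count grows further with each node added to the pattern.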
Now consider a subgraph of several hundred nodes sampled from a given source graph, with a frequent graph search performed only within that subgraph. If a found graph is frequently seen in the subgraph, then this would also hold true in a larger area of the source graph. Accordingly, the found frequent graph is used as a candidate graph in the next round of search over an expanded range of nodes. The described method is, however, also prone to combinatorial explosion because the underlying matching problem is NP-complete. It is therefore difficult to expand the search range beyond several hundred nodes.