“Frequent patterns” are sequences of data items that occur in a database at a relatively high frequency. Data items may be numbers, characters, strings, names, records, and so forth.
Discovery of frequent patterns, also referred to as frequent pattern searching or mining, has become important in many fields, and it is often desired to find frequently occurring patterns in very large data sets.
One way to visualize the process of pattern mining is as a hierarchical graph or tree of patterns and sub-patterns. Suppose, for example, that it is desired to find frequently occurring character patterns in a text. A first pass might identify every single character that could begin a character pattern. These “candidate” items would form the first level of a hierarchical tree structure. A second pass might then add a second, dependent level to the tree. The second pass would find, for each first-level candidate item, all single characters or strings that follow that item in the text. This process would then be iterated to add further sub-levels to the tree, which could eventually be examined to find those strings or patterns that occur most frequently.
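The level-by-level construction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not any particular mining algorithm: the function name `mine_frequent_substrings` and the parameters `min_count` and `max_len` are hypothetical, patterns are restricted to contiguous substrings, and infrequent candidates are discarded after each pass rather than at the end.

```python
from collections import Counter


def mine_frequent_substrings(text, min_count, max_len):
    """Level-wise search: level k holds the frequent substrings of length k.

    Assumptions for illustration only: patterns are contiguous substrings,
    and a candidate is dropped as soon as its count falls below min_count.
    """
    # First pass: every single character is a first-level candidate.
    level = {p: c for p, c in Counter(text).items() if c >= min_count}
    frequent = dict(level)
    length = 1
    while level and length < max_len:
        length += 1
        # Next pass: extend each surviving pattern by the character that
        # follows it, which adds one dependent level to the tree.
        counts = Counter(
            text[i:i + length]
            for i in range(len(text) - length + 1)
            if text[i:i + length - 1] in level
        )
        level = {p: c for p, c in counts.items() if c >= min_count}
        frequent.update(level)
    return frequent
```

For example, `mine_frequent_substrings("abababc", 2, 3)` keeps "a" and "b" at the first level, "ab" and "ba" at the second, and "aba" and "bab" at the third, while "c", "bc", and "abc" are dropped because each occurs only once.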
Many algorithms are available for implementing the process of searching for frequently occurring patterns. However, frequent pattern mining against large databases is computationally expensive and time-consuming. Accordingly, efforts have been made to utilize multiple computers or computing nodes, running in parallel, to speed the process. A traditional approach to distributing tasks among computing nodes might be to partition the search space into many sub-search spaces, and utilize available computing nodes to search the partitions in parallel. However, it can be difficult to predict the amount of work that will be involved in processing any particular partition, and it is therefore difficult to create partitions that give each computing node the same amount of work. The resulting unbalanced partitioning tends to decrease the efficiency of parallel pattern mining algorithms.
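The imbalance can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the search space is partitioned by first-level item, the work in a partition is approximated by the number of distinct patterns (tree nodes) beneath that item, and partitions are handed to nodes round-robin. The names `partition_work` and `static_assignment` are hypothetical.

```python
from collections import defaultdict


def partition_work(text, max_len):
    """Count the distinct patterns (tree nodes) under each first-level item.

    The node count is only a stand-in for the real cost of searching a
    partition, which is generally not known in advance.
    """
    subtrees = defaultdict(set)
    for i in range(len(text)):
        for length in range(1, max_len + 1):
            if i + length <= len(text):
                subtrees[text[i]].add(text[i:i + length])
    return {item: len(nodes) for item, nodes in subtrees.items()}


def static_assignment(work, num_nodes):
    """Assign partitions to computing nodes round-robin, ignoring their size."""
    loads = [0] * num_nodes
    for k, item in enumerate(sorted(work)):
        loads[k % num_nodes] += work[item]
    return loads
```

With the text "abracadabra" and `max_len` of 3, the partition rooted at "a" contains 7 patterns while every other partition contains 3, so `static_assignment(partition_work("abracadabra", 3), 2)` yields loads of 13 and 6: one node has more than twice the work of the other.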