Pattern mining, a subfield of data mining, is a process of analyzing data from different perspectives to identify strong and interesting relations among variables in datasets. The traditional pattern mining techniques based on simple pattern structures, such as itemset mining and sub-string mining, are not capable of capturing hidden relations among variables in the datasets. Mining patterns with complicated structures becomes increasingly important in the ‘Big Data’ era. Two mining techniques for hierarchical patterns, sequential pattern mining (SPM) and disjunctive rule mining (DRM), have attracted a lot of attention in the field of data mining.
Sequential Pattern Mining (SPM) is a data mining technique that identifies strong and interesting sequential relations among variables in structured datasets. SPM has become an important data mining technique with broad application domains, such as customer purchase pattern analysis, correlation analysis of storage systems, web log analysis, software bug tracking, and software API usage tracking [Document 3]. For example, a college student at the University of Virginia (UVA) buys textbooks according to his or her classes during the college years. Since every class has prerequisite classes, a student normally buys and studies textbooks in the prerequisite order. The UVA bookstore could study the frequent sequential patterns in the records of book purchases and give every student good recommendations for his or her next step of learning. SPM is the right technique for mining sequential relations from such transaction records. Here, a sequential pattern refers to a hierarchical pattern consisting of a sequence of frequent transactions (itemsets) with a particular ordering among these itemsets. In addition to frequent itemset mining (FIM), SPM needs to capture permutations among the frequent itemsets. This dramatically increases the number of patterns to be considered and hence the computational cost relative to simple set mining or string mining operations. In addition, as the sizes of interesting datasets keep growing, higher performance becomes critical to make SPM practical.
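The notion of a sequential pattern described above can be illustrated with a minimal Python sketch. It assumes that each customer record is an ordered list of itemsets and that a pattern is supported by a record when its itemsets appear, in order, as subsets of the record's transactions. The function names and the sample purchase data are hypothetical and are not taken from the described system.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def contains_pattern(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if the pattern's itemsets occur in order within the sequence.

    Each pattern itemset must be a subset of a distinct transaction, and the
    matching transactions must keep the pattern's ordering. Matching each
    itemset at the earliest possible transaction is sufficient.
    """
    matched = 0
    for transaction in sequence:
        if matched < len(pattern) and pattern[matched] <= transaction:
            matched += 1
    return matched == len(pattern)


def support(database: Sequence[Sequence[Itemset]], pattern: Sequence[Itemset]) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(contains_pattern(seq, pattern) for seq in database)


# Hypothetical purchase histories, one ordered list of transactions per student.
purchases = [
    [frozenset({"calc1"}), frozenset({"calc2", "physics1"}), frozenset({"linalg"})],
    [frozenset({"calc1", "physics1"}), frozenset({"calc2"})],
    [frozenset({"calc2"}), frozenset({"calc1"})],
]
pattern = [frozenset({"calc1"}), frozenset({"calc2"})]
print(support(purchases, pattern))  # prints 2: the third student bought in reverse order
```

The ordering requirement is what distinguishes this count from plain itemset support: all three students bought both books, but only two bought them in the pattern's order.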
Many algorithms have been developed to improve the performance of SPM. The three most competitive algorithms today are Generalized Sequential Pattern (GSP) [Document 21], Sequential PAttern Discovery using Equivalence classes (SPADE) [Document 27], and PrefixSpan [Document 17]. SPADE and PrefixSpan are generally favored today and perform better than GSP on conventional single-core CPUs in average cases. However, GSP is based on the Apriori algorithm, which exposes massive parallelism and may be a better candidate for highly parallel architectures. Several parallel algorithms have been proposed to accelerate SPM on distributed-memory systems [Documents 8, 12, 20, and 26]. Increasing throughput per node via hardware acceleration is desirable for throughput as well as energy efficiency. However, even though hardware accelerators have been widely used in frequent set mining and string matching applications [Documents 10, 28, and 29], no hardware-accelerated solution for SPM has been proposed yet.
Disjunctive rule mining (DRM) is derived from frequent itemset mining, but allows “alternatives” for each item. For example, consider the bookstore story mentioned earlier. Each class recommends several reference books. Each student tends to select one or two reference books to buy together with the textbook for each class. Since the reference books are labeled as separate items, the strong relation between the textbook and one specific reference book may not be captured by traditional frequent itemset mining, but could be recognized by disjunctive rule mining when considering possible alternatives. The UVA bookstore could calculate the disjunctive rules from the records of book purchases and give every student good recommendations of reference books for each class. Several CPU algorithms [Documents 7, 15, and 19] were proposed to mine disjunctive rules effectively. However, no hardware-accelerated disjunctive rule mining method has been proposed yet.
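The effect of allowing alternatives can be sketched in a few lines of Python. The sketch assumes that a disjunctive rule is a list of sets of alternative items and that a transaction matches when at least one alternative from every set is present (an AND over ORs). The function names and the sample transactions are hypothetical.

```python
from typing import Sequence, Set


def matches_d_rule(transaction: Set[str], d_rule: Sequence[Set[str]]) -> bool:
    """True if every position of the rule has at least one alternative present."""
    return all(
        any(item in transaction for item in alternatives)
        for alternatives in d_rule
    )


def d_rule_support(transactions: Sequence[Set[str]], d_rule: Sequence[Set[str]]) -> int:
    """Count the transactions that satisfy the disjunctive rule."""
    return sum(matches_d_rule(t, d_rule) for t in transactions)


# Hypothetical purchases: a textbook plus one of two reference books.
purchases = [
    {"textbook", "ref_a"},
    {"textbook", "ref_b"},
    {"textbook"},
    {"ref_a"},
]

plain = [{"textbook"}, {"ref_a"}]                 # no alternatives allowed
disjunctive = [{"textbook"}, {"ref_a", "ref_b"}]  # either reference book counts

print(d_rule_support(purchases, plain))        # prints 1
print(d_rule_support(purchases, disjunctive))  # prints 2
```

With alternatives merged into one position, the relation between the textbook and "some reference book" reaches a higher support than either single-reference itemset does on its own.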
The new Automata Processor (AP) [Document 9] offers an appealing accelerator architecture for hierarchical pattern mining. The AP architecture exploits the very high and natural level of parallelism found in DRAM (Dynamic Random Access Memory) to achieve native-hardware implementation of non-deterministic finite automata (NFAs). The use of DRAM to implement the NFA states provides a high capacity: the first-generation boards, with 32 chips, provide approximately 1.5M automaton states. All of these states can process an input symbol and activate successor states in a single clock cycle, providing extraordinary parallelism for pattern matching. The AP's hierarchical, configurable routing mechanism allows rich fan-in and fan-out among states. These capabilities allow the AP to perform complex symbolic pattern matching and test input streams against a large number of candidate patterns in parallel. The AP has already been successfully applied to several applications, including regular expression matching [Document 9], DNA motif searching [Document 18], and frequent set mining [Documents 6, 22, and 24]. It has been previously shown [Document 23] that the AP can also achieve impressive speedups for mining hierarchical patterns. The present invention extends that prior work with additional capabilities and analysis.
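The one-symbol-per-cycle behavior described above can be emulated in software with a short Python sketch. It is only a simplified model under stated assumptions: each state accepts a set of symbols, start states are re-enabled on every cycle, and a state becomes active when it is enabled and the current input symbol is in its symbol set. The function `run_nfa` and its parameters are hypothetical and do not represent the AP's programming interface.

```python
from typing import Dict, Iterable, List, Set, Tuple


def run_nfa(
    symbol_sets: Dict[int, Set[str]],
    edges: Dict[int, Iterable[int]],
    start_states: Set[int],
    report_states: Set[int],
    stream: Iterable[str],
) -> List[Tuple[int, int]]:
    """Emulate a symbol-matching NFA, returning (cycle, state) report events."""
    reports: List[Tuple[int, int]] = []
    enabled = set(start_states)
    for cycle, symbol in enumerate(stream):
        # In hardware every enabled state tests the symbol at the same time;
        # here the same step is a set comprehension.
        active = {s for s in enabled if symbol in symbol_sets[s]}
        reports.extend((cycle, s) for s in sorted(active) if s in report_states)
        # Start states stay enabled; active states enable their successors.
        enabled = set(start_states)
        for s in active:
            enabled.update(edges.get(s, ()))
    return reports


# A three-state chain that reports whenever "abc" ends in the input stream.
symbol_sets = {0: {"a"}, 1: {"b"}, 2: {"c"}}
edges = {0: [1], 1: [2]}
print(run_nfa(symbol_sets, edges, {0}, {2}, "xabcabc"))  # prints [(3, 2), (6, 2)]
```

In this software model the cost per input symbol grows with the number of enabled states, whereas the hardware described above evaluates all states in the same clock cycle.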
Specifically, CPU-AP heterogeneous computing solutions are described to accelerate both SPM and DRM under the Apriori-based algorithm framework. The multipass algorithms of this framework, which build up successively larger candidate hierarchical patterns, are best suited to the AP's highly parallel pattern-matching architecture, which can check a large number of candidate patterns in parallel. The present invention extends the prior AP-SPM work [Document 23] with disjunctive capabilities and describes a flexible framework for mining hierarchical patterns, such as sequential patterns and disjunctive rules, with hardware accelerators. Designing compact NFAs is a critical step in achieving good performance of AP-accelerated SPM and DRM. The key idea of designing an NFA for SPM is to flatten sequential patterns to strings by adding an itemset delimiter and a sequence delimiter. This strategy greatly reduces the automaton design space so that the template automaton for SPM can be compiled before runtime and replicated to make full use of the capacity and massive parallelism of the AP board. The described NFA design for recognizing disjunctive rules uses the on-chip Boolean units to calculate AND relations among disjunctive items (“d-item” for short, an item allowing several alternatives), and makes full use of the bit-wise parallelism of the state units of the AP chips to calculate OR relations of items within a d-item.
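The flattening idea can be sketched as follows in Python. The encoding shown is an assumption made for illustration: items are small integers, items within an itemset are emitted in sorted order, each itemset is followed by an itemset delimiter, and each sequence is closed by a sequence delimiter. The delimiter values and function names are hypothetical and are not taken from the described design.

```python
from typing import Iterable, List, Sequence, Set

ITEMSET_DELIM = 254   # hypothetical reserved symbol closing an itemset
SEQUENCE_DELIM = 255  # hypothetical reserved symbol closing a sequence


def flatten_sequence(sequence: Sequence[Set[int]]) -> List[int]:
    """Turn a sequence of itemsets into a flat list of symbols."""
    symbols: List[int] = []
    for itemset in sequence:
        symbols.extend(sorted(itemset))  # a fixed item order keeps matching linear
        symbols.append(ITEMSET_DELIM)
    symbols.append(SEQUENCE_DELIM)
    return symbols


def flatten_database(database: Iterable[Sequence[Set[int]]]) -> List[int]:
    """Concatenate all flattened sequences into one input stream."""
    stream: List[int] = []
    for sequence in database:
        stream.extend(flatten_sequence(sequence))
    return stream


print(flatten_sequence([{3, 1}, {2}]))  # prints [1, 3, 254, 2, 254, 255]
```

Once both the data and the candidate patterns are plain symbol strings, a single automaton template parameterized by symbols can in principle serve every candidate of a given length, which is the design-space reduction the paragraph above refers to.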
On multiple real-world and synthetic datasets, the performance of the described AP-accelerated SPM is compared against CPU and GPU implementations of GSP, an Apriori-based algorithm, as well as Java multi-threaded implementations of SPADE and PrefixSpan [Document 11]. The performance analysis of the AP-accelerated SPM shows up to 90× speedup over the multicore CPU GSP and up to 29× speedup over the GPU GSP version. The described approach also outperforms the Java multi-threaded implementations of SPADE and PrefixSpan by up to 452× and 49×, respectively. The described AP-accelerated SPM also shows good performance scaling as the size of the input dataset grows, achieving even better speedup over SPADE and PrefixSpan. The input size scaling experiments also show that SPADE fails on some datasets larger than 10 MB (a small dataset size, thus limiting the utility of SPADE in today's ‘big data’ era).
The described CPU-AP DRM solution shows up to 614× speedup over a sequential CPU algorithm on two real-world datasets. The experiments also show that CPU matching-and-counting time increases significantly as the d-rule size or the number of alternative items grows, whereas AP processing time stays constant as the complexity of the disjunctive patterns increases. This analysis extends the prior analysis [Document 23] with Boolean-based pattern matching, including analysis of disjunctive features.
The present invention has the following goals:
        1. To develop a flexible CPU-AP computing infrastructure for mining hierarchical patterns based on the Apriori algorithm;
        2. To describe a novel automaton design strategy, called linear design, to generate automata for matching and counting hierarchical patterns and apply it to SPM. This strategy flattens the hierarchical structure of patterns to strings and adopts a multiple-entry scheme to reduce the automaton design space for candidate patterns;
        3. To describe another novel automaton design strategy, called reduction design, for disjunctive rule matching and counting. This strategy makes full use of the bit-wise parallelism of the state units on the AP chips to discover the optionality of items on a lower level and utilizes Boolean units on the AP chip to identify occurrences of items on a higher level; and
        4. To show the performance improvement and broader capability of the AP SPM and DRM solutions over multicore and GPU implementations of GSP SPM, and to show that the AP SPM and DRM solutions outperform the state-of-the-art SPM algorithms SPADE and PrefixSpan (especially for larger datasets).
Related Work
Because of the larger permutation space and complex hierarchical patterns involved, performance is a critical issue for applying hierarchical pattern mining techniques. Many efforts have been made to speed up hierarchical pattern mining via software and hardware.
Sequential Pattern Mining
Sequential Algorithms
Generalized Sequential Pattern (GSP) [Document 21] follows the multi-pass candidate generation-pruning scheme of the classic Apriori algorithm and inherits the horizontal data format and breadth-first-search scheme from it. Also in the family of the Apriori algorithms, Sequential PAttern Discovery using Equivalence classes (SPADE) [Document 27] was derived from the concept of equivalence class [Document 25] for sequential pattern mining and adopted the vertical data representation. To avoid the multiple passes of candidate generation and pruning steps, the PrefixSpan [Document 17] algorithm extended the idea of the pattern growth paradigm [Document 13] to sequential pattern mining.
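The multi-pass candidate generation-pruning scheme can be illustrated with a deliberately simplified Python sketch. It is not GSP itself: it assumes every element of a sequence is a single item, so a sequence is just an ordered string of items, and it omits time constraints and multi-item elements. The function names and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Iterable[str]) -> bool:
    """True if the pattern's items occur in order (not necessarily adjacent)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_level_wise(database: Sequence[str], min_support: int) -> Dict[Pattern, int]:
    """Breadth-first, Apriori-style mining of frequent single-item sequences."""
    def count(p: Pattern) -> int:
        return sum(is_subsequence(p, seq) for seq in database)

    frequent: Dict[Pattern, int] = {}
    candidates = sorted({(item,) for seq in database for item in seq})
    while candidates:
        # Counting pass: one scan of the database per level.
        level = {c: n for c in candidates if (n := count(c)) >= min_support}
        frequent.update(level)
        # Join: a + last(b) whenever a without its first item equals
        # b without its last item.
        joined = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        # Prune: every pattern obtained by dropping one item must be frequent.
        candidates = sorted(
            c for c in joined
            if all(c[:i] + c[i + 1:] in level for i in range(len(c)))
        )
    return frequent


result = mine_level_wise(["abc", "abc", "acb", "bc"], min_support=2)
print(result[("a", "b", "c")])  # prints 2
```

Within one level, the support counts of different candidates are independent of one another, which is the parallelism that Apriori-style algorithms expose to highly parallel hardware.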
Parallel Implementations
Shintani and Kitsuregawa [Document 20] proposed three parallel GSP algorithms on distributed memory systems. These algorithms show good scaling properties on an IBM SP2 cluster. Zaki [Document 26] designed pSPADE, a data-parallel version of SPADE for fast discovery of frequent sequences in large databases on distributed-shared memory systems, and achieved up to 7.2× speedup on a 12-processor SGI Origin 2000 cluster. Guralnik and Karypis [Document 12] developed tree-projection-based parallel sequence mining algorithms for distributed-memory architectures and achieved up to 30× speedups on a 32-processor IBM SP cluster. Cong [Document 8] presented a parallel sequential pattern mining algorithm (Par-ASP) under their sampling-based framework for parallel data mining, implemented by using MPI over a 64-node Linux cluster, achieving up to 37.8× speedup.
Accelerators
Hardware accelerators allow a single node to achieve orders of magnitude improvements in performance and energy efficiency. General-purpose graphics processing units (GPUs) leverage high parallelism, but their single instruction multiple data (SIMD), lockstep organization means that the parallel tasks must generally be similar. Hryniów [Document 14] presented a parallel GSP implementation on GPU. However, that work did not accelerate sequential pattern mining; instead, it relaxed the problem to itemset mining. There has been no previous work on hardware acceleration for true SPM. In particular, SPADE and PrefixSpan have not been implemented on GPU. In the present invention, true GSP for SPM is implemented on GPU.
Disjunctive Rule Mining
Nanavati [Document 15] first introduced the concept of disjunctive rules and did conceptual and algorithmic studies on disjunctive rules of both inclusive OR and exclusive OR. Sampaio [Document 19] developed a new algorithm to induce disjunctive rules under certain restrictions to limit the search spaces of the antecedent and consequent terms. Chiang [Document 7] proposed disjunctive consequent association rules, a conceptual combination of the disjunctive rule and the sequential pattern, and illustrated the promising commercial applications of this new mining technique. However, all of these existing works focused on effectiveness more than the efficiency of the implementations.
The Automata Processor shows great potential in boosting the performance of massive and complex pattern-searching applications. The present invention shows that the AP-accelerated solutions for sequential pattern mining and disjunctive rule mining have great performance advantages over the CPU and other parallel and hardware-accelerated implementations.