The present invention relates to data processing and more particularly, but not exclusively, relates to the discovery and visualization of sequential patterns.
The task of sequential pattern discovery in knowledge discovery and data mining is to identify items that frequently precede other items. In general, a sequential pattern can be described as a finite series of elements. A four-element sequential pattern can be represented as A→B→C→D, where A, B, C, and D are elements of the same domain. A nonlimiting example of a sequential pattern is: “90% of the die-hard fans who saw the movie Titanic went on to buy the movie soundtrack CD, followed by the videotape when it was released.” Using the above notation, this example can be represented more generally as A→B→C, where A = “saw the movie,” B = “buy CD,” C = “buy videotape,” and A→B→C has a support of 90%.
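To make the notion of an ordered pattern concrete, the following is a minimal sketch, not the method of the present invention. It assumes each record is a simple ordered list of single events (the Agrawal and Srikant formulation allows each element to be a set of items), and it takes support to be the fraction of records that contain the pattern's elements in order, with gaps allowed. The function names `contains_in_order` and `sequence_support` and the `fans` data are hypothetical, for illustration only.

```python
from collections.abc import Sequence


def contains_in_order(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if every pattern element appears in `sequence` in the same order.

    Other events may occur between matched elements (gaps are allowed).
    """
    it = iter(sequence)
    # `elem in it` consumes the iterator up to and including the first match,
    # so each later pattern element can only match a later position.
    return all(elem in it for elem in pattern)


def sequence_support(database: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `database` that contain `pattern` in order."""
    if not database:
        return 0.0
    hits = sum(contains_in_order(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical event histories, one list per fan (illustrative data only).
fans = [
    ["saw the movie", "buy CD", "buy videotape"],
    ["saw the movie", "buy poster", "buy CD", "buy videotape"],
    ["saw the movie", "buy videotape", "buy CD"],  # same events, wrong order
    ["buy CD", "saw the movie"],
]

print(sequence_support(fans, ["saw the movie", "buy CD", "buy videotape"]))  # 0.5
```

The third record contains all three events but not in the order A→B→C, so it does not count toward the pattern's support.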
In contrast, an association rule is an implication of the form X→Y, where X is a set of antecedent items and Y is the consequent item. For elements A, B, C, and D of a common domain, A+B+C→D is an example of an association rule. Association rule mining studies the “togetherness” of elements, whereas sequential pattern mining studies their “ordering” or “arrangement.” Further background information about association rule data mining can be found in Pak Chung Wong, Paul Whitney and Jim Thomas, “Visualizing Association Rules for Text Mining,” Proceedings of IEEE Information Visualization (published by IEEE CS Press), dated 26 Oct. 1999.
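The difference between “togetherness” and “ordering” can be illustrated with a small sketch. It assumes the commonly used definition of rule confidence (among records containing every antecedent item, the fraction that also contain the consequent) and the same in-order containment test as for sequential patterns. The helper names and the `data` records are hypothetical and chosen only to show the contrast.

```python
from collections.abc import Sequence


def rule_confidence(records: Sequence[Sequence[str]], antecedent: set[str], consequent: str) -> float:
    """Confidence of the association rule antecedent -> consequent.

    Each record is treated as an unordered set, so the order of items is ignored.
    """
    covering = [set(r) for r in records if antecedent <= set(r)]
    if not covering:
        return 0.0
    return sum(consequent in r for r in covering) / len(covering)


def contains_in_order(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern elements appear in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(elem in it for elem in pattern)


# Hypothetical records: identical item sets, different orderings.
data = [
    ["A", "B", "C", "D"],
    ["D", "C", "B", "A"],
    ["B", "A", "C", "D"],
]

# Association rule A+B+C -> D holds in every record.
print(rule_confidence(data, {"A", "B", "C"}, "D"))  # 1.0
# Sequential pattern A->B->C->D occurs in only one record.
print(sum(contains_in_order(r, ["A", "B", "C", "D"]) for r in data))  # 1
```

Because the association rule discards order, all three records support A+B+C→D equally, while only the first record exhibits the sequential pattern A→B→C→D.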
Frequently, one goal of sequential pattern discovery is to assess the evolution of events against a measured timeline and to detect changes that might occur coincidentally. This information can be used, for example, to detect medical fraud in insurance claims, evaluate drug performance in the pharmaceutical industry, determine risk factors in military operations, and/or develop retail sales trends for marketing purposes. Further general background information about sequential patterns can be found in U.S. Pat. Nos. 6,006,223; 5,819,266; and 5,742,811 to Agrawal et al.; and in Rakesh Agrawal and Ramakrishnan Srikant, “Mining Sequential Patterns,” Proceedings of the International Conference on Data Engineering (ICDE), dated March 1995.
As more powerful processors and larger datasets become available, effectively recognizing and utilizing sequential patterns becomes more difficult. Accordingly, new strategies are needed to identify and present sequential pattern information. The present invention addresses such needs.