The present invention relates to data processing techniques and more particularly, but not exclusively, relates to the discovery and visualization of association rules.
Association is a powerful data analysis technique that finds frequent use in data mining tasks. Given a set of items, S = {i1, i2, . . . , ij, . . . , in} where n ≥ 2, an association rule is an implication of the form X → ij, where X ⊂ S and ij ∈ S such that ij ∉ X. The set of items X is the antecedent, while the item ij is the consequent of the association rule. The size of X is between 1 and (n − 1) items. The “support” of the rule X → ij is the percentage of items in S that satisfies the union of items in X and ij. The “confidence” of the rule is the percentage of items that satisfies X and also satisfies ij. The support and confidence levels of an association rule are among the metadata frequently of interest to analyzers.
For the given elements A, B, C and D of a common domain, A+B+C → D is an example of an association rule, where the occurrence of A and B and C together implies D. Another example from a supermarket database is “80% of the people who buy diapers and baby powder also buy baby oil.” Applying the more general notation from the earlier example, this supermarket database association can be represented in elemental form as A+B → C, where A = “buy diapers”, B = “buy baby powder”, and C = “buy baby oil.” For further background information concerning association rule data mining, reference is made to Pak Chung Wong, Paul Whitney and Jim Thomas, “Visualizing Association Rules for Text Mining,” Proceedings of IEEE Information Visualization (published by IEEE CS Press) (dated Oct. 26, 1999).
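The support and confidence of the supermarket rule can be illustrated with a minimal sketch. It assumes the conventional market-basket reading, in which both measures are counted over transactions (baskets); the basket data, the `Basket` alias, and the function names `support` and `confidence` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]  # hypothetical alias: one customer's purchases


def support(baskets: List[Basket], items: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for b in baskets if wanted <= b) / len(baskets)


def confidence(baskets: List[Basket], antecedent: Iterable[str],
               consequent: str) -> float:
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent."""
    ante = frozenset(antecedent)
    matches = [b for b in baskets if ante <= b]
    if not matches:
        return 0.0  # rule never applies in this dataset
    return sum(1 for b in matches if consequent in b) / len(matches)


# Made-up toy data (not from any real database): 10 baskets in total.
full = frozenset({"diapers", "baby powder", "baby oil"})
baskets: List[Basket] = [full] * 4 + [
    frozenset({"diapers", "baby powder"}),
    frozenset({"diapers"}),
    frozenset({"baby oil"}),
    frozenset({"bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk"}),
]

rule_support = support(baskets, full)
rule_confidence = confidence(baskets, ["diapers", "baby powder"], "baby oil")
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

With this toy data, four of the ten baskets contain all three items, and four of the five baskets containing diapers and baby powder also contain baby oil, so the rule A+B → C has a support of 0.40 and a confidence of 0.80.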
In contrast to association rules, another common knowledge discovery and data mining tool is sequential patterning. A four-element sequential pattern can be represented as A → B → C → D, where A, B, C, and D are elements of the same domain. An association rule is a study of “togetherness” of elements, whereas a sequential pattern is a study of the “ordering” or “arrangement” of elements. Further background information about sequential patterns can be found in above-cited U.S. Provisional Patent Application No. 60/239,334 filed Oct. 9, 2000.
To support analyses of association rules, scientists and engineers have developed various limited visualization schemes. Among these limited schemes are the two-dimensional item-to-item matrix and the directed graph. The basic design of a two-dimensional (2D) association matrix positions the antecedent and consequent items on separate axes of a square matrix as illustrated in the examples of FIGS. 7 and 8. Customized icons are drawn on certain matrix tiles that connect the antecedent and the consequent items of the corresponding association rules. Different icons can be used to depict different metadata such as the support and confidence values of the rules. FIG. 7 depicts an association rule (B → C). Both the height and the color of the column icon can be used to present metadata values. The values of support and confidence are mapped to 3D columns that are built separately on and beneath the matrix tiles. Alternatively, other icons such as disk and bar can be used to visualize metadata.
This type of item-to-item matrix is frequently effective to show a one-to-one binary relationship; however, it is often less effective when visualizing many-to-one relationships, as in the case of association rules with multiple antecedent items. For example, in FIG. 8 it is unclear if there is only one association rule (A+B → C) or two (A → C and B → C). The lack of a practical way to identify the togetherness of individual antecedent items makes this matrix form a weaker candidate to visualize rules with multiple antecedent items.
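The ambiguity described above can be reproduced with a small sketch. It assumes a rule is encoded as a pair of (antecedent items, consequent item) and that a matrix tile is simply marked 1 when some rule links the two items; the function name `item_to_item_matrix` and this encoding are illustrative assumptions, not part of any cited scheme.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str]  # hypothetical encoding: (antecedents, consequent)


def item_to_item_matrix(rules: Sequence[Rule],
                        items: Sequence[str]) -> List[List[int]]:
    """Mark tile [antecedent][consequent] with 1 for every rule.

    Rows are antecedent items, columns are consequent items.
    """
    index = {item: k for k, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for antecedent, consequent in rules:
        for item in antecedent:
            matrix[index[item]][index[consequent]] = 1
    return matrix


items = ["A", "B", "C"]
one_rule = [(("A", "B"), "C")]              # A+B -> C
two_rules = [(("A",), "C"), (("B",), "C")]  # A -> C and B -> C

m_one = item_to_item_matrix(one_rule, items)
m_two = item_to_item_matrix(two_rules, items)
print(m_one == m_two)  # the two rule sets are indistinguishable
```

Both rule sets mark the same two tiles, (A, C) and (B, C), so the matrix alone cannot reveal whether A and B occur together in one rule.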
In one attempt to address this problem, all the antecedent items of an association rule are grouped as one unit and plotted against its consequent, resulting in an antecedent-to-consequent plot. For example, a dedicated item group (A+B) is created in FIG. 9 to describe the association rule (A+B → C). Unfortunately, as the number of antecedents for a given rule increases, the number of possible item-to-item relationships becomes unwieldy. Furthermore, the loss of item identity within an antecedent group also undermines advantages provided by visualizing the associations with a matrix. For example, the row (or column) of the matrix connected to an item can no longer be used to search for all the rules involving that item.
Another problem with some item-to-item matrix displays is object occlusion, especially when multiple icons are used to depict different metadata values on the matrix tiles. The occlusion problem is illustrated in the example of FIG. 10.
As illustrated in FIG. 11, a directed graph is another possible scheme for depicting item associations. The nodes of a directed graph represent the items, and the edges represent the associations. FIG. 11 shows three association rules (A → C, B → C, A+B → C). Unfortunately, for as few as a dozen rules, a directed graph can often become tangled and difficult to follow. In an attempt to address this problem, the edges are animated to show the associations of certain items with 3D rainbow arcs. See Beth Hetzler, W. Michelle Harris, Susan Havre, and Paul Whitney, “Visualizing the Full Spectrum of Document Relationships,” Proceedings of the Fifth International Society for Knowledge Organization (ISKO) Conference (dated 1998). However, this animation technique typically requires significant human interaction to turn on and off the item nodes, and it is frequently difficult to show multiple metadata values, including support and confidence, alongside the association rules.
Indeed, with any of these existing schemes, it is often difficult to meaningfully visualize a large number of association rules, and effective management of association rules with multiple antecedents is generally lacking. Accordingly, new strategies are needed to identify and present association rule information. The present invention addresses such needs.
One embodiment of the present invention is a unique data processing technique. Other embodiments include unique apparatus, systems, and methods for processing association rules.
A further embodiment of the present invention includes processing a dataset to determine a number of items and establishing several rules with a computer system. These rules each correspond to a different association between two or more of the items. A visualization is provided that displays a rule-to-item relationship for each one of the rules. This visualization can further display one or more types of metadata for the rules. The visualization can be in a two-dimensional or three-dimensional form.
Yet a further embodiment includes: processing a dataset with a computer to determine several association rules and providing a visualization of the association rules; where the association rules each correspond to a different one of a number of portions of the visualization. A set of identifiers is included in the visualization for each one of the association rules. These identifiers each have a different location along the different one of the portions. One of the identifiers represents a consequent item, and one or more other of the identifiers correspondingly represent one or more antecedent items.
In another embodiment, a computer system includes one or more processors operable to process a dataset to determine a number of items and establish a number of association rules for these items. The one or more processors generate a visualization output of the association rules that includes one or more signals representative of a rule-to-item relationship for each one of the association rules. The system further includes an output device responsive to the visualization output to display a visualization of the association rules.
In still another embodiment, a computer apparatus includes logic to generate a visualization of several association rules. This logic comprises an extraction engine operable to determine a number of items from a dataset, an association rule mining engine operable to establish the association rules from the items, and a visualization output generator. This generator provides the visualization with a region defined by a first axis and a second axis. The association rules each correspond to a different location along the first axis and the items each correspond to a different location along the second axis to provide a rule-to-item relationship for each of the association rules.
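The rule-to-item layout described in this embodiment can be sketched as a simple text grid. This is only an illustrative sketch under stated assumptions, not the disclosed visualization: the function names `rule_to_item_grid` and `render`, the (antecedents, consequent) rule encoding, and the cell markers `a`, `c`, and `.` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str]  # hypothetical encoding: (antecedents, consequent)


def rule_to_item_grid(rules: Sequence[Rule],
                      items: Sequence[str]) -> Dict[str, List[str]]:
    """One column per rule (first axis), one row per item (second axis).

    A cell holds 'a' for an antecedent item, 'c' for the consequent item,
    and '.' when the item does not take part in that rule.
    """
    grid = {item: ["."] * len(rules) for item in items}
    for col, (antecedent, consequent) in enumerate(rules):
        for item in antecedent:
            grid[item][col] = "a"
        grid[consequent][col] = "c"
    return grid


def render(grid: Dict[str, List[str]]) -> str:
    """Format the grid as text, one line per item."""
    return "\n".join(f"{item}: {' '.join(cells)}" for item, cells in grid.items())


# Three example rules over items A, B, C: A -> C, B -> C, A+B -> C.
rules = [(("A",), "C"), (("B",), "C"), (("A", "B"), "C")]
grid = rule_to_item_grid(rules, ["A", "B", "C"])
print(render(grid))
```

Because each rule occupies its own column, the two-antecedent rule A+B → C (third column, with `a` in both the A and B rows) stays distinct from the single-antecedent rules A → C and B → C, and reading along an item's row lists every rule that involves that item.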
In yet further embodiments, other systems, computer-readable devices, and computer information transmission mediums (such as computer networks) are provided that include logic and/or programming instructions to generate one or more unique association rule visualizations of the present invention.
Accordingly, one object of the present invention is to provide a unique data processing technique.
Another object is to provide a unique apparatus, system, device, or method for visualizing association rules.