Proliferation of computers throughout developed societies has enabled the collection and storage of many types of large data sets, including, for example, information on banking transactions, medical data and information on communications (e.g., telephone and email records). Thanks to orders-of-magnitude increases in data storage capacity and processing power, there is the potential to exploit this data for various purposes. Thus, the field of Data Mining has arisen with the aim of finding techniques for extracting useful information from large data sets.
It is well known that many existing data mining techniques often produce a large number of rules, which makes it very difficult to identify interesting rules by manual inspection. This is called the interestingness problem. Over the years, many techniques have been proposed to deal with this problem in order to help the user find useful knowledge. However, despite these efforts, interestingness remains a difficult problem. Few existing techniques have made it to real life applications. The difficulty is often attributed to the fact that interestingness is highly subjective. It depends on the user's current needs and his/her existing domain knowledge. While this is true, the inventors believe that another reason for the limited success is that workers in the art have perhaps looked in the wrong direction. Data mining software following the current rule mining paradigm tends to fragment the knowledge space, generating a massive number of rules and, at the same time, creating a large number of holes in the space of useful knowledge that could potentially be gleaned from the data, thus making it difficult for the user to find interesting knowledge.
One important type of data that is subjected to data mining is “class” labeled data. For example, a medical database can include, for each person, a myriad of different patient history data items (called “attributes” hereinbelow), such as age, sex, indication of any family history of disease, etc., and a data item which indicates whether the person succumbed to a disease that is the subject of the database. The latter data item (attribute) would be the class attribute.
Another example of a type of data that can be productively subjected to data mining is mobile telephone call records. Mobile telephone records that are collected by network service providers contain a myriad of different parameters related to each telephone call. One application of such data is to help understand what leads to failed calls so that network service can be improved. For this application the class label would be an attribute that indicates the final disposition of the call, i.e., failed during set up, dropped while in progress, or ended successfully.
The applications of class labeled data can be divided into two categories: (1) Predictive data mining, the objective of which is to build predictive or classification models that can be used to classify future cases or to predict their classes; this has been the focus of research in the machine learning community. (2) Diagnostic data mining, the objective of which is usually to understand the data and to find the causes of some problems in order to solve those problems.
For software designed to facilitate gleaning understanding from data, no prediction or classification is needed. The class labels are already known. The objective is not prediction, but to better understand the data and to find causes of particular outcomes (classes, e.g., call failures, patient succumbing to particular disease) or to identify situations in which particular outcomes are more likely to occur. That is, the software user wants interesting and actionable knowledge. Interestingness evaluation of rules is thus the key. Clearly, the discovered knowledge has to be understandable.
As the data set is a typical classification data set, rules that characterize the subject of the data mining are of the following form:
X→y,
where X is a set of conditions and y is a class, e.g., for the mobile telephone example above y ∈ {failed-during-setup, dropped-while-in-progress, ended-successfully}. The system described herein focuses on helping the user identify interesting knowledge based on such rules. These rules basically give the conditional probabilities Pr(y|X), which are exactly what a diagnostic data mining application is looking for. Moreover, such rules are easily understood.
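By way of illustration only, the conditional probability Pr(y|X) attached to a rule X→y can be estimated by counting records. The following is a minimal sketch, not the system described herein; the function name `rule_confidence`, the attribute names `phone_model`, `time_of_day` and `disposition`, and the sample records are all hypothetical.

```python
def rule_confidence(records, conditions, class_attr, class_value):
    """Estimate Pr(class_value | conditions) by counting matching records.

    `conditions` maps attribute names to required values (the set X);
    `class_value` is the class y. Returns None if no record satisfies X.
    """
    covered = [r for r in records
               if all(r.get(attr) == val for attr, val in conditions.items())]
    if not covered:
        return None
    hits = sum(1 for r in covered if r[class_attr] == class_value)
    return hits / len(covered)


# Hypothetical call records; attribute names and values are illustrative only.
calls = [
    {"phone_model": "m1", "time_of_day": "morning", "disposition": "failed-during-setup"},
    {"phone_model": "m1", "time_of_day": "evening", "disposition": "ended-successfully"},
    {"phone_model": "m2", "time_of_day": "morning", "disposition": "ended-successfully"},
    {"phone_model": "m1", "time_of_day": "morning", "disposition": "dropped-while-in-progress"},
]

# Rule: (phone_model=m1, time_of_day=morning) -> failed-during-setup
confidence = rule_confidence(
    calls,
    {"phone_model": "m1", "time_of_day": "morning"},
    "disposition",
    "failed-during-setup",
)
```

In this sketch two records satisfy X and one of them carries the class y, so the estimate is one half.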
It is easy to see that such rules are classification rules, which can be produced by classification algorithms such as decision trees and rule induction, and by class association rule mining. However, traditional classification techniques such as decision trees and rule induction are not suitable for the task for three main reasons:
(1) A typical classification algorithm only finds a very small subset of the rules that exist in data based on statistical significance. Most of the rules with similar classification performance are not discovered because the objective is to find only enough rules for classification. However, the subset of discovered rules may not be useful in diagnostic data mining. Those useful rules are left undiscovered. We call this the completeness problem.
(2) Due to the completeness problem, the context information of rules is lost, which makes rule analysis later very difficult as the user does not see the complete information. We call this problem the context problem.
(3) Since the rules are for classification purposes, they usually contain many conditions in order to achieve high accuracy. Long rules are, however, of limited use according to our experience because engineers, doctors and other domain experts can hardly take any action based on them. In many cases, it may not be possible to simulate many conditions in the laboratory to find the real causes. Furthermore, the data coverage of long rules may often be so small that it is not worth doing anything about them. We call this problem the long rules problem.
Class association rule mining is found to be more suitable as it generates all rules in data that satisfy the user specified minimum support and minimum confidence thresholds. Class association rules are a special type of association rules with only a class on the right-hand-side of each rule.
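The role of the two thresholds can be sketched as follows. This is a brute-force enumeration written for clarity, assuming small data and short rules; it is not an efficient miner and is not asserted to be the algorithm used by any particular tool. The function name `mine_class_association_rules` and the parameter `max_conditions` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_class_association_rules(records, class_attr, minsup, minconf,
                                 max_conditions=2):
    """Return every rule X -> y with support >= minsup and confidence >= minconf.

    Support is the fraction of all records matching both X and y;
    confidence is the fraction of records matching X that also have class y.
    """
    n = len(records)
    cover = Counter()  # condition set X -> number of records matching X
    joint = Counter()  # (X, y) -> number of records matching X with class y
    for r in records:
        items = sorted((a, v) for a, v in r.items() if a != class_attr)
        for k in range(1, max_conditions + 1):
            for conds in combinations(items, k):
                cover[conds] += 1
                joint[(conds, r[class_attr])] += 1

    rules = []
    for (conds, y), count in joint.items():
        support = count / n
        confidence = count / cover[conds]
        if support >= minsup and confidence >= minconf:
            rules.append((conds, y, support, confidence))
    return rules


# Hypothetical data: one attribute B and a class attribute C.
data = [
    {"B": "a", "C": "c"},
    {"B": "a", "C": "c"},
    {"B": "a", "C": "n"},
    {"B": "b", "C": "n"},
]
rules = mine_class_association_rules(data, "C", minsup=0.5, minconf=0.6)
```

Every rule meeting both thresholds is returned, in contrast to a classifier that stops once it has enough rules to classify; every rule falling below either threshold is silently dropped.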
Using the above mentioned call record data set, we were able to put several interestingness techniques to the test. We found that most existing interestingness techniques are useful to some extent, but they are “good to have” techniques rather than essential techniques. Thus, they cannot form the core of a rule interestingness analysis system to help the user systematically identify interesting knowledge. To our great surprise, we also discovered that the current rule mining paradigm itself poses a major obstacle for this interestingness analysis task. Below we first summarize the main shortcomings of the current interestingness techniques:
Lack of contexts: Most existing methods treat rules individually. However, a key discovery from our interactions with domain experts is that a single rule is seldom interesting by itself no matter what its support and confidence values are. It is only interesting if it deviates significantly from its siblings. That is, a rule is only interesting in a meaningful context and in comparisons with others. The user wants to see both the rule and the context.
Existing techniques do not find generalized knowledge from rules (meta-mining): Each individual rule may not be interesting by itself. A group of related rules together may represent an important piece of knowledge. For example, a set of rules from an attribute may show some interesting trend, i.e., as the values of the attribute go up, a call is more likely to fail. Our domain experts suggested that such knowledge is much more useful than individual rules because they may reveal some hidden underlying principles.
Lack of knowledge exploration tools: Due to the subjective nature of interesting knowledge, a systematic method is required for the user to explore the rule space in order to find useful knowledge. Our experience shows that user-driven interactive discovery may be the best approach. Although there are many existing techniques for visualizing rules, they mostly treat and visualize rules individually, which, we found in our applications, was not very effective.
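As one possible reading of the trend example given above, generalized knowledge about an attribute could be detected by checking whether rule confidence changes monotonically across the attribute's ordered values. The sketch below assumes that a strictly monotonic sequence is what counts as a trend; that criterion, the function name `confidence_trend`, and the sample figures are assumptions made for illustration.

```python
def confidence_trend(confidences):
    """Classify per-value rule confidences, given in attribute-value order.

    Returns "increasing" or "decreasing" when every step moves strictly in
    one direction, and "no monotonic trend" otherwise (including when fewer
    than two values are supplied).
    """
    steps = list(zip(confidences, confidences[1:]))
    if steps and all(later > earlier for earlier, later in steps):
        return "increasing"
    if steps and all(later < earlier for earlier, later in steps):
        return "decreasing"
    return "no monotonic trend"


# Hypothetical failure confidences for low, medium and high values of an attribute.
trend = confidence_trend([0.02, 0.05, 0.11])
```

Note that such a check needs the confidence of every value of the attribute, including values whose rules would not by themselves be considered interesting.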
Context is the key to dealing with all the above problems. However, the existing rule mining paradigm eliminates a large amount of contextual information. Let us see why:
In the mining of class association rules, user-specified minimum support (minsup) and minimum confidence (minconf) values are used to ensure that the computation is feasible. Those rules that do not meet the minsup or minconf requirements are not generated. However, they can form important context information for other rules and generalized knowledge. Such contextual information is thus lost.
For example, suppose an attribute B has three possible values, a, b, d, and C is the class attribute. Because of the minsup threshold, we find only the rule B=a→C=c, where c is a class value. (Note that it is common practice to omit C from the right side of the formula. Alternatively, the above rule can also be written as (B=a, C=c) or (B=a, c)). The other two possible rules, B=b→c and B=d→c, which form the context for B=a→c, are not found because they do not satisfy the minsup. We call them holes (or gaps) in the knowledge space. The rule B=a→c then has no context. We also may not be able to find any generalized knowledge about the attribute because of the incomplete information, i.e., the holes. Hence, we say that the current mining paradigm fragments the knowledge space and creates discontinuity in the space, which makes the understanding and exploration of knowledge by human users very difficult.
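The B=a→c example can be reproduced numerically. The sketch below lists the sibling rules of one attribute for a fixed class, with and without a minsup threshold; the record counts are invented for illustration, the class value `other` is a placeholder for any class other than c, and the function name `sibling_rules` is hypothetical.

```python
from collections import Counter


def sibling_rules(records, attr, class_attr, class_value, minsup=0.0):
    """Map each value v of `attr` to (support, confidence) of attr=v -> class_value.

    Rules whose support falls below `minsup` are omitted, as a thresholded
    miner would omit them.
    """
    n = len(records)
    cover = Counter(r[attr] for r in records)
    hits = Counter(r[attr] for r in records if r[class_attr] == class_value)
    found = {}
    for value in sorted(cover):
        support = hits[value] / n
        if support >= minsup:
            found[value] = (support, hits[value] / cover[value])
    return found


# Invented counts: 20 records over B in {a, b, d} and a class attribute C.
records = (
    [{"B": "a", "C": "c"}] * 6 + [{"B": "a", "C": "other"}] * 4
    + [{"B": "b", "C": "c"}] * 1 + [{"B": "b", "C": "other"}] * 5
    + [{"B": "d", "C": "c"}] * 1 + [{"B": "d", "C": "other"}] * 3
)

with_threshold = sibling_rules(records, "B", "C", "c", minsup=0.1)
without_threshold = sibling_rules(records, "B", "C", "c")
```

With the threshold, only B=a→c survives and nothing indicates whether its confidence is unusual. Without it, the rules for B=b and B=d are also available, and the user can see how B=a→c compares with its siblings.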
What is needed is a new approach to address all the above mentioned shortcomings.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.