This invention relates generally to data processing systems and, more particularly, to a system and method for improving understanding of business data by interactive rule manipulation.
From a business perspective, the term "knowledge discovery" refers to an iterative approach to modeling and exploring data, and gaining knowledge therefrom for integration into a business process. FIG. 1 depicts an exemplary representation of the knowledge discovery process. First, a general business problem is identified (step 105). The problem is typically framed within a particular business context to formulate a clear and well-understood problem definition (item 110). Once the problem has been explicitly defined, data relevant to the problem, or reflective of the problem, is collected and assembled, usually from one or more existing databases (item 115), and preprocessed into a suitable form (step 120) to produce a dataset (item 125). The dataset is processed by a data mining technique (step 130) and translated into a model (item 135) that abstracts systematic patterns in the underlying data. Model interpretation and evaluation (step 140) is aimed at extracting nuggets of knowledge (item 145) for exploitation and use in a business, and ultimate integration into a business process (step 150).
Building a model from raw data (step 130 above) may be done through a variety of data mining techniques. The term "data mining" refers to an automated process of discovering systematic patterns in large amounts of data. A data mining algorithm is thus used to model data by detecting and generalizing patterns in historical data, so that the model can be applied to new scenarios (or combinations) not directly covered, or observed in, the data. Continuing with the exemplary representation of knowledge discovery in FIG. 1, the results of model evaluation and interpretation (step 140) and business process integration (step 150) may provide additional feedback into other steps in subsequent iterations of the knowledge discovery process, including problem identification (step 105), data assembly and preprocessing (step 120) and data mining (step 130). Further information on a prior art knowledge discovery process is included in "Advances in Knowledge Discovery and Data Mining," by Usama M. Fayyad et al., AAAI Press/MIT Press, Cambridge, Mass. (1996), incorporated herein by reference.
"Predictive data mining" refers to the use of data mining techniques for building predictive models. Predictive models are learned from historical (or pre-classified) data using data mining algorithms. These models can then predict the quality or attribute of interest for new and unseen cases. For example, a predictive model, learned from prior known cases of credit (un)worthiness, can be used to predict the credit worthiness (or otherwise) of a new customer. Predictive data mining models are quantitative and compute objective results: "yes" or "no", a probability, a value, a classification, etc. Examples of predictive data mining techniques include neural networks, statistical regression, decision trees, decision rules, etc. Qualitative data mining techniques, on the other hand, generally provide insight into relationships among data and provide more subjective results.
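By way of illustration only, the credit worthiness example above may be sketched as follows. The sketch assumes a single hypothetical attribute (income) and a deliberately simple learner that selects one threshold from pre-classified cases; the names Case, learn_threshold, and predict are illustrative assumptions and are not part of any described embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One historical, pre-classified case (attributes are hypothetical)."""
    income: float         # assumed predictor attribute
    credit_worthy: bool   # known outcome from historical data


def learn_threshold(cases):
    """Return the income threshold that misclassifies the fewest cases.

    The learned model is the single rule: income >= threshold -> credit worthy.
    """
    best_threshold, best_errors = None, None
    for candidate in sorted({c.income for c in cases}):
        errors = sum((c.income >= candidate) != c.credit_worthy for c in cases)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def predict(threshold, income):
    """Apply the learned rule to a new, unseen case."""
    return income >= threshold


# Illustrative historical data (invented for this sketch only).
history = [
    Case(20.0, False),
    Case(35.0, False),
    Case(50.0, True),
    Case(80.0, True),
]
threshold = learn_threshold(history)
new_customer_is_worthy = predict(threshold, 60.0)
```

The result here is an objective "yes" or "no" for the new customer, which is the kind of output the passage attributes to predictive models; practical techniques such as decision trees or neural networks would replace the single-threshold learner.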
The knowledge discovery process depicted in FIG. 1 fails to explicitly include one crucial component required for a successful knowledge discovery process in a business context: domain knowledge. "Domain knowledge" includes knowledge about a business, business processes, and "common sense" knowledge pertaining to a problem being investigated. Domain knowledge comes into play in various tasks of an iterative knowledge discovery process, including circumscribing the business problem, selecting and using appropriate data, and evaluating models (or patterns) generated by data mining algorithms.
In the prior art, techniques for qualitative data mining include exploratory data analysis, data visualization, database reports and on-line analytical processing (OLAP). Exploratory data analysis uses a variety of statistical techniques to explore raw data. "Exploratory Data Analysis," by J. W. Tukey, Addison-Wesley (1977), incorporated herein by reference, describes several of these techniques. Data visualization techniques display data in graphs, charts, and other visual constructs. Database reports and OLAP usually provide canned views of data for specific applications and to highlight particular pieces of information. "OLAP and Data Warehousing," Workshop Notes from the Third International Conference on Knowledge Discovery and Data Mining, by S. Chaudhuri and U. Dayal (1997), incorporated herein by reference, provides an overview of techniques and references to OLAP literature. "Visual Techniques for Exploring Databases," from the Third International Conference on Knowledge Discovery and Data Mining, by D. Keim (1997), incorporated herein by reference, provides an overview of techniques and references to visualization literature. All of these techniques, because they operate only on raw data, can be time consuming and cumbersome, are usually tailored to specific tasks, and place the entire burden of interpretation and understanding on the user. Further, they fail to provide a systematic way of representing, and taking into account, a particular business problem and context. Nor do they explicitly incorporate domain knowledge into the knowledge discovery process.
Thus, given the need for incorporating domain knowledge in a knowledge discovery endeavor, there exists a need for an interactive tool that supports qualitative data mining, and that will guide a user towards understanding and gaining insight and subsequent knowledge from data and promote an understanding of market dynamics from a specific business perspective, in a quick and efficient manner.
In accordance with a first aspect of the present invention, as embodied and broadly described herein, an interactive method is implemented in a data processing system for directed data analysis. The system receives rules, which represent relationships among elements of a dataset. The system then displays the rules and computes business measures of quality associated with the rules. A user may manipulate the rules by changing, adding, or deleting parameters.
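By way of illustration only, the receive, measure, and manipulate cycle described above may be sketched as follows. The summary does not specify a rule representation or which business measures of quality are computed, so the sketch assumes rules of the form "if conditions then outcome" and uses two stand-in measures, coverage and confidence; the names Rule, measures, with_parameter, and without_parameter, and the sample records, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A relationship among dataset elements: IF conditions THEN outcome."""
    conditions: dict   # attribute name -> required value (assumed form)
    outcome: tuple     # (attribute name, value) the rule asserts

    def matches(self, record):
        return all(record.get(k) == v for k, v in self.conditions.items())


def measures(rule, dataset):
    """Compute stand-in quality measures (assumptions, not from the source).

    coverage:   fraction of records satisfying the rule's conditions
    confidence: fraction of those records that also show the outcome
    """
    matched = [r for r in dataset if rule.matches(r)]
    attribute, value = rule.outcome
    hits = sum(r.get(attribute) == value for r in matched)
    return {
        "coverage": len(matched) / len(dataset) if dataset else 0.0,
        "confidence": hits / len(matched) if matched else 0.0,
    }


def with_parameter(rule, attribute, value):
    """Return a new rule with a parameter added or changed."""
    return Rule({**rule.conditions, attribute: value}, rule.outcome)


def without_parameter(rule, attribute):
    """Return a new rule with a parameter deleted."""
    kept = {k: v for k, v in rule.conditions.items() if k != attribute}
    return Rule(kept, rule.outcome)


# Invented sample data for this sketch only.
records = [
    {"region": "east", "plan": "basic", "churned": True},
    {"region": "east", "plan": "basic", "churned": True},
    {"region": "east", "plan": "premium", "churned": False},
    {"region": "west", "plan": "basic", "churned": False},
]

received = Rule({"region": "east"}, ("churned", True))
narrowed = with_parameter(received, "plan", "basic")   # user adds a parameter
broadened = without_parameter(narrowed, "region")      # user deletes a parameter

for label, rule in [("received", received), ("narrowed", narrowed), ("broadened", broadened)]:
    print(label, rule.conditions, measures(rule, records))
```

Each manipulation returns a new rule rather than altering the received one, so the measures before and after a change can be compared side by side, which is one plausible way to support the directed analysis described.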
In accordance with an embodiment of the first aspect of the present invention, as embodied and broadly described herein, an apparatus is provided that includes a memory, including a model generation software system that translates a dataset into a model, and a model manipulation system that allows a user to manipulate the generated model. The apparatus further includes an output device for displaying the generated model, an input device for receiving user manipulations of the generated model, and at least one processor for executing the model generation software system and the model manipulation system.
Furthermore, in accordance with an embodiment of the first aspect of the present invention, as embodied and broadly described herein, a graphical user interface is provided that includes a market segmentation interface and a rule manipulation window. The graphical user interface further includes means for displaying a rule, and allowing a user to manipulate the displayed rule, where a manipulation of the displayed rule permits an analyst to perform directed data analysis.