From a business perspective, the term “knowledge discovery” refers to an iterative approach to modeling and exploring data, and to gaining knowledge therefrom for integration into a business process. FIG. 1 depicts an exemplary representation of the knowledge discovery process. First, a general business problem is identified (step 105). The problem is typically framed within a particular business context to formulate a clear and well-understood problem definition (item 110). Once the problem has been explicitly defined, data relevant to the problem, or reflective of the problem, is collected, assembled, and preprocessed (step 120), usually from one or more existing databases (item 115). The preprocessing converts the data to a suitable form, producing a dataset (item 125). The dataset is processed by a data mining technique (step 130) and translated into a model (item 135) that abstracts systematic patterns in the underlying data. Model interpretation and evaluation (step 140) is aimed at extracting nuggets of knowledge (item 145) for exploitation and use in a business, and for ultimate integration into a business process (step 150).
Building a model from raw data (step 130 above) may be done through a variety of data mining techniques. The term “data mining” refers to an automated process of discovering systematic patterns in large amounts of data. A data mining algorithm is thus used to model data by detecting and generalizing patterns in historical data, so that the model can be applied to new scenarios (or combinations) not directly covered by, or observed in, the data. Continuing with the exemplary representation of knowledge discovery in FIG. 1, the results of model evaluation and interpretation (step 140) and business process integration (step 150) may provide additional feedback into other steps in subsequent iterations of the knowledge discovery process, including problem identification (step 105), data assembly and preprocessing (step 120), and data mining (step 130). Further information on a prior art knowledge discovery process is included in “Advances in Knowledge Discovery and Data Mining,” by Usama M. Fayyad et al., AAAI Press/MIT Press, Cambridge, Mass. (1996), incorporated herein by reference.
“Predictive data mining” refers to the use of data mining techniques for building predictive models. Predictive models are learned from historical (or pre-classified) data using data mining algorithms. These models can then predict the quality or attribute of interest for new and unseen cases. For example, a predictive model, learned from prior known cases of credit (un)worthiness, can be used to predict the credit worthiness (or otherwise) of a new customer. Predictive data mining models are quantitative and compute objective results: “yes” or “no”, a probability, a value, a classification, etc. Examples of predictive data mining techniques include neural networks, statistical regression, decision trees, decision rules, etc. Qualitative data mining techniques, on the other hand, generally provide insight into relationships among data and provide more subjective results.
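By way of illustration only, the credit worthiness example above can be sketched as a single decision rule learned from pre-classified cases. The attribute (a debt-to-income ratio), the historical cases, and the names `HISTORY`, `learn_threshold`, and `predict` are hypothetical and are not drawn from any cited reference; the sketch shows one simple form a decision rule might take, not a particular prior art algorithm.

```python
from typing import List, Tuple

# Hypothetical pre-classified cases: (debt-to-income ratio, was credit worthy).
HISTORY: List[Tuple[float, bool]] = [
    (0.10, True), (0.20, True), (0.25, True), (0.35, True),
    (0.45, False), (0.50, False), (0.60, False),
]


def learn_threshold(cases: List[Tuple[float, bool]]) -> float:
    """Learn the rule 'ratio <= threshold => credit worthy'.

    Every observed ratio is tried as a threshold; the one that
    misclassifies the fewest historical cases is kept (first wins on ties).
    """
    if not cases:
        raise ValueError("at least one historical case is required")
    candidates = sorted({ratio for ratio, _ in cases})
    best_threshold = candidates[0]
    best_errors = len(cases) + 1
    for threshold in candidates:
        errors = sum((ratio <= threshold) != label for ratio, label in cases)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold: float, ratio: float) -> bool:
    """Apply the learned rule to a new, unseen case."""
    return ratio <= threshold


threshold = learn_threshold(HISTORY)
print(predict(threshold, 0.30))  # a new customer with a low ratio
print(predict(threshold, 0.55))  # a new customer with a high ratio
```

The result is an objective “yes” or “no” for a case that does not appear in the historical data, which is the sense in which such a model is predictive rather than qualitative.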
The knowledge discovery process depicted in FIG. 1 fails to explicitly include one crucial component required for a successful knowledge discovery process in a business context: domain knowledge. “Domain knowledge” includes knowledge about a business, business processes, and “common sense” knowledge pertaining to a problem being investigated. Domain knowledge comes into play in various tasks of an iterative knowledge discovery process, including circumscribing the business problem, selecting and using appropriate data, and evaluating models (or patterns) generated by data mining algorithms.
In the prior art, techniques for qualitative data mining include exploratory data analysis, data visualization, database reports and on-line analytical processing (OLAP). Exploratory data analysis uses a variety of statistical techniques to explore raw data. “Exploratory Data Analysis,” by J. W. Tukey, Addison-Wesley (1977), incorporated herein by reference, describes several of these techniques. Data visualization techniques display data in graphs, charts, and other visual constructs. Database reports and OLAP usually provide canned views of data for specific applications and to highlight particular pieces of information. “OLAP and Data Warehousing,” Workshop Notes from the Third International Conference on Knowledge Discovery and Data Mining, by S. Chaudhuri and U. Dayal (1997), incorporated herein by reference, provides an overview of techniques and references to OLAP literature. “Visual Techniques for Exploring Databases,” from the Third International Conference on Knowledge Discovery and Data Mining, by D. Keim (1997), incorporated herein by reference, provides an overview of techniques and references to visualization literature. All of these techniques, because they operate only on raw data, can be time consuming and cumbersome, are usually tailored to specific tasks, and place the entire burden of interpretation and understanding on the user. Further, they fail to provide a systematic way of representing, and taking into account, a particular business problem and context. Nor do they explicitly incorporate domain knowledge into the knowledge discovery process.
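As a hedged illustration of what exploratory data analysis on raw data can look like, the following sketch computes a five-number summary (minimum, lower hinge, median, upper hinge, maximum), a summary commonly associated with exploratory data analysis. The function name `five_number_summary` and the sample figures are hypothetical, and the hinge convention used here (each half includes the middle value when the count is odd) is one of several in use.

```python
from statistics import median
from typing import Sequence, Tuple


def five_number_summary(
    values: Sequence[float],
) -> Tuple[float, float, float, float, float]:
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Each hinge is the median of one half of the sorted data; the middle
    value belongs to both halves when the number of values is odd.
    """
    if not values:
        raise ValueError("at least one value is required")
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2            # size of each half, middle value included
    lower_half = data[:half]
    upper_half = data[n - half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


# Hypothetical raw figures, e.g. weekly unit sales for one product.
weekly_sales = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(weekly_sales))  # (1, 3, 5, 7, 9)
```

The summary describes the raw values but says nothing about the business problem or context; deciding what the spread means is left entirely to the user.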
Thus, given the need to incorporate domain knowledge into a knowledge discovery endeavor, there exists a need for an interactive tool that supports qualitative data mining, that guides a user quickly and efficiently toward understanding, insight, and subsequent knowledge gained from data, and that promotes an understanding of market dynamics from a specific business perspective.