1. Field of the Invention
The present invention relates generally to machine learning, data mining, and data structure visualization.
2. Related Art
In supervised classification learning, an induction algorithm (or inducer) produces a model from a labeled set of training data. Many data mining tasks, for example, require data to be classified into classes. A classifier provides a function that maps (classifies) a data item (instance) into one of several predefined classes (labels). The resulting classification model is capable of predicting labels for a new set of unlabeled records that have the same attributes as the training data. The attribute being predicted is called the label, and the attributes used for prediction are called the descriptive attributes. The word attribute is also used here to refer to a column in a relational data set. Thus, given a data set of labeled instances, supervised machine learning algorithms seek a hypothesis (i.e., the model) that will correctly predict the class of future unlabeled instances.
A classifier is generally constructed by an inducer. The inducer is an algorithm that builds the classifier from a training set. The training set consists of records with labels. The training set is used by the inducer to "learn" how to construct the classifier as shown in FIG. 1. Once the classifier is built, it can be used to classify unlabeled records as shown in FIG. 2.
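The relationship between an inducer and the classifier it builds can be sketched in a few lines of Python. This is only an illustrative sketch: the nearest-centroid rule, the function name induce, the attribute names, and the record values are all assumptions chosen for brevity, and are not taken from FIG. 1, FIG. 2, or FIG. 3.

```python
from collections import defaultdict


def induce(training_set, label_attr):
    """Hypothetical inducer: learns per-label centroids and returns a classifier."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for record in training_set:
        label = record[label_attr]
        counts[label] += 1
        for attr, value in record.items():
            if attr != label_attr:
                sums[label][attr] += value

    # Average each descriptive attribute over the records sharing a label.
    centroids = {
        label: {attr: total / counts[label] for attr, total in attrs.items()}
        for label, attrs in sums.items()
    }

    def classify(record):
        """Maps an unlabeled record to the label with the closest centroid."""
        def squared_distance(label):
            return sum((record[a] - c) ** 2 for a, c in centroids[label].items())
        return min(centroids, key=squared_distance)

    return classify


# Illustrative labeled records (values are made up for this sketch).
training_set = [
    {"petal_length": 1.4, "petal_width": 0.2, "iris_type": "setosa"},
    {"petal_length": 1.5, "petal_width": 0.3, "iris_type": "setosa"},
    {"petal_length": 4.5, "petal_width": 1.5, "iris_type": "versicolor"},
    {"petal_length": 4.7, "petal_width": 1.4, "iris_type": "versicolor"},
]

classifier = induce(training_set, "iris_type")               # training step
predicted = classifier({"petal_length": 1.3, "petal_width": 0.2})  # classification step
```

The inducer runs once over the labeled records; the returned function can then be applied to any number of unlabeled records that carry the same descriptive attributes.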
Inducers require a training set, which is a data set containing attributes, one of which is designated as the class label. The label attribute type must be discrete (e.g., binned values, character string values, or a small set of integers). FIG. 3 shows several records from a sample training set pertaining to an iris data set. The iris data set was originally used in Fisher, R. A., "The use of multiple measurements in taxonomic problems," in Annals of Eugenics 7(1):179-188, (1936). It is a classical problem in many statistical texts.
For example, if a classifier for predicting iris_type is built, the classifier can be applied to records containing only the descriptive attributes, and a new column is added with the predicted iris type. See, e.g., the general and easy-to-read introduction to machine learning, Weiss, S. M., and Kulikowski, C. A., Computer Systems that Learn, San Mateo, Calif., Morgan Kaufmann Publishers, Inc. (1991), and the edited volume of machine learning techniques, Dietterich, T. G. and Shavlik, J. W. (eds.), Readings in Machine Learning, Morgan Kaufmann Publishers, Inc., 1990 (both of which are incorporated herein by reference).
There are several well-known types of classifiers, including evidence, neural network, decision tree, and decision table classifiers. An Evidence classifier is also called a Bayes or Bayesian classifier or a Naive-Bayes classifier. The Evidence classifier uses Bayes rule, or equivalents thereof, to compute the probability that a given instance belongs to a class. In applying Bayes rule, the Evidence classifier assumes that the attributes are conditionally independent given the label. This conditional independence can be assumed to be a complete conditional independence, as in a Naive-Bayes classifier. Alternatively, the complete conditional independence assumption can be relaxed to optimize classifier accuracy or to further other design criteria.
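As a minimal sketch of the Naive-Bayes case, the following Python computes the probability of each label for an instance by multiplying the label prior with one per-attribute likelihood, which is exactly where the complete conditional independence assumption enters. The function names, the add-one (Laplace) smoothing, and the binned records are assumptions made for this illustration only.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, label_attr):
    """Counts labels and per-label attribute values from discrete records."""
    label_counts = Counter(r[label_attr] for r in records)
    value_counts = defaultdict(Counter)   # (label, attr) -> Counter of values
    domains = defaultdict(set)            # attr -> set of observed values
    for r in records:
        for attr, value in r.items():
            if attr != label_attr:
                value_counts[(r[label_attr], attr)][value] += 1
                domains[attr].add(value)
    return label_counts, value_counts, domains


def posterior(model, instance):
    """Returns P(label | instance) under complete conditional independence."""
    label_counts, value_counts, domains = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # prior P(label)
        for attr, value in instance.items():
            count = value_counts[(label, attr)][value]
            # Add-one smoothing (an assumption) avoids zero probabilities.
            score *= (count + 1) / (n + len(domains[attr]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Illustrative binned records (made up for this sketch).
records = [
    {"petal": "short", "sepal": "wide", "iris_type": "setosa"},
    {"petal": "short", "sepal": "narrow", "iris_type": "setosa"},
    {"petal": "long", "sepal": "narrow", "iris_type": "virginica"},
    {"petal": "long", "sepal": "wide", "iris_type": "virginica"},
]
model = train_naive_bayes(records, "iris_type")
probs = posterior(model, {"petal": "short", "sepal": "wide"})
```

Relaxing the independence assumption would mean replacing the per-attribute product with likelihoods conditioned on more than the label alone.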
For an introduction to decision tree induction, see Quinlan, J. R., C4.5: Programs for Machine Learning, Los Altos, Calif., Morgan Kaufmann Publishers, Inc. (1993); and the book on decision trees from a statistical perspective by Breiman et al., Classification and Regression Trees, Wadsworth International Group (1984).
Decision Table classifiers are also known. The method of classification is similar to that of a decision tree. Classification is performed by choosing the majority class of the region in which an example is found. Decision Tables are useful in that they present an inherent hierarchy of levels. See Kohavi, R., "The Power of Decision Tables", Proceedings of the 8th European Conference on Machine Learning, Lavrac et al. (Eds.), Springer Verlag, Berlin, Heidelberg, N.Y., pages 174-189 (1995) (incorporated by reference herein).
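The majority-class rule described above can be sketched as follows. Each distinct combination of values for the chosen attributes defines one region (cell) of the table, and an instance receives the most frequent label among the training records in its cell. The function names, the sample records, and the fallback to the overall majority label for an empty cell are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def build_decision_table(records, table_attrs, label_attr):
    """Groups records into cells keyed by their values for table_attrs."""
    cells = defaultdict(Counter)
    for r in records:
        key = tuple(r[a] for a in table_attrs)
        cells[key][r[label_attr]] += 1
    # Assumed fallback: the majority label of the whole training set.
    default = Counter(r[label_attr] for r in records).most_common(1)[0][0]
    return dict(cells), default


def classify(table, table_attrs, instance):
    """Returns the majority label of the cell in which the instance falls."""
    cells, default = table
    key = tuple(instance[a] for a in table_attrs)
    if key in cells:
        return cells[key].most_common(1)[0][0]
    return default


# Illustrative binned records (made up for this sketch).
records = [
    {"petal": "short", "sepal": "wide", "iris_type": "setosa"},
    {"petal": "short", "sepal": "wide", "iris_type": "setosa"},
    {"petal": "short", "sepal": "wide", "iris_type": "versicolor"},
    {"petal": "short", "sepal": "narrow", "iris_type": "setosa"},
    {"petal": "long", "sepal": "narrow", "iris_type": "virginica"},
    {"petal": "long", "sepal": "narrow", "iris_type": "virginica"},
]
attrs = ["petal", "sepal"]
table = build_decision_table(records, attrs, "iris_type")
seen = classify(table, attrs, {"petal": "short", "sepal": "wide"})
unseen = classify(table, attrs, {"petal": "long", "sepal": "wide"})
```

Because each cell keeps the full label counts rather than only the winning label, the same structure also exposes how mixed a cell is.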
In practice, many such classifier models are applied as a black box. It is not necessary to understand the model in order to use it effectively for prediction. Recently, there has been a realization that great insight can be gained by visualizing the structure of a data mining model. Decision trees were one of the earliest classification models to be represented graphically because of their easy-to-understand structure. See Quinlan, J., C4.5: Programs For Machine Learning, Morgan Kaufmann Publishers, Inc. (1993). Neural networks, by contrast, have an arcane structure that is difficult to visualize.
The ability to describe the structure of a classifier in a way that people can easily understand transforms classifiers from incomprehensible black boxes into tools for knowledge discovery. Classification without an explanation reduces the trust a user has in the system. Some researchers have found that physicians would reject a system that gave insufficient explanation even if the model had good accuracy. See Spiegelhalter and Knill-Jones, J. Royal Statistical Soc. A 147:35-37 (1984). A human may decide not to use a classifier if he or she realizes that it is based on irrelevant attributes or bad data, or that important factors are being ignored. Current classifier visualizers are directed to other types of classifiers, such as decision-tree classifiers and evidence classifiers. See co-pending and commonly assigned U.S. application Ser. Nos. 08/813,336 and 08/841,341 (referenced above).
Data mining applications and end-users now need to know how a decision table classifier maps each record to a label. Understanding how a decision table classifier works can lead to an even greater understanding of data. There have been some initial attempts to display Decision Tables in the form of a General Logic Diagram (GLD). See J. Wnek et al., Machine Learning 14(2):139-168 (1994). In this format, each cell in the Decision Table has a single color to indicate the predicted class.
Several methods of visualizing multi-dimensional data sets that use several variables to form a hierarchy have been proposed. Some have proposed dimensional stacking which shows a single colored cell at the lowest level of the hierarchy. See LeBlanc, J., et al., "Exploring N-Dimensional Databases", Proceedings of First IEEE Conference on Visualization (Visualization '90), pages 230-237 (1990) (incorporated by reference herein). Trellis displays have expanded this concept by generalizing the representation of the cell at the lowest level to be a plot of any type, such as a scatterplot, surface plot, or line graph. See Becker, W. et al., "Trellis Graphics Displays: A Multi-Dimensional Data Visualization Tool For Data Mining," presented at KDD '97 Conference, Newport Beach, Calif. (August 1997) (incorporated by reference herein).
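One way to picture how several variables can form a hierarchy in a flat display is to nest the binned attributes alternately along the horizontal and vertical axes, so that outer attributes select a coarse block and inner attributes select a cell within it. The Python below is a sketch of that idea only; the alternating axis assignment, the function name, and the bin counts are assumptions and are not taken from the cited references.

```python
def stacked_position(bin_indices, bin_counts):
    """Maps one bin index per attribute to an (x, y) cell in a stacked grid.

    Attributes are listed from outermost to innermost. Attributes at even
    positions are nested along x and those at odd positions along y
    (an assumed convention for this sketch).
    """
    x = y = 0
    for level, (index, count) in enumerate(zip(bin_indices, bin_counts)):
        if not 0 <= index < count:
            raise ValueError(f"bin index {index} out of range for level {level}")
        if level % 2 == 0:
            x = x * count + index   # descend one level along the x axis
        else:
            y = y * count + index   # descend one level along the y axis
    return x, y


# Four attributes with 3, 2, 4, and 2 bins give a grid 12 cells wide and 4 tall.
bin_counts = (3, 2, 4, 2)
cell = stacked_position((2, 1, 3, 0), bin_counts)
```

With this layout the grid size is the product of the bin counts along each axis, which illustrates why the number of cells grows quickly as levels are added.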
A deficiency common to these related methods of displaying multi-dimensional data is that only a single static scene is used to convey information. This limits the amount of information that can be shown in a conventional visualization. Complex decision table classifiers with many attributes and many cells cannot be visualized effectively in a single static scene. For example, a single static scene is limited to two or three levels of detail before the display gets too large or the cells become too small to be useful.
Thus, the attempts to visualize decision table classifiers have resulted in confusing displays where the user has difficulty in seeing the information he or she desires. In addition, conventional attempts to visualize decision table classifiers do not provide navigation capability or drill down features. Finally, the attempts to visualize decision table classifiers do not provide user interaction with the visualized classifiers.
What is needed is a decision table classifier visualizer having interactive capabilities and more comprehensible and informative displays.