1. Field of the Invention
The present invention relates generally to machine learning, data mining, and data visualization.
2. Related Art
Many data mining tasks require classifying data into classes, a task typically performed by a classifier. The classifier provides a function that maps (classifies) a data item (instance) into one of several predefined classes (labels). More specifically, the classifier predicts one attribute of a set of data given one or more other attributes. For example, in a database of iris flowers, a classifier can be built to predict the type of iris (iris-setosa, iris-versicolor or iris-virginica) given the petal length, petal width, sepal length and sepal width. The attribute being predicted (in this case, the type of iris) is called the label, and the attributes used for prediction are called the descriptive attributes.
A classifier is generally constructed by an inducer. The inducer is an algorithm that builds the classifier from a training set. The training set consists of records with labels. The training set is used by the inducer to "learn" how to construct the classifier as shown in FIG. 1. Once the classifier is built, it can be used to classify unlabeled records as shown in FIG. 2.
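The separation between an inducer and the classifier it builds can be illustrated with a minimal sketch. The nearest-centroid inducer below is chosen only for brevity and is an assumption of this example, not an inducer described herein; the function name, attribute names, and measurement values are likewise hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Classifier = Callable[[Record], str]


def induce_nearest_centroid(training_set: List[Tuple[Record, str]]) -> Classifier:
    """Inducer: builds a classifier from labeled records (illustrative only)."""
    counts: Counter = Counter()
    sums: Dict[str, Dict[str, float]] = {}
    for record, label in training_set:
        counts[label] += 1
        acc = sums.setdefault(label, {})
        for name, value in record.items():
            acc[name] = acc.get(name, 0.0) + value
    # Mean of each descriptive attribute, computed per label.
    centroids = {
        label: {name: total / counts[label] for name, total in acc.items()}
        for label, acc in sums.items()
    }

    def classify(record: Record) -> str:
        # Classifier: returns the label whose centroid is closest.
        def squared_distance(label: str) -> float:
            centroid = centroids[label]
            return sum((record[name] - centroid[name]) ** 2 for name in centroid)

        return min(centroids, key=squared_distance)

    return classify


# Made-up training records for illustration; not taken from the iris database.
training_set = [
    ({"petal_length": 1.4, "petal_width": 0.2}, "iris-setosa"),
    ({"petal_length": 1.6, "petal_width": 0.2}, "iris-setosa"),
    ({"petal_length": 5.5, "petal_width": 2.0}, "iris-virginica"),
    ({"petal_length": 5.9, "petal_width": 2.2}, "iris-virginica"),
]
classifier = induce_nearest_centroid(training_set)
prediction = classifier({"petal_length": 1.5, "petal_width": 0.3})
```

In this sketch the inducer runs once over the training set, and the returned classifier can then be applied to any number of unlabeled records.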
Inducers require a training set, which is a database table containing attributes, one of which is designated as the class label. The label attribute type must be discrete (e.g., binned values, character string values, or a small number of integers). FIG. 3 shows several records from a sample training set pertaining to an iris database. The iris database was originally used in Fisher, R. A., "The use of multiple measurements in taxonomic problems," in Annals of Eugenics 7(1):179-188, (1936). It is a classical problem in many statistical texts.
Once a classifier is built, it can classify new unlabeled records as belonging to one of the classes. These new records must be in a table that has the same attributes as the training set; however, the table need not contain the label attribute. For example, if a classifier for predicting iris_type is built, the classifier can be applied to records containing only the descriptive attributes, and a new column is added with the predicted iris type. See, e.g., the general and easy-to-read introduction to machine learning, Weiss, S. M., and Kulikowski, C. A., Computer Systems that Learn, San Mateo, Calif., Morgan Kaufmann Publishers, Inc. (1991), and the edited volume of machine learning techniques, Dietterich, T. G. and Shavlik, J. W. (eds.), Readings in Machine Learning, Morgan Kaufmann Publishers, Inc., 1990 (both of which are incorporated herein by reference).
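Applying a classifier to a table of unlabeled records can be sketched as follows. The helper name add_predicted_label, the threshold rule standing in for a built classifier, and the record values are all hypothetical and serve only to show the added label column.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def add_predicted_label(
    records: List[Record],
    classifier: Callable[[Record], str],
    label_name: str = "iris_type",
) -> List[Record]:
    """Return copies of the records with a new column holding the predicted label."""
    return [{**record, label_name: classifier(record)} for record in records]


def placeholder_classifier(record: Record) -> str:
    # Stand-in for a classifier built by an inducer; the threshold is made up.
    return "iris-setosa" if record["petal_length"] < 2.5 else "iris-virginica"


# Unlabeled records: descriptive attributes only, no label column.
unlabeled = [
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 5.8, "petal_width": 2.1},
]
labeled = add_predicted_label(unlabeled, placeholder_classifier)
```

The input records are left unchanged; each output record carries the same descriptive attributes plus the predicted label.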
A well-known type of classifier is an Evidence classifier, also called a Bayes classifier or a Naive-Bayes classifier. The Evidence classifier uses Bayes rule, or equivalents thereof, to compute the probability of each class given an instance. In applying Bayes rule, the Evidence classifier assumes that the attributes are conditionally independent given the label. This conditional independence can be assumed to be a complete conditional independence as in a Naive-Bayes classifier or Simple Bayes classifier. Alternatively, the complete conditional independence assumption can be relaxed to optimize classifier accuracy or to further other design criteria.
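The computation performed under complete conditional independence can be sketched as follows: the probability of each label given an instance is taken to be proportional to the prior probability of the label multiplied by the conditional probability of each attribute value given that label. The sketch assumes discrete (e.g., binned) attribute values and applies an add-one (Laplace) correction to avoid zero probabilities; the correction, the function names, and the toy data are assumptions of this example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count labels and per-label attribute values from a training set."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value -> count
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for record, label in zip(records, labels):
        for attr, value in record.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen


def posterior(model, record):
    """Return P(label | record) for every label, assuming independent attributes."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(label)
        for attr, value in record.items():
            k = len(values_seen[attr])
            # Add-one corrected estimate of P(attr = value | label).
            score *= (value_counts[(attr, label)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Made-up binned training data for illustration.
records = [{"petal": "short"}, {"petal": "short"}, {"petal": "long"}, {"petal": "long"}]
labels = ["iris-setosa", "iris-setosa", "iris-virginica", "iris-virginica"]
model = train_naive_bayes(records, labels)
probabilities = posterior(model, {"petal": "short"})
predicted = max(probabilities, key=probabilities.get)
```

With this toy data the two labels have equal priors, so the prediction is driven entirely by the conditional probability of the observed attribute value.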
For more information on classifiers, see the following documents, each of which is incorporated by reference in its entirety herein: Kononenko, I., Applied Artificial Intelligence 7:317-337 (1993) (an introduction to the evidence classifier (Naive-Bayes)); Schaffer, C., "A Conservation Law for Generalization Performance," in Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, Inc., pp. 259-265 (1994) (a paper explaining that no classifier can be "best"); Taylor, C., et al., Machine Learning, Neural and Statistical Classification, Paramount Publishing International (1994) (a comparison of algorithms and descriptions); Langley et al., "An Analysis of Bayesian Classifiers," Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223-228 (1992) (a paper describing an evidence classifier (Naive-Bayes)); Good, I. J., The Estimation of Probabilities: An Essay on Modern Bayesian Methods, MIT Press (1965) (describing an evidence classifier); Duda, R. and Hart, P., Pattern Classification and Scene Analysis, Wiley (1973) (describing the evidence classifier); and Domingos, P. and Pazzani, M., "Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier," Machine Learning, Proceedings of the 13th International Conference (ICML '96), pp. 105-112 (1996) (showing that, even when the conditional independence assumption is violated, the classification accuracy of the evidence classifier (called Simple Bayes in this paper) can be good).
Data mining applications and end-users now need to know how an evidence classifier maps each record to a label. Understanding how an evidence classifier works can lead to an even greater understanding of data. Current classifier visualizers are directed to other types of classifiers, such as decision-tree classifiers. See, e.g., the AT&T product called Dotty that displays a decision-tree classifier in a 2-D ASCII text display. For an introduction to decision tree induction, see Quinlan, J. R., C4.5: Programs for Machine Learning, Los Altos, Calif., Morgan Kaufmann Publishers, Inc. (1993); and the book on decision trees from a statistical perspective by Breiman et al., Classification and Regression Trees, Wadsworth International Group (1984).
What is needed is an evidence classifier visualizer.