1. Field of the Invention
The present invention relates generally to machine learning, data mining, and data visualization.
2. Related Art
Many data mining tasks require classification of data into classes, a task typically performed by a classifier. The classifier provides a function that maps (classifies) a data item (instance) into one of several predefined classes (labels). More specifically, the classifier predicts one attribute of a set of data given one or more other attributes. For example, in a database of iris flowers, a classifier can be built to predict the type of iris (iris-setosa, iris-versicolor or iris-virginica) given the petal length, petal width, sepal length and sepal width. The attribute being predicted (in this case, the type of iris) is called the label, and the attributes used for prediction are called the descriptive attributes.
A classifier is generally constructed by an inducer. The inducer is an algorithm that builds the classifier from a training set. The training set consists of records with labels. The training set is used by the inducer to "learn" how to construct the classifier as shown in FIG. 1. Once the classifier is built, it can be used to classify unlabeled records as shown in FIG. 2.
Inducers require a training set, which is a database table containing attributes, one of which is designated as the class label. The label attribute type must be discrete (e.g., binned values, character string values, or few integers). FIG. 3 shows several records from a sample training set pertaining to an iris database. The iris database was originally used in Fisher, R. A., "The use of multiple measurements in taxonomic problems," in Annals of Eugenics 7(1):179-188, (1936). It is a classical problem in many statistical texts.
Once a classifier is built, it can classify new unlabeled records as belonging to one of the classes. These new records must be in a table that has the same attributes as the training set; however, the table need not contain the label attribute. For example, if a classifier for predicting iris_type is built, the classifier can be applied to records containing only the descriptive attributes, and a new column is added with the predicted iris type. See, e.g., the general and easy-to-read introduction to machine learning, Weiss, S. M., and Kulikowski, C. A., Computer Systems that Learn, San Mateo, Calif., Morgan Kaufmann Publishers, Inc. (1991), and the edited volume of machine learning techniques, Dietterich, T. G. and Shavlik, J. W. (eds.), Readings in Machine Learning, Morgan Kaufmann Publishers, Inc., 1990 (both of which are incorporated herein by reference).
A well-known type of classifier is an Evidence classifier, also called a Bayes classifier or a Naive-Bayes classifier. The Evidence classifier uses Bayes rule, or equivalents thereof, to compute the probability of each class given an instance. In applying Bayes rule, the Evidence classifier assumes that attributes are conditionally independent given the label. This conditional independence can be assumed to be a complete conditional independence as in a Naive-Bayes classifier or Simple Bayes classifier. Alternatively, the complete conditional independence assumption can be relaxed to optimize classifier accuracy or further other design criteria.
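By way of illustration only, under the complete conditional independence assumption the probability of a label value L given attribute values A_1 through A_n can be written as shown below. The symbols n, A_i, and L' are notation introduced here for illustration.

```latex
P(L \mid A_1, \ldots, A_n)
  = \frac{P(L) \prod_{i=1}^{n} P(A_i \mid L)}
         {\sum_{L'} P(L') \prod_{i=1}^{n} P(A_i \mid L')}
```

The denominator is the same for every label value, so the label with the largest numerator is also the label with the largest probability.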
For more information on classifiers, see the following documents, each of which is incorporated by reference in its entirety herein: Kononenko, I., Applied Artificial Intelligence 7:317-337 (1993) (an introduction to the evidence classifier (Naive-Bayes)); Schaffer, C., "A Conservation Law for Generalization Performance," in Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, Inc., pp. 259-265 (1994) (a paper explaining that no classifier can be "best"); Taylor, C., et al., Machine Learning, Neural and Statistical Classification, Paramount Publishing International (1994) (a comparison of algorithms and descriptions); Langley et al., "An Analysis of Bayesian Classifiers," Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223-228 (1992) (a paper describing an evidence classifier (Naive-Bayes)); Good, I. J., The Estimation of Probabilities: An Essay on Modern Bayesian Methods, MIT Press (1965) (describing an evidence classifier); Duda, R. and Hart, P., Pattern Classification and Scene Analysis, Wiley (1973) (describing the evidence classifier); and Domingos, P. and Pazzani, M., "Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier," Machine Learning, Proceedings of the 13th International Conference (ICML '96), pp. 105-112 (1996) (showing that, while the conditional independence assumption can be violated, the classification accuracy of the evidence classifier (called Simple Bayes in this paper) can be good).
Data mining applications and end-users now need to know how an evidence classifier maps each record to a label. Understanding how an evidence classifier works can lead to an even greater understanding of data. Current classifier visualizers are directed to other types of classifiers, such as decision-tree classifiers. See, e.g., the AT&T product called Dotty that displays a decision-tree classifier in a 2-D ASCII text display. For an introduction to decision tree induction see Quinlan, J. R., C4.5: Programs for Machine Learning, Los Altos, Calif., Morgan Kaufmann Publishers, Inc. (1993); and the book on decision trees from a statistical perspective by Breiman et al., Classification and Regression Trees, Wadsworth International Group (1984).
What is needed is an evidence classifier visualizer.
An evidence classifier visualization tool is needed to display information representative of the structure of an evidence classifier including information pertaining to how an evidence classifier predicts a label for each unlabeled record.
The present invention provides a computer-implemented method, system, and computer program product for visualizing the structure of an evidence classifier. An evidence classifier visualization tool is provided that displays information representative of the structure of an evidence classifier. The evidence classifier visualization tool displays information pertaining to how an evidence classifier assigns labels to unlabeled records.
An evidence inducer generates an evidence classifier based on a training set of labeled records. Each record in the training set has one or more attribute values and a corresponding class label. Once the evidence classifier is built, the evidence classifier can assign class labels to unlabeled records based on attribute values found in the unlabeled records.
According to the present invention, the evidence inducer includes a mapping module that generates visualization data files used for visualizing the structure of the evidence classifier generated by the evidence inducer. In the present invention, an evidence visualization tool uses the visualization data files to display an evidence pane and/or a label probability pane. The evidence pane includes two different representations: a first evidence pane display view and a second evidence pane display view. The first evidence pane display view shows a normalized conditional probability of each label value for each attribute value. The second evidence pane display view shows relative conditional probabilities of a selected label value for each attribute value.
The label probability pane includes a first label probability pane display view and/or a second label probability pane display view. The first label probability pane display view shows prior probabilities of each label value based on the training set. The second label probability pane display view shows posterior probabilities of each label value based on at least one selected attribute value.
According to one embodiment, the first evidence pane display view comprises a plurality of rows of charts. Each row corresponds to a respective attribute. Each row has a number of charts, each chart in a row corresponding to a respective discrete attribute value. Each discrete attribute value can be a numeric or categorical attribute value or range of values (e.g., a bin). Each chart shows a normalized conditional probability of each label value for said respective attribute value.
In one preferred example, the first evidence pane display view includes a plurality of rows of pie charts. Each pie slice in a pie chart has a size which is a function of the normalized conditional probability of each label value for the respective attribute value. The evidence inducer calculates the normalized conditional probability of each label value (L) for the respective attribute value (A) according to the following conditional probability P(A|L), normalized by dividing by a sum for all label values, ΣP(A|L), where P is the conditional probability that a random record chosen only from records with label L takes the attribute value A; the conditional probability P being determined based on record counts made with respect to the training set. A mapping module then maps each calculated normalized conditional probability to a respective pie slice.
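The normalization described above can be sketched as follows. This is a minimal illustration rather than the inducer's actual implementation. The function name, the record layout (a list of dictionaries with a "label" key), and the toy training set are assumptions made for the example.

```python
from collections import Counter


def normalized_conditional_probabilities(records, attribute, value, label_key="label"):
    """Return {label: P(A|L) / sum over labels of P(A|L)} for one attribute value A."""
    # Number of training records carrying each label.
    label_counts = Counter(r[label_key] for r in records)
    # Number of training records per label that also take the attribute value A.
    match_counts = Counter(r[label_key] for r in records if r[attribute] == value)
    # P(A|L): fraction of the records with label L that take the value A.
    cond = {lab: match_counts[lab] / n for lab, n in label_counts.items()}
    total = sum(cond.values())
    if total == 0:
        # The value never occurs in the training set, so no slice sizes exist.
        return {lab: 0.0 for lab in cond}
    return {lab: p / total for lab, p in cond.items()}


# Hypothetical toy training set (not the Fisher iris data).
training_set = [
    {"petal_width": "narrow", "label": "iris-setosa"},
    {"petal_width": "narrow", "label": "iris-setosa"},
    {"petal_width": "narrow", "label": "iris-versicolor"},
    {"petal_width": "wide", "label": "iris-versicolor"},
    {"petal_width": "wide", "label": "iris-virginica"},
    {"petal_width": "wide", "label": "iris-virginica"},
]

# Slice sizes for the pie chart of the attribute value "narrow".
slices = normalized_conditional_probabilities(training_set, "petal_width", "narrow")
```

In this toy data the three slice sizes sum to one, so they can be mapped directly to fractions of a pie chart.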
Each pie slice further has a pie slice graphical attribute, such as color, representative of a label. Each pie chart also has a pie chart graphical attribute that is a function of the number of records in the training set associated with the evidence classifier. In one example, the pie chart graphical attribute is height. For each pie chart, the mapping module maps a height that is a function of the number of records in the training set associated with the evidence classifier. In this way, a user can view heights of pie charts to determine the reliability of a classification.
The first label probability pane view comprises a chart that shows the prior probability for each label value. According to one preferred embodiment, the chart is a pie chart. Pie slices in the pie chart have sizes which are a function of the respective prior probabilities of label values. A prior probability of a label value is the proportion of records having the label value in the original data (training set). The evidence inducer calculates the prior probability for each label value by counting the number of records with a class label, counting the total number of records, and dividing the number of records with a class label count by the total number of records. The mapping module maps the calculated prior probabilities for each class label to define the sizes of respective pie slices.
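The counting steps described above can be sketched as follows. The function name and the sample labels are assumptions made for illustration only.

```python
from collections import Counter


def prior_probabilities(labels):
    """Return {label: count of records with that label / total number of records}."""
    labels = list(labels)
    total = len(labels)
    counts = Counter(labels)
    return {lab: c / total for lab, c in counts.items()}


# Hypothetical label column of a small training set.
priors = prior_probabilities(
    ["iris-setosa", "iris-setosa", "iris-versicolor", "iris-virginica"]
)
```

Each resulting value can serve as the size of the pie slice for the corresponding class label.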
The second evidence pane display view comprises a plurality of rows of bars. Each row corresponds to a respective attribute and each row has a number of bars. Each bar in a row corresponds to a respective discrete attribute value. Each bar further has a height that is a function of a conditional probability of a respective attribute value conditioned on a selected label value.
In one display mode, each bar height represents evidence for said selected label value. The evidence inducer calculates an "evidence for" value, z, for each bar height based on a negative log of the quantity one minus the size of the slice matching said selected label in a corresponding pie chart in the evidence pane.
The mapping module maps the calculated "evidence for" values, z, to respective bar heights. In a second display mode, the evidence inducer calculates an "evidence against" value, z′, for each bar height based on a negative log of the size of the slice matching said selected label in a corresponding pie chart in the evidence pane. The mapping module maps the calculated "evidence against" values, z′, to respective bar heights.
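The two bar-height calculations can be sketched as follows. The function names are hypothetical. The use of the natural logarithm is an assumption, since the base of the log is not specified here. The sketch also assumes slice sizes strictly between 0 and 1, because the logarithm is undefined at the endpoints.

```python
import math


def evidence_for(slice_size):
    """z: negative log of (one minus the slice size) for the selected label."""
    return -math.log(1.0 - slice_size)  # natural log is an assumption


def evidence_against(slice_size):
    """z': negative log of the slice size for the selected label."""
    return -math.log(slice_size)  # natural log is an assumption


# A large slice for the selected label gives strong evidence for that label
# and weak evidence against it.
z = evidence_for(0.9)
z_prime = evidence_against(0.9)
```

Under these assumptions, both values are non-negative, and a slice size of one half yields the same height in both display modes.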
The second label probability pane display view comprises a chart that shows posterior probabilities of each label value based on at least one selected attribute value. In one preferred embodiment, the chart comprises a pie chart. Each pie slice in the pie chart has a size which is a function of the posterior probability of the respective label value given the selected attribute value or values. The evidence inducer calculates the posterior probabilities of each label value by multiplying the conditional probabilities of all selected attribute values by the prior probability of each label value. The mapping module maps the calculated posterior probabilities to sizes of pie slices.
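The multiplication described above can be sketched as follows. The function name and data layout are assumptions made for illustration. The final normalization step, which makes the results sum to one so they can be drawn as pie slices, is also an assumption of this sketch.

```python
def posterior_probabilities(priors, conditionals, selected):
    """Combine priors with the conditional probabilities of the selected values.

    priors:       {label: P(L)}
    conditionals: {label: {(attribute, value): P(A|L)}}
    selected:     list of (attribute, value) pairs chosen by the user
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for key in selected:
            score *= conditionals[label][key]
        scores[label] = score
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    # Normalization is an assumption of this sketch.
    return {label: s / total for label, s in scores.items()}


# Hypothetical probabilities for two labels and one selected attribute value.
example_priors = {"iris-setosa": 0.5, "iris-versicolor": 0.5}
example_conditionals = {
    "iris-setosa": {("petal_width", "narrow"): 0.8},
    "iris-versicolor": {("petal_width", "narrow"): 0.2},
}
posteriors = posterior_probabilities(
    example_priors, example_conditionals, [("petal_width", "narrow")]
)
```

With no attribute values selected, the function returns the prior probabilities unchanged, which corresponds to the first label probability pane display view.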
According to one embodiment, the sum of the heights of pie charts or bars in each row in the evidence pane is constant. The distribution of pie chart or bar heights in each row represents a histogram showing the way records are distributed over the attribute values for each attribute. In a further feature, the evidence inducer provides binning to divide a continuous attribute into discrete bins having binning intervals such that class distributions in each bin are as different as possible.
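One way to obtain heights with a constant sum per row is to scale each record count by the row total, as sketched below. The function name, the row_total parameter, and the sample counts are hypothetical. The actual mapping from record counts to heights may differ.

```python
def row_heights(value_counts, row_total=1.0):
    """Scale record counts so that the heights in one attribute row sum to row_total."""
    total = sum(value_counts.values())
    if total == 0:
        return {value: 0.0 for value in value_counts}
    return {value: row_total * count / total for value, count in value_counts.items()}


# Hypothetical record counts for the discrete values of one attribute.
heights = row_heights({"narrow": 30, "medium": 50, "wide": 20}, row_total=10.0)
```

Because every row sums to the same total, the heights within a row can be read as a histogram of the records over that attribute's values.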
According to a further feature of the present invention, an importance slider is displayed that permits a user to control the filtering of attributes based on the importance of the attributes to a classification of unlabeled records.
According to another feature of the present invention, a count slider is displayed that permits a user to set a low count threshold for filtering out attribute values having low counts. In this way, an evidence pane need not include charts or bars corresponding to attribute values having low counts less than the low count threshold set by the count slider.
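The low count filtering can be sketched as follows. The function name and the sample counts are assumptions made for illustration.

```python
def filter_low_counts(value_counts, low_count_threshold):
    """Keep only attribute values whose record count is not below the threshold."""
    return {
        value: count
        for value, count in value_counts.items()
        if count >= low_count_threshold
    }


# Hypothetical record counts for one attribute, with the slider set to 5.
visible = filter_low_counts({"narrow": 30, "medium": 4, "wide": 5}, 5)
```

Only the values remaining in the result would have charts or bars displayed in the evidence pane.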
In another feature of the present invention, the evidence classifier visualization tool allows a user to control sorting of attributes and attribute values by the evidence inducer. For example, a user can select to sort attributes alphabetically by name, by importance, or in the original order the attributes appear in a database. A user can select to sort attribute values within a row by alphabetic order for categorical attribute values, by confidence of an attribute value (e.g., record count), and by the conditional probability of an attribute value given a specific label value (e.g., the pie slice size for a selected label value).
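The three orderings of attribute values within a row can be sketched as follows. The field names and sample values are hypothetical. Descending order for the count and slice size orderings is an assumption of this sketch.

```python
# Hypothetical attribute values in one row, with record counts and the
# pie slice size for a selected label value.
row = [
    {"value": "wide", "count": 20, "slice": 0.10},
    {"value": "narrow", "count": 30, "slice": 0.70},
    {"value": "medium", "count": 50, "slice": 0.20},
]

# Alphabetic order for categorical attribute values.
by_name = sorted(row, key=lambda v: v["value"])
# Confidence order: largest record count first (direction is an assumption).
by_count = sorted(row, key=lambda v: v["count"], reverse=True)
# Conditional probability order: largest slice for the selected label first.
by_slice = sorted(row, key=lambda v: v["slice"], reverse=True)
```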
According to a further feature of the present invention, a subtracting minimum evidence capability is provided. In this subtracting, for each attribute value, the evidence inducer determines a minimum height value representing an approximate minimum height over all label values. The evidence inducer then subtracts the determined minimum height value across all label values. In this way, small differences are magnified in bar heights in the second evidence pane display views for different selected label values.
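The subtraction can be sketched as follows for a single attribute value. The function name and the sample heights are hypothetical. The sketch uses the exact minimum, whereas the text above refers to an approximate minimum, so this is a simplification.

```python
def subtract_minimum_evidence(evidence_by_label):
    """Subtract the smallest evidence height from the height for every label value."""
    minimum = min(evidence_by_label.values())  # exact minimum is a simplification
    return {label: height - minimum for label, height in evidence_by_label.items()}


# Hypothetical bar heights of one attribute value for each possible selected label.
adjusted = subtract_minimum_evidence(
    {"iris-setosa": 2.0, "iris-versicolor": 2.5, "iris-virginica": 3.0}
)
```

After the subtraction the smallest bar has zero height, so the remaining differences occupy the full height range of the display.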
According to the present invention, a user can select at least one pie chart corresponding to a selected attribute value. The second label probability pane display view is then displayed, showing posterior probabilities of each label value based on the at least one selected attribute value and thereby imitating the behavior of the evidence classifier. The posterior probability is the expected distribution of label values given the combination of selected attribute value(s).