Data classification techniques, often referred to as supervised learning, attempt to find an approximation of, or hypothesis for, a target concept that assigns objects (such as processes or events) to different categories or classes. Data classification can normally be divided into two phases, namely, a learning phase and a testing phase. The learning phase applies a learning algorithm to training data. The training data typically comprises descriptions of objects (a set of feature variables) together with the correct classification for each object (the class variable).
The goal of the learning phase is to find correlations between object descriptions in order to learn how to classify the objects. The training data is used to construct models that predict the class variable for a record in which the feature variables are known but the class variable is unknown. Thus, the end result of the learning phase is a model or hypothesis (e.g., a set of rules) that can be used to predict the class of new objects. The testing phase uses the model derived in the learning phase to predict the class of testing objects. The classifications made by the model are compared to the true object classes to estimate the accuracy of the model.
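The two phases described above can be illustrated with a minimal sketch. It assumes a one-nearest-neighbor classifier, chosen here only for brevity; the function names, the two-feature object descriptions and the sample data are hypothetical and are not taken from any of the cited references.

```python
import math

def train_nearest_neighbor(training_data):
    """Learning phase: a 1-nearest-neighbor model simply stores the labeled examples."""
    return list(training_data)

def predict(model, features):
    """Return the class of the stored example closest to `features` (Euclidean distance)."""
    nearest = min(model, key=lambda example: math.dist(example[0], features))
    return nearest[1]

def estimate_accuracy(model, testing_data):
    """Testing phase: compare predicted classes with the true classes."""
    correct = sum(1 for features, label in testing_data
                  if predict(model, features) == label)
    return correct / len(testing_data)

# Hypothetical training data: (feature variables, class variable).
training_data = [
    ((1.0, 1.0), "star"),
    ((1.2, 0.8), "star"),
    ((5.0, 5.0), "galaxy"),
    ((5.5, 4.5), "galaxy"),
]

# Hypothetical testing data whose true classes are known but withheld from the model.
testing_data = [
    ((0.9, 1.1), "star"),
    ((5.2, 5.1), "galaxy"),
    ((3.4, 3.4), "star"),
]

model = train_nearest_neighbor(training_data)
accuracy = estimate_accuracy(model, testing_data)
print(f"estimated accuracy: {accuracy:.2f}")
```

In this sketch the third testing object lies closer to the stored galaxy examples than to the star examples, so the model misclassifies it and the estimated accuracy is two out of three.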
Numerous techniques are known for deriving the relationship between the feature variables and the class variable, including, for example, Disjunctive Normal Form (DNF) Rules, decision trees, nearest neighbor, support vector machines (SVMs) and Bayesian classifiers, as described, for example, in R. Agrawal et al., “An Interval Classifier for Database Mining Applications,” Proc. of the 18th VLDB Conference, Vancouver, British Columbia, Canada 1992; C. Apte et al., “RAMP: Rules Abstraction for Modeling and Prediction,” IBM Research Report RC 20271, June 1995; J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, Volume 1, Number 1, 1986; J. Shafer et al., “SPRINT: A Scaleable Parallel Classifier for Data Mining,” Proc. of the 22nd VLDB Conference, Bombay, India, 1996; and M. Mehta et al., “SLIQ: A Fast Scaleable Classifier for Data Mining,” Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, March, 1996, each incorporated by reference herein.
Data classifiers have a number of applications that automate the labeling of unknown objects. For example, astronomers are interested in automated ways to classify objects within the millions of existing images mapping the universe (e.g., differentiating stars from galaxies). Learning algorithms have been trained to recognize these objects in the learning phase, and then used to classify new objects in astronomical images. This automated classification process obviates manual labeling of thousands of currently available astronomical images.
While data classification has been the subject of much study and data classifiers are a useful tool in real-world applications, the reasons why one algorithm may be more successful than others in giving good approximations to a target concept remain elusive. When many algorithms are available for a specific classification task, it is hard to determine which algorithm will produce the best model for analysis. Thus, model selection remains a difficult problem.
One approach to model selection seeks to trace a link between a learning algorithm and the domains on which this algorithm outperforms all other competitors. For a discussion of such approaches, see, for example, D. Michie, Machine Learning, Neural and Statistical Classification. Ellis Horwood 1994; S. M. Weiss, & C. A. Kulikowski, Computer Systems That Learn, Morgan Kaufmann Publishers, Inc. San Mateo, Calif. 1990, each incorporated by reference herein. Other approaches include applying meta-learning techniques to find rules identifying the domains on which an algorithm attains high accuracy, and selecting the best of a set of classifiers as the best approximation to the target concept. Generally, such meta-learning techniques represent each domain as a vector of meta-features and attach to each vector the best algorithm for that domain. See, for example, P. Brazdil et al., “Characterizing the Applicability of Classification Algorithms Using Meta Level Learning,” European Conference on Machine Learning (ECML-94), 83-102 (1994), incorporated by reference herein. Generally, meta-learning techniques evaluate the best learning algorithm that has been identified for each domain and attempt to identify the best learning algorithm for a new domain. An alternate approach to model selection uses a voting scheme that applies the models produced by various learning algorithms to an object and assigns the object to the class receiving the most votes among the various models. See, for example, L. Breiman, “Bagging Predictors”, Machine Learning, 123-140 (1996).
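The voting scheme mentioned above can be sketched as a simple plurality vote. This is an illustrative assumption about how such a scheme might be coded, not a reproduction of the cited bagging procedure; the three rule-based models, their thresholds and the feature names are hypothetical.

```python
from collections import Counter

def vote(models, features):
    """Classify an object as the class predicted by the most models.

    Ties are resolved in favor of the class predicted first.
    """
    predictions = [model(features) for model in models]
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical models, each standing in for the output of a different learning algorithm.
def brightness_rule(features):
    return "star" if features["brightness"] > 0.7 else "galaxy"

def size_rule(features):
    return "galaxy" if features["size"] > 2.0 else "star"

def shape_rule(features):
    return "galaxy" if features["ellipticity"] > 0.3 else "star"

models = [brightness_rule, size_rule, shape_rule]

# Hypothetical object description.
obj = {"brightness": 0.9, "size": 3.1, "ellipticity": 0.5}
label = vote(models, obj)
print(label)
```

Here one model votes for "star" and two vote for "galaxy", so the object is assigned to the class "galaxy".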
While conventional model selection techniques perform satisfactorily for many applications, they suffer from a number of limitations, which, if overcome, could greatly expand the utility and accuracy of data classifiers. In particular, conventional model selection techniques typically fail to recognize that a richer characterization of domains is necessary to produce theories that can explain accuracy performance. Such current model selection techniques are limited to (i) straightforward parameters, such as the number of examples, number of classes or number of features; (ii) statistical estimations, such as feature correlation or homogeneity of covariances; and (iii) information-theoretic measures, such as class entropy.
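Several of the conventional meta-features listed above are simple to compute. The following sketch, offered only as an illustration, derives the number of examples, number of features, number of classes and class entropy for a small hypothetical domain; the function names and the data layout (a list of feature-tuple and class-label pairs) are assumptions.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in counts.values())

def domain_meta_features(examples):
    """Describe a domain by a few conventional meta-features.

    `examples` is a non-empty list of (feature_tuple, class_label) pairs.
    """
    labels = [label for _, label in examples]
    return {
        "n_examples": len(examples),
        "n_features": len(examples[0][0]),
        "n_classes": len(set(labels)),
        "class_entropy": class_entropy(labels),
    }

# Hypothetical domain with two features and two equally frequent classes.
examples = [
    ((1.0, 1.0), "star"),
    ((1.2, 0.8), "star"),
    ((5.0, 5.0), "galaxy"),
    ((5.5, 4.5), "galaxy"),
]

meta = domain_meta_features(examples)
print(meta)
```

With two equally frequent classes the class entropy is one bit, the maximum for a two-class domain. Measures of this kind summarize the size and class balance of a domain but say little about how regular the underlying concept is.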
A need therefore exists for a model selection technique that considers additional meta-learning measures to distinguish between concepts denoting regular patterns over the instance space, which commonly lead to the discovery of concise representations, and concepts characterized by many irregularities, which often lead to long representations. A proper characterization of concept complexity can explain accuracy performance by identifying the degree of match between the concept and the learning bias of the algorithm under analysis. A further need exists for a model selection technique that characterizes domains and identifies the degree of match between the domain meta-features and the learning bias of the algorithm under analysis. Yet another need exists for a model selection technique that accounts for features (e.g., meta-features) of a given domain.