Data classification techniques, often referred to as supervised learning, attempt to find an approximation of, or hypothesis for, a target concept that assigns objects (such as processes or events) to different categories or classes. Data classification can normally be divided into two phases, namely, a learning phase and a testing phase. The learning phase applies a learning algorithm to training data. The training data typically comprises descriptions of objects (a set of feature variables) together with the correct classification for each object (the class variable).
The goal of the learning phase is to find correlations between the object descriptions and the object classes in order to learn how to classify the objects. The training data is used to construct a model with which the class variable may be predicted for a record in which the feature variables are known but the class variable is unknown. Thus, the end result of the learning phase is a model or hypothesis (e.g., a set of rules) that can be used to predict the class of new objects. The testing phase uses the model derived in the learning phase to predict the class of testing objects. The classifications made by the model are compared to the true object classes to estimate the accuracy of the model.
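By way of illustration only, the two phases described above can be sketched in Python as follows. The sketch assumes a one-nearest-neighbor learner, chosen here merely for brevity; the function names (`learn`, `predict`, `estimate_accuracy`), the class labels and the toy data are all hypothetical and do not come from any cited reference.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def learn(training_data):
    """Learning phase. For one-nearest-neighbor the model is simply
    the stored list of (feature_variables, class_variable) pairs."""
    return list(training_data)


def predict(model, features):
    """Predict the class of the stored example closest to `features`."""
    _, nearest_label = min(model, key=lambda example: dist(example[0], features))
    return nearest_label


def estimate_accuracy(model, testing_data):
    """Testing phase. Compare predicted classes with the true classes."""
    correct = sum(predict(model, features) == label
                  for features, label in testing_data)
    return correct / len(testing_data)


# Hypothetical toy data: two feature variables and one class variable.
training_data = [
    ((0.9, 0.1), "pos"),
    ((0.8, 0.2), "pos"),
    ((0.3, 0.9), "neg"),
    ((0.2, 0.8), "neg"),
]
testing_data = [
    ((0.85, 0.15), "pos"),
    ((0.25, 0.85), "neg"),
    ((0.70, 0.30), "pos"),
]

model = learn(training_data)                         # learning phase
accuracy = estimate_accuracy(model, testing_data)    # testing phase
print(f"estimated accuracy: {accuracy:.2f}")
```

The testing objects are kept separate from the training data so that the accuracy estimate reflects how the model classifies objects it has not seen during the learning phase.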
Numerous techniques are known for deriving the relationship between the feature variables and the class variable, including, for example, Disjunctive Normal Form (DNF) Rules, decision trees, nearest neighbor, support vector machines (SVMs) and Bayesian classifiers, as described, for example, in R. Agrawal et al., “An Interval Classifier for Database Mining Applications,” Proc. of the 18th VLDB Conference, Vancouver, British Columbia, Canada 1992; C. Apte et al., “RAMP: Rules Abstraction for Modeling and Prediction,” IBM Research Report RC 20271, June 1995; J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, Volume 1, Number 1, 1986; J. Shafer et al., “SPRINT: A Scalable Parallel Classifier for Data Mining,” Proc. of the 22nd VLDB Conference, Bombay, India, 1996; and M. Mehta et al., “SLIQ: A Fast Scalable Classifier for Data Mining,” Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, March, 1996, each incorporated by reference herein.
Data classifiers have a number of applications that automate the labeling of unknown objects. For example, astronomers are interested in automated ways to classify objects within the millions of existing images mapping the universe (e.g., to differentiate stars from galaxies). Learning algorithms have been trained to recognize these objects in the learning phase, and then used to classify new objects in astronomical images. This automated classification process obviates manual labeling of thousands of currently available astronomical images.
While such learning algorithms derive the relationship between the feature variables and the class variable, they generally produce the same output model given the same domain dataset. Generally, a learning algorithm encodes certain assumptions about the nature of the concept to be learned, referred to as the bias of the learning algorithm. If the assumptions are wrong, however, then the learning algorithm will not provide a good approximation of the target concept and the output model will exhibit low accuracy. Most research in the area of data classification has focused on producing increasingly accurate models, a goal that cannot be attained universally over all possible domains. It is now well understood that increasing the quality of the output model on a certain group of domains will cause a decrease of quality on other groups of domains. See, for example, C. Schaffer, “A Conservation Law for Generalization Performance,” Proc. of the Eleventh Int'l Conference on Machine Learning, 259-65, San Francisco, Morgan Kaufmann (1994); and D. Wolpert, “The Lack of a Priori Distinctions Between Learning Algorithms and the Existence of a Priori Distinctions Between Learning Algorithms,” Neural Computation, 8 (1996), each incorporated by reference herein.
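The effect of a fixed bias can be illustrated with a minimal Python sketch. The sketch assumes a hypothetical learner, `learn_stump`, whose bias is that the target concept can be expressed as a threshold on a single feature variable. The two toy domains, and every name in the code, are illustrative assumptions and are not taken from the cited references. For brevity, accuracy is measured on the same four points used for learning, which suffices to show what the learner is able to represent.

```python
def predict(rule, x):
    """Apply a rule of the form 'feature i >= threshold' (or its negation)."""
    i, threshold, positive_above = rule
    return (x[i] >= threshold) == positive_above


def accuracy(rule, data):
    """Fraction of (features, label) pairs that the rule classifies correctly."""
    return sum(predict(rule, x) == y for x, y in data) / len(data)


def learn_stump(data):
    """Hypothetical learner biased toward single-feature threshold concepts.
    It deterministically returns the first rule with the highest accuracy."""
    best_acc, best_rule = -1.0, None
    for i in range(len(data[0][0])):
        for threshold in sorted({x[i] for x, _ in data}):
            for positive_above in (True, False):
                rule = (i, threshold, positive_above)
                acc = accuracy(rule, data)
                if acc > best_acc:
                    best_acc, best_rule = acc, rule
    return best_rule


points = [(a, b) for a in (0, 1) for b in (0, 1)]

# Domain 1: the concept is a threshold on feature 0, which matches the bias.
threshold_domain = [(p, p[0] >= 1) for p in points]
# Domain 2: the concept is exclusive-or, which no single threshold can express.
xor_domain = [(p, p[0] != p[1]) for p in points]

acc_match = accuracy(learn_stump(threshold_domain), threshold_domain)
acc_mismatch = accuracy(learn_stump(xor_domain), xor_domain)
print(acc_match, acc_mismatch)
```

Under these assumptions the same learner is perfectly accurate on the domain that matches its bias and no better than chance on the domain that violates it. Because the learner is deterministic, repeating the learning phase on the same dataset yields the same rule each time.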
While conventional learning algorithms produce sufficiently accurate models for many applications, they suffer from a number of limitations, which, if overcome, could greatly improve the performance of the data classification and regression systems that employ such models. Specifically, the learning algorithms of conventional data classification and regression systems are unable to adapt over time. In other words, once a model is generated by a learning algorithm, the model cannot be reconfigured based on experience. Thus, the conventional data classification and regression systems that employ such models are prone to repeating the same errors.
A need therefore exists for data classification and regression methods and apparatus that adapt a learning algorithm through experience. Another need exists for data classification and regression methods and apparatus that dynamically modify the assumptions of the learning algorithm to improve the assumptions embodied in the generated models and thereby improve the quality of the data classification and regression systems that employ such models. Yet another need exists for a learning method and apparatus that performs meta-learning to improve the assumptions or inductive bias in a model.