1. Field of the Invention
This invention relates to the use of machine learning to predict outcomes and to the validation of such predictions.
2. Description of the Related Art
Machine learning techniques are used to build a model or rule set to predict a result based on the values of a number of features. The machine learning involves use of a data set that typically includes, for each record, a value for each of a set of features, and a result. From this data set, a model or rule set for predicting the result is developed.
These machine learning techniques generally build on statistical underpinnings. Statistical approaches test a proposed model against a set of data, whereas machine learning techniques search through a space of possible models to find the one that best fits a given set of data.
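As a minimal sketch of the two preceding paragraphs, assuming a toy data set in which each record holds a list of feature values and a result, the following Python searches a small space of candidate models for the most accurate one. The model space here is single-feature threshold rules ("decision stumps"). The data, the function names, and the choice of model space are illustrative assumptions, not part of any system described in this document.

```python
from typing import List, Optional, Sequence, Tuple

# Each record: (feature values, result). Toy data, purely illustrative.
Record = Tuple[Sequence[float], int]


def stump_predict(x: Sequence[float], feature: int, threshold: float) -> int:
    """Candidate model: predict 1 when the chosen feature exceeds the threshold."""
    return 1 if x[feature] > threshold else 0


def accuracy(records: List[Record], feature: int, threshold: float) -> float:
    """Fraction of records whose result the candidate model predicts correctly."""
    hits = sum(stump_predict(x, feature, threshold) == y for x, y in records)
    return hits / len(records)


def search_stumps(records: List[Record]) -> Tuple[float, Optional[int], Optional[float]]:
    """Search every (feature, observed value) rule and keep the most accurate one."""
    n_features = len(records[0][0])
    best: Tuple[float, Optional[int], Optional[float]] = (0.0, None, None)
    for f in range(n_features):
        for t in sorted({x[f] for x, _ in records}):
            acc = accuracy(records, f, t)
            if acc > best[0]:  # ties keep the first rule found
                best = (acc, f, t)
    return best


data = [([1.0, 5.0], 0), ([2.0, 3.0], 0), ([3.0, 8.0], 1), ([4.0, 1.0], 1)]
print(search_stumps(data))  # (1.0, 0, 2.0)
```

A statistical test would instead fix one rule in advance (say, feature 0 with threshold 2.0) and only measure how well it fits; the search loop is what makes this a learning procedure.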
Many existing machine learning systems use a single machine learning strategy to solve a variety of types of problems. At least one system exists that uses a combination of machine learning strategies to derive a prediction method for solving a problem. As described in International Patent Application No. WO 97/44741, entitled “System and Method for Combining Multiple Learning Agents to Produce a Prediction Method,” published Nov. 27, 1997, and claiming priority from U.S. Ser. No. 60/018,191, filed May 23, 1996; in Lawrence Hunter, “Coevolution Learning: Synergistic Evolution of Learning Agents and Problem Representations,” Proceedings of the 1996 Multistrategy Learning Conference, pp. 85-94, Menlo Park, Calif.: AAAI Press, 1996; and in Myriam Z. Abramson and Lawrence Hunter, “Classification using Cultural Co-evolution and Genetic Programming,” Genetic Programming: Proc. of the First Annual Conference, pp. 249-254, The MIT Press, 1996, multiple learning strategies can be used to improve the ability of any one of those strategies to solve the problem of interest. A system incorporating some of these teachings is available as CoEv from the Public Health Service of the National Institutes of Health (NIH). The foregoing patent application and articles are incorporated herein by reference.
In a “co-evolution” system such as that described above, an initial set of learning agents, or learners, is created, possibly using more than one machine learning strategy or method. Examples of machine learning methods include genetic algorithms, neural networks, and decision trees. Each learning agent is then trained on a set of training data that provides values for a set of features, and each agent produces predictions according to its rule for solving the problem. The predictions are then evaluated against a fitness function, which may be based on the overall accuracy of the results and/or the time required to obtain those results.
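The fitness evaluation can be sketched as follows. This is a hypothetical scoring rule, not the fitness function of CoEv or of any cited work. It assumes that accuracy and time are combined linearly through an arbitrary `time_weight` parameter, and that the time measured is only the time spent producing predictions. Setting `time_weight=0` gives a purely accuracy-based fitness.

```python
import time
from typing import Callable, List, Sequence, Tuple

Record = Tuple[Sequence[float], int]


def fitness(predict: Callable[[Sequence[float]], int],
            records: List[Record],
            time_weight: float = 0.1) -> Tuple[float, float, float]:
    """Hypothetical fitness: accuracy minus a penalty proportional to elapsed seconds.

    Returns (fitness score, accuracy, elapsed seconds).
    """
    start = time.perf_counter()
    hits = sum(predict(x) == y for x, y in records)
    elapsed = time.perf_counter() - start
    accuracy = hits / len(records)
    return accuracy - time_weight * elapsed, accuracy, elapsed


data = [([1.0, 5.0], 0), ([2.0, 3.0], 0), ([3.0, 8.0], 1), ([4.0, 1.0], 1)]
score, acc, secs = fitness(lambda x: int(x[0] > 2.0), data)
print(f"fitness={score:.4f} accuracy={acc:.2f} seconds={secs:.6f}")
```

The linear trade-off is only one possible design. A real system might instead rank agents by accuracy and use time only to break ties.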
A set of results is obtained and the feature combinations used by the learning agents are extracted. The data is then transformed to reflect these combinations, thereby creating new features that are combinations of the pre-existing features.
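A minimal sketch of the data transformation step follows. It assumes that each extracted combination is represented as a pair of feature indices plus a binary operator. That representation, and the names `Combination` and `transform`, are hypothetical choices made for illustration. The original features are kept, and one new feature is appended per combination.

```python
import operator
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of an extracted combination:
# (index of first feature, index of second feature, binary operator).
Combination = Tuple[int, int, Callable[[float, float], float]]


def transform(rows: Sequence[Sequence[float]],
              combos: Sequence[Combination]) -> List[List[float]]:
    """Append one new feature per combination to every row, keeping the originals."""
    return [list(row) + [op(row[i], row[j]) for i, j, op in combos] for row in rows]


rows = [[1.0, 5.0], [2.0, 3.0]]
combos = [(0, 1, operator.mul), (1, 0, operator.sub)]
print(transform(rows, combos))
# [[1.0, 5.0, 5.0, 4.0], [2.0, 3.0, 6.0, 1.0]]
```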
In addition, a new generation of learning agents is created. Parameter values from the learning agents are copied and varied for the new generation, using (for example) a genetic algorithm approach.
The process is then repeated with the new learning agents and feature representations until sufficiently satisfactory results are obtained, or until a set number of cycles has been completed or a set amount of time has elapsed.
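Taken together, the cycle can be sketched as a small evolutionary loop. This is a simplified illustration under stated assumptions, not the CoEv implementation:

* Every agent is a hypothetical single-feature threshold rule rather than a genetic algorithm, neural network, or decision tree.
* Fitness is plain accuracy.
* Feature construction (pairwise products) is done once before the loop instead of being re-derived from the agents each cycle.
* Each new generation keeps the better half of the agents and fills the rest with copies whose thresholds are perturbed by Gaussian noise.

```python
import random
import time


def evaluate(agent, records):
    """Fitness of a hypothetical threshold agent: plain accuracy on the records."""
    f, t = agent["feature"], agent["threshold"]
    return sum((x[f] > t) == bool(y) for x, y in records) / len(records)


def mutate(agent, rng, scale=0.5):
    """Copy an agent's parameters and perturb the threshold (genetic-style variation)."""
    child = dict(agent)
    child["threshold"] += rng.gauss(0.0, scale)
    return child


def add_product_features(records):
    """Append the product of every pair of original features
    (a stand-in for combinations extracted from the agents)."""
    out = []
    for x, y in records:
        n = len(x)
        extra = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
        out.append((list(x) + extra, y))
    return out


def coevolve(records, pop_size=8, target=1.0, max_generations=50,
             time_budget=5.0, seed=0):
    """Evolve agents until the target fitness, the generation limit,
    or the time budget is reached; return (best agent, best fitness)."""
    rng = random.Random(seed)
    records = add_product_features(records)
    n_features = len(records[0][0])
    population = [{"feature": rng.randrange(n_features),
                   "threshold": rng.uniform(0.0, 10.0)}
                  for _ in range(pop_size)]
    start = time.perf_counter()
    best, best_score = None, -1.0
    for _ in range(max_generations):
        # Evaluate every agent; sort on the score only so dicts are never compared.
        scored = sorted(((evaluate(a, records), a) for a in population),
                        key=lambda p: p[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        if best_score >= target or time.perf_counter() - start > time_budget:
            break
        # Next generation: keep the better half, add mutated copies of them.
        parents = [a for _, a in scored[: pop_size // 2]]
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(pop_size - len(parents))]
    return best, best_score


data = [([1.0, 5.0], 0), ([2.0, 3.0], 0), ([3.0, 8.0], 1), ([4.0, 1.0], 1)]
best, score = coevolve(data)
print(best, score)
```

The three stopping conditions in `coevolve` mirror the ones named above: a satisfactory result (`target`), a set number of cycles (`max_generations`), and a set amount of time (`time_budget`).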
This system can provide improved results over systems that use a single machine learning method. However, it still has significant limitations when applied to real-world problems. For example, a fitness function based on overall accuracy is not suitable for all problems. Moreover, neither the method nor its results are easily applied to many problems.