1. Field of Invention
This invention relates to the field of machine learning and information retrieval. More particularly, the present invention relates to the problem of communicating accumulated state information between tasks in a supervised learning system.
2. Description of Related Art
Supervised learning is a well-known technique for producing predictors. A supervised learner inputs a set of training instances and outputs a predictor. A training instance includes a feature vector and a target value. The feature vectors represent what is known about the training instance while the target values represent an output desired from the predictor given the feature vector as input. The feature vectors and target values can be single data items or complex data structures.
A predictor is a rule that the applier uses to produce a prediction from a feature vector. Most examples of predictors are mathematical functions, for example, linear regression models, boolean functions and neural networks. However, a predictor can also simply be a stored set of training instances, as when the applier performs k-nearest-neighbor classification.
For a given set of training instances, a supervised learner creates a predictor. The predictor is then used by an applier. An applier takes as inputs a predictor and a feature vector and produces a prediction. This process is referred to as applying the predictor. The prediction can be a single data item or a complex data structure. An effective supervised learner creates predictors that, when applied to feature vectors similar to those seen in the training instances, produce predictions similar to the corresponding target values seen in the training instances.
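The division of labor between learner and applier described above can be illustrated with a minimal sketch. It assumes the k-nearest-neighbor case mentioned earlier, in which the predictor is simply the stored set of training instances. The function names `learn` and `apply_predictor`, the squared Euclidean distance, and the sample data are illustrative assumptions and are not taken from the specification.

```python
from collections import Counter

def learn(training_instances):
    """Supervised learner: here the predictor is just the stored instances."""
    return list(training_instances)

def apply_predictor(predictor, feature_vector, k=3):
    """Applier: majority vote among the k stored instances nearest the input."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(predictor, key=lambda inst: sq_dist(inst[0], feature_vector))[:k]
    votes = Counter(target for _, target in nearest)
    return votes.most_common(1)[0][0]

# Each training instance is a (feature vector, target value) pair.
training = [
    ((0.0, 0.0), False),
    ((0.1, 0.2), False),
    ((1.0, 1.0), True),
    ((0.9, 1.1), True),
    ((1.2, 0.8), True),
]
predictor = learn(training)
prediction = apply_predictor(predictor, (1.0, 0.9))
```

In this sketch a feature vector close to the True instances yields the prediction True, which is the behavior expected of an effective learner as characterized above.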
In some instances, a portion of the training instances becomes available before other training instances, and it may be desirable to learn and apply predictors before all training instances become available. In this case it can be desirable to implement the supervised learner as an incremental supervised learner. An incremental supervised learner, when initialized with a set of training instances, will produce a predictor for each learning task. If later given new training instances, it will produce a new predictor for each learning task, taking into account all previously received training instances and the new training instances.
To accomplish this, an incremental supervised learner must retain a state representation which summarizes necessary information about previously received training instances. When presented with new training instances, the incremental supervised learner uses both the summary information about past training instances and the new training instances to produce both a new predictor for each learning task and a new state representation.
Incremental supervised learners use a variety of techniques to store state representation information. Some incremental supervised learners use a state representation which is simply a copy of all previously received training instances. Alternatively, an incremental supervised learner may use a state representation that attempts to identify and save only the most important training examples. Still other incremental supervised learners may use a state representation that includes other summary information which may be more compact or efficient. For example, a group of incremental supervised learners known as online learners can use the set of predictors themselves as the state representation.
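The incremental update described above, in which a prior state representation and new training instances together yield a new predictor and a new state representation, can be sketched as follows. The sketch assumes a compact summary state (a count and per-feature sums for each target value) and a nearest-centroid style predictor. These choices, and the names `initial_state` and `update`, are illustrative assumptions only.

```python
def initial_state():
    """Empty state: maps target value -> (instance count, per-feature sums)."""
    return {}

def update(state, new_instances):
    """Return (new predictor, new state) without modifying the old state."""
    new_state = {t: (n, list(sums)) for t, (n, sums) in state.items()}
    for features, target in new_instances:
        n, sums = new_state.get(target, (0, [0.0] * len(features)))
        sums = [s + x for s, x in zip(sums, features)]
        new_state[target] = (n + 1, sums)
    # The predictor here is one centroid (mean feature vector) per target value.
    predictor = {t: [s / n for s in sums] for t, (n, sums) in new_state.items()}
    return predictor, new_state

first_batch = [((0.0,), False), ((2.0,), True)]
second_batch = [((4.0,), True)]

predictor_1, state_1 = update(initial_state(), first_batch)
predictor_2, state_2 = update(state_1, second_batch)

# Learning from both batches at once gives the same predictor.
predictor_all, _ = update(initial_state(), first_batch + second_batch)
```

Because the summary state is sufficient for this particular predictor, the incrementally produced predictor matches the one produced from all instances at once, without the earlier instances being stored.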
A supervised learner might be used, for example, to produce predictors to assign subject categories to news wire articles. A typical approach treats each category as a separate learning task. There would be two possible target values for each learning task: 1) True, indicating that the category should be assigned to the news wire article, and 2) False, indicating that the category should not be assigned to the news wire article. Similarly, the predictor trained for each task might have two possible predictions: 1) True, encoding a prediction that the category should be assigned to the news wire article, and 2) False, encoding a prediction that the category should not be assigned to the news wire article.
To accomplish the training, a person can read selected news wire articles and manually assign them to categories. The text of those news wire articles can be encoded as a feature vector appropriate for the supervised learner, and the human category decisions would be encoded as a target vector. The supervised learner would receive training data consisting of the appropriate feature vectors and target vectors and produce one predictor for each category. Those predictors could subsequently be used to assign categories to future news wire articles.
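The encoding step in this example can be sketched as follows. The sketch assumes a simple bag-of-words feature vector and a target vector holding one True or False value per category. The function names, the category names, and the sample article texts are hypothetical and serve only to show how one set of labeled articles yields a separate training set for each learning task.

```python
from collections import Counter

def encode_features(text):
    """Encode article text as a bag-of-words feature vector (assumed encoding)."""
    return Counter(text.lower().split())

def split_by_task(training_data, categories):
    """Build one training set of (features, True/False) pairs per category."""
    per_task = {c: [] for c in categories}
    for features, target_vector in training_data:
        for c in categories:
            per_task[c].append((features, target_vector[c]))
    return per_task

categories = ["sports", "finance"]
# Human category decisions, encoded as a target vector for each article.
labeled_articles = [
    ("Home team wins the cup final", {"sports": True, "finance": False}),
    ("Central bank raises interest rates", {"sports": False, "finance": True}),
]
training_data = [(encode_features(text), targets) for text, targets in labeled_articles]
per_task = split_by_task(training_data, categories)
```

Each entry of `per_task` could then be handed to a supervised learner to produce the predictor for that category.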
If the supervised learner were an incremental supervised learner, the person could read additional news wire articles at a later point in time and provide new training instances to the incremental supervised learner. The incremental supervised learner could produce new predictors, generally with an improved ability to assign categories.
A difficulty arises for the incremental supervised learners if the new training instances include target values for new learning tasks. In the above example, suppose that the person creates a new category to cover news wire articles about a new topic (e.g., "Kosovo War Stories"). In this example, the incremental supervised learner would receive a training instance containing a target value for a learning task that it has not been told to produce predictors for, and would fail to produce a predictor for this new task.
To date, several solutions have been proposed for this problem. One proposed solution is that when the incremental supervised learner is notified of a new learning task, the learner modifies its state representation to include this new task and record the fact that zero previous training instances have been seen for the new task. The learning of the predictor for the new task then begins with the first training instance for which a target value was explicitly encoded for the new learning task. This technique has the disadvantage that the supervised learner is not able to make use of the large collection of previously received training examples, which can usually be assumed to have had default target values for the new task.
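The disadvantage of this first solution can be seen in a minimal sketch. It assumes a hypothetical learner, `CountingLearner`, whose state representation is reduced to per-task counts of True and False target values; a real learner would keep a richer summary, but the bookkeeping for a newly added task would follow the same pattern.

```python
class CountingLearner:
    """Hypothetical incremental learner; state = per-task target-value counts."""

    def __init__(self, tasks):
        self.state = {t: {"pos": 0, "neg": 0} for t in tasks}

    def add_task(self, task):
        # First proposed solution: record zero previously seen instances.
        self.state[task] = {"pos": 0, "neg": 0}

    def train(self, instances):
        for _features, targets in instances:
            for task, value in targets.items():
                self.state[task]["pos" if value else "neg"] += 1

learner = CountingLearner(["sports"])
learner.train([
    ({"goal": 1}, {"sports": True}),
    ({"rates": 1}, {"sports": False}),
    ({"bank": 1}, {"sports": False}),
])
learner.add_task("kosovo")
learner.train([({"kosovo": 1}, {"sports": False, "kosovo": True})])
```

After these steps the state for the new task reflects a single positive instance and no negative instances, even though three earlier articles could reasonably have been treated as negative examples for it.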
Another proposed technique uses an incremental supervised learner whose state representation explicitly contains all previously seen training instances. When the incremental supervised learner is informed of the new learning task, it modifies its state representation to reflect the assumption that the previously received training instances had the default target value for the new learning task. In this fashion, both previously received training instances and new training instances can be used in producing a predictor for the new learning task.
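This second technique can be sketched under the assumption that the state representation is a plain list of stored training instances. The class name `StoreAllLearner`, its methods, and the choice of False as the default target value are illustrative assumptions.

```python
class StoreAllLearner:
    """Hypothetical learner whose state is every training instance received."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.instances = []  # list of (features, {task: target value})

    def train(self, new_instances):
        for features, targets in new_instances:
            self.instances.append((features, dict(targets)))

    def add_task(self, task, default=False):
        # Alter the stored state: assume the default target value for the
        # new task on every previously received training instance.
        self.tasks.append(task)
        for _features, targets in self.instances:
            targets.setdefault(task, default)

    def training_set(self, task):
        return [(features, targets[task]) for features, targets in self.instances]

learner = StoreAllLearner(["sports"])
learner.train([
    ({"goal": 1}, {"sports": True}),
    ({"rates": 1}, {"sports": False}),
])
learner.add_task("kosovo")
learner.train([({"kosovo": 1}, {"sports": False, "kosovo": True})])
kosovo_targets = [target for _, target in learner.training_set("kosovo")]
```

The earlier instances now contribute to the new task, but only because the learner stores every instance and rewrites its own state when a task is added.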
The problem with this second technique is that it requires altering the state representation used by the incremental learner, requiring additional complexity in the learning software. Furthermore, explicitly saving all the previous training examples as required by this technique may be a less efficient or less effective state representation than the state representation that might otherwise be used by the incremental learner.
The present invention provides a method and apparatus for adding new learning tasks to an incremental supervised learner. The present invention provides a flexible incremental representation of all training examples encountered, thereby permitting state representations for new learning tasks to take advantage of incremental training already completed by encoding all past training examples as negative examples for a hypothetical learning task. The state representation of the hypothetical learning task may then be copied as the initial state representation for a new learning task to be initiated. The new learning task is thereby initialized as though all previously presented training examples had been negative examples, permitting the learning task to incorporate the previous examples efficiently. This method and apparatus reduce software complexity and facilitate decomposition of machine learning tasks through increased sharing of training instance information across software components.
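The copying of a hypothetical task's state representation can be illustrated with a minimal sketch. It assumes, for illustration only, that the state representation is a pair of per-task counts and that every training instance supplies a target value for each task existing at the time. The class name `TemplateLearner` and the key `HYPOTHETICAL` are hypothetical names, and an actual embodiment may use any state representation supported by the incremental learner.

```python
import copy

HYPOTHETICAL = "__hypothetical__"  # assumed key for the hypothetical task

class TemplateLearner:
    """Sketch: per-task count state plus one always-negative hypothetical task."""

    def __init__(self, tasks):
        self.state = {t: {"pos": 0, "neg": 0} for t in tasks}
        self.state[HYPOTHETICAL] = {"pos": 0, "neg": 0}

    def train(self, instances):
        for _features, targets in instances:
            for task, value in targets.items():
                self.state[task]["pos" if value else "neg"] += 1
            # Every instance is a negative example for the hypothetical task.
            self.state[HYPOTHETICAL]["neg"] += 1

    def add_task(self, task):
        # The new task starts from a copy of the hypothetical task's state.
        self.state[task] = copy.deepcopy(self.state[HYPOTHETICAL])

learner = TemplateLearner(["sports"])
learner.train([
    ({"goal": 1}, {"sports": True}),
    ({"rates": 1}, {"sports": False}),
    ({"bank": 1}, {"sports": False}),
])
learner.add_task("kosovo")
state_at_creation = dict(learner.state["kosovo"])
learner.train([({"kosovo": 1}, {"sports": False, "kosovo": True})])
```

In this sketch the new task begins with three negative examples already reflected in its state, and the learner neither stores the earlier instances nor changes the form of its state representation.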