Machine learning generally involves constructing machine learning algorithms that can learn from data. These algorithms build a model based on features, and that model is used to generate a classifier tuned to a particular purpose. Active machine learning is a discipline in which a “teacher,” such as a user, presents training examples that train the model to learn a function.
Historically, whether training examples were labeled or unlabeled has depended on the particular purpose. For example, in existing systems, training examples for a classifier tuned to classify documents about baseball typically include documents labeled as relating to baseball and documents labeled as not relating to baseball.
Other existing training examples were unlabeled; such examples might or might not relate to baseball. Accordingly, a third party such as the teacher must label these unlabeled examples so that the model has valuable input from which to learn the associated function.
In particular, active learning requires relatively high-quality labeled training examples so that the model can adequately learn the desired function for later classifying any number of unlabeled input documents. However, finding high-quality labeled training examples among the virtually unlimited number of unlabeled documents available to a machine learning algorithm is typically costly. For example, many users are employed to review unlabeled documents and determine their viability for machine learning purposes. Moreover, if a particular model being trained by existing machine learning algorithms needs to be limited, the viability of each potential candidate for a labeled training example must be considered even more carefully, and costs can exceed desired targets.
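To make the teacher-labeling loop concrete, the following is a minimal Python sketch built on assumptions that do not come from the text above: documents are plain strings, the classifier is a toy bag-of-words scorer, the teacher is a hypothetical `teacher` function standing in for a human labeler, and candidates are chosen by uncertainty sampling (asking for a label on the document the current model is least sure about). Uncertainty sampling is one common selection strategy, not the method described here. The hypothetical `budget` parameter caps how many labels are requested, reflecting the labeling cost discussed above.

```python
from collections import Counter


def tokens(text):
    """Split a document into lowercase, whitespace-delimited words."""
    return text.lower().split()


def train(labeled):
    """Toy classifier (assumption): each word gains +1 per occurrence in a
    positive (baseball) example and -1 per occurrence in a negative one."""
    weights = Counter()
    for text, is_baseball in labeled:
        for word in tokens(text):
            weights[word] += 1 if is_baseball else -1
    return weights


def score(weights, text):
    """Positive scores lean 'baseball'; negative scores lean 'not baseball'."""
    return sum(weights[word] for word in tokens(text))


def most_uncertain(weights, pool):
    """Uncertainty sampling: pick the document whose score is closest to 0."""
    return min(pool, key=lambda text: abs(score(weights, text)))


def teacher(text):
    """Hypothetical oracle standing in for a human labeler."""
    return bool({"baseball", "inning", "pitcher"} & set(tokens(text)))


def active_learning(seed, pool, budget):
    """Ask the teacher for at most `budget` labels, retraining after each one."""
    labeled = list(seed)
    pool = list(pool)  # copy so the caller's list is not mutated
    for _ in range(min(budget, len(pool))):
        weights = train(labeled)
        doc = most_uncertain(weights, pool)
        pool.remove(doc)
        labeled.append((doc, teacher(doc)))
    return train(labeled), labeled
```

Starting from one positive and one negative seed document, the loop repeatedly sends the most ambiguous unlabeled document to the teacher, so the limited labeling budget is spent on the candidates the model can least classify on its own.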