A great deal of attention has been given to automated machine-learning techniques. One area of study focuses on automated classification of input samples. For example, as the volume of digital data has exploded in recent years, there is significant demand for techniques to organize, sort and/or identify such data in a manner that makes it useful for a specified purpose.
Automated classification of digital information has application in a number of different practical situations, including image recognition (e.g., identifying which photographs from among thousands or millions in a database include a picture of a face or a picture of a particular face), text classification (e.g., determining whether a particular e-mail message is spam based on its textual content), and the like.
Various approaches to automated classification problems have been used. These approaches include supervised techniques, such as Support Vector Machine (SVM) and Naïve Bayes, in which a classifier is trained using a set of training samples for which labels have been assigned, typically by a human being who is an expert in the particular classification problem.
For this purpose, the training samples often are selected from the much larger group of samples to be classified. In some cases, the training samples are randomly selected. In others, the training samples are selected in a systematic manner according to pre-specified criteria. Active learning is one example of the latter approach.
Generally speaking, active-learning methods construct training sets iteratively, starting from a small initial set and then expanding that set incrementally by selecting examples deemed “most interesting” by the classifier at each iteration. The “most interesting” samples ordinarily are those that are closest to the decision boundary or where there otherwise is greater uncertainty as to whether the classification predicted by the classifier is correct.
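By way of illustration only, the iterative selection described above can be sketched as follows. The sketch assumes a toy one-dimensional, linearly separable problem in which the classifier is a single threshold; the function names `fit_threshold` and `active_learning`, the `oracle` labeling function, and the sample values are all hypothetical and are not taken from any particular conventional system. "Most interesting" is interpreted here simply as the unlabeled sample closest to the current decision boundary.

```python
def fit_threshold(labeled):
    """Toy 1-D classifier: place the decision boundary midway between the
    largest labeled negative and the smallest labeled positive sample.
    Assumes at least one sample of each class and separable data."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2.0


def active_learning(pool, oracle, initial, rounds):
    """Grow a training set iteratively by querying, at each round, the
    unlabeled sample nearest the current decision boundary."""
    # Small initial training set, labeled by the (hypothetical) expert.
    labeled = [(x, oracle(x)) for x in initial]
    unlabeled = [x for x in pool if x not in initial]
    for _ in range(rounds):
        if not unlabeled:
            break
        boundary = fit_threshold(labeled)
        # Uncertainty is approximated by distance to the boundary.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), labeled


# Hypothetical example: samples 0..100, true class is 1 when x >= 37.
pool = list(range(101))
oracle = lambda x: int(x >= 37)
boundary, labeled = active_learning(pool, oracle, initial=[0, 100], rounds=8)
print(boundary, len(labeled))
```

In this sketch only ten of the 101 samples are ever labeled, and the learned boundary settles between 36 and 37. A practical classifier such as an SVM would replace the threshold fit, and its own confidence measure would replace the distance computation.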
However, the present inventors have identified certain shortcomings of conventional techniques for selecting training samples, such as active learning.