Machine learning seeks to permit computers to analyze known examples that illustrate the relationship between outputs and observed variables. One such approach is known as Probably Approximately Correct learning or “PAC” learning. PAC learning involves having a machine learner receive examples of things to be classified together with labels describing each example. Such examples are sometimes referred to as labeled examples. The machine learner generates a prediction rule or “classifier” (sometimes referred to as a “hypothesis”) based on observed features within the examples. The classifier is then used to classify future unknown data. One illustrative application of machine learning is speech recognition. When machine learning is applied to this application, the labeled examples might include a large number of sound samples. Each sound sample is drawn from a human speaker and contains one or more features. The features might be attributes of the signal (or in some cases a transform of the signal) such as amplitude for example. Each sample is given a label by human reviewers, such as a specific phoneme or word heard by the reviewer when the sound is played. Thus, if a sample were that of a person uttering the word “cat” then the label assigned to the sample would be “cat.” The goal of machine learning is to process the labeled examples to generate classifiers that will correctly classify future examples as the proper phoneme or word within an acceptable error level.
A boosting algorithm is one approach for using a machine to generate a classifier. Various boosting algorithms are known including MadaBoost and AdaBoost. Boosting algorithms in some cases involve repeatedly calling a weak learner algorithm to process subsets of labeled examples. These subsets are drawn from the larger set of labeled examples using probability distributions that can vary each time the weak learner is called. With each iteration, the weak learner algorithm generates a classifier that is not especially accurate. The boosting algorithm combines the classifiers generated by the weak algorithm. The combination of classifiers constitutes a single prediction rule that should be more accurate than any one of the classifiers generated by the weak learner algorithm.