Machine learning seeks to permit computers to analyze known examples that illustrate the relationship between outputs and observed variables. One such approach is known as Probably Approximately Correct learning or “PAC” learning. In PAC learning, a machine learner receives examples of things to be classified together with labels describing each example. Such examples are sometimes referred to as labeled examples. The machine learner generates a prediction rule or “classifier” (sometimes referred to as a “hypothesis”) based on observed features within the examples. The classifier is then used to classify future unknown data with an acceptable rate of error. For example, one application of machine learning is filtering spam from legitimate email. The labeled examples might include a large number of emails, both spam and non-spam. Each email contains one or more features in the form of the occurrence or non-occurrence of certain words and phrases such as “Buy Now!” Each email is given a label such as “spam” or “non-spam.” The goal of machine learning in this application is to process the labeled examples to generate classifiers that will correctly classify future examples as spam or non-spam, at least within an acceptable error rate.
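The spam example above can be made concrete with a minimal Python sketch. The phrase list, the four sample emails, and the function names are hypothetical and chosen only for illustration; the hand-written rule stands in for a classifier that a machine learner would normally generate.

```python
from typing import Callable, List, Tuple

# Hypothetical phrases whose occurrence or non-occurrence serves as a feature.
PHRASES = ["buy now!", "free", "meeting"]

def extract_features(email: str) -> Tuple[int, ...]:
    """Return 1 for each phrase that occurs in the email, else 0."""
    text = email.lower()
    return tuple(int(phrase in text) for phrase in PHRASES)

# Labeled examples: (email text, label), with +1 = spam and -1 = non-spam.
LABELED: List[Tuple[str, int]] = [
    ("Buy Now! Free pills", 1),
    ("Free offer, buy now!", 1),
    ("Meeting moved to 3pm", -1),
    ("Notes from the meeting", -1),
]

def classifier(features: Tuple[int, ...]) -> int:
    """Illustrative hypothesis: predict spam when 'buy now!' occurs."""
    return 1 if features[0] == 1 else -1

def error_rate(h: Callable[[Tuple[int, ...]], int],
               examples: List[Tuple[str, int]]) -> float:
    """Fraction of labeled examples that the hypothesis h misclassifies."""
    mistakes = sum(1 for text, label in examples
                   if h(extract_features(text)) != label)
    return mistakes / len(examples)
```

Calling `error_rate(classifier, LABELED)` measures how often the rule disagrees with the labels on the known examples; the error rate on future, unseen emails is what the learner ultimately aims to keep acceptably low.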
A boosting algorithm is one approach for using a machine to generate a classifier. Various boosting algorithms are known, for example, MadaBoost and AdaBoost. Boosting algorithms in some cases involve repeatedly calling a weak learner algorithm to process a subset of labeled examples. These subsets are drawn from the larger set of labeled examples using probability distributions that can vary each time the weak learner is called. With each iteration, the weak learner algorithm generates a crude or weak classifier that is not especially accurate. The boosting algorithm combines the weak classifiers generated by the weak learner algorithm. The combination of weak classifiers constitutes a single prediction rule that should be more accurate than any one of the individual weak classifiers.
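The boosting loop described above can be sketched in Python along the lines of AdaBoost. This is a simplified illustration under stated assumptions, not a definitive implementation: the toy data set (all eight 3-bit vectors labeled by majority vote), the single-feature weak rules, and the function names are hypothetical. To keep the sketch deterministic, the weak learner receives the probability distribution directly as example weights rather than a subset sampled from it.

```python
import math
from itertools import product

# Assumed toy data: every 3-bit vector, labeled +1 if at least two bits are set.
EXAMPLES = [(x, 1 if sum(x) >= 2 else -1) for x in product((0, 1), repeat=3)]

def stump(j):
    """Weak classifier: predict +1 if feature j is present, else -1."""
    return lambda x: 1 if x[j] == 1 else -1

def weak_learner(examples, weights):
    """Return (weighted error, feature index) of the best single-feature rule."""
    best = None
    for j in range(3):
        h = stump(j)
        err = sum(w for (x, y), w in zip(examples, weights) if h(x) != y)
        if best is None or err < best[0]:
            best = (err, j)
    return best

def boost(examples, rounds):
    """AdaBoost-style loop: reweight examples, collect weighted weak rules."""
    n = len(examples)
    weights = [1.0 / n] * n          # start from the uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, j = weak_learner(examples, weights)
        if err >= 0.5:               # weak rule is no better than chance
            break
        err = max(err, 1e-10)        # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j))
        h = stump(j)
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * y * h(x))
                   for (x, y), w in zip(examples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Combined prediction rule: sign of the weighted vote of the weak rules."""
    score = sum(alpha * stump(j)(x) for alpha, j in ensemble)
    return 1 if score > 0 else -1
```

On this toy data each single-feature rule misclassifies two of the eight examples, while `boost(EXAMPLES, 3)` yields a weighted vote that classifies all eight correctly, which illustrates how the combination can be more accurate than any one weak classifier.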