1. Technical Field
This invention is directed towards a statistical learning procedure that can be applied to many machine-learning applications such as, for example, face detection, image retrieval, speech recognition, text classification, document routing, on-line learning and medical diagnosis. Although the statistical learning procedure of the present invention is described as applied to a face detection system, the process can be used for boosting the performance of classifiers in any type of classification problem.
2. Background Art
Boosting is an approach to machine-learning classification problems that has received much attention of late. Boosting algorithms have become popular because they are simple, elegant, powerful and easy to implement, and they have been used in many different applications. For instance, Fan, Stolfo and Zhang [2] applied a boosting algorithm called AdaBoost to a distributed online learning application. Iyer, Lewis, Schapire, Singer and Singhal [8] applied boosting to document routing, using a boosting procedure to classify and rank documents in the context of Information Retrieval (IR). Moreno, Logan and Raj [13] employed a boosting classification algorithm for the confidence scoring of data in a speech recognition application. They derived feature vectors from speech recognition lattices and fed them into a boosting classifier. This classifier combined hundreds of very simple ‘weak learners’ and derived classification rules that reduced the confidence error rate by up to 34 percent. Schapire and Singer [23] used a family of boosting algorithms to perform text and speech categorization tasks. Sebastiani, Sperduti and Valdambrini [25] also applied boosting to text categorization. Tieu and Viola [30] applied boosting to image retrieval.
In most classification problems, feature vectors are composed and fed into one or more classifiers. Usually only a few types of features are used, such as color and oriented edges found in a training image. Boosting typically combines hundreds or thousands of very simple classifiers, called ‘weak learners’, by using a weighted sum. A classification procedure is applied iteratively to a set of weighted feature vectors, so that the weak learner is called upon to solve a sequence of learning problems. Initially, each feature vector is assigned an equal weight (or a weight depending on its prior probability). At each iteration, a classifier is learned, and the feature vectors that are classified incorrectly have their weights increased, while those that are classified correctly have their weights decreased. That is, in each subsequent problem, the examples are reweighted to emphasize those that were incorrectly classified by the previous weak classifier. Each classifier therefore focuses its attention on the vectors on which the previous classifier failed. The concept is that feature vectors that are difficult to classify receive more attention on subsequent iterations.
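The reweighting loop described above can be illustrated with a minimal sketch. It assumes the discrete AdaBoost update with labels in {-1, +1} and uses single-feature threshold tests ("decision stumps") as the weak learners; the function names, the stump representation and the toy data are all hypothetical choices made for illustration and are not taken from this specification.

```python
import math

def stump_predict(stump, x):
    """A stump is (feature index, threshold, polarity); output is +1 or -1."""
    f, thresh, polarity = stump
    return polarity if x[f] >= thresh else -polarity

def train_stump(xs, ys, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best_stump = None, None
    for f in range(len(xs[0])):
        for thresh in sorted({x[f] for x in xs}):
            for polarity in (1, -1):
                stump = (f, thresh, polarity)
                # Weighted error: total weight of misclassified examples.
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(stump, x) != y)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, stump) pairs forming the weighted sum."""
    n = len(xs)
    weights = [1.0 / n] * n          # equal initial weights
    ensemble = []
    for _ in range(rounds):
        err, stump = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified examples (y * prediction == -1) gain weight,
        # correctly classified examples lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_predict(ensemble, x):
    """Sign of the weighted sum of the weak classifiers' outputs."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D toy data: no single threshold separates the two classes.
toy_xs = [[0], [1], [2], [3], [4], [5]]
toy_ys = [1, 1, -1, -1, 1, 1]
toy_ensemble = adaboost(toy_xs, toy_ys, rounds=3)
toy_predictions = [strong_predict(toy_ensemble, x) for x in toy_xs]
```

On this toy set, one round leaves two examples misclassified, while the weighted sum of three stumps labels all six correctly. The exhaustive stump search is chosen for clarity rather than speed.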
The classifier learned at each iteration is called a “weak classifier”. A weak classifier is one that employs a simple learning algorithm (and hence fewer features) and is not expected to classify the training data very well. Weak classifiers have the advantage of requiring very little processing time to classify an input. The final classifier, the “strong classifier”, is formed as a weighted sum of the weak classifiers learned at each iteration. One important goal for many machine-learning applications is that the final classifier depend on only a small number of features. A classifier that depends on only a few features is more efficient to evaluate over a very large database, requiring less processing time and fewer resources. Furthermore, using boosted classifiers together with a choice of weak learners offers the advantage of being less sensitive to spurious features. It has been shown that the training error of a strong classifier approaches zero exponentially in the number of iterations.
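For concreteness, the weighted-sum form of the strong classifier and the exponential decrease in training error can be written out as follows. This is a sketch that assumes the discrete AdaBoost formulation with labels in {-1, +1}; the specification does not say which bound it has in mind, and the symbols (T rounds, N training examples, weak classifiers h_t with weighted error \epsilon_t, vote weights \alpha_t, and edge \gamma_t over chance) are notation introduced here.

```latex
% Strong classifier: sign of the weighted sum of T weak classifiers
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right),
\qquad
\alpha_t = \tfrac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}

% Training error bound, writing \epsilon_t = \tfrac{1}{2} - \gamma_t
\frac{1}{N} \left| \{\, i : H(x_i) \neq y_i \,\} \right|
\;\le\; \prod_{t=1}^{T} 2 \sqrt{\epsilon_t (1 - \epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1 - 4 \gamma_t^2}
\;\le\; \exp\!\left( -2 \sum_{t=1}^{T} \gamma_t^2 \right)
```

Under these assumptions, if every weak classifier beats chance by at least some fixed margin \gamma > 0, the bound is at most \exp(-2 \gamma^2 T), which decreases exponentially in the number of iterations T.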
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.