1. Technical Field
The invention is related to classifier-based detection systems, and in particular, to a system and method for efficiently training combination classifiers for use in detecting instances of information of interest within data.
2. Related Art
As is well known to those skilled in the art, there is a large variety of techniques for implementing target detection systems for detecting particular elements or components within a signal. Such detection includes, for example, detection or identification of faces or other specific objects in images, detection of particular words in a speech sample, detection of specific heartbeat patterns in an electrocardiogram signal, etc.
One common detection technique involves the use of hierarchical classifiers, also referred to as “cascade detectors,” for use in constructing conventional target detection systems. Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Consequently, a great deal of effort has been directed towards cascade learning techniques for improving the training of classifiers used in such detection systems. While such detection systems are generally fast in use (possibly real-time), their initial training is typically very slow. Unfortunately, the process for effectively and efficiently training cascade detectors remains a challenging technical problem with respect to determining optimal cascade stage sizes and target detection rates.
One increasingly important application of cascade-based detection systems involves real-time face detection. For example, one conventional technique involves the use of adaptive boosting, also commonly referred to as “AdaBoost,” in combination with an “integral image” for training of the cascaded detector. This detection scheme requires a number of complex parameters, including, for example, the number and shapes of rectangle filters, the number of stages, the number of weak classifiers in each stage, and the target detection rates for each cascade stage. Unfortunately, while this type of system provides good detection results, its computational complexity means that the initial cascade training process requires significant amounts of time (possibly days or weeks, depending upon CPU resources being used), and as such, picking optimal parameters is a difficult task.
The conceptual and computational complexity of generic cascade training processes has led to a number of improvements and refinements of such training. For example, several recent “soft-cascade” based techniques operate by relaxing the original cascade structure of distinct and separate stages so that the scores computed by earlier weak classifiers can be combined with those of later weak classifiers. In one such “soft-cascade” approach, the entire detector is trained as a single combination classifier without stages (with hundreds or even thousands of weak classifiers). The score assigned to a detection window by the soft-cascade is simply the sum of the weak classifier scores. Computation of the sum is terminated early whenever the partial sum falls below some predetermined threshold.
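By way of illustration only, the early-termination behavior of a soft-cascade described above may be sketched as follows. This is a minimal sketch and not the implementation of any particular prior system. The function name `soft_cascade_score`, the representation of each weak classifier as a callable returning a real-valued score, and the use of one rejection threshold per weak classifier are all assumptions made for this example.

```python
from typing import Any, Callable, Sequence, Tuple


def soft_cascade_score(
    window: Any,
    weak_classifiers: Sequence[Callable[[Any], float]],
    thresholds: Sequence[float],
) -> Tuple[bool, float, int]:
    """Evaluate a soft-cascade on one detection window.

    Hypothetical helper for illustration. Assumes thresholds[t] is the
    rejection threshold applied to the partial sum after weak classifier t.

    Returns (accepted, partial_sum, number_of_weak_classifiers_evaluated).
    """
    if len(weak_classifiers) != len(thresholds):
        raise ValueError("one threshold is assumed per weak classifier")

    partial_sum = 0.0
    for evaluated, (classifier, threshold) in enumerate(
        zip(weak_classifiers, thresholds), start=1
    ):
        # Accumulate the score of the next weak classifier.
        partial_sum += classifier(window)
        # Terminate early once the partial sum falls below the threshold.
        if partial_sum < threshold:
            return False, partial_sum, evaluated

    # Every partial sum stayed at or above its threshold.
    return True, partial_sum, len(weak_classifiers)


# Toy usage: the "window" is a single number and the weak classifiers
# are simple stand-in functions.
toy_classifiers = [lambda x: x, lambda x: -1.0, lambda x: 2.0]
toy_thresholds = [0.0, 0.5, 1.0]
rejected = soft_cascade_score(1.0, toy_classifiers, toy_thresholds)
accepted = soft_cascade_score(2.0, toy_classifiers, toy_thresholds)
```

In this toy usage, the first window is rejected after two weak classifiers, so the third is never evaluated. Skipping later classifiers in this way is the source of the speed advantage noted above.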
A related soft-cascade training technique generally operates by setting intermediate thresholds based on an ad hoc detection rate target called a “rejection distribution vector.” Like early cascade-based schemes, the soft-cascade of this scheme gradually gives up on a number of positive examples in an effort to aggressively reduce the number of negatives passing through the cascade.
Giving up on some positive examples early in the training process is justified by an understanding that the original combination classifier will eventually give up on some positive examples anyway. The original combination classifier may discard a positive example because it is too difficult to detect, or because reducing the final threshold would admit too many false positives. While it is possible to set the intermediate thresholds so that no positive example is lost, this leads to very conservative thresholds and a very slow detector. The main question is which positive examples can be discarded and when. Unfortunately, one problem with conventional cascade learning approaches is that while many agree that discarding some positive examples is warranted, these schemes fail to provide an efficient or effective mechanism for determining which examples are best to discard.
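The most conservative choice mentioned above, in which the intermediate thresholds are set so that no positive example is lost, may be sketched as follows. This is a minimal sketch under stated assumptions and not the method of any particular prior scheme. The function name `conservative_thresholds` and the input layout, in which `positive_scores[i][t]` holds the score of weak classifier `t` on positive example `i`, are hypothetical. A window is assumed to be rejected only when its partial sum falls strictly below the threshold.

```python
from typing import List, Sequence


def conservative_thresholds(
    positive_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Compute per-step thresholds that reject no positive training example.

    Hypothetical helper for illustration. positive_scores[i][t] is assumed
    to be the score of weak classifier t on positive example i.
    """
    if not positive_scores:
        return []

    num_steps = len(positive_scores[0])
    partial_sums = [0.0] * len(positive_scores)
    thresholds: List[float] = []

    for step in range(num_steps):
        # Update the partial sum of every positive example.
        for i, scores in enumerate(positive_scores):
            partial_sums[i] += scores[step]
        # The lowest partial sum among the positives is the highest
        # threshold that still retains all of them at this step.
        thresholds.append(min(partial_sums))

    return thresholds


# Toy usage with two positive examples and three weak classifiers.
toy_scores = [
    [1.0, -0.5, 2.0],
    [0.5, 0.5, -1.0],
]
toy_result = conservative_thresholds(toy_scores)
```

Because each threshold is determined by the single worst-scoring positive example at that step, a few difficult positives can hold the thresholds low. Many negatives then survive deep into the cascade, consistent with the slow detector noted above.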
For example, one conventional training scheme attempts to reject zero positive examples until it becomes impossible to continue, at which point, positive samples are rejected one at a time, as needed. A related scheme defines an exponential curve which determines the number of faces that can be discarded at each stage. Any positive example falling outside this somewhat arbitrary curve is simply discarded. Yet another conventional scheme uses a ratio test to determine rejection thresholds. While this scheme has some statistical validity, the distributions must be estimated (which introduces empirical risk). Each of these schemes has advantages and disadvantages that generally result in a tradeoff between various factors including target detection rates, target detection speed, and classifier training speed.