Computing devices can be used to recognize faces, voices, handwriting, and other objects, patterns, and the like. In a typical implementation, a computing device can continuously monitor a particular input stream (e.g., a video stream from a video camera or an audio stream from a microphone), or receive a batch of similar input data. The computing device can determine whether a portion of the input is likely to contain information corresponding to the target item, object, or pattern to be detected. For example, the computing device can determine whether a particular portion of the input stream is likely to include any face, any speech, or any handwriting at all. Once this preliminary determination has been made, the computing device can then perform other processing or cause other processing to be performed. For example, the computing device may perform recognition of which particular face, voice, or other target is present in the input, rather than merely detecting that some face, voice, etc. is present in the input.
One approach to implementing a detection system is to use a cascade-based detector. Cascade detectors process input samples through a sequence of classifiers, each of which scores the sample on how likely it is to contain an event of interest. At each stage of the cascade, a decision is made either to discard the sample under consideration or to pass it on to the next stage. A sample that passes through all the stages of the cascade is hypothesized to contain the event of interest; otherwise, the sample is hypothesized not to contain the event. Therefore, each stage observes only those samples that have not been rejected by any previous stage.
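The stage-by-stage accept/reject flow described above can be illustrated with a minimal sketch. The function name run_cascade, the representation of a stage as a (scoring function, threshold) pair, and the toy scoring functions and threshold values are all assumptions made for illustration; they are not taken from any particular detector.

```python
from typing import Callable, List, Tuple

# Hypothetical stage representation: a scoring function plus a rejection threshold.
Stage = Tuple[Callable[[float], float], float]


def run_cascade(sample: float, stages: List[Stage]) -> Tuple[bool, int]:
    """Pass a sample through the stages in order.

    Returns (accepted, stages_evaluated). The sample is discarded at the
    first stage whose score falls below that stage's threshold, so later
    stages never see it.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(sample) < threshold:
            return False, index  # rejected at this stage
    return True, len(stages)  # survived every stage


# Toy stages (illustrative only): a real detector would use trained classifiers.
STAGES: List[Stage] = [
    (lambda x: x, 0.2),
    (lambda x: x * x, 0.25),
    (lambda x: 1.0 - abs(x - 0.8), 0.9),
]

if __name__ == "__main__":
    for sample in (0.1, 0.4, 0.8):
        print(sample, run_cascade(sample, STAGES))
```

In this sketch, a sample rejected at the first stage costs a single scoring call, which reflects the property noted above that later stages observe only samples that earlier stages have passed.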