Pedestrian detection has been a focus of recent research due to its importance for practical applications such as automotive safety [11, 8] and visual surveillance [23]. The most successful model to date for “normal” pedestrians, who are usually standing or walking upright, is still a monolithic global descriptor for the entire search window. With such a model, there are three main steps which can be varied to gain performance: feature extraction, classification, and non-maxima suppression. The most common features extracted from the raw image data are variants of the HOG framework, i.e. local histograms of gradients and (relative) optic flow [3, 4, 10, 24, 27], and different flavors of generalized Haar wavelets, e.g. [6, 23]. The competitive classifiers we know of employ statistical learning techniques to learn the mapping from features to scores (indicating the likelihood of a pedestrian being present), usually either support vector machines [3, 13, 17, 19, 27] or some variant of boosting [23, 27, 28, 30].
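To make the three steps concrete, the following is a minimal sketch of such a pipeline in plain Python. It is an illustration under stated assumptions, not the detector of any cited work: `gradient_histogram` is a single-cell, heavily simplified stand-in for HOG-style features, `linear_score` stands in for a trained linear SVM (its weights and bias are hypothetical and would come from training), and `greedy_nms` uses a greedy overlap-based rule where other suppression schemes could be used instead. All function names and the box format `(x1, y1, x2, y2)` are assumptions made for the example.

```python
import math

def gradient_histogram(patch, n_bins=9):
    """Step 1 (features): magnitude-weighted histogram of unsigned gradient
    orientations over a grayscale patch (list of rows), L2-normalized.
    A single-cell simplification of HOG-style descriptors."""
    hist = [0.0] * n_bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # centered differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi    # unsigned orientation
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def linear_score(features, weights, bias):
    """Step 2 (classification): score of a linear classifier.
    The weights and bias are hypothetical; in practice they are learned."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(detections, iou_threshold=0.5):
    """Step 3 (non-maxima suppression): keep the highest-scoring box and
    drop any lower-scoring box that overlaps an already kept one.
    detections is a list of (score, box) pairs."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept
```

In a sliding-window detector, the first two functions would be applied to every window position and scale, and the resulting `(score, box)` pairs passed to the suppression step so that each pedestrian yields a single detection.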
The spectacular progress that has been made in detecting pedestrians (i.e. humans in an upright position) is perhaps best illustrated by the increasing difficulty of the datasets used for benchmarking. The first [16] and second [3] generations of pedestrian databases are essentially saturated, and have been replaced by newer, more challenging datasets [7, 27, 6]. These recent efforts to record data of realistic complexity have also shown that there is still a gap between what is possible with pedestrian detectors and what would be required for many applications: in [6] the detection rate of the best methods is still below 60% at one false positive detection per image, even for fully visible people.
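The operating point quoted above, detection rate at one false positive per image, can be read off a ranked list of detections. The sketch below is one plausible way to compute it and is not the evaluation code of [6]. It assumes that every detection has already been matched against the ground truth and labeled as a true or false positive; the function name and argument layout are hypothetical.

```python
def detection_rate_at_fppi(detections, n_ground_truth, n_images,
                           target_fppi=1.0):
    """Highest detection rate reachable without exceeding target_fppi.

    detections: (score, is_true_positive) pairs pooled over all test images;
    matching to ground truth is assumed to have been done beforehand.
    Lowering the score threshold admits detections in descending score
    order, so the loop walks the ranked list once. Tied scores are treated
    as distinct thresholds, which is a simplification."""
    best = 0.0
    tp = fp = 0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / n_images > target_fppi:
            break  # this threshold already yields too many false positives
        best = max(best, tp / n_ground_truth)
    return best
```

For example, with four annotated pedestrians in two images, a detector whose ranked output is true, false, true, false, false, true reaches a detection rate of 0.5 at one false positive per image, since admitting the fifth detection would raise the rate to 1.5 false positives per image.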