Pattern recognition approaches have achieved measurable success in detecting and recognizing objects in images in the domain of computer vision. Examples include face, automobile, and pedestrian detection; see, e.g., Avidan, “Support vector tracking,” IEEE Conference on Computer Vision and Pattern Recognition, 2001, Papageorgiou et al., “A general framework for object detection,” International Conference on Computer Vision, 1998, Rowley et al., “Neural network-based face detection,” IEEE Patt. Anal. Mach. Intell., Volume 20, pages 22–38, 1998, Schneiderman et al., “A statistical method for 3D object detection applied to faces and cars,” International Conference on Computer Vision, 2000, and Viola et al., “Rapid object detection using a boosted cascade of simple features,” IEEE Conference on Computer Vision and Pattern Recognition, 2001.
Those approaches generally use machine learning to construct detectors or filters from a large number of training images. The filters are then scanned over an input image to find a pattern of features that is consistent with a target object. Those systems work very well for the detection of faces, but less well for pedestrians, perhaps because images of pedestrians are more varied, due to changes in body pose and clothing, whereas faces are fairly uniform in texture and structure and exhibit relatively little motion. Therefore, it is desired to provide a method that works on a temporally ordered sequence of images, as well as on a single static image.
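The scanning step described above can be illustrated with a minimal sketch. It assumes a single hand-written 3 by 3 filter and a fixed threshold, both of which are hypothetical stand-ins for illustration; the cited systems learn their filters from training images and are considerably more elaborate. The function name `scan_filter` and the toy image are likewise assumptions and do not come from any of the cited works.

```python
def scan_filter(image, filt, threshold):
    """Slide `filt` over `image` and return (row, col, score) for every
    window whose filter response is at least `threshold`."""
    img_h, img_w = len(image), len(image[0])
    f_h, f_w = len(filt), len(filt[0])
    hits = []
    for r in range(img_h - f_h + 1):
        for c in range(img_w - f_w + 1):
            # Filter response: sum of element-wise products over the window.
            score = sum(
                filt[i][j] * image[r + i][c + j]
                for i in range(f_h)
                for j in range(f_w)
            )
            if score >= threshold:
                hits.append((r, c, score))
    return hits


# Toy 4x5 intensity image with a bright vertical bar in column 2 (assumed data).
image = [
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hand-written filter (not learned): responds to a bright center column
# flanked by darker columns.
filt = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

hits = scan_filter(image, filt, threshold=50)
print(hits)
```

Only the window whose center column lies on the bar exceeds the threshold. A practical detector would repeat the scan at multiple scales and combine many learned filters.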
The detection of pedestrians is made even more difficult in surveillance applications, where the resolution of the images is relatively low; for example, the target object may occupy a total of only about 100–200 pixels, e.g., 5 by 20 or 10 by 20 pixels. Although improving pedestrian detection using better functions of image intensity is a valuable pursuit, a new solution is required.
It is well known that patterns of human motion, particularly the pendulum-like motion of walking, are distinguishable from other sorts of motion, and that motion can be used to recognize and detect people, see Cutler et al., “Robust real-time periodic motion detection: Analysis and applications,” IEEE Patt. Anal. Mach. Intell., Volume 22, pages 781–796, 2000, Lee, “Gait dynamics for recognition and classification,” MIT AI Lab., Memo, AIM-2001-019, MIT, 2001, Liu et al., “Finding periodicity in space and time,” IEEE International Conference on Computer Vision, pages 376–383, 1998, and Polana et al., “Detecting activities,” Journal of Visual Communication and Image Representation, June 1994.
In contrast with the intensity-based approaches, the motion-based approaches typically try to track moving objects over many frames, and then analyze the motion to look for periodicity or other cues. Processes that detect motion ‘styles’ are quite efficient, and it is possible to conduct an exhaustive search over entire images at multiple scales of resolution. When trained with a large data set of images, such processes can achieve high detection rates and very low false positive rates.
The field of human motion analysis is quite broad and has a history stretching back to the work of Hoffman et al., “The interpretation of biological motion,” Biological Cybernetics, pages 195–204, 1982. Most of the prior art systems presume that half the problem has already been solved, i.e., a particular type of moving object, for example, a person, has been detected, and that the only remaining problem is to recognize, categorize, or analyze the long-term pattern of motion of that specific moving object.
Recently, interest in motion-based methods has increased because of the possible application of those methods to problems in surveillance. An excellent overview of related work in this area is given by Cutler et al., above. They describe a system that works directly on images. Their system first performs object segmentation and tracking. The objects are then aligned to their centroids. A 2D lattice is then constructed, to which period analysis is applied.
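The general idea of analyzing a tracked object for periodicity can be illustrated with a simple autocorrelation of a one-dimensional motion signal. This is a minimal sketch under stated assumptions and is not the lattice-based method of Cutler et al.: the function name `estimate_period`, the `min_lag` parameter, and the synthetic sinusoidal signal (standing in for, e.g., a per-frame measurement of a tracked walker) are all hypothetical.

```python
import math


def estimate_period(signal, min_lag=1):
    """Return the lag (in frames) with the largest autocorrelation,
    or None if the signal is constant."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(x * x for x in centered)
    if energy == 0:
        return None  # constant signal: no periodicity to measure

    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        # Normalizing by the total energy (not by the overlap length)
        # mildly favors shorter lags, so the fundamental period is
        # preferred over its integer multiples.
        score = sum(
            centered[t] * centered[t + lag] for t in range(n - lag)
        ) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic motion signal that repeats every 8 frames (assumed data).
signal = [math.sin(2 * math.pi * t / 8) for t in range(64)]
period = estimate_period(signal, min_lag=2)
print(period)
```

The `min_lag` argument skips very short lags, where any smooth signal is trivially self-similar. Real image sequences would require segmentation, tracking, and alignment before a signal of this kind could be extracted.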
The field of object detection is equally broad, although systems that perform direct pedestrian detection, using both intensity and motion information at the same time, are not known. Pedestrians have been detected in a static intensity image by first extracting edges and then matching the edges with a set of examples, see Gavrila et al., “Real-time object detection for ‘smart’ vehicles,” IEEE International Conference on Computer Vision, pages 87–93, 1999. Their system is highly optimized, and appears to have been a candidate for inclusion in automobiles. Nevertheless, the published detection rates were approximately 75%, with a false positive rate of 2 per image. Other related work includes that of Papageorgiou et al., above. That system detects pedestrians using a support vector machine trained on an overcomplete wavelet basis. Based on the published experimental data, its false positive rate is significantly higher than that of the related face detection systems.
Therefore, it is desired to directly extract short-term patterns of motion and appearance information from a temporal sequence of images in order to detect instances of a moving object, such as a pedestrian.