In the United States alone, according to the National Highway Traffic Safety Administration, there were over 180,000 pedestrian fatalities between 1975 and 2005, accounting for 11 percent of total motor vehicle casualties. The majority of pedestrian-related accidents occur in urban areas, where a pedestrian may dash in front of a vehicle, leaving an inattentive driver with very little time to react and avoid hitting the pedestrian. As a result, there is a need in the art for an automated driver assistance apparatus and method that alerts a driver in a moving vehicle if and when a pedestrian may cross the path of the moving vehicle.
Computer vision systems and methods provide a relatively inexpensive means of sensing pedestrians from within a vehicle, offering a wider field of view and higher resolution than the radar systems currently in use in high-end automobiles. More particularly, stereo vision systems are superior to monocular vision systems, since stereo vision systems permit calculation of distances to a target pedestrian by employing relatively high resolution 3D depth maps.
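By way of illustration only, the distance calculation that a stereo system permits can be sketched with the standard relation for a rectified stereo pair, Z = f·B/d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity of a point between the left and right images. The function name and the numeric values below are hypothetical and are not taken from any of the systems discussed herein.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance (in meters) to a point seen by a rectified stereo pair.

    Uses the standard pinhole relation Z = f * B / d. All parameter
    names and the example values below are illustrative assumptions.
    """
    if disparity_px <= 0:
        # Zero or negative disparity means no valid match (or a point at infinity).
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800-pixel focal length, 0.5 m baseline, 16-pixel disparity.
distance_m = depth_from_disparity(16.0, 800.0, 0.5)
print(distance_m)  # 25.0
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant pedestrians than for nearby ones, which is one reason depth map resolution is significant.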
In an article by B. Leibe, N. Cornelis, K. Cornelis, and L. Van Gool, titled, “Dynamic 3D scene analysis from a moving vehicle,” CVPR, 2007 (hereinafter “Leibe et al.”), a stereo-based system for 3D dynamic scene analysis from a moving vehicle is described that integrates sparse 3D structure estimation with multi-cue image-based descriptors to detect pedestrians. Leibe et al. shows that employing sparse 3D structure significantly improves the performance of a pedestrian detector. Unfortunately, the best performance cited is a 40% probability of detection at about 1.65 false positives per image frame.
In an article by D. M. Gavrila and S. Munder, titled, “Multi-cue pedestrian detection and tracking from a moving vehicle,” IJCV, 73:41-59, 2007 (hereinafter “Gavrila and Munder”), a real-time stereo system for pedestrian detection and tracking, called PROTECTOR, is proposed. PROTECTOR employs sparse stereo to generate putative pedestrian regions-of-interest (ROIs) in an image, which are subsequently pruned using shape (contour) and texture information. The choice of sparse/dense stereo processing stages is justified based on real-time limitations in stereo computation for an entire image. Gavrila and Munder reports a 71% pedestrian detection rate at 0.1 false alarms per frame, without using a temporal constraint, for pedestrians located less than 25 meters from the cameras. Temporal information is also employed to increase the reliability of the system and to mitigate missed detections, albeit at the price of increased latency in alerting the driver.
A real-time, monocular vision system for pedestrian detection known in the art has been proposed in an article by A. Shashua, Y. Gdalyahu, and G. Hayun, titled, “Pedestrian detection for driver assistance systems: Single-frame classification and system level performance,” in Proc. of the IEEE Intelligent Vehicle Symposium, 2004 (hereinafter “Shashua et al.”). Shashua et al. employs a focus of attention mechanism to detect window candidates very rapidly. The window candidates (approximately 70 per frame) are classified as pedestrians or non-pedestrians using a two-stage classifier. Each input window is divided into 13 image sub-regions. For each sub-region, a histogram of image gradients is computed and used to train a support vector machine (SVM) classifier. The training data is divided into 9 mutually exclusive clusters to account for pose changes in the human body. The 13×9 dimensional vector containing the responses of the SVM classifiers for each of the 9 training clusters is used to train an AdaBoost second-stage classifier. A practical pedestrian awareness system needs to produce very few false positives per hour of driving; hence, Shashua et al. employs temporal information to improve the per-frame pedestrian detection performance and to distinguish between in-path and out-of-path pedestrian detections.
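The first stage of the two-stage scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Shashua et al.: the layout of the 13 sub-regions (equal horizontal strips), the number of orientation bins, the window size, and the randomly initialized linear scorers standing in for trained SVM classifiers are all hypothetical. Only the counts 13 and 9 come from the description above.

```python
import math
import random

N_REGIONS = 13   # sub-regions per candidate window (from the description)
N_CLUSTERS = 9   # pose clusters in the training data (from the description)
N_BINS = 8       # orientation bins: an assumption for illustration


def gradient_histogram(patch, n_bins=N_BINS):
    """Magnitude-weighted histogram of gradient orientations of a 2D patch."""
    hist = [0.0] * n_bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(gy, gx) % math.pi      # unsigned orientation
            bin_index = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[bin_index] += math.hypot(gx, gy)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def split_into_regions(window, n_regions=N_REGIONS):
    """Hypothetical layout: equal-height horizontal strips of the window."""
    rows = len(window)
    bounds = [round(i * rows / n_regions) for i in range(n_regions + 1)]
    return [window[bounds[i]:bounds[i + 1]] for i in range(n_regions)]


def first_stage_responses(window, svms):
    """Concatenate the responses of 9 linear scorers for each of 13 regions.

    svms[r][c] is a (weights, bias) pair standing in for the SVM trained
    on region r with pose cluster c; real weights would come from training.
    """
    responses = []
    for region, region_svms in zip(split_into_regions(window), svms):
        hist = gradient_histogram(region)
        for weights, bias in region_svms:
            responses.append(sum(w * h for w, h in zip(weights, hist)) + bias)
    return responses


# Synthetic 78x32 window and random stand-in weights, for illustration only.
rng = random.Random(0)
window = [[rng.random() for _ in range(32)] for _ in range(78)]
svms = [[([rng.uniform(-1, 1) for _ in range(N_BINS)], 0.0)
         for _ in range(N_CLUSTERS)] for _ in range(N_REGIONS)]
features = first_stage_responses(window, svms)
print(len(features))  # 117, i.e. 13 x 9
```

In a complete system of this kind, the resulting 117-element response vector for each candidate window would serve as the input on which the AdaBoost second-stage classifier is trained and evaluated.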
3D systems and methods known in the art may provide a low false positive rate at the expense of speed, while 2D methods and systems have been shown to produce low false positive rates and high detection rates. Accordingly, what would be desirable, but has not yet been provided, is a 3D method and system for detecting pedestrians from moving vehicles in cluttered environments that has a low false positive rate and a high detection rate while maintaining real-time processing speed.