Many computer vision applications use stereo images acquired by a stereo camera to detect objects. A stereo camera typically has multiple lenses and sensors. Usually, the inter-axial distance between the lenses is about the same as the distance between human eyes, so that the lenses provide overlapping views.
FIG. 1 shows a conventional system for stereo-based object detection. A stereo camera 101 acquires stereo images 102. The detection method can include the following steps: stereo imaging 100, cost volume determination 110, depth/disparity map estimation 120, and object detection 130.
Most conventional methods for stereo-based object detection rely on per-pixel depth information in the overlapping area 120. This step is generally referred to as depth/range map estimation. It can be achieved by determining disparity values, i.e., the translation of corresponding pixels between the two images, and then determining the depth map from those disparities. The depth map can then be used for object detection 130, e.g., a histogram of oriented gradients (HoG) of the depth map is used for object description. One method estimates the dominant disparity in a sub-image region, and uses a co-occurrence histogram of the relative disparity values for object detection.
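As an illustration of how a disparity map relates to a depth map, the following minimal sketch assumes a rectified stereo pair with parallel optical axes, for which depth is Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function name `disparity_to_depth` and the numeric values are hypothetical and chosen only for the example; they are not taken from any of the methods described above.

```python
from typing import List, Optional


def disparity_to_depth(disparity: List[List[float]],
                       focal_px: float,
                       baseline_m: float) -> List[List[Optional[float]]]:
    """Convert a disparity map (pixels) to a depth map (meters).

    Assumes a rectified pair with parallel optical axes, so Z = f * B / d.
    Pixels with non-positive disparity have no finite depth and map to None.
    """
    depth = []
    for row in disparity:
        depth.append([focal_px * baseline_m / d if d > 0 else None
                      for d in row])
    return depth


# Hypothetical camera: 800 px focal length, 0.125 m baseline.
disp = [[8.0, 16.0],
        [0.0, 32.0]]
depth = disparity_to_depth(disp, focal_px=800.0, baseline_m=0.125)
```

Under these assumptions, larger disparities correspond to closer points, and a pixel with zero disparity has no finite depth.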
Depth/range/disparity map estimation is a challenging problem. Local methods suffer from inaccurate depth determination, while global methods require significant computational resources and are unsuited for real-time applications.
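To make the notion of a local method concrete, the sketch below performs winner-take-all block matching on a single rectified scanline, using the sum of absolute differences (SAD) as the matching cost. SAD, the window size, and the name `scanline_disparity` are assumptions made for illustration; the text above does not name a specific local method.

```python
from typing import List


def scanline_disparity(left: List[float], right: List[float],
                       max_disp: int, half_win: int = 1) -> List[int]:
    """Winner-take-all disparity for each pixel of a rectified scanline.

    For each left pixel x, the window around x is compared with the window
    around x - d in the right row for d = 0..max_disp, and the d with the
    lowest sum of absolute differences is kept (ties go to the smallest d).
    """
    n = len(left)
    result = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0.0
            for k in range(-half_win, half_win + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp at the borders
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        result.append(best_d)
    return result


# Textured row: the left row is the right row shifted by two pixels.
right_row = [1, 2, 9, 5, 7, 3, 4, 6]
left_row = [1, 1, 1, 2, 9, 5, 7, 3]
textured = scanline_disparity(left_row, right_row, max_disp=3)

# Textureless row: every candidate disparity has the same (zero) cost.
flat = scanline_disparity([5] * 8, [5] * 8, max_disp=3)
```

In the textured interior the shift of two pixels is recovered, whereas on the constant row all candidates tie and the returned value is arbitrary. This is one way in which a purely local decision can yield inaccurate depth.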
Several methods avoid the depth map determination step by using stereo cues for region-of-interest generation. For example, one method determines a stixel map that marks the potential object locations. Each stixel is defined by a 3D position relative to the camera and stands vertically on a ground plane. A detector based on the color image content is then applied to the locations to detect objects.
U.S. Publication 20130177237 uses a range map to determine an area of interest, and uses a classifier based on an intensity histogram to detect objects.
Region-of-interest methods cannot be applied directly to object detection; they have to be applied in conjunction with other object detectors. In addition, a missed detection is certain when the area of interest does not cover the object.
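The dependence on coverage can be sketched as follows. The example assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and treats "cover" as full containment, which is a simplification; the names `covers` and `reachable` and the box coordinates are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def covers(roi: Box, obj: Box) -> bool:
    """Return True if the region of interest fully contains the object box."""
    return (roi[0] <= obj[0] and roi[1] <= obj[1]
            and roi[2] >= obj[2] and roi[3] >= obj[3])


def reachable(rois: List[Box], obj: Box) -> bool:
    """In a two-stage pipeline the second-stage detector runs only inside
    the regions of interest, so an object can be found only if at least
    one region covers it."""
    return any(covers(roi, obj) for roi in rois)


rois = [(0, 0, 100, 100), (200, 50, 300, 150)]
inside = reachable(rois, (10, 10, 60, 90))       # covered by the first region
outside = reachable(rois, (120, 20, 180, 80))    # covered by no region
```

However accurate the second-stage detector is, the object in the second case is never presented to it.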