(1) Field of Invention
The present invention relates to a system for reliably detecting obstacles and their ranges by combining two-dimensional and three-dimensional sensing and, more specifically, to such a system used to generate an accurate time-to-contact map for purposes of autonomous navigation.
(2) Description of Related Art
Obstacle detection and avoidance is a crucial task required to realize autonomous robots and autonomous navigation. Some systems utilize range sensors, such as LIDAR or RADAR sensors (see the List of Incorporated Literature References, Literature Reference No. 1), that have the ability to provide accurate estimation of looming obstacle collisions. Others have attempted to use smaller sensors, such as monocular cameras, to detect and avoid looming obstacles (see Literature Reference Nos. 2, 3, 4 and 5). Monocular cameras meet the low size, weight, and power (SWaP) requirements of autonomous systems; however, one of the main challenges with using monocular cameras is that a single camera frame by itself inherently cannot provide depth data for the scene. Thus, subsequent camera frames are typically used together to estimate the depth of the scene.
However, it is challenging to detect obstacles and estimate time-to-contact or time-to-collision (TTC) values reliably and rapidly from passive vision (optical flow, stereo, or structure from motion) due to inconsistent feature tracking, texture-less environments, limited working ranges, and/or the intensive computation required. Active range sensing can provide absolute, low-error distances to both far and near obstacles; however, these types of sensors (i.e., two-dimensional (2D) laser scanners, three-dimensional (3D) light detection and ranging (LIDAR) sensors, or red/green/blue/depth (RGB-D) cameras) are usually heavy/bulky, output sparse point clouds, operate at low frame rates, or work reliably only indoors.
Many techniques have been developed for obstacle detection and TTC estimation for autonomous navigation (and, more generally, for computer vision and robotics applications). For example, most monocular/optical-flow-based approaches require expensive computations and can produce an unacceptable number of false detections while providing only relative TTC. Stereo-based depth estimation is limited in working range (usually a shorter look-ahead) by the baseline length and performs very poorly in texture-less environments and on homogeneous surfaces. Structure from motion requires at least several frames taken from different viewpoints. Depth estimation by passive sensing (i.e., using cameras) inherently involves errors propagated from uncertainty in the pixel domain (mismatching, lack of features). On the other hand, active sensing by a laser scanner or a 3D LIDAR sensor can provide absolute, more accurate TTC or depth measurements than passive 2D imaging, but these types of sensing mostly require high SWaP (i.e., size, weight, and power) and produce sparse point clouds. Optimal fusion of 2D and 3D sensors has not been well exploited for high-speed navigation.
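As a point of reference for why optical-flow-based approaches yield only relative TTC, note that for pure camera translation toward a fronto-parallel surface the flow field expands radially as u = x/τ, v = y/τ, so TTC can be read directly from the flow divergence without ever recovering metric depth. The following minimal sketch illustrates this relationship; the function name and the assumption of a locally constant divergence are illustrative and not drawn from the cited references:

```python
def ttc_from_flow_divergence(du_dx, dv_dy):
    """TTC (in seconds) from local optical-flow divergence.

    For pure translation toward a fronto-parallel surface, the flow
    field is u = x / ttc, v = y / ttc, so its divergence
    du/dx + dv/dy equals 2 / ttc.  No metric depth is recovered,
    which is why such estimates remain relative, not absolute.
    """
    div = du_dx + dv_dy
    if div <= 0.0:
        return float("inf")  # receding or non-looming motion
    return 2.0 / div

# A measured divergence of 0.5 per second implies contact in 4 seconds.
print(ttc_from_flow_divergence(0.25, 0.25))
```

Estimating the flow gradients themselves is what makes these methods computationally expensive and fragile in texture-less scenes, as noted above.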
Existing TTC map (or depth map) estimation can be broken down by sensor modality. The most relevant for low-SWaP constraints is the use of a single passive sensor (a monocular camera). Methods based on scale change (see Literature Reference Nos. 5 and 6) are often very computationally expensive, as they rely on feature tracking and scale-change detection via methods such as template matching. These methods also provide only the relative depth of objects, as they must rely on image segmentation to (for example) distinguish foreground from background. The lack of absolute TTC and the slow processing rate make them unsuitable for maneuvers where a quick reaction must be achievable.
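To make the scale-change idea concrete: for an object approaching at constant closing speed, its apparent image size scales inversely with range, so a scale ratio s between two frames Δt apart gives TTC = Δt / (s − 1). A minimal sketch under these assumptions follows (function name and numbers are illustrative, not from the cited references):

```python
def ttc_from_scale_change(w_prev, w_curr, dt):
    """Estimate time-to-contact from apparent size growth.

    For an object approaching at constant closing speed, image width
    is inversely proportional to range, so the scale ratio
    s = w_curr / w_prev between frames dt apart gives
    TTC = dt / (s - 1), measured from the current frame.
    """
    s = w_curr / w_prev
    if s <= 1.0:
        return float("inf")  # not looming (receding or constant range)
    return dt / (s - 1.0)

# An object whose tracked width grows from 40 px to 50 px over 0.1 s
# (s = 1.25) yields TTC = 0.1 / 0.25 = 0.4 s.
print(ttc_from_scale_change(40.0, 50.0, 0.1))
```

The expensive part in practice is not this arithmetic but reliably measuring w_prev and w_curr, which is where the feature tracking and template matching costs noted above arise.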
More accurate depth maps can be obtained using learning methods (see Literature Reference Nos. 7 and 8). These methods operate on the low-level image domain (pixels and filters on features) and can provide a relative depth map quickly, but they do not generalize well to cluttered environments, as the learned templates used to classify the image may not cope well with unseen structures or objects.
One of the more popular methods for TTC estimation involves computation of optical flow (see Literature Reference Nos. 6, 8 and 9). However, estimating optical flow relies on motion parallax. This method often requires tracking feature motion between frames (consuming computation time) and fails for obstacles lying along the optical axis of the camera. Another popular method for building TTC maps uses stereo (see Literature Reference Nos. 10 and 11). Both of these methods quickly compute accurate depth maps, but their look-ahead range is constrained by the camera-pair baseline, and object perception is limited by the texture and homogeneity of the surface. If one desires to build more accurate TTC/depth maps using structure from motion (as is usually the case in stereo configurations), then one needs sufficiently many frames (30 or more) (see Literature Reference No. 2) to build a dense map in which an object can be identified well. Alternatively, real-time depth maps can be obtained (see Literature Reference No. 14) at the loss of point density, but such a technique is not suitable for accurate representation of an object.
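The baseline limitation mentioned above follows directly from the rectified-stereo depth relation Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels): since disparity cannot be resolved much below about one pixel, the usable look-ahead is capped near f·B. The sketch below illustrates this and the conversion from metric depth to absolute TTC; the function names and the numbers are illustrative only:

```python
def stereo_depth(f_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair: Z = f * B / d."""
    return f_px * baseline_m / disparity_px

def ttc_from_depth(depth_m, closing_speed_mps):
    """Absolute TTC once metric depth and closing speed are known."""
    return depth_m / closing_speed_mps

# f = 700 px, B = 0.1 m, d = 7 px -> Z = 10 m; closing at 5 m/s -> 2 s.
z = stereo_depth(700.0, 0.1, 7.0)
print(z, ttc_from_depth(z, 5.0))

# With a minimum resolvable disparity near 1 px, usable look-ahead is
# capped around f * B = 70 m, illustrating the baseline constraint.
print(stereo_depth(700.0, 0.1, 1.0))
```

Unlike the monocular estimates, this yields absolute depth, but only within the baseline-limited range and only on surfaces textured enough for disparity matching.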
There are also methods that attempt to fuse multiple sources of information, such as stereo and monocular cameras (see Literature Reference No. 9) or sonar, stereo, and a scanning laser range finder (see Literature Reference No. 12). While the depth-map accuracy improves significantly, the excessively high SWaP requirements of operating such systems limit the mission duration as well as the maneuverability of a robot or autonomous platform. Other methods (see Literature Reference No. 13) provide a depth map in which a robot can be accurately localized, but objects are sparsely represented, as structure from motion is the key method of registering auxiliary depth information.
While each of the state-of-the-art methods mentioned above works well in its own regard, none yet has the ability to achieve high-speed, agile exploration and navigation in cluttered environments under low-SWaP constraints. Thus, a continuing need exists for a system that provides fast and reliable obstacle detection and avoidance for autonomous navigation across a variety of different tasks.