Recently, consumer and commercial unmanned aerial vehicles (UAVs), or “drones,” a type of miniature pilotless aircraft, have gained tremendous popularity and commercial success worldwide. A UAV or drone is generally controlled by a remote controller and/or software and uses aerodynamic effects, e.g., those generated by multiple rotors, to maneuver through the air with very high stability and to perform various designed functions, such as surveillance and package delivery. One of the most popular applications of consumer UAVs or drones is aerial photography, i.e., taking still photographs or recording videos from a vantage point above the subject being photographed. The latest versions of consumer drones are generally lightweight and easy to control, so that they can be navigated safely by casual users. In addition to manual control, some high-end consumer drones are also equipped with obstacle detection and avoidance functionalities, which provide essential additional safety for UAV navigation. Such functionalities allow a drone to stop in front of an obstacle or, if necessary, to fly around the obstacle by automatically changing its flight path.
Many sensor-based obstacle-sensing techniques have been explored to detect and avoid obstacles for drones. Depending on the application, multiple sensors can be employed either independently or jointly to estimate the distance or depth of an obstacle in front of a flying drone. For example, time-of-flight sensors can provide accurate distance measurements up to 2 meters, while more expensive LIDAR sensors can detect obstacles at a range of more than 200 meters. Compared to these and other types of sensors, sensors that employ stereo vision techniques can achieve both a relatively long detection range (up to 15 meters) and a low production cost. As a result, drones equipped with stereo-vision-based obstacle avoidance features, e.g., the DJI Phantom 4, have gained great popularity.
A stereo vision system typically computes three-dimensional (3D) information based on pairs of images (also referred to as “stereo images”) captured by two cameras positioned at slightly different viewpoints. One of the key steps in estimating 3D information from the captured stereo images is “stereo matching.” The objective of stereo matching is to establish correspondences between pairs of points in a pair of stereo images. Based on the matched pairs of points, a disparity map can be computed. Once the disparity map is obtained, depth and 3D information can be quickly derived from the disparity map through triangulation. One effective stereo matching technique is semi-global block matching or “SGBM” (see Hirschmuller, “Accurate and efficient stereo processing by semi-global matching and mutual information,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 807-814), which has been shown to produce robust dense stereo matching results. However, the computational cost of SGBM is generally too high for embedded systems such as drones, which have limited computational resources. For example, some experiments have shown that a stereo matching operation using the SGBM technique on a pair of 640×480-pixel images can only run at about 4.3 frames per second (fps) on the RK3399, a six-core high-performance embedded platform that includes two ARM Cortex-A72 CPU cores. To speed up the SGBM-based stereo matching operation on embedded systems, a dedicated field programmable gate array (FPGA) can be employed to perform the SGBM operations in hardware. This approach has demonstrated a processing speed of 60 fps when computing dense disparity maps on 752×480-pixel images. However, using a dedicated FPGA in an embedded system inevitably increases the overall system cost.
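The triangulation step mentioned above can be illustrated with a minimal sketch. It assumes a rectified stereo pair, for which depth is commonly computed as Z = f × B / d, where f is the focal length in pixels, B is the baseline between the two cameras in meters, and d is the disparity in pixels. The function names and the numeric values below are hypothetical and chosen only for illustration; they are not taken from any particular drone or library.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for one pixel of a rectified stereo pair.

    Returns None when the disparity is not positive, i.e., when no
    valid match was found or the point is effectively at infinity.
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def disparity_map_to_depth_map(disparity_map, focal_length_px, baseline_m):
    """Apply the triangulation formula to every entry of a 2D disparity map."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


# Hypothetical camera parameters: 700-pixel focal length, 10 cm baseline.
example_disparities = [[35.0, 70.0], [0.0, 7.0]]
example_depths = disparity_map_to_depth_map(example_disparities, 700.0, 0.1)
```

Because depth is inversely proportional to disparity, a one-pixel matching error has a much larger effect on distant obstacles than on near ones, which is one reason the achievable detection range of a stereo system is bounded.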
In contrast to the techniques aimed at generating dense disparity maps at high hardware cost, a high-efficiency “pushbroom stereo” technique was proposed for high-speed autonomous obstacle detection and avoidance (see Barry et al., “High-speed autonomous obstacle avoidance with pushbroom stereo,” Journal of Field Robotics, vol. 35, no. 1, pp. 52-68, 2018). Instead of performing dense stereo matching for each image frame, the pushbroom stereo technique only searches for stereo correspondences at a single depth value (i.e., at a fixed distance). The depth information missing at distances other than the currently searched value is then recovered by integrating the drone's odometry data with previously determined single-disparity results. The pushbroom stereo results are then combined with a model-based control system to enable high-speed flight (10-14 m/s) in natural environments while automatically avoiding obstacles such as trees. Unfortunately, the pushbroom stereo technique sacrifices a degree of reliability in exchange for its high speed. Without a dense disparity map, the technique may function relatively well in some simple obstacle environments, but not in many of the more complex obstacle environments.
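The single-disparity search described above can be sketched as follows. This is a simplified illustration and not the implementation of Barry et al.: it only checks, block by block, whether the left image matches the right image at one fixed disparity using a sum of absolute differences (SAD), and it omits the texture filtering and odometry integration that a practical system would need. The function names, block size, and threshold are hypothetical.

```python
def block_sad(left, right, row, col, disparity, block):
    """Sum of absolute differences between a left-image block and the
    right-image block shifted left by `disparity` pixels."""
    total = 0
    for r in range(row, row + block):
        for c in range(col, col + block):
            total += abs(left[r][c] - right[r][c - disparity])
    return total


def single_disparity_hits(left, right, disparity, block=2, max_sad=4):
    """Return the top-left (row, col) of each left-image block that
    matches the right image at the single given disparity.

    Images are 2D lists of grayscale values from a rectified pair.
    """
    height, width = len(left), len(left[0])
    hits = []
    for row in range(0, height - block + 1, block):
        # Start at `disparity` so that col - disparity stays inside the image.
        for col in range(disparity, width - block + 1, block):
            if block_sad(left, right, row, col, disparity, block) <= max_sad:
                hits.append((row, col))
    return hits


# Synthetic example: the left image is the right image shifted by 3 pixels.
right_image = [[10 * c for c in range(8)] for _ in range(4)]
left_image = [[right_image[r][c - 3] if c >= 3 else 0 for c in range(8)]
              for r in range(4)]
hits_at_3 = single_disparity_hits(left_image, right_image, 3)
hits_at_1 = single_disparity_hits(left_image, right_image, 1)
```

Testing one disparity per block instead of a full disparity range is what makes this kind of search inexpensive, but it also means that obstacles at any other distance are invisible in the current frame, and that uniform, textureless regions would match at every disparity unless they are filtered out beforehand.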
Hence, there is a need for a relatively low-complexity, low-cost and yet relatively high-speed, high-reliability obstacle detection and avoidance system and technique without the problems described above.