Practical considerations dictate that robotic vehicles be operable in both teleoperated and semi-autonomous modes. In the teleoperated mode, stereo cameras on board the vehicle may provide three-dimensional scene information to human operators via stereographic displays. In the semi-autonomous mode, three-dimensional information is also required for automatic obstacle avoidance and must be provided by onboard rangefinders.
Automatic stereo triangulation, or "stereo vision," is a very attractive approach to onboard rangefinding, in part because the necessary video hardware is already required for teleoperation and in part because stereo has a number of potential advantages over other rangefinding technologies. In particular, stereo is passive, nonscanning, and nonmechanical, and it uses very little power.
The practicality of stereo vision has been limited by the slow speed of existing systems and a lack of consensus on basic paradigms for approaching the stereo problem. The present invention uses a synthesis of approaches based on area correlation, random field modeling, and compact, commercial hardware to produce a stereo system that gives range images from 60×64 stereo pairs at rates of up to 2 seconds per frame.
Previous stereo vision work has been grouped into categories according to which geometric model of the world was employed, which optimization (i.e. search) algorithms were employed for matching, and which constraints were imposed to enhance the reliability of the stereo matching process. Primary approaches to geometry have been to use either feature-based or field-based world models.
Feature-based approaches typically extract two-dimensional points or line segments from each image, match these, and output the parameters of the corresponding three-dimensional primitives. Field-based models consist of discrete raster representations, in particular the "disparity field" specifying the stereo disparity at each pixel in the image.
Field-based approaches typically perform matching by area correlation. A wide variety of search algorithms have been used, including dynamic programming, gradient descent, simulated annealing, and deterministic, iterative "local support" methods.
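As an illustration of the field-based idea, the following is a minimal sketch of area correlation that produces a disparity field by exhaustive search. It is not the method of the present invention: the sum of absolute differences is assumed as the match cost, the images are assumed to be rectified so that matches lie on the same row, and the names `sad`, `disparity_field`, `max_disp`, and `half` are hypothetical.

```python
def sad(left, right, row, col, d, half):
    """Sum of absolute differences between the window centered at
    (row, col) in the left image and the window shifted d pixels to
    the left in the right image (assumed cost, chosen for simplicity)."""
    total = 0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            total += abs(left[r][c] - right[r][c - d])
    return total


def disparity_field(left, right, max_disp, half=1):
    """Return a raster holding, for each interior pixel, the disparity
    in 0..max_disp with the lowest window cost. Pixels whose windows
    would fall outside either image are left as None."""
    rows, cols = len(left), len(left[0])
    field = [[None] * cols for _ in range(rows)]
    for row in range(half, rows - half):
        for col in range(half + max_disp, cols - half):
            costs = [sad(left, right, row, col, d, half)
                     for d in range(max_disp + 1)]
            field[row][col] = costs.index(min(costs))
    return field
```

The cost of this brute-force search grows with image area, window area, and disparity range, which is one reason the choice of search algorithm and constraints matters so much for speed.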
Coarse-to-fine search techniques using image pyramids can be combined with most of these methods to greatly improve their efficiency. Finally, many sources of search constraint have been used to reduce the likelihood of false matches, including multispectral images, surface smoothness models, and redundant images, as in trinocular stereo or motion-based bootstrap strategies.
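The efficiency gain from coarse-to-fine search can be sketched on a single pair of scanlines. This is an illustrative example only, under stated assumptions: the pyramid is built by averaging adjacent pixel pairs, the match cost is a sum of absolute differences, and all function and parameter names are hypothetical. The full disparity range is searched only at the coarsest level; each finer level examines just three candidates around twice the coarse estimate.

```python
def downsample(line):
    """Halve a scanline by averaging adjacent pixel pairs."""
    return [(line[i] + line[i + 1]) / 2 for i in range(0, len(line) - 1, 2)]


def window_cost(left, right, col, d, half):
    """Sum of absolute differences for a 1-D window; infinite if the
    window or its shifted counterpart leaves the scanline."""
    total = 0
    for c in range(col - half, col + half + 1):
        if c < 0 or c >= len(left) or c - d < 0 or c - d >= len(right):
            return float("inf")
        total += abs(left[c] - right[c - d])
    return total


def match(left, right, candidates_for, half):
    """Pick, per column, the lowest-cost disparity among its candidates."""
    disp = []
    for col in range(len(left)):
        cands = candidates_for(col)
        costs = [window_cost(left, right, col, d, half) for d in cands]
        disp.append(cands[costs.index(min(costs))])
    return disp


def coarse_to_fine(left, right, max_disp, levels, half=1):
    """Estimate per-column disparity using `levels` pyramid reductions."""
    if levels == 0:
        full = list(range(max_disp + 1))
        return match(left, right, lambda col: full, half)
    coarse = coarse_to_fine(downsample(left), downsample(right),
                            (max_disp + 1) // 2, levels - 1, half)

    def near(col):
        # Double the coarse estimate and search only its neighbours.
        guess = 2 * coarse[min(col // 2, len(coarse) - 1)]
        return [d for d in (guess - 1, guess, guess + 1) if 0 <= d <= max_disp]

    return match(left, right, near, half)
```

Estimates near the ends of the scanline are unreliable in this sketch because the windows there cannot be fully evaluated; a practical system would mask or otherwise handle such borders.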
The application of statistical modeling and estimation methods has been growing in both feature-based and field-based approaches. The use of surface smoothness models, which is known to be effective in practice, fits into the statistical framework through its relationship to prior probabilities in Bayesian estimation. The power of coarse-to-fine search, redundant images, and "active" or "exploratory" sensing methods is well known.
A basic issue is the question of which type of feature- or field-based model might provide the most general approach to stereo vision. The roots of stereo vision lie in the use of area correlation for aerial triangulation. In the machine vision community of the 1970s and 1980s, correlation was believed by many to be too slow or to be inappropriate for other reasons, so methods based on edges or other types of features became popular. However, feature-based methods also have limitations due to feature instability and the sparseness of estimated range images. The present invention shows that correlation methods can be fast, computationally inexpensive, and potentially useful in many contexts.
Another important issue is which combination or combinations of search algorithms and constraints provide the most efficient and reliable performance. Powerful global search algorithms such as simulated annealing and three-dimensional dynamic programming may give accurate results, but they are very expensive computationally. Analogously, using multispectral or redundant images provides more information, but increases the hardware and computational cost of a system. It is likely that comparatively simple methods will lead to fast and usually reliable performance, as shown in the paper "Practical Real-Time Imaging Stereo Matcher," by H. K. Nishihara, published in the September/October 1984 issue of Optical Engineering, volume 23, number 5.
The question then arises whether there are inexpensive performance metrics that can be used to determine when matching is not reliable, and whether such metrics can be used to control switching between simple, fast procedures when these work and more powerful, expensive procedures when scene characteristics make them necessary.
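To make the question concrete, the sketch below shows one hypothetical inexpensive metric: how distinct the best match cost is from the best rival cost found elsewhere in the disparity range. The metric, the threshold value, and the names `match_confidence` and `choose_matcher` are assumptions introduced purely for illustration; the text above does not prescribe any particular metric.

```python
def match_confidence(costs):
    """Return a score in [0, 1] for a list of match costs indexed by
    disparity. The score is high when the lowest cost is much smaller
    than the lowest cost at any non-adjacent disparity (assumed metric)."""
    best = min(range(len(costs)), key=costs.__getitem__)
    # Ignore immediate neighbours of the minimum, which are usually
    # similar to it even for an unambiguous match.
    rivals = [c for d, c in enumerate(costs) if abs(d - best) > 1]
    if not rivals:
        return 0.0
    runner_up = min(rivals)
    if runner_up == 0:
        return 0.0
    return 1.0 - costs[best] / runner_up


def choose_matcher(costs, threshold=0.5):
    """Select the simple, fast procedure when the match looks reliable
    and the more powerful, expensive procedure otherwise."""
    return "fast" if match_confidence(costs) >= threshold else "expensive"
```

A cost curve with a single sharp minimum selects the fast procedure, while a curve with two comparable minima, as might arise from repetitive texture, selects the expensive one.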
U.S. Pat. No. 4,905,081 to Morton discloses a method and apparatus for transmitting and receiving three-dimensional video pictures. Transmission of video pictures containing depth information is achieved by taking video signals from two sources, showing different representations of the same scene, and correlating them to determine a plurality of peak correlation values which correspond to vectors representing depth information. The first video signal is divided into elementary areas, and each area is tested, pixel by pixel, with each vector to see which vector gives the best fit in deriving the second video signal from the first. The vectors which give the best fit are then assigned to their respective areas of the picture and constitute difference information. The first video signal and the assigned vectors are then transmitted in parallel. The first video signal can be received as a monoscopic picture, or alternatively the vectors can be used to modify the first signal to form a display containing depth information.
As mentioned in the patent to Morton, the method can be used as a remote sensing technique for use with robots in hazardous environments. Such robots often use stereoscopic television to relay a view of their surroundings to an operator, and the technique described could be used to derive and display the distance of an object from a robot to avoid the need for a separate rangefinder. For autonomous operation of the robot, however, information concerning the distance to a hazardous object in the environment of the robot must be available in near real-time.
The slow speed of prior-art stereo vision systems has posed a major hurdle in the performance of semi-autonomous robotic vehicles. Semi-autonomy in combination with teleoperation is desired for many tasks involving remote or hazardous operations, such as planetary exploration, waste cleanup, and national security. A major need has been a computationally inexpensive method for computing range images in near real time by cross-correlating stereo images.