1. Field of the Invention
This invention relates generally to an apparatus and method for real-time, adaptive, self-calibration of a visual time-to-contact sensor, mounted on a manned or mobile vehicle, to predict the time-to-contact with stationary and moving obstacles in a moving vehicle's immediate path. More particularly, the invention relates to an apparatus and method for a self-calibrating visual time-to-contact sensor, actively centering the focus of expansion (FOE) in the image sequence, using optical flow, and continuously calibrating the pan and tilt of the sensor to point in the direction of the FOE, in order to accurately predict the time-to-contact with stationary and moving obstacles in a moving vehicle's immediate path, and enable warning or evasive action.
2. Description of the Related Prior Art
The present invention relates to a real-time navigation and obstacle avoidance vision system for a manned or autonomous moving vehicle and, in particular, to a vision system which includes an apparatus and a method for camera alignment along the direction of motion in order to control the vehicle's movement, enabling warning or evasive action, and with the ability to accurately predict the time-to-contact with stationary and moving obstacles in a moving vehicle's immediate path.
In the case of an autonomous vehicle, the vehicle typically includes some type of sensor system for sensing and detecting obstacles within the path of the robot so that the appropriate action may be taken. This action may include altering the path of the robot in order to get around the obstacle. Systems employing ultrasonic detectors, mechanical contact devices and laser ranging apparatus are known in the art. Other systems, which include a camera to observe the environment and a passive image processing system, are also known.
The field of computer vision includes the computer analysis of scenes projected into an electronic camera. The camera generates images of the scenes, and a computer analyzes these images and draws useful conclusions. Real-time operation is difficult because the mathematical relationships involved are sophisticated and complex, and evaluating them requires additional processing time. Traditional passive scene analysis vision systems require large amounts of computing power, are relatively slow, and often yield erroneous results. Typically, the interpretation of data is too slow to be useful for real-time navigation and may result in navigation errors.
Conventionally, various image control devices have been proposed in order to automatically move a vehicle or a robot in an autonomous manner. They include digitizing images from a forward looking camera mounted on the vehicle body and changing a course according to the received image. However, control of a robot course while the body is moving is very difficult and requires evaluation of a plurality of images obtained from the camera, wherein each set of data is obtained at a different time and position.
A serious shortcoming of almost all known computer vision robotic and navigation systems, especially with regard to low cost commercial systems, is that some sort of pre-known relationship, correlation, or calibration must be made between the major components of the system and the environment in which the robot operates. One example is U.S. Pat. No. 4,789,940 to Christian, entitled "Method and Apparatus For Filtering Reflections From Direct Images For Mobile Robot Navigation". In this patent, a camera views an area and keeps track of mobile robots so that their movement can be controlled by a control system. A limitation of this system is that the area of movement and the field of view of the camera must be carefully pre-calibrated, since the camera is always in a predetermined known position. Movement of robots is directed in the two dimensional plane of the surface, which precludes its use with wheeled or "nonholonomic" robots.
In general, self-calibrating visual time-to-contact sensors have been developed for use in autonomously moving unmanned systems, including ground and air vehicles, and in automotive and robot applications. These systems therefore need the ability to accurately and quickly predict the time-to-contact with stationary and moving obstacles in the vehicle's immediate path, enabling evasive action. They also have to be compact in size, with a low power hardware implementation and low cost.
Conventionally, numerous calibration techniques are used, and they fall into two groups: one-time, high-accuracy parameter calculation methods, and task-oriented, coarse, inexact calibration methods. The high-accuracy methods use algorithms for computing the focus of expansion (FOE) of a sensor pointed in the direction of motion of a vehicle (Ballard, D. H. and C. M. Brown, "Computer Vision," Prentice Hall Inc., 1982; Brooks, R. A., A. M. Flynn, and T. Marill, "Self-calibration of Motion and Stereo Vision for Mobile Robot Navigation," MIT AI Lab, AI Memo 984, August, 1987; Negahdaripour, S. and B. K. P. Horn, "A Direct Method for Locating the Focus of Expansion," MIT AI Lab, AI Memo 939, January, 1987; Tsai, R. Y. "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses," IEEE JRA, RA-3(4): 323-344, 1987). These methods often rely on accurate computation of the optical flow, and in some cases use specific assumptions about the environment or sensor parameters to achieve better accuracy. The calibration of the equipment is accomplished only once, at the beginning.
For example, Tsai's work, supra, describes camera calibration techniques for computing both intrinsic and extrinsic parameters. Intrinsic parameters are those that are particular to a specific camera and lens, while extrinsic parameters relate the camera to other world coordinate systems. The process is divided into two stages for computation of six extrinsic parameters (camera yaw, pitch, roll, and three translation parameters), followed by computation of six intrinsic parameters (focal length, two lens distortion parameters, camera scanning and acquisition scale factor, and image center). While this and many similar methods can compute the parameters quickly, they depend on detailed real world knowledge, accurately known correspondences, and object recognition capabilities. This renders them impractical for mobile platforms where vibrations, sensor drift, noise and environmental variability have a significant impact.
More recently, coarse calibration techniques have been used with mobile robot platforms interacting in complex environments. Brooks et al., supra, describe a method by which forward motion vision is used to calibrate stereo vision, a type of bootstrapping calibration. Their calibration occurs in a velocity-dependent coordinate system that is more natural for obstacle avoidance maneuvers. Similar methods, which require minimal or coarse calibration, have been developed for many active vision algorithms, especially those that do not require accurate quantitative information, termed "inexact vision". There are many advantages to these inexact approaches to calibration for visual navigation, including continuous calibration updates, simplicity, quick computation, adaptation to changing sensor and world conditions, and specialization for a particular robot's tasks.
The time-to-contact is defined as the time required for an object to travel from its current position to the surface directly in front of the object, in the direction of motion of the object. The calculation of time-to-contact relies directly on the computed location of a focus of expansion (FOE) in a sequence of images of the environment. The FOE in an image is equivalent to the projection into the image plane of the three dimensional vector in the direction of motion of the object. Traditional methods exist for the computation of the FOE and are typically derived from the two-dimensional optical flow, although other direct methods exist. Optical flow is the apparent or perceived variation or motion of the image brightness patterns, in the image plane, arising from relative motion of objects and an observing system. By computing spatial and temporal derivatives of the changing image brightness pattern it is possible to obtain estimates of the optical flow.
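By way of illustration only, the relationship between optical flow, the FOE, and the time-to-contact can be sketched as follows. The sketch assumes pure translational motion and a flow field that has already been computed; under that assumption every flow vector points directly away from the FOE, so the FOE can be fitted by least squares, and the time-to-contact at an image point is its distance from the FOE divided by its flow magnitude. The function names estimate_foe and time_to_contact are hypothetical and do not come from any of the cited references.

```python
import math


def estimate_foe(points, flows):
    """Least-squares FOE estimate from flow vectors (assumes pure translation).

    Each flow (u, v) at image point (x, y) is assumed to be collinear with
    the ray from the FOE (x0, y0) to that point, which gives one linear
    constraint per vector:  v*x0 - u*y0 = v*x - u*y.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        c = v * x - u * y
        # Accumulate the 2x2 normal equations for rows (v, -u).
        a11 += v * v
        a12 += -u * v
        a22 += u * u
        b1 += v * c
        b2 += -u * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("flow field is degenerate; FOE is not determined")
    x0 = (b1 * a22 - a12 * b2) / det
    y0 = (a11 * b2 - a12 * b1) / det
    return x0, y0


def time_to_contact(point, flow, foe):
    """Distance from the FOE divided by flow magnitude.

    The result is in frames when the flow is expressed in pixels per frame.
    """
    r = math.hypot(point[0] - foe[0], point[1] - foe[1])
    speed = math.hypot(flow[0], flow[1])
    if speed == 0.0:
        raise ValueError("zero flow; time-to-contact is undefined")
    return r / speed


if __name__ == "__main__":
    # Synthetic expanding flow field with a made-up FOE and time-to-contact.
    true_foe, true_tau = (12.0, -5.0), 4.0
    pts = [(0.0, 0.0), (50.0, 10.0), (-30.0, 40.0), (20.0, -60.0)]
    flw = [((x - true_foe[0]) / true_tau, (y - true_foe[1]) / true_tau)
           for x, y in pts]
    foe = estimate_foe(pts, flw)
    print(foe, time_to_contact(pts[1], flw[1], foe))
```

Because the time-to-contact divides by a distance measured from the FOE, any error in the FOE location carries straight into the predicted time.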
In theory, once a conventional calibration method has computed the FOE, the sensor can be pointed in the direction of motion of the vehicle in one step. However, these methods rely on accurate local information in each computed optical flow field in order to locate the FOE, and do not set forth a robust strategy. Moreover, when the FOE of an image sequence is outside of the field of view of the sensor, computation of the FOE, and thus of the time-to-contact, is extremely error prone and inaccurate. No previously known algorithms can cope with this problem, since doing so requires actively repositioning the sensor to bring the FOE within its field of view, where it can be located more accurately. Further, when the FOE is located within the field of view of the sensor, but not centered in the image plane, computation of the time-to-contact may also be inaccurate, since there is little data available at the edges of the visible portion of the image. By actively centering the FOE in the image sequence, better accuracy is possible.
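The idea of actively centering the FOE can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the method of any cited reference: it assumes a pinhole sensor with a focal length given in pixels, treats pan and tilt as independent axes (a small-angle simplification), and applies a proportional correction with a hypothetical gain parameter rather than a single full-size step. All function names and numeric values are illustrative.

```python
import math


def foe_offset_to_angles(foe, focal_px):
    """Convert an FOE offset from the image center into pan/tilt errors (radians)."""
    x0, y0 = foe
    return math.atan2(x0, focal_px), math.atan2(y0, focal_px)


def centering_step(pan, tilt, foe, focal_px, gain=0.5):
    """Rotate the sensor part of the way toward the measured FOE."""
    d_pan, d_tilt = foe_offset_to_angles(foe, focal_px)
    return pan + gain * d_pan, tilt + gain * d_tilt


def simulated_foe(heading, pan, tilt, focal_px):
    """Stand-in for a real FOE measurement.

    Assumption: pan and tilt are decoupled, so the FOE lands where the
    heading direction projects through a pinhole model on each axis.
    """
    return (focal_px * math.tan(heading[0] - pan),
            focal_px * math.tan(heading[1] - tilt))


def center_foe(heading, focal_px=500.0, gain=0.5, tol_px=0.5, max_steps=50):
    """Repeat small corrections until the FOE is within tol_px of the center."""
    pan = tilt = 0.0
    for step in range(max_steps):
        foe = simulated_foe(heading, pan, tilt, focal_px)
        if math.hypot(foe[0], foe[1]) < tol_px:
            return pan, tilt, step
        pan, tilt = centering_step(pan, tilt, foe, focal_px, gain)
    return pan, tilt, max_steps


if __name__ == "__main__":
    # Made-up misalignment: motion is 10 degrees right of and 4 degrees below
    # the initial optical axis.
    heading = (math.radians(10.0), math.radians(-4.0))
    print(center_foe(heading))
```

A gain below one trades convergence speed for tolerance to noisy FOE estimates, since no single measurement is trusted enough to drive a full correction.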