1. Field of the Invention
This invention relates to the use of an imaging system to determine the time remaining before two objects or surfaces touch. The image sensor may, for example, be mounted on a manually, remotely or automatically operated land, sea or air vehicle, or on a kinematic chain or robotic arm used in automated assembly or other factory automation.
2. Description of Related Art
Active systems using sonar, lasers, and radar are known in the art for estimating the distance or range to an object or to a point on an object. At times such systems can also be used to estimate velocity by differentiating the range, or by using the Doppler effect. The ratio of distance to velocity could then be used to estimate the time to contact. Such systems actively emit some form of radiation and detect radiation reflected from an object. The distance is estimated from the time delay between emitted and received radiation, the phase shift between emitted and received radiation or by a triangulation method where the radiation sensor is angularly separated from the radiation source.
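As a minimal sketch of the ratio just described, the following Python fragment estimates the closing speed by differencing two successive range readings and divides the current range by that speed. The function names, the units (metres and seconds), and the convention of returning infinity when the objects are not approaching are assumptions made for illustration only; they are not taken from any particular active ranging system.

```python
def closing_speed_from_ranges(range_prev, range_curr, dt):
    """Closing speed (m/s) from two range readings taken dt seconds apart.

    Positive when the range is shrinking, i.e. the objects are approaching.
    """
    return (range_prev - range_curr) / dt


def time_to_contact(range_curr, closing_speed):
    """Time to contact (s) as the ratio of distance to closing speed.

    Returns infinity when the objects are not approaching (an assumed
    convention for this sketch).
    """
    if closing_speed <= 0:
        return float("inf")
    return range_curr / closing_speed


# Example: range drops from 32 m to 30 m over half a second.
speed = closing_speed_from_ranges(32.0, 30.0, 0.5)
ttc = time_to_contact(30.0, speed)
```

Differentiating a noisy range signal amplifies the noise, which is one reason a practical system of this kind might smooth the range readings or use a Doppler measurement instead.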
Such “active” systems can be complex, expensive, subject to interference and limited in spatial resolution, or limited in application because of the need to actively emit radiation and the requirement that the object reflect the radiation. So-called “passive” systems instead do not emit radiation and work with natural illumination and reflection or emission of radiation from the object.
Passive optical systems include ones that estimate the distance using two imaging systems and a binocular stereo matching method. The time to contact could then be calculated if the velocity was known. Binocular stereo systems are expensive, require careful calibration, and make heavy computational demands because of the stereo matching operation (Lucas & Kanade 1981).
FIG. 16 illustrates the case in which a subject vehicle 10 is approaching another vehicle 20 traveling across the road on which subject vehicle 10 proceeds. As schematically shown in FIG. 16, the time to contact (TTC) could also be estimated by determining the image motion of certain characteristic “features” such as grey-level corners. Such features may also be edges, points or small areas identified by so-called “interest operators”. However, “feature tracking” methods may estimate the TTC unreliably when the feature extraction or the subsequent calculation contains errors, for example when feature points are erroneously matched with each other, or when feature points are erroneously extracted or missed so that the matching between feature points fails (schematically shown in FIG. 17). In addition, because the extraction of the feature points requires a significant amount of calculation, it is difficult to realize a high-speed estimation of the TTC. As a result, the “feature tracking” methods have not been used successfully to estimate the TTC.
Time to contact could be estimated by a two-stage method based on “optical flow”. “Optical flow” is a vector field that specifies the apparent motion of the image at each point in the image (Horn & Schunck 1981). The motion of the imaging system relative to the object could be estimated from the optical flow (Bruss & Horn 1983). The time to collision in turn could then be estimated from the relative motion if the distance to the object were known.
Methods for estimating optical flow have been known since the 1980s (Horn & Schunck 1981). The derivatives of image brightness in the spatial and time dimensions constrain the velocity components of the optical flow. Methods based on the spatial and temporal derivatives of image brightness are also known as “gradient-based” methods. Optical flow methods suffer from high computational cost, however, since the optical flow is usually estimated by numerically solving a pair of second order partial differential equations for the two components of velocity.
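The constraint and the resulting equations can be written out as follows. This is a sketch in the notation commonly used for the Horn & Schunck formulation; the symbols E (image brightness), u and v (the two flow components), and α (a smoothness weight) are introduced here for illustration and do not appear elsewhere in this description.

```latex
% Brightness change constraint: one linear equation in the two unknowns (u, v)
E_x u + E_y v + E_t = 0

% Adding a smoothness term and minimizing over the image
\iint \left[ (E_x u + E_y v + E_t)^2
      + \alpha^2 \left( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \right) \right] dx\, dy

% yields a coupled pair of second order partial differential equations
E_x \,(E_x u + E_y v + E_t) = \alpha^2 \nabla^2 u
E_y \,(E_x u + E_y v + E_t) = \alpha^2 \nabla^2 v
```

Solving this pair numerically at every picture cell, typically by iteration, is the source of the computational cost noted above.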
Optical flow can be estimated in a number of alternate ways, particularly if the result is only needed on a coarse grid or sparse set of points. One such alternative is a least-squares method based on the assumption that the optical flow velocity is constant or fixed within subimage blocks into which the full image is divided (sec. 4.3 in Horn 1988). Such a “fixed flow” method, while computationally less intensive than the full optical flow method above, suffers from limited resolution due to the assumption that the flow is constant within each block, and requires some means to determine what image block size presents a favorable compromise between resolution and accuracy.
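A “fixed flow” estimate for a single block can be sketched as below. The sketch assumes the block is given as two small grey-level arrays (lists of rows) from consecutive frames, uses simple forward differences for the derivatives, and solves the 2 by 2 least-squares normal equations in closed form. The function name `fixed_flow`, the difference scheme, and the determinant threshold are illustrative assumptions, not details taken from the cited reference.

```python
def fixed_flow(frame0, frame1, eps=1e-12):
    """Least-squares estimate of one flow vector (u, v) for a whole block.

    Minimizes the sum over the block of (Ex*u + Ey*v + Et)**2, where Ex, Ey
    and Et are forward-difference estimates of the brightness derivatives.
    Returns None when the normal equations are (nearly) singular, e.g. when
    all brightness gradients in the block are parallel.
    """
    rows, cols = len(frame0), len(frame0[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(rows - 1):
        for x in range(cols - 1):
            ex = frame0[y][x + 1] - frame0[y][x]   # spatial derivative in x
            ey = frame0[y + 1][x] - frame0[y][x]   # spatial derivative in y
            et = frame1[y][x] - frame0[y][x]       # temporal derivative
            sxx += ex * ex
            sxy += ex * ey
            syy += ey * ey
            sxt += ex * et
            syt += ey * et
    # Normal equations: [sxx sxy; sxy syy] [u; v] = [-sxt; -syt]
    det = sxx * syy - sxy * sxy
    if abs(det) < eps:
        return None
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Example: a bilinear pattern x*y shifted by a quarter pixel in x.
block0 = [[x * y for x in range(6)] for y in range(6)]
block1 = [[(x - 0.25) * y for x in range(6)] for y in range(6)]
flow = fixed_flow(block0, block1)
```

A larger block averages over more picture cells and so reduces noise, but the single (u, v) it returns then describes a larger image region, which is the resolution and accuracy compromise described above.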
Some optical mice estimate the lateral motion of the mouse over a surface by estimating the “fixed flow” using spatial and temporal derivatives of image brightness. See for example patents U.S. Pat. No. 5,793,357, U.S. Pat. No. 6,084,574, and U.S. Pat. No. 6,124,587.
Optical flow can be estimated instead by finding the shift of each block of an image frame relative to the previous frame that best brings it into alignment. The measure of how well image blocks match may be, for example, correlation, normalized correlation, sum of absolute values of differences, or sum of squares of differences (see e.g., Tanner & Mead 1984). These methods share the disadvantage of low resolution with the previously mentioned method, and in addition are computationally expensive because of the need to search for the shift that yields the best match.
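The search described above can be sketched as an exhaustive comparison over integer shifts, here using the sum of squares of differences as the match measure. The function name `best_shift`, its parameters, and the restriction to whole-pixel shifts are assumptions made for this illustration.

```python
def best_shift(prev, curr, top, left, size, max_shift):
    """Find the integer shift (dx, dy) of a block that best aligns two frames.

    The block is the size-by-size square of `prev` whose upper-left corner is
    at row `top`, column `left`. Every shift within +/- max_shift is tried and
    the one with the smallest sum of squared differences (SSD) is returned
    as (dx, dy, ssd). Shifts that would leave the image are skipped.
    """
    rows, cols = len(curr), len(curr[0])
    if top < 0 or left < 0 or top + size > rows or left + size > cols:
        raise ValueError("block must lie inside the image")
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + size > rows or c0 + size > cols:
                continue
            ssd = 0
            for y in range(size):
                for x in range(size):
                    d = curr[r0 + y][c0 + x] - prev[top + y][left + x]
                    ssd += d * d
            if best is None or ssd < best[0]:
                best = (ssd, dx, dy)
    return best[1], best[2], best[0]


# Example: a synthetic pattern moved 2 pixels right and 1 pixel down.
def pattern(x, y):
    return x * x + 3 * y * y + x * y

prev_frame = [[pattern(x, y) for x in range(10)] for y in range(10)]
curr_frame = [[pattern(x - 2, y - 1) if x >= 2 and y >= 1 else 0
               for x in range(10)] for y in range(10)]
shift = best_shift(prev_frame, curr_frame, top=3, left=3, size=3, max_shift=2)
```

The nested loops make the cost explicit: the number of comparisons grows with the square of the search range times the block area, for every block in the image.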
All optical flow based methods are subject to the so-called “aperture effect” which limits how accurately the optical flow can be estimated. The “aperture effect” is due to the inability of local image information to constrain the component of optical flow in the direction of the isophotes (i.e., perpendicular to the local brightness gradient).
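The effect can be seen by splitting the flow into components along and across the brightness gradient. The sketch below assumes the usual notation, with E for image brightness, (u, v) for the flow, and subscripts for partial derivatives; these symbols are introduced here for illustration.

```latex
% Local brightness information supplies a single equation:
(u, v) \cdot \nabla E = -E_t , \qquad \nabla E = (E_x, E_y)

% so only the flow component along the gradient is determined:
\frac{(u, v) \cdot \nabla E}{\lVert \nabla E \rVert} = \frac{-E_t}{\lVert \nabla E \rVert}

% while the component along the isophote, in the direction (-E_y, E_x),
% can be changed freely without violating the equation.
```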
Other methods proposed for estimating the “time to collision” attempt to simulate some assumed neural structure of biological vision systems (Galbraith et al. 2005). These tend to be computationally extremely demanding, have low accuracy, and produce results only after a significant delay. Naturally, a collision warning system is of little practical use if it produces a warning only when it is too late to take evasive action.
In estimating rigid body motion, so-called “direct methods” have been proposed which avoid the two-stage approach by bypassing the estimation of the optical flow, instead working directly with spatial and temporal derivatives of image brightness. Methods for estimating rotation of a camera in a fixed environment, as well as methods for estimating translation of a camera in a fixed environment, have been described. See for example Horn & Negahdaripour 1987 and Horn & Weldon 1988, which are hereby incorporated by reference. Such methods have, however, not previously been applied to the problem of determining the time to contact or time to collision.
For a method for determining time to contact to be of practical interest, a low-end general-purpose computer, or a cheap special-purpose circuit (such as a DSP, FPGA, PLD, or ASIC), must be able to perform the required computation. Furthermore, the result of the computation needs to be available almost immediately, for example, within one frame time of the image sequence being analyzed. This argues against extensive pipelining of the computation, which increases latency.