The present invention relates generally to methods and apparatus for determining the distances between surface patches of three-dimensional spatial scenes (hereinafter "3-D scene") and a camera system. More particularly, the present invention relates to methods of and apparatus for determining such distances on the basis of at least a pair of two-dimensional images (hereinafter "images") of the 3-D scene, each of which has been recorded with a camera system having a different set of camera parameter values, and upon which two-dimensional digital processing is performed in a parallel manner in order to derive the distances (i.e., ranges) of the surface patches from the camera system.
Determining the distance of objects or visible surface patches in a three-dimensional spatial scene is an important problem in robot vision and other computer vision processes.
A wide variety of optical range (i.e., distance) finding apparatus and processes are known. Such apparatus and processes may be characterized as cameras which record distance information, often referred to in the literature as "depth maps," of three-dimensional spatial scenes.
Some conventional two-dimensional range finding cameras record the brightness of objects illuminated by incident or reflected light. Such range finding cameras record images and analyze the brightness of the two-dimensional image to determine the distance of the imaged objects from the camera. Such cameras and methods have significant drawbacks, as they require controlled lighting conditions and high light-intensity discrimination.
There are essentially two types of optical range finding cameras, referred to as active and passive types, the distinction being based upon how the target object or surface patches are illuminated. Active range finding cameras control the source of illumination of the target, whereas passive systems depend upon ambient illumination.
In contrast with passive range finding to be discussed hereinafter, active range finding with light requires a source of illumination controlled by the optical range finding camera. The most intensely researched areas of active range finding are triangulation analysis, time-of-flight analysis (i.e., LADAR), projection pattern (i.e., Moire) analysis, and focus calibration analysis.
In triangulation analysis, the second camera of a stereo camera system is replaced by a structured light source such as a projector. Typically, the projector originates a pattern of light containing straight edges. If viewed directly on axis with the light projector, the edges would appear as straight lines, regardless of the depth contour of the surface they strike. If viewed from an offset position, however, the edges appear bent. The contour of the bend in the edges can be easily correlated to depth.
Another structured light method requires the projection of a pair of regularly spaced two-dimensional patterns on the subject. The two patterns interfere with each other to create a Moire pattern which can be easily photographed. The topographical contours in a Moire pattern are proportional to changes in the distance of the subject from the camera.
LADAR is similar to electromagnetic ranging by RADAR. The difference is that a pulse-modulated laser is used as the active source of illumination. Specially designed sensors in the range finder measure the time of flight of the pulse from the laser to the target and back to the range finder. LADAR systems are slow, in that they require sequential scanning of the scene to generate a depth-map. Also, they are expensive.
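The time-of-flight relation underlying such LADAR systems can be sketched as follows. This is an illustrative sketch only, not part of any system described herein; the function name and numerical values are hypothetical.

```python
# Illustrative sketch of the basic LADAR time-of-flight relation:
# the pulse travels to the target and back, so the one-way range
# is half the round-trip distance travelled at the speed of light.

C = 299_792_458.0  # speed of light in metres per second

def range_from_time_of_flight(round_trip_seconds):
    """Return the target range in metres for a measured round-trip time."""
    return C * round_trip_seconds / 2.0

# A pulse returning after 2 microseconds implies a target at roughly 300 m.
print(range_from_time_of_flight(2e-6))
```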
Another system of active light ranging is focus calibration analysis. In such a system, a pencil beam of light is sent out from the camera. The radius of the circle of confusion of the beam, as seen through a calibrated lens, is a measure of the target's distance.
There are three major principles applied in passive range finding: (1) shape analysis; (2) multiple (typically stereo) view analysis; and (3) depth-of-field or optical focus analysis. Embodiments of all three passive range finding principles can be realized with conventional two-dimensional cameras.
For example, one form of shape analysis is realized by observing and recording (e.g., photographing) a target object or surface patch of known size and determining its distance from the camera by simply measuring its recorded size. Alternatively, if two horizontally offset views of an object are photographed, and the two photographs are placed in registration at some point of known depth, the range of any other correlated element can be measured from its disparity in the horizontal dimension between the registered photographs. This latter method is known as stereo vision.
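The relation between horizontal disparity and depth in stereo vision can be sketched as follows. This is a standard textbook relation for a parallel-axis camera pair, offered here only as an illustration; the function name and numbers are hypothetical and not taken from any apparatus described herein.

```python
# Illustrative sketch of the standard stereo-vision relation:
# for two parallel-axis cameras separated by a baseline, depth is
# inversely proportional to the horizontal disparity of a point
# between the two images.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a scene point, given the camera focal length (in
    pixels), the baseline between the two views (in metres), and
    the measured horizontal disparity (in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# A camera pair with a 0.1 m baseline and a 500-pixel focal length:
# a 10-pixel disparity corresponds to a depth of 5 m.
print(depth_from_disparity(500.0, 0.1, 10.0))
```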
The final passive category is depth-of-field or focus analysis. Optical range finding cameras falling in this category rely on the fact that depth information can be obtained from the focal gradients resulting from the limited depth of field which is inherent in most optical systems.
An approach which requires searching for the lens setting that gives the best focused image of the object can be found in automatic focusing methodology. Auto-focus methods all measure depth (i.e., distances between the camera and points in the scene) by searching for the lens setting that gives the best focus at a particular point. A survey of such techniques can be found in the paper "A Perspective on Range-Finding Techniques for Computer Vision" by R. A. Jarvis, in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume PAMI-5, pages 122-139, March 1983. The limitations of the basic method are that it measures depth at only one point at a time, and that it requires modifying the lens setting over a wide range of values in order to search for the setting that yields the best focus.
Auto-focus methods can, however, be improved either by storing the images acquired at each lens setting and then searching the stored images for the best focal state at each point, or by employing a large number of specialized focus-measuring devices that conduct a parallel search for the best lens setting. Both alternatives have severe drawbacks, in that the first involves acquiring and storing, for example, 30 or more images, while the second requires sophisticated parallel hardware.
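The search described above can be sketched as follows. This is an illustrative sketch only, not the method of any cited reference: the sum of squared neighbour differences used here stands in for any sharpness metric, and the brightness values are hypothetical.

```python
# Illustrative sketch of the basic auto-focus search: a focus
# measure is evaluated at one image point for every lens setting,
# and the setting giving the largest measure is taken as best focus.

def focus_measure(window):
    """Sharpness of a 1-D brightness window: sum of squared
    differences between neighbouring samples (high-frequency energy)."""
    return sum((b - a) ** 2 for a, b in zip(window, window[1:]))

def best_lens_setting(windows_by_setting):
    """Return the index of the lens setting whose window is sharpest."""
    measures = [focus_measure(w) for w in windows_by_setting]
    return measures.index(max(measures))

# Hypothetical brightness windows recorded at three lens settings;
# the middle setting shows the largest local contrast (best focus).
stack = [
    [10, 12, 14, 12, 10],   # blurred
    [10, 30, 90, 30, 10],   # in focus
    [10, 15, 20, 15, 10],   # blurred
]
print(best_lens_setting(stack))
```

Note that this search inspects only one image point; recovering a full depth-map this way requires repeating it at every point, which is what motivates the storage and hardware costs noted above.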
Another method based on depth-of-field analysis involves measuring the error in focus (i.e., the focal gradient) and employing that measure to estimate the depth. Such a method is disclosed in the paper entitled "A New Sense for Depth of Field" by Alex P. Pentland, published in the Proceedings of the International Joint Conference on Artificial Intelligence, August 1985, and revised and republished without substantive change in July 1987 in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume PAMI-9, No. 4.
Pentland proposed two methods of depth-map recovery. The first method uses a single image of a scene containing edges which are step discontinuities in the focused image. This method requires knowledge of the location of these edges, and it cannot be used if there are no perfect step edges in the scene.
In the second method, Pentland discussed a method for measuring the spread parameter of the camera system which involves forming two images through different aperture settings, one of which is required to be that of a pinhole camera. Using the spread parameters computed by this approach, depth estimates of a scene can be calculated using a lens system formula derived from geometrical considerations. Application of Pentland's method poses serious practical difficulties, as it inherently requires forming an image through a pinhole camera (i.e., a very small aperture), which (i) gathers only a very small amount of light, and (ii) increases the corruptive effects of diffraction, which distort the formed images.
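The geometrical-optics relation that ties blur to depth in such methods can be sketched as follows. This is an illustrative thin-lens sketch, not Pentland's actual derivation; the function name and numerical values are hypothetical.

```python
# Illustrative thin-lens sketch: by the lens formula 1/f = 1/u + 1/v,
# an object at distance u focuses at distance v behind the lens.  If
# the detector instead sits at distance s, similar triangles give a
# blur circle whose diameter grows with the focus error |s - v|;
# this is what links a measured spread (blur) to depth.

def blur_circle_diameter(f, u, s, aperture):
    """Blur-circle diameter on a detector at distance s behind the
    lens, for an object at distance u (all lengths in metres)."""
    v = 1.0 / (1.0 / f - 1.0 / u)        # in-focus image distance
    return aperture * abs(s - v) / v      # similar-triangles relation

# Hypothetical values: a 50 mm lens with a 25 mm aperture, detector
# placed to focus objects at 5 m; an object at 2 m is then blurred.
s = 1.0 / (1.0 / 0.050 - 1.0 / 5.0)      # detector distance for 5 m focus
print(blur_circle_diameter(0.050, 2.0, s, 0.025))
```

An object at exactly 5 m yields a zero-diameter blur circle with this detector placement, while objects nearer or farther produce progressively larger blur, which is the focal gradient exploited by depth-of-field methods.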
Another method using depth-of-field analysis is disclosed in the paper entitled "Depth From Focus" by Paul Grossmann, in Pattern Recognition Letters, Volume 5, pages 63-69, Elsevier Science Publishers, B.V. This method is very similar to Pentland's first method and suffers from the same drawbacks, in that (i) it requires knowledge of the location of edges, and (ii) it cannot be used unless there are perfect step edges present in the scene.
Optical range finding, as well as "depth-map" recovery, can also be passively realized by using stereo vision processes, as disclosed for example in U.S. Pat. No. 4,601,053 to Grumet. Grumet discloses a method for automatic quantitative ranging on at least a single remote passive object, employing a pair of spaced TV cameras (i.e., binocular vision). Small displacements of corresponding points in a stereo image pair are measured and converted into a range measurement.
However, while methods of depth-map recovery employing stereo vision are known, problems involving shape recovery and image correspondence between different images of a given scene tend to make such techniques (i) technically difficult to implement, (ii) computationally complex, and (iii) error prone, because they involve assumptions about the scene geometry.
In view of the prior art discussed hereinabove, it is apparent that there is a great need in the range-finding art, in general, for a generalized method of determining in real-time, either passively or actively, the distance of objects and surface patches of three-dimensional scenes from a camera system, without the accompanying shortcomings and drawbacks of the prior art methods and apparatus.
Accordingly, it is a primary object of the present invention to provide a method and apparatus for determining the distance between a surface patch of a three-dimensional scene and a camera system, on the basis of a pair of two-dimensional images, each of which has been formed using a different set of camera parameter values, and wherein the changes in the values of camera parameters can occur in one or more of the following camera parameters:
(i) the distance between the second principal plane of the image forming system and the image detector plane of the camera system;
(ii) the diameter of the camera aperture; and
(iii) the focal length of the image forming system.
Another object of the present invention is to provide a method of simultaneously determining the distance of a plurality of surface patches of a three-dimensional scene measured from a camera system (i.e., depth-map recovery), on the basis of a pair of two-dimensional images of the same three-dimensional scene, each of which has been formed through a camera system having a dissimilar set of camera parameter values, and irrespective of whether any part of the image is in focus or not.
A further object of the present invention is to provide such a depth-map recovery process which is parallel and involves only local computations. With the method of the present invention, there are no restrictions requiring that the camera parameters fall within any particular range, nor are there any assumptions made about the three-dimensional spatial scene being analyzed. The only requirement of the present method is knowledge of the camera parameters and camera characteristics. The camera characteristics can be acquired initially using a suitable camera calibration procedure which need not be repeated during the process of depth recovery.
An even further object of the present invention is to provide a camera system and a method of rapidly and automatically focusing the camera system by employing the distance measurements determined using the above-mentioned method.
An even further object of the present invention is to provide a method of generating an improved-focus two-dimensional image using the method of depth-map recovery of the present invention. This aspect of the present invention can be useful in television broadcasting of scenes containing objects at different distances. It can also be useful in processing images of three-dimensional specimens obtained from television microscopes used in medicine (e.g., ophthalmic surgery) and microbiology.
Other and further objects will be explained hereinafter and will be more particularly delineated in the appended claims; still other objects of the present invention will become apparent hereinafter to those of ordinary skill in the art to which the present invention pertains.