Stereo vision consists of recovering three-dimensional data of an object viewed from two or more viewpoints. Industrial applications include quality control on production lines, where products must be inspected or sampled so that defective ones can be removed. Medicine is another important field, where specialists require highly accurate models. Obtaining dense and accurate three-dimensional models is computationally expensive and can create a bottleneck on production lines.
Stereo vision generally involves several stages. First, a calibration process is necessary; it comprises both stereo and radiometric aspects. Next, correspondence analysis is applied to the stereo images, and finally the three-dimensional model is obtained.
The calibration process generally consists of stereo and radiometric stages. The stereo calibration stage is solved by geometrically calibrating each camera independently and then applying a geometric transformation to determine the geometry of the stereo setup. This geometric calibration yields the rotation and position of each camera (commonly called the extrinsic camera parameters) and its internal characteristics (the intrinsic camera parameters), such as the focal length, the position of the principal point, and the difference in scale between the image axes.
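To make the two parameter groups concrete, the sketch below maps a world point to pixel coordinates with an ideal pinhole camera: the extrinsic parameters (rotation R, translation t) move the point into the camera frame, and a simplified set of intrinsic parameters (focal length f, per-axis scale factors sx and sy, principal point (cx, cy)) maps it to pixels. The function and parameter names are hypothetical, lens distortion is ignored, and the calibration models cited below use their own, richer parameterizations.

```python
import math

def project_point(X_world, R, t, f, sx, sy, cx, cy):
    """Project a 3D world point to pixel coordinates with an ideal pinhole camera.

    R (3x3 nested lists) and t (3-vector) are the extrinsic parameters.
    f is the focal length (metres), sx/sy are pixel scale factors (pixels per
    metre) along each image axis, and (cx, cy) is the principal point (pixels).
    Lens distortion is not modelled.
    """
    # World -> camera coordinates: Xc = R * Xw + t
    Xc = [sum(R[i][j] * X_world[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division onto the image plane at distance f
    x = f * Xc[0] / Xc[2]
    y = f * Xc[1] / Xc[2]
    # Metric image-plane coordinates -> pixel coordinates
    return (sx * x + cx, sy * y + cy)

def rotation_z(angle):
    """Rotation matrix about the optical (z) axis, used here only for the example."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Example: 8 mm lens, 10 um pixels (1e5 px/m), principal point (320, 240),
# camera 1 m in front of the world origin.
u, v = project_point([0.1, 0.0, 0.0], rotation_z(0.0), [0.0, 0.0, 1.0],
                     0.008, 1e5, 1e5, 320.0, 240.0)
```

Calibration inverts this mapping: given many known world points and their measured pixel positions, it estimates R, t, and the intrinsic values.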
Many calibration methods have been described for use with commercially available cameras. One such example is described in TSAI, R. Y., “An efficient and accurate camera calibration technique for 3D machine vision,” Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami Beach, Fla., 1986, pp. 364-374, and also in LENZ, R. K. and TSAI, R. Y., “Techniques for calibration of the scale factor and image center for high accuracy 3-D machine vision metrology,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):713-720, September 1988, both of which are incorporated herein by reference.
In the articles cited above, Tsai and Lenz proposed a practical solution for calibrating off-the-shelf cameras. This solution estimates the relative rotation and translation by introducing a radial alignment constraint. Using these estimated values, the camera parameters can then be derived by optimization. Although Tsai's model is widely used in computer vision, it produces the poorest results of the commonly used models.
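The radial alignment constraint rests on one geometric fact: purely radial lens distortion moves an image point along the line through the principal point, so the observed vector from the principal point stays parallel to the (x, y) part of the point in camera coordinates, whatever the focal length, distortion coefficient, or depth. The sketch below checks that property numerically under an assumed first-order radial distortion model with the principal point known and placed at the origin; it illustrates the constraint only and is not Tsai's full two-stage procedure. All values are illustrative.

```python
def distort_radial(x_u, y_u, k1):
    """First-order radial distortion of undistorted image coordinates (mm),
    measured from the principal point; k1 is in mm^-2."""
    r2 = x_u * x_u + y_u * y_u
    scale = 1.0 + k1 * r2
    return x_u * scale, y_u * scale

def radial_alignment_residual(x_d, y_d, x_c, y_c):
    """2D cross product of the observed radial vector (x_d, y_d) and the
    camera-frame radial vector (x_c, y_c); zero when the two are parallel."""
    return x_d * y_c - y_d * x_c

# One point in camera coordinates (metres); f in mm and k1 in mm^-2 are arbitrary.
x_c, y_c, z_c = 0.12, -0.05, 0.9
worst = 0.0
for f in (6.0, 12.0):
    for k1 in (-0.01, 0.0, 0.02):
        x_u, y_u = f * x_c / z_c, f * y_c / z_c      # ideal pinhole projection
        x_d, y_d = distort_radial(x_u, y_u, k1)      # what the sensor observes
        worst = max(worst, abs(radial_alignment_residual(x_d, y_d, x_c, y_c)))
# `worst` stays at round-off level: f, k1 and z_c drop out of the constraint.
```

Because f, k1, and the depth all cancel, the constraint yields equations in the rotation and the x/y translation alone, which is what lets those parameters be estimated before the rest.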
Another popular method is known as the “Direct Linear Transformation” (DLT), and is described in ABDEL-AZIZ, Y. I. and KARARA, H. M., “Direct linear transformation into object space coordinates in close-range photogrammetry,” Proceedings Symposium Close-Range Photogrammetry, pp. 1-18, University of Illinois, Urbana, 1971, incorporated herein by reference. A limitation of the DLT model is that it does not account for lens distortion, which may severely degrade measurement accuracy.
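The DLT treats the camera as a 3x4 projection matrix with 11 independent entries and estimates it by linear least squares: each world/image correspondence contributes two linear equations, and the solution is the right singular vector of the stacked system with the smallest singular value. The sketch below shows this standard formulation on noise-free synthetic data; the helper names and the synthetic camera are assumptions for illustration. Because the model is purely linear, any lens distortion in real images has nowhere to go except into the residual error, which is the limitation noted above.

```python
import numpy as np

def dlt_calibrate(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P from n >= 6 non-coplanar
    world/image correspondences by homogeneous linear least squares."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    if len(world_pts) < 6 or len(world_pts) != len(image_pts):
        raise ValueError("need at least 6 matching world/image points")
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * p3 . X = 0 (likewise for v)
        rows.append([X, Y, Z, 1.0, 0.0, 0.0, 0.0, 0.0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0.0, 0.0, 0.0, 0.0, X, Y, Z, 1.0, -v * X, -v * Y, -v * Z, -v])
    # Singular values are sorted in descending order, so the last row of vt
    # is the (approximate) null vector of the system.
    _, _, vt = np.linalg.svd(np.array(rows))
    P = vt[-1].reshape(3, 4)
    return P / P[2, 3]  # fix the arbitrary scale and sign (assumes P[2, 3] != 0)

def project(P, world_pts):
    """Apply a projection matrix to an (n, 3) array of world points."""
    Xh = np.hstack([np.asarray(world_pts, dtype=float), np.ones((len(world_pts), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

# Synthetic camera: f = 800 px, principal point (320, 240), 5 units from the points.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(10, 3))   # random, hence non-coplanar
img = project(P_true, pts)
P_est = dlt_calibrate(pts, img)
```

With real, noisy data the coordinates are usually normalized before building the system to improve conditioning; that step is omitted here for brevity.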
Another classical calibration technique is known as Haralick's standard solution, and is described in HARALICK, R. and SHAPIRO, L., Computer and Robot Vision, Volume II, Chapter 13, “Perspective Projective Geometry,” pp. 43-124, Addison Wesley, Reading, Mass., 1993 and in POELZLEITNER, W. and ULM, M., “Comparative study of camera calibration algorithms with application to spacecraft navigation,” Proceedings of SPIE, vol. 2350, Videometrics III, Sabry F. El-Hakim, Editor, October 1994, pp. 187-196, both of which are incorporated herein by reference.
Haralick's standard solution uses an iterative method to find three extrinsic camera parameters (the three angles of the rotation between the world and pinhole coordinate systems) and seven intrinsic parameters (three camera distortion factors, the image center, and the scale factors in both the horizontal and vertical image coordinates). One problem with Haralick's standard solution is that its non-linear equations do not admit a direct (closed-form) solution. The equations are therefore generally linearized by partial differentiation, omitting the higher-order non-linear terms, before iteration can be performed. Consequently, a good initial guess of the parameters must be available, and convergence of the iteration cannot be guaranteed. A similar non-linear optimization method is described in BROWN, D. C., “Close-range camera calibration,” Photogrammetric Engineering, vol. 37, no. 8, pp. 855-866, 1971, incorporated herein by reference.
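The linearize-and-iterate scheme described above is, in generic form, a Gauss-Newton iteration: the residual is replaced by its first-order Taylor expansion, the resulting linear least-squares problem is solved for a parameter update, and the process repeats. The sketch below applies it to a hypothetical one-axis camera with two unknowns (focal length and a rotation angle); the model, data, and names are assumptions chosen only to show the mechanics, not Haralick's actual equations.

```python
import numpy as np

def gauss_newton(residual_fn, params0, max_iter=50, tol=1e-10, eps=1e-7):
    """Minimise sum(residual_fn(p)**2) by Gauss-Newton iteration.

    Each step uses r(p + dp) ~ r(p) + J dp (higher-order terms dropped)
    and solves J dp = -r in the least-squares sense.
    Returns (params, converged)."""
    p = np.asarray(params0, dtype=float)
    for _ in range(max_iter):
        r = residual_fn(p)
        # Forward-difference Jacobian: J[:, k] ~ dr / dp_k
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            h = eps * max(1.0, abs(p[k]))
            dp = np.zeros_like(p)
            dp[k] = h
            J[:, k] = (residual_fn(p + dp) - r) / h
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p = p + step
        if not np.all(np.isfinite(p)):
            return p, False                     # iteration blew up
        if np.linalg.norm(step) <= tol * (1.0 + np.linalg.norm(p)):
            return p, True
    return p, False                             # no convergence within max_iter

# Hypothetical 1-D camera: focal length f (pixels), rotation theta (radians)
xs = np.linspace(-1.0, 1.0, 9)
zs = 4.0 + 0.5 * xs ** 2

def model(params):
    f, theta = params
    c, s = np.cos(theta), np.sin(theta)
    return f * (c * xs - s * zs) / (s * xs + c * zs)

u_obs = model([800.0, 0.05])                    # noise-free synthetic observations

def residuals(params):
    return model(params) - u_obs

params, converged = gauss_newton(residuals, [700.0, 0.0])
```

Here the starting guess is close to the true values, so the iteration settles quickly. Nothing in the method guarantees this from an arbitrary start: with an initial angle far from the true one the linearization can point in the wrong direction, which is the practical weakness noted above.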
Many of the disadvantages of the methods discussed above are addressed by a newer calibration approach derived from the Haralick model, known as the Gintic model, which is described in JIAN, X., MALCOLM, A., and ZHONGPING, F., Camera Calibration with Micron Level Accuracy, SIMTech Technical Report (AT/01/037/PS), Singapore, 2001, incorporated herein by reference. The Gintic model simplifies the Haralick model equations and guarantees that the optimal solution is always found; that is, it finds the camera parameters that best describe the camera behavior.
The second stage of calibration is radiometric calibration. This step is necessary to recover depth information from the objects in the scene when their surface reflectance is arbitrary or unknown. The anisotropy of the light sources, and the orientation of that anisotropy relative to the camera, are determined after the radiometric calibration. Radiometric calibration is required because the camera photo-response is assumed to be non-linear and spatially non-uniform, and the light sources are assumed not to be isotropic. The article JANKO, Z., DRBOHLAV, O., and SARA, R., “Radiometric calibration of a Helmholtz stereo rig,” Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 27 Jun.-2 Jul. 2004, vol. 1, pp. 166-171, incorporated herein by reference, illustrates that radiometric calibration improves the accuracy of depth recovery.
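The paragraph does not fix a particular radiometric model. As a minimal sketch of how the two camera-side effects it names (non-linear response and spatial non-uniformity) can be compensated, the code below assumes a power-law (gamma) response and a standard dark-frame/flat-field procedure; the function names, the gamma value, and the dark/flat capture procedure are assumptions, not the method of the cited article, and light-source anisotropy is not modelled.

```python
import numpy as np

def linearise(img, gamma=2.2, max_value=255.0):
    """Invert an ASSUMED power-law response raw = max_value * E**(1/gamma),
    giving values proportional to sensor irradiance E."""
    return (np.asarray(img, dtype=float) / max_value) ** gamma

def radiometric_correct(raw, dark, flat, gamma=2.2, max_value=255.0):
    """Linearise an image and remove per-pixel offset and gain variations.

    dark: frame captured with no light (per-pixel offset, hypothetical setup).
    flat: frame of a uniformly lit target (per-pixel sensitivity).
    Returns an image proportional to scene irradiance at every pixel."""
    raw_l = linearise(raw, gamma, max_value)
    dark_l = linearise(dark, gamma, max_value)
    flat_l = linearise(flat, gamma, max_value)
    gain = flat_l - dark_l                 # spatial non-uniformity of the sensor
    gain = gain / gain.mean()              # normalise to a mean gain of 1
    return (raw_l - dark_l) / np.maximum(gain, 1e-6)
```

After this correction, equal scene irradiance produces equal pixel values across the image, which is a precondition for comparing intensities between reciprocal image pairs.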
After radiometric and geometric calibration, the next step in three-dimensional stereo recovery is correspondence analysis, which is commonly classified as either active or passive depending on whether the illumination is controlled. Active approaches manipulate the scene lighting, for example by photometric stereo illumination, while passive methods do not depend on the scene illumination. The distribution of surface reflectance depends on the illumination incident on the scene objects and on the object material. Therefore, both active and passive approaches require a prior assumption about surface reflectance, made either implicitly or explicitly.
Passive stereo approaches make an implicit assumption about surface reflectance: they assume that the reflectance of the scene objects is the same regardless of the acquisition viewpoint. Active stereo approaches, on the other hand, establish a known parametric form of the reflectance: they calculate a reflectance function from one of several physical models and then include this function in the recovery constraint. Neither assumption holds in general in the real world, since reflectance depends on several factors, such as the optical characteristics of the surfaces, the angle of incidence of the illumination, and the positions of the optical viewpoints.
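The implicit assumption of passive stereo is easiest to see in a basic matcher. The sketch below computes a winner-take-all disparity map for a rectified image pair using the sum of absolute differences (SAD) over square windows, one common passive technique chosen here for illustration. Comparing raw intensities only makes sense if each surface point appears equally bright from both viewpoints; a specular highlight or any other view-dependent reflectance breaks that premise.

```python
import numpy as np

def sad_disparity(left, right, max_disp, half_win=2):
    """Winner-take-all disparity for a rectified grey-level pair using SAD.

    A point at column x in the left image is searched for at column x - d
    in the right image, for d = 0..max_disp, on the same row (epipolar line).
    Border pixels that lack a full window are left at disparity 0."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            patch = left[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]
            best_cost, best_d = np.inf, 0
            # Only disparities whose window stays inside the right image
            for d in range(0, min(max_disp, x - half_win) + 1):
                cand = right[y - half_win:y + half_win + 1,
                             x - d - half_win:x - d + half_win + 1]
                # Brightness-constancy assumption: matching pixels have equal intensity
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the left image is the right image shifted by a few columns, the matcher recovers that shift exactly; scaling the intensity of one view, as view-dependent reflectance would, is enough to make such window costs unreliable.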
Other three-dimensional recovery techniques illuminate the scene with a known geometric pattern of light. These are known as structured lighting systems, and they use the triangulation principle to compute depth. The main disadvantage of structured lighting methods is that they require several image acquisitions (on the order of 4 to 32) under different lighting conditions to obtain a dense depth map. These recovery techniques are therefore computationally expensive, and they are not presently practical for fast 3D acquisition applications such as production-line inspection or object modeling.
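One common way structured lighting ties the number of acquisitions to resolution is binary Gray-code stripe projection, used here as an illustrative assumption since the text does not name a coding scheme: N projected patterns label 2**N projector columns, each camera pixel decodes its on/off sequence into a column index, and depth then follows by triangulation. The sketch below assumes a rectified camera-projector pair with a common focal length; all function names are hypothetical.

```python
def gray_encode(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Inverse of gray_encode."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def stripe_patterns(num_bits, width):
    """One 0/1 stripe pattern per acquisition: pattern k lights projector
    column c iff bit (num_bits - 1 - k) of gray_encode(c) is set."""
    return [[(gray_encode(c) >> (num_bits - 1 - k)) & 1 for c in range(width)]
            for k in range(num_bits)]

def decode_column(bits):
    """Projector column from the on/off sequence seen at one camera pixel (MSB first)."""
    g = 0
    for b in bits:
        g = (g << 1) | b
    return gray_decode(g)

def depth_from_columns(x_cam, x_proj, focal_px, baseline):
    """Triangulate depth for a rectified camera-projector pair: Z = f * b / (x_cam - x_proj).

    Coordinates are in pixels relative to each device's principal point; the
    sign convention (which device is on the left) is an assumption."""
    disparity = x_cam - x_proj
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity")
    return focal_px * baseline / disparity
```

With this scheme, resolving 1024 projector columns already needs 10 images, which is consistent with the acquisition counts quoted above. Gray codes are often preferred over plain binary because adjacent columns differ in a single pattern, so a decoding error at a stripe boundary shifts the column by at most one.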
Recently, a new stereo technique has been proposed in MAGDA, S., KRIEGMAN, D., ZICKLER, T., and BELHUMEUR, P., “Beyond Lambert: Reconstructing Surfaces with Arbitrary BRDFs,” Proceedings of the International Conference on Computer Vision (ICCV), 2001, pp. 391-399, and ZICKLER, T., BELHUMEUR, P., and KRIEGMAN, D., “Helmholtz Stereopsis: Exploiting Reciprocity for Surface Reconstruction,” Proceedings of the 7th European Conference on Computer Vision (ECCV), May 2002, vol. 3, pp. 869-884, both of which are incorporated herein by reference. This approach aims to recover depth information of scene objects whose surface reflectance is arbitrary or unknown.
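This recent stereo technique takes advantage of the reflectance symmetry of surfaces. Reflectance symmetry, or Helmholtz reciprocity, states that the bidirectional reflectance distribution function (BRDF) of a surface is unchanged when the incident and outgoing directions are exchanged; because the technique relies only on this symmetry, the surface reflectance may otherwise take an arbitrary form. Thus, in a controlled illumination environment, the restrictions imposed by the optical properties of surface materials are eliminated, and depth can be recovered from any kind of surface.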
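Zickler et al. show how reciprocity removes the BRDF from the matching constraint. Let the left and right camera centres be \(\mathbf o_l\) and \(\mathbf o_r\), with each image taken under an isotropic point source at the other camera's centre. For a surface point \(\mathbf p\) with unit normal \(\hat{\mathbf n}\), and unit vectors \(\hat{\mathbf v}_l, \hat{\mathbf v}_r\) from \(\mathbf p\) toward the two centres, the measured irradiances \(i_l, i_r\) satisfy

$$\left(i_l\,\frac{\hat{\mathbf v}_l}{\lVert\mathbf o_l-\mathbf p\rVert^{2}}-i_r\,\frac{\hat{\mathbf v}_r}{\lVert\mathbf o_r-\mathbf p\rVert^{2}}\right)\cdot\hat{\mathbf n}=0,$$

because the shared BRDF value cancels. The sketch below checks this numerically for a non-Lambertian surface. The Blinn-Phong-style BRDF, the geometry, and the helper names are illustrative assumptions, and both cameras are assumed to have equal, radiometrically calibrated gains.

```python
import math

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))
def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def brdf(n, w_in, w_out, kd=0.3, ks=0.5, shininess=20.0):
    """Illustrative non-Lambertian BRDF: Lambertian term plus a Blinn-Phong lobe.
    The half vector is symmetric in w_in and w_out, so the BRDF is reciprocal."""
    h = unit([w_in[i] + w_out[i] for i in range(3)])
    return kd / math.pi + ks * max(dot(n, h), 0.0) ** shininess

def reciprocal_pair(p, n, o_l, o_r):
    """Irradiance measured by each camera when a unit-intensity point source
    sits at the other camera's centre."""
    v_l, v_r = unit(sub(o_l, p)), unit(sub(o_r, p))
    d_l2, d_r2 = dot(sub(o_l, p), sub(o_l, p)), dot(sub(o_r, p), sub(o_r, p))
    i_l = brdf(n, v_r, v_l) * max(dot(n, v_r), 0.0) / d_r2  # light at o_r, camera at o_l
    i_r = brdf(n, v_l, v_r) * max(dot(n, v_l), 0.0) / d_l2  # light at o_l, camera at o_r
    return i_l, i_r

def helmholtz_residual(p, n, o_l, o_r, i_l, i_r):
    """Left-hand side of the reciprocity constraint for a hypothesised p and n."""
    v_l, v_r = unit(sub(o_l, p)), unit(sub(o_r, p))
    d_l2, d_r2 = dot(sub(o_l, p), sub(o_l, p)), dot(sub(o_r, p), sub(o_r, p))
    w = [i_l * v_l[k] / d_l2 - i_r * v_r[k] / d_r2 for k in range(3)]
    return dot(w, n)

p = [0.0, 0.0, 0.0]
n_true = unit([0.2, 0.1, 1.0])
o_l, o_r = [-0.3, 0.0, 1.0], [0.4, 0.05, 1.2]
i_l, i_r = reciprocal_pair(p, n_true, o_l, o_r)
```

Note that \(i_l\) and \(i_r\) differ for this surface, so a brightness-constancy matcher would be misled, while the residual vanishes at the true normal and not at a wrong one. A depth search can therefore test candidate surface points with this constraint instead of comparing intensities directly.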
The stereo technique based on reflectance reciprocity has other advantages. Because the stereo images have reciprocal irradiance, specularities appear fixed on the surface; this is an advantage over other stereo approaches, because corresponding specular regions can be matched. Moreover, half-occluded regions correspond: a half-occluded region in the left stereo image appears shadowed in the right stereo image, and vice versa. This property may enhance the quality of the depth reconstruction, since it allows depth discontinuities to be determined. Furthermore, the technique can handle textureless and flat surfaces, which cannot be recovered using conventional active or passive stereo techniques.
A simplification of the multi-ocular stereo case is presented in ZICKLER, T., HO, J., KRIEGMAN, D., PONCE, J., and BELHUMEUR, P., “Binocular Helmholtz Stereopsis,” Proceedings of the International Conference on Computer Vision (ICCV), 2003, pp. 1411-1417, incorporated herein by reference. This article proposes a method for retrieving a dense depth map from a single pair of reciprocal stereo images, assuming orthographic projection of the scene points.
Unfortunately, this article does not show how such a technique can be easily integrated with the calibration process. In particular, it assumes a mechanical calibration process in which the cameras are repositioned and detailed measurements are made. The article also does not detail the steps involved in the proposed approach. Furthermore, Zickler's approach formulates the stereo analysis under orthographic projection. This projection geometry is approximated only when the stereo setup is far from the scene objects. This restriction therefore rules out compact setups for industrial tasks where space is an important constraint.
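A quick way to see why an orthographic model forces a large stand-off distance is to compare true perspective projection, u = f X / Z, with its scaled-orthographic approximation, u = f X / Z0, for an object of finite depth at mean distance Z0. The helper below returns the worst-case pixel discrepancy; its name and the example numbers are illustrative assumptions, not values from the cited work.

```python
def weak_perspective_error_px(focal_px, half_width, half_depth, distance):
    """Worst-case pixel discrepancy between perspective projection u = f X / Z
    and the scaled-orthographic approximation u = f X / Z0, for an object with
    |X| <= half_width and |Z - Z0| <= half_depth at mean distance Z0."""
    if half_depth >= distance:
        raise ValueError("object must lie entirely in front of the camera")
    z_near = distance - half_depth
    # f * X * |1/Z - 1/Z0| is largest at the widest X and the nearest depth
    return focal_px * half_width * (1.0 / z_near - 1.0 / distance)

# A 10 cm wide, 4 cm deep part seen with a 1000-pixel focal length
near = weak_perspective_error_px(1000.0, 0.05, 0.02, 0.5)   # compact rig, 0.5 m away
far = weak_perspective_error_px(1000.0, 0.05, 0.02, 5.0)    # distant rig, 5 m away
```

At 0.5 m the approximation is off by roughly 4 pixels, while at 5 m the error drops below a tenth of a pixel; the error shrinks roughly with the square of the distance for a fixed focal length, so keeping it small requires the stereo rig to sit far from the part, which is exactly what space-constrained industrial setups cannot offer.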
Accordingly, there is a need for a method and apparatus for three-dimensional depth recovery using the concept of reflectance reciprocity that does not have the limitations stated above. In particular, there is a need for a complete system, that is, a practical method and apparatus that addresses all the stages of a highly accurate three-dimensional recovery process, from camera calibration to a final grid adjustment process, including some specific tasks related to the reciprocal reflectance property. Moreover, such a complete system should meet the real-time requirements of applications such as industrial inspection.