Three dimensional object recognition is presently an active area of vision research. The limitations of two dimensional analysis have been realized in many applications. In a typical bin picking operation, the position and shape of an object must be determined to enable the robot to securely grasp the object. An essential part of a three dimensional recognition system is shape extraction. Any ambiguity in the physical shape of an object generally renders the recognition problem more difficult. Hence, the advent of three dimensional vision systems has created considerable interest in the development of high quality depth sensors.
Stereo is a popular technique for depth perception. It has generated much interest in the research community due to its apparently strong resemblance to the mammalian approach to depth perception. In stereopsis, images of the scene are recorded from two different perspectives. The two perspectives are obtained by using two cameras to observe the scene. Features such as edges are extracted from both images and a point-to-point correspondence is established between the images on the basis of feature values. Range or depth is recovered from each pair of corresponding points by triangulation. The passive nature of stereopsis makes it an attractive depth perception method. It is suited to most applications, unlike "active" sensing methods such as radar, laser ranging and structured light.
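The triangulation step described above can be illustrated with a minimal sketch. It assumes the simplest stereo geometry, which the text does not specify: two identical cameras with parallel optical axes, separated along the horizontal image axis. The function name and the parameters `focal_px`, `baseline`, `x_left` and `x_right` are hypothetical and are introduced only for illustration.

```python
def depth_from_disparity(focal_px, baseline, x_left, x_right):
    """Depth of a scene point from one pair of corresponding image points.

    Assumes two identical cameras with parallel optical axes (an assumption
    of this sketch, not a requirement stated in the text).
    focal_px : focal length expressed in pixels
    baseline : distance between the two camera centers
    x_left, x_right : horizontal image coordinates of the matched point
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # A point in front of both cameras always has positive disparity.
        raise ValueError("corresponding points must have positive disparity")
    # Similar triangles: depth / baseline = focal_px / disparity
    return focal_px * baseline / disparity


if __name__ == "__main__":
    # A matched feature at x = 420 (left image) and x = 400 (right image).
    print(depth_from_disparity(800.0, 0.1, 420.0, 400.0))
```

The depth is returned in the same unit as the baseline. Cameras with converging viewing directions require a general ray-intersection computation rather than this closed form.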
Stereo systems have a limited field of view. The depth of a point can be measured only if the point is seen by both cameras. Therefore, the field of view of a stereo system is the intersection of the fields of view of the two cameras. The field of view of a typical stereo system is shown in FIG. (8). A large field of view can be obtained by minimizing the baseline D and keeping the viewing directions of the two cameras almost equal. Such an arrangement, however, results in lower depth resolution. A high resolution in depth is obtained by making the viewing directions of the two cameras orthogonal to each other. However, this configuration drastically reduces the field of view of the stereo system. Also, in stereo, objects must be placed close to the focal plane of both cameras in order to avoid the blurring of image points.
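The trade-off between baseline and depth resolution can be made concrete with a rough sketch. It assumes parallel viewing directions and a fixed matching accuracy in the image, neither of which is stated in the text. Under those assumptions, the smallest resolvable change in depth grows with the square of the depth and shrinks in proportion to the baseline D. The function and parameter names are hypothetical.

```python
def depth_resolution(depth, focal_px, baseline_d, disparity_step=1.0):
    """Approximate depth change caused by one step of disparity.

    Assumes parallel viewing directions (an assumption of this sketch).
    Derived from depth = focal_px * baseline_d / disparity by
    differentiating with respect to disparity.
    """
    return depth ** 2 * disparity_step / (focal_px * baseline_d)


if __name__ == "__main__":
    # Same scene depth and camera, two different baselines.
    for d in (0.1, 0.2):
        print(d, depth_resolution(2.0, 800.0, d))
```

Halving the baseline doubles the depth uncertainty in this model, which is the loss of resolution that accompanies the larger field of view.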
Furthermore, stereo systems face an acute calibration problem. Corresponding points in the two images are projections of a single point in the three dimensional scene. In order to triangulate and determine the three dimensional coordinates of a point, the parameters of the two cameras must be known. Therefore, for a given configuration of the cameras, calibration of the intrinsic and extrinsic parameters of the cameras is necessary. Many researchers have studied the stereo calibration problem. One approach is to independently calibrate the two cameras by using a set of points at known locations in a common frame of reference. An alternative method relies not on knowing the locations of the calibration points, but on the correspondence between the points in the images. D. B. Gennery, in "Stereo-Camera Calibration", Proceedings Image Understanding Workshop, pp. 101-108, November, 1979, proposed performing the calibration by a generalized least-squares adjustment. Errors are formulated by using the epipolar constraint. Minimization of the errors results in estimates of the camera parameters. O. D. Faugeras and G. Toscani, in "The Calibration Problem for Stereo", IEEE, 1986, have suggested a recursive estimation of the camera parameters by using extended Kalman filtering. The complexity of the calibration procedure has limited the applicability of stereo systems. Since it is computationally inefficient to perform the calibration on-line, the relative positions and orientations of the cameras need to be rigidly fixed.
The present invention is a new approach to stereo vision. Essentially, two spheres with highly reflective surfaces are placed in the view of a single camera. Reflections of the three dimensional scene are recorded in the image of the spheres. Hence, a single camera image contains two different perspectives of the three dimensional world. These two perspectives are equivalent to images obtained from two different camera locations in stereo systems. The use of a single fixed camera avoids the stereo calibration problem. However, the positions of the two spheres must be known in order to recover depth by triangulation. To this end, a simple calibration procedure determines the locations of the two spheres in real time. In other words, each camera image contains information regarding both the positions of the spheres and the depth of points in the three dimensional world at the same instant of time. Hence, the positions of the spheres are first determined and then used to compute the depth of points in the scene. Such a system is known as a sphereo system.
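The geometry behind this idea can be sketched as follows. The sketch is illustrative only and is not the procedure of the invention: it assumes a pinhole camera at the origin and spheres of known center and radius, and it uses hypothetical helper names. A viewing ray is reflected off each sphere, and the scene point is estimated as the midpoint of the shortest segment between the two reflected rays.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add_scaled(a, b, s):
    """Return a + s * b for 3-vectors given as tuples."""
    return tuple(x + s * y for x, y in zip(a, b))


def reflect_off_sphere(ray_dir, center, radius):
    """Reflect a camera ray off a sphere; the camera is assumed at the origin.

    Returns (hit_point, reflected_direction), or None if the ray misses.
    """
    norm = math.sqrt(dot(ray_dir, ray_dir))
    d = tuple(x / norm for x in ray_dir)
    b = dot(d, center)
    disc = b * b - (dot(center, center) - radius * radius)
    if disc < 0:
        return None
    t = b - math.sqrt(disc)          # nearer of the two intersections
    if t <= 0:
        return None
    hit = tuple(t * x for x in d)
    n = tuple((h - c) / radius for h, c in zip(hit, center))  # outward normal
    reflected = add_scaled(d, n, -2.0 * dot(d, n))            # mirror law
    return hit, reflected


def closest_point_between_rays(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining two non-parallel lines."""
    w = add_scaled(p1, p2, -1.0)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = add_scaled(p1, d1, s)
    q2 = add_scaled(p2, d2, t)
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))


if __name__ == "__main__":
    # A ray along the optical axis hits a unit sphere centered 5 units away.
    print(reflect_off_sphere((0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))
```

In use, `reflect_off_sphere` would be called once per sphere for a pair of corresponding image points, and the two resulting rays would be passed to `closest_point_between_rays`. Because measured rays rarely intersect exactly, the midpoint serves as a simple estimate of the scene point.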