Reconstruction of three-dimensional objects in a scene from multiple two-dimensional images of the scene has been the subject of research since the late 19th century. Such reconstruction may be useful in a number of areas, including obtaining information about physical (three-dimensional) characteristics of objects in the scene, such as determination of the actual three-dimensional shapes and volumes of the objects. Reconstruction has also recently become particularly important in, for example, computer vision and robotics. The geometric relation between three-dimensional objects and the images created by a simple image recorder such as a pin-hole camera (that is, a camera without a lens) is a source of information to facilitate a three-dimensional reconstruction. Current practical commercial systems for object reconstruction generally rely on reconstruction from aerial photographs or from satellite images. In both cases, cameras record images from two locations whose positions relative to a scene are precisely determinable. In reconstruction from aerial photographs, two cameras are mounted with precise spacing and orientation on a common airborne platform, which ensures that the geometry of the cameras relative to each other is fixed and known. With satellites, the positions and orientations of the satellites can be determined with great accuracy, thereby providing the geometrical information required for reconstruction with corresponding precision. In either case, reconstruction of the desired objects shown in the images can be performed from two-dimensional photographic or video images taken from such an arrangement.
Reconstruction methods are generally non-linear, and they do not behave well in the presence of errors in measurement of the various camera calibration parameters and in the images from which the objects are to be reconstructed. Conventional reconstruction methods rely on the successful decoupling of two sets of parameters known as intrinsic and extrinsic parameters. The extrinsic parameters are related to the external geometry or arrangement of the cameras, including the rotation and translation of the coordinate frame of one camera in relation to the coordinate frame of the second camera. The intrinsic parameters associated with each camera are related to the camera's internal geometry in a manner that describes a transformation between a virtual camera coordinate system and the true relationship between the camera's image plane and its center of projection (COP). The intrinsic parameters can be represented by the image's aspect ratio, the skew and the location of the principal point, that is, the location of the intersection of the camera's optical axis and the image plane. (Note that the camera's focal length is related to the identified intrinsic parameters, in particular the aspect ratio, and thus it need not be considered as a parameter.)
These intrinsic and extrinsic parameters are coupled together, and it is possible to recover the Euclidean three-dimensional structure of the scene depicted in two views only if these two sets of parameters can be decoupled. The precise manner in which the intrinsic and extrinsic parameters are coupled together is as follows. If the intrinsic parameters for the cameras are used to form respective three-by-three matrices $M$ and $M'$, and $R$ and $t$ represent the rotational and translational external parameters, then for points $p=(x,y,1)^T$ and $p'=(x',y',1)^T$ ("T" represents the matrix transpose operation) representing the projection in the two images of a single point in the scene,
$$z'p' = z\,M'RM^{-1}p - M't \qquad \text{Eqn. (1)}$$
where z and z' represent respective depth values relative to the two camera locations.
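By way of illustration only, the relation of Eqn. (1) can be checked numerically with a short sketch. The sketch assumes that each intrinsic matrix is upper triangular with last row (0, 0, 1), so that the depth z is the third coordinate of the scene point in the camera frame, and that a scene point P in the first camera frame maps to RP - t in the second camera frame; that sign convention is inferred from the form of Eqn. (1). The helper names (`intrinsics`, `rot_y`, `project`), the separate `scale` parameter, and all numerical values are hypothetical and are not taken from any particular system.

```python
import math

def matvec(A, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inverse3(A):
    """Invert a 3x3 matrix using the adjugate (transposed cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def intrinsics(scale, aspect, skew, cx, cy):
    """Hypothetical parameterization: overall scale, aspect ratio, skew,
    and principal point (cx, cy). Last row is (0, 0, 1) by assumption."""
    return [[scale, skew, cx],
            [0.0, scale * aspect, cy],
            [0.0, 0.0, 1.0]]

def rot_y(angle):
    """Rotation by `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(M, P):
    """Project scene point P (camera frame) to p = (x, y, 1) and depth z."""
    q = matvec(M, P)
    z = q[2]
    return [q[0] / z, q[1] / z, 1.0], z

# Illustrative values only (assumptions, not measured data).
M = intrinsics(800.0, 1.0, 0.0, 320.0, 240.0)     # first camera
Mp = intrinsics(820.0, 1.05, 0.5, 310.0, 250.0)   # second camera, M'
R = rot_y(math.radians(10.0))
t = [0.5, 0.0, 0.1]
P = [0.3, -0.2, 4.0]                              # scene point, first camera frame

# Assumed extrinsic convention: P' = R P - t.
P2 = [r - ti for r, ti in zip(matvec(R, P), t)]

p, z = project(M, P)
p2, z2 = project(Mp, P2)

# Left side of Eqn. (1): z' p'
lhs = [z2 * c for c in p2]

# Right side of Eqn. (1): z M' R M^-1 p - M' t
ray = matvec(inverse3(M), p)
term1 = [z * c for c in matvec(Mp, matvec(R, ray))]
term2 = matvec(Mp, t)
rhs = [a - b for a, b in zip(term1, term2)]

# The two sides should agree up to floating-point rounding.
residual = max(abs(a - b) for a, b in zip(lhs, rhs))
print("max |lhs - rhs| =", residual)
```

Because the intrinsic matrices enter Eqn. (1) only through the products M'RM.sup.-1 and M't, an error in either M or M' changes both terms at once, which is one way of seeing why the two sets of parameters cannot be separated from a single pair of corresponding points.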
In general, there are two conventional methods for reconstruction. In one method, the values of the internal parameters are determined by a separate and independent "internal camera calibration" procedure that relies on images of specialized patterns.
In the second reconstruction method, more than two views of a scene are taken and processed and the two sets of parameters are decoupled by assuming that the internal camera parameters are fixed for all views. Processing to determine the values of the parameters proceeds using non-linear methods, such as recursive estimation, non-linear optimization techniques such as Levenberg-Marquardt iterations, and more recently projective geometry tools using the concept of "the absolute conic."
One significant problem with the first approach (using a separate internal camera calibration step) is that even small errors in calibration lead to significant errors in reconstruction. The methods for recovering the extrinsic parameters following the internal calibration are known to be extremely sensitive to minor errors in image measurements and require a relatively large field of view in order to behave properly. In the second approach (using more than two views of a scene), the processing techniques are iterative, proceeding from an initial approximation, and are quite sensitive to that initial approximation. In addition, the assumption that the internal camera parameters are fixed for all views does not always hold.