Image-based object reconstruction is the process of estimating the shape, volume, and surface reflectance properties of an object from its images. Reconstruction of three-dimensional objects in a scene from multiple two-dimensional images of the scene has been the subject of research since the late 19th century. Reconstruction has also recently become particularly important in, for example, computer vision and robotics. The geometric relation between three-dimensional objects and the images created by a simple image recorder such as a pinhole camera (that is, a camera without a lens) is a source of information to facilitate a three-dimensional reconstruction. Current practical commercial systems for object reconstruction generally rely on reconstruction from aerial photographs or from satellite images. In both cases, cameras record images from two locations whose positions relative to a scene are precisely determinable. In reconstruction from aerial photographs, two cameras are mounted with precise spacing and orientation on a common airborne platform, which ensures that the geometry of the cameras relative to each other is fixed and known. With satellites, the positions and orientations of the satellites can be determined with great accuracy, thereby providing the geometrical information required for reconstruction with corresponding precision. In either case, reconstruction of the desired objects shown in the images can be performed from two-dimensional photographic or video images taken from such an arrangement.
Generally, reconstruction methods are non-linear and do not behave well in the presence of errors in measurement of the various camera calibration parameters and in the images from which the objects are to be reconstructed. Conventional reconstruction methods typically rely on successful decoupling of two sets of parameters known as intrinsic and extrinsic parameters. Extrinsic parameters are related to the external geometry or arrangement of the cameras, including the rotation and translation between the coordinate frame of one camera and the coordinate frame of a second camera. The intrinsic parameters associated with each camera are related to the camera's internal geometry in a manner that describes a transformation between a virtual camera coordinate system and the true relationship between the camera's image plane and its center of projection (COP). The intrinsic parameters can be represented by the image's aspect ratio, the skew, and the location of the principal point, that is, the location of the intersection of the camera's optical axis and the image plane.
These intrinsic and extrinsic parameters are coupled together, and it is possible to recover a Euclidean three-dimensional structure of a scene depicted in two views only if these two sets of parameters can be decoupled. The precise manner in which the intrinsic and extrinsic parameters are coupled together is as follows. If the intrinsic parameters for the cameras are used to form respective three-by-three matrices $M$ and $M'$, and $R$ and $t$ represent the rotational and translational external parameters, then for points $p=(x,y,1)^T$ and $p'=(x',y',1)^T$ (where $T$ represents the matrix transpose operation) representing the projections in the two images of a single point $P$ in the scene,

$$z'p' = z\,M'RM^{-1}p - M't \tag{1}$$

where $z$ and $z'$ represent respective depth values for point $P$ relative to the two camera locations.
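The coupling expressed by equation (1) can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes an upper-triangular intrinsic matrix built from a focal length, aspect ratio, skew, and principal point (one common parametrization, not specified in the text above), and it assumes the second camera's frame is related to the first by P' = RP - t, which is the sign convention under which equation (1) holds. All function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def matvec(A, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inv3(A):
    """Invert a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def intrinsics(focal, aspect, skew, cx, cy):
    """Assumed intrinsic matrix: aspect ratio, skew, principal point (cx, cy)."""
    return [[focal, skew, cx],
            [0.0, focal * aspect, cy],
            [0.0, 0.0, 1.0]]

def rotation_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(M, P):
    """Project a 3D point in camera coordinates; return (p, depth z)."""
    q = matvec(M, P)
    z = q[2]  # equals P[2] because the last row of M is (0, 0, 1)
    return [q[0] / z, q[1] / z, 1.0], z

def equation_1_sides(M, M2, R, t, P):
    """Return both sides of equation (1) for scene point P (camera-1 frame)."""
    p, z = project(M, P)
    # Assumed frame convention: P' = R P - t.
    P2 = [rp - ti for rp, ti in zip(matvec(R, P), t)]
    p2, z2 = project(M2, P2)
    lhs = [z2 * c for c in p2]                       # z' p'
    A = matmul(matmul(M2, R), inv3(M))               # M' R M^-1
    rhs = [z * ap - mt                               # z M' R M^-1 p - M' t
           for ap, mt in zip(matvec(A, p), matvec(M2, t))]
    return lhs, rhs

# Hypothetical example values.
M = intrinsics(800.0, 1.0, 0.0, 320.0, 240.0)
M2 = intrinsics(820.0, 1.02, 0.5, 310.0, 245.0)
R = rotation_y(0.1)
t = [0.5, 0.0, 0.0]
P = [0.3, -0.2, 5.0]

lhs, rhs = equation_1_sides(M, M2, R, t, P)
print("z'p'               =", lhs)
print("zM'RM^-1 p - M't   =", rhs)
```

The two printed vectors should agree up to floating-point rounding. The sketch also makes the coupling visible: the image measurements p and p' alone determine only the products M'RM^-1 and M't, not the intrinsic and extrinsic factors separately.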
There are several general methods for reconstruction. In one set of methods, the values of the various parameters in equation (1) are determined. In one such method the values of the internal parameters are determined by a separate and independent “internal camera calibration” procedure that relies on images of specialized patterns. In a second such method, more than two views of a scene are recorded and processed and the two sets of parameters are decoupled by assuming that the internal camera parameters are fixed for all views. One significant problem with the first approach (using a separate internal camera calibration step) is that even small errors in calibration lead to significant errors in reconstruction. The methods for recovering the extrinsic parameters following the internal calibration are known to be extremely sensitive to minor errors in image measurements and require a relatively large field of view in order to behave properly. In the second approach (using more than two views of a scene) the processing techniques are iterative based on an initial approximation, and are quite sensitive to that initial approximation.
Another set of methods does not require determining the values of the various parameters in equation (1). Instead, reconstruction is performed from an examination of various features of the scene which are present in images of the scene that are recorded from a plurality of diverse locations. All of these methods require that corresponding points and/or lines, that is, points and/or lines in the views which are projections of the same points and/or lines in the scene, be located in all of the three views. In some applications, locating corresponding points and/or lines in the three views can be difficult or impossible.
When the relative dimensions of the object are determined from the image, the conventional processes require an extra measurement to obtain the actual physical dimensions. The extra measurement can be performed utilizing a ruler positioned on the object, or through range measurement of the object from the camera. Ranging can be performed by utilizing stereovision, an ultrasonic sensor, a calibrated focusing mechanism, or a defocus measurement from images. However, stereovision and ultrasonic sensor techniques carry a high system cost; the calibrated focusing mechanism requires a relatively fast optical system and an adjustable focus system, which may not be available on all imaging systems; and the defocus measurement also requires a relatively fast optical system, as well as two images taken with different optical parameters.
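To illustrate why a range measurement is enough to convert image dimensions into physical dimensions, the following minimal sketch applies the similar-triangles relation of an ideal pinhole camera. It assumes the measured extent lies in a plane parallel to the image plane at the measured range, and it ignores lens distortion. The function name, parameter names, and numeric values are hypothetical.

```python
def physical_extent(pixel_extent, pixel_pitch_m, focal_length_m, range_m):
    """Estimate an object's physical extent (meters) from its image extent.

    Assumes an ideal pinhole camera and an object extent parallel to the
    image plane at distance `range_m` from the center of projection.
    """
    if focal_length_m <= 0 or range_m <= 0:
        raise ValueError("focal length and range must be positive")
    extent_on_sensor_m = pixel_extent * pixel_pitch_m
    # Similar triangles: object size / range = image size / focal length.
    return extent_on_sensor_m * range_m / focal_length_m

# Hypothetical values: 200 pixels wide, 5 micrometer pixels,
# 50 mm focal length, object 10 m from the camera.
width_m = physical_extent(200, 5e-6, 0.05, 10.0)
print("estimated width in meters:", width_m)
```

Without the range, the same image extent is consistent with a small nearby object or a large distant one, which is why the image alone yields only relative dimensions.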