Reconstruction of 3-D objects or scenes from a sequence of uncalibrated images is currently an active research topic in the fields of image processing and computer vision. Scene reconstruction techniques have gained much interest recently, partly because of a rapidly increasing number of scene reconstruction applications and partly because of the widespread use of digital cameras, film scanners, photographic scanners and other digitizing equipment. Examples of scene reconstruction applications include reconstruction of 3-D scenes and objects, scene determination and object localization for robot navigation, automatic construction of 3-D CAD models, creation of virtual reality environments and even generation of real-time 3-D views of dynamic scenes. For instance, a person could walk around his or her house with a camera, taking images from different views, and feed the images into a computerized 3-D scene builder to obtain a virtual 3-D visualization of the house.
The basic reconstruction problem can be formulated in the following way. Based on a sequence of uncalibrated images of a 3-dimensional scene or object taken by one or more cameras from different views, it is desired to recover the general 3-dimensional structure of the scene, as well as the position and orientation of the camera for each camera view. For simplicity, each unique position and orientation of the camera is often referred to as a “camera” in the scientific literature, although all the images may have been taken by a single camera, or even generated artificially by a computer. In the case of artificially generated computer images, each image view is associated with an “imaginary” camera, having a unique position and orientation.
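As an illustration of what a “camera” means here, the following minimal Python sketch assumes the standard pinhole model, in which each camera view is represented by a 3×4 matrix P = K[R | t] built from an intrinsic calibration matrix K, a rotation R and a translation t. The helper names `camera_matrix` and `project` and all numeric values are hypothetical; they only show how a homogeneous scene point is mapped to pixel coordinates by one camera view.

```python
import numpy as np


def camera_matrix(K, R, t):
    """Assemble a 3x4 pinhole camera matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Map a homogeneous scene point X (4-vector) to pixel coordinates."""
    x = P @ X
    return x[:2] / x[2]


# Hypothetical intrinsics: 800 px focal length, zero skew, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # camera orientation: aligned with the world axes
t = np.array([0.0, 0.0, 5.0])  # camera translation: world origin 5 units ahead
P = camera_matrix(K, R, t)

X = np.array([1.0, 0.5, 0.0, 1.0])  # scene point in homogeneous coordinates
print(project(P, X))  # → [480. 320.]
```

In the reconstruction problem the situation is reversed: only the pixel coordinates are observed, and K, R, t for every view as well as the scene points X must be recovered.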
In a typical approach to solving the reconstruction problem, a so-called projective reconstruction is first established based on image correspondences between the uncalibrated images, and the projective reconstruction is then successively refined into a “normal” Euclidean reconstruction. When starting from a sequence of uncalibrated images, the best initial reconstruction that can be obtained from image correspondences, i.e. from the identification of matching feature points in the images, is generally a projective reconstruction. A projective reconstruction is a configuration of scene points and cameras that differs from the true Euclidean configuration that was imaged by an unknown projective transformation. Like Euclidean and so-called affine transformations, the unknown projective transformation can translate and rotate the configuration of scene points and cameras, and, like an affine transformation, it can also skew it. However, a projective transformation can also move the plane at infinity, which means that parallelism is generally not preserved. In order to view a reconstruction of the scene in Euclidean space, the unknown projective transformation has to be determined. In practice, determining the unknown projective transformation has proven to be a very difficult and complex task.
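The projective ambiguity described above can be checked numerically. The sketch below assumes NumPy, a hypothetical two-camera configuration and an arbitrarily chosen invertible 4×4 matrix H; it replaces every camera P by P H⁻¹ and every scene point X by H X. The images are unchanged, yet H maps a point on the plane at infinity to a finite point, so parallelism is lost. The helpers `to_pixels` and `same_images` are illustrative only.

```python
import numpy as np

# Hypothetical Euclidean configuration: two cameras and four scene points
# (columns of X, in homogeneous coordinates).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([[0.0, 1.0, -1.0, 0.5],
              [0.0, 0.5, 0.3, -0.7],
              [5.0, 4.0, 6.0, 4.5],
              [1.0, 1.0, 1.0, 1.0]])

# An arbitrary invertible 4x4 projective transformation; its last row is not
# (0, 0, 0, 1), so it is not affine.
H = np.array([[1.0, 0.2, 0.0, 0.5],
              [0.0, 1.0, 0.1, 0.0],
              [0.0, 0.0, 1.0, 0.3],
              [0.1, 0.0, 0.05, 1.0]])
H_inv = np.linalg.inv(H)


def to_pixels(x):
    """Dehomogenize image points stored as columns of a 3xN array."""
    return x[:2] / x[2]


def same_images(P, H, X):
    """True if the camera P H^-1 sees the points H X exactly where P sees X."""
    return np.allclose(to_pixels(P @ X), to_pixels(P @ np.linalg.inv(H) @ (H @ X)))


print(all(same_images(P, H, X) for P in (P1, P2)))  # → True

# The x-axis direction (1, 0, 0, 0) lies on the plane at infinity; H maps it
# to a point with nonzero last coordinate, i.e. a finite point, so lines that
# were parallel to the x-axis now meet at a finite point.
print(H @ np.array([1.0, 0.0, 0.0, 0.0]))

# In the transformed configuration the plane playing the role of the plane at
# infinity is H^-T (0, 0, 0, 1), i.e. the last row of H^-1.
print(H_inv.T @ np.array([0.0, 0.0, 0.0, 1.0]))
```

Since any invertible H passes the image check, image correspondences alone cannot single out the Euclidean configuration.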
In the prior art, attempts have been made to determine the projective transformation by enforcing constraints on the camera views, requiring that the cameras all have the same intrinsic calibration. Although a general projective transformation does not change the reprojected images, it can severely distort the reconstruction into something that is not expected. The constraints imposed on the calibration facilitate the search for a plausible member of the family of possible reconstructions. The process of finding such a plausible reconstruction using constraints imposed on the calibration is generally referred to as auto-calibration or self-calibration, and is described for example in [1]. In an extension of the basic theory of auto-calibration it has been observed, for example in [2] and [3], that auto-calibration is possible under much looser assumptions, namely under the minimal assumption that the cameras have no skew, or that the pixels have a fixed or known aspect ratio. This makes auto-calibration possible on video sequences taken with a zooming camera.
Although the above auto-calibration procedures give satisfactory results in some cases, they often produce poor results and sometimes fail completely.
It has been observed that one of the main difficulties in auto-calibration is to find the true plane at infinity in the projective reconstruction, and it has therefore been proposed in references [4], [5] and [6] to impose additional constraints on the reconstruction by considering so-called cheirality. As defined in [5], object space is the 3-dimensional Euclidean space R³. Similarly, image space is the 2-dimensional Euclidean space R². Euclidean space R³ is embedded in a natural way in projective 3-space P³ by the addition of a plane at infinity. Similarly, R² may be embedded in projective 2-space P² by the addition of a line at infinity. The (n−1)-dimensional subspace at infinity in projective space Pⁿ is referred to as the plane at infinity, except where P² is specifically considered. The true plane at infinity p∞ (in other words, the plane to be mapped to infinity in Euclidean space) has a well-defined but initially unknown position in the projective reconstruction. As defined in reference [5], which provides a basic presentation of the concept and theory of cheirality, the property of a point that indicates whether it lies in front of or behind a given camera is generally referred to as the cheirality of the point with respect to the camera. The additional cheirality constraints imposed on the projective reconstruction require that all reconstructed scene points lie in front of the cameras that imaged them. This condition does not hold for an arbitrary projective reconstruction. By using the cheirality constraints, expressed in terms of so-called cheiral inequalities, for all points in the projective reconstruction, the search for the true plane at infinity can be narrowed down considerably. This is generally accomplished by first transforming the initial projective reconstruction into a so-called quasi-affine reconstruction of the scene points and cameras based on the given cheiral inequalities.
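A single cheiral inequality can be sketched as a sign test. The hypothetical helper `depth_sign` below uses the standard sign-of-depth expression for a finite camera P = [M | p4] and a homogeneous point X = (X, Y, Z, T): the point lies in front of the camera when det(M)·w·T > 0, where w is the third coordinate of P X. The example, with made-up numbers, applies a projective transformation that maps the plane Z = 4 to infinity and shows that a point which was in front of the camera in the Euclidean configuration violates the inequality in the transformed, image-equivalent reconstruction.

```python
import numpy as np


def depth_sign(P, X):
    """Cheirality of homogeneous point X with respect to camera P = [M | p4].

    Returns +1 if X is in front of the camera and -1 if it is behind, using
    sign(depth) = sign(det(M) * w * T), where w is the third coordinate of
    P X and T is the last coordinate of X. The result does not depend on the
    (possibly negative) scale factors of P and X.
    """
    w = (P @ X)[2]
    return int(np.sign(np.linalg.det(P[:, :3]) * w * X[3]))


# Hypothetical Euclidean configuration: camera [I | 0], points 2 and 6 units ahead.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
X_near = np.array([0.0, 0.0, 2.0, 1.0])
X_far = np.array([0.0, 0.0, 6.0, 1.0])
print(depth_sign(P, X_near), depth_sign(P, X_far))  # → 1 1
print(depth_sign(P, -X_far))  # rescaling X by -1 keeps the cheirality → 1

# A projective transformation mapping the plane Z = 4 to infinity. The
# transformed configuration P H^-1, H X produces identical images, but the
# far point now fails the cheiral inequality.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, -0.25, 1.0]])
P_proj = P @ np.linalg.inv(H)
print(depth_sign(P_proj, H @ X_near), depth_sign(P_proj, H @ X_far))  # → 1 -1
```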
A quasi-affine reconstruction of a scene is a projective reconstruction in which the reconstructed scene is not split by the plane at infinity.
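Whether a candidate plane splits the scene can likewise be sketched as a sign test on π·X over all reconstructed points. This minimal sketch assumes that the homogeneous points carry consistent signs (in the cheirality-based approach such signs can be fixed by the cheiral inequalities); with arbitrarily signed points the test is meaningless. The function name `splits_scene` and the numbers are hypothetical, reusing a transformation that maps the plane Z = 4 to infinity.

```python
import numpy as np


def splits_scene(plane, X):
    """True if `plane` (a 4-vector) separates the points in the columns of X
    or passes through one of them.

    Assumes the homogeneous points carry consistent signs, e.g. fixed by the
    cheiral inequalities; with arbitrary signs the test is meaningless.
    """
    s = plane @ X
    return not (np.all(s > 0) or np.all(s < 0))


# Hypothetical projective reconstruction: points originally at depths 2 and 6,
# transformed by a projective map that sends the plane Z = 4 to infinity.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, -0.25, 1.0]])
X_euclid = np.array([[0.0, 0.0],
                     [0.0, 0.0],
                     [2.0, 6.0],
                     [1.0, 1.0]])
X_proj = H @ X_euclid

# Treating (0, 0, 0, 1) as the plane at infinity in this frame splits the scene,
# so this projective reconstruction is not quasi-affine ...
print(splits_scene(np.array([0.0, 0.0, 0.0, 1.0]), X_proj))  # → True
# ... whereas the true plane at infinity, H^-T (0, 0, 0, 1), does not split it.
true_plane = np.linalg.inv(H).T @ np.array([0.0, 0.0, 0.0, 1.0])
print(splits_scene(true_plane, X_proj))  # → False
```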
Although the introduction of cheirality-based scene reconstruction methods constitutes a great advance in the field of auto-calibration, problems with regard to convergence and stability still remain.