Reconstructing 3D models from video sequences is an important problem in computer graphics with applications to recognition, medical imaging, video communications, etc. Although numerous algorithms exist that can reconstruct a 3D scene from two or more images using structure from motion (SfM) (R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000), the quality of such reconstructions is often insufficient. The main reasons are the poor quality of the input images and the lack of robustness of the reconstruction algorithms in dealing with it (J. Oliensis, “A critique of structure from motion algorithms,” Tech. Rep. http://www.neci.nj.nec.com/homepages/oliensis/, NECI, 2000).
The sensitivity and robustness of the existing reconstruction algorithms have been thoroughly analyzed. The work of Weng et al. (J. Weng, N. Ahuja, and T. S. Huang, Optimal motion and structure estimation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15:864–884, September 1993; J. Weng, T. S. Huang, and N. Ahuja. 3D motion estimation, understanding, and prediction from noisy image sequences. IEEE Trans. on Pattern Analysis and Machine Intelligence, 9:370–389, 1987) is one of the earliest instances of estimating the standard deviation of the reconstruction error using first-order perturbations in the input. The Cramer-Rao lower bounds on the estimation error variance of the structure and motion parameters from a sequence of monocular images were derived by Broida and Chellappa (T. J. Broida and R. Chellappa. Performance bounds for estimating three-dimensional motion parameters from a sequence of noisy images. Journal of the Optical Society of America, A., 6:879–889, 1989). Young and Chellappa derived bounds on the estimation error for structure and motion parameters from two images under perspective projection (G. S. Young and R. Chellappa, Statistical analysis of inherent ambiguities in recovering 3D motion from a noisy flow field. IEEE Trans. on Pattern Analysis and Machine Intelligence, 14:995–1013, October 1992) as well as from a sequence of stereo images (G. S. Young and R. Chellappa. 3D motion estimation using a sequence of noisy stereo images: Models, estimation, and uniqueness results. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(8):735–759, August 1990). Similar results were derived by Daniilidis and Nagel (K. Daniilidis and H. H. Nagel. The coupling of rotation and translation in motion estimation of planar surfaces. In Conference on Computer Vision and Pattern Recognition, pages 188–193, 1993), who also studied the coupling of translation and rotation for a small field of view. Daniilidis and Nagel have also shown that many algorithms for three-dimensional motion estimation, which work by minimizing an objective function leading to an eigenvector solution, suffer from instabilities (K. Daniilidis and H. H. Nagel. Analytic results on error sensitivity of motion estimation from two views. Image and Vision Computing, 8(4):297–303, November 1990). They examined the error sensitivity in terms of translation direction, viewing angle and distance of the moving object from the camera. Zhang's work (Z. Y. Zhang. Determining the epipolar geometry and its uncertainty: A review. International Journal of Computer Vision, 27:161–195, March 1998) on determining the uncertainty in the estimation of the fundamental matrix is another important contribution in this area. Chiuso, Brockett and Soatto (S. Soatto and R. Brockett. Optimal structure from motion: Local ambiguities and global estimates. In Conference on Computer Vision and Pattern Recognition, pages 282–288, 1998) have analyzed SfM in order to obtain provably convergent and optimal algorithms. Oliensis emphasized the need to understand algorithm behavior and the characteristics of the natural phenomenon that is being modeled (J. Oliensis. A critique of structure from motion algorithms. Technical Report http://www.neci.nj.nec.com/homepages/oliensis/, NECI, 2000).
Ma, Kosecka and Sastry (Y. Ma, J. Kosecka, and S. Sastry. Linear differential algorithm for motion recovery: A geometric approach. International Journal of Computer Vision, 36:71–89, January 2000) also addressed the issues of sensitivity and robustness in their motion recovery algorithm. Sun, Ramesh and Tekalp (Z. Sun, V. Ramesh, and A. M. Tekalp. Error characterization of the factorization method. Computer Vision and Image Understanding, 82:110–137, May 2001) proposed an error characterization of the factorization method for 3D shape and motion recovery from image sequences using matrix perturbation theory. Morris, Kanatani and Kanade extended the covariance-based uncertainty calculations to account for geometric indeterminacies, referred to in the literature as gauge freedom (D. D. Morris, K. Kanatani, and T. Kanade. 3D model accuracy and gauge fixing. Technical report, Carnegie-Mellon University, Pittsburgh, 2000).
However, despite this extensive body of research, the quality achieved by prior-art reconstruction techniques remains limited, resulting in a non-optimized model of the reconstructed object.
One particular application of 3D reconstruction from 2D images is in the area of modeling a human face from video. A successful solution to this problem has immense potential for applications in face recognition, surveillance, multimedia, etc. A few algorithms exist which attempt to solve this problem using a generic model of a face (P. Fua, “Regularized bundle-adjustment to model heads from image sequences without calibration data,” International Journal of Computer Vision, vol. 38, no. 2, pp. 153–171, July 2000; Y. Shan, Z. Liu, and Z. Zhang, “Model-based bundle adjustment with application to face modeling,” in International Conference on Computer Vision, 2001, pp. 644–651). Their typical approach is to initialize the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to the initial value given by the generic model, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. This method may produce acceptable results when the generic model has significant similarities with the particular face being reconstructed. However, if the features of the generic model are very different from those of the face being reconstructed, the solution from this approach may be highly erroneous.
An alternative approach to reconstructing a 3D model of an object such as a human face would therefore be highly desirable, namely one that generates a model retaining the specific features of the object or face even when these features differ from those of the generic model.