In computer graphics, synthetically constructing realistic human heads, particularly the face portion, remains a fundamental problem. Hereinafter, the terms ‘head’ and ‘face’ refer primarily to that portion of the head extending from chin to brow and from ear to ear. Most prior art methods require extensive manual labor by skilled artists, expensive active 3D scanners, see Lee et al., “Realistic Modeling for Facial Animations,” Proceedings of SIGGRAPH 95, pages 55-62, August, 1995, or the availability of high-quality texture images as a substitute for exact face geometry, see Guenter et al., “Making Faces,” Proceedings of SIGGRAPH 98, pages 55-66, July 1998, Lee et al., “Fast Head Modeling for Animation,” Image and Vision Computing, Vol. 18, No. 4, pages 355-364, March 2000, and Tarini et al., “Texturing Faces,” Proceedings Graphics Interface 2002, pages 89-98, May 2002.
Acquiring 3D models of human faces by active sensing requires costly scanning devices. Therefore, a number of techniques have been developed to recover the 3D shape of faces from 2D images or ‘projections’. Some of those methods are based on a direct approach, which obtains the 3D locations of reference points on the face using dense 2D correspondences between the images, see P. Fua, “Regularized bundle-adjustment to model heads from image sequences without calibration data,” International Journal of Computer Vision, 38(2), pp. 153-171, 2000, F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. Salesin, “Synthesizing realistic facial expressions from photographs,” Proceedings of SIGGRAPH 98, 1998, and Y. Shan, Z. Liu, and Z. Zhang, “Model-based bundle adjustment with application to face modeling,” Proceedings of ICCV 01, pp. 644-651, July 2001.
Other methods parameterize 3D face models and search for the parameters that best describe the 2D input images, see V. Blanz and T. Vetter, “Face recognition based on fitting a 3D morphable model,” PAMI, 25(9), 2003, J. Lee, B. Moghaddam, H. Pfister, and R. Machiraju, “Silhouette-based 3D face shape recovery,” Proc. of Graphics Interface, pp. 21-30, 2003, and B. Moghaddam, J. Lee, H. Pfister, and R. Machiraju, “Model-based 3D face capture using shape-from-silhouettes,” Proc. of Advanced Modeling of Faces & Gestures, 2003.
In either case, the number of viewpoints and 2D input images is an important parameter for high quality 3D model reconstruction. Intuitively, the more input images that are taken from different viewpoints, the higher the quality of the 3D model and subsequent reconstructions. However, more images also increase processing time and the cost of equipment.
If an optimal set of viewpoints can be determined, then a smaller number of cameras can be used, and the resulting 2D images provide better 3D modeling accuracy.
Up to now, a systematic method for determining the optimal number of viewpoints, and thus input images, for the purpose of constructing a 3D model of a face has not been available. It would also be advantageous to automatically select specific images from a sequence of images in a video, the selected images corresponding to optimal viewpoints, to improve face recognition.
It is known that different objects have different prototype or aspect viewpoints, C. M. Cyr and B. B. Kimia, “Object recognition using shape similarity-based aspect graph,” Proc. of ICCV, pp. 254-261, 2001.
It is desired to determine a canonical set of optimal viewpoints for a specific class of objects with notably high intra-class similarity, specifically the human face.
When dealing with illumination alone, it is possible to determine empirically an optimal configuration of nine point sources of light which span a generic subspace of faces under variable illumination, see K. Lee, J. Ho, and D. Kriegman, “Nine points of light: Acquiring subspaces for face recognition under variable lighting,” Proc. of CVPR, pp. 519-526, 2001.
It is desired to solve a related problem for subject pose, or equivalently camera viewpoint. That is, it is desired to determine an optimal set of K viewpoints corresponding to a spatial configuration of K cameras that best describe a 3D human face by way of projections from the viewpoints, i.e., shape silhouettes in 2D images.
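The selection problem stated above can be illustrated as a subset search: given a finite set of candidate viewpoints, choose the K of them that minimize some reconstruction cost. The sketch below is only an illustration under stated assumptions, not the method of the invention. The names `best_viewpoint_subset`, `toy_cost`, and `CANDIDATES` are hypothetical, viewpoints are assumed to be (azimuth, elevation) pairs in degrees, and the cost is a simple angular-coverage placeholder standing in for a real silhouette-based reconstruction error.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a viewpoint is (azimuth, elevation) in degrees.
Viewpoint = Tuple[float, float]


def best_viewpoint_subset(
    candidates: Sequence[Viewpoint],
    k: int,
    cost: Callable[[Tuple[Viewpoint, ...]], float],
) -> Tuple[Viewpoint, ...]:
    """Exhaustively search all K-subsets and return the one with lowest cost."""
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    return min(combinations(candidates, k), key=cost)


def toy_cost(subset: Tuple[Viewpoint, ...]) -> float:
    """Placeholder cost (an assumption, NOT a silhouette error).

    Sums, over a set of probe azimuths, the angular distance to the
    nearest selected viewpoint, so well-spread subsets score lower.
    """
    probes = range(-90, 91, 30)  # -90, -60, ..., 90 degrees
    return sum(min(abs(p - az) for az, _ in subset) for p in probes)


# Hypothetical candidate cameras placed on a horizontal arc around the face.
CANDIDATES = [(-90, 0), (-45, 0), (0, 0), (45, 0), (90, 0)]

if __name__ == "__main__":
    print(best_viewpoint_subset(CANDIDATES, 2, toy_cost))
```

Exhaustive search is used here only because it is the simplest correct instance of subset selection. The number of K-subsets grows combinatorially with the number of candidates, so a practical system would likely need a cheaper search strategy and a cost derived from actual silhouette comparisons.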