1. Field of the Invention
The present invention is a system and method for human face modeling from multiple images of the face using demographics classification for an improved model fitting process.
2. Background of the Invention
Three-dimensional (3D) modeling of human faces from intensity images is an important problem in the field of computer vision and graphics. Applications of such an automated system range from virtual teleconferencing to face-based biometrics. In virtual teleconferencing applications, face models of participants are used for rendering scenes at remote sites, so that only incremental information needs to be transmitted at each time instant. Traditional face recognition algorithms are primarily based on two-dimensional (2D) cues computed from an intensity image. The 2D facial features provide strong cues for recognition. However, they cannot capture the semantics of the face completely, especially the anthropometrical measurements. Typical examples of such measurements are the relative length of the nose bridge and the width of the eye, and the perpendicular distance of the tip of the nose from the plane passing through the eye centers and the face center.
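The anthropometrical measurements named above can be illustrated with a minimal sketch. It assumes that 3D landmark coordinates are already available as (x, y, z) tuples in a common unit. The function names, the landmark names, and the sample coordinates are hypothetical and are used for illustration only. They are not taken from the invention or from any cited reference.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(dot(a, a))

def nose_tip_depth(left_eye, right_eye, face_center, nose_tip):
    """Perpendicular distance of the nose tip from the plane passing
    through the two eye centers and the face center (hypothetical landmarks)."""
    # The plane normal is the cross product of two in-plane vectors.
    normal = cross(sub(right_eye, left_eye), sub(face_center, left_eye))
    length = norm(normal)
    if length == 0.0:
        raise ValueError("eye centers and face center are collinear")
    # Project the vector from a point on the plane to the nose tip onto the unit normal.
    return abs(dot(sub(nose_tip, left_eye), normal)) / length

def nose_bridge_to_eye_width_ratio(bridge_top, bridge_bottom, eye_inner, eye_outer):
    """Length of the nose bridge relative to the width of one eye."""
    return norm(sub(bridge_bottom, bridge_top)) / norm(sub(eye_outer, eye_inner))

# Illustrative coordinates only (not measured data).
depth = nose_tip_depth((-30.0, 0.0, 0.0), (30.0, 0.0, 0.0),
                       (0.0, -40.0, 0.0), (0.0, -35.0, 25.0))
ratio = nose_bridge_to_eye_width_ratio((0.0, 0.0, 0.0), (0.0, -40.0, 30.0),
                                       (15.0, 0.0, 0.0), (40.0, 0.0, 0.0))
```

Both quantities depend on the depth coordinate of the landmarks, which is why a single 2D intensity image, taken alone, does not determine them.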
The technique discussed by Aizawa and Huang in “Model-Based Image Coding: Advanced Video Coding Techniques for Very Low Bit-Rate Application,” Proceedings IEEE, vol. 83, pp. 259-271, August 1995, adjusts meshes to fit the images from a continuous video sequence. In a surveillance scenario, we may have only the key frames from a single-camera or multiple-camera system for specific time instances. Thus the computation of optical flow between consecutive image frames captured by each of the cameras will not be possible.
The technique discussed by Jebara and Pentland in “Parameterized structure from motion for 3D adaptive feedback tracking of faces,” Proceedings Computer Vision and Pattern Recognition, pp. 144-150, June 1997, also uses optical flow computed from consecutive frames in a video to compute the model.
Fua and Miccio in “Animated Heads from Ordinary Images: A Least-Squares Approach,” Computer Vision and Image Understanding, vol. 75, No. 3, pp. 247-259, September 1999, use a technique based on stereo matching for face modeling. Under multiple-camera surveillance, the camera system may not be calibrated properly, because the cameras can be moved around whenever required. Thus, the assumption that the calibration parameters are known, on which stereo-based techniques in particular rely, breaks down.
U.S. Pat. No. 6,556,196 describes a morphable model technique which requires a frontal shot of the face. Single-view-based modeling approaches work well with cooperative subjects, where the entire frontal view of the face is available. Again, in video surveillance, it may be difficult to control the posture of the subject's face.
U.S. Pat. No. 6,016,148 discusses a method of mapping a face image to a 3D model. The 3D model is fixed and general. No knowledge of the demographics of the person is used, and this mapping can be erroneous, especially when a single generic model is used for any race or gender.
U.S. Pat. No. 5,748,199 discusses a method of modeling three-dimensional scenes from a video by using techniques similar to structure from motion. This technique would not be successful if a continuous video feed is not provided to the system. A similar modeling technique is discussed in U.S. Pat. No. 6,047,078. U.S. Pat. No. 6,492,986 combines optical flow with deformable models for face modeling. As before, these techniques will not be successful when there is no continuous video stream.
U.S. Pat. No. 5,818,959 discusses a method similar to space carving for generating three-dimensional models from images. Although these images need not be from continuous video sources, the cameras need to be calibrated a priori. Camera calibration is not a trivial task, especially for portable camera systems.