Very low bit-rate communication channels, such as wireless telephone lines, require extremely high compression rates. One particular need is in video conferencing. It is generally assumed that traditional, signal-processing-based compression schemes will not be sufficient to achieve such high compression rates. The alternative is to exploit as much of the domain knowledge shared by the sender and receiver as possible, and to send only the information that is specific to the particular scene and/or situation, from which the receiving end can reconstruct the visual information.
A number of previous works in the area of model-based video coding are relevant to the invention described here. (See, for example, K. Aizawa and T. S. Huang, "Model-based image coding: Advanced video coding techniques for very low bit rate applications," Proceedings of the IEEE, 83(2):259-271, February 1995; Demetri Terzopoulos and Keith Waters, "Analysis and synthesis of facial image sequences using physical and anatomical models," IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(6):569-579, June 1993; and Haibo Li, Pertti Roivainen, and Robert Forchheimer, "3-D motion estimation in model-based facial image coding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(6):545-555, June 1993.) Two works that are closely tied to the current invention are the eigenface coding by Pentland et al. (see Baback Moghaddam and Alex Pentland, "An automatic system for model-based coding of faces," in Proc. of the IEEE Data Compression Conference, Snowbird, Utah, March 1995) and the feature-based facial model fitting by Li-An Tang of the University of Illinois (see Li-An Tang, Human Face Modeling, Analysis and Synthesis, Ph.D. thesis, Electrical Engineering Department, University of Illinois at Urbana-Champaign, Urbana, Ill., 1996). In eigenface coding, the coding is performed on the images themselves and no further 3D modeling is involved. In facial-feature-based model fitting, a facial model is fit, but the image texture-mapped onto the face is the original full-face image. Many of the other previous works either operate in 2D, use no texture mapping (see Irfan A. Essa and Alex P. Pentland, "Facial expression recognition using a dynamic model and motion energy," in International Conference on Computer Vision '95, Cambridge, Mass., June 1995), or use texture mapping with original images or sub-images.
It is highly desirable to send a facial image in a highly compressed manner.