The present invention relates in general to video display devices, and, more particularly, to an apparatus and method for transmitting graphical representations, such as an image of a head.
Humans communicate using several signals, for example, voice, facial expressions, and, to a lesser extent, hand and body movements. In face-to-face conversation, all of these signals are perceived and contribute to the communication. On a non-visual communications device, such as a phone, the visual signals are lost. Video devices may transmit the signals, but require high bandwidth. Thus, devices with low bandwidth, such as cell phones, are not able to transmit these signals.
One method to achieve low bit-rate communication is model-based visual communications, as described by K. Aizawa and T. S. Huang, Model-Based Image Coding: Advanced Video Coding Techniques for Very Low Bit-Rate Applications, Proceedings of IEEE, 82(2), 259-271 (February 1995). Model-based visual communications uses a model of an object of interest at both ends of the communication link, so that only model parameters are transmitted. By using a model of a human head, only parameters that describe the static and dynamic state of the head are transmitted with each frame instead of full video. This method, however, results in an unconvincing representation of the head at the receiving display.
Human communication is multi-modal. Emotion and meaning are transmitted via facial expressions, using the eyebrows, eyes, and the mouth, as described by Paul Ekman and Wallace Friesen, Unmasking the Face, Prentice Hall, Inc., Englewood Cliffs, N.J., 1975. Eyebrow movements give rise to forehead wrinkles, eye movements give rise to wrinkles and cheek movements, and mouth movements affect the jaw line and cheeks. In order to convey facial expressions convincingly, the synthesized face must contain motion that is similar to the original. This operation requires a system to track the eyebrows, eyes, and mouth. Previous attempts have involved systems that require a high bandwidth to transmit these signals. Existing systems utilize optical flow with a high computational cost, as described by Malcolm Davis and Mihran Tuceryan, Coding of Facial Image Sequences by Model-Based Optical Flow, Proceedings of the 1997 Int'l Workshop on Synthetic-Natural Hybrid Coding and 3D Imaging, at 192-194 (September 1997), and Douglas DeCarlo and Dimitris Metaxas, The Integration of Optical Flow and Deformable Models with Applications to Human Face Shape and Motion Estimation, Proceedings CVPR 96, at 231-238 (1996).
From the foregoing, it may be appreciated that a need has arisen for a method for extracting, transmitting, and displaying a graphical representation with reduced bandwidth requirements and increased optical flow.
In accordance with an embodiment, a method for transmitting and displaying graphical representations comprises capturing an image of a head with an eye portion that correlates to an eye in the head, locating a target image of the eye within the image with a non-updating tracker, positioning the eye portion at the location of the target image of the eye, and, if the non-updating tracker is unable to locate the eye target image, then locating an updated eye target image with an updating tracker, positioning the eye portion at the location of the updated eye target image, and updating the location of the eye portion.
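By way of illustration only, the fallback between the two trackers may be sketched as follows. The sketch is not the claimed implementation and rests on several assumptions that do not appear in the foregoing description: the non-updating tracker is taken to match a fixed eye template captured from the first frame, the updating tracker is taken to match a template refreshed from the most recently located eye, a match is scored by the sum of squared pixel differences, and a tracker is deemed unable to locate the eye when that score exceeds a hypothetical threshold `max_error`. The names `EyeTracker`, `best_match`, and `crop` are likewise hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale frame as rows of pixel values
Point = Tuple[int, int]   # (row, col) of a template's top-left corner


def crop(image: Image, top_left: Point, height: int, width: int) -> Image:
    """Copy a height x width patch whose top-left corner is at top_left."""
    r, c = top_left
    return [row[c:c + width] for row in image[r:r + height]]


def best_match(image: Image, template: Image) -> Tuple[Point, int]:
    """Exhaustively search the image (assumed at least as large as the
    template) and return the position with the smallest sum of squared
    differences, together with that error."""
    th, tw = len(template), len(template[0])
    best_pos, best_err = (0, 0), None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            err = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best_err is None or err < best_err:
                best_pos, best_err = (r, c), err
    return best_pos, best_err


class EyeTracker:
    """Tries a fixed-template match first, then a refreshed-template match."""

    def __init__(self, first_frame: Image, eye_pos: Point,
                 size: Tuple[int, int], max_error: int) -> None:
        self.size = size
        self.max_error = max_error  # assumed failure threshold
        # Template for the non-updating tracker: captured once, never changed.
        self.fixed_template = crop(first_frame, eye_pos, *size)
        # Template for the updating tracker: refreshed after every success.
        self.updated_template = self.fixed_template
        self.eye_pos = eye_pos      # current location of the eye portion

    def track(self, frame: Image) -> Tuple[Point, str]:
        """Return the eye position and the name of the tracker that found it."""
        pos, err = best_match(frame, self.fixed_template)
        used = "non-updating"
        if err > self.max_error:
            # Non-updating tracker failed: fall back to the updating tracker.
            pos, err = best_match(frame, self.updated_template)
            used = "updating"
            if err > self.max_error:
                return self.eye_pos, "lost"  # keep the last known position
        self.eye_pos = pos
        self.updated_template = crop(frame, pos, *self.size)
        return pos, used
```

In this sketch the eye portion of the model would be positioned at the point returned by `track` for each frame. Trying the fixed template first is a design choice of the sketch: a template that is never refreshed cannot accumulate drift, so the refreshed template is consulted only when the appearance of the eye has changed too much for the fixed template to match.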