1. Technical Field of the Invention
The present invention relates to videotelephony and more particularly to image compression for videotelephony.
2. Description of Related Art
At present, transferring live video streams over public networks remains a challenge.
A good compromise between quality and bit rate is difficult to reach. For applications such as videotelephony, subjective quality is sensitive to the fluidity of the video stream. At a rate below 20 pictures per second, the discontinuity becomes disturbing, particularly in the movement of the lips.
In the field of videotelephony, an important problem is ensuring that the sound and the image emitted by a speaker reach a listener with the correct relative timing.
Principal component analysis applied to images, principal component analysis applied to compression, and independent component analysis applied to images are known in the prior art.
In videoconferencing as well as in live two-way television links, problems of synchronization and fluidity are often easy to observe. Indeed, the images and the sound are coded differently and sent independently.
U.S. Pat. No. 5,907,351 discloses a method and apparatus for cross-modal predictive coding for talking head sequences. According to this disclosure the audio signal is constantly transmitted to the receiver and is also used to create a predicted image of the lips of the talking head.
The actual lip image is compared to the predicted lip image. Based upon this comparison, it is determined which of three signals is to be transmitted to the receiver: no signal corresponding to the video signal, a signal corresponding only to the difference between the actual lip image and the predicted lip image, or the actual lip image.
The receiver reconstructs a lip image based upon the audio signal received and the signal received, if any, corresponding to the video image, and inserts it into the previously received video frame or modifies the previous frame accordingly.
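The three-way decision and the matching reconstruction described above can be illustrated with a minimal sketch. It is not taken from U.S. Pat. No. 5,907,351: the flattened grayscale representation of the lip region, the mean absolute error criterion, the two threshold values, and the names `choose_signal` and `reconstruct` are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: the lip region as a flat list of gray levels.
Image = List[int]
Signal = Tuple[str, Optional[List[int]]]


def mean_abs_error(actual: Image, predicted: Image) -> float:
    """Average absolute pixel difference between two equally sized images."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def choose_signal(actual: Image, predicted: Image,
                  skip_threshold: float = 2.0,
                  full_threshold: float = 30.0) -> Signal:
    """Sender side: pick one of the three signals (thresholds are assumed)."""
    error = mean_abs_error(actual, predicted)
    if error <= skip_threshold:
        # Prediction from audio is good enough: send no video signal.
        return ("none", None)
    if error <= full_threshold:
        # Send only the difference between actual and predicted lip image.
        return ("difference", [a - p for a, p in zip(actual, predicted)])
    # Prediction is too poor: send the actual lip image.
    return ("full", list(actual))


def reconstruct(predicted: Image, signal: Signal) -> Image:
    """Receiver side: rebuild the lip image from its own prediction."""
    mode, payload = signal
    if mode == "none":
        return list(predicted)
    if mode == "difference":
        return [p + d for p, d in zip(predicted, payload)]
    return list(payload)
```

In this sketch the receiver is assumed to compute the same predicted lip image from the audio signal as the sender, so that a difference signal is sufficient to recover the actual image exactly.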
Up to now, synchronization has not been controlled, and in the case of videoconferencing, bandwidth limitations prevent a refresh rate rapid enough to give the system an appearance of fluidity.
Starting from the principle that rapid refreshing and good synchronization are not needed for the whole image, but only for some main areas such as the mouth or the eyes, the applicant had the idea of finding compression methods that are specialized for such areas, and of modeling said areas.
Improving the compression of such a zone of the image makes it possible to reduce the desynchronization between that zone and the sound, and thus to increase fluidity.