The Field of the Invention
The present invention relates to systems and methods for reducing a bit rate of a video sequence. More particularly, the present invention relates to systems and methods for reducing a bit rate of a video sequence by replacing original texture of the video sequence with synthesized texture at the encoder.
Background and Relevant Art
One of the goals of transmitting video sequences over computer networks is to have a relatively low bit rate while still maintaining a high quality video at the decoder. As technology improves and becomes more accessible, more users are leaving the realm of 56K modems and moving to Digital Subscriber Lines (DSL), including VDSL and ADSL, which support a higher bit rate than 56K modems. VDSL, for example, supports bit rates up to 28 Mbits/second, but the transmission distance is limited. The maximum transmission distance for a 13 Mbits/second bit rate is 1.5 km using VDSL. ADSL, on the other hand, can support longer distances using existing loops while providing a bit rate of approximately 500 kbits/second.
Video standards, such as MPEG-2, MPEG-4, and ITU H.263, can achieve bit rates of 3 to 9 Mbits/second, 64 kbits to 38.4 Mbits/second, and 8 kbits to 1.5 Mbits/second, respectively. Even though video sequences with bit rates of hundreds of kbits/second can be achieved using these standards, the visual quality of these video sequences is unacceptably low, especially when the content of the video sequences is complex.
Solutions to this problem use model-based analysis-synthesis compression methods. Model-based analysis-synthesis compression methods perform both analysis and synthesis at the encoder to modify parameters in order to minimize the error between the synthesized model and the original. The resulting parameters are transmitted to the decoder, which is required to synthesize the model again for the purpose of reconstructing the video sequence.
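By way of illustration only, the analysis-synthesis loop described above may be sketched as follows. The sketch assumes a hypothetical two-parameter model (a linear intensity ramp) and an exhaustive search over candidate parameters. The names `synthesize`, `encode`, and `decode` are illustrative and are not taken from any particular method.

```python
from itertools import product

def synthesize(params, length):
    """Hypothetical model: a linear intensity ramp, gain * i + offset."""
    gain, offset = params
    return [gain * i + offset for i in range(length)]

def squared_error(a, b):
    """Sum of squared differences between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(original, gains, offsets):
    """Analysis and synthesis at the encoder: synthesize every candidate
    and keep the parameters whose output is closest to the original."""
    n = len(original)
    return min(product(gains, offsets),
               key=lambda p: squared_error(synthesize(p, n), original))

def decode(params, length):
    """The decoder must run the same synthesis again from the parameters."""
    return synthesize(params, length)

original = [10, 12, 14, 16, 18]
params = encode(original, gains=range(0, 5), offsets=range(0, 21))
reconstructed = decode(params, len(original))
```

Only `params` would be transmitted in such a scheme, which is why the decoder is required to repeat the synthesis step.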
Most model-based analysis-synthesis compression methods have focused on modeling human head-and-shoulders objects, while fewer attempts have been made to model background objects. The focus on human head-and-shoulders objects arises because in many applications, such as videoconferencing applications, the background is very simple. However, background modeling may also achieve a significant reduction of the bit rate, as the bit rate of I (intra) frames is often dependent on the texture content of each picture. To a lesser extent, the bit rate of B (bi-directionally predicted) frames and P (predicted) frames is also affected by texture content as moving objects uncover additional background objects.
One proposal for reducing the bit rate is to use sprite methods on the background objects. Sprites are panoramic pictures constructed using all of the background pixels that are visible over a set of video frames. Instead of coding each frame, the sprite is compressed and transmitted. The background image can be reconstructed using the sprite and associated camera motion parameters. Sprite methods require exact object segmentation at the encoder, which is often a difficult task for complex video sequences. In addition, the motion or shape parameters that are transmitted with the sprite consume some of the available bit rate. These limitations may be addressed by filtering the textured areas. Unfortunately, different filters must be designed for various textures.
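By way of illustration only, the sprite construction and reconstruction described above may be sketched as follows. The sketch makes several simplifying assumptions: each frame is a single scanline, camera motion is a pure horizontal pan given by an integer offset, and foreground pixels have already been segmented out and marked as `None`. The names `build_sprite` and `reconstruct` are hypothetical.

```python
def build_sprite(frames, offsets, width):
    """Paste the visible background pixels of every frame into one panorama."""
    sprite = [None] * width
    for frame, off in zip(frames, offsets):
        for i, px in enumerate(frame):
            if px is not None:  # None marks a pixel hidden by a foreground object
                sprite[off + i] = px
    return sprite

def reconstruct(sprite, offset, frame_width):
    """Recover one frame's background from the sprite and its pan offset."""
    return sprite[offset:offset + frame_width]

# Three frames of a camera panning right by one pixel per frame.
frames = [[1, 2, None, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
offsets = [0, 1, 2]
sprite = build_sprite(frames, offsets, width=6)
background0 = reconstruct(sprite, offsets[0], 4)
```

In this sketch the background pixel occluded in the first frame is filled in from a later frame, and the per-frame offsets stand in for the camera motion parameters that accompany the sprite.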
Texture replacement has also been proposed as a method of background modeling. In one example, the original texture is replaced with another texture that is selected from a set of textures. However, this requires that the set of replacement textures be stored at the encoder. In another example, the texture of selected regions is replaced at the encoder with pixel values that represent an “illegal” color in the YUV color space. At the decoder, the processed regions are recovered using chroma keying. There is an explicit assumption that texture synthesis, using texture parameters sent from the encoder, followed by mapping of the synthesized texture onto the decoded video sequences, is performed at the decoder. This method therefore assumes that the reconstruction is performed at the decoder using a method that is dependent on the decoder's processing capabilities. The drawbacks of these approaches are that the processing capabilities of the decoder are assumed and that the computational costs of the decoding stage are increased.
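By way of illustration only, the chroma-keying example above may be sketched as follows. The sketch assumes that an "illegal" color is a YUV triple that falls outside the RGB gamut under a full-range BT.601 conversion. The key value, the tolerance, and the function names are hypothetical, and the synthesized texture is taken as given rather than generated from transmitted parameters.

```python
KEY = (0, 255, 255)  # hypothetical key color chosen for this sketch

def yuv_to_rgb(y, u, v):
    """Full-range BT.601 (JFIF-style) YUV to RGB conversion, unclamped."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return r, g, b

def is_legal(yuv):
    """A color is treated as legal if it maps inside the 8-bit RGB cube."""
    return all(0 <= c <= 255 for c in yuv_to_rgb(*yuv))

def mark_region(frame, mask):
    """Encoder side: overwrite the selected region with the key color."""
    return [[KEY if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]

def recover(frame, texture, tol=16):
    """Decoder side: replace near-key pixels with the synthesized texture.
    The tolerance allows for distortion introduced by lossy coding."""
    def is_key(px):
        return all(abs(a - b) <= tol for a, b in zip(px, KEY))
    return [[texture[r][c] if is_key(px) else px
             for c, px in enumerate(row)]
            for r, row in enumerate(frame)]

gray = (128, 128, 128)
frame = [[gray, gray], [gray, gray]]
mask = [[True, False], [False, True]]
texture = [[(90, 120, 130)] * 2 for _ in range(2)]
marked = mark_region(frame, mask)
recovered = recover(marked, texture)
```

Because the key color cannot occur in natural content under the stated assumption, the decoder can locate the processed regions without a separately transmitted mask. The texture synthesis and mapping nevertheless remain decoder-side work in such a scheme.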