1. Field of the Invention
The present invention relates generally to methods and apparatuses for video compression. More specifically, the present invention describes a method for efficiently encoding and decoding quantized sequences in video compression systems based on the principle of coding with side information available only at the decoder, hereafter referred to as Wyner-Ziv video coding systems.
2. Background Description
Conventional video compression systems, as standardized by MPEG, rely on a complex, sophisticated encoder that exploits the statistical correlation among neighboring video frames to achieve good compression performance. In emerging applications like video surveillance, mobile multimedia, video conferencing, video gaming, and battlefield video communications, however, a simple, low-cost encoder with low computational complexity is instead desired. In an effort to reduce encoding computational complexity, one approach proposed recently is to apply the principle of Wyner-Ziv coding to shift the computational load from the encoder to the decoder.
Briefly speaking, in Wyner-Ziv coding, the decoder has access to side information that is not available to the encoder, yet the encoder can still exploit the existence of that side information to achieve greater compression than would otherwise be possible. Therefore, with the objective of achieving very low encoding complexity, Wyner-Ziv video coding systems exploit the statistical correlation among neighboring video frames only at the decoder, and thus relieve the encoder of significant computational load.
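The principle of coding with side information available only at the decoder can be illustrated with a toy integer example. This is only a sketch under stated assumptions and is not the method of any of the systems discussed here: the function names wz_encode and wz_decode and the parameter num_cosets are hypothetical, the source symbol x is assumed to be an integer, and the side information y is assumed to satisfy |x - y| < num_cosets / 2. The encoder transmits only the coset index of x; the decoder selects the member of that coset closest to y.

```python
def wz_encode(x: int, num_cosets: int) -> int:
    """Encoder side: send only the coset index of x. No access to y is needed."""
    return x % num_cosets


def wz_decode(coset: int, y: int, num_cosets: int) -> int:
    """Decoder side: choose the member of the coset nearest to side information y.

    Assumption: |x - y| < num_cosets / 2, otherwise the wrong member is chosen.
    """
    # Members of the coset are coset + k * num_cosets for integer k.
    k = round((y - coset) / num_cosets)
    return coset + k * num_cosets


# Example: x = 37 lies in the range 0..255 (8 bits if sent directly).
# With 8 cosets the encoder sends only 3 bits; y = 35 is the decoder's estimate.
index = wz_encode(37, 8)
recovered = wz_decode(index, 35, 8)
print(index, recovered)  # prints "5 37"
```

The encoder in this sketch performs a single modulo operation, while the work of resolving the ambiguity is carried by the decoder, which mirrors the shift of computational load described above.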
FIG. 2 shows a typical Wyner-Ziv video compression system. In general, a Wyner-Ziv video compression system consists of a video encoder 235, which compresses (or encodes) a video signal 205 into a compressed video frame 255, and a video decoder 245, which decompresses (or decodes) the compressed video frame 255 to produce a reconstructed video frame 275. At any time instant, a video frame V 205 is to be encoded by the encoder 235. Since the decoder 245 has access to the previously decoded frame 285, it can generate prior knowledge 290 about V 205 from the previously decoded frame 285, and use this knowledge in the decoding process 250. Being aware of the existence of the prior knowledge 290 about V 205 at the decoder 245, the encoder 235 can transmit fewer bits, and thus achieve greater compression, than would otherwise be possible.
A brief description of the typical encoding process is as follows. The encoder first compresses V 205 conventionally by using a discrete cosine transform (DCT) 210 and quantization 220 (equivalent to the intra mode transform and quantization in MPEG coding). The resultant signal x 225 is called the quantized sequence, and takes values in a discrete set.
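The transform and quantization steps can be sketched as follows. This is a minimal illustration only, under assumptions not stated in the text: it applies an orthonormal two-dimensional DCT-II to a single 8x8 block and uses a uniform scalar quantizer with a single hypothetical parameter step, whereas MPEG intra coding uses per-coefficient quantization matrices. The function names dct_1d, dct_2d, and quantize are likewise hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [horizontal][vertical]; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]


def quantize(coeffs, step):
    """Uniform scalar quantization (assumed); returns integer indices."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat 8x8 block has all of its energy in the DC coefficient.
block = [[100] * 8 for _ in range(8)]
x = quantize(dct_2d(block), step=16)
print(x[0][0])  # prints "50"
```

The integer indices returned by quantize play the role of the quantized sequence x 225: each takes a value in a discrete set.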
Previous methods for encoding 230 the quantized sequence x 225 have been described by Pradhan and Ramchandran, "Distributed source coding using syndromes (DISCUS): design and construction," IEEE Transactions on Information Theory, 2003; Aaron and Girod, "Wyner-Ziv video coding with low-encoder complexity," Proc. Picture Coding Symposium, PCS 2004, San Francisco, Calif., 2004; Xu and Xiong, "Layered Wyner-Ziv video coding," Proc. VCIP'04: Special Session on Multimedia Technologies for Embedded Systems, San Jose, Calif., 2004; Sehgal, Jagmohan, and Ahuja, "A state-free video encoding paradigm," Proc. IEEE Int. Conf. Image Processing, 2003; and Puri and Ramchandran, "PRISM: A new robust video coding architecture based on distributed compression principles," Proc. of 40th Allerton Conference on Communication, Control, and Computing, Allerton, Ill., 2002.
The general process of encoding 230 the quantized sequence x 225 adopted in these methods is shown in FIG. 3. More specifically, in these methods, the signal x 225 is first binarized 310 into a set of binary streams 315. Correspondingly, the statistical model 240 representing the statistical relationship between x 225 (at the encoder 235) and y 295 (at the decoder 245) is decomposed 350 into a set of binary models 355, each of which corresponds to a binary stream 315 obtained from x 225. From each binary model 355, a binary code (e.g. 320, 330, . . . 360) is then generated, and used to encode the corresponding binary stream 315.
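One simple way to binarize a quantized sequence is bit-plane decomposition, sketched below. The cited methods do not necessarily binarize in this way; the bit-plane mapping, the restriction to non-negative integer symbols, and the names binarize, debinarize, and num_bits are assumptions made purely for illustration.

```python
def binarize(x, num_bits):
    """Split non-negative integer symbols into num_bits binary streams.

    Stream 0 holds the most significant bit of every symbol, and the last
    stream holds the least significant bit.
    """
    return [[(s >> b) & 1 for s in x] for b in range(num_bits - 1, -1, -1)]


def debinarize(streams):
    """Inverse of binarize: reassemble the symbols from their bit planes."""
    out = [0] * len(streams[0])
    for plane in streams:
        out = [(v << 1) | bit for v, bit in zip(out, plane)]
    return out


x = [5, 0, 3, 7]
streams = binarize(x, 3)
print(streams)              # prints "[[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 1, 1]]"
print(debinarize(streams))  # prints "[5, 0, 3, 7]"
```

Even in this small sketch, each symbol is visited once per bit plane, and a separate binary code would still have to be derived for every resulting stream.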
The chief drawback of these methods is that the binarization 310 of x 225 and the decomposition 350 of the statistical model 240 add computational complexity to encoding, and complicate the code generation process. The additional encoding computational complexity is particularly undesirable as the main objective of Wyner-Ziv video compression systems is to reduce encoding complexity.
Note that although the side information y 295 is not assumed to be available at the encoder 235, the encoder 235 needs to know the statistical relationship between x 225 and y 295, as reflected in the statistical model 240, in order to encode x 225. To keep encoding complexity low, the statistical model in a Wyner-Ziv video compression system should be estimated using computationally efficient methods. The description of such methods, however, is not relevant to the present invention. Hence, we shall simply assume that the statistical model 240 is known at the encoder 235 and at the decoder 245.