1. Field of the Invention
The present invention relates to video encoding and decoding apparatus and methods applicable in, for example, distributed video coding systems.
2. Description of the Related Art
Distributed video coding (DVC) is a new coding method that has attracted much recent attention. It is based on two key results in information theory, the Slepian-Wolf theorem and the Wyner-Ziv theorem, which showed that correlated data could be compressed as efficiently by two independent encoders as by one encoder, provided that the data are decoded jointly.
In DVC coding, the sequence of video frames is divided into key frames and non-key frames. The non-key frames are often referred to as Wyner-Ziv frames or WZ frames. The sequence of key frames is coded by a conventional coding method, and the coded data are sent to the decoder. The sequence of Wyner-Ziv frames is coded independently by a method that generates error-correcting information, generally referred to as parity bits, and only the parity bits, or only some of them, are sent to the decoder. A general feature of DVC coding systems is that they reduce the processing load on the encoder.
A basic DVC coding method is described by Aaron et al. in ‘Transform-Domain Wyner-Ziv Codec for Video’, Proc. SPIE Visual Communications and Image Processing, 2004. In the encoder, the key frames are coded as intraframes. A discrete cosine transform (DCT) is used to transform each Wyner-Ziv frame to the coefficient domain, the coefficients are grouped into bands, the coefficients in the k-th band are quantized by a 2^(M_k)-level quantizer, the quantized coefficients (q_k) are expressed in fixed numbers of bits, and the bit planes are extracted and supplied to a Slepian-Wolf encoder that uses a punctured turbo code to produce data bits and parity bits. The data bits are discarded (as implied but not explicitly shown by Aaron et al. in FIG. 1 of the above reference).
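The quantization and bit-plane extraction steps described above can be illustrated with a minimal sketch. It assumes a uniform quantizer over a fixed symmetric range, which is a simplification chosen for illustration and is not necessarily the quantizer used by Aaron et al. The names quantize_uniform, extract_bit_planes, num_bits (standing for M_k) and max_abs are hypothetical. The DCT and the turbo coder are omitted.

```python
def quantize_uniform(coeffs, num_bits, max_abs):
    """Map each coefficient to an index in [0, 2**num_bits - 1].

    Assumption: a uniform quantizer over [-max_abs, +max_abs].
    """
    levels = 1 << num_bits              # 2^(M_k) quantization levels
    step = 2.0 * max_abs / levels       # width of one quantization bin
    indices = []
    for c in coeffs:
        idx = int((c + max_abs) / step)
        # Clamp values that fall on or outside the range limits.
        indices.append(min(max(idx, 0), levels - 1))
    return indices


def extract_bit_planes(indices, num_bits):
    """Return the bit planes of the indices, most significant plane first."""
    return [[(q >> b) & 1 for q in indices]
            for b in range(num_bits - 1, -1, -1)]


band = [-8.0, -1.0, 0.0, 3.5, 7.9]      # illustrative coefficients of one band
q = quantize_uniform(band, num_bits=2, max_abs=8.0)
planes = extract_bit_planes(q, num_bits=2)
```

In this sketch each list in planes would be supplied separately to the Slepian-Wolf encoder, so that one band with M_k bits per coefficient yields M_k bit planes.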
The decoder decodes the key frames, uses the decoded key frames to generate a predicted image for each Wyner-Ziv frame, applies a DCT to convert the predicted image to the coefficient domain, groups the coefficients into bands, and inputs the coefficients in each band as side information to a Slepian-Wolf decoder. The Slepian-Wolf decoder uses parity bits received from the encoder to correct prediction errors in the side information by an iterative process, in which the decoder initially receives a subset of the parity bits and may request further parity bits as required. When a satisfactory decoded result is obtained, an inverse discrete cosine transform (IDCT) is applied to reconstruct the image of the Wyner-Ziv frame.
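The request loop in this iterative process can be sketched as follows. Only the control flow is shown: try_decode is a hypothetical stand-in for the Slepian-Wolf decoder, and stub_decoder is a placeholder that simply succeeds once it holds enough parity bits. Neither is taken from the cited reference, and chunk_size is an assumed parameter.

```python
def decode_with_requests(side_info, encoder_parity, try_decode, chunk_size):
    """Decode one bit plane, requesting parity bits in chunks until success.

    Returns (decoded, ok, requests), where requests counts the extra
    parity requests sent back to the encoder over the feedback channel.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # The decoder initially receives only a subset of the parity bits.
    received = list(encoder_parity[:chunk_size])
    requests = 0
    while True:
        ok, decoded = try_decode(side_info, received)
        # Stop on success, or when the encoder has no more parity bits.
        if ok or len(received) >= len(encoder_parity):
            return decoded, ok, requests
        start = len(received)
        received.extend(encoder_parity[start:start + chunk_size])
        requests += 1


def stub_decoder(side_info, parity):
    """Placeholder decoder (assumption): succeeds with 5 or more parity bits."""
    return len(parity) >= 5, list(side_info)


decoded, ok, requests = decode_with_requests(
    side_info=[0, 1, 1, 0],
    encoder_parity=[1, 0, 1, 1, 0, 0, 1, 0],
    try_decode=stub_decoder,
    chunk_size=2,
)
```

Each pass through the loop that ends in a request corresponds to one round trip on the feedback channel.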
A problem with this method is that since the key frames are coded as intraframes, they cannot be coded efficiently.
A proposed solution to this problem, described by Liu et al. in ‘Backward Channel Aware Wyner-Ziv Video Coding’, Proc. IEEE International Conference on Image Processing, Atlanta, Ga., October 2006, is to have the encoder perform interframe coding of the key frames, using motion estimation information supplied from the decoder on a feedback channel. The encoder can then perform efficient interframe coding with motion compensation, without having to perform the computationally intensive motion estimation processing.
The feedback channel is also used in the encoder described by Aaron et al., when the decoder requests further parity bits.
A basic problem with the use of a feedback channel is that in some applications, no feedback channel is available. Another problem is that if a feedback channel is used to request further parity bits, generating and sending the successive requests for more parity bits takes extra time and delays the decoding process.
In ‘Encoder Rate Control for Transform Domain Wyner-Ziv Video Coding’, ICIP 2007, Brites et al. describe a DVC system that does not use a feedback channel. Instead, the encoder estimates the number of parity bits that the decoder will need for adequate decoding of each Wyner-Ziv frame by performing limited motion estimation, generating a predicted image, and comparing this predicted image with the original image. The encoder then sends the decoder the estimated number of parity bits without having to be asked for them.
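The idea of estimating the parity requirement at the encoder can be illustrated with a simplified sketch. It is not the estimation method of Brites et al. As a stand-in, it assumes that the mismatch between a bit plane of the original image and the same bit plane of the encoder's predicted image behaves like a binary symmetric channel, and it uses the binary entropy of the measured mismatch rate as the per-bit rate, following the Slepian-Wolf bound. The function names and the margin parameter are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary source with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def estimate_parity_bits(original_plane, predicted_plane, margin=1.2):
    """Estimate how many parity bits to send for one bit plane.

    Assumption: the mismatch rate p between the two planes models the
    decoder's prediction error, so about n * H(p) bits are needed;
    margin is a safety factor because practical codes fall short of
    the theoretical bound.
    """
    if len(original_plane) != len(predicted_plane):
        raise ValueError("bit planes must have the same length")
    n = len(original_plane)
    if n == 0:
        return 0
    mismatches = sum(a != b for a, b in zip(original_plane, predicted_plane))
    p = mismatches / n
    return math.ceil(margin * n * binary_entropy(p))


original = [0, 1, 1, 0, 1, 0, 0, 1]
predicted = [0, 1, 0, 0, 1, 0, 1, 1]    # differs in two of eight positions
parity_count = estimate_parity_bits(original, predicted)
```

A poor prediction raises the mismatch rate and therefore the estimate, while a perfect prediction gives an estimate of zero. Because the encoder's prediction comes from limited motion estimation, it may differ from the decoder's, which is one reason a safety margin is included in this sketch.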
The methods proposed by Liu et al. and Brites et al. could in theory be combined to improve the efficiency of both key frame encoding, by performing interframe coding, and Wyner-Ziv decoding, by eliminating the need for the decoder to request additional parity bits. This hypothetical combination, however, would still require a feedback channel to supply motion information from the decoder to the key frame encoder.