1. Field of the Invention
The present invention relates to the encoding and decoding of moving picture sequences and is applicable in, for example, distributed video coding.
2. Description of the Related Art
Distributed video coding (DVC) is a new coding method that has attracted much recent attention. It is based on two key results in information theory, the Slepian-Wolf theorem and the Wyner-Ziv theorem, which showed that correlated data could be compressed as efficiently by two independent encoders as by one joint encoder, provided that the encoded data are decoded jointly.
In DVC coding, a sequence of video frames is divided into key frames and non-key frames, the latter often referred to as Wyner-Ziv frames or WZ frames. The sequence of key frames is coded by a conventional intraframe or interframe coding method, and the coded data are sent to the decoder. The sequence of WZ frames is coded independently by a method that generates error-correcting information, generally referred to as parity bits, and only the parity bits, or only some of them, are sent to the decoder.
A basic DVC coding method is described by Aaron et al. in ‘Transform-Domain Wyner-Ziv Codec for Video’, Proc. SPIE Visual Communications and Image Processing, 2004. In the encoder, a discrete cosine transform (DCT) is used to transform each Wyner-Ziv frame to the coefficient domain, the coefficients are grouped into bands, the coefficients in the k-th band are quantized by a 2^(M_k)-level quantizer, the quantized coefficients (q_k) are expressed in fixed numbers of bits, and the bit planes are extracted and supplied to a Slepian-Wolf encoder, which is a type of encoder that produces data bits and parity bits. The parity bits are stored in a buffer for transmission to the decoder. The data bits are discarded (as implied but not explicitly shown by Aaron et al. in FIG. 1 of the above reference).
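The quantization and bit-plane extraction steps described above can be illustrated with a minimal sketch. It is not taken from Aaron et al.: the uniform quantizer, the clipping range max_abs, and the function names quantize_band and extract_bit_planes are assumptions made for illustration only, and the DCT and band grouping are assumed to have been performed already.

```python
def quantize_band(coeffs, m_bits, max_abs):
    """Quantize one band to 2**m_bits levels (assumed uniform quantizer).

    Coefficients are assumed to lie in [-max_abs, max_abs); values outside
    that range are clipped to the nearest valid index.
    """
    levels = 1 << m_bits
    step = 2.0 * max_abs / levels
    indices = []
    for c in coeffs:
        q = int((c + max_abs) // step)
        q = min(max(q, 0), levels - 1)  # clip to a valid quantizer index
        indices.append(q)
    return indices


def extract_bit_planes(indices, m_bits):
    """Return m_bits bit planes, most significant plane first.

    Each plane holds one bit per quantized coefficient in the band.
    """
    return [[(q >> b) & 1 for q in indices]
            for b in range(m_bits - 1, -1, -1)]


if __name__ == "__main__":
    band = [-8.0, -1.0, 0.0, 7.9]          # hypothetical band coefficients
    q = quantize_band(band, m_bits=2, max_abs=8.0)
    for plane in extract_bit_planes(q, m_bits=2):
        print(plane)
```

In this sketch each bit plane would be passed separately to the Slepian-Wolf encoder, which is outside the scope of the example.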
The decoder decodes the key frames by conventional methods, uses the decoded key frames to generate a predicted image for each Wyner-Ziv frame, applies a DCT to convert the predicted image to the coefficient domain, groups the coefficients into bands, and inputs the coefficients in each band as side information to a Slepian-Wolf decoder. The Slepian-Wolf decoder uses the parity bits received from the encoder to correct prediction errors in the side information by an iterative process. When a satisfactory decoded result is obtained, an inverse discrete cosine transform (IDCT) is applied to reconstruct the image of the Wyner-Ziv frame.
In the encoder described by Aaron et al., the Slepian-Wolf encoder uses a punctured turbo code to produce the parity bits, initially sends the decoder a subset of the parity bits, and sends further subsets, if necessary, on request from the decoder. A problem with this scheme is that it requires a feedback channel from the decoder to the encoder, so it is inapplicable when no feedback channel is available. Another problem is that generating and sending the successive requests for more parity bits takes extra time and delays the decoding process.
In an alternative scheme, described by Morbee et al. in ‘Improved Pixel-Based Rate Allocation For Pixel-Domain Distributed Video Coders Without Feedback Channel’, ICIVS 2007, the encoder generates a predicted image of its own for each Wyner-Ziv frame, compares this predicted image with the original image in the Wyner-Ziv frame, thereby estimates the number of parity bits that will be required for accurate decoding of the Wyner-Ziv frame, and sends this number of parity bits without having to be asked for them by the decoder. This eliminates the need for a feedback channel and avoids the delays associated with repeated requests.
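The idea of estimating the required number of parity bits at the encoder can be illustrated with a simplified sketch. It is not the rate allocation method of Morbee et al.: it assumes that the mismatch between one bit plane of the original image and the same bit plane of the encoder's predicted image behaves like a binary symmetric channel, so that the Slepian-Wolf bound for the plane is n times the binary entropy of the mismatch rate. The function names and the safety factor margin are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def estimate_parity_bits(original_plane, predicted_plane, margin=1.2):
    """Estimate the parity bits needed to decode one bit plane.

    Assumption: mismatches between the two planes are modeled as a binary
    symmetric channel, so about n * H(p) bits are needed in theory.
    margin is a hypothetical safety factor for a practical code.
    """
    n = len(original_plane)
    if n == 0:
        return 0
    mismatches = sum(o != p for o, p in zip(original_plane, predicted_plane))
    p = mismatches / n
    return math.ceil(margin * n * binary_entropy(p))


if __name__ == "__main__":
    original = [0, 1, 1, 0, 1, 0, 0, 1]
    predicted = [0, 1, 0, 0, 1, 0, 0, 1]   # one mismatch in eight bits
    print(estimate_parity_bits(original, predicted))
```

A practical encoder would need a larger allowance than the theoretical bound, since real turbo or LDPC codes do not reach it; the margin parameter stands in for that allowance here.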
DVC systems can also be improved by having the encoder supply the decoder with extra information to expedite the Slepian-Wolf decoding process. If the encoder generates a predicted image, for example, then it can supply the decoder with correlation information indicating how closely the predicted image is correlated with the original image.
A problem with this scheme is that DVC coding and decoding will be incorporated into many types of devices produced by many different manufacturers. The methods of generating predicted images are likely to differ considerably, depending on the manufacturer and the cost of the device. Since a principal reason for using DVC is to reduce the processing load on the encoder, it is also likely that the encoder will use a simple prediction method that requires comparatively little processing, while the decoder uses a more elaborate prediction method to obtain a better predicted image, in order to minimize the number of parity bits required in the decoding process.
As a result, the encoder will tend to generate a predicted image that differs more from the original image than does the predicted image generated by the decoder. The encoder may therefore underestimate the correlation between the original image and the decoder's predicted image. If the decoder operates according to the underestimated correlation information supplied by the encoder, it will have a tendency to assume that predicted values that are actually correct are incorrect. This mistaken assumption will take additional decoding iterations to correct, delaying the convergence of the decoding process.
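The underestimation described above can be illustrated numerically. The sketch below assumes, as is common in the DVC literature but not stated in this description, that the prediction residual follows a zero-mean Laplacian distribution with parameter alpha, whose variance is 2/alpha^2, so that a larger alpha indicates a closer correlation. The function name laplacian_alpha and the sample values are hypothetical.

```python
import math


def laplacian_alpha(original, predicted):
    """Estimate the Laplacian parameter alpha of the prediction residual.

    Assumption: residual ~ Laplacian with variance 2 / alpha**2, so
    alpha = sqrt(2 / mean squared error). Larger alpha means the predicted
    values are more closely correlated with the original values.
    """
    n = len(original)
    mse = sum((o - p) ** 2 for o, p in zip(original, predicted)) / n
    if mse == 0.0:
        return math.inf  # perfect prediction
    return math.sqrt(2.0 / mse)


if __name__ == "__main__":
    original = [10.0, 20.0, 30.0, 40.0]
    encoder_pred = [12.0, 18.0, 32.0, 38.0]  # simple, less accurate prediction
    decoder_pred = [11.0, 19.0, 31.0, 39.0]  # elaborate, more accurate prediction

    alpha_enc = laplacian_alpha(original, encoder_pred)
    alpha_dec = laplacian_alpha(original, decoder_pred)
    # The encoder-side estimate is the smaller of the two, i.e. the
    # correlation it reports is lower than what the decoder actually has.
    print(alpha_enc, alpha_dec)
```

With these sample values the encoder-side estimate of alpha is half the decoder-side value, which corresponds to the situation in which the decoder is told to trust its own side information less than it should.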