1. Field of the Invention
The present invention relates to a video encoder apparatus and a video decoder apparatus, and more particularly to such apparatus for use in video encoding and decoding employing distributed video coding (DVC).
2. Description of the Background Art
In recent years, a relatively new coding scheme called distributed video coding has come to attention. Such a scheme is taught by Anne Aaron, et al., “Transform-domain Wyner-Ziv Codec for Video”, Proceedings of SPIE Visual Communications and Image Processing, San Jose, Calif., 2004. In the DVC solution, an encoder encodes original image data by Slepian-Wolf encoding, and a decoder forms a predictive image of the original image and uses it in Slepian-Wolf decoding together with the encoded data to thereby restore the original image data.
On the encoder side, an original image to be encoded, in the form of Wyner-Ziv frames, is transformed into a transform coefficient domain, i.e. subjected to a discrete cosine transform (DCT). The transformed data are then quantized for each band by a 2^Mk-level quantizer into binary values qk, from which bit planes are extracted, each bit plane forming information on, for example, one frame of image. The information is in turn subjected to Slepian-Wolf encoding by a turbo encoder. Of the resultant data, the parity bits are temporarily stored in a buffer, whereas the remaining bits are discarded. This procedure is not explicitly illustrated in Aaron, et al.
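The quantization and bit plane extraction described above can be sketched as follows. This is a minimal illustration and not the codec of Aaron, et al.: it assumes a uniform quantizer over a symmetric range, and the names quantize_band, extract_bit_planes and max_abs are hypothetical.

```python
import numpy as np

def quantize_band(coeffs, m_bits, max_abs):
    """Uniformly quantize one coefficient band into 2**m_bits levels.

    Assumption: coefficients lie in [-max_abs, max_abs); the source does not
    specify the quantizer design.
    """
    levels = 2 ** m_bits
    step = 2.0 * max_abs / levels
    # Shift to [0, 2*max_abs), then take the bin index; clip guards the upper edge.
    idx = np.floor((np.asarray(coeffs, dtype=float) + max_abs) / step)
    return np.clip(idx, 0, levels - 1).astype(int)

def extract_bit_planes(symbols, m_bits):
    """Return one array per bit plane, most significant plane first."""
    return [(symbols >> b) & 1 for b in range(m_bits - 1, -1, -1)]

band = [-7.5, -0.2, 0.3, 6.9]          # illustrative coefficients of one band
q = quantize_band(band, m_bits=2, max_abs=8.0)
planes = extract_bit_planes(q, m_bits=2)
```

Each array in planes would then be handed to the Slepian-Wolf (turbo) encoder as one block of input bits.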
On the decoder side, a predictive image is formed by interpolation/extrapolation, and the DCT is performed on the predictive image to thereby transform the image into a transform coefficient domain. The transforming results, namely, the obtained coefficients are delivered as side information for each band to a Slepian-Wolf decoder, i.e. turbo decoder. The Slepian-Wolf decoder in turn requests the encoder to transmit some of the parity bits temporarily stored, and then uses the supplied parity bits as well as the side information to perform the Slepian-Wolf decoding. If the decoding does not work adequately, the Slepian-Wolf decoder requests the encoder again to additionally retransmit some of the parity bits, and then executes the Slepian-Wolf decoding by means of the resupplied parity bits and the side information. This procedure is carried on until the decoding is sufficiently performed. The decoded values obtained by the Slepian-Wolf decoding and the side information are used to reconstruct transform coefficients, and then an inverse transform, or inverse DCT, is carried out on the coefficients to thereby obtain a decoded image.
In the common DVC solution as typically presented by Anne Aaron, et al., in order to perform the Wyner-Ziv frame coding or decoding, the Wyner-Ziv frame encoder transmits some of the error correction codes to the Wyner-Ziv frame decoder. Upon receipt of the error correction codes sent from the Wyner-Ziv frame encoder, the Wyner-Ziv frame decoder executes the error correction. If the amount of the received error correction codes is not sufficient for the error correction, the Wyner-Ziv frame decoder requests the Wyner-Ziv frame encoder to transmit additional error correction codes. The Wyner-Ziv frame encoder in turn transmits the additional error correction codes, the procedure being repeated until the Wyner-Ziv frame decoder can adequately perform the error correction. Such a DVC technique involves the feedback of requesting a retransmission of error correction codes, resulting in a delay in the coding process. Furthermore, the encoder and the decoder cannot operate independently of each other.
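The feedback procedure described above can be sketched as a simple request loop. The sketch is illustrative only: try_decode is a hypothetical stand-in for the Slepian-Wolf decoder, and the chunk size and the success criterion are assumptions not taken from Aaron, et al.

```python
def decode_with_feedback(parity_buffer, try_decode, chunk_size):
    """Request parity bits chunk by chunk until try_decode reports success.

    Returns the bits received and the number of requests (round trips) made.
    """
    received = []
    requests = 0
    while len(received) < len(parity_buffer):
        # One round trip: the decoder asks, the encoder sends the next chunk.
        start = len(received)
        received.extend(parity_buffer[start:start + chunk_size])
        requests += 1
        if try_decode(received):
            break
    return received, requests

# Hypothetical stand-in: decoding "succeeds" once 5 parity bits have arrived.
parity = [1, 0, 1, 1, 0, 0, 1, 0]
got, round_trips = decode_with_feedback(
    parity, lambda bits: len(bits) >= 5, chunk_size=2)
```

Every pass through the loop corresponds to one request sent back to the encoder, which is the source of the delay noted above.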
In order to dispense with such a retransmission request procedure, Catarina Brites, et al., “Encoder Rate Control for Transform Domain Wyner-Ziv Video Coding”, ICIP 2007, discloses a solution in which a Wyner-Ziv frame encoder calculates the amount of error correction codes required for an error correction. More specifically, the Wyner-Ziv frame encoder forms its own predictive image as an estimate of the predictive image that would be formed on the Wyner-Ziv frame decoder side, and estimates the fallibility of that predictive image. From this estimate, the amount of the error correction codes required for the error correction to be carried out on the decoder side is calculated. A solution which does not require feedback can thereby be achieved.
The amount of error correction codes to be transmitted, i.e. the transmission code amount, is determined by thinning out the error correction codes encoded by the Slepian-Wolf coding. In this solution, a plurality of thinning patterns is prepared, from which a specific thinning pattern suitable for obtaining the code amount required for error correction is selected to thereby determine the code amount to be transmitted. Taking as an example a series of thinning patterns in which the minimum rate for thinning out the error correction codes is 1/48, 48 thinning patterns are provided, namely, the patterns thinning out the error correction codes at the rates of 1/48, 2/48, . . . 48/48, from which the one thinning pattern to be used for transmission will be selected. For instance, if the amount of error correction codes to be transmitted is determined to be 0.04 bits, then the thinning pattern for thinning out the error correction codes to 2/48 will be selected because this pattern transmits 2/48 bits of information content to the Wyner-Ziv frame decoder. In this case, the Slepian-Wolf encoder thins out the error correction codes to 2/48, and then supplies information on the thinning pattern 2/48 together with the thinned error correction signal to the Wyner-Ziv frame decoder. Upon receipt of the information on the thinning pattern, the Wyner-Ziv frame decoder treats the received error correction signal as having been thinned out by that pattern to thereby perform the Slepian-Wolf decoding.
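The pattern selection in the example above can be sketched as follows. The selection rule is an assumption: the sketch picks the smallest rate k/48 that is not below the required amount, which reproduces the 0.04 to 2/48 example but is not stated explicitly in the source. The name select_thinning_pattern is hypothetical.

```python
import math

def select_thinning_pattern(required_rate, denominator=48):
    """Return k such that k/denominator is the smallest rate >= required_rate.

    Assumption: rates are expressed per source bit and lie in [0, 1].
    """
    if not 0.0 <= required_rate <= 1.0:
        raise ValueError("required_rate must lie in [0, 1]")
    k = math.ceil(required_rate * denominator)
    return max(k, 1)  # the lowest available pattern is 1/denominator

k = select_thinning_pattern(0.04)   # 0.04 * 48 = 1.92, rounded up to 2
```

With the default denominator of 48, the returned k identifies the pattern k/48 whose index would be signalled to the decoder together with the thinned codes.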
With regard to the formation of a predictive image, a method of motion estimation and compensation is disclosed by, for example, Joao Ascenso, et al., “Improving Frame Interpolation With Spatial Motion Smoothing for Pixel Domain Distributed Video Coding”, 5th EURASIP Conference on Speech and Image Processing, Multimedia communications and Services, July 2005.
The method for estimating the transmission code amount by Catarina Brites, et al., is based upon the assumption that the distribution of the difference in the transform coefficients between a predictive image and an original image conforms to a Laplace distribution, which is used to determine the fallibility of the transform coefficients of the predictive image. In the method, therefore, depending on the values of the transform coefficients of a predictive image, a large amount of error correction codes to be transmitted may sometimes be calculated even when no errors exist between the predictive image and the original image.
The conventional method for estimating the amount of transmission error correction codes will be described with reference to FIGS. 6 and 7A to 7C. FIG. 6 schematically illustrates how to calculate a conditional probability pn, and FIGS. 7A, 7B and 7C are graphs plotting the probability pn for the purpose of describing the drawbacks in the conventional method.
The transform coefficient yn at a point n of a predictive image corresponds to the transform coefficient xn of an original image, whose higher, or more significant, bits up to the (j−1)-th bit are assumed to have been identified, where j is a natural number. Then, a determination will be made on a conditional probability pn that the j-th bit of the transform coefficient xn is equal to unity.
As shown in FIG. 6, when the Laplace distribution centering on the transform coefficients of the predictive image is used, the probability pn can be derived from the following expression:
pn = (area of range where j-th bit is 1) / (sum of areas of ranges where j-th bit is 0 or 1)
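The area ratio above can be evaluated in closed form from the Laplace cumulative distribution function. The following is a minimal sketch under stated assumptions: an unsigned uniform quantizer with bin width step, bits counted from the most significant bit (j = 1), and a Laplace parameter alpha. The names prob_bit_is_one, known_prefix, step and alpha are hypothetical and do not appear in the cited references.

```python
import math

def laplace_cdf(x, mu, alpha):
    """CDF of the Laplace density (alpha / 2) * exp(-alpha * |x - mu|)."""
    if x < mu:
        return 0.5 * math.exp(-alpha * (mu - x))
    return 1.0 - 0.5 * math.exp(-alpha * (x - mu))

def prob_bit_is_one(y, known_prefix, j, m_bits, step, alpha):
    """Probability that the j-th bit of x is 1, given y and bits 1..j-1.

    known_prefix is the integer value of the already identified bits.
    """
    span = (2 ** (m_bits - j + 1)) * step   # width of the range fixed by the prefix
    lo = known_prefix * span
    mid, hi = lo + span / 2.0, lo + span    # lower half: bit 0, upper half: bit 1
    area_zero = laplace_cdf(mid, y, alpha) - laplace_cdf(lo, y, alpha)
    area_one = laplace_cdf(hi, y, alpha) - laplace_cdf(mid, y, alpha)
    return area_one / (area_zero + area_one)

# 2-bit quantizer over [0, 4): first bit, no prefix known yet.
p_high = prob_bit_is_one(y=3.5, known_prefix=0, j=1, m_bits=2, step=1.0, alpha=2.0)
p_edge = prob_bit_is_one(y=2.0, known_prefix=0, j=1, m_bits=2, step=1.0, alpha=2.0)
```

Here p_high is close to 1 because y lies well inside the range where the bit is 1, whereas p_edge is 0.5 because y sits exactly on the boundary between the two ranges.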
On the basis of the probability pn, a conditional entropy is determined. Consequently, the ambiguity of the j-th bit at the time the transform coefficient yn of the predictive image is obtained can be determined. It represents the degree of ambiguity in estimating the j-th bit of the original image based on the transform coefficients of the predictive image, which can be regarded as the fallibility of the predictive image.
The obtained conditional entropy is averaged across the unit of processing, such as the length of the bit planes, or the entire frame in the case of Catarina Brites, et al. On the basis of the average, the fallibility or ambiguity of the transform coefficients of the predictive image across the unit of processing is estimated to calculate the amount of error correction codes to be transmitted. In this calculation, if the value pn is close to 0 or 1, that is, either bit 0 or bit 1 appears with a strong bias, the conditional entropy takes a lower value as shown in FIG. 7A. On the contrary, if the value pn is closer to 0.5, that is, bits 0 and 1 appear more evenly, the ambiguity increases, so that the conditional entropy takes a higher value as shown in FIG. 7B.
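The behavior described above can be illustrated with the standard binary entropy function. The sketch assumes that the conditional entropy for each coefficient is the binary entropy of pn, which the source does not spell out, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable that equals 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def average_conditional_entropy(probabilities):
    """Mean of the per-coefficient entropies over one unit of processing."""
    return sum(binary_entropy(p) for p in probabilities) / len(probabilities)

biased = average_conditional_entropy([0.01, 0.99, 0.02])  # pn near 0 or 1
even = average_conditional_entropy([0.5, 0.45, 0.55])     # pn near 0.5
```

The biased set averages to roughly 0.1 bit per coefficient, while the even set averages to nearly 1 bit, which corresponds to the low and high cases described for FIGS. 7A and 7B.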
The value pn is defined according to the transform coefficients of a predictive image. When a transform coefficient of the predictive image is close to a value at which its quantized and digitized bit changes from 1 to 0 or vice versa, the higher value is taken regardless of whether or not errors exist in the coefficients between the predictive image and the original image, as shown in FIG. 7C.
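The drawback can be reproduced numerically with a single decision threshold. The sketch below is a simplification: it considers only one bit whose value flips at threshold, models xn as Laplace distributed around yn with a hypothetical parameter alpha, and uses the standard binary entropy. None of these names or values come from the cited references.

```python
import math

def entropy_at_threshold(y, threshold, alpha):
    """Entropy of the bit [x >= threshold] when x is Laplace distributed around y."""
    d = threshold - y
    # P(x >= threshold) from the Laplace tail, split by the side on which y lies.
    if d > 0:
        p = 0.5 * math.exp(-alpha * d)
    else:
        p = 1.0 - 0.5 * math.exp(alpha * d)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

near = entropy_at_threshold(y=8.01, threshold=8.0, alpha=1.0)  # y beside the boundary
far = entropy_at_threshold(y=12.0, threshold=8.0, alpha=1.0)   # y deep inside a bin
```

The estimate depends only on yn and never on the actual xn, so near comes out at almost 1 bit even if the original coefficient is also exactly 8.01 and the predictive image contains no error at that point.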
As described above, depending on the values of the transform coefficients of a predictive image, a large amount of error correction codes to be transmitted may sometimes be calculated even when no errors exist between the predictive image and the original image.