The present invention relates to a video image decoding apparatus, a video image decoding program, and a video image encoding system. The present invention may be applied to, for example, an apparatus, a program, and a system that use a Distributed Video Coding (DVC) method based on the Slepian-Wolf theorem and the Wyner-Ziv theorem.
X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov and M. Ouaret, “The Discover Codec: Architecture, Techniques and Evaluation”, in Picture Coding Symposium, 2007, vol. 2007, pp. 6-9 (hereinafter referred to as “Non-Patent Document 1”) is a representative example of an article describing a video image encoding apparatus and a video image decoding apparatus that encode and decode video images based on the Slepian-Wolf theorem and the Wyner-Ziv theorem.
The video image decoding apparatus described in Non-Patent Document 1 includes a key frame decoder that inputs a key stream and outputs key frames that have been decoded (hereinafter referred to as “decoded key frames”) and a WZ frame decoder that inputs a WZ stream (where WZ is an abbreviation for “Wyner-Ziv”) and outputs WZ frames that have been decoded (hereinafter referred to as “decoded WZ frames”). In the WZ frame decoder, a predicted image generating unit inputs decoded key frames and generates a predicted image, and a WZ decoding unit carries out WZ decoding on the WZ stream while using the inputted predicted image as side information (supplementary information) to obtain a decoded WZ frame.
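The data flow through the WZ frame decoder described above can be summarized in a minimal Python sketch. It is not the apparatus of Non-Patent Document 1: the class name `WZFrameDecoder` and the callables `predict` and `wz_decode` are hypothetical placeholders, frames are plain lists of pixel rows, and the actual WZ decoding (decoding the WZ stream with the predicted image as side information) is left to whatever function is plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Frame = List[List[int]]  # a frame as rows of pixel values


@dataclass
class WZFrameDecoder:
    """Hypothetical wiring of the units in a WZ frame decoder."""
    # predicted image generating unit: decoded key frames -> predicted image
    predict: Callable[[List[Frame]], Frame]
    # WZ decoding unit: (WZ stream, side information) -> decoded WZ frame
    wz_decode: Callable[[bytes, Frame], Frame]
    # frame buffer of the predicted image generating unit
    frame_buffer: List[Frame] = field(default_factory=list)

    def push_key_frame(self, decoded_key_frame: Frame) -> None:
        # decoded key frames from the key frame decoder become reference images
        self.frame_buffer.append(decoded_key_frame)

    def decode(self, wz_stream: bytes) -> Frame:
        side_info = self.predict(self.frame_buffer)  # predicted image
        return self.wz_decode(wz_stream, side_info)  # decoded WZ frame
```

Keeping the two units as injected callables reflects the point of the architecture: the WZ decoding unit only ever sees the predicted image, so the prediction method can change without touching the WZ decoding itself.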
The predicted image generating unit includes a frame buffer and generates predicted images using, for example, bidirectional motion compensated interpolation, the method used in Non-Patent Document 1. Bidirectional motion compensated interpolation assumes that a subject in the video image moves with uniform linear motion; it generates a predicted image by carrying out motion estimation and motion compensation on the frames captured before and after the time to be predicted.
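Bidirectional motion compensated interpolation can be illustrated with a simplified block-based sketch in Python. Under the uniform linear motion assumption, a block at position p in a frame halfway between two reference frames appears at p - d in the earlier frame and at p + d in the later one, so the sketch searches symmetric motion vectors d with a sum-of-absolute-differences (SAD) cost and averages the two matched blocks. The halfway timing, the function names, the block size, the search range, and the preference for zero motion on ties are all assumptions for illustration; any refinement steps a practical interpolator may use are omitted.

```python
from typing import List

Frame = List[List[int]]


def _symmetric_sad(prev: Frame, nxt: Frame, y: int, x: int,
                   dy: int, dx: int, b: int) -> int:
    """SAD between the b x b block of prev displaced by (-dy, -dx) and the
    block of nxt displaced by (+dy, +dx), both relative to (y, x)."""
    total = 0
    for i in range(b):
        for j in range(b):
            total += abs(prev[y + i - dy][x + j - dx] - nxt[y + i + dy][x + j + dx])
    return total


def interpolate_midframe(prev: Frame, nxt: Frame,
                         block: int = 4, search: int = 2) -> Frame:
    """Predict the frame halfway between prev and nxt (hypothetical sketch).
    Frame height and width are assumed to be multiples of block."""
    h, w = len(prev), len(prev[0])
    out = [[0] * w for _ in range(h)]
    for y in range(0, h, block):
        for x in range(0, w, block):
            # motion estimation: zero motion is the default, others must be strictly better
            best_dy, best_dx = 0, 0
            best_cost = _symmetric_sad(prev, nxt, y, x, 0, 0, block)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # both displaced blocks must lie inside the frame
                    if min(y - dy, y + dy) < 0 or max(y - dy, y + dy) + block > h:
                        continue
                    if min(x - dx, x + dx) < 0 or max(x - dx, x + dx) + block > w:
                        continue
                    cost = _symmetric_sad(prev, nxt, y, x, dy, dx, block)
                    if cost < best_cost:
                        best_dy, best_dx, best_cost = dy, dx, cost
            # motion compensation: average the two matched blocks
            for i in range(block):
                for j in range(block):
                    p = prev[y + i - best_dy][x + j - best_dx]
                    q = nxt[y + i + best_dy][x + j + best_dx]
                    out[y + i][x + j] = (p + q) // 2
    return out
```

For example, a 2×2 object at columns 4-5 in the earlier frame and columns 6-7 in the later frame is placed at columns 5-6 in the predicted frame, which is where uniform linear motion puts it.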
J. Ascenso and F. Pereira, “Adaptive Hash-Based Side Information Exploitation for Efficient Wyner-Ziv Video Coding”, Image Processing 2007, ICIP 2007, 2007 (hereinafter referred to as “Non-Patent Document 2”) adds the concept of a “hash” (a small piece of information) to a video image encoding apparatus and a video image decoding apparatus that encode and decode video images based on the Slepian-Wolf theorem and the Wyner-Ziv theorem.
The video image decoding apparatus in Non-Patent Document 2 also includes a key frame decoder and a WZ frame decoder. In the WZ frame decoder described in Non-Patent Document 2, a predicted image generating unit generates a predicted image from an inputted hash and decoded key frames, and a WZ decoding unit inputs the generated predicted image and a WZ stream provided from an encoding apparatus and carries out WZ decoding on the WZ stream while using the inputted predicted image as side information to obtain a decoded WZ frame.
Here, the “hash” is information that facilitates the generation of a predicted image. In Non-Patent Document 2, part of the DC (direct current) and AC (alternating current) components obtained by carrying out a DCT (discrete cosine transform) on an N×N pixel image region is used as the hash. The predicted image generating unit includes a frame buffer and generates a predicted image by, for example, searching the reference images (i.e., the images in the frame buffer) for the region whose hash is closest to the inputted hash (motion estimation) and carrying out compensation using that region (motion compensation).
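Hash-based motion estimation and compensation of this kind can be sketched in Python as follows. The sketch assumes an orthonormal 2-D DCT-II, a hash made of the DC coefficient plus the two lowest-frequency AC coefficients, a squared-error distance between hashes, and an exhaustive search over every N×N position in a single reference image. These choices and all names (`HASH_COEFFS`, `dct2`, `block_hash`, `hash_search`) are hypothetical and are not taken from Non-Patent Document 2.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[List[float]]

# Hypothetical hash: DC plus the two lowest-frequency AC coefficients.
HASH_COEFFS: Sequence[Tuple[int, int]] = ((0, 0), (0, 1), (1, 0))


def dct2(block: Frame) -> Frame:
    """Orthonormal 2-D DCT-II of an N x N block."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def block_at(frame: Frame, y: int, x: int, n: int) -> Frame:
    """The N x N block whose top-left corner is (y, x)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def block_hash(block: Frame) -> List[float]:
    """Keep only the selected DCT coefficients as the hash."""
    c = dct2(block)
    return [c[u][v] for u, v in HASH_COEFFS]


def hash_search(ref: Frame, target_hash: List[float], n: int):
    """Motion estimation: find the block in ref whose hash is closest to
    target_hash; motion compensation: return that block as the prediction."""
    h, w = len(ref), len(ref[0])
    best_pos, best_dist = (0, 0), math.inf
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            cand = block_hash(block_at(ref, y, x, n))
            d = sum((a - b) ** 2 for a, b in zip(cand, target_hash))
            if d < best_dist:
                best_pos, best_dist = (y, x), d
    by, bx = best_pos
    return best_pos, block_at(ref, by, bx, n), best_dist
```

Because only a few coefficients are compared, the search is cheaper than matching full pixel blocks, but distinct regions can share similar hashes. That is exactly why the approach depends on motion estimation remaining reliable with a partial set of coefficients.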
The method described in Non-Patent Document 2 thus generates predicted images under the assumption that motion estimation can be carried out using only the part of the DC and AC components contained in the hash.