The present invention relates to a video encoding device and program, a video decoding device and program, and a video distribution system, and is applicable, for example, to a video distribution system that encodes and distributes video data using a DVC (Distributed Video Coding) method.
This application is based upon and claims benefit of priority from Japanese Patent Application No. 2012-047020, filed on Mar. 2, 2012, the entire contents of which are incorporated herein by reference.
In recent years, DVC encoding methods, such as that described in B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed Video Coding,” Proceedings of the IEEE, vol. 93, Jan. 2005, pp. 71-83 (hereinafter, Non-Patent Literature 1), have attracted attention as video encoding methods for use in video distribution systems. In these methods, the encoding side performs Slepian-Wolf encoding on the original image to be encoded, and the decoding side reconstructs the image by Slepian-Wolf decoding the encoded data together with a predicted image of that original image generated at the decoding side. Non-Patent Literature 1 discloses a video distribution system including a video encoding device and a video decoding device that encode and decode video based on Slepian-Wolf theory and Wyner-Ziv theory.
Next, an outline of a video distribution system adopting the technology disclosed in Non-Patent Literature 1 will be described. In the video encoding device disclosed in Non-Patent Literature 1, the original image to be encoded (hereinafter, called a “Wyner-Ziv frame”) is quantized, expressed as binary values (bits), and Slepian-Wolf encoded. The video encoding device then stores only the parity bits of this encoded result.
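The encoder-side steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of Non-Patent Literature 1: the uniform quantizer, the bitplane ordering, the class name `ParityStore`, and the toy accumulator code standing in for the turbo or LDPC-type Slepian-Wolf codes used in practice are all hypothetical.

```python
def quantize(pixels, levels=4):
    # Hypothetical uniform scalar quantizer for 8-bit pixels (levels: a power of 2).
    step = 256 // levels
    return [p // step for p in pixels]


def to_bitplanes(indices, n_bits):
    # Bitplane b holds bit b of every quantization index, most significant first.
    return [[(q >> (n_bits - 1 - b)) & 1 for q in indices] for b in range(n_bits)]


def accumulated_parity(bits):
    # Toy stand-in for a Slepian-Wolf code: parity bit i is the XOR of bits 0..i.
    out, acc = [], 0
    for b in bits:
        acc ^= b
        out.append(acc)
    return out


class ParityStore:
    """Hypothetical encoder-side buffer that keeps parity bits only."""

    def __init__(self, pixels, levels=4):
        n_bits = (levels - 1).bit_length()
        planes = to_bitplanes(quantize(pixels, levels), n_bits)
        # The quantized bits of the Wyner-Ziv frame are discarded after encoding.
        self.parity = [accumulated_parity(plane) for plane in planes]

    def request(self, plane, start, count):
        # Serve part of the stored parity bits for one bitplane on request.
        return self.parity[plane][start:start + count]
```

For example, `ParityStore([0, 64, 128, 255]).parity` yields `[[0, 0, 1, 0], [0, 1, 1, 0]]` for the two bitplanes; the quantized bits themselves are not kept.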
On the other hand, the video decoding device disclosed in Non-Patent Literature 1 requests transmission of part of the parity bits stored in the video encoding device. The video decoding device then performs Slepian-Wolf decoding using the received parity bits and Side Information (a predicted image, hereinafter called “SI”). If decoding is not sufficient, the video decoding device requests additional parity bits from the encoding side and performs Slepian-Wolf decoding again using the additionally received parity bits and the above SI. The video decoding device repeats this process until sufficient decoding is achieved.
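The request-and-retry loop can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the toy accumulator code, the reverse transmission order, the brute-force decoder (feasible only for a handful of bits), and the CRC that the encoder is assumed to send so the decoder can detect when decoding has succeeded.

```python
import itertools
import zlib


def accumulated_parity(bits):
    # Toy stand-in for a Slepian-Wolf code (assumption): parity i = XOR of bits 0..i.
    out, acc = [], 0
    for b in bits:
        acc ^= b
        out.append(acc)
    return out


def crc(bits):
    # Hypothetical integrity check sent by the encoder to confirm decoding success.
    return zlib.crc32(bytes(bits))


def try_decode(si_bits, received, checksum):
    # Among candidates consistent with every parity bit received so far,
    # choose the one closest (Hamming distance) to the side information.
    consistent = (
        c for c in itertools.product((0, 1), repeat=len(si_bits))
        if all(accumulated_parity(c)[i] == v for i, v in received)
    )
    best = min(consistent, key=lambda c: sum(a != b for a, b in zip(c, si_bits)))
    return list(best) if crc(best) == checksum else None


def decode_with_feedback(original_bits, si_bits, chunk=1):
    # Encoder side: only the parity bits (and the assumed CRC) are kept.
    stored = accumulated_parity(original_bits)
    checksum = crc(original_bits)
    order = list(reversed(range(len(stored))))  # assumed transmission order
    received = []
    while True:
        # Decoder requests the next `chunk` parity bits, then retries decoding.
        for i in order[len(received):len(received) + chunk]:
            received.append((i, stored[i]))
        decoded = try_decode(si_bits, received, checksum)
        if decoded is not None:
            return decoded, len(received)
```

With `original = [1, 0, 1, 1, 0, 0, 1, 0]` and an SI that differs in one bit, `[1, 0, 1, 1, 0, 1, 1, 0]`, the loop recovers the original after four parity bits; an SI identical to the original needs only one. Once all parity bits have been sent, the toy code determines the bits uniquely, so the loop always terminates.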
However, in the DVC system disclosed in Non-Patent Literature 1 (a system that encodes and decodes video based on Slepian-Wolf theory and Wyner-Ziv theory), SI is in principle not generated on the video encoding device side, and under this restriction it is difficult to achieve higher encoding efficiency. Accordingly, studies have been conducted in which SI is also generated on the video encoding device side and used in the encoding process. Related-art technologies of this kind include C. Brites and F. Pereira, “Encoder rate control for transform domain Wyner-Ziv video coding,” Proc. IEEE International Conference on Image Processing (ICIP 2007), 2007, pp. 4-7 (hereinafter, Non-Patent Literature 2), and M. Tagliasacchi, A. Majumdar, and K. Ramchandran, “A distributed-source-coding based robust spatio-temporal scalable video codec,” Proc. Picture Coding Symposium, 2004 (hereinafter, Non-Patent Literature 3).
In Non-Patent Literature 2, the amount of parity bits necessary for error correction (hereinafter, called the “amount of codes”) is calculated at the video encoding device so that the video decoding device does not need to request re-transmission of parity bits. Specifically, the video encoding device in Non-Patent Literature 2 generates SI with a lower computation amount than the SI generated on the video decoding device side, estimates the error of this SI, and calculates from that estimate the amount of codes necessary to correct it, which may eliminate the need for feedback. When a scalable structure is available, as in the technology disclosed in Non-Patent Literature 3, base-layer information may be used to generate SI on the video encoding device side.
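Encoder-side rate estimation in the spirit of Non-Patent Literature 2 can be sketched as below. As a simplification, this swaps in a binary symmetric correlation model, under which the Slepian-Wolf bound for a bitplane of n bits with crossover probability p is n * H_b(p); the frame-averaging SI, the function names, and the safety margin are hypothetical choices, not the model used in Non-Patent Literature 2.

```python
import math


def binary_entropy(p):
    # H_b(p) in bits; defined as 0 at p = 0 and p = 1.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def low_cost_si(prev_frame, next_frame):
    # Hypothetical cheap encoder-side SI: pixel-wise average of neighbouring key frames.
    return [(a + b) // 2 for a, b in zip(prev_frame, next_frame)]


def bitplane(values, bit):
    # Extract one bitplane (bit 0 = least significant) from integer samples.
    return [(v >> bit) & 1 for v in values]


def estimate_parity_bits(original, si, bit, margin=0.1):
    # Estimate the crossover probability between the original and SI bitplanes,
    # then apply the bound n * H_b(p) plus an assumed safety margin.
    x = bitplane(original, bit)
    y = bitplane(si, bit)
    n = len(x)
    p = sum(a != b for a, b in zip(x, y)) / n
    return math.ceil(n * binary_entropy(p) * (1 + margin))
```

Because the encoder sends this estimated amount up front, the decoder can skip the request loop; a margin that is too small would leave residual errors, while one that is too large wastes rate.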
To improve the quality of the decoded image in the video decoding device, the SI used there may need a prediction accuracy higher than, or at least equal to, that of the SI generated by the video encoding device; however, generating SI with high prediction accuracy may require a greater computation amount. As a method of generating SI with high prediction accuracy, J. Ascenso, C. Brites, and F. Pereira, “Motion compensated refinement for low complexity pixel based distributed video coding,” Proceedings IEEE Conference on Advanced Video and Signal Based Surveillance, 2005, pp. 593-598 (hereinafter, Non-Patent Literature 4), presents a method that obtains SI with a prediction accuracy of at least a fixed value in the video encoding device by additionally performing a process that re-generates higher-quality SI when, for example, the SI once generated does not satisfy a prescribed quality.
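The idea of re-generating SI until it meets a quality target can be sketched as a loop over SI generators ordered by computational cost. This is a hypothetical illustration, not the algorithm of Non-Patent Literature 4: it uses 1-D signals for brevity, PSNR against the original as the quality measure (possible only where the original is available, as at the encoder), frame averaging as the cheap generator, and a global shift search as a stand-in for motion-compensated refinement.

```python
import math


def psnr(a, b, peak=255):
    # Peak signal-to-noise ratio in dB; infinite for identical signals.
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def average_si(prev_frame, next_frame):
    # Cheap SI: pixel-wise average of the neighbouring frames.
    return [(a + b) // 2 for a, b in zip(prev_frame, next_frame)]


def shifted_si(prev_frame, original, max_shift=2):
    # Costlier SI (hypothetical): search global shifts of the previous frame
    # and keep the one with the smallest squared error against the original.
    n = len(prev_frame)
    best, best_err = None, None
    for s in range(-max_shift, max_shift + 1):
        cand = [prev_frame[min(max(i - s, 0), n - 1)] for i in range(n)]
        err = sum((a - b) ** 2 for a, b in zip(cand, original))
        if best_err is None or err < best_err:
            best, best_err = cand, err
    return best


def refine_si(prev_frame, next_frame, original, min_psnr=35.0):
    # Try generators in increasing cost; stop once one meets the quality target.
    generators = [
        lambda: average_si(prev_frame, next_frame),
        lambda: shifted_si(prev_frame, original),
    ]
    best, best_q = None, -1.0
    for generate in generators:
        si = generate()
        q = psnr(si, original)
        if q > best_q:
            best, best_q = si, q
        if q >= min_psnr:
            break
    return best, best_q
```

Because the loop stops at the first generator that reaches `min_psnr`, the costlier refinement runs only for frames where the cheap SI falls short.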