1. Field of the Invention
The present invention relates to a method, apparatus, and program for encoding video information, and to a method, apparatus, and program for decoding video information, as in the recently developed JVT standard (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC). These are used when video information (a bitstream) compressed through orthogonal transformation, such as the discrete cosine transform or the Karhunen Loeve transform, and motion compensation is received through a network medium such as satellite broadcasting, cable TV, or the Internet, or is processed on a storage medium such as an optical disc, a magnetic disk, or a flash memory.
2. Discussion of the Background
Recently, both broadcasting stations that provide information and homes that receive it have commonly come to use devices that handle video information as digital data and compress it with a method such as MPEG. Such a method exploits the redundancy of the video information through orthogonal transformation, such as the discrete cosine transform, and motion compensation, for efficient transmission and storage.
In particular, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose video encoding method and is widely used in both professional and consumer applications, since it handles both interlaced and progressively scanned images, and both standard-resolution and high-resolution video. With the MPEG2 compression method, a high compression rate and high video quality can be realized, for example, by assigning a bit rate of 4 to 8 Mbps to interlaced images of standard resolution (720×480 pixels), or a bit rate of 18 to 22 Mbps to progressively scanned images of high resolution (1920×1088 pixels).
MPEG2 mainly targets high-quality encoding for broadcasting and does not support bit rates lower than those used by MPEG1, that is, encoding with a higher compression rate. However, the spread of mobile terminals was expected to create strong demand for such an encoding method, and the MPEG4 encoding system was standardized accordingly. Its video encoding method was approved as international standard ISO/IEC 14496-2 in December 1998.
In addition, a method called JVT (ITU-T Rec. H.264|ISO/IEC 14496-10 AVC) is currently being standardized, with video encoding for video conferencing as its first intended use. Compared with conventional encoding systems such as MPEG2 and MPEG4, JVT is known to provide higher encoding efficiency, although it requires more operations for encoding and decoding.
FIG. 8 shows a general construction of a video-information encoding apparatus that realizes video compression with orthogonal transformation such as the discrete cosine transform or the Karhunen Loeve transform, and motion compensation. As shown in FIG. 8, a video-information encoding apparatus 100 is composed of an A/D converter 101, a screen rearrangement buffer 102, an adder 103, an orthogonal transformation unit 104, a quantization unit 105, a reverse encoding unit 106, a storage buffer 107, a dequantization unit 108, an inverse orthogonal transformation unit 109, a frame memory 110, a motion prediction/compensation unit 111, and a rate control unit 112.
Referring to FIG. 8, the A/D converter 101 converts an input video signal into a digital signal. The screen rearrangement buffer 102 rearranges the frames given from the A/D converter 101 according to the GOP (Group of Pictures) structure of the video compression information. For an image to be intra-encoded (encoded within the image), the screen rearrangement buffer 102 gives video information on the entire frame to the orthogonal transformation unit 104. The orthogonal transformation unit 104 applies an orthogonal transformation, such as the discrete cosine transform or the Karhunen Loeve transform, to the video information and gives a transform coefficient to the quantization unit 105. The quantization unit 105 performs a quantization process on the transform coefficient given from the orthogonal transformation unit 104.
The reverse encoding unit 106 determines an encoding mode based on the quantized transform coefficient supplied by the quantization unit 105 and on a quantization scale, and applies reverse encoding, such as variable-length coding or arithmetic coding, to the encoding mode to create information to be inserted in the header part of each unit of encoded video. The encoded encoding mode is given to the storage buffer 107 and stored therein. This encoded encoding mode is output as video compression information.
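The frame rearrangement performed by the screen rearrangement buffer 102 can be illustrated with a minimal sketch. The text above does not specify the GOP structure, so the sketch assumes a hypothetical GOP made of I, P, and B pictures in which every B picture is encoded after the next I or P picture in display order. The function name and the frame labels are illustrative only.

```python
def to_coding_order(display_frames):
    """Reorder frame labels from display order to an assumed coding order.

    Assumption: labels start with "I", "P", or "B", and each B picture is
    encoded after the next I or P picture that follows it in display order.
    """
    coded = []
    pending_b = []
    for frame in display_frames:
        if frame.startswith("B"):
            # Hold B pictures back until the following anchor picture is seen.
            pending_b.append(frame)
        else:
            # Emit the anchor (I or P) first, then the B pictures held back.
            coded.append(frame)
            coded.extend(pending_b)
            pending_b = []
    # Trailing B pictures with no following anchor are emitted last.
    coded.extend(pending_b)
    return coded


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(to_coding_order(display))
```

Under these assumptions, the anchor pictures P3 and P6 move ahead of the B pictures that precede them in display order.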
In addition, the reverse encoding unit 106 applies variable-length coding or reverse encoding, such as arithmetic coding, to the quantized transform coefficient and gives the encoded transform coefficient to the storage buffer 107 to store it therein. This encoded transform coefficient is output as video compression information.
The quantization unit 105 operates under the control of the rate control unit 112. The quantization unit 105 gives the quantized transform coefficient to the dequantization unit 108, which performs dequantization on the transform coefficient. The inverse orthogonal transformation unit 109 creates decoded video information by applying an inverse orthogonal transformation process to the dequantized transform coefficient, and gives the information to the frame memory 110 to store it therein.
On the other hand, the screen rearrangement buffer 102 gives the motion prediction/compensation unit 111 video information on an image to be inter-encoded (encoded between images). The motion prediction/compensation unit 111 simultaneously retrieves the video information used for reference from the frame memory 110 and performs a motion prediction/compensation process to create reference video information. The motion prediction/compensation unit 111 gives the reference video information to the adder 103, which converts it into a differential signal between the video information and the reference video information. At the same time, the motion prediction/compensation unit 111 gives motion vector information to the reverse encoding unit 106.
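The chain of orthogonal transformation, quantization, dequantization, and inverse orthogonal transformation can be sketched as follows. This is a minimal one-dimensional illustration, assuming an orthonormal discrete cosine transform and a uniform quantizer with a single quantization scale. The function names and the uniform quantizer are assumptions made for illustration and are not taken from any standard.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def quantize(coeffs, qscale):
    """Uniform quantization (assumed): divide by the scale and round."""
    return [round(c / qscale) for c in coeffs]


def dequantize(levels, qscale):
    """Dequantization: multiply the levels by the same scale."""
    return [level * qscale for level in levels]


def local_decode(samples, qscale):
    """Return the quantized levels and the locally decoded samples."""
    levels = quantize(dct(samples), qscale)
    return levels, idct(dequantize(levels, qscale))


levels, decoded = local_decode([10, 12, 14, 16], qscale=2)
print(levels, [round(v, 2) for v in decoded])
```

The locally decoded samples differ from the input only by the quantization error, which is why the encoder stores them, rather than the original samples, as the reference for later prediction.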
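One way to picture the motion prediction/compensation process and the differential signal is full-search block matching. The sketch below is only an assumption about how such a unit might work: the text does not specify the search method, and the sum of absolute differences (SAD) cost, the block size, and the search range are all illustrative choices.

```python
def sad(cur, ref, cy, cx, ry, rx, bs):
    """Sum of absolute differences between a current and a reference block."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(bs)
        for i in range(bs)
    )


def motion_search(cur, ref, cy, cx, bs, search):
    """Full search: return the (dy, dx) with the smallest SAD in the window."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall outside the reference frame.
            if 0 <= ry <= height - bs and 0 <= rx <= width - bs:
                cost = sad(cur, ref, cy, cx, ry, rx, bs)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best[1], best[2]


def differential(cur, ref, cy, cx, mv, bs):
    """Differential signal: current block minus motion-compensated reference."""
    dy, dx = mv
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(bs)]
        for j in range(bs)
    ]


# Reference frame with distinct values; the current frame is the same
# content moved one pixel to the right.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[ref[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]
mv = motion_search(cur, ref, cy=2, cx=2, bs=2, search=2)
print(mv, differential(cur, ref, 2, 2, mv, 2))
```

When the motion vector matches the actual displacement, the differential signal is zero, which leaves very little for the orthogonal transformation unit 104 to encode.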
The reverse encoding unit 106 determines an encoding mode based on the quantized transform coefficient, which was given from the quantization unit 105, the quantization scale, the motion vector information given from the motion prediction/compensation unit 111, etc., and applies variable-length coding or reverse encoding such as arithmetic coding to the encoding mode, to thereby create information to be inserted into the header in a unit of encoded video. The encoded encoding mode is given to the storage buffer 107 to be stored therein. The encoded encoding mode is output as video compression information.
The reverse encoding unit 106 applies variable-length coding or the reverse encoding process such as arithmetic coding to the motion vector information to create information to be inserted in the header part in a unit of encoded video.
In inter-encoding, unlike intra-encoding, the video information input into the orthogonal transformation unit 104 is the differential signal obtained by the adder 103. Since the other processes are the same as for video compression information to be intra-encoded, their explanation is omitted.
Next, FIG. 9 shows a general construction of a video-information decoding apparatus corresponding to the aforementioned video-information encoding apparatus 100. As shown in FIG. 9, the video-information decoding apparatus 120 is composed of a storage buffer 121, a reverse decoding unit 122, a dequantization unit 123, an inverse orthogonal transformation unit 124, an adder 125, a screen rearrangement buffer 126, a D/A converter 127, a motion prediction/compensation unit 128, and a frame memory 129.
The storage buffer 121 temporarily stores input video compression information, and then transfers the information to the reverse decoding unit 122. The reverse decoding unit 122 applies variable-length decoding or a process such as arithmetic decoding to the video compression information based on the prescribed format of the video compression information, obtains the encoding mode information from its header part, and gives the information to the dequantization unit 123. Similarly, the reverse decoding unit 122 obtains the quantized transform coefficient and gives it to the dequantization unit 123. In a case in which the frame has been subjected to the inter-encoding, the reverse decoding unit 122 decodes motion vector information stored in the header part of the video compression information as well, and gives the information to the motion prediction/compensation unit 128.
The dequantization unit 123 dequantizes the quantized transform coefficient supplied from the reverse decoding unit 122, and gives the transform coefficient to the inverse orthogonal transformation unit 124. The inverse orthogonal transformation unit 124 applies inverse orthogonal transformation, such as inverse discrete cosine transform or inverse Karhunen Loeve transform, to the transform coefficient based on the prescribed format of the video compression information.
In a case in which the frame has been subjected to the intra-encoding, the video information subjected to the inverse orthogonal transformation is stored in the screen rearrangement buffer 126, and is then output after a D/A conversion process by the D/A converter 127.
In a case in which the frame has been subjected to the inter-encoding, the motion prediction/compensation unit 128 creates a reference image based on the motion vector information subjected to the reverse decoding and the video information stored in the frame memory 129, and gives the image to the adder 125. The adder 125 adds this reference image and the output of the inverse orthogonal transformation unit 124. Since the other processes are performed in the same way as in the case of a frame subjected to the intra-encoding, their explanation is omitted.
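The addition performed by the adder 125 can be sketched as follows. This is a simplified illustration, assuming 8-bit samples clipped to the range 0 to 255 and an intra-encoded block that is output without any prediction. The function name and the clipping range are assumptions and are not stated in the text above.

```python
def reconstruct_block(decoded, ref_frame, y, x, mv, inter):
    """Rebuild one block from the inverse-transformed data.

    decoded:   block output by the inverse orthogonal transformation
    ref_frame: previously decoded frame (the frame memory)
    mv:        (dy, dx) motion vector, used only for inter-encoded blocks
    inter:     True for inter-encoded blocks, False for intra-encoded ones
    """
    dy, dx = mv
    block = []
    for j, row in enumerate(decoded):
        out_row = []
        for i, value in enumerate(row):
            # For inter blocks, add the motion-compensated reference sample.
            pred = ref_frame[y + dy + j][x + dx + i] if inter else 0
            # Assumed 8-bit samples: clip to 0..255.
            out_row.append(max(0, min(255, pred + value)))
        block.append(out_row)
    return block


frame_memory = [[100] * 4 for _ in range(4)]
print(reconstruct_block([[1, -1], [0, 200]], frame_memory, 1, 1, (0, 1), True))
```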
Now, the reverse encoding unit 106 under the JVT will be described in detail. As shown in FIG. 10, the reverse encoding unit 106 under the JVT applies one of two reverse encoding schemes, arithmetic coding called CABAC (Context-based Adaptive Binary Arithmetic Coding) or variable-length coding called CAVLC (Context-based Adaptive Variable Length Coding), to symbols such as mode information, motion information, and quantized coefficient information, which are input from the quantization unit 105 and the motion prediction/compensation unit 111, and outputs video compression information (a bitstream). Which reverse encoding is used is determined by the CABAC/CAVLC selection information in FIG. 10. This CABAC/CAVLC selection information is determined by the video-information encoding apparatus 100 and is output by being embedded in the bitstream as header information.
First, the CABAC system in the reverse encoding unit 106 is shown in FIG. 11. As shown in FIG. 11, mode information, motion information, and quantized transform coefficient information input from the quantization unit 105 and the motion prediction/compensation unit 111 are input into a binarization unit 131 as multi-valued symbols. The binarization unit 131 converts the multi-valued symbols into a binary symbol string of an arbitrary length under a predetermined rule. This binary symbol string is input into a CABAC encoding unit 133, which applies binary symbol arithmetic coding to the input binary symbols and outputs the encoded result as a bitstream to the storage buffer 107. A Context operation unit 132 calculates Context based on the symbol information input to the binarization unit 131 and the binary symbols output from the binarization unit 131, and inputs the Context to the CABAC encoding unit 133. A Context memory group 135 of the Context operation unit 132 stores Context, which is updated as the occasion arises during an encoding process, and the initial state of Context to be used for a reset.
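The roles of the binarization unit and of Context can be illustrated with a much simplified sketch. It is not the CABAC algorithm of the JVT draft: it assumes a unary binarization rule, models Context as a pair of adaptive bit counts shared by all binary symbols, and reports only the ideal code length that an arithmetic coder driven by that Context would approach, instead of producing a bitstream. All names are hypothetical.

```python
import math


def binarize_unary(value):
    """Assumed binarization rule: `value` ones followed by a terminating zero."""
    return [1] * value + [0]


class Context:
    """Adaptive probability estimate for a binary symbol (simplified)."""

    def __init__(self, init_zeros=1, init_ones=1):
        # The initial state is kept so that the Context can be reset.
        self.initial = (init_zeros, init_ones)
        self.reset()

    def reset(self):
        self.counts = list(self.initial)

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1


def estimated_bits(symbols, ctx):
    """Ideal code length, in bits, when coding each binary symbol with ctx."""
    total = 0.0
    for symbol in symbols:
        for bit in binarize_unary(symbol):
            total += -math.log2(ctx.probability(bit))
            ctx.update(bit)  # Context is updated during the encoding process.
    return total


ctx = Context()
print(binarize_unary(3), estimated_bits([0] * 20, ctx))
```

In this sketch, twenty identical symbols cost far less than twenty bits in total because the Context adapts to the observed statistics, and `reset()` restores the stored initial state.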
Next, the CAVLC system in the reverse encoding unit 106 is shown in FIG. 12. As shown in FIG. 12, mode information, motion information, and quantized transform coefficient information input from the quantization unit 105 and the motion prediction/compensation unit 111 are input into a CAVLC encoding unit 140 as multi-valued symbols. Like the variable-length coding adopted by the conventional MPEG, the CAVLC encoding unit 140 applies a variable-length coding table to the input multi-valued symbols, and outputs a bitstream. A Context storage unit 141 stores information already encoded in the CAVLC encoding unit 140, for example, the number of non-zero coefficients in blocks already processed as well as in blocks being processed, the value of the coefficient encoded immediately before, and so on. The CAVLC encoding unit 140 is able to change the variable-length coding table applied to symbols, based on information from the Context storage unit 141. It should be noted that the Context storage unit 141 also stores the initial state of Context to be used for a reset. The output bitstream is input into the storage buffer 107.
Next, the reverse decoding unit 122 under the JVT will be described in detail. Like the reverse encoding unit 106, the reverse decoding unit 122 under the JVT applies one of two reverse decoding schemes, CABAC or CAVLC, to an input bitstream, as shown in FIG. 13. Which of CABAC and CAVLC is applied is determined by reading the CABAC/CAVLC selection information embedded in the header information of the input bitstream.
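The idea of switching the variable-length coding table according to stored Context can be sketched as below. The two tables, the threshold, and the choice of the previous non-zero coefficient count as the only Context are hypothetical simplifications and are not the tables of the JVT draft. The decoder is included to show that it must track the same Context in order to choose the same table.

```python
# Hypothetical prefix-free tables for symbols 0..3 (not from any standard).
TABLES = {
    "few": {0: "1", 1: "01", 2: "001", 3: "000"},   # favors small values
    "many": {0: "00", 1: "01", 2: "10", 3: "11"},   # fixed length
}


class ContextStore:
    """Stores already-coded information used to select the table."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Initial state: assume no non-zero coefficients seen so far.
        self.prev_nonzero = 0

    def table_name(self):
        return "few" if self.prev_nonzero < 2 else "many"

    def update(self, nonzero_count):
        self.prev_nonzero = nonzero_count


def encode(counts):
    """Encode non-zero coefficient counts with a context-selected table."""
    ctx, bits = ContextStore(), ""
    for n in counts:
        bits += TABLES[ctx.table_name()][n]
        ctx.update(n)
    return bits


def decode(bits, how_many):
    """Decode by mirroring the encoder's Context updates."""
    ctx, out, pos = ContextStore(), [], 0
    for _ in range(how_many):
        inverse = {code: sym for sym, code in TABLES[ctx.table_name()].items()}
        length = 1
        while bits[pos:pos + length] not in inverse:
            length += 1
            if pos + length > len(bits):
                raise ValueError("truncated or invalid bitstream")
        symbol = inverse[bits[pos:pos + length]]
        out.append(symbol)
        pos += length
        ctx.update(symbol)
    return out


bits = encode([0, 3, 2, 0, 1])
print(bits, decode(bits, 5))
```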
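Embedding the selection information in the header and dispatching on it at the decoder can be sketched as follows. The one-bit flag placed at the very start of the stream is a hypothetical layout chosen for illustration, and the two decoder callables are placeholders. The actual header syntax is defined by the JVT draft and is not reproduced here.

```python
def embed_selection(use_cabac, payload_bits):
    """Encoder side: prepend a one-bit selection flag (hypothetical layout)."""
    return [1 if use_cabac else 0] + list(payload_bits)


def dispatch_decoder(stream, cabac_decode, cavlc_decode):
    """Decoder side: read the flag, then apply the matching reverse decoding."""
    flag, payload = stream[0], stream[1:]
    if flag == 1:
        return "CABAC", cabac_decode(payload)
    return "CAVLC", cavlc_decode(payload)


# Placeholder decoders that simply return the payload unchanged.
stream = embed_selection(True, [0, 1, 1])
print(dispatch_decoder(stream, lambda p: p, lambda p: p))
```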
FIG. 14 shows the CABAC system in the reverse decoding unit 122. In FIG. 14, a CABAC decoding unit 161 applies binary symbol arithmetic decoding to a bitstream input from the storage buffer 121, and outputs the result as a string of binary symbols. This string of binary symbols is input into an inverse binarization unit 163, which converts it into multi-valued symbols under a predetermined rule. The multi-valued symbols are output from the inverse binarization unit 163 to the dequantization unit 123 and the motion prediction/compensation unit 128 as mode information, motion vector information, and quantized coefficient information. A Context operation unit 162 calculates Context based on the string of binary symbols input into the inverse binarization unit 163 and the multi-valued symbols output from the inverse binarization unit 163, and inputs the Context into the CABAC decoding unit 161. A Context memory group 165 of the Context operation unit 162 stores Context, which is updated as the occasion arises during a decoding process, and the initial state of Context to be used for a reset.
Next, the CAVLC system in the reverse decoding unit 122 is shown in FIG. 15. As shown in FIG. 15, an input bitstream from the storage buffer 121 is input into a CAVLC decoding unit 170. Like the variable-length decoding adopted by the conventional MPEG, the CAVLC decoding unit 170 applies a variable-length decoding table to the input bitstream and outputs mode information, motion information, and quantized transform coefficient information. This output information is input into the dequantization unit 123 and the motion prediction/compensation unit 128. A Context storage unit 171 stores information already decoded in the CAVLC decoding unit 170, for example, the number of non-zero coefficients in blocks already processed as well as in blocks being processed, the value of the coefficient decoded immediately before, and so on. The CAVLC decoding unit 170 is able to change the variable-length decoding table applied to symbols, based on information from the Context storage unit 171. It should be noted that the Context storage unit 171 also stores the initial state of Context to be used for a reset.
For specific operations of the CABAC shown in FIG. 11 and FIG. 14, an explanation of the CABAC is given in Final Committee Draft ISO/IEC 14496-10:2002 (section 9.2), the entire contents of which are hereby incorporated herein by reference.
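The conversion performed by the inverse binarization unit 163 can be illustrated under an assumed rule. The sketch below supposes that each multi-valued symbol was binarized as a unary string (a run of ones closed by a zero). That rule is a hypothetical stand-in for the predetermined rule mentioned above, which is defined in the JVT draft.

```python
def debinarize_unary(binary_symbols):
    """Convert a string of binary symbols back into multi-valued symbols.

    Assumed rule: each value v was written as v ones followed by one zero.
    """
    values, run = [], 0
    for bit in binary_symbols:
        if bit == 1:
            run += 1
        else:
            # A zero terminates the current symbol.
            values.append(run)
            run = 0
    if run:
        raise ValueError("binary symbol string ends inside a symbol")
    return values


print(debinarize_unary([1, 1, 1, 0, 0, 1, 0]))
```

Because every symbol ends with a zero, the symbol boundaries in the binary string are unambiguous even though the string of each symbol has an arbitrary length.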