Among video processing devices, a video encoding device is designed to generate a bit stream by encoding input video data fed from an external device according to a predetermined video encoding method. One representative video encoding method used by video encoding devices is the H.264 standard. The H.264 standard was recommended by the ITU (International Telecommunication Union) in May 2003 as a standard for compression coding of motion video data.
Among H.264-based video coding methods, the joint model (JM) method is known as a reference model. A video encoding device adopting the joint model method has been proposed as art related to the present invention (see e.g. Patent Literature 1). In the present specification, this proposed video encoding device will be referred to as the video encoding device as related art.
FIG. 17 shows a structure of the video encoding device as related art. A video encoding device 100 as related art comprises a prediction-transformation-quantization unit 102 which sequentially receives input of video data 101 as an input image, a PCM (Pulse Code Modulation) encoder 103, and a first switch 104; the video encoding device 100 operates on a macroblock basis.
The video signal format of the video data 101 fed into the video encoding device 100 shown in FIG. 17 is assumed to be QCIF (Quarter Common Intermediate Format), one of the video signal formats defined by the ITU.
FIG. 18 shows an image frame in the QCIF video signal format. The image frame consists of 11 macroblocks horizontally by 9 macroblocks vertically. With progressive scanning, one image frame consists of one frame picture; with interlaced scanning, it consists of two field pictures. In the following description, both frame pictures and field pictures will simply be referred to as “pictures”.
A macroblock, the basic unit forming a picture, consists of 16 by 16 luminance pixels and two 8 by 8 blocks of chrominance pixels, one for each of the Cb and Cr components. In the figure, for the 4 by 4 pixel blocks obtained by dividing a macroblock into 16 sub-blocks, luminance positions (x) and chrominance positions (o) are shown on a pixel basis. The macroblocks in a picture are encoded in raster-scan order from the upper left to the lower right of the picture.
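The macroblock geometry described above can be checked with a small Python sketch. It assumes the standard QCIF luma resolution of 176 by 144 samples and 4:2:0 chroma subsampling; the helper names are hypothetical and only illustrate the 11 by 9 grid, the raster-scan order, and the 384-sample macroblock size used later in this description.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # QCIF luma resolution in samples
MB_SIZE = 16                        # a macroblock covers 16 x 16 luma samples


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Return (columns, rows) of macroblocks for a picture of the given size."""
    return width // MB_SIZE, height // MB_SIZE


def raster_scan(cols: int, rows: int):
    """Yield (mb_address, x, y) from upper left to lower right, row by row."""
    for y in range(rows):
        for x in range(cols):
            yield y * cols + x, x, y


def samples_per_macroblock_420() -> int:
    """16x16 luma samples plus one 8x8 block each for Cb and Cr (4:2:0)."""
    return 16 * 16 + 2 * (8 * 8)


cols, rows = macroblock_grid(QCIF_WIDTH, QCIF_HEIGHT)
order = list(raster_scan(cols, rows))
print(cols, rows, len(order))          # 11 9 99
print(order[0], order[11], order[-1])  # (0, 0, 0) (11, 0, 1) (98, 10, 8)
print(samples_per_macroblock_420())    # 384
```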
Returning to FIG. 17, the description continues. The prediction-transformation-quantization unit 102 receives the video data 101, generates a prediction image from a decoded image 122 stored in a decoding picture buffer 121, and subtracts the prediction image from the video data 101 (the input image) on a macroblock basis. The resulting difference image is transformed by a frequency transform from the spatial domain to the frequency domain, then quantized and output as coded data 123.
Here, the prediction-transformation-quantization unit 102 is supplied with a macroblock quantization parameter 126M by a rate controller 125. The prediction-transformation-quantization unit 102 quantizes the transformed coefficients obtained by the frequency transform using a quantization step size corresponding to the macroblock quantization parameter 126M.
In the present specification, the transformed and quantized coefficients of the input image obtained at the prediction-transformation-quantization unit 102 as described above will be referred to as “level values”. The “level values” correspond to the “transformed and quantized values” in the claims.
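As a rough illustration of quantizing transformed coefficients with a step size tied to the quantization parameter, the following Python sketch uses the H.264 convention that the step size doubles for every increase of 6 in QP (QP 0 to 51). It is a simplified floating-point model under stated assumptions: real H.264 encoders quantize with integer scaling and shifts, and the rounding offset of 0.5 is an arbitrary choice for illustration.

```python
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # step sizes for QP 0..5


def qstep(qp: int) -> float:
    """Quantization step size; doubles every time QP increases by 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(coeffs: list[float], qp: int, rounding: float = 0.5) -> list[int]:
    """Map transformed coefficients to level values (sign-magnitude rounding)."""
    step = qstep(qp)
    levels = []
    for c in coeffs:
        magnitude = int(abs(c) / step + rounding)
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


print(qstep(4), qstep(10), qstep(28))    # 1.0 2.0 16.0
print(quantize([40, -17, 3, 0], qp=28))  # [3, -1, 0, 0]
```

A larger QP gives a larger step size and therefore smaller level values, which is the lever the rate controller uses.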
The rate controller 125 monitors a bit stream 128 output by a multiplexer 127 of the video encoding device 100 as related art, and controls the quantization parameter 126 so that the number of bits of the bit stream 128 approaches a target number of bits. More specifically, when the current number of bits is larger than the target number of bits, it calculates a quantization parameter 126 that makes the quantization step size larger; conversely, when the current number of bits is smaller than the target number of bits, it calculates a quantization parameter 126 that makes the quantization step size smaller.
The quantization parameter 126 thus calculated is output as a slice quantization parameter 126S at the head of each slice, a unit consisting of a set of macroblocks. The quantization parameter 126 is also output as the macroblock quantization parameter 126M at the head of each macroblock.
Of these, the slice quantization parameter 126S is supplied both to the multiplexer 127 and to a context initial value calculator 129. A model ID (model_id) 131, supplied from a model calculator (not shown) disposed outside the video encoding device 100 as related art, is likewise supplied to the multiplexer 127 and to the context initial value calculator 129.
The context initial value calculator 129 calculates an initial value of a context for binary arithmetic coding based on the slice quantization parameter 126S supplied from the rate controller 125 and the model ID 131 supplied from the external model calculator. It then sets the initial value 132 of the context in a memory 133.
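The related art does not specify how the rate controller 125 computes the new quantization parameter. The sketch below assumes the simplest possible rule, a hypothetical one-step controller (`next_qp` is an illustrative name) that nudges QP toward the target and clamps it to the H.264 range 0 to 51; practical rate controllers use bit-budget models instead.

```python
def next_qp(qp: int, produced_bits: int, target_bits: int,
            step: int = 1, qp_min: int = 0, qp_max: int = 51) -> int:
    """Hypothetical rule: raise QP when over budget, lower it when under."""
    if produced_bits > target_bits:
        qp += step   # larger QP -> larger quantization step size -> fewer bits
    elif produced_bits < target_bits:
        qp -= step   # smaller QP -> smaller step size -> more bits
    return max(qp_min, min(qp_max, qp))


print(next_qp(30, produced_bits=12000, target_bits=10000))  # 31
print(next_qp(30, produced_bits=8000, target_bits=10000))   # 29
```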
The slice quantization parameter 126S corresponds to the sum of “26”, “slice_qp_delta” in the “Slice header syntax” of section 7.3.3 of the above-described “ITU-T H.264 | ISO/IEC 14496-10 Advanced Video Coding”, and “pic_init_qp_minus26” in the “Picture parameter set RBSP syntax” of section 7.3.2.2 of the same standard. The model ID 131 corresponds to “cabac_init_idc” in the “Slice header syntax” of section 7.3.3.
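Context initialization as performed by the context initial value calculator 129 can be sketched in Python as follows. The slice QP is derived from the two syntax elements named above, and each context's (pStateidx, MPS) pair is computed from the slice QP and a pair of coefficients (m, n), following the context-variable initialization in section 9.3 of the H.264 specification. In the standard, (m, n) come from tables selected by the context index and, for P and B slices, by cabac_init_idc (the model ID 131); the (m, n) values in the usage lines are arbitrary examples, not entries from those tables.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the range [lo, hi], like Clip3 in the H.264 specification."""
    return max(lo, min(hi, x))


def slice_qp(pic_init_qp_minus26: int, slice_qp_delta: int) -> int:
    """Slice QP = 26 + pic_init_qp_minus26 + slice_qp_delta."""
    return 26 + pic_init_qp_minus26 + slice_qp_delta


def init_context(m: int, n: int, slice_qp_y: int) -> tuple[int, int]:
    """Return (pStateidx, MPS) for one context from its (m, n) coefficients."""
    # Python's >> on negative ints is an arithmetic shift, as in the standard.
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp_y)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0   # MPS = 0
    return pre_ctx_state - 64, 1       # MPS = 1


qp = slice_qp(pic_init_qp_minus26=0, slice_qp_delta=0)  # 26
print(init_context(-28, 127, qp))  # (17, 1)
print(init_context(20, -15, qp))   # (46, 0)
```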
A context is a pair consisting of a most-probable-symbol (MPS) and a least-probable-symbol (LPS) occurrence probability state index (pStateidx) for a binary symbol, which will be described later. In binary arithmetic coding, the relationship expressed by the following expression (1) holds:

MPS = 1 − LPS   (1)

In the H.264 standard, the pStateidx of a context takes one of 64 values. In binary arithmetic coding, the least-probable-symbol occurrence probability (rLPS) corresponding to the pStateidx is obtained from a look-up table. Details are given in “9.3 CABAC parsing process for slice data” of the above-described “ITU-T H.264 | ISO/IEC 14496-10 Advanced Video Coding”.
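The 64 states can be related to concrete probabilities through the model used in the design of CABAC, in which the LPS probability of state s is p(s) = 0.5 × α^s with α = (0.01875 / 0.5)^(1/63), so the modeled probability falls from 0.5 at state 0 to 0.01875 at state 63. This is an illustration under that assumption only: the standard does not evaluate this formula but uses a precomputed look-up table (rangeTabLPS) indexed by pStateidx and by a quantized value of the current coding range. The function name is hypothetical.

```python
ALPHA = (0.01875 / 0.5) ** (1 / 63)  # about 0.949


def lps_probability(p_state_idx: int) -> float:
    """Modeled LPS probability for a state index (state 0 is the most uncertain)."""
    if not 0 <= p_state_idx <= 63:
        raise ValueError("pStateidx must be in the range 0..63")
    return 0.5 * ALPHA ** p_state_idx


for s in (0, 16, 32, 63):
    print(s, round(lps_probability(s), 4))
```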
The coded data 123 output from the prediction-transformation-quantization unit 102 is supplied to a binarization unit 135. The coded data 123 here consists of a prediction parameter related to generation of a prediction image, a macroblock quantization parameter, and transformed and quantized coefficients (transformed and quantized values).
A supplementary note: strictly speaking, what is supplied to the binarization unit 135 as coded data is the difference obtained by subtracting the macroblock quantization parameter of the immediately preceding macroblock from the current macroblock quantization parameter. Whether the quantization parameter itself or its difference value is binarized is not essential to the present invention, so the description assumes that coded data 123 such as the macroblock quantization parameter is supplied to the binarization unit 135.
On the other hand, the above-described level values 136 output from the prediction-transformation-quantization unit 102 are supplied to a local decoder 137. The local decoder 137 inverse-quantizes the level values 136 and then applies an inverse frequency transform to restore them to an image in the original spatial domain. In general, however, this restoration is not lossless, because of the quantization performed in the prediction-transformation-quantization unit 102. Finally, the local decoder 137 calculates a local decoded image 138 by adding the prediction image supplied from the prediction-transformation-quantization unit 102 to the restored spatial-domain image. The local decoded image 138 is supplied to the first switch 104.
The binarization unit 135, having received the coded data 123, converts it into a binary string according to a procedure defined by the H.264 standard and sequentially outputs each bit of the binary string. In the following description, each bit of the binary string will be referred to as a binary symbol (bin). Details of the binarization executed by the binarization unit 135 are disclosed in section 9.3, “CABAC parsing process for slice data”, of “ITU-T H.264 | ISO/IEC 14496-10 Advanced Video Coding”.
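The H.264 standard defines several binarization schemes in section 9.3 (unary, truncated unary, fixed-length, and the concatenated unary/k-th order Exp-Golomb scheme UEGk). As one example, the following Python sketch implements UEGk binarization of a non-negative value, which the standard applies with k = 0 and a cutoff uCoff = 14 to coeff_abs_level_minus1, the magnitude part of a level value. Sign handling (signedValFlag) is omitted, and the function names are hypothetical.

```python
def truncated_unary(value: int, c_max: int) -> list[int]:
    """`value` ones followed by a terminating 0, omitted when value == c_max."""
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


def ueg_k(value: int, k: int, u_coff: int) -> list[int]:
    """UEGk binarization of a non-negative value (signedValFlag = 0)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    # Prefix: truncated unary code with cMax = uCoff.
    bins = truncated_unary(min(value, u_coff), u_coff)
    if value >= u_coff:
        # Suffix: k-th order Exp-Golomb code of the remainder.
        suffix = value - u_coff
        while suffix >= (1 << k):
            bins.append(1)
            suffix -= 1 << k
            k += 1
        bins.append(0)
        while k > 0:
            k -= 1
            bins.append((suffix >> k) & 1)
    return bins


print(ueg_k(3, k=0, u_coff=14))        # [1, 1, 1, 0]
print(len(ueg_k(15, k=0, u_coff=14)))  # 17
```

Small magnitudes get short unary codes, while large ones switch to the Exp-Golomb suffix so the bin count grows only logarithmically.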
A PCM determination unit 139 monitors the coded data 123 supplied to the binarization unit 135 and determines whether the number of bins corresponding to the coded data of one macroblock exceeds a predetermined, fixed number of bins. Monitoring the coded data 123 supplied to the binarization unit 135 is equivalent to monitoring the bins 142 to be input to a binary arithmetic encoder 141 connected to the output side of the binarization unit 135. The predetermined number of bins is the number of bins at which the number of bits obtained by binary arithmetic coding of the bins corresponding to the coded data 123 of one macroblock would exceed the number of bits of one uncompressed macroblock.
In the present specification, the number of bits equivalent to one uncompressed input macroblock will be referred to as the predetermined number of bits. In the H.264 standard, when the input video data is in 4:2:0 format, the predetermined number of bits is 3200 bits.
When monitoring shows that the number of bins corresponding to the coded data 123 of one macroblock does not exceed the predetermined number of bins, the PCM determination unit 139 outputs PCM-mode non-selection as a control signal 144. When the number of bins exceeds the predetermined number of bins, it outputs PCM-mode selection as the control signal 144.
The control signal 144 is supplied to the first switch 104, the binary arithmetic encoder 141, a context modeling unit 146, and second and third switches 147 and 148. Of these, the binary arithmetic encoder 141 operates differently depending on whether the control signal 144 signals PCM-mode selection or PCM-mode non-selection.
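A minimal sketch of the decision made by the PCM determination unit 139, assuming the bins of one macroblock are available as lists (one per syntax element). The threshold `max_bins` stands for the predetermined number of bins; no value is given for it here, so the usage lines use an arbitrary number, and the names are hypothetical.

```python
from enum import Enum


class ControlSignal(Enum):
    PCM_NON_SELECTION = 0
    PCM_SELECTION = 1


def pcm_determination(mb_bin_strings: list[list[int]], max_bins: int) -> ControlSignal:
    """Compare the bin count of one macroblock with the predetermined number of bins."""
    total_bins = sum(len(bins) for bins in mb_bin_strings)
    if total_bins > max_bins:
        return ControlSignal.PCM_SELECTION
    return ControlSignal.PCM_NON_SELECTION


print(pcm_determination([[1, 0], [1, 1, 1, 0]], max_bins=5))  # ControlSignal.PCM_SELECTION
```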
First, when PCM-mode selection is signaled by the control signal 144, the binary arithmetic encoder 141 outputs data 153 to a first buffer 151, one of the two buffers (the first buffer 151 and a second buffer 152) connected to it. More specifically, it first applies binary arithmetic coding to the bins of the prediction parameter corresponding to the PCM mode, based on a context 154 supplied from the context modeling unit 146. Next, it flushes the current range of the binary arithmetic encoder 141 and outputs the result to the first buffer 151. Finally, it outputs as many “0” bits as required to align the bit string output to the first buffer 151. Because the bins of the prediction parameter corresponding to the PCM mode are uniquely determined, the binary arithmetic encoder 141 generates these bins itself before coding them.
On the other hand, when PCM-mode non-selection is signaled by the control signal 144, the binary arithmetic encoder 141 outputs to the second buffer 152. It applies binary arithmetic coding to the bins 142 sequentially supplied from the binarization unit 135, based on the context 154 supplied from the context modeling unit 146, and writes the resulting bit output 156 to the second buffer 152.
During the binary arithmetic coding described above, the context 154 supplied by the context modeling unit 146 is sequentially updated according to each bin 142 being coded. Roughly speaking, in the H.264 standard, when the bin equals the most-probable-symbol (MPS), the least-probable-symbol occurrence probability state index (pStateidx) is updated so that the least-probable-symbol occurrence probability (rLPS) decreases. When the bin does not equal the MPS, pStateidx is updated so that rLPS increases. When the bin does not equal the MPS and pStateidx is “0”, that is, when rLPS is at its largest, the MPS is inverted. Details are given in “9.3 CABAC parsing process for slice data” of “ITU-T H.264 | ISO/IEC 14496-10 Advanced Video Coding”.
By updating (learning) each context 154 according to the input bins in this way, the binary arithmetic encoder 141 achieves entropy coding that adapts to the occurrence probabilities of the input bins.
The context modeling unit 146 sequentially reads from the memory 133 the context 154 corresponding to each bin 142 supplied to the binary arithmetic encoder 141, supplies it to the binary arithmetic encoder 141, and stores the context learned by the binary arithmetic encoder 141 back in the memory 133.
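The two output paths of the binary arithmetic encoder 141 can be sketched as below. The arithmetic coding and the range flush are abstracted away: `encoder_bits` stands for the bits the encoder has already produced. Alignment is assumed to be to a byte boundary, as with the pcm_alignment_zero_bit syntax element in H.264; the function names are hypothetical.

```python
def byte_align_with_zeros(buffer: list[int]) -> None:
    """Append '0' bits until the buffer length is a multiple of 8."""
    buffer.extend([0] * (-len(buffer) % 8))


def route_encoder_output(pcm_selected: bool, encoder_bits: list[int],
                         first_buffer: list[int], second_buffer: list[int]) -> None:
    """Write the encoder's output bits to the buffer chosen by control signal 144."""
    if pcm_selected:
        # PCM mode: coded prediction parameter plus flushed range go to the
        # first buffer, which is then aligned with zero bits.
        first_buffer.extend(encoder_bits)
        byte_align_with_zeros(first_buffer)
    else:
        # Normal mode: arithmetic-coded macroblock data go to the second buffer.
        second_buffer.extend(encoder_bits)


first, second = [], []
route_encoder_output(True, [1, 0, 1], first, second)
print(first)  # [1, 0, 1, 0, 0, 0, 0, 0]
```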
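The context update rule above can be written as a short Python function. The LPS transition table below is transcribed from the state-transition table of the H.264 specification (section 9.3) and should be checked against the standard before reuse; on an MPS the state index simply increments, saturating at 62. The function name is hypothetical.

```python
# LPS state transitions, transcribed from the H.264 state-transition table.
TRANS_IDX_LPS = [
    0, 0, 1, 2, 2, 4, 4, 5, 6, 7, 8, 9, 9, 11, 11, 12,
    13, 13, 15, 15, 16, 16, 18, 18, 19, 19, 21, 21, 22, 22, 23, 24,
    24, 25, 26, 26, 27, 27, 28, 29, 29, 30, 30, 30, 31, 32, 32, 33,
    33, 33, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 63,
]


def update_context(p_state_idx: int, mps: int, bin_val: int) -> tuple[int, int]:
    """Return the updated (pStateidx, MPS) after coding one bin."""
    if bin_val == mps:
        # MPS observed: move to a state with a smaller LPS probability.
        return (p_state_idx + 1 if p_state_idx < 62 else p_state_idx), mps
    # LPS observed: at state 0 (largest LPS probability) the MPS is inverted.
    if p_state_idx == 0:
        mps = 1 - mps
    # Move to a state with a larger LPS probability.
    return TRANS_IDX_LPS[p_state_idx], mps


state = (0, 0)
for b in [1, 1, 1]:
    state = update_context(*state, b)
print(state)  # (2, 1)
```

The first 1 inverts the MPS because the context starts at state 0; the following 1s are then MPS bins that push the state toward lower LPS probabilities.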
Meanwhile, the PCM encoder 103 receives the video data 101 on a macroblock basis and converts the pixel values of the input macroblock into PCM code (a non-entropy code). The resulting PCM coding output 157 is output to a third buffer 158.
With 8 bits per pixel, PCM coding of one macroblock requires 384 (pixels) × 8 bits = 3072 bits. Strictly speaking, the header bits designating PCM coding are added to these 3072 bits, and the total does not exceed the above-described predetermined number of bits. The number of header bits is largest in a B slice of the H.264 standard, where it amounts to the bits for 8 bins.
When PCM-mode selection is signaled by the control signal 144, the first switch 104 supplies the decoding picture buffer 121 with the video data 101 (the input image). When PCM-mode non-selection is signaled, the first switch 104 selects the local decoded image 138 output from the local decoder 137 and supplies it to the decoding picture buffer 121. The decoding picture buffer 121 stores the image input through the first switch 104 as a decoded image for subsequent coding.
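The PCM encoder 103 can be pictured as simply emitting the raw samples of a macroblock. This Python sketch assumes 8-bit samples in 4:2:0 format, ordered luma first and then Cb and Cr as in the H.264 I_PCM macroblock syntax; the header bits are not included. It reproduces the 3072-bit figure above. The function name is hypothetical.

```python
def pcm_encode_macroblock(luma: list[int], cb: list[int], cr: list[int]) -> bytes:
    """Concatenate raw 8-bit samples: 256 luma, then 64 Cb, then 64 Cr."""
    if (len(luma), len(cb), len(cr)) != (256, 64, 64):
        raise ValueError("expected 16x16 luma and two 8x8 chroma blocks")
    samples = [*luma, *cb, *cr]
    if any(not 0 <= s <= 255 for s in samples):
        raise ValueError("samples must fit in 8 bits")
    return bytes(samples)


pcm = pcm_encode_macroblock([128] * 256, [128] * 64, [128] * 64)
print(len(pcm) * 8)  # 3072
```

Because the output size depends only on the sample count and bit depth, PCM coding gives a fixed upper bound regardless of image content.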
Only when the PCM-mode selection is signaled by the control signal 144, the second switch 147 supplies the multiplexer 127 with output data 161 of the first buffer 151. In other words, when the PCM-mode non-selection is signaled by the control signal 144, the output data 161 of the first buffer 151 is not supplied to the multiplexer 127.
When PCM-mode non-selection is signaled by the control signal 144, the third switch 148 supplies the multiplexer 127 with output data 162 of the second buffer 152. When PCM-mode selection is signaled, the third switch 148 instead supplies the multiplexer 127 with output data 163 of the third buffer 158.
The multiplexer 127 multiplexes the slice quantization parameter 126S, the model ID (model_id) 131 supplied from the external model calculator, the output data 161 input through the second switch 147, and whichever of the output data 162 and 163 is input through the third switch 148, and outputs the result as the bit stream 128.
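The behaviour of the first, second and third switches can be summarized in a Python sketch. It is a hypothetical model: buffer contents are plain bytes, the order in which the first-buffer and third-buffer data are concatenated is an assumption, and the slice-level fields (slice quantization parameter 126S and model ID 131) that the multiplexer 127 also writes are left out because their bit-level layout is defined by the slice header syntax rather than by this description.

```python
from dataclasses import dataclass


@dataclass
class MacroblockBuffers:
    first: bytes   # output data 161: PCM-mode prediction parameter, flushed and aligned
    second: bytes  # output data 162: arithmetic-coded macroblock data
    third: bytes   # output data 163: PCM samples from the PCM encoder


def macroblock_payload(pcm_selected: bool, buf: MacroblockBuffers) -> bytes:
    """Second switch passes 161 only in PCM mode; third switch picks 163 or 162."""
    if pcm_selected:
        return buf.first + buf.third
    return buf.second


def reference_image(pcm_selected: bool, input_mb, local_decoded_mb):
    """First switch: what the decoding picture buffer stores for this macroblock."""
    return input_mb if pcm_selected else local_decoded_mb


buf = MacroblockBuffers(first=b"\x01", second=b"\x02", third=b"\x03")
print(macroblock_payload(True, buf))   # b'\x01\x03'
print(macroblock_payload(False, buf))  # b'\x02'
```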
Patent Literature 1: Japanese Patent Laying-Open No. 2004-135251 (paragraph 0005).
The video encoding device 100 described above can selectively apply binary arithmetic coding or PCM coding on a macroblock basis. Consequently, even when the number of bits produced by binary arithmetic coding of the coded data of a certain macroblock would exceed the predetermined number of bits, selecting PCM coding guarantees that the number of output bits for that macroblock does not exceed the predetermined number of bits.
One reason why the number of bits produced by binary arithmetic coding of the coded data of a certain macroblock exceeds the predetermined number of bits is that the symbol occurrence probabilities of the contexts learned from past macroblocks do not match the bin occurrence probabilities of the coded data of the current macroblock. For instance, when the symbol occurrence probabilities change drastically at a certain macroblock, the two sets of probabilities no longer match, and the number of bits produced by binary arithmetic coding of that macroblock's coded data can exceed the predetermined number of bits. One concrete example is a macroblock having significant level values in all frequency components of its transformed and quantized coefficients that is input after a run of macroblocks having significant level values only in the highest frequency components has been binary-arithmetic coded. Here, a significant level value is a level value whose absolute value is larger than zero.
The video encoding device 100 as related art cannot learn contexts related to level values when it selects PCM coding. Therefore, suppose a macroblock having significant level values in all frequency components of its transformed and quantized coefficients is input and PCM-coded. Even if another such macroblock is input afterwards, the contexts have not been learned, so efficient binary arithmetic coding cannot be achieved for that later macroblock either.
Thus, the video encoding device 100 as related art has the problem that coding efficiency decreases when symbol occurrence probabilities change drastically. Although the above description concerns video encoding, the same problem arises when decoding image data in which symbol occurrence probabilities change drastically.
Under these circumstances, an object of the present invention is to provide a video encoding device, a video encoding method, and a video encoding program that prevent a decrease in compression efficiency caused by drastic changes in symbol occurrence probabilities in context-adaptive coding, as well as a corresponding video decoding device, video decoding method, and video decoding program.
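The cost of such a probability mismatch can be quantified. An ideal arithmetic coder spends −log2(p) bits on a bin whose modeled probability is p, so the average cost per bin is the cross-entropy between the actual bin distribution and the context's model. The following Python sketch is an idealized model that ignores the finite precision of the real coder, and the probabilities in it are illustrative assumptions; it shows how a context trained on bins that are almost always 0 becomes expensive when the bins suddenly become almost always 1.

```python
import math


def expected_bits_per_bin(p_actual_one: float, p_model_one: float) -> float:
    """Cross-entropy in bits per bin: bins are 1 with probability p_actual_one,
    but the coder assigns probability p_model_one to a 1 (0 < p_model_one < 1)."""
    bits = 0.0
    for p_actual, p_model in ((p_actual_one, p_model_one),
                              (1.0 - p_actual_one, 1.0 - p_model_one)):
        if p_actual > 0:
            bits -= p_actual * math.log2(p_model)
    return bits


matched = expected_bits_per_bin(0.05, 0.05)     # context agrees with the data
mismatched = expected_bits_per_bin(0.95, 0.05)  # data changed, context did not
print(f"{matched:.2f} vs {mismatched:.2f} bits per bin")
```

With matched probabilities the cost is about 0.29 bits per bin; after the change it rises to about 4.1 bits per bin, well above the 1 bit per bin that raw bits would cost, which is how a single macroblock can overshoot the predetermined number of bits.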