A typical video encoding device executes an encoding process that conforms to a predetermined video coding scheme to generate coded data, i.e. a bitstream. In ISO/IEC 14496-10 Advanced Video Coding (AVC) described in Non Patent Literature (NPL) 1 as a representative example of the predetermined video coding scheme, each frame is divided into blocks of 16×16 pixel size called MBs (Macro Blocks), and each MB is further divided into blocks of 4×4 pixel size, with the MB serving as the minimum unit of encoding. FIG. 23 shows an example of block division in the case where the color format of a frame is the YCbCr 4:2:0 format and the spatial resolution is QCIF (Quarter Common Intermediate Format).
Each of the divided image blocks is input sequentially to the video encoding device and encoded. FIG. 24 is a block diagram showing an example of the structure of the typical video encoding device for generating a bitstream that conforms to AVC. Referring to FIG. 24, the structure and operation of the typical video encoding device are described below.
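As a small worked example of the block division described above, the following Python sketch counts the MBs in a QCIF frame. It assumes the standard QCIF luma resolution of 176×144 pixels; the function name macroblock_grid is hypothetical and is used only for illustration.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows, total) of macroblocks covering a luma frame."""
    cols = (width + mb_size - 1) // mb_size  # round up to cover partial blocks
    rows = (height + mb_size - 1) // mb_size
    return cols, rows, cols * rows


# Assumption: QCIF luma is 176x144. In YCbCr 4:2:0, each chroma plane has
# half the luma resolution horizontally and vertically.
cols, rows, total = macroblock_grid(176, 144)
chroma_width, chroma_height = 176 // 2, 144 // 2
print(cols, rows, total)            # 11 9 99
print(chroma_width, chroma_height)  # 88 72
```

Under these assumptions a QCIF frame holds 11×9 = 99 MBs, and each MB covers a 16×16 luma area together with one 8×8 area in each of the Cb and Cr planes.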
The video encoding device shown in FIG. 24 includes a frequency transformer 101, a quantizer 102, a variable-length encoder 103, a quantization controller 104, an inverse quantizer 105, an inverse frequency transformer 106, a frame memory 107, an intra-frame predictor 108, an inter-frame predictor 109, and a prediction selector 110.
An input image to the video encoding device is input to the frequency transformer 101 as a prediction error image, after a prediction image supplied from the intra-frame predictor 108 or the inter-frame predictor 109 through the prediction selector 110 is subtracted from the input image.
The frequency transformer 101 transforms the input prediction error image from a spatial domain to a frequency domain, and outputs the result as a coefficient image.
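The text above does not specify which transform the frequency transformer 101 uses. As one possible illustration, the following Python sketch applies the 4×4 integer core transform of AVC to a block, without the scaling that AVC folds into quantization. The names CORE, matmul, transpose and forward_transform are hypothetical.

```python
# 4x4 forward core transform matrix of AVC (scaling is not applied here).
CORE = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Return CORE * block * CORE^T for a 4x4 prediction error block."""
    return matmul(matmul(CORE, block), transpose(CORE))


flat = [[1] * 4 for _ in range(4)]
coeffs = forward_transform(flat)
print(coeffs[0])  # [16, 0, 0, 0]
```

A flat block has all of its energy in the top-left (DC) coefficient, which illustrates why the frequency domain representation is convenient for the subsequent quantization.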
The quantizer 102 quantizes the coefficient image supplied from the frequency transformer 101 using a quantization step size, supplied from the quantization controller 104, controlling the granularity of quantization, and outputs the result as a quantized coefficient image.
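The role of the quantization step size can be sketched as follows. This is a minimal uniform quantizer with rounding to the nearest level, given only as an assumption for illustration; the actual AVC quantizer uses integer scaling tables that are not reproduced here. The function names quantize and dequantize are hypothetical.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level (round half away from zero)."""
    levels = []
    for c in coeffs:
        sign = -1 if c < 0 else 1
        levels.append(sign * int((abs(c) + step / 2) // step))
    return levels


def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [level * step for level in levels]


levels = quantize([10, -7, 3, 0], step=4)
print(levels)                 # [3, -2, 1, 0]
print(dequantize(levels, 4))  # [12, -8, 4, 0]
```

A larger step size maps more coefficients to the same level, which lowers the code rate at the cost of a larger reconstruction error.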
The variable-length encoder 103 entropy-encodes the quantized coefficient image supplied from the quantizer 102. The variable-length encoder 103 also encodes the above quantization step size supplied from the quantization controller 104 and an image prediction parameter supplied from the prediction selector 110. These pieces of coded data are multiplexed and output from the video encoding device as a bitstream.
Here, an encoding process for the quantization step size at the variable-length encoder 103 is described with reference to FIG. 25. In the variable-length encoder 103, a quantization step size encoder for encoding the quantization step size includes a quantization step size buffer 10311 and an entropy encoder 10312 as shown in FIG. 25.
The quantization step size buffer 10311 holds a quantization step size Q(i−1) assigned to the previous image block encoded immediately before an image block to be encoded.
As shown in the following equation (1), the previous quantization step size Q(i−1) supplied from the quantization step size buffer 10311 is subtracted from an input quantization step size Q(i), and the result is input to the entropy encoder 10312 as a difference quantization step size dQ(i).
dQ(i) = Q(i) − Q(i−1)  (1)
The entropy encoder 10312 entropy-encodes the input difference quantization step size dQ(i), and outputs the result as code corresponding to the quantization step size.
The above has described the encoding process for the quantization step size.
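The difference computation of equation (1) can be sketched in Python as follows. The initial content of the quantization step size buffer is not stated above, so the parameter initial is an assumption, and the function name encode_step_sizes is hypothetical. The entropy encoding itself is omitted.

```python
def encode_step_sizes(step_sizes, initial=0):
    """Return the differences dQ(i) = Q(i) - Q(i-1) for a sequence of Q(i)."""
    previous = initial  # plays the role of the quantization step size buffer
    differences = []
    for q in step_sizes:
        differences.append(q - previous)
        previous = q  # the buffer now holds the step size just encoded
    return differences


print(encode_step_sizes([26, 26, 28, 24], initial=26))  # [0, 0, 2, -4]
```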
The quantization controller 104 determines a quantization step size for the current input image block. In general, the quantization controller 104 monitors the output code rate of the variable-length encoder 103 and either increases the quantization step size so as to reduce the output code rate for the image block concerned or, conversely, decreases the quantization step size so as to increase the output code rate for the image block concerned. The increase or decrease in quantization step size enables the video encoding device to encode an input moving image at a target rate. The determined quantization step size is supplied to the quantizer 102 and the variable-length encoder 103.
The quantized coefficient image output from the quantizer 102 is inverse-quantized by the inverse quantizer 105 to obtain a coefficient image to be used for prediction in encoding subsequent image blocks. The coefficient image output from the inverse quantizer 105 is transformed back to the spatial domain by the inverse frequency transformer 106 to obtain a prediction error image. The prediction image is added to the prediction error image, and the result is input to the frame memory 107 and the intra-frame predictor 108 as a reconstructed image.
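The feedback described above can be sketched as a simple per-block update rule. This is only an illustrative assumption, not the control law of any particular encoder: the unit increment, the bounds min_step and max_step, and the function name update_step_size are all hypothetical.

```python
def update_step_size(step, bits_produced, bits_target, min_step=1, max_step=51):
    """Raise the step size when over budget, lower it when under budget."""
    if bits_produced > bits_target:
        step += 1  # coarser quantization reduces the output code rate
    elif bits_produced < bits_target:
        step -= 1  # finer quantization increases the output code rate
    return max(min_step, min(max_step, step))  # keep within assumed bounds


print(update_step_size(20, bits_produced=1200, bits_target=1000))  # 21
print(update_step_size(20, bits_produced=800, bits_target=1000))   # 19
```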
The frame memory 107 stores reconstructed images of encoded image frames input in the past. The image frames stored in the frame memory 107 are called reference frames.
The intra-frame predictor 108 refers to reconstructed images of image blocks encoded in the past within the image frame being currently encoded to generate a prediction image.
The inter-frame predictor 109 refers to reference frames supplied from the frame memory 107 to generate a prediction image.
The prediction selector 110 compares the prediction image supplied from the intra-frame predictor 108 with the prediction image supplied from the inter-frame predictor 109, and selects and outputs the prediction image that is closer to the input image. The prediction selector 110 also outputs information (called an image prediction parameter) on a prediction method used by the intra-frame predictor 108 or the inter-frame predictor 109, and supplies the information to the variable-length encoder 103.
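The selection step can be sketched as follows. The text above does not state how closeness to the input image is measured; the sum of absolute differences (SAD) used here is an assumption, and the names sad and select_prediction are hypothetical. Blocks are represented as flat lists of pixel values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def select_prediction(input_block, intra_pred, inter_pred):
    """Return (method, prediction) for the candidate closer to the input."""
    if sad(input_block, intra_pred) <= sad(input_block, inter_pred):
        return "intra", intra_pred
    return "inter", inter_pred


method, pred = select_prediction([10, 12, 14, 16],
                                 intra_pred=[10, 10, 10, 10],
                                 inter_pred=[11, 12, 13, 16])
print(method)  # inter
```

In this sketch the returned method label stands in for the image prediction parameter that is passed on to the variable-length encoder 103.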
According to the processing mentioned above, the typical video encoding device compressively encodes the input moving image to generate a bitstream.
The output bitstream is transmitted to a video decoding device. The video decoding device executes a decoding process so that the bitstream will be decompressed as a moving image. FIG. 26 shows an example of the structure of a typical video decoding device that decodes the bitstream output from the typical video encoding device to obtain decoded video. Referring to FIG. 26, the structure and operation of the typical video decoding device are described below.
The video decoding device shown in FIG. 26 includes a variable-length decoder 201, an inverse quantizer 202, an inverse frequency transformer 203, a frame memory 204, an intra-frame predictor 205, an inter-frame predictor 206, and a prediction selector 207.
The variable-length decoder 201 variable-length-decodes the input bitstream to obtain a quantization step size that controls the granularity of inverse quantization, the quantized coefficient image, and the image prediction parameter. The quantization step size and the quantized coefficient image mentioned above are supplied to the inverse quantizer 202. The image prediction parameter is supplied to the prediction selector 207.
The inverse quantizer 202 inverse-quantizes the input quantized coefficient image based on the input quantization step size, and outputs the result as a coefficient image.
The inverse frequency transformer 203 transforms the coefficient image, supplied from the inverse quantizer 202, from the frequency domain to the spatial domain, and outputs the result as a prediction error image. A prediction image supplied from the prediction selector 207 is added to the prediction error image to obtain a decoded image. The decoded image is not only output from the video decoding device as an output image, but also input to the frame memory 204 and the intra-frame predictor 205.
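The addition of the prediction image to the prediction error image can be sketched as follows. Clipping the result to the valid sample range and the 8-bit default are assumptions made for this illustration, and the function name reconstruct is hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the prediction error to the prediction and clip to the sample range."""
    max_value = (1 << bit_depth) - 1  # 255 for 8-bit samples (assumed)
    return [max(0, min(max_value, p + r))
            for p, r in zip(prediction, residual)]


print(reconstruct([100, 250, 3], [5, 10, -7]))  # [105, 255, 0]
```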
The frame memory 204 stores image frames decoded in the past. The image frames stored in the frame memory 204 are called reference frames.
Based on the image prediction parameter supplied from the variable-length decoder 201, the intra-frame predictor 205 refers to reconstructed images of image blocks decoded in the past within the image frame being currently decoded to generate a prediction image.
Based on the image prediction parameter supplied from the variable-length decoder 201, the inter-frame predictor 206 refers to reference frames supplied from the frame memory 204 to generate a prediction image.
The prediction selector 207 selects either of the prediction images supplied from the intra-frame predictor 205 and the inter-frame predictor 206 based on the image prediction parameter supplied from the variable-length decoder 201.
Here, a decoding process for the quantization step size at the variable-length decoder 201 is described with reference to FIG. 27. In the variable-length decoder 201, a quantization step size decoder for decoding the quantization step size includes an entropy decoder 20111 and a quantization step size buffer 20112 as shown in FIG. 27.
The entropy decoder 20111 entropy-decodes input code, and outputs a difference quantization step size dQ(i).
The quantization step size buffer 20112 holds the previous quantization step size Q(i−1).
As shown in the following equation (2), Q(i−1) supplied from the quantization step size buffer 20112 is added to the difference quantization step size dQ(i) generated by the entropy decoder 20111. The added value is not only output as a quantization step size Q(i), but also input to the quantization step size buffer 20112.
Q(i) = Q(i−1) + dQ(i)  (2)
The above has described the decoding process for the quantization step size.
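The accumulation of equation (2) can be sketched in Python as follows. The initial content of the quantization step size buffer is an assumption that must match the value used on the encoding side, and the function name decode_step_sizes is hypothetical. The entropy decoding itself is omitted.

```python
def decode_step_sizes(differences, initial=0):
    """Rebuild Q(i) = Q(i-1) + dQ(i) from a sequence of differences dQ(i)."""
    previous = initial  # plays the role of the quantization step size buffer
    step_sizes = []
    for dq in differences:
        q = previous + dq
        step_sizes.append(q)
        previous = q  # the decoded value is fed back into the buffer
    return step_sizes


print(decode_step_sizes([0, 0, 2, -4], initial=26))  # [26, 26, 28, 24]
```

Because each decoded value depends on the previous one, a decoder must process the differences in the same order in which the encoder produced them.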
According to the processing mentioned above, the typical video decoding device decodes the bitstream to generate a moving image.
In the meantime, in order to maintain the subjective quality of the moving image to be compressed by the encoding process, the quantization controller 104 in the typical video encoding device generally analyzes either or both of the input image and the prediction error image, in addition to the output code rate, to determine a quantization step size according to human visual sensitivity. In other words, the quantization controller 104 performs visual-sensitivity-based adaptive quantization. Specifically, when the human visual sensitivity to the current image to be encoded is determined to be high, the quantization step size is set small, while when the visual sensitivity is determined to be low, the quantization step size is set large. Since such control can assign a larger code rate to a high visual sensitivity region, the subjective quality is improved.
As a visual-sensitivity-based adaptive quantization technique, for example, adaptive quantization based on the texture complexity of an input image used in MPEG-2 Test Model 5 (TM5) is known. The texture complexity is typically called activity. Patent Literature (PTL) 1 proposes an adaptive quantization system using the activity of a prediction image in conjunction with the activity of an input image. PTL 2 proposes an adaptive quantization system based on an activity that takes edge portions into account.
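Activity-based modulation of the step size in the style of TM5 can be sketched as follows. This is a simplified illustration under stated assumptions: the activity is taken as one plus the minimum variance over whatever sub-blocks are passed in (TM5 itself uses both frame-organized and field-organized luma sub-blocks), and the average activity avg_act is supplied by the caller. All function names are hypothetical.

```python
def variance(pixels):
    """Population variance of a flat list of pixel values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def activity(sub_blocks):
    """Activity of a block: 1 plus the smallest sub-block variance."""
    return 1 + min(variance(sb) for sb in sub_blocks)


def normalized_activity(act, avg_act):
    """Normalize the activity into the open range (0.5, 2)."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def adaptive_step(base_step, act, avg_act):
    """Scale the rate-control step size by the normalized activity."""
    return base_step * normalized_activity(act, avg_act)


flat_block = [[5, 5, 5, 5], [1, 9, 1, 9]]  # one flat sub-block, one busy one
print(activity(flat_block))                            # 1.0
print(adaptive_step(16, act=100, avg_act=100))         # 16.0
print(adaptive_step(16, act=1, avg_act=400) < 16)      # True
```

A block containing a flat sub-block receives a low activity and hence a smaller step size, which matches the observation that distortion is more visible in flat regions than in highly textured ones.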
When the visual-sensitivity-based adaptive quantization technique is used, a problem arises if the quantization step size changes frequently within an image frame. In the typical video encoding device for generating a bitstream that conforms to the AVC scheme, the quantization step size is encoded by entropy-encoding its difference from the quantization step size of the image block encoded immediately before the image block to be encoded. Therefore, as the change in quantization step size in the encoding sequence direction becomes large, the code rate required to encode the quantization step size increases. As a result, the code rate assigned to encoding of the coefficient image is relatively reduced, and hence the image quality is degraded.
Since the encoding sequence direction is independent of the continuity of the visual sensitivity on the screen, the visual-sensitivity-based adaptive quantization technique inevitably increases the code rate required to encode the quantization step size. Therefore, even when the visual-sensitivity-based adaptive quantization technique is used in the typical video encoding device, the image degradation associated with the increase in the code rate for the quantization step size may cancel out the subjective quality improved by the adaptive quantization technique, i.e., there arises a problem that a sufficient improvement in image quality cannot be achieved.
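The growth in code rate can be illustrated by counting the bits needed for the differences. The sketch below uses the signed Exp-Golomb code as a stand-in for the entropy encoder, which is an assumption made for illustration; the step size sequences and the function names se_bits and step_size_bits are hypothetical.

```python
def se_bits(value):
    """Length in bits of the signed Exp-Golomb codeword for an integer."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def step_size_bits(step_sizes, initial):
    """Total bits spent on the differences dQ(i) in encoding order."""
    previous, total = initial, 0
    for q in step_sizes:
        total += se_bits(q - previous)
        previous = q
    return total


constant = [26, 26, 26, 26, 26, 26]
alternating = [26, 32, 26, 32, 26, 32]
print(step_size_bits(constant, initial=26))     # 6
print(step_size_bits(alternating, initial=26))  # 36
```

In this example, six blocks with a constant step size cost 6 bits in total, whereas alternating between two step sizes costs 36 bits, all of which is taken away from the budget available for the coefficient image.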
To address this problem, PTL 3 discloses a technique for adaptively setting the range of values that are quantized to zero, i.e. a dead zone, according to the visual sensitivity in the spatial domain and the frequency domain, instead of adaptively setting the quantization step size according to the visual sensitivity. In the system described in PTL 3, a dead zone for a transform coefficient determined to be low in terms of the visual sensitivity is made wider than a dead zone for a transform coefficient determined to be high in terms of the visual sensitivity. Such control enables visual-sensitivity-based adaptive quantization without changing the quantization step size.
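The effect of a dead zone can be sketched with a generic dead-zone quantizer. This is not the method of PTL 3, whose details are not given here; it is a minimal illustration in which any coefficient whose magnitude is below an assumed threshold dead_zone is forced to zero while the step size stays fixed. The function name deadzone_quantize is hypothetical.

```python
def deadzone_quantize(coeffs, step, dead_zone):
    """Quantize with a fixed step; magnitudes below dead_zone become zero."""
    levels = []
    for c in coeffs:
        magnitude = abs(c)
        if magnitude < dead_zone:
            levels.append(0)  # inside the dead zone
        else:
            sign = -1 if c < 0 else 1
            levels.append(sign * int((magnitude + step / 2) // step))
    return levels


coeffs = [10, -7, 3, 0]
print(deadzone_quantize(coeffs, step=4, dead_zone=2))  # [3, -2, 1, 0]
print(deadzone_quantize(coeffs, step=4, dead_zone=8))  # [3, 0, 0, 0]
```

Widening the dead zone removes small coefficients and thereby lowers the code rate for a block, while the step size signalled in the bitstream, and therefore its difference from the previous block, remains unchanged.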