A typical video encoding device performs an encoding process compliant with a predetermined video coding scheme on each frame of an input video, to generate coded data, i.e. a bitstream. ISO/IEC 14496-10 Advanced Video Coding (AVC) described in Non Patent Literature (NPL) 1, which is a representative example of the predetermined video coding scheme, divides each frame into blocks of 16×16 pixel size called macroblocks (MBs), and further divides each MB into blocks of 4×4 pixel size, where MB is the minimum unit of encoding. FIG. 17 shows an example of block division in the case where the color format of a frame is the YCbCr 4:2:0 format and the spatial resolution is QCIF (Quarter Common Intermediate Format).
Each of the image blocks obtained by the division is sequentially input to a video encoding device and encoded. FIG. 18 is a block diagram showing an example of a structure of a typical video encoding device for generating an AVC-compliant bitstream. The following describes the structure and operation of the typical video encoding device, with reference to FIG. 18.
The video encoding device shown in FIG. 18 includes a frequency transformer 101, a quantizer 102, a variable length encoder 103, a quantization controller 104, an inverse quantizer 105, an inverse frequency transformer 106, a frame memory 107, an intra-frame predictor 108, an inter-frame predictor 109, a prediction selector 110, and a bitstream buffer 111.
A predicted image supplied from the intra-frame predictor 108 or the inter-frame predictor 109 via the prediction selector 110 is subtracted from an image input to the video encoding device, and the result is input to the frequency transformer 101 as a prediction error image.
The frequency transformer 101 transforms the input prediction error image from a spatial domain to a frequency domain, and outputs the result as a coefficient image.
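As a rough illustration of the frequency transformer's role, the block transform can be sketched with a generic orthonormal DCT-II, a stand-in for the actual transform specified by the coding scheme; the function names here are hypothetical:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (generic stand-in for the
    block transform used by the frequency transformer)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)  # scale the DC basis row for orthonormality
    return m * np.sqrt(2.0 / n)

def forward_transform(block: np.ndarray) -> np.ndarray:
    """Map a square spatial-domain block to a coefficient image."""
    t = dct_matrix(block.shape[0])
    return t @ block @ t.T
```

For a flat (constant) input block, all the energy concentrates in the single DC coefficient, which is why flat areas compress well.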
The quantizer 102 quantizes the coefficient image supplied from the frequency transformer 101 using a quantization step size, supplied from the quantization controller 104, for controlling quantization granularity, and outputs the result as a quantized coefficient image.
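The quantization performed by the quantizer 102, and its inverse used later by the inverse quantizer 105, can be sketched as simple uniform scalar quantization; this is a simplification for illustration, not the scheme-specified procedure, and the function names are hypothetical:

```python
import numpy as np

def quantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    """Uniform quantization: divide by the step size and round.
    A larger qstep gives coarser granularity and fewer bits."""
    return np.round(coeffs / qstep).astype(np.int32)

def inverse_quantize(levels: np.ndarray, qstep: float) -> np.ndarray:
    """Reconstruct approximate coefficients from quantized levels."""
    return levels.astype(np.float64) * qstep
```

The reconstruction error per coefficient is bounded by half the step size, which is why the quantization controller trades code amount against fidelity through qstep.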
The variable length encoder 103 entropy-encodes the quantized coefficient image supplied from the quantizer 102. The variable length encoder 103 also encodes the quantization step size supplied from the quantization controller 104 and an image prediction parameter supplied from the prediction selector 110. These coded data are multiplexed and stored in the bitstream buffer 111 as a bitstream.
The bitstream buffer 111 stores the bitstream supplied from the variable length encoder 103, and outputs the bitstream as the output of the video encoding device at a predetermined transmission rate. The processing rate in the video encoding device and the transmission rate of the bitstream output from the video encoding device are adjusted by the bitstream buffer 111.
The quantization step size encoding process in the variable length encoder 103 is described below, with reference to FIG. 19. As shown in FIG. 19, a quantization step size encoder for encoding the quantization step size in the variable length encoder 103 includes a quantization step size buffer 10311 and an entropy encoder 10312.
The quantization step size buffer 10311 holds a quantization step size Q(i−1) assigned to an immediately previously encoded image block.
The immediately previous quantization step size Q(i−1) supplied from the quantization step size buffer 10311 is subtracted from an input quantization step size Q(i) as shown in the following equation (1), and the result is input to the entropy encoder 10312 as a differential quantization step size dQ(i).

dQ(i) = Q(i) − Q(i−1)   (1)
The entropy encoder 10312 entropy-encodes the input differential quantization step size dQ(i), and outputs the result as a code corresponding to the quantization step size.
This completes the description of the quantization step size encoding process.
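The differential encoding of equation (1) can be sketched as follows; the entropy coding step is omitted and the class name is hypothetical, so this only models the buffer and the subtraction:

```python
class QStepEncoder:
    """Sketch of differential quantization step size encoding (equation (1)).
    Returns the difference dQ(i) that would then be entropy-encoded."""

    def __init__(self, initial_qstep: int = 0):
        self.prev = initial_qstep  # quantization step size buffer Q(i-1)

    def encode(self, qstep: int) -> int:
        dq = qstep - self.prev  # dQ(i) = Q(i) - Q(i-1)
        self.prev = qstep       # update the buffer for the next block
        return dq
```

Note that small, smooth changes in the step size yield small differences, which entropy-code cheaply; large block-to-block swings yield large differences, which is exactly the cost discussed later for visual-sensitivity-adaptive quantization.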
The quantization controller 104 determines a quantization step size for the current input image block. Typically, the quantization controller 104 monitors the amount of output code of the variable length encoder 103, and increases the quantization step size so as to reduce the amount of output code for the image block or decreases the quantization step size so as to increase the amount of output code for the image block. The quantization step size is increased or decreased to enable the video encoding device to encode an input moving image with a desired amount of code. The determined quantization step size is supplied to the quantizer 102 and the variable length encoder 103.
The quantized coefficient image output from the quantizer 102 is inverse-quantized by the inverse quantizer 105 to a coefficient image, to be used for prediction in subsequent image block encoding. The coefficient image output from the inverse quantizer 105 is transformed back to the spatial domain by the inverse frequency transformer 106, as the prediction error image. The predicted image is added to the prediction error image, and the result is input to the frame memory 107 and the intra-frame predictor 108 as a reconstructed image.
The frame memory 107 stores reconstructed images of previously input and encoded image frames. The image frames stored in the frame memory 107 are referred to as “reference frames”.
The intra-frame predictor 108 generates a predicted image by referring to a reconstructed image of a previously encoded image block in the image frame being currently encoded.
The inter-frame predictor 109 generates a predicted image by referring to a reference frame supplied from the frame memory 107.
The prediction selector 110 compares the predicted image supplied from the intra-frame predictor 108 with the predicted image supplied from the inter-frame predictor 109, and selects and outputs the predicted image closer to the input image. The prediction selector 110 also outputs information (referred to as an “image prediction parameter”) about the prediction method employed by the intra-frame predictor 108 or the inter-frame predictor 109, and supplies it to the variable length encoder 103.
The typical video encoding device compression-encodes the input moving image to generate a bitstream, by the above-mentioned process.
The output bitstream is transmitted to a video decoding device. The video decoding device performs a decoding process on the bitstream, to restore a moving image. FIG. 20 shows an example of a structure of a typical video decoding device for decoding the bitstream output from the typical video encoding device to obtain the decoded video. The following describes the structure and operation of the typical video decoding device, with reference to FIG. 20.
The video decoding device shown in FIG. 20 includes a variable length decoder 201, an inverse quantizer 202, an inverse frequency transformer 203, a frame memory 204, an intra-frame predictor 205, an inter-frame predictor 206, a prediction selector 207, and a bitstream buffer 208.
The bitstream buffer 208 stores the input bitstream, and then outputs the bitstream to the variable length decoder 201. The transmission rate of the bitstream input to the video decoding device and the processing rate in the video decoding device are adjusted by the bitstream buffer 208.
The variable length decoder 201 variable-length-decodes the bitstream input from the bitstream buffer 208, to obtain a quantization step size for controlling quantization granularity, a quantized coefficient image, and an image prediction parameter. The quantization step size and the quantized coefficient image are supplied to the inverse quantizer 202. The image prediction parameter is supplied to the prediction selector 207.
The inverse quantizer 202 inverse-quantizes the input quantized coefficient image based on the input quantization step size, and outputs the result as a coefficient image.
The inverse frequency transformer 203 transforms the coefficient image supplied from the inverse quantizer 202 from the frequency domain to the spatial domain, and outputs the result as a prediction error image. The predicted image supplied from the prediction selector 207 is added to the prediction error image, to generate a decoded image. The decoded image is output from the video decoding device as the output image, and also input to the frame memory 204 and the intra-frame predictor 205.
The frame memory 204 stores previously decoded image frames. The image frames stored in the frame memory 204 are referred to as “reference frames”.
The intra-frame predictor 205 generates a predicted image by referring, based on the image prediction parameter supplied from the variable length decoder 201, to a reconstructed image of a previously decoded image block in the image frame being currently decoded.
The inter-frame predictor 206 generates a predicted image by referring, based on the image prediction parameter supplied from the variable length decoder 201, to a reference frame supplied from the frame memory 204.
The prediction selector 207 selects the predicted image supplied from the intra-frame predictor 205 or the predicted image supplied from the inter-frame predictor 206, based on the image prediction parameter supplied from the variable length decoder 201.
The quantization step size decoding process in the variable length decoder 201 is described below, with reference to FIG. 21. As shown in FIG. 21, a quantization step size decoder for decoding the quantization step size in the variable length decoder 201 includes an entropy decoder 20111 and a quantization step size buffer 20112.
The entropy decoder 20111 entropy-decodes the input code, and outputs a differential quantization step size dQ(i).
The quantization step size buffer 20112 holds the immediately previous quantization step size Q(i−1).
Q(i−1) supplied from the quantization step size buffer 20112 is added to the differential quantization step size dQ(i) generated by the entropy decoder 20111, as shown in the following equation (2). The sum is output as a quantization step size Q(i), and also input to the quantization step size buffer 20112.

Q(i) = Q(i−1) + dQ(i)   (2)
This completes the description of the quantization step size decoding process.
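The reconstruction of equation (2) mirrors the encoder side; a sketch, again with entropy decoding omitted and a hypothetical class name:

```python
class QStepDecoder:
    """Sketch of differential quantization step size decoding (equation (2)).
    Accumulates entropy-decoded differences back into absolute step sizes."""

    def __init__(self, initial_qstep: int = 0):
        self.prev = initial_qstep  # quantization step size buffer Q(i-1)

    def decode(self, dq: int) -> int:
        q = self.prev + dq  # Q(i) = Q(i-1) + dQ(i)
        self.prev = q       # the result also refills the buffer
        return q
```

Feeding the decoder the difference sequence produced on the encoder side reproduces the original step sizes exactly, since equations (1) and (2) are inverses of each other.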
The typical video decoding device decodes the input bitstream to generate the moving image, by the above-mentioned process.
Typically, the quantization controller 104 in the typical video encoding device not only analyzes the amount of output code but also analyzes one or both of the input image and the prediction error image to determine the quantization step size according to human visual sensitivity, in order to maintain the subjective quality of the moving image compressed by the encoding process. That is, the quantization controller 104 performs visual-sensitivity-adaptive quantization. Specifically, the quantization controller 104 sets a small quantization step size in the case where the human visual sensitivity to the current image to be encoded is determined to be high, and sets a large quantization step size in the case where the human visual sensitivity is determined to be low. Such control allows a larger amount of code to be assigned to an area that is high in visual sensitivity, thus improving the subjective image quality.
An example of a known visual-sensitivity-adaptive quantization technique is adaptive quantization based on the texture complexity of an input image, which is employed in Test Model 5 (TM5) of MPEG-2. The texture complexity is commonly called “activity”. Patent Literature (PTL) 1 proposes an adaptive quantization method in which the activity of the predicted image is used in addition to the activity of the input image. PTL 2 proposes an adaptive quantization method based on activity that takes edge parts into account.
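A simplified sketch of TM5-style activity-based modulation follows. It computes the activity of a 16×16 luminance macroblock from the variances of its 8×8 sub-blocks (TM5 itself considers both frame- and field-organized sub-blocks, which is omitted here), and scales the step size by the normalized activity; the function names are hypothetical:

```python
import numpy as np

def block_activity(mb: np.ndarray) -> float:
    """Simplified TM5-style activity: 1 plus the minimum variance
    over the four 8x8 sub-blocks of a 16x16 macroblock."""
    subs = [mb[i:i + 8, j:j + 8] for i in (0, 8) for j in (0, 8)]
    return 1.0 + min(float(s.var()) for s in subs)

def modulate_qstep(base_qstep: float, act: float, avg_act: float) -> float:
    """Scale the step size by normalized activity: flat, visually
    sensitive areas (act < avg_act) get a smaller step size, and
    busy areas (act > avg_act) get a larger one."""
    n_act = (2.0 * act + avg_act) / (act + 2.0 * avg_act)
    return base_qstep * n_act
```

Because activity varies from macroblock to macroblock across the frame, the resulting step size also varies from block to block, which is precisely the behavior at the root of the coding-cost problem described next.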
When using the visual-sensitivity-adaptive quantization technique, there is a problem that the quantization step size frequently varies within an image frame. The typical video encoding device for generating the AVC-compliant bitstream, upon encoding the quantization step size, entropy-encodes the difference from the quantization step size for the immediately previously encoded image block. Accordingly, if the variation of the quantization step size in the encoding order direction is large, the amount of code necessary to encode the quantization step size increases. This causes an increase in bitstream size, and an increase in memory size required to implement the bitstream buffer.
Since the encoding order direction is not related to the continuity of the visual sensitivity on the screen, the visual-sensitivity-adaptive quantization technique inevitably increases the amount of code necessary to encode the quantization step size. Therefore, the typical video encoding device has a problem that an increase in bitstream size and an increase in required memory size are inevitable in the case of using the visual-sensitivity-adaptive quantization technique in order to improve the subjective image quality.
In view of this problem, PTL 3 discloses a technique in which a dead zone, i.e. a range for quantizing to zero, is adaptively set according to visual sensitivity in the spatial domain and the frequency domain, instead of adaptively setting the quantization step size according to visual sensitivity. In the method described in PTL 3, a dead zone for a transform coefficient determined to be low in visual sensitivity is set larger than a dead zone for a transform coefficient determined to be high in visual sensitivity. Such control enables visual-sensitivity-adaptive quantization to be carried out without varying the quantization step size.
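The idea of suppressing coefficients through a dead zone while leaving the step size fixed can be sketched as follows; this is an illustrative simplification of the approach described in PTL 3, not its exact procedure, and the function name is hypothetical:

```python
import numpy as np

def quantize_with_deadzone(coeffs: np.ndarray,
                           qstep: float,
                           deadzone: float) -> np.ndarray:
    """Uniform quantization with an adaptive dead zone: coefficients
    whose magnitude falls inside the dead zone are quantized to zero.
    Widening the dead zone for low-sensitivity coefficients reduces
    code amount without varying qstep, so no differential step size
    needs to be encoded."""
    levels = np.round(coeffs / qstep).astype(np.int32)
    levels[np.abs(coeffs) < deadzone] = 0  # force dead-zone coefficients to zero
    return levels
```

With a wide dead zone, small coefficients in visually insensitive areas are dropped entirely, while the shared step size keeps the quantization step size code amount constant across the frame.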