Non Patent Literature (NPL) 1 discloses a typical video encoding system and a typical video decoding system.
A video encoding device described in NPL 1 has the structure shown in FIG. 15. Hereafter, the video encoding device shown in FIG. 15 is called a typical video encoding device.
Referring to FIG. 15, the structure and operation of the typical video encoding device, which receives each frame of digitized video as input and outputs a bitstream, are described below.
The video encoding device shown in FIG. 15 includes a transformer/quantizer 101, an entropy encoder 102, an inverse transformer/inverse quantizer 103, a buffer 104, a predictor 105, a multiplexer 106, and an encoding controller 108.
The video encoding device shown in FIG. 15 divides each frame into blocks of 16×16 pixels called macroblocks (MBs), and encodes the MBs sequentially in raster-scan order, starting from the top left of the frame.
FIG. 16 is an explanatory diagram showing an example of block division when the frame has a spatial resolution of QCIF (Quarter Common Intermediate Format). For simplicity, the following describes the operation of each unit considering only luminance pixel values.
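As a rough sketch of the division in FIG. 16, the Python snippet below lists the top-left pixel of each 16×16 MB in raster-scan order. It assumes the frame width and height are multiples of 16, which holds for QCIF (176×144 pixels); the function name `macroblock_origins` is hypothetical.

```python
MB_SIZE = 16  # macroblock width and height in luminance pixels


def macroblock_origins(width, height, mb_size=MB_SIZE):
    """Hypothetical helper: (x, y) top-left corner of every MB in raster-scan order.

    Assumes (for this sketch) that width and height are multiples of mb_size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("frame size must be a multiple of the MB size")
    # Outer loop over MB rows (top to bottom), inner loop over columns (left to right).
    return [(x, y)
            for y in range(0, height, mb_size)
            for x in range(0, width, mb_size)]


qcif = macroblock_origins(176, 144)
print(len(qcif))   # 99 MBs: 11 columns x 9 rows
print(qcif[:3])    # [(0, 0), (16, 0), (32, 0)]
print(qcif[11])    # (0, 16): first MB of the second row
```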
A prediction signal supplied from the predictor 105 is subtracted from the block-divided input video, and the result is input to the transformer/quantizer 101 as a prediction error image. There are two types of prediction signals, namely, an intra prediction signal and an inter prediction signal. The inter prediction signal is also called an inter-frame prediction signal.
Each of the prediction signals is described below. The intra prediction signal is generated from an image of a reconstructed picture, stored in the buffer 104, that has the same display time as the current picture.
As described in 8.3.1 Intra_4×4 prediction process for luma samples, 8.3.2 Intra_8×8 prediction process for luma samples, and 8.3.3 Intra_16×16 prediction process for luma samples of NPL 1, intra prediction is available in three block sizes: Intra_4×4, Intra_8×8, and Intra_16×16.
Intra_4×4 and Intra_8×8 are intra prediction with 4×4 and 8×8 block sizes, respectively, as shown in (a) and (c) of FIG. 17. Each circle (∘) in the drawing represents a reference pixel used for intra prediction, i.e., a pixel of the reconstructed picture having the same display time as the current picture.
In Intra_4×4 intra prediction, the reconstructed peripheral pixels are used directly as reference pixels and are extrapolated (padded) in the nine directions shown in (b) of FIG. 17 to form the prediction signal. In Intra_8×8 intra prediction, the peripheral pixels of the image of the reconstructed picture are first smoothed by the low-pass filter (1/4, 1/2, 1/4) shown under the right arrow in (c) of FIG. 17; the smoothed pixels are then used as reference pixels and extrapolated in the nine directions shown in (b) of FIG. 17 to form the prediction signal.
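A minimal sketch of the reference-pixel smoothing for Intra_8×8 follows. It assumes the three-tap [1, 2, 1]/4 low-pass filter with integer rounding that H.264/AVC applies to Intra_8×8 reference samples, and it replicates the edge pixel at both ends of the row. That edge handling is a simplification: NPL 1 also uses the top-left corner pixel when it is available. The name `smooth_reference` is hypothetical.

```python
def smooth_reference(row):
    """Hypothetical helper: [1, 2, 1]/4 low-pass filter over a row of reference pixels.

    Each output is (left + 2 * centre + right + 2) >> 2. Pixels beyond either end
    are replaced by the nearest edge pixel (an assumption of this sketch).
    """
    n = len(row)
    smoothed = []
    for i in range(n):
        left = row[i - 1] if i > 0 else row[0]
        right = row[i + 1] if i < n - 1 else row[n - 1]
        smoothed.append((left + 2 * row[i] + right + 2) >> 2)
    return smoothed


print(smooth_reference([0, 0, 100, 0, 0]))   # [0, 25, 50, 25, 0]
print(smooth_reference([10, 20, 30, 40]))    # [13, 20, 30, 38]
```

The isolated spike of 100 is spread over its neighbours, which is the point of smoothing before extrapolating across a larger 8×8 block.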
Similarly, Intra_16×16 is intra prediction with a 16×16 block size, as shown in (a) of FIG. 18. As in FIG. 17, each circle (∘) in the drawing represents a reference pixel used for intra prediction, i.e., a pixel of the reconstructed picture having the same display time as the current picture. In Intra_16×16 intra prediction, the peripheral pixels of the image of the reconstructed picture are used directly as reference pixels and are extrapolated in the four directions shown in (b) of FIG. 18 to form the prediction signal.
Hereafter, an MB and a block encoded using the intra prediction signal are called an intra MB and an intra block, respectively. The block size of intra prediction is called the intra prediction block size, and the direction of extrapolation is called the intra prediction direction. The intra prediction block size and the intra prediction direction are prediction parameters related to intra prediction.
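The following sketch forms an Intra_16×16 prediction signal for three of the four modes: vertical, horizontal, and DC. It assumes all 16 upper and 16 left reference pixels are available (NPL 1 defines fallbacks otherwise) and omits the plane mode; the function name `intra16x16_predict` is hypothetical.

```python
N = 16  # Intra_16x16 block size


def intra16x16_predict(mode, top, left):
    """Hypothetical helper: extrapolate a 16x16 prediction block from neighbours.

    top:  16 reconstructed pixels directly above the block (left to right)
    left: 16 reconstructed pixels directly to its left (top to bottom)
    """
    if len(top) != N or len(left) != N:
        raise ValueError("expected 16 top and 16 left reference pixels")
    if mode == "vertical":      # copy each top pixel down its column
        return [list(top) for _ in range(N)]
    if mode == "horizontal":    # copy each left pixel across its row
        return [[left[y]] * N for y in range(N)]
    if mode == "dc":            # rounded mean of all 32 neighbours
        dc = (sum(top) + sum(left) + 16) >> 5
        return [[dc] * N for _ in range(N)]
    raise ValueError("unsupported mode: " + mode)


top = list(range(100, 116))
left = [50] * N
print(intra16x16_predict("dc", top, left)[0][0])          # 79
print(intra16x16_predict("vertical", top, left)[15][3])   # 103
```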
The inter prediction signal is generated from an image of a reconstructed picture, stored in the buffer 104, whose display time differs from that of the current picture. Hereafter, an MB and a block encoded using the inter prediction signal are called an inter MB and an inter block, respectively. The block size of inter prediction (inter prediction block size) can be selected from, for example, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4.
FIG. 19 is an explanatory diagram showing an example of inter prediction with a 16×16 block size. The motion vector MV = (mvx, mvy) shown in FIG. 19 is a prediction parameter of inter prediction that indicates the amount of translation of the inter prediction block (inter prediction signal) in the reference picture relative to the block to be encoded. In AVC, the prediction parameters of inter prediction include not only the inter prediction direction, which indicates whether the reference picture of the inter prediction signal lies in the past or the future relative to the picture containing the block to be encoded, but also a reference picture index that identifies the reference picture used for inter prediction of that block. This is because AVC allows multiple reference pictures stored in the buffer 104 to be used for inter prediction.
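To illustrate how the motion vector and the reference picture index select the inter prediction signal, the sketch below copies a block from the indexed reference picture, displaced by MV. It handles only integer-pixel motion vectors and clamps coordinates that fall outside the reference picture to its border, which mirrors how H.264/AVC treats out-of-picture reference pixels. Pictures are plain lists of pixel rows, and all names (`motion_compensate`, `ref_pics`, `ramp`) are hypothetical.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def motion_compensate(ref_pics, ref_idx, mv, x0, y0, width=16, height=16):
    """Hypothetical helper: inter prediction block for the block at (x0, y0).

    ref_pics: list of reconstructed pictures (each a list of pixel rows)
    ref_idx:  reference picture index selecting one entry of ref_pics
    mv:       integer motion vector (mvx, mvy)
    """
    ref = ref_pics[ref_idx]
    pic_h, pic_w = len(ref), len(ref[0])
    mvx, mvy = mv
    # Positions outside the picture are clamped to the nearest border pixel.
    return [[ref[clamp(y0 + j + mvy, 0, pic_h - 1)][clamp(x0 + i + mvx, 0, pic_w - 1)]
             for i in range(width)]
            for j in range(height)]


# Hypothetical 32x32 test picture whose pixel value encodes its position.
ramp = [[x + 100 * y for x in range(32)] for y in range(32)]
pred = motion_compensate([ramp], 0, (3, 2), 0, 0, width=4, height=4)
print(pred[0])   # [203, 204, 205, 206]
```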
In AVC inter prediction, a motion vector can be specified with 1/4-pixel accuracy. FIG. 20 is an explanatory diagram showing interpolation processing for luminance signals in motion-compensated prediction. In FIG. 20, A represents a pixel signal at an integer pixel position, b, c, and d represent pixel signals at fractional pixel positions with 1/2-pixel accuracy, and e1, e2, and e3 represent pixel signals at fractional pixel positions with 1/4-pixel accuracy. The pixel signal b is generated by applying a six-tap filter to pixels at horizontally adjacent integer pixel positions. Likewise, the pixel signal c is generated by applying the six-tap filter to pixels at vertically adjacent integer pixel positions. The pixel signal d is generated by applying the six-tap filter to pixels at horizontally or vertically adjacent fractional pixel positions with 1/2-pixel accuracy. The coefficients of the six-tap filter are [1, −5, 20, 20, −5, 1]/32. The pixel signals e1, e2, and e3 are each generated by applying a two-tap filter [1, 1]/2 to pixels at neighboring integer or fractional pixel positions.
A picture encoded using only intra MBs is called an I picture. A picture encoded using inter MBs as well as intra MBs is called a P picture. A picture that may also include inter MBs using two reference pictures simultaneously for inter prediction, rather than only one, is called a B picture. In a B picture, inter prediction whose reference picture lies in the past relative to the picture to be encoded is called forward prediction, inter prediction whose reference picture lies in the future is called backward prediction, and inter prediction that simultaneously uses two reference pictures, one in the past and one in the future, is called bidirectional prediction. The direction of inter prediction (inter prediction direction) is a prediction parameter of inter prediction.
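The half-pel and quarter-pel filters can be sketched as follows for 8-bit luminance. The half-pel value is computed with the six-tap filter, rounded with `(acc + 16) >> 5`, and clipped to [0, 255]; the quarter-pel value is the rounded average `(p + q + 1) >> 1` of two neighbouring samples. These rounding rules follow H.264/AVC. The diagonal half-pel position d, which AVC derives from unrounded intermediate values, is not covered, and the function names are hypothetical.

```python
SIX_TAP = (1, -5, 20, 20, -5, 1)  # coefficients sum to 32


def clip_pixel(value, bit_depth=8):
    """Clip to the valid pixel range [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def half_pel(six_pixels):
    """Hypothetical helper: half-pel sample between the 3rd and 4th of six integer pixels."""
    acc = sum(tap * p for tap, p in zip(SIX_TAP, six_pixels))
    return clip_pixel((acc + 16) >> 5)   # divide by 32 with rounding, then clip


def quarter_pel(p, q):
    """Hypothetical helper: quarter-pel sample as the rounded average of two neighbours."""
    return (p + q + 1) >> 1


row = [0, 10, 20, 30, 40, 50]
b = half_pel(row)             # 25, midway between 20 and 30
e = quarter_pel(row[2], b)    # 23, between the integer pixel 20 and b
print(b, e)
```

The clipping matters because the negative taps can overshoot: a sharp edge such as `[0, 0, 255, 255, 0, 0]` produces an unclipped value above 255.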
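The definitions of forward, backward, and bidirectional prediction can be expressed as a small classifier over display times, as a sketch. It assumes each picture's display time is available as a number (for example, a display-order count) and returns a placeholder label for reference combinations that the definitions above do not name, such as two past pictures. The function name `classify_inter_direction` is hypothetical.

```python
def classify_inter_direction(current_time, ref_times):
    """Hypothetical helper: label inter prediction as forward, backward, or bidirectional.

    current_time: display time of the picture being encoded
    ref_times:    display times of the one or two reference pictures used
    """
    if current_time in ref_times:
        raise ValueError("reference pictures must differ in display time")
    past = sum(1 for t in ref_times if t < current_time)
    future = len(ref_times) - past
    if len(ref_times) == 1:
        return "forward" if past else "backward"
    if len(ref_times) == 2 and past == 1 and future == 1:
        return "bidirectional"
    return "unclassified"   # e.g. two past references; not named in the text


print(classify_inter_direction(4, [2]))      # forward
print(classify_inter_direction(4, [6]))      # backward
print(classify_inter_direction(4, [2, 6]))   # bidirectional
```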
In accordance with an instruction from the encoding controller 108, the predictor 105 compares an input video signal with a prediction signal to determine a prediction parameter that minimizes the energy of a prediction error image block. The encoding controller 108 supplies the determined prediction parameter to the entropy encoder 102.
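The parameter selection can be sketched as picking, among candidate prediction signals, the one whose prediction error block has the smallest energy, measured here as the sum of squared differences. The candidate labels and function names are hypothetical, and practical encoders often add an estimate of the coded bit cost to this criterion (rate-distortion optimization), which the sketch leaves out.

```python
def error_energy(block, prediction):
    """Sum of squared differences between an input block and a prediction block."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))


def choose_prediction(block, candidates):
    """Hypothetical helper: (parameter, prediction) pair with the lowest error energy.

    candidates maps a prediction-parameter label to its prediction block.
    """
    return min(candidates.items(), key=lambda item: error_energy(block, item[1]))


block = [[10, 12], [11, 13]]
candidates = {  # hypothetical candidate labels
    "intra_dc": [[11, 11], [11, 11]],          # energy 1 + 1 + 0 + 4 = 6
    "inter_mv(1,0)": [[10, 12], [11, 12]],     # energy 0 + 0 + 0 + 1 = 1
}
best_param, best_pred = choose_prediction(block, candidates)
print(best_param)   # inter_mv(1,0)
```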
The transformer/quantizer 101 frequency-transforms the image from which the prediction signal has been subtracted (the prediction error image) to obtain frequency transform coefficients.
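As one concrete instance of the frequency transform, the sketch below applies the 4×4 integer core transform of H.264/AVC, Y = Cf·X·Cfᵀ, to a block of prediction error values. The post-scaling that AVC folds into quantization is omitted, so the outputs are unnormalized; the helper names are hypothetical.

```python
# Forward core transform matrix of the H.264/AVC 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(residual):
    """Hypothetical helper: Cf * X * Cf^T for a 4x4 residual block X (no scaling)."""
    return matmul(matmul(CF, residual), transpose(CF))


flat = [[3] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])   # 48: a flat block has only a DC term

ramp = [[0, 1, 2, 3] for _ in range(4)]     # varies horizontally only
print(forward_core_transform(ramp)[0])      # [24, -28, 0, -4]
```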
The transformer/quantizer 101 further quantizes the frequency transform coefficient with a predetermined quantization step width Qs. Hereafter, the quantized frequency transform coefficient is called a transform quantization value.
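A uniform scalar quantizer with step width Qs is sketched below. The rounding offset of 0.5 (round to nearest, symmetric about zero) is an assumption; AVC itself realizes the step width through a quantization parameter and integer scaling tables, which the sketch does not reproduce. The function name is hypothetical.

```python
import math


def quantize(coeffs, qs, rounding=0.5):
    """Hypothetical helper: frequency transform coefficients -> transform quantization values.

    level = sign(c) * floor(|c| / qs + rounding), applied element-wise to each row.
    """
    levels = []
    for row in coeffs:
        out = []
        for c in row:
            magnitude = math.floor(abs(c) / qs + rounding)
            out.append(-magnitude if c < 0 else magnitude)
        levels.append(out)
    return levels


print(quantize([[48, -28, 4, -4]], qs=10))   # [[5, -3, 0, 0]]
```

Small coefficients collapse to zero and the rest lose precision (48 comes back as 5 × 10 = 50 after inverse quantization), which is where the coding loss of the scheme arises.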
The entropy encoder 102 entropy-encodes the prediction parameters and the transform quantization values. The prediction parameters are information associated with MB and block prediction, such as the prediction mode (intra prediction or inter prediction), intra prediction block size, intra prediction direction, inter prediction block size, and motion vector described above.
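AVC provides several entropy coding tools (CAVLC, CABAC, and Exp-Golomb codes). The sketch below shows only the Exp-Golomb codes ue(v) and se(v), which AVC uses for many header-level syntax elements and, with CAVLC, for motion vector differences. Codewords are returned as '0'/'1' strings for readability, and the function names are hypothetical.

```python
def ue(value):
    """Hypothetical helper: unsigned Exp-Golomb codeword for a non-negative integer.

    The codeword is (len - 1) leading zeros followed by value + 1 in binary.
    """
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(value + 1)[2:]   # binary digits of value + 1, without the '0b' prefix
    return "0" * (len(info) - 1) + info


def se(value):
    """Hypothetical helper: signed Exp-Golomb, mapping k > 0 to 2k - 1 and k <= 0 to -2k."""
    return ue(2 * value - 1 if value > 0 else -2 * value)


print([ue(v) for v in range(4)])   # ['1', '010', '011', '00100']
print(se(1), se(-1), se(0))        # 010 011 1
```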
The inverse transformer/inverse quantizer 103 inverse-quantizes the transform quantization values with the predetermined quantization step width Qs, and then performs an inverse frequency transform on the frequency transform coefficients obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the buffer 104 as a reconstructed image.
The buffer 104 stores the supplied reconstructed image. The reconstructed image for one frame is called a reconstructed picture.
The multiplexer 106 multiplexes the output data of the entropy encoder 102 and coding parameters, and outputs the result.
Through the operation described above, the multiplexer 106 of the typical video encoding device generates a bitstream.
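The local decoding step can be sketched as follows: transform quantization values are scaled back by Qs, and the reconstructed prediction error is added to the prediction signal and clipped to the 8-bit pixel range. To keep the example short, the inverse frequency transform is not shown; the hypothetical `reconstruct` helper assumes its residual input has already been inverse-transformed. Because the decoder performs the same steps, the encoder's buffer ends up holding the same reconstructed pictures the decoder will later use as references.

```python
def dequantize(levels, qs):
    """Hypothetical helper: inverse quantization, scaling each value by Qs."""
    return [[level * qs for level in row] for row in levels]


def reconstruct(prediction, residual, bit_depth=8):
    """Hypothetical helper: add the reconstructed prediction error, then clip.

    Assumes residual has already been inverse frequency transformed.
    """
    max_val = (1 << bit_depth) - 1
    return [[max(0, min(max_val, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


print(dequantize([[5, -3]], qs=10))            # [[50, -30]]
print(reconstruct([[250, 20]], [[10, -30]]))   # [[255, 0]]
```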
A video decoding device described in NPL 1 has the structure shown in FIG. 21. Hereafter, the video decoding device shown in FIG. 21 is called a typical video decoding device.
Referring to FIG. 21, the structure and operation of the typical video decoding device, which receives the bitstream as input and outputs decoded video frames, are described below.
The video decoding device shown in FIG. 21 includes a de-multiplexer 201, an entropy decoder 202, an inverse transformer/inverse quantizer 203, a predictor 204, and a buffer 205.
The de-multiplexer 201 de-multiplexes the input bitstream and extracts an entropy-encoded video bitstream.
The entropy decoder 202 entropy-decodes the video bitstream to obtain the MB and block prediction parameters and the transform quantization values. It supplies the transform quantization values to the inverse transformer/inverse quantizer 203 and the prediction parameters to the predictor 204.
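For the Exp-Golomb codes ue(v) and se(v) used for many AVC syntax elements, entropy decoding amounts to counting leading zeros and then reading that many bits plus one. The parser below works on a '0'/'1' string, covers only these two codes (not CAVLC or CABAC), and uses hypothetical function names.

```python
def read_ue(bits, pos=0):
    """Hypothetical helper: decode one ue(v) codeword at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated Exp-Golomb codeword")
    # bits[pos + zeros:end] holds value + 1 in binary (it starts with '1').
    return int(bits[pos + zeros:end], 2) - 1, end


def read_se(bits, pos=0):
    """Hypothetical helper: decode one se(v) codeword (odd codes positive, even non-positive)."""
    k, pos = read_ue(bits, pos)
    return ((k + 1) // 2 if k % 2 else -(k // 2)), pos


stream = "00100" + "011"        # ue(3) followed by se(-1)
value, pos = read_ue(stream)
mvd, pos = read_se(stream, pos)
print(value, mvd, pos)          # 3 -1 8
```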
The inverse transformer/inverse quantizer 203 inverse-quantizes the transform quantization value with the quantization step width. The inverse transformer/inverse quantizer 203 further performs inverse frequency transform of the frequency transform coefficient obtained by the inverse quantization.
After the inverse frequency transform, the predictor 204 generates a prediction signal from an image of a reconstructed picture stored in the buffer 205, based on the entropy-decoded MB and block prediction parameters.
The prediction signal supplied from the predictor 204 is then added to the reconstructed prediction error image obtained by the inverse frequency transform in the inverse transformer/inverse quantizer 203, and the result is supplied to the buffer 205 as a reconstructed image.
Then, the reconstructed picture stored in the buffer 205 is output as a decoded image (decoded video).
Through the operation described above, the typical video decoding device generates the decoded image.