Patent Literature (PTL) 1 proposes a video encoding method for embedding, in an output bitstream, information indicating a block type that is not subjected to a transform process and an entropy encoding process, in order to guarantee a certain processing time for a video encoding device or a video decoding device.
An example of the block type that is not subjected to the transform process and the entropy encoding process is pulse code modulation (PCM) described in Non Patent Literature (NPL) 1. The term "block type" means the encoding type (intra prediction, inter prediction, or PCM, each described below) used for a block.
A video encoding device described in NPL 1 has a structure shown in FIG. 14. The video encoding device shown in FIG. 14 is hereafter referred to as a typical video encoding device.
A structure and an operation of the typical video encoding device that receives each frame of digitized video as input and outputs a bitstream are described below, with reference to FIG. 14.
The video encoding device shown in FIG. 14 includes a transformer/quantizer 102, an entropy encoder 103, an inverse transformer/inverse quantizer 104, a buffer 105, a predictor 106, a PCM encoder 107, a PCM decoder 108, a multiplex data selector 109, a multiplexer 110, a switch 121, and a switch 122.
The video encoding device shown in FIG. 14 divides each frame into blocks of 16×16 pixel size called macroblocks (MBs), and encodes each MB sequentially from the top left of the frame. In AVC described in NPL 1, each MB is further divided into blocks of 4×4 pixel size, and each block of 4×4 pixel size is encoded.
FIG. 15 is an explanatory diagram showing an example of block division in the case where the frame has a spatial resolution of QCIF (Quarter Common Intermediate Format). The following describes an operation of each unit while focusing only on pixel values of luminance, for simplicity's sake.
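As a rough illustration of the block division described above, the following sketch enumerates the MBs of a frame sequentially from the top left, and the blocks of 4×4 pixel size within one MB. It assumes the QCIF luminance resolution of 176×144 pixels. The function names are hypothetical, and the simple row-by-row order of the 4×4 blocks inside an MB is an assumption of this sketch; the actual block order is defined in NPL 1.

```python
MB_SIZE = 16      # macroblock edge length in pixels
BLOCK_SIZE = 4    # edge length of the blocks inside an MB

def macroblock_origins(width, height):
    """Yield (x, y) of each MB's top-left pixel, sequentially from the top left."""
    for y in range(0, height, MB_SIZE):
        for x in range(0, width, MB_SIZE):
            yield (x, y)

def block_origins(mb_x, mb_y):
    """Yield (x, y) of each 4x4 block inside one MB (row by row, an assumption)."""
    for dy in range(0, MB_SIZE, BLOCK_SIZE):
        for dx in range(0, MB_SIZE, BLOCK_SIZE):
            yield (mb_x + dx, mb_y + dy)

# Assumed QCIF luminance resolution: 176 x 144 pixels.
mbs = list(macroblock_origins(176, 144))
blocks = list(block_origins(*mbs[0]))
```

Under these assumptions, one QCIF frame contains 11×9 = 99 MBs, and each MB contains 16 blocks of 4×4 pixel size.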
A prediction signal supplied from the predictor 106 is subtracted from the block-divided input video, and the result is input to the transformer/quantizer 102. There are two types of prediction signal, namely, an intra prediction signal and an inter-frame prediction signal. Each of the prediction signals is described below.
The intra prediction signal is a prediction signal generated based on an image of a reconstructed picture that has the same display time as a current picture and is stored in the buffer 105. According to 8.3.1 Intra_4×4 prediction process for luma samples, 8.3.2 Intra_8×8 prediction process for luma samples, and 8.3.3 Intra_16×16 prediction process for luma samples in NPL 1, intra prediction is available in three block sizes, i.e. Intra_4×4, Intra_8×8, and Intra_16×16.
Intra_4×4 and Intra_8×8 are respectively intra prediction of 4×4 block size and 8×8 block size, as can be understood from (a) and (c) in FIG. 16. Each circle (o) in the drawing represents a reference pixel used for intra prediction, i.e. a pixel of the reconstructed picture having the same display time as the current picture.
In intra prediction of Intra_4×4, reconstructed peripheral pixels are directly set as reference pixels, and used for padding (extrapolation) in nine directions shown in (b) in FIG. 16 to form the prediction signal. In intra prediction of Intra_8×8, pixels obtained by smoothing peripheral pixels of the image of the reconstructed picture by low-pass filters (1/4, 1/2, 1/4) shown under the right arrow in (c) in FIG. 16 are set as reference pixels, and used for extrapolation in the nine directions shown in (b) in FIG. 16 to form the prediction signal.
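The padding (extrapolation) of Intra_4×4 can be sketched as follows for three of the nine modes: vertical, horizontal, and the mean-value (DC) mode. The function name, the mode names, and the sample pixel values are hypothetical and chosen only for illustration; the remaining diagonal modes and the handling of unavailable reference pixels are omitted.

```python
def intra4x4_predict(top, left, mode):
    """Return a 4x4 prediction block as a list of rows.

    top  -- 4 reconstructed reference pixels directly above the block
    left -- 4 reconstructed reference pixels directly to the left of the block
    mode -- "vertical", "horizontal", or "dc" (hypothetical mode names)
    """
    if mode == "vertical":
        # Each column repeats the reference pixel above it.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the reference pixel to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Every pixel takes the rounded mean of the eight reference pixels.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)

# Hypothetical reference pixel values.
top = [10, 20, 30, 40]
left = [50, 60, 70, 80]
pred_v = intra4x4_predict(top, left, "vertical")
pred_h = intra4x4_predict(top, left, "horizontal")
pred_dc = intra4x4_predict(top, left, "dc")
```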
Similarly, Intra_16×16 is intra prediction of 16×16 block size, as can be understood from (a) in FIG. 17. Each circle (o) in the drawing represents a reference pixel used for intra prediction, i.e. a pixel of the reconstructed picture having the same display time as the current picture, as in FIG. 16. In intra prediction of Intra_16×16, peripheral pixels of the reconstructed image are directly set as reference pixels, and used for extrapolation in four directions shown in (b) in FIG. 17 to form the prediction signal.
Hereafter, an MB and a block encoded using the intra prediction signal are respectively referred to as an intra MB and an intra block, a block size of intra prediction is referred to as an intra prediction block size, and a direction of extrapolation is referred to as an intra prediction direction. The intra prediction block size and the intra prediction direction are prediction parameters related to intra prediction.
The inter-frame prediction signal is a prediction signal generated from an image of a reconstructed picture that differs in display time from the current picture and is stored in the buffer 105. Hereafter, an MB and a block encoded using the inter-frame prediction signal are respectively referred to as an inter MB and an inter block. A block size of inter prediction (inter prediction block size) can be selected from, for example, 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4.
FIG. 18 is an explanatory diagram showing an example of inter-frame prediction using 16×16 block size. A motion vector MV=(mvx, mvy) shown in FIG. 18 is a prediction parameter of inter-frame prediction, which indicates the amount of parallel translation of an inter-frame prediction block (inter-frame prediction signal) of a reference picture relative to a block to be encoded. In AVC, the prediction parameters of inter-frame prediction include not only an inter-frame prediction direction, which represents the direction of the reference picture of the inter-frame prediction signal relative to the picture containing the block to be encoded, but also a reference picture index, which identifies the reference picture used for inter-frame prediction of the block to be encoded. The reference picture index is needed because, in AVC, a plurality of reference pictures stored in the buffer 105 can be used for inter-frame prediction.
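The role of the motion vector MV=(mvx, mvy) can be sketched as follows. This is a simplified illustration that assumes whole-pixel motion vectors and a single reference picture; fractional-pixel interpolation and the reference picture index are omitted. The function name and the clamping of coordinates at the picture edge are assumptions of this sketch.

```python
def inter_predict(ref, x0, y0, mv, size=16):
    """Return the size x size prediction block for the block at (x0, y0).

    ref -- reference picture as a list of rows of luminance values
    mv  -- motion vector (mvx, mvy) in whole pixels (an assumption)
    """
    height, width = len(ref), len(ref[0])
    mvx, mvy = mv
    block = []
    for dy in range(size):
        # Clamp so that positions outside the picture reuse the edge pixels.
        y = min(max(y0 + dy + mvy, 0), height - 1)
        row = []
        for dx in range(size):
            x = min(max(x0 + dx + mvx, 0), width - 1)
            row.append(ref[y][x])
        block.append(row)
    return block

# Synthetic 32x32 reference picture whose pixel value encodes its position.
ref = [[y * 32 + x for x in range(32)] for y in range(32)]
pred = inter_predict(ref, 8, 8, (3, -2))
```

Here the prediction block for the block at (8, 8) is the 16×16 region of the reference picture whose top-left pixel is at (11, 6), i.e. the block position translated by the motion vector.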
Inter-frame prediction is described in more detail in 8.4 Inter prediction process in NPL 1.
A picture encoded using only intra MBs is called an I picture. A picture encoded using inter MBs as well as intra MBs is called a P picture. A picture encoded using inter MBs that use not only one reference picture but two reference pictures simultaneously for inter-frame prediction is called a B picture. In the B picture, inter-frame prediction in which the reference picture of the inter-frame prediction signal lies in the past relative to the picture containing the block to be encoded is called forward prediction; inter-frame prediction in which the reference picture lies in the future relative to that picture is called backward prediction; and inter-frame prediction involving both the past and the future is called bidirectional prediction. The direction of inter-frame prediction (inter prediction direction) is a prediction parameter of inter-frame prediction.
The transformer/quantizer 102 frequency-transforms the image (prediction error image) from which the prediction signal has been subtracted.
The transformer/quantizer 102 further quantizes the frequency-transformed prediction error image (frequency transform coefficient), with a predetermined quantization step width Qs. Hereafter, the quantized frequency transform coefficient is referred to as a transform quantization value.
The entropy encoder 103 entropy-encodes the prediction parameters and the transform quantization value. The prediction parameters are information related to MB and block prediction, such as block type (intra prediction, inter prediction, and PCM), intra prediction block size, intra prediction direction, inter prediction block size, and motion vector mentioned above.
The inverse transformer/inverse quantizer 104 inverse-quantizes the transform quantization value, with the quantization step width Qs. The inverse transformer/inverse quantizer 104 further inverse-frequency-transforms the frequency transform coefficient obtained by the inverse quantization. The prediction signal is added to the reconstructed prediction error image obtained by the inverse frequency transform, and the result is supplied to the switch 122.
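The path through the transformer/quantizer 102 and the inverse transformer/inverse quantizer 104 can be sketched as follows for one block of 4×4 pixel size. As an assumption of this sketch, a floating-point orthonormal DCT stands in for the frequency transform of NPL 1, and quantization is a plain division by the quantization step width Qs followed by rounding. The function names and the sample pixel values are hypothetical.

```python
import math

N = 4
# Orthonormal DCT-II basis: a floating-point stand-in (assumption)
# for the frequency transform described in NPL 1.
DCT = [[math.sqrt((1 if k == 0 else 2) / N) *
        math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for n in range(N)] for k in range(N)]

def matmul(a, b):
    """Multiply two N x N matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def transform_quantize(residual, qs):
    """Frequency-transform a prediction error block and quantize it with Qs."""
    coef = matmul(matmul(DCT, residual), transpose(DCT))
    return [[int(round(c / qs)) for c in row] for row in coef]

def reconstruct(levels, qs, prediction):
    """Inverse-quantize, inverse-transform, and add the prediction signal."""
    coef = [[lv * qs for lv in row] for row in levels]
    error = matmul(matmul(transpose(DCT), coef), DCT)
    # Clip the result to the 8-bit pixel range (assumed bit depth).
    return [[min(255, max(0, int(round(e + p)))) for e, p in zip(er, pr)]
            for er, pr in zip(error, prediction)]

# Hypothetical input block and prediction signal.
original = [[52, 55, 61, 66], [63, 59, 55, 90],
            [62, 59, 68, 113], [63, 58, 71, 122]]
prediction = [[60] * 4 for _ in range(4)]
residual = [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]
levels = transform_quantize(residual, qs=8)   # transform quantization values
recon = reconstruct(levels, 8, prediction)    # reconstructed image block
```

Because the encoder reconstructs the block from the transform quantization values rather than from the input video, the image stored in the buffer 105 matches what a decoder would obtain from the same values. The reconstruction differs from the input by the quantization error, which grows with Qs.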
The multiplex data selector 109 monitors the amount of input data of the entropy encoder 103 corresponding to the MB to be encoded. In the case where the entropy encoder 103 is capable of entropy-encoding the input data within a processing time of the MB, the multiplex data selector 109 selects the output data of the entropy encoder 103, and causes the selected data to be supplied to the multiplexer 110 via the switch 121. The multiplex data selector 109 further selects the output data of the inverse transformer/inverse quantizer 104, and causes the selected data to be supplied to the buffer 105 via the switch 122.
In the case where the entropy encoder 103 is not capable of entropy-encoding the input data within the processing time of the MB, the multiplex data selector 109 selects the output data of the PCM encoder 107 obtained by PCM encoding the video of the MB, and causes the selected data to be supplied to the multiplexer 110 via the switch 121. The multiplex data selector 109 further selects the output data of the PCM decoder 108 obtained by PCM decoding the output data of the PCM encoder 107, and causes the selected data to be supplied to the buffer 105 via the switch 122.
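The selection performed by the multiplex data selector 109 can be sketched as follows. The threshold max_amount, which stands for the largest amount of input data that the entropy encoder 103 can process within the processing time of one MB, is a hypothetical parameter, as are the function names. The sketch further assumes 8-bit luminance pixels only, so that PCM encoding reduces to writing the raw pixel values.

```python
def pcm_encode(mb_pixels):
    """PCM encoding: emit raw 8-bit pixel values, with no transform or entropy coding."""
    return bytes(p for row in mb_pixels for p in row)

def pcm_decode(data, size=16):
    """PCM decoding: rebuild the rows of the MB from the raw bytes."""
    return [list(data[r * size:(r + 1) * size]) for r in range(size)]

def select_mb_output(input_amount, max_amount, entropy_data, recon_mb, mb_pixels):
    """Return (data for the multiplexer 110, reconstructed MB for the buffer 105).

    input_amount -- amount of input data of the entropy encoder for this MB
    max_amount   -- hypothetical limit that can be entropy-encoded in time
    entropy_data -- output data of the entropy encoder for this MB
    recon_mb     -- output of the inverse transformer/inverse quantizer
    mb_pixels    -- input video of the MB (16 rows of 16 pixel values)
    """
    if input_amount <= max_amount:
        # The switches 121 and 122 pass the entropy-encoded path.
        return entropy_data, recon_mb
    # Otherwise both switches pass the PCM path.
    pcm_data = pcm_encode(mb_pixels)
    return pcm_data, pcm_decode(pcm_data)

# Hypothetical MB whose input data amount exceeds the limit.
mb = [[(x + y) % 256 for x in range(16)] for y in range(16)]
data, recon = select_mb_output(5000, 3000, b"\x00", mb, mb)
```

In this example the PCM path is selected, so the data supplied to the multiplexer has a fixed size of 256 bytes for the luminance of one MB, and the reconstructed MB is identical to the input.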
The buffer 105 stores the reconstructed image supplied via the switch 122. The reconstructed image per frame is referred to as a reconstructed picture.
The multiplexer 110 multiplexes the output data of the entropy encoder 103 and the PCM encoder 107, and outputs the multiplexing result.
Based on the operation described above, the multiplexer 110 in the video encoding device generates the bitstream.