As a video coding method using intra prediction, inter prediction, and residual transform, HEVC (High Efficiency Video Coding) has been proposed (see Non-patent document 1, for example).
[Configuration and Operation of Video Encoding Apparatus MM]
FIG. 11 is a block diagram showing a video encoding apparatus MM according to a conventional example configured to encode a video using the aforementioned video coding method. The video encoding apparatus MM includes an inter prediction unit 10, an intra prediction unit 20, a transform/quantization unit 30, an entropy encoding unit 40, an inverse quantization/inverse transform unit 50, an in-loop filtering unit 60, a first buffer unit 70, and a second buffer unit 80.
The inter prediction unit 10 receives, as its input data, an input video a and a local decoded image g supplied from the first buffer unit 70 as described later. The inter prediction unit 10 performs inter prediction (inter-frame prediction) based on the input video a and the local decoded image g so as to generate and output an inter predicted image b.
The intra prediction unit 20 receives, as its input data, the input video a and a local decoded image f supplied from the second buffer unit 80 as described later. The intra prediction unit 20 performs intra prediction (intra-frame prediction) based on the input video a and the local decoded image f so as to generate and output an intra predicted image c.
The transform/quantization unit 30 receives, as its input data, the input video a and an error (residual) signal which represents a difference between the input video a and the inter predicted image b or otherwise the intra predicted image c. The transform/quantization unit 30 transforms and quantizes the residual signal thus input so as to generate and output a quantized coefficient d.
The entropy encoding unit 40 receives, as its input data, the quantized coefficient d and unshown side information. The entropy encoding unit 40 performs entropy encoding of the input signal, and outputs the signal thus entropy encoded as a bit stream z.
The inverse quantization/inverse transform unit 50 receives the quantized coefficient d as its input data. The inverse quantization/inverse transform unit 50 performs inverse quantization processing and inverse transform processing on the quantized coefficient d so as to generate and output a residual signal e thus inverse transformed.
The second buffer unit 80 stores the local decoded image f, and supplies the local decoded image f thus stored to the intra prediction unit 20 and the in-loop filtering unit 60 at an appropriate timing. The local decoded image f is a signal obtained by summing the residual signal e thus inverse transformed and either the inter predicted image b or the intra predicted image c.
The in-loop filtering unit 60 receives the local decoded image f as its input data. The in-loop filtering unit 60 applies filtering such as deblock filtering or the like to the local decoded image f so as to generate and output a local decoded image g.
The first buffer unit 70 stores the local decoded image g, and supplies the local decoded image g thus stored to the inter prediction unit 10 at an appropriate timing.
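The local decoding loop described above can be illustrated with a minimal Python sketch. It is a toy model, not the HEVC process: the transform is omitted, a uniform scalar quantizer with a hypothetical step size stands in for the transform/quantization unit 30, and the function names are assumptions made for illustration. It shows only that the local decoded image f is formed from the prediction and the inverse-quantized residual e, so that the encoder and a decoder holding the same prediction reconstruct identical pixels.

```python
def quantize(residual, step):
    # Stand-in for the transform/quantization unit 30 (transform omitted).
    return [int(round(r / step)) for r in residual]


def dequantize(levels, step):
    # Stand-in for the inverse quantization/inverse transform unit 50.
    return [level * step for level in levels]


def encode_block(block, prediction, step):
    # Residual: difference between the input and the predicted image (b or c).
    residual = [a - p for a, p in zip(block, prediction)]
    d = quantize(residual, step)            # quantized coefficient d
    e = dequantize(d, step)                 # reconstructed residual e
    f = [p + r for p, r in zip(prediction, e)]  # local decoded image f
    return d, f


def decode_block(d, prediction, step):
    # A decoder with the same prediction rebuilds the same pixels.
    e = dequantize(d, step)
    return [p + r for p, r in zip(prediction, e)]


block = [100, 104, 99, 97]
prediction = [100, 100, 100, 100]
d, f = encode_block(block, prediction, step=4)
print(d, f, decode_block(d, prediction, step=4) == f)
```

Because the prediction is formed from f (or its filtered version g) rather than from the input video a, quantization error does not accumulate between the encoder and the decoder.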
[Configuration and Operation of Video Decoding Apparatus NN]
FIG. 12 is a block diagram showing a video decoding apparatus NN according to a conventional example, configured to decode a video based on the bit stream z generated by the video encoding apparatus MM. The video decoding apparatus NN comprises an entropy decoding unit 110, an inverse transform/inverse quantization unit 120, an inter prediction unit 130, an intra prediction unit 140, an in-loop filtering unit 150, a first buffer unit 160, and a second buffer unit 170.
The entropy decoding unit 110 receives the bit stream z as its input data. The entropy decoding unit 110 performs entropy decoding of the bit stream z so as to generate and output a quantized coefficient B.
The inverse transform/inverse quantization unit 120, the inter prediction unit 130, the intra prediction unit 140, the in-loop filtering unit 150, the first buffer unit 160, and the second buffer unit 170 respectively operate in the same manner as the inverse quantization/inverse transform unit 50, the inter prediction unit 10, the intra prediction unit 20, the in-loop filtering unit 60, the first buffer unit 70, and the second buffer unit 80.
[Detailed Description of Intra Prediction]
Detailed description will be made below regarding the aforementioned intra prediction. Non-patent document 1 describes intra prediction in which, for each color component, each pixel value of an encoding target block is predicted using the pixel values of reference pixels, each of which is an encoded and reconstructed pixel. As prediction methods for the luminance component, a total of 34 kinds of prediction methods are described in Non-patent document 1, including 32 directional prediction methods in addition to the DC prediction method and the planar prediction method. For the chrominance component, Non-patent document 1 describes a method employing the same set of prediction methods as that used to predict the luminance component. Non-patent document 1 also describes another method employing a set of prediction methods that differs from that used to predict the luminance component, i.e., a set consisting of the DC prediction method, the planar prediction method, the horizontal prediction method, and the vertical prediction method. Such an arrangement is capable of reducing spatial redundancy for each color component.
Also, the LM mode is described in Non-patent document 2, which is configured as a method for reducing redundancy between the color components. For example, description will be made with reference to FIG. 13 regarding an arrangement in which the LM mode is applied to an image in the YUV420 format.
FIG. 13A shows the pixels of the chrominance component. FIG. 13B shows the pixels of the luminance component. In the LM mode, the chrominance component is calculated by linear prediction based on a prediction expression represented by the following Expression (1) using the reconstructed luminance components of the 16 pixels indicated by the open circles shown in FIG. 13B.

$$\mathrm{pred}_C[x,y] = \alpha \times \bigl((P_L[2x,2y] + P_L[2x,2y+1]) \gg 1\bigr) + \beta \qquad \text{[Expression 1]}$$
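Expression (1) can be sketched in Python as follows. The function name lm_predict, the row-major list layout (so that PL[x,y] is read as luma[y][x]), and the use of plain numeric values for α and β are assumptions made for illustration; an actual codec would typically use fixed-point arithmetic and clip the result to the valid sample range.

```python
def lm_predict(luma, alpha, beta, width, height):
    """Predict a width x height chrominance block from reconstructed luma.

    luma is a list of rows covering at least (2*height) x (2*width) samples.
    """
    pred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average the two vertically adjacent luma samples,
            # (PL[2x,2y] + PL[2x,2y+1]) >> 1, then apply the linear model.
            down = (luma[2 * y][2 * x] + luma[2 * y + 1][2 * x]) >> 1
            row.append(alpha * down + beta)
        pred.append(row)
    return pred


luma = [
    [10, 0, 20, 0],
    [12, 0, 22, 0],
    [30, 0, 40, 0],
    [34, 0, 44, 0],
]
print(lm_predict(luma, alpha=2, beta=1, width=2, height=2))
```

Only the luma samples in even-numbered columns contribute, which reflects the horizontal 2:1 subsampling of the chrominance component in the YUV420 format.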
In Expression (1), PL represents the pixel value of the luminance component, and predc represents the predicted pixel value of the chrominance component. Also, α and β each represent a parameter that can be calculated using eight reference pixels indicated by solid circles shown in FIG. 13A and eight reference pixels indicated by solid circles shown in FIG. 13B. Specifically, the parameters α and β are represented by the following Expressions (2) and (3), respectively.
$$\alpha = \frac{R(\hat{P}_L,\, P'_C)}{R(\hat{P}_L,\, \hat{P}_L)} \qquad \text{[Expression 2]}$$

$$\beta = M(P'_C) - \alpha \times M(\hat{P}_L) \qquad \text{[Expression 3]}$$
In Expressions (2) and (3), $P'_C$ represents the pixel value of a reference pixel of the chrominance component. Also, $\hat{P}_L$ represents the pixel value of the luminance component calculated in consideration of the phase of the luminance component and the phase of the chrominance component. Specifically, $\hat{P}_L$ is represented by the following Expression (4).

$$\hat{P}_L[x,y] = (P_L[2x,2y] + P_L[2x,2y+1]) \gg 1 \qquad \text{[Expression 4]}$$
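The derivation of α and β in Expressions (2) and (3) can be sketched as below. The text does not define R and M; this sketch assumes that M denotes the mean over the reference pixels and that R(A, B) denotes the covariance of A and B, which makes α the least-squares slope. Both readings, the function names, and the fallback of α = 0 for a flat luminance reference are assumptions for illustration only.

```python
def mean(values):
    # Assumed meaning of M(.): arithmetic mean over the reference pixels.
    return sum(values) / len(values)


def covariance(a, b):
    # Assumed meaning of R(., .): covariance of the two reference sets.
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def derive_lm_parameters(luma_ref, chroma_ref):
    """Return (alpha, beta) from phase-adjusted luma and chroma references."""
    var = covariance(luma_ref, luma_ref)
    # Hypothetical fallback: a flat luma reference gives no usable slope.
    alpha = covariance(luma_ref, chroma_ref) / var if var != 0 else 0.0
    beta = mean(chroma_ref) - alpha * mean(luma_ref)
    return alpha, beta


# Eight reference pixels per component, as in FIG. 13A and FIG. 13B.
luma_ref = [10, 20, 30, 40, 50, 60, 70, 80]
chroma_ref = [0.5 * v + 3 for v in luma_ref]  # synthetic, exactly linear
print(derive_lm_parameters(luma_ref, chroma_ref))
```

With exactly linear synthetic data the sketch recovers the slope and offset used to generate it, which is a quick sanity check on the assumed definitions of R and M.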
It should be noted that, in order to reduce memory access, the calculation is performed for the reference pixels in an upper region without correcting the phase difference. Also, the chrominance prediction is performed for each smallest processing block, which is referred to as the “TU (Transform Unit)”.
In a case in which the LM mode applied to an image in the YUV420 format as described above is extended such that it is applied to an image in the YUV422 format, the number of reference pixels is increased in the vertical direction as shown in FIG. 14.
FIG. 15 is a block diagram showing the intra prediction units 20 and 140 configured to perform the intra prediction using the aforementioned LM mode. The intra prediction units 20 and 140 each include a luminance reference pixel acquisition unit 21, a chrominance reference pixel acquisition unit 22, a prediction coefficient derivation unit 23, and a chrominance linear prediction unit 24.
The luminance reference pixel acquisition unit 21 receives the luminance component of the local decoded image f as its input data. The luminance reference pixel acquisition unit 21 acquires the pixel values of the reference pixels located neighboring a luminance block that corresponds to a chrominance prediction target block, adjusts the phases of the reference pixel values, and outputs the pixel values thus adjusted as luminance reference pixel values h.
The chrominance reference pixel acquisition unit 22 receives the chrominance component of the local decoded image f as its input data. The chrominance reference pixel acquisition unit 22 acquires the pixel values of the reference pixels located neighboring the chrominance prediction target block, and outputs the pixel values thus acquired as chrominance reference pixel values i.
The prediction coefficient derivation unit 23 receives, as its input data, the luminance reference pixel values h and the chrominance reference pixel values i. The prediction coefficient derivation unit 23 calculates the parameters α and β based on the aforementioned Expressions (2) through (4) using the pixel values thus input so as to output a prediction coefficient j.
The chrominance linear prediction unit 24 receives, as its input data, the luminance component of the local decoded image f and the prediction coefficient j. The chrominance linear prediction unit 24 calculates a predicted pixel value of the chrominance component based on the aforementioned Expression (1) using the signals thus input, and outputs the predicted pixel value as a chrominance predicted pixel value k.
The usable memory capacity has been increasing accompanying progress in semiconductor techniques. However, as the memory capacity is increased, memory access granularity becomes greater. On the other hand, there has been a relatively small improvement in memory bandwidth as compared with the improvement in memory capacity. A video is encoded and decoded using memory. Thus, memory access granularity and memory bandwidth become a bottleneck in an encoding/decoding operation for a video.
Also, memory (e.g., SRAM) that is closest to a calculation core entails higher manufacturing costs and larger power consumption than external memory (e.g., DRAM). Thus, such memory that is closest to a calculation core is preferably configured to have as small a memory capacity as possible. However, such an arrangement is required to be capable of encoding and decoding a video even if the video is provided in a worst-case condition defined in the specification. That is to say, the memory that is closest to a calculation core must satisfy a memory requirement (memory access granularity, size, number of memory units, etc.) in a worst-case condition, instead of a memory requirement in an average-case condition.
In the LM mode, as described above, parameter derivation is performed for each TU. This leads to an increased number of reference pixels, resulting in an increased number of times of calculation and an increased number of times of memory access.
Investigation will be made below regarding the number of times of calculation and the number of reference pixels required to perform the parameter derivation in a case in which the LM mode is applied to an image in the YUV420 format, for example. The block size of the LCU (Largest Coding Unit), which is the largest processing block, is defined as (64×64) or less in the main profile in Non-patent document 1. On the other hand, a smallest CU, which is a smallest processing block, has a block size of (4×4). Also, in the YUV420 format, the number of pixels of the chrominance component is ¼ that of the luminance component. Accordingly, a smallest calculation block for the luminance component has a block size of (8×8). Thus, the number of times of calculation required for the parameter derivation is represented by (64÷8)²=64. The number of reference pixels is represented by (28×64).
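The worst-case figures above can be reproduced with a short calculation. The constant names are assumptions made for illustration; the value of 28 reference pixels per smallest block is taken from the text as given, not derived here.

```python
LCU_SIZE = 64              # luma width/height of the largest coding unit
MIN_LUMA_BLOCK = 8         # luma size matching a 4x4 chroma block in YUV420
REF_PIXELS_PER_BLOCK = 28  # per-block reference pixel count quoted in the text

# One parameter derivation per smallest block inside a single LCU.
derivations_per_lcu = (LCU_SIZE // MIN_LUMA_BLOCK) ** 2

# Reference pixels fetched when every smallest block derives its own parameters.
reference_pixels_per_lcu = REF_PIXELS_PER_BLOCK * derivations_per_lcu

print(derivations_per_lcu, reference_pixels_per_lcu)
```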
In order to reduce the number of times of calculation in a worst-case condition required for the parameter derivation with respect to images in formats different from the YUV420 format, a method is described in Non-patent document 2 in which the parameter derivation is performed for each CU (Coding Unit). FIG. 16 shows the number of times of calculation and the number of reference pixels required for each of a case in which the parameter derivation is performed for each TU and a case in which the parameter derivation is performed for each CU.