HEVC (High Efficiency Video Coding) has been proposed as a video encoding method using intra prediction, inter prediction, and residual transform (e.g., see Non-patent reference 1).
[Configuration and Operations of Video Encoding Device MM]
FIG. 15 is a block diagram of a video encoding device MM according to a conventional example that codes a video image using the aforementioned video encoding method. The video encoding device MM includes an inter prediction unit 10, an intra prediction unit 20, a transform and quantization unit 30, an entropy encoding unit 40, an inverse quantization and inverse transform unit 50, an in-loop filtering unit 60, a first buffer unit 70, and a second buffer unit 80.
An input image a and a later-described local decoded image g that is supplied from the first buffer unit 70 are input to the inter prediction unit 10. This inter prediction unit 10 performs inter prediction (inter-frame prediction) using the input image a and the local decoded image g to generate and output an inter prediction image b.
The input image a and a later-described local decoded image f that is supplied from the second buffer unit 80 are input to the intra prediction unit 20. This intra prediction unit 20 performs intra prediction (intra-frame prediction) using the input image a and the local decoded image f to generate and output an intra prediction image c.
An error (residual) signal between the input image a and the inter prediction image b or the intra prediction image c is input to the transform and quantization unit 30. This transform and quantization unit 30 transforms and quantizes the input residual signal to generate and output a quantization coefficient d.
The quantization coefficient d and side information (not shown) are input to the entropy encoding unit 40. This entropy encoding unit 40 performs entropy encoding on the input signal and outputs the entropy-coded signal as a bitstream z.
The quantization coefficient d is input to the inverse quantization and inverse transform unit 50. This inverse quantization and inverse transform unit 50 performs inverse quantization and inverse transform on the quantization coefficient d to generate and output an inverse-transformed residual signal e.
The second buffer unit 80 accumulates the local decoded image f and supplies the accumulated local decoded image f to the intra prediction unit 20 and the in-loop filtering unit 60 as appropriate. The local decoded image f is a signal obtained by adding up the inter prediction image b or the intra prediction image c and the inverse-transformed residual signal e.
The local decoded image f is input to the in-loop filtering unit 60. This in-loop filtering unit 60 applies a filter such as a deblocking filter to the local decoded image f to generate and output the local decoded image g.
The first buffer unit 70 accumulates the local decoded image g and supplies the accumulated local decoded image g to the inter prediction unit 10 as appropriate.
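As a quick illustration of the local decoding loop described above, the reconstruction that produces the local decoded image f can be sketched as follows. This is a minimal sketch, not the reference design: `reconstruct` is a hypothetical helper name, and the clipping to the sample range is an assumption of this sketch (the description above states only the addition of the prediction image and the residual signal).

```python
def reconstruct(pred, residual, max_val=255):
    """Form the local decoded image f: prediction (b or c) plus the
    inverse-transformed residual e, clipped to the valid sample range.
    Clipping is an assumption of this sketch."""
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, residual)]
```

The clipped sum then feeds both the second buffer unit (for intra prediction) and the in-loop filtering unit.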
[Configuration and Operations of Video Decoding Device NN]
FIG. 16 is a block diagram of a video decoding device NN according to a conventional example that decodes a video image from the bitstream z generated by the video encoding device MM. The video decoding device NN includes an entropy decoding unit 110, an inverse transform and inverse quantization unit 120, an inter prediction unit 130, an intra prediction unit 140, an in-loop filtering unit 150, a first buffer unit 160, and a second buffer unit 170.
The bitstream z is input to the entropy decoding unit 110. This entropy decoding unit 110 performs entropy decoding on the bitstream z, and generates and outputs a quantization coefficient B.
The inverse transform and inverse quantization unit 120, the inter prediction unit 130, the intra prediction unit 140, the in-loop filtering unit 150, the first buffer unit 160, and the second buffer unit 170 respectively operate similarly to the inverse quantization and inverse transform unit 50, the inter prediction unit 10, the intra prediction unit 20, the in-loop filtering unit 60, the first buffer unit 70, and the second buffer unit 80 shown in FIG. 15.
(Details of Intra Prediction)
The aforementioned intra prediction will be described below in detail. Regarding intra prediction, Non-patent reference 1 indicates that pixel values in an encoding target block are predicted for each color component using pixel values of reference pixels, which are reconstructed pixels that have already been encoded. Also, as luma component prediction methods, a total of 35 modes are indicated, namely DC, Planar, and directional prediction in 33 directions, as shown in FIG. 17. As chroma component prediction methods, a method using the same prediction classification as that for the luma component, as well as DC, Planar, horizontal, and vertical methods that are independent from those for the luma component, are indicated. With the above configuration, spatial redundancy can be reduced for each color component.
Non-patent reference 2 describes an LM mode as a technique for reducing redundancy among color components. For example, a case of using the LM mode for an image of a YUV420 format will now be described using FIG. 18.
FIG. 18A shows chroma component pixels, and FIG. 18B shows luma component pixels. In the LM mode, the chroma component is linearly predicted from the luma component that has been reconstructed at the pixels denoted by the 16 white circles in FIG. 18B, using the prediction equation indicated as Equation (1) below.

[Equation 1]

pred_c[x, y] = α × ((P_L[2x, 2y] + P_L[2x, 2y+1]) >> 1) + β  (1)
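As an illustration, Equation (1) can be sketched as follows for a YUV420 block. `predict_chroma` is a hypothetical helper name; the luma plane is indexed `P_L[x][y]` to mirror the equation, and α and β are assumed to have been derived already. Note that an actual codec would use fixed-point integer coefficients rather than the floating-point α used here.

```python
def predict_chroma(P_L, alpha, beta, width, height):
    """Linear chroma prediction per Equation (1) for YUV420.

    P_L is the reconstructed luma plane, indexed P_L[x][y] to mirror
    the equation. Each chroma pixel at (x, y) is predicted from the
    average of the vertically adjacent luma pair at (2x, 2y), (2x, 2y+1).
    """
    pred_c = [[0] * height for _ in range(width)]
    for x in range(width):
        for y in range(height):
            luma = (P_L[2 * x][2 * y] + P_L[2 * x][2 * y + 1]) >> 1
            pred_c[x][y] = alpha * luma + beta
    return pred_c
```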
In Equation (1), P_L denotes a pixel value of the luma component, and pred_c denotes a predictive pixel value of the chroma component. α and β respectively indicate parameters that can be obtained using the reference pixels denoted by the 8 black circles in FIG. 18A and the 8 black circles in FIG. 18B, and are determined by Equations (2) and (3) below.
[Equation 2]

α = R(P̂_L, P′_C) / R(P̂_L, P̂_L)  (2)

[Equation 3]

β = M(P′_C) − α × M(P̂_L)  (3)
R in Equation (2) denotes an inner product calculation, M in Equation (3) denotes an averaging calculation, and P′_C in Equations (2) and (3) denotes a pixel value of a reference pixel of the chroma component. P̂_L denotes a pixel value of the luma component obtained while considering the phases of the luma and the chroma, and is determined by Equation (4) below.

[Equation 4]

P̂_L[x, y] = (P_L[2x, 2y] + P_L[2x, 2y+1]) >> 1  (4)
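A minimal sketch of the parameter derivation follows. Here R(·, ·) is taken to be the inner product of mean-removed sample vectors (i.e., a covariance), which makes Equations (2) and (3) the ordinary least-squares fit; this reading, and the helper name `derive_lm_params`, are assumptions of this sketch. Actual codecs derive α and β in integer arithmetic.

```python
def derive_lm_params(luma_ref, chroma_ref):
    """Derive alpha (Equation (2)) and beta (Equation (3)) from
    reference pixels.

    luma_ref:   phase-adjusted luma reference values, P̂_L from Eq. (4)
    chroma_ref: chroma reference values, P'_C
    R(a, b) is taken as the inner product of mean-removed vectors,
    so alpha = cov(P̂_L, P'_C) / var(P̂_L).
    """
    mL = sum(luma_ref) / len(luma_ref)
    mC = sum(chroma_ref) / len(chroma_ref)
    r_lc = sum((l - mL) * (c - mC) for l, c in zip(luma_ref, chroma_ref))
    r_ll = sum((l - mL) ** 2 for l in luma_ref)
    alpha = r_lc / r_ll if r_ll else 0.0
    beta = mC - alpha * mL  # Equation (3)
    return alpha, beta
```

For reference pixels that actually lie on a line c = 2·l + 1, the fit recovers α = 2 and β = 1.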
Note that the phase of the reference pixels in the upper part remains shifted in order to reduce memory access. The chroma prediction is performed for each TU (Transform Unit), which is the smallest processing block.
In the case of extending the LM mode for an image of the aforementioned YUV420 format and using the extended LM mode for an image of a YUV422 format, the number of reference pixels in the vertical direction increases as shown in FIG. 19.
FIG. 20 is a block diagram of the intra prediction units 20 and 140 that perform intra prediction using the aforementioned LM mode. The intra prediction units 20 and 140 each include a luma reference pixel acquisition unit 21, a chroma reference pixel acquisition unit 22, a prediction coefficient derivation unit 23, and a chroma linear prediction unit 24.
The luma component of the local decoded image f is input to the luma reference pixel acquisition unit 21. This luma reference pixel acquisition unit 21 acquires a pixel value of each reference pixel neighboring a luma block corresponding to a chroma prediction target block, adjusts the phase of the acquired pixel value, and outputs the phase-adjusted pixel value as a luma reference pixel value h.
The chroma component of the local decoded image f is input to the chroma reference pixel acquisition unit 22. This chroma reference pixel acquisition unit 22 acquires a pixel value of each reference pixel neighboring the chroma prediction target block, and outputs the acquired pixel value as a chroma reference pixel value i.
The luma reference pixel value h and the chroma reference pixel value i are input to the prediction coefficient derivation unit 23. This prediction coefficient derivation unit 23 obtains the parameters α and β from Equations (2) to (4) above using these input pixel values, and outputs the parameters α and β as prediction coefficients j.
The luma component of the local decoded image f and the prediction coefficients j are input to the chroma linear prediction unit 24. This chroma linear prediction unit 24 obtains a predictive pixel value of the chroma component by Equation (1) above using these input signals, and outputs the obtained predictive pixel value as a chroma predictive pixel value k.
Incidentally, available memory capacity has been increasing with advances in semiconductor technology. However, as memory capacity increases, the granularity of memory access becomes coarser. Meanwhile, memory bandwidth has not widened significantly in comparison with the increase in memory capacity. Since memory is used in encoding and decoding video images, the granularity of memory access and the memory bandwidth have been bottlenecks.
In addition, the manufacturing cost and power consumption of a memory closest to a calculation core (e.g., an SRAM) are higher than those of an external memory (e.g., a DRAM). For this reason, it is desirable to reduce the capacity of the memory closest to the calculation core as much as possible. However, since video images need to be decodable even at the worst value provided in the specifications, the memory closest to the calculation core needs to satisfy the memory requirements (granularity, size, number of accesses, etc.) at the worst value, rather than average memory requirements.
In the LM mode, since the parameters are derived for each TU as mentioned above, the number of reference pixels increases, and the number of calculations and the number of memory accesses become large.
For example, the number of calculations and the number of reference pixels for deriving the parameters in the case of using the LM mode for an image of the YUV420 format will be considered below. The size of an LCU (Largest Coding Unit), which is the largest processing block, is provided as 64×64 in the main profile in Non-patent reference 1, and the size of the smallest CU, which is the smallest processing block, is 4×4. In addition, since the number of chroma pixels in the YUV420 format is ¼ that of the luma pixels, the smallest calculation block of the luma component is 8×8. For this reason, the number of parameter derivations is (64/8)² = 64, and the number of reference pixels is 28×64.
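The worst-case arithmetic above can be checked directly; the figure of 28 reference pixels per derivation is taken from the text, and the variable names are illustrative only.

```python
# Worst-case LM-mode parameter derivations per LCU for YUV420
LCU_SIZE = 64        # largest coding unit size, main profile
LUMA_BLOCK = 8       # luma block paired with the smallest 4x4 chroma TU
derivations = (LCU_SIZE // LUMA_BLOCK) ** 2     # (64/8)^2 = 64 derivations
refs_per_derivation = 28                        # per the text above
total_refs = refs_per_derivation * derivations  # 28 x 64 reference pixels
```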
Non-patent reference 2 describes a technique for deriving the parameters for each CU (Coding Unit) in order to reduce the worst-case number of calculations for deriving the parameters for an image of a non-YUV420 format. FIG. 21 shows the number of calculations and the number of reference pixels in the case of deriving the parameters for each TU and in the case of deriving them for each CU.
As described above, redundancy among color components can be reduced in the LM mode. However, when considered in CTU units, a problem arises in that the worst-case number of reference pixels used when deriving the parameters is large.
Non-patent reference 3 describes a technique for reducing the number of reference pixels in the LM mode for an image of a non-YUV420 format.
For example, a case of applying the technique in Non-patent reference 3 to an image of a YUV422 format will be described using FIG. 22. In this case, the number of reference pixels neighboring the long side of a prediction target block is halved compared with the case shown in FIG. 19. For this reason, as shown in FIG. 24, both the number of luma reference pixels and the number of chroma reference pixels are 8 pixels.
Also, a case of applying the technique in Non-patent reference 3 to an image of a YUV444 format will be described using FIG. 23. In this case as well, the number of reference pixels neighboring the long side of a prediction target block is halved. For this reason, as shown in FIG. 24, both the number of luma reference pixels and the number of chroma reference pixels are 8 pixels.