The High Efficiency Video Coding (HEVC) standard was developed under the joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). In HEVC, one slice is partitioned into multiple coding tree units (CTU). In the main profile, the minimum and the maximum sizes of the CTU are specified by syntax elements in the sequence parameter set (SPS). The allowed CTU size can be 16×16, 32×32, or 64×64. For each slice, the CTUs within the slice are processed according to a raster scan order.
The CTU is further partitioned into multiple coding units (CU) to adapt to various local characteristics. A quadtree, denoted as the coding tree, is used to partition the CTU into multiple CUs. Let CTU size be M×M, where M is one of the values of 64, 32, or 16. The CTU can be a single CU (i.e., no splitting) or can be split into four smaller units of equal sizes (i.e., M/2×M/2 each), which correspond to the nodes of the coding tree. If units are leaf nodes of the coding tree, the units become CUs. Otherwise, the quadtree splitting process can be iterated until the size for a node reaches a minimum allowed CU size as specified in the SPS (Sequence Parameter Set).
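The recursive splitting described above can be sketched as follows. This is a minimal illustration and not the normative HEVC process: the function name quadtree_partition and the should_split callback (which stands in for the encoder's split decision) are hypothetical.

```python
def quadtree_partition(x, y, size, min_cu_size, should_split):
    """Return the CUs, as (x, y, size) tuples, obtained from one square node.

    should_split is a hypothetical callback standing in for the encoder's
    split decision; it is only consulted while the node is larger than the
    minimum allowed CU size.
    """
    if size > min_cu_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four equal-size sub-nodes (M/2 x M/2 each).
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(
                    quadtree_partition(x + dx, y + dy, half, min_cu_size, should_split)
                )
        return cus
    # Leaf node of the coding tree: this node becomes a CU.
    return [(x, y, size)]
```

For example, calling quadtree_partition(0, 0, 64, 8, lambda x, y, s: s == 64) splits a 64×64 CTU exactly once and yields four 32×32 CUs.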
Furthermore, according to HEVC, each CU can be partitioned into one or more prediction units (PU). Coupled with the CU, the PU works as a basic representative block for sharing the prediction information. Inside each PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. A CU can be split into one, two or four PUs according to the PU splitting type. HEVC defines eight shapes for splitting a CU into PUs, including the 2N×2N, 2N×N, N×2N, N×N, 2N×nU, 2N×nD, nL×2N and nR×2N partition types. Unlike the CU, the PU may only be split once according to HEVC. The last four partition types (2N×nU, 2N×nD, nL×2N and nR×2N) correspond to asymmetric partitions, where the two partitioned parts have different sizes.
HEVC coding comprises Inter prediction and Intra prediction. The generation of the Intra prediction includes three parts: Intra smoothing filter, Intra prediction, and Intra gradient filter. First, a smoothing operation is applied to the reference samples as a pre-processing step before calculating the prediction. This smoothing operation corresponds to applying an FIR filter with filter weights [1, 2, 1]>>2, which has low-pass characteristics, to the samples belonging to the left column and the above row of the current TU (transform unit). The Intra prediction of each TU is produced with the reconstructed samples of neighboring TUs. The samples involved in Intra smoothing are indicated in FIG. 1, where block 100 corresponds to the current block, line 110 corresponds to a horizontal boundary and line 120 corresponds to a vertical boundary. Whether this smoothing operation is used is determined by the TU size and the Intra prediction mode. Second, the Intra prediction of the current block is derived from the neighboring reference samples with a certain Intra prediction mode, and the Intra prediction mode is selected from DC mode, planar mode, and 33 directional modes by the encoder and signaled in the bitstream. Third, if the Intra prediction mode is DC, horizontal or vertical mode, the Intra gradient filter is further applied to the samples at the left and top boundaries of the current TU.
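The [1, 2, 1]>>2 smoothing step can be sketched as below. This is an illustrative sketch under stated assumptions: the function name smooth_reference is hypothetical, the reference samples are taken as one 1-D list, the two end samples are left unfiltered, and a rounding offset of 2 is added before the shift. None of these details are stated in the text above.

```python
def smooth_reference(samples):
    """Apply a [1, 2, 1] >> 2 low-pass filter to a 1-D list of reference samples.

    Assumptions (not from the text): the first and last samples are copied
    unfiltered, and a rounding offset of 2 is added before the right shift.
    """
    if len(samples) < 3:
        return list(samples)
    out = list(samples)
    for k in range(1, len(samples) - 1):
        out[k] = (samples[k - 1] + 2 * samples[k] + samples[k + 1] + 2) >> 2
    return out
```

A flat run of samples is left unchanged by this filter, while an isolated peak such as [0, 0, 8, 0, 0] is spread to [0, 2, 4, 2, 0].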
Out of all 35 Intra prediction modes in HEVC, three modes are considered as most probable modes (MPM) for predicting the Intra prediction mode in the current prediction block. For example, the Intra prediction modes used in the left prediction block and in the above prediction block can be used as candidates of the MPM set. In the case that the Intra prediction modes in the two neighboring blocks are identical and both directional, or only one of the two neighboring blocks is available and coded in Intra prediction and at the same time this Intra prediction mode is directional, the two neighboring directions immediately next to this direction are also used in the MPM set. DC mode and Planar mode are also considered in the MPM set to fill the available spots, especially if the above or left neighboring blocks are not available or not coded in Intra prediction, or the Intra prediction modes in the neighboring blocks are not directional. If the Intra prediction mode for the current prediction block is one of the modes in the MPM set, 1 or 2 bins are used to signal which one it is. Otherwise, the mode is not the same as any entry in the MPM set and is coded as a non-MPM mode. There are altogether 32 such non-MPM modes (35 modes minus the 3 MPMs), and a 5-bit fixed length coding method is applied to signal this mode. The 33 directional modes are illustrated in FIG. 2, i.e., H, H+1, . . . , H+8, H−1, . . . , H−7, V, V+1, . . . , V+8, V−1, . . . , V−8. This system can be expanded to a general case, where the horizontal and vertical modes are represented as H and V modes. The other directional modes can be represented either as H+k or V+k modes, where k=±1, ±2, etc. For example, if 65 directional modes are used as shown in FIG. 3, k can range from ±1 to ±16.
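The signaling of a mode relative to a three-entry MPM set can be sketched as follows. This is an illustrative sketch, not the normative syntax: the function name signal_intra_mode, the truncated-unary binarization of the MPM index ("0", "10", "11") and the way the 5-bit index is formed by skipping the MPM entries are assumptions chosen to match the "1 or 2 bins" and "32 non-MPM modes" statements above. Any flag that distinguishes the MPM case from the non-MPM case is not modeled.

```python
def signal_intra_mode(mode, mpm_set):
    """Return ("mpm", bins) or ("non_mpm", bits) for a mode index in 0..34.

    mpm_set holds three distinct mode indices. The binarization used here is
    an assumption for illustration only.
    """
    if len(set(mpm_set)) != 3:
        raise ValueError("the MPM set must contain three distinct modes")
    if mode in mpm_set:
        # 1 bin for the first entry, 2 bins for the other two.
        return ("mpm", ["0", "10", "11"][mpm_set.index(mode)])
    # Remove the three MPM entries from the numbering: 35 - 3 = 32 modes
    # remain, which fit exactly into a 5-bit fixed-length code.
    remaining = mode - sum(1 for m in mpm_set if m < mode)
    return ("non_mpm", format(remaining, "05b"))
```

With an MPM set of [0, 1, 26], mode 26 is signaled with the two bins "11", while mode 34 maps to the remaining index 31 and is coded as "11111".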
In some recent developments beyond HEVC, an additional 32 directional modes are used in between the existing 33 directional modes, as shown in FIG. 3. In this case, there are a total of 65 directional modes, in addition to the non-directional modes.
In HEVC, once a directional mode is decided, all the pixels in the current block along the prediction direction will use the same predictor value. If the predictor falls in between two reconstructed reference samples, a bi-linear filter will be used to calculate the predictor as a weighted average of the two neighboring pixels. For example, the predictor signal P can be derived according to P=[P1*α+P2*(32−α)]/32, where P1 and P2 are the two neighboring reconstructed samples, and the integer α is the distance from the predictor P to P2, with a range between 0 and 32, inclusive.
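The weighted average above can be written directly as integer arithmetic. The sketch below is a literal transcription of the formula; the function name bilinear_predictor is hypothetical, and the division is taken as an integer division without a rounding offset, which is an assumption since the text does not specify the rounding.

```python
def bilinear_predictor(p1, p2, alpha):
    """Compute P = [P1*alpha + P2*(32 - alpha)] / 32 with integer arithmetic.

    alpha is the distance from the predictor to P2, in 1/32-sample units.
    Integer (floor) division is an assumption; the text does not state how
    the result is rounded.
    """
    if not 0 <= alpha <= 32:
        raise ValueError("alpha must lie in the range 0..32")
    return (p1 * alpha + p2 * (32 - alpha)) // 32
```

When alpha is 0 the predictor coincides with P2, and when alpha is 32 it coincides with P1; for P1=100, P2=200 and alpha=16 the result is 150.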
The concept of the Intra gradient filter is to utilize the gradient information along the Intra prediction direction to improve the quality of Intra prediction. For the Intra prediction modes from the vertical/horizontal directions (v/h) to the vertical/horizontal +8 directions (v+8/h+8) as shown in FIG. 2, the left column/the above row neighboring samples can locate their corresponding references along the Intra prediction direction from the above row/the left column. The gradient calculated with the neighboring samples can be used to improve the Intra prediction. An example for the vertical directional mode is illustrated in FIG. 4A, where Pij denotes the predictor at row i and column j. AL represents the reconstructed sample at the left-above corner of the current block, while Li represents the reconstructed sample in the left column of the current block. A new predictor is calculated as
P′ij=Pij+α·(Li−AL),   (1)
where α is a fraction from 0 to 1 and is selected according to j, such as α=½ for j=0, and α=¼ for j=1. P′ij is used as the final predictor. As for the horizontal directional mode, the final predictor P′ij is calculated as
P′ij=Pij+α·(Aj−AL).   (2)
In the above equation, Aj is the reconstructed sample in the above row, which is shown in FIG. 4A. As for the directional modes v+1, . . . , v+8 and h+1, . . . , h+8, Li or Aj first obtains its corresponding reference sample RLi or RAj along the direction of Intra prediction. When RLi or RAj is not located at an integer pixel position, it is produced by interpolating integer pixels in the above row or the left column of the current block. The example of the v+1, . . . , v+8 directional modes is shown in FIG. 4B. The final predictor P′ij is calculated as
P′ij=Pij+α·(Li−RLi).   (3)
Similar to the vertical directional mode, α is a fraction from 0 to 1 and is selected according to the direction of Intra prediction and j. As for the h+1, . . . , h+8 directional modes, the final predictor P′ij is calculated as
P′ij=Pij+α·(Aj−RAj),   (4)
where α is a fraction from 0 to 1 and is selected according to the direction of Intra prediction and i.
The Intra gradient filter can in principle be applied to all directional modes, i.e., v+1, . . . , v+8 and h+1, . . . , h+8. In HEVC, however, the Intra gradient filter is used only when the Intra prediction mode is DC, horizontal or vertical mode. If the Intra prediction mode is DC mode, the samples at the first row and first column are filtered by the Intra gradient filter. If the Intra prediction mode is horizontal mode, the samples at the first row are filtered by the Intra gradient filter. If the Intra prediction mode is vertical mode, the samples at the first column are further filtered by the Intra gradient filter.
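Equation (1) for the vertical mode can be sketched as below, using the example weights α=½ for j=0 and α=¼ for j=1 given above. The sketch is illustrative only: the function name vertical_with_gradient is hypothetical, the fractions are realized as right shifts, and the result is clipped to the sample range for an assumed bit depth of 8. Note that the text above states that HEVC itself filters only the first column for the vertical mode.

```python
def vertical_with_gradient(above, left, above_left, shifts=(1, 2), bit_depth=8):
    """Vertical prediction followed by P'ij = Pij + alpha * (Li - AL).

    above      : reconstructed samples A_j of the above row (length N)
    left       : reconstructed samples L_i of the left column (length N)
    above_left : reconstructed corner sample AL
    shifts     : alpha per column as a right shift, (1, 2) meaning 1/2 for
                 j=0 and 1/4 for j=1 (the example values from the text)
    Clipping to bit_depth is an assumption, not stated in the text.
    """
    n = len(above)
    max_val = (1 << bit_depth) - 1
    # Plain vertical prediction: every row copies the above row.
    pred = [list(above) for _ in range(n)]
    for i in range(n):
        gradient = left[i] - above_left
        for j, shift in enumerate(shifts[:n]):
            value = pred[i][j] + (gradient >> shift)
            pred[i][j] = min(max(value, 0), max_val)
    return pred
```

Only the first len(shifts) columns are modified; all other columns keep the plain vertical predictor.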
Besides Intra gradient filtering, another method called bi-directional Intra prediction has also been proposed in JCT-VC meetings to improve the quality of Intra prediction. For diagonal Intra prediction modes, i.e., v+1, . . . , v+8 and h+1, . . . , h+8, a weighted sum of the reconstructed samples of the above row and the reconstructed samples from the left column along the direction is used as the Intra predictor. For example, for the v+1, . . . , v+8 directional modes, as illustrated in FIG. 5, Pij from the neighboring samples of the above row has a corresponding reference sample Fij in the left column along the prediction direction. If Fij is not located at an integer pixel position, it can be generated by interpolating integer pixels in the left column. The final predictor P′ij is then calculated as the weighted sum of Pij and Fij as
P′ij=α·Pij+(1−α)·Fij,   (5)
where α is a fraction from 0 to 1 and is selected according to the direction of Intra prediction together with j (for the v+1, . . . , v+8 directional modes) or i (for the h+1, . . . , h+8 directional modes).
After the Intra predictors are generated, the prediction error is further processed by transform and quantization and encoded by entropy coding. For entropy coding, the quantized coefficients are first divided into multiple 4×4 coefficient groups. The coding order of the coefficient groups and the scan order of the coefficients within one coefficient group are selected according to the Intra prediction mode and the transform size. If the transform size is smaller than or equal to 8×8, an Intra-mode-dependent scan is used for both the coding order of the coefficient groups and the scan order of the coefficients within one coefficient group. Otherwise, a diagonal scan is used for both.
Also, it is possible to use a weighted sum of several predictors to generate the final prediction signal for Intra prediction (namely multiple parameter Intra prediction or MPI). The final predictor PMPI[i, j] of position (i, j) is defined as follows:
PMPI[i, j]=(αPHEVC[i, j]+βPMPI[i−1, j]+γPMPI[i, j−1]+δPMPI[i−1, j−1]+4)>>3,
where outside of the block PMPI[i, j] is equal to the reconstructed signal as shown in FIG. 6:
PMPI[i, j]=REC[i, j], if i<0 || j<0.
FIG. 6 illustrates an example of multiple parameter Intra prediction (MPI) process, where an input block is processed by Arbitrary Directional Intra (ADI) 610 followed by MPI 620. The strength of this post-processing (i.e., parameters α+β+γ+δ=8) is controlled on the CU level and signaled with up to 2 bits.
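The recursive MPI definition can be sketched as follows. This is an illustrative sketch, not the proposal's reference implementation: the function name mpi_predict and its argument layout are hypothetical, and the index i is assumed to be the row index, so that i=−1 addresses the reconstructed row above the block.

```python
def mpi_predict(p_hevc, rec_above, rec_left, rec_corner, alpha, beta, gamma, delta):
    """Apply P_MPI[i,j] = (a*P_HEVC[i,j] + b*P_MPI[i-1,j] + c*P_MPI[i,j-1]
    + d*P_MPI[i-1,j-1] + 4) >> 3 over an N x N block.

    Outside the block (i < 0 or j < 0), P_MPI equals the reconstructed
    signal, supplied here as rec_above, rec_left and rec_corner.
    Assumption: i is the row index and j is the column index.
    """
    if alpha + beta + gamma + delta != 8:
        raise ValueError("the four parameters must sum to 8")
    n = len(p_hevc)
    # p is offset by one in both dimensions so that index 0 holds i or j = -1.
    p = [[0] * (n + 1) for _ in range(n + 1)]
    p[0][0] = rec_corner
    for k in range(n):
        p[0][k + 1] = rec_above[k]
        p[k + 1][0] = rec_left[k]
    for i in range(n):
        for j in range(n):
            p[i + 1][j + 1] = (
                alpha * p_hevc[i][j]
                + beta * p[i][j + 1]   # P_MPI[i-1, j]
                + gamma * p[i + 1][j]  # P_MPI[i, j-1]
                + delta * p[i][j]      # P_MPI[i-1, j-1]
                + 4
            ) >> 3
    return [row[1:] for row in p[1:]]
```

With α=8 and β=γ=δ=0 the post-processing is disabled and the output equals the input predictor; smaller values of α pull the predictor toward the already processed neighbors.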
In an ITU-T contribution C1046 (A. Said, et al., “Position dependent Intra prediction combination,” ITU-T SG16 COM 16-C1046-E, October 2015), a method is proposed to use a combination of filtered and unfiltered reference samples to form the final predictor p[x, y], as shown in FIG. 7 for the unfiltered (710) and filtered (720) cases.
Signals r and s are used to represent the sequences with unfiltered and filtered references, respectively. The new prediction p[x, y] combines weighted values of the boundary elements r[ ] with q[x, y] (i.e., the predictor derived from the filtered samples s[ ]) as follows:
p[x, y]={(c1(v)>>⌊y/d⌋)r[x, −1]−(c2(v)>>⌊y/d⌋)r[−1, −1]+(c1(h)>>⌊x/d⌋)r[−1, y]−(c2(h)>>⌊x/d⌋)r[−1, −1]+b[x, y]q[x, y]+64}>>7,
where c1(v), c2(v), c1(h), c2(h) are stored prediction parameters, d=1 for block sizes up to 16×16, and d=2 for larger blocks, and
b[x, y]=128−(c1(v)>>⌊y/d⌋)+(c2(v)>>⌊y/d⌋)−(c1(h)>>⌊x/d⌋)+(c2(h)>>⌊x/d⌋)
is a normalization factor.
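The position dependent combination can be sketched for a single sample as below, with the normalization factor b[x, y] chosen so that all weights sum to 128. The sketch is illustrative: the function name pdpc_sample and its argument layout are hypothetical, and any parameter values passed to it are placeholders, since the stored values of c1(v), c2(v), c1(h), c2(h) are not given in the text.

```python
def pdpc_sample(x, y, r_above, r_left, r_corner, q, c1v, c2v, c1h, c2h, d=1):
    """Combine unfiltered boundary samples r with the predictor q at (x, y).

    r_above[x] is r[x, -1], r_left[y] is r[-1, y], r_corner is r[-1, -1],
    and q is q[x, y], the predictor derived from the filtered references.
    c1v, c2v, c1h, c2h are the stored prediction parameters; their actual
    values are not given in the text and must be supplied by the caller.
    """
    wv1 = c1v >> (y // d)
    wv2 = c2v >> (y // d)
    wh1 = c1h >> (x // d)
    wh2 = c2h >> (x // d)
    # Normalization factor: makes the five weights sum to 128.
    b = 128 - wv1 + wv2 - wh1 + wh2
    total = (
        wv1 * r_above[x]
        - wv2 * r_corner
        + wh1 * r_left[y]
        - wh2 * r_corner
        + b * q
        + 64
    )
    return total >> 7
```

Because the boundary weights are right-shifted by the distance from the boundary, samples far from the top and left edges are dominated by q[x, y], while samples near the edges draw more on the unfiltered references.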
In the contribution JVET-C-0061 (X. Xiu, et al., “Decoder-side Intra mode derivation”, JVET-C0061, May, 2016), interpolation for Intra prediction using the planar mode is disclosed. According to JVET-C-0061, the sample at the bottom-right corner of the current prediction block is either signaled or estimated using a linear average of the corresponding left reference sample and above reference sample. Accordingly, the samples in the right-most column and the bottom row are bi-linearly interpolated using the top/bottom-right sample combination and the left/bottom-right sample combination (810), as shown in FIG. 8. The remaining pixels in the prediction block are predicted using a similar bi-linear interpolation (820), as shown in FIG. 8.
Template Based Intra Prediction
In the contribution JVET-C-0061, a decoder side Intra prediction mode derivation method is proposed, where the neighboring reconstructed samples of the current block are used as a template. The reconstructed pixels in the template are compared with the predicted pixels at the same positions. The predicted pixels are generated using the reference pixels, which are the neighboring reconstructed pixels around the template. For each of the possible Intra prediction modes, the encoder and the decoder generate predicted pixels for the positions in the template in a way similar to HEVC. The distortion between the predicted pixels and the reconstructed pixels in the template is calculated and recorded. The Intra prediction mode with the minimum distortion is selected as the derived Intra prediction mode. During the template matching search, the number of available Intra prediction modes is increased to 129 (from 67) and the precision of the interpolation filter for the reference samples is increased to 1/64-pel (from 1/32-pel). FIG. 9 illustrates an example of decoder side Intra mode derivation (DIMD), where L is the width and height of the template for both the pixels on the top of the current block and to the left of the current block (i.e., the Target block shown in FIG. 9).
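The template matching search can be sketched as a minimum-distortion selection over the candidate modes. This is a minimal sketch under stated assumptions: the function name derive_intra_mode and the predict_template callback (which would generate the predicted template pixels for a given mode) are hypothetical, and the sum of absolute differences is used as the distortion measure, which the text above does not specify.

```python
def derive_intra_mode(template_rec, predict_template, candidate_modes):
    """Return (best_mode, best_cost) over candidate_modes.

    template_rec     : reconstructed pixels of the template, as a flat list
    predict_template : hypothetical callback, mode -> predicted template
                       pixels in the same order as template_rec
    The distortion is the sum of absolute differences (an assumption).
    """
    best_mode, best_cost = None, None
    for mode in candidate_modes:
        predicted = predict_template(mode)
        cost = sum(abs(r - p) for r, p in zip(template_rec, predicted))
        # Keep the first mode that reaches the minimum distortion.
        if best_cost is None or cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```

Since the same search is run on both sides using only reconstructed samples, the encoder and the decoder arrive at the same derived mode.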
Quadtree Plus Binary Tree (QTBT) Structure
In contribution m37524/COM16-C966 (J. An, et al., “Block partitioning structure for next generation video coding,” MPEG doc. m37524 and ITU-T SG16 Doc. COM16-C966, October 2015), a quadtree plus binary tree (QTBT) block partitioning structure is disclosed. According to QTBT, a coding tree block (CTB) is firstly partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. The binary tree leaf nodes, namely coding blocks (CBs), are used for prediction and transform without any further partitioning. For P and B slices, the luma and chroma CTBs in one coding tree unit (CTU) share the same QTBT structure. For I slice, the luma CTB is partitioned into CBs by a QTBT structure, and two chroma CTBs are partitioned into chroma CBs by another QTBT structure.
A CTU (or CTB for I slice), which is the root node of a quadtree, is firstly partitioned by a quadtree, where the quadtree splitting of one node can be iterated until the node reaches the minimum allowed quadtree leaf node size (MinQTSize). If the quadtree leaf node size is not larger than the maximum allowed binary tree root node size (MaxBTSize), it can be further partitioned by a binary tree. The binary tree splitting of one node can be iterated until the node reaches the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The binary tree leaf node, namely CU (or CB for I slice), will be used for prediction (e.g. Intra-picture or inter-picture prediction) and transform without any further partitioning. There are two splitting types in the binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting.
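The splitting constraints above can be sketched as a function that lists the split types still allowed at a node. This is an illustrative reading of the constraints, not the JEM implementation: the function name allowed_splits is hypothetical, and comparing each dimension separately against MinBTSize (height for horizontal splitting, width for vertical splitting) is an assumption.

```python
def allowed_splits(width, height, in_binary_tree, bt_depth,
                   min_qt_size, max_bt_size, min_bt_size, max_bt_depth):
    """List the split types allowed at a node under the QTBT constraints.

    in_binary_tree is True once the node lies below a binary split, after
    which quadtree splitting is no longer available.
    Assumption: horizontal splitting halves the height and vertical
    splitting halves the width, each checked against MinBTSize.
    """
    splits = []
    # Quadtree splitting iterates until MinQTSize is reached.
    if not in_binary_tree and width > min_qt_size:
        splits.append("quad")
    # Binary splitting requires a node no larger than MaxBTSize and a
    # binary tree depth below MaxBTDepth.
    if max(width, height) <= max_bt_size and bt_depth < max_bt_depth:
        if height > min_bt_size:
            splits.append("horizontal")
        if width > min_bt_size:
            splits.append("vertical")
    return splits
```

For instance, with the hypothetical settings MinQTSize=16, MaxBTSize=64, MinBTSize=4 and MaxBTDepth=4, a 128×128 node can only be quadtree split, whereas a 16×16 quadtree leaf can only be split by the binary tree.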
Block partitioning 1010 and the corresponding QTBT structure 1020 of FIG. 10 illustrate an example of block partitioning using QTBT. The solid lines indicate quadtree splitting and the dotted lines indicate binary tree splitting. In each splitting (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting. For the quadtree splitting, there is no need to indicate the splitting type since it always splits a block horizontally and vertically into 4 sub-blocks of equal size.
In the above disclosure, JVET (joint video exploration team) refers to an international organization that has been established by both ITU-T VCEG and ISO/IEC MPEG to study the next generation video coding technologies. Reference software called JEM (joint exploration model) is built based on HEVC's reference software (HM). Some new video coding methods, including QTBT and 65 Intra prediction directions, are included in the JEM software.
In order to reduce the complexity and/or increase the coding efficiency associated with DIMD, various techniques are disclosed.