The High Efficiency Video Coding (HEVC) standard was developed under the joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, through a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC).
In HEVC, a slice is partitioned into multiple coding tree units (CTUs). In the Main profile, the minimum and maximum CTU sizes are specified by syntax elements in the Sequence Parameter Set (SPS). The allowed CTU size can be 8×8, 16×16, 32×32, or 64×64. Within each slice, the CTUs are processed in raster scan order.
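As a small worked example of raster-scan order, the sketch below counts the CTUs covering a picture and maps a raster-scan CTU address to its top-left luma sample. It assumes a single slice covering the whole picture and counts partial CTUs at the right and bottom edges as whole CTUs; the function names and the 1920×1080 picture size are illustrative only.

```python
import math
from typing import Tuple

def ctu_grid(pic_width: int, pic_height: int, ctu_size: int) -> Tuple[int, int]:
    """Number of CTU columns and rows; partial CTUs at the picture edges count as whole CTUs."""
    return math.ceil(pic_width / ctu_size), math.ceil(pic_height / ctu_size)

def ctu_position(addr: int, pic_width: int, ctu_size: int) -> Tuple[int, int]:
    """Top-left luma sample (x, y) of the CTU with raster-scan address `addr`."""
    cols = math.ceil(pic_width / ctu_size)
    return (addr % cols) * ctu_size, (addr // cols) * ctu_size

cols, rows = ctu_grid(1920, 1080, 64)
print(cols, rows, cols * rows)      # 30 17 510
print(ctu_position(31, 1920, 64))   # (64, 64): second CTU of the second CTU row
```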
The CTU is further partitioned into multiple coding units (CUs) to adapt to various local characteristics. A quadtree, denoted as the coding tree, is used to partition the CTU into multiple CUs. Let the CTU size be M×M, where M is one of 64, 32, or 16. The CTU can be a single CU or can be split into four smaller units of equal size (i.e., M/2×M/2), which are nodes of the coding tree. If a unit is a leaf node of the coding tree, it becomes a CU. Otherwise, the quadtree splitting process can be iterated until the size of a node reaches the minimum allowed CU size specified in the SPS. This representation results in a recursive structure specified by a coding tree (also referred to as a partition tree structure).
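The recursive splitting described above can be sketched as follows. The `split` callback is a hypothetical stand-in for the encoder's decision (or the decoded split flag), and children are visited in z-order (top-left, top-right, bottom-left, bottom-right); this illustrates the recursion only and is not the HEVC syntax parsing process.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in luma samples

def quadtree_cus(x: int, y: int, size: int, min_cu: int,
                 split: Callable[[int, int, int], bool]) -> List[Block]:
    """Return the leaf CUs of a coding quadtree rooted at a size x size CTU.

    A node whose size equals the minimum CU size is never split further.
    """
    if size > min_cu and split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):            # z-order traversal of the four children
            for dx in (0, half):
                leaves += quadtree_cus(x + dx, y + dy, half, min_cu, split)
        return leaves
    return [(x, y, size)]

def decide(x: int, y: int, size: int) -> bool:
    """Hypothetical decision: split the 64x64 CTU, then only its top-left 32x32 quadrant."""
    return size == 64 or (size == 32 and x == 0 and y == 0)

cus = quadtree_cus(0, 0, 64, 8, decide)
print(len(cus), cus[:2])  # 7 [(0, 0, 16), (16, 0, 16)]
```

The result is four 16×16 CUs followed by three 32×32 CUs, whose areas add up to the 64×64 CTU.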
One or more prediction units (PUs) are specified for each CU. Together with the CU, the PU serves as the basic representative block for sharing prediction information. Inside each PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. A CU can be split into one, two or four PUs according to the PU splitting type.
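For concreteness, HEVC names the PU splitting types with the partition modes used in the sketch below (PART_2Nx2N, PART_2NxN, and so on), where the four asymmetric modes split at one quarter of the CU size. The sketch only maps each mode to PU rectangles for a 2N×2N CU; it ignores the restrictions on which modes are allowed for a given CU size and prediction type, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) relative to the CU's top-left corner

def pu_rects(part_mode: str, cu: int) -> List[Rect]:
    """PU rectangles of a cu x cu (i.e. 2N x 2N) CU for an HEVC-style partition mode."""
    n, q = cu // 2, cu // 4  # N, and the quarter size used by the asymmetric modes
    table: Dict[str, List[Rect]] = {
        "PART_2Nx2N": [(0, 0, cu, cu)],                                   # one PU
        "PART_2NxN":  [(0, 0, cu, n), (0, n, cu, n)],                     # two PUs
        "PART_Nx2N":  [(0, 0, n, cu), (n, 0, n, cu)],
        "PART_NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],  # four PUs
        "PART_2NxnU": [(0, 0, cu, q), (0, q, cu, cu - q)],                # asymmetric
        "PART_2NxnD": [(0, 0, cu, cu - q), (0, cu - q, cu, q)],
        "PART_nLx2N": [(0, 0, q, cu), (q, 0, cu - q, cu)],
        "PART_nRx2N": [(0, 0, cu - q, cu), (cu - q, 0, q, cu)],
    }
    return table[part_mode]

for mode in ("PART_2Nx2N", "PART_2NxN", "PART_NxN", "PART_2NxnU"):
    print(mode, pu_rects(mode, 32))
```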
After the residual block is obtained by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs) according to another quadtree structure, which is analogous to the coding tree for the CU. The TU is the basic representative block of residual or transform coefficients for applying the integer transform and quantization. For each TU, one integer transform of the same size as the TU is applied to obtain residual coefficients. These coefficients are transmitted to the decoder after quantization on a TU basis.
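The quantization step can be illustrated with a simplified floating-point model, assuming the well-known HEVC relationship in which the quantization step size doubles every 6 QP values and equals 1 at QP 4. An actual codec uses integer scaling and shifts rather than division, and the coefficient values below are made up for illustration.

```python
from typing import List

def qstep(qp: int) -> float:
    """Approximate quantization step size: doubles every 6 QP values, equals 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6)

def quantize(coeffs: List[float], qp: int) -> List[int]:
    """Encoder side: uniform scalar quantization with rounding to the nearest level."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]

def dequantize(levels: List[int], qp: int) -> List[float]:
    """Decoder side: reconstruct coefficient values from the transmitted levels."""
    step = qstep(qp)
    return [lvl * step for lvl in levels]

coeffs = [100.0, -37.5, 12.0, 3.0]   # hypothetical transform coefficients of one TU
levels = quantize(coeffs, qp=28)     # step size 2**4 = 16
print(levels)                        # [6, -2, 1, 0]
print(dequantize(levels, qp=28))     # [96.0, -32.0, 16.0, 0.0]
```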
The terms coding tree block (CTB), coding block (CB), prediction block (PB), and transform block (TB) are defined to specify the 2-D sample array of one color component associated with the CTU, CU, PU, and TU, respectively. Thus, a CTU consists of one luma CTB, two chroma CTBs, and associated syntax elements. A similar relationship holds for the CU, PU, and TU. The tree partitioning is generally applied simultaneously to both luma and chroma, although exceptions apply when certain minimum sizes are reached for chroma.
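The luma/chroma relationship can be sketched for the common 4:2:0 format, in which each chroma block covers the same picture area as the luma block at half the width and height. The `chroma_tb_splits` helper is a simplified, hypothetical model of the minimum-size exception, assuming a 4×4 minimum transform size: a chroma TB follows the luma split only while the resulting chroma blocks stay at least 4×4.

```python
from typing import Dict, Tuple

# (horizontal, vertical) chroma subsampling factors per chroma format
SUBSAMPLING: Dict[str, Tuple[int, int]] = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}

def component_blocks(width: int, height: int, chroma_format: str = "4:2:0") -> Dict[str, Tuple[int, int]]:
    """Sizes of the luma block and the two chroma blocks covering the same picture area."""
    sx, sy = SUBSAMPLING[chroma_format]
    chroma = (width // sx, height // sy)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}

def chroma_tb_splits(luma_tb: int, min_tb: int = 4) -> bool:
    """4:2:0 only (simplified): does the chroma TB split when a square luma TB splits?"""
    chroma_child = (luma_tb // 2) // 2   # chroma size of each luma child after the split
    return chroma_child >= min_tb

print(component_blocks(64, 64))                   # {'Y': (64, 64), 'Cb': (32, 32), 'Cr': (32, 32)}
print(chroma_tb_splits(16), chroma_tb_splits(8))  # True False
```

With an 8×8 luma TB split into four 4×4 luma TBs, the corresponding 4×4 chroma TB is kept whole in this model.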
For next-generation video coding, a method combining the quadtree and binary tree structures, called the quadtree plus binary tree (QTBT) structure, has been adopted in JVET-E1001 (Chen et al., “Algorithm Description of Joint Exploration Test Model 5 (JEM 5)”, Joint Video Exploration Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 5th Meeting: Geneva, CH, 12-20 January 2017, Document: JVET-E1001).
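A minimal sketch of QTBT partitioning, assuming the usual QTBT rule that quadtree splitting is no longer allowed once a binary split has been applied. The `decide` callback stands in for the encoder decision or the decoded split syntax; the split-mode strings and the 128×128 example CTU are illustrative, and the size constraints JEM places on each split type are omitted.

```python
from typing import Callable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def qtbt_leaves(rect: Rect, decide: Callable[[Rect, bool], Optional[str]],
                in_bt: bool = False) -> List[Rect]:
    """Leaf blocks of a QTBT tree.

    decide(rect, in_bt) returns "QT" (four quadrants), "HOR" (top and bottom halves),
    "VER" (left and right halves) or None (leaf).
    """
    x, y, w, h = rect
    mode = decide(rect, in_bt)
    if mode is None:
        return [rect]
    if mode == "QT":
        if in_bt:
            raise ValueError("quadtree split is not allowed below a binary split")
        hw, hh = w // 2, h // 2
        children = [(x, y, hw, hh), (x + hw, y, hw, hh), (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
        return [leaf for c in children for leaf in qtbt_leaves(c, decide, False)]
    if mode == "HOR":
        children = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    elif mode == "VER":
        children = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    else:
        raise ValueError(f"unknown split mode {mode!r}")
    return [leaf for c in children for leaf in qtbt_leaves(c, decide, True)]

def example(rect: Rect, in_bt: bool) -> Optional[str]:
    """Hypothetical decisions: one quadtree split, then a vertical binary split of the first quadrant."""
    if rect == (0, 0, 128, 128):
        return "QT"
    if rect == (0, 0, 64, 64):
        return "VER"
    return None

print(qtbt_leaves((0, 0, 128, 128), example))
# [(0, 0, 32, 64), (32, 0, 32, 64), (64, 0, 64, 64), (0, 64, 64, 64), (64, 64, 64, 64)]
```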
The QTBT structure can be applied separately to luma and chroma for the I-slice (i.e., Intra-coded slice), and applied simultaneously to both luma and chroma (except when certain minimum sizes are reached for chroma) for P- and B-slices. In other words, in an I-slice, the luma CTB has its own QTBT-structured block partitioning, and the two chroma CTBs have another QTBT-structured block partitioning. Alternatively, each of the two chroma CTBs may have its own QTBT-structured block partitioning.
In HEVC, an integer transform of the same size as the TU is applied to each TU to obtain residual coefficients, which are transmitted to the decoder after quantization on a TU basis. HEVC adopts the Discrete Cosine Transform Type II (DCT-II) as its core transform because of its strong “energy compaction” property: most of the signal information tends to be concentrated in a few low-frequency DCT-II coefficients. The DCT-II approximates the Karhunen-Loève Transform (KLT), which, as is known in the field of data compression, is optimal in the decorrelation sense; for first-order Markov signals, the DCT-II approaches the KLT as the correlation between neighboring samples approaches one. The N-point DCT-II of the signal f[n] is defined in Eq. (1).
$$\hat{f}_{\text{DCT-II}}[k] = \lambda_k \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} f[n] \cos\left[\frac{k\pi}{N}\left(n + \frac{1}{2}\right)\right], \quad k = 0, 1, 2, \ldots, N-1, \qquad \lambda_k = \begin{cases} 2^{-0.5}, & k = 0 \\ 1, & k \neq 0 \end{cases} \tag{1}$$
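Eq. (1) can be checked numerically with a direct O(N²) evaluation. The sketch below applies it to a smooth ramp, used here as a stand-in for a highly correlated residual row, and reports the share of energy in the two lowest-frequency coefficients; because the transform is orthonormal, the total energy equals that of the input. The input values and the function name are arbitrary choices for illustration.

```python
import math
from typing import List, Sequence

def dct2(f: Sequence[float]) -> List[float]:
    """N-point orthonormal DCT-II, evaluated directly from Eq. (1)."""
    n_pts = len(f)
    out = []
    for k in range(n_pts):
        lam = 2 ** -0.5 if k == 0 else 1.0   # lambda_k from Eq. (1)
        s = sum(f[n] * math.cos(k * math.pi / n_pts * (n + 0.5)) for n in range(n_pts))
        out.append(lam * math.sqrt(2 / n_pts) * s)
    return out

ramp = [10.0, 11.0, 12.0, 13.0]
coeffs = dct2(ramp)
energy = [c * c for c in coeffs]
print([round(c, 3) for c in coeffs])    # DC coefficient is 23.0
print(sum(energy[:2]) / sum(energy))    # fraction of energy in the two lowest frequencies
```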
In the Joint Exploration Test Model 5 (JEM 5), large block-size transforms, up to 128×128 in size, are enabled to improve coding efficiency, especially for higher-resolution video (e.g., 1080p and 4K sequences). Because transforms other than DCT-II may be more efficient for Intra-predicted residue, JVET-E1001 (i.e., JEM 5) uses, in addition to the DCT-II and 4×4 DST-VII (Discrete Sine Transform Type VII) already employed in HEVC, an Enhanced Multiple Transform (EMT) scheme for residual coding of both Intra- and Inter-coded blocks. In the literature, EMT is also referred to as Adaptive Multiple Transform (AMT); in this disclosure, the terms AMT and EMT are used interchangeably. EMT uses multiple transforms selected from the DCT/DST families beyond those currently used in HEVC. The newly introduced transform matrices are DST-VII, DCT-VIII, DST-I and DCT-V. Table 1 summarizes the transform basis functions of each transform for an N-point input.
TABLE 1. Transform basis functions for N-point input

| Transform type | Basis function $T_i(j)$, $i, j = 0, 1, \ldots, N-1$ |
|---|---|
| DCT-II | $T_i(j) = \omega_0 \cdot \sqrt{\frac{2}{N}} \cdot \cos\left(\frac{\pi \cdot i \cdot (2j+1)}{2N}\right)$, where $\omega_0 = \sqrt{\tfrac{1}{2}}$ for $i = 0$ and $\omega_0 = 1$ for $i \neq 0$ |
| DCT-V | $T_i(j) = \omega_0 \cdot \omega_1 \cdot \frac{2}{\sqrt{2N-1}} \cdot \cos\left(\frac{2\pi \cdot i \cdot j}{2N-1}\right)$, where $\omega_0 = \sqrt{\tfrac{1}{2}}$ for $i = 0$ and $\omega_0 = 1$ for $i \neq 0$, and $\omega_1 = \sqrt{\tfrac{1}{2}}$ for $j = 0$ and $\omega_1 = 1$ for $j \neq 0$ |
| DCT-VIII | $T_i(j) = \sqrt{\frac{4}{2N+1}} \cdot \cos\left(\frac{\pi \cdot (2i+1) \cdot (2j+1)}{4N+2}\right)$ |
| DST-I | $T_i(j) = \sqrt{\frac{2}{N+1}} \cdot \sin\left(\frac{\pi \cdot (i+1) \cdot (j+1)}{N+1}\right)$ |
| DST-VII | $T_i(j) = \sqrt{\frac{4}{2N+1}} \cdot \sin\left(\frac{\pi \cdot (2i+1) \cdot (j+1)}{2N+1}\right)$ |
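The basis functions in Table 1 can be tabulated and checked for orthonormality with the short script below. It uses the normalizations under which every matrix is orthonormal (a weight of √(1/2) at index 0 for DCT-II and DCT-V); floating-point arithmetic stands in for the integer matrices a codec would actually use, and the dictionary and function names are illustrative.

```python
import math
from typing import Callable, Dict, List

def index_weight(idx: int) -> float:
    """Weight applied at index 0 by DCT-II and DCT-V."""
    return math.sqrt(0.5) if idx == 0 else 1.0

BASIS: Dict[str, Callable[[int, int, int], float]] = {
    "DCT-II": lambda i, j, N: index_weight(i) * math.sqrt(2 / N)
        * math.cos(math.pi * i * (2 * j + 1) / (2 * N)),
    "DCT-V": lambda i, j, N: index_weight(i) * index_weight(j) * (2 / math.sqrt(2 * N - 1))
        * math.cos(2 * math.pi * i * j / (2 * N - 1)),
    "DCT-VIII": lambda i, j, N: math.sqrt(4 / (2 * N + 1))
        * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * N + 2)),
    "DST-I": lambda i, j, N: math.sqrt(2 / (N + 1))
        * math.sin(math.pi * (i + 1) * (j + 1) / (N + 1)),
    "DST-VII": lambda i, j, N: math.sqrt(4 / (2 * N + 1))
        * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * N + 1)),
}

def matrix(name: str, N: int) -> List[List[float]]:
    """Row i holds the basis function T_i(j), j = 0..N-1."""
    return [[BASIS[name](i, j, N) for j in range(N)] for i in range(N)]

def orthonormality_error(m: List[List[float]]) -> float:
    """Largest |(M M^T)[a][b] - delta(a, b)| over all pairs of rows."""
    n = len(m)
    return max(abs(sum(m[a][k] * m[b][k] for k in range(n)) - (1.0 if a == b else 0.0))
               for a in range(n) for b in range(n))

for name in BASIS:
    print(name, max(orthonormality_error(matrix(name, N)) for N in (4, 8, 16, 32)))
```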
With EMT, multiple transforms can be selected for one TU. For example, for Inter CUs, one EMT flag can be coded to indicate whether the HEVC transform is used (i.e., EMT flag equal to zero) or one of the new multiple transforms is used (i.e., EMT flag equal to one). When the EMT flag is equal to one, there are two candidate transforms for each of the horizontal and vertical directions, and an EMT index may be used to indicate the transform selected for each direction. Overall, four different transform combinations are supported for each CU when the EMT flag is one. For Intra CUs, there are also four candidate combinations; however, these candidates vary according to the Intra prediction direction.
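The Inter-CU signaling described above can be sketched as a small decoder-side mapping. It assumes that the Inter candidates are DST-VII and DCT-VIII (the single Inter transform set of this scheme) and, hypothetically, that a 2-bit EMT index carries the horizontal choice in bit 0 and the vertical choice in bit 1; the actual binarization is not specified here.

```python
from typing import Tuple

INTER_CANDIDATES = ("DST-VII", "DCT-VIII")  # assumed Inter candidate pair

def inter_transforms(emt_flag: int, emt_index: int = 0) -> Tuple[str, str]:
    """Return (horizontal, vertical) transforms for an Inter CU.

    emt_flag == 0: the HEVC transform (DCT-II) in both directions.
    emt_flag == 1: emt_index in 0..3, bit 0 -> horizontal, bit 1 -> vertical (hypothetical).
    """
    if emt_flag == 0:
        return ("DCT-II", "DCT-II")
    if not 0 <= emt_index <= 3:
        raise ValueError("EMT index must be in 0..3")
    return (INTER_CANDIDATES[emt_index & 1], INTER_CANDIDATES[(emt_index >> 1) & 1])

for idx in range(4):
    print(idx, inter_transforms(1, idx))   # the four supported combinations
```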
To preserve the orthogonality of the transform matrices, they are quantized with higher precision than the transform matrices in HEVC. To keep the intermediate values of the transformed coefficients within the 16-bit range, all coefficients are right-shifted by 2 more bits after the horizontal transform and after the vertical transform, compared with the right shifts used in the current HEVC transforms.
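One way to see why the extra right shift is 2 bits: if every integer matrix entry carries two more bits of precision (scale 256·√N instead of 64·√N, an illustrative assumption), the worst-case magnitude after each 1-D transform stage grows by a factor of 4, which a 2-bit larger shift removes. The sketch checks this for a 4-point DST-VII with 8-bit video residuals; at the 64·√N scale its first row reproduces HEVC's 4-point DST-VII row (29, 55, 74, 84).

```python
import math
from typing import List

def dst7(N: int) -> List[List[float]]:
    """Orthonormal N-point DST-VII; row i is the basis function T_i(j)."""
    return [[math.sqrt(4 / (2 * N + 1)) * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))
             for j in range(N)] for i in range(N)]

def to_integer(m: List[List[float]], bits: int) -> List[List[int]]:
    """Scale by 2**bits * sqrt(N) and round each entry to the nearest integer."""
    scale = (1 << bits) * math.sqrt(len(m))
    return [[round(v * scale) for v in row] for row in m]

def worst_case_output(q: List[List[int]], max_input: int) -> int:
    """Largest |sum_j q[i][j] * x[j]| over all rows when every |x[j]| <= max_input."""
    return max(sum(abs(c) for c in row) for row in q) * max_input

MAX_RESIDUAL = 255                   # 8-bit video: residuals lie in [-255, 255]
coarse = to_integer(dst7(4), 6)      # scale 64 * sqrt(4) = 128
fine = to_integer(dst7(4), 8)        # two extra bits: scale 512
print(coarse[0], fine[0])            # [29, 55, 74, 84] [117, 219, 296, 336]

g_coarse = worst_case_output(coarse, MAX_RESIDUAL)
g_fine = worst_case_output(fine, MAX_RESIDUAL)
print(g_coarse, g_fine, g_fine >> 2)  # 61710 246840 61710
```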
AMT is enabled for CUs whose width and height are both smaller than or equal to 64. When AMT is available, a CU-level flag controls whether AMT is applied to the CU. When the CU-level AMT flag is equal to 0, DCT-II is applied to the CU to code the residue in both the horizontal and vertical directions. For a luma coding block, when the CU-level AMT flag is equal to 1, indicating that AMT is applied to the CU, two additional flags are signaled to identify the selected horizontal and vertical transforms.
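The CU-level decision above can be sketched as a parsing routine. The `bits` iterator is a hypothetical stand-in for already-decoded flag values (the real bitstream uses context-coded bins), and reading the horizontal flag before the vertical flag is an assumption.

```python
from typing import Iterator, Optional, Tuple

def parse_amt_flags(width: int, height: int,
                    bits: Iterator[int]) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (cu_amt_flag, hor_flag, ver_flag) for a luma coding block.

    When AMT is unavailable, nothing is read and DCT-II is used in both directions.
    The two extra flags are read only when the CU-level AMT flag equals 1.
    """
    if width > 64 or height > 64:
        return 0, None, None                 # AMT not available for this CU size
    amt = next(bits)                         # CU-level AMT flag
    if amt == 0:
        return 0, None, None                 # DCT-II horizontally and vertically
    return 1, next(bits), next(bits)         # horizontal flag, then vertical flag

print(parse_amt_flags(128, 64, iter([1, 1, 1])))  # (0, None, None)
print(parse_amt_flags(32, 16, iter([1, 0, 1])))   # (1, 0, 1)
```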
For Intra residue coding, because different Intra prediction modes produce different residual statistics, a mode-dependent transform candidate selection process is used. Three transform sets are pre-defined, as shown in Table 2; each set consists of two different transforms and is identified by a transform set index from 0 to 2. When the CU-level AMT flag is equal to 1, a transform set is first identified for each of the vertical and horizontal transforms according to the Intra prediction mode of the CU, as specified in Table 3. Within each identified transform set, one of the two transform candidates is then selected based on an explicitly signaled flag. For example, if Intra prediction mode 16 is used for a CU, the vertical transform uses transform set 0 (i.e., DST-VII and DCT-VIII) and the horizontal transform uses transform set 2 (i.e., DST-VII and DCT-V). One flag is then signaled to indicate the candidate selected for the vertical transform, and another flag to indicate the candidate selected for the horizontal transform. If the signaled flags are (1, 0) for the vertical and horizontal transforms, respectively, and the CU is coded using Intra prediction mode 16, the second candidate (i.e., DCT-VIII) from set 0 is used for the vertical transform and the first candidate (i.e., DST-VII) from set 2 is used for the horizontal transform.
TABLE 2. Three pre-defined transform candidate sets

| Transform set | Transform candidates |
|---|---|
| 0 | DST-VII, DCT-VIII |
| 1 | DST-VII, DST-I |
| 2 | DST-VII, DCT-V |
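TABLE 3. Transform set selection based on Intra prediction mode (V and H give the transform set index used for the vertical and horizontal transforms, respectively)

| Intra mode | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| H | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 2 | 2 | 2 |

| Intra mode | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| H | 2 | 2 | 2 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |

| Intra mode | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| H | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| Intra mode | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| V | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| H | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |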
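Tables 2 and 3 can be combined into a small lookup, sketched below under the assumption that a flag value of 0 selects the first candidate of a set and 1 the second, as in the mode-16 example above. Table 3 is stored as one digit per Intra mode 0 to 66; the names are illustrative.

```python
from typing import Tuple

# Table 2: candidates of each transform set (position 0 = first candidate).
TRANSFORM_SETS = {0: ("DST-VII", "DCT-VIII"), 1: ("DST-VII", "DST-I"), 2: ("DST-VII", "DCT-V")}

# Table 3: one digit per Intra mode, modes 0-17, 18-34, 35-52 and 53-66 concatenated.
V_SET = "210101010101010000" "00000101010101010" "101010101012222222" "22101010101010"
H_SET = "210101010101012222" "22222101010101010" "101010101010000000" "00101010101010"
assert len(V_SET) == len(H_SET) == 67

# Table 3 is symmetric about the diagonal mode 34 with V and H swapped.
assert all(V_SET[m] == H_SET[68 - m] for m in range(2, 67))

def intra_transforms(mode: int, ver_flag: int, hor_flag: int) -> Tuple[str, str]:
    """Return (vertical, horizontal) transforms for an Intra CU whose AMT flag is 1."""
    v_set, h_set = int(V_SET[mode]), int(H_SET[mode])
    return TRANSFORM_SETS[v_set][ver_flag], TRANSFORM_SETS[h_set][hor_flag]

print(intra_transforms(16, ver_flag=1, hor_flag=0))  # ('DCT-VIII', 'DST-VII')
```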
For Inter prediction residual, only one transform set, consisting of DST-VII and DCT-VIII, is used for all Inter modes and for both the horizontal and vertical transforms.
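With a single Inter transform set, the horizontal and vertical transforms are still applied separably. The sketch below assumes orthonormal floating-point DST-VII and DCT-VIII matrices built from the Table 1 formulas and an arbitrary 4×4 residual; it computes V·X·Hᵀ, where V transforms the columns and H transforms the rows, and confirms that the residual energy is preserved. The helper names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]

def dst7(N: int) -> Matrix:
    """Orthonormal N-point DST-VII (rows are basis functions)."""
    return [[math.sqrt(4 / (2 * N + 1)) * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))
             for j in range(N)] for i in range(N)]

def dct8(N: int) -> Matrix:
    """Orthonormal N-point DCT-VIII (rows are basis functions)."""
    return [[math.sqrt(4 / (2 * N + 1)) * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * N + 2))
             for j in range(N)] for i in range(N)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def separable_transform(residual: Matrix, ver: Matrix, hor: Matrix) -> Matrix:
    """Coefficients = V * X * H^T: columns use the vertical transform, rows the horizontal one."""
    return matmul(matmul(ver, residual), transpose(hor))

residual = [[4.0, 3.0, 2.0, 1.0],
            [3.0, 2.0, 1.0, 0.0],
            [2.0, 1.0, 0.0, -1.0],
            [1.0, 0.0, -1.0, -2.0]]     # arbitrary 4x4 Inter residual
coeffs = separable_transform(residual, ver=dst7(4), hor=dct8(4))
energy_in = sum(v * v for row in residual for v in row)
energy_out = sum(c * c for row in coeffs for c in row)
print(energy_in, energy_out)            # both about 56 (orthonormal transforms preserve energy)
```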
It is desirable to develop methods that further improve coding performance or reduce complexity for systems incorporating EMT.