ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) published the H.265/HEVC (High Efficiency Video Coding) standard in 2013 (version 1), 2014 (version 2), 2015 (version 3), and 2016 (version 4). Since then, they have studied the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the HEVC standard (including its extensions).
In October 2017, a Joint Call for Proposals on Video Compression with Capability beyond HEVC (CfP) was issued. By Feb. 15, 2018, a total of 22 CfP responses in the standard dynamic range (SDR) category, 12 in the high dynamic range (HDR) category, and 12 in the 360 video category had been submitted. In April 2018, all received CfP responses were evaluated at the 122nd MPEG/10th JVET (Joint Video Exploration Team / Joint Video Expert Team) meeting. Following this evaluation, JVET formally launched the standardization of next-generation video coding beyond HEVC, i.e., the so-called Versatile Video Coding (VVC).
In HEVC, a coding tree unit (CTU) is split into coding units (CUs) by using a quadtree structure, denoted as the coding tree, to adapt to various local characteristics. The decision whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two or four prediction units (PUs) according to the PU splitting type. Inside one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU. One key feature of the HEVC structure is that it has multiple partition concepts, including CU, PU, and TU. In HEVC, a CU or a TU can only be square, while a PU may be square or rectangular for an inter predicted block. In the later stages of HEVC development, some contributions proposed allowing rectangular PUs for intra prediction and transform. These proposals were not adopted into HEVC but were extended for use in the Joint Exploration Model (JEM).
At a picture boundary, HEVC imposes an implicit quadtree split, so that a block keeps being quadtree split until its size fits the picture boundary.
During the development of VVC, 65 angular directions were proposed. To accommodate the increased number of directional intra modes, an intra mode coding method with 6 Most Probable Modes (MPMs) is used. Two major technical aspects are involved: 1) the derivation of the 6 MPMs, and 2) the entropy coding of the 6 MPMs and the non-MPM modes. In the JEM, the modes included in the MPM lists are classified into three groups: neighbor intra modes, derived intra modes, and default intra modes.
Five neighboring intra prediction modes are used to form the MPM list. If the MPM list is not full (i.e., there are fewer than 6 MPM candidates in the list), derived modes are added; these intra modes are obtained by adding −1 or +1 to the angular modes that are already included in the MPM list. Such additional derived modes are not generated from the non-angular modes (DC or planar). Finally, if the MPM list is still not full, the default modes are added in the following order: vertical, horizontal, mode 2, and diagonal mode. As a result of this process, a unique list of 6 MPM modes is generated.
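The three-stage list construction described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the mode numbering (planar = 0, DC = 1, angular modes 2 to 66, horizontal = 18, diagonal = 34, vertical = 50), the insertion of planar and DC after the first two neighbor candidates, and the wrap-around of the ±1 derivation within the angular range are assumptions of this example and are not stated in the text. The function name `build_mpm_list` is hypothetical.

```python
# Assumed mode numbering for a 67-mode scheme (not stated in the text).
PLANAR, DC = 0, 1
MIN_ANG, MAX_ANG = 2, 66
HOR, DIA, VER = 18, 34, 50
NUM_MPM = 6


def build_mpm_list(neighbor_modes):
    """Build a 6-entry MPM list from up to five neighbor intra modes."""
    mpm = []

    def push(mode):
        # Keep the list unique and never longer than NUM_MPM.
        if mode not in mpm and len(mpm) < NUM_MPM:
            mpm.append(mode)

    # Stage 1: neighbor modes. Planar and DC are inserted after the first
    # two neighbors (assumption of this sketch).
    for mode in neighbor_modes[:2]:
        push(mode)
    push(PLANAR)
    push(DC)
    for mode in neighbor_modes[2:]:
        push(mode)

    # Stage 2: derived modes, -1 and +1 around angular modes already listed.
    num_ang = MAX_ANG - MIN_ANG + 1
    for mode in list(mpm):
        if mode >= MIN_ANG:  # non-angular modes produce no derived modes
            push((mode - MIN_ANG - 1) % num_ang + MIN_ANG)
            push((mode - MIN_ANG + 1) % num_ang + MIN_ANG)

    # Stage 3: default modes in the order given in the text.
    for mode in (VER, HOR, MIN_ANG, DIA):
        push(mode)
    return mpm


if __name__ == "__main__":
    print(build_mpm_list([50, 18, 50, 18, 50]))
    print(build_mpm_list([0, 0, 0, 0, 0]))
```

With these assumptions the list always reaches six unique entries, because planar, DC and the four default modes together already provide six distinct candidates.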
The coding for selection of the remaining 61 non-MPMs is done as follows. The 61 non-MPMs are first divided into two sets: a selected mode set (secondary MPM) and a non-selected mode set. The selected mode set contains 16 modes and the rest (45 modes) are assigned to the non-selected mode set. The mode set that the current mode belongs to is indicated in the bitstream with a flag. If the mode to be indicated is within the selected mode set, the mode is signaled with a 4-bit fixed-length code, and if the mode to be indicated is from the non-selected set, the mode is signaled with a truncated binary code.
In addition to DCT-II and 4×4 DST-VII, which are employed in HEVC, an Adaptive Multiple Transform (AMT, also known as Enhanced Multiple Transform (EMT)) scheme is used for residual coding of both inter and intra coded blocks. It uses multiple transforms selected from the DCT/DST families in addition to the transforms currently used in HEVC. The newly introduced transform matrices are DST-VII, DCT-VIII, DST-I and DCT-V. Table 1 shows the basis functions of the selected DST/DCT.
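The two binarizations mentioned above can be illustrated with a short sketch. The truncated binary code shown here is the textbook construction for an alphabet of n symbols; for n = 45 it assigns 5-bit codewords to the first 19 indices and 6-bit codewords to the remaining 26. The function names and the polarity of the set flag ("1" for the selected set) are assumptions made for illustration only.

```python
def truncated_binary(x, n):
    """Truncated binary codeword (as a bit string) for index x in [0, n)."""
    if not 0 <= x < n:
        raise ValueError("index out of range")
    k = n.bit_length() - 1       # floor(log2(n))
    u = (1 << (k + 1)) - n       # number of short (k-bit) codewords
    if x < u:
        return format(x, f"0{k}b")
    return format(x + u, f"0{k + 1}b")


def code_non_mpm(index_in_set, in_selected_set):
    """Bits for a non-MPM mode: set flag followed by the index code.

    The flag polarity is an assumption of this sketch.
    """
    if in_selected_set:
        return "1" + format(index_in_set, "04b")        # 16 modes, 4-bit FLC
    return "0" + truncated_binary(index_in_set, 45)     # 45 modes


if __name__ == "__main__":
    for idx in (0, 18, 19, 44):
        print(idx, truncated_binary(idx, 45))
```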
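TABLE 1
Transform basis functions of DCT-II/V/VIII and DST-I/VII for N-point input

Transform Type: Basis function $T_i(j)$, $i, j = 0, 1, \ldots, N-1$

DCT-II: $T_i(j) = \omega_0 \cdot \sqrt{\frac{2}{N}} \cdot \cos\left(\frac{\pi \cdot i \cdot (2j+1)}{2N}\right)$, where $\omega_0 = \begin{cases} \sqrt{\frac{2}{N}} & i = 0 \\ 1 & i \neq 0 \end{cases}$

DCT-V: $T_i(j) = \omega_0 \cdot \omega_1 \cdot \sqrt{\frac{2}{2N-1}} \cdot \cos\left(\frac{2\pi \cdot i \cdot j}{2N-1}\right)$, where $\omega_0 = \begin{cases} \sqrt{\frac{2}{N}} & i = 0 \\ 1 & i \neq 0 \end{cases}$, $\omega_1 = \begin{cases} \sqrt{\frac{2}{N}} & j = 0 \\ 1 & j \neq 0 \end{cases}$

DCT-VIII: $T_i(j) = \sqrt{\frac{4}{2N+1}} \cdot \cos\left(\frac{\pi \cdot (2i+1) \cdot (2j+1)}{4N+2}\right)$

DST-I: $T_i(j) = \sqrt{\frac{2}{N+1}} \cdot \sin\left(\frac{\pi \cdot (i+1) \cdot (j+1)}{N+1}\right)$

DST-VII: $T_i(j) = \sqrt{\frac{4}{2N+1}} \cdot \sin\left(\frac{\pi \cdot (2i+1) \cdot (j+1)}{2N+1}\right)$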
To preserve the orthogonality of the transform matrices, they are quantized more accurately than the transform matrices in HEVC, using a 10-bit representation instead of the 8-bit representation in HEVC. To keep the intermediate values of the transformed coefficients within the 16-bit range, after the horizontal and after the vertical transform, all coefficients are right-shifted by 2 more bits than the right shift used in the current HEVC transforms.
The AMT applies to CUs with both width and height smaller than or equal to 64, and whether AMT applies is controlled by a CU-level flag. When the CU-level flag is equal to 0, DCT-II is applied in the CU to encode the residue. For a luma coding block within an AMT-enabled CU, two additional flags are signalled to identify the horizontal and vertical transforms to be used. As in HEVC, the residual of a block can be coded with transform skip mode in the JEM. To avoid redundancy in syntax coding, the transform skip flag is not signalled when the CU-level AMT flag is not equal to zero.
For intra residue coding, due to the different residual statistics of different intra prediction modes, a mode-dependent transform candidate selection process is used. Three transform subsets have been defined as shown in Table 2, and the transform subset is selected based on the intra prediction mode, as specified in Table 3.
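As a numerical illustration of the orthogonality property referred to above, the following sketch builds floating-point DST-VII, DCT-VIII and DST-I matrices from the basis functions of Table 1 and measures how far the rows are from being orthonormal. It does not reproduce the integer quantization of the actual transform matrices, whose scaling is not given in the text; the helper names are hypothetical.

```python
import math


def dst7(n):
    """N-point DST-VII matrix, rows indexed by i, columns by j."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8(n):
    """N-point DCT-VIII matrix."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


def dst1(n):
    """N-point DST-I matrix."""
    s = math.sqrt(2.0 / (n + 1))
    return [[s * math.sin(math.pi * (i + 1) * (j + 1) / (n + 1))
             for j in range(n)] for i in range(n)]


def max_orthogonality_error(t):
    """Largest deviation of T * T^T from the identity matrix."""
    n = len(t)
    err = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(t[a][j] * t[b][j] for j in range(n))
            err = max(err, abs(dot - (1.0 if a == b else 0.0)))
    return err


if __name__ == "__main__":
    for name, build in (("DST-VII", dst7), ("DCT-VIII", dct8), ("DST-I", dst1)):
        print(name, max_orthogonality_error(build(8)))
```

In floating point the error is at the level of rounding noise; a coarser integer approximation of the same matrices would increase it, which is the motivation for the higher-precision representation.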
TABLE 2
Three pre-defined transform candidate sets

Transform Set    Transform Candidates
0                DST-VII, DCT-VIII
1                DST-VII, DST-I
2                DST-VII, DCT-VIII
With the subset concept, a transform subset is first identified based on Table 3 using the intra prediction mode of a CU whose CU-level AMT flag is equal to 1. After that, for each of the horizontal and vertical transforms, one of the two transform candidates in the identified transform subset, according to Table 2, is selected based on explicitly signalled flags.
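TABLE 3
Selected (H)orizontal and (V)ertical transform sets for each intra prediction mode

Intra Mode   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
V            2  1  0  1  0  1  0  1  0  1  0  1  0  1  0  0  0  0
H            2  1  0  1  0  1  0  1  0  1  0  1  0  1  2  2  2  2

Intra Mode  18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
V            0  0  0  0  0  1  0  1  0  1  0  1  0  1  0  1  0
H            2  2  2  2  2  1  0  1  0  1  0  1  0  1  0  1  0

Intra Mode  35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
V            1  0  1  0  1  0  1  0  1  0  1  2  2  2  2  2  2  2
H            1  0  1  0  1  0  1  0  1  0  1  0  0  0  0  0  0  0

Intra Mode  53 54 55 56 57 58 59 60 61 62 63 64 65 66
V            2  2  1  0  1  0  1  0  1  0  1  0  1  0
H            0  0  1  0  1  0  1  0  1  0  1  0  1  0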
For inter prediction residual, however, only one transform set, which consists of DST-VII and DCT-VIII, is used for all inter modes and for both horizontal and vertical transforms.
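The selection logic for intra and inter residuals can be summarized with a small lookup sketch. The set contents and the per-mode set indices are copied from Table 2 and Table 3; the function name `select_transforms` and the convention that a flag value of 0 picks the first candidate of a set are assumptions of this example.

```python
# Table 2: transform candidates per set, as listed in the text.
TRANSFORM_SETS = {
    0: ("DST-VII", "DCT-VIII"),
    1: ("DST-VII", "DST-I"),
    2: ("DST-VII", "DCT-VIII"),
}

# Table 3: set index per intra mode 0..66, one character per mode.
V_SET = ("210101010101010000" "00000101010101010"
         "101010101012222222" "22101010101010")
H_SET = ("210101010101012222" "22222101010101010"
         "101010101010000000" "00101010101010")

# Single set used for all inter modes and both directions.
INTER_SET = ("DST-VII", "DCT-VIII")


def select_transforms(amt_flag, is_intra, intra_mode=None, h_flag=0, v_flag=0):
    """Return (horizontal, vertical) transform names for a luma block."""
    if amt_flag == 0:
        return "DCT-II", "DCT-II"
    if is_intra:
        h_cands = TRANSFORM_SETS[int(H_SET[intra_mode])]
        v_cands = TRANSFORM_SETS[int(V_SET[intra_mode])]
    else:
        h_cands = v_cands = INTER_SET
    # Assumption: flag value 0 selects the first candidate, 1 the second.
    return h_cands[h_flag], v_cands[v_flag]


if __name__ == "__main__":
    print(select_transforms(1, True, intra_mode=1, h_flag=1, v_flag=1))
    print(select_transforms(1, False, h_flag=0, v_flag=1))
```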
The complexity of AMT would be relatively high at the encoder side, since a total of five different transform candidates (DCT-II and four multiple transform candidates) need to be evaluated with rate-distortion cost for each residual block when a brute-force search is used. To alleviate this complexity issue at the encoder, several optimization methods are designed for algorithm acceleration in the JEM.
During the development of VVC, a mode-dependent non-separable secondary transform (NSST) was proposed, located between the forward core transform and quantization (at the encoder) and between de-quantization and the inverse core transform (at the decoder). To keep complexity low, NSST is applied only to the low-frequency coefficients after the primary transform. If both the width (W) and height (H) of a transform coefficient block are larger than or equal to 8, an 8×8 non-separable secondary transform is applied to the top-left 8×8 region of the transform coefficient block. Otherwise, if either W or H of a transform coefficient block is equal to 4, a 4×4 non-separable secondary transform is applied, and it is performed on the top-left min(8,W)×min(8,H) region of the transform coefficient block. The above transform selection rule is applied to both luma and chroma components.
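The size rule above can be written as a small helper. This is a sketch only: the function name `nsst_config` and the return convention (kernel size plus region dimensions, or `None` when neither condition of the text is met) are assumptions made for illustration.

```python
def nsst_config(width, height):
    """Return (kernel_size, (region_w, region_h)) for a coefficient block.

    kernel_size is 8 or 4 according to the rule in the text; None is
    returned for block sizes the text does not cover.
    """
    if width >= 8 and height >= 8:
        return 8, (8, 8)
    if width == 4 or height == 4:
        return 4, (min(8, width), min(8, height))
    return None


if __name__ == "__main__":
    for w, h in ((16, 8), (4, 16), (32, 4), (4, 4)):
        print((w, h), nsst_config(w, h))
```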
Matrix multiplication implementation of a non-separable transform is described as follows using a 4×4 input block as an example. To apply the non-separable transform, the 4×4 input block X:
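$$X = \begin{bmatrix} X_{00} & X_{01} & X_{02} & X_{03} \\ X_{10} & X_{11} & X_{12} & X_{13} \\ X_{20} & X_{21} & X_{22} & X_{23} \\ X_{30} & X_{31} & X_{32} & X_{33} \end{bmatrix} \qquad \text{(Equation 1)}$$
is represented as a vector $\vec{X}$:
$$\vec{X} = [X_{00}\ X_{01}\ X_{02}\ X_{03}\ X_{10}\ X_{11}\ X_{12}\ X_{13}\ X_{20}\ X_{21}\ X_{22}\ X_{23}\ X_{30}\ X_{31}\ X_{32}\ X_{33}]^T \qquad \text{(Equation 2)}$$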
The non-separable transform is calculated as $\vec{F} = T \cdot \vec{X}$, where $\vec{F}$ indicates the transform coefficient vector, and T is a 16×16 transform matrix. The 16×1 coefficient vector $\vec{F}$ is subsequently re-organized as a 4×4 block using the scanning order for that block (horizontal, vertical or diagonal). The coefficients with a smaller index are placed at the smaller scanning index in the 4×4 coefficient block. In the JEM, a Hypercube-Givens Transform (HyGT) with a butterfly implementation is used instead of matrix multiplication to reduce the complexity of the non-separable transform.
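The matrix-multiplication form can be sketched in a few lines. The example below vectorizes a 4×4 block in row-major order, multiplies it by a 16×16 matrix, and places the resulting coefficients back into a 4×4 block along a scan. The identity matrix stands in for T purely as a placeholder, since no actual NSST matrix is given in the text; the function names and the two scans shown are illustrative assumptions.

```python
def apply_nonseparable(block, t):
    """Compute F = T * X for a 4x4 block X flattened in row-major order."""
    x = [value for row in block for value in row]
    n = len(x)
    return [sum(t[r][c] * x[c] for c in range(n)) for r in range(n)]


def reorganize(coeffs, scan):
    """Place coefficient k at the k-th (row, col) position of the scan."""
    out = [[0] * 4 for _ in range(4)]
    for idx, (r, c) in enumerate(scan):
        out[r][c] = coeffs[idx]
    return out


# Two simple scans for a 4x4 block (the diagonal scan is omitted here).
HORIZONTAL_SCAN = [(r, c) for r in range(4) for c in range(4)]
VERTICAL_SCAN = [(r, c) for c in range(4) for r in range(4)]

if __name__ == "__main__":
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    # Placeholder transform: 16x16 identity (not a real NSST matrix).
    identity = [[1 if r == c else 0 for c in range(16)] for r in range(16)]
    f = apply_nonseparable(block, identity)
    print(reorganize(f, HORIZONTAL_SCAN))
    print(reorganize(f, VERTICAL_SCAN))
```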
There are a total of 35×3 non-separable secondary transforms for both 4×4 and 8×8 block sizes, where 35 is the number of transform sets specified by the intra prediction mode, denoted as set, and 3 is the number of NSST candidates for each intra prediction mode. The mapping from the intra prediction mode to the transform set is defined in Table 4. The transform set applied to luma/chroma transform coefficients is specified by the corresponding luma/chroma intra prediction modes, according to Table 4. For intra prediction modes larger than 34 (diagonal prediction direction), the transform coefficient block is transposed before/after the secondary transform at the encoder/decoder.
For each transform set, the selected non-separable secondary transform candidate is further specified by the explicitly signalled CU-level NSST index. The index is signalled in the bitstream once per intra CU, after the transform coefficients, and truncated unary binarization is used. The truncated value is 2 in the case of planar or DC mode, and 3 for angular intra prediction modes. This NSST index is signalled only when there is more than one non-zero coefficient in a CU. The default value is zero when it is not signalled. A value of zero for this syntax element indicates that the secondary transform is not applied to the current CU; values 1-3 indicate which secondary transform from the set should be applied.
In the JEM, NSST is not applied to a block coded with transform skip mode. When the NSST index is signalled for a CU and is not equal to zero, NSST is not used for a block of a component that is coded with transform skip mode in the CU. When the blocks of all components of a CU are coded in transform skip mode, or the number of non-zero coefficients of the non-transform-skip mode CBs is less than 2, the NSST index is not signalled for the CU.
For example, it is proposed to forbid mixing NSST and EMT when using QTBT, which effectively restricts NSST to be used only with DCT2 as the primary transform.
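The truncated unary binarization of the NSST index can be sketched as follows. The bin polarity (a run of ones terminated by a zero, with the terminating zero omitted at the maximum value) follows the usual convention for truncated unary codes, and the mode numbering planar = 0, DC = 1 is an assumption of this example; the function names are hypothetical.

```python
PLANAR, DC = 0, 1  # assumed mode numbering


def truncated_unary(value, c_max):
    """Truncated unary bin string for value in [0, c_max]."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = "1" * value
    if value < c_max:
        bins += "0"   # terminating zero is dropped when value == c_max
    return bins


def nsst_index_bins(nsst_idx, intra_mode):
    """Bin string for the NSST index of a CU with the given intra mode."""
    c_max = 2 if intra_mode in (PLANAR, DC) else 3
    return truncated_unary(nsst_idx, c_max)


if __name__ == "__main__":
    for mode in (PLANAR, 18):
        c_max = 2 if mode in (PLANAR, DC) else 3
        print(mode, [nsst_index_bins(i, mode) for i in range(c_max + 1)])
```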
TABLE 4
Mapping from intra prediction mode to transform set index

intra mode   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
set          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

intra mode  18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
set         18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

intra mode  34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
set         34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17

intra mode  52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 (LM)
set         16 15 14 13 12 11 10  9  8  7  6  5  4  3  2 NULL
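The entries of Table 4 follow a simple mirrored pattern around mode 34, which the following sketch expresses in closed form. The function name `nsst_transform_set` and the returned transpose indicator (reflecting the statement that blocks with intra modes larger than 34 are transposed) are conveniences of this example, not part of the described syntax.

```python
LM_MODE = 67  # listed as "67 (LM)" in Table 4, mapped to NULL


def nsst_transform_set(intra_mode):
    """Return (set_index, transpose) for an intra mode per Table 4."""
    if not 0 <= intra_mode <= LM_MODE:
        raise ValueError("intra mode out of range")
    if intra_mode == LM_MODE:
        return None, False
    if intra_mode <= 34:
        return intra_mode, False
    # Modes 35..66 mirror onto sets 33..2 and use a transposed block.
    return 68 - intra_mode, True


if __name__ == "__main__":
    for mode in (0, 34, 35, 51, 52, 66, 67):
        print(mode, nsst_transform_set(mode))
```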
A Hypercube-Givens Transform (HyGT) is used in the computation of the non-separable secondary transform. The basic elements of this orthogonal transform are Givens rotations, defined by orthogonal matrices G(m, n, θ) whose elements are given by
$$G_{i,j}(m, n) = \begin{cases} \cos\theta, & i = j = m \ \text{or} \ i = j = n, \\ \sin\theta, & i = m,\ j = n, \\ -\sin\theta, & i = n,\ j = m, \\ 1, & i = j \ \text{and} \ i \neq m \ \text{and} \ i \neq n, \\ 0, & \text{otherwise.} \end{cases} \qquad \text{(Equation 3)}$$
HyGT is implemented by combining sets of Givens rotations in a hypercube arrangement. For example, assuming that N is a power of two, a HyGT round is defined as a sequence of $\log_2(N)$ passes, where in each pass the indexes in vectors m and n are defined by the edges of a hypercube with dimension $\log_2(N)$, sequentially in each direction.
To obtain good compression, more than one HyGT round is used. For example, a full non-separable secondary transform is composed of R HyGT rounds, and may include an optional permutation pass to sort the transform coefficients according to their variance. In the JEM, a 2-round HyGT is applied for the 4×4 secondary transform and a 4-round HyGT is applied for the 8×8 secondary transform.
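A HyGT round built from the Givens rotations of Equation 3 can be sketched as below. In pass p, each index m whose bit p is clear is paired with n = m + 2^p, which corresponds to the hypercube edges along dimension p. The rotation angles are arbitrary placeholders, the rotation is applied as G·x (a convention assumed here), and the optional permutation pass is omitted; none of the names come from the text.

```python
import math


def hygt_round(x, angles):
    """Apply one HyGT round: log2(N) passes of N/2 Givens rotations each.

    angles[p][k] is the angle of the k-th rotation in pass p.
    """
    n = len(x)
    passes = n.bit_length() - 1          # log2(N), N a power of two
    y = list(x)
    for p in range(passes):
        stride = 1 << p
        k = 0
        for m in range(n):
            if m & stride:
                continue                  # m is the upper end of an edge
            nn = m | stride               # partner index along dimension p
            c, s = math.cos(angles[p][k]), math.sin(angles[p][k])
            a, b = y[m], y[nn]
            y[m] = c * a + s * b          # row m of G(m, n, theta)
            y[nn] = -s * a + c * b        # row n of G(m, n, theta)
            k += 1
    return y


def hygt(x, round_angles):
    """Apply R HyGT rounds in sequence (no permutation pass)."""
    y = list(x)
    for angles in round_angles:
        y = hygt_round(y, angles)
    return y


if __name__ == "__main__":
    x = [float(i) for i in range(16)]
    # Placeholder angles for a 2-round transform on 16 coefficients.
    round_angles = [[[0.1 * (r + p + k + 1) for k in range(8)]
                     for p in range(4)] for r in range(2)]
    y = hygt(x, round_angles)
    print(sum(v * v for v in x), sum(v * v for v in y))
```

Because every Givens rotation is orthogonal, the energy of the input vector is preserved regardless of the angles chosen, which the printed sums of squares illustrate.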