HEVC (High Efficiency Video Coding) is an advanced video coding system being developed under the Joint Collaborative Team on Video Coding (JCT-VC) group of ITU-T Study Group. In HEVC, the core of the coding layer is a coding tree block (CTB) or largest coding unit (LCU). The size of a CTB or LCU can be 64×64, 32×32 or 16×16 for the luma component. Each CTB or LCU can be divided into coding units (CUs) using quad-tree partitioning. Each CU can be further split into one or more prediction units (PUs) for performing prediction. After the prediction process is performed on each CU, the prediction residues are coded using block-based transforms. A transform unit (TU) has its root at the CU level, where the TU size can be 32×32, 16×16, 8×8, or 4×4. A TU larger than 4×4 can be divided into multiple 4×4 sub-blocks. Quantization and entropy coding are applied to the TU to generate compressed data corresponding to the residues.
FIG. 1 illustrates an exemplary block diagram for the decoding process of a CU. The compressed data is decoded by entropy decoder 110, such as a variable length decoder (VLD), to recover the coded transform coefficients. The quantized transform coefficients are stored in a transform coefficient buffer (TC buffer) 120 for performing inverse scan (IS) 130. IS can be implemented by “rearrange 1” 140, “rearrange 2” 150, or both. The inverse scan is required due to the processing order of the transform coefficients at the encoder side. After IS, the transform coefficients are processed by inverse quantization (IQ) 160 and inverse transform (IT) 170 to generate the reconstructed residues. The reconstructed residues are then used by motion compensation (MC) 180 to generate a reconstructed CU. While FIG. 1 illustrates one exemplary configuration of the decoding process, other system configurations may also be used. For example, instead of having IS between entropy decoding and IQ as shown in FIG. 1, IS can be located between IQ and IT.
In HEVC, the transform coefficients are scanned in a two-level fashion. Each TU is divided into sub-blocks. For the first level, the scanning is performed over the sub-blocks of a TU. For convenience, the first level scan is also referred to as level-1 scan or inter sub-block scan. The second scan is applied to transform coefficients within each sub-block. For convenience, the second level scan is also referred to as level-2 scan or intra sub-block scan. The scan orders (also called scanning patterns in this disclosure) in level 1 and level 2 depend on the TU size and the prediction mode.
FIG. 2A and FIG. 2B illustrate exemplary scan orders adopted by HEVC for a 32×32 TU. The 32×32 TU is divided into 4×4 sub-blocks. The level-1 scan order (i.e., inter sub-block scan order) is shown in FIG. 2A and the level-2 scan order (i.e., intra sub-block scan order) is shown in FIG. 2B. As shown in FIG. 2A, the level-1 scan runs through the 64 sub-blocks in the 225-degree diagonal direction starting from the sub-block at the lower-right corner and ending at the sub-block at the upper-left corner (i.e., from rear to front of the TU, or sub-blocks 1→2→3→4→ . . . →64). During the level-1 scan, if a 4×4 sub-block contains at least one nonzero transform coefficient, further information for this 4×4 sub-block will be transmitted to convey the nonzero transform coefficient(s) in the level-2 scan as shown in FIG. 2B. The level-2 scan (i.e., the intra sub-block scan) runs through the 16 transform coefficients of the 4×4 sub-block in the 225-degree diagonal direction starting from the transform coefficient at the lower-right corner and ending at the transform coefficient at the upper-left corner (i.e., from rear to front of the sub-block, or transform coefficients 1→2→3→4→ . . . →16). On the other hand, during the level-1 scan, if a 4×4 sub-block does not contain any nonzero transform coefficient, no further information needs to be transmitted for that 4×4 sub-block. For a 16×16 TU, the level-1 scan order has the same scanning pattern as for the 32×32 TU.
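The two-level scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference decoder: the function names `diagonal_scan` and `two_level_scan` are hypothetical, and the traversal within each diagonal (from upper-right toward lower-left in the rear-to-front direction) is assumed here, since the exact numbering is defined by FIG. 2A and FIG. 2B.

```python
def diagonal_scan(n):
    """Front-to-rear diagonal scan of an n x n grid as (row, col) pairs.

    Assumption: each anti-diagonal is walked from lower-left to upper-right,
    so that the reversed list moves in the 225-degree direction.
    """
    order = []
    for d in range(2 * n - 1):  # d = row + col indexes one anti-diagonal
        for row in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            order.append((row, d - row))
    return order


def two_level_scan(tu_size, sub_size=4):
    """Rear-to-front coefficient positions for a tu_size x tu_size TU."""
    subs_per_side = tu_size // sub_size
    level1 = diagonal_scan(subs_per_side)[::-1]  # inter sub-block scan
    level2 = diagonal_scan(sub_size)[::-1]       # intra sub-block scan
    return [
        (sub_size * sub_row + row, sub_size * sub_col + col)
        for sub_row, sub_col in level1
        for row, col in level2
    ]


if __name__ == "__main__":
    order = two_level_scan(32)
    # Number of positions, first (lower-right) and last (upper-left) position.
    print(len(order), order[0], order[-1])
```

Under these assumptions, a 32×32 TU yields 64 sub-blocks of 16 positions each, beginning at the lower-right coefficient and ending at the upper-left one.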
After two-level scanning is applied to the transform coefficients of a TU, the scanned transform coefficients are coded by entropy coding, such as variable length coding. At the decoder side, entropy decoding such as variable length decoding (VLD) is used to recover the scanned transform coefficients. The scan order of the transform coefficients for the TU is the same as that shown in FIG. 2A and FIG. 2B. While the transform coefficients of a TU are scanned using two-level scanning, the IS output provided to the input of IQ/IT follows a column-by-column order in a reference HEVC decoder. FIG. 3 illustrates the IS output order for a 32×32 TU, where the left-most column (i.e., column 0) is outputted first and the right-most column (i.e., column 31) is outputted last. In other words, the transform coefficients from IS to IQ/IT are in the column scan order from front to rear, i.e., columns 0→1→ . . . →31. Within each column, the transform coefficients may be scanned from top to bottom. However, the top-to-bottom scan order of the 32 transform coefficients within each column is not mandatory.
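The conversion from the two-level scan order to the column-by-column output order can be illustrated with a full-TU buffer. The following Python sketch is a hypothetical model rather than the reference decoder's implementation: the function names are invented for illustration, the within-diagonal traversal is assumed as noted in the comments, and each column is emitted from top to bottom even though that inner order is not mandatory.

```python
def diagonal_scan(n):
    """Front-to-rear diagonal scan of an n x n grid as (row, col) pairs.

    Assumption: each anti-diagonal is walked from lower-left to upper-right.
    """
    order = []
    for d in range(2 * n - 1):
        for row in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            order.append((row, d - row))
    return order


def two_level_scan(tu_size, sub_size=4):
    """Rear-to-front coefficient positions for a tu_size x tu_size TU."""
    level1 = diagonal_scan(tu_size // sub_size)[::-1]
    level2 = diagonal_scan(sub_size)[::-1]
    return [
        (sub_size * sub_row + row, sub_size * sub_col + col)
        for sub_row, sub_col in level1
        for row, col in level2
    ]


def inverse_scan_to_columns(scanned, tu_size, sub_size=4):
    """Write coefficients arriving in scan order into a full-TU buffer,
    then read the buffer out as columns 0..tu_size-1, top to bottom."""
    buf = [[0] * tu_size for _ in range(tu_size)]
    for value, (row, col) in zip(scanned, two_level_scan(tu_size, sub_size)):
        buf[row][col] = value
    return [[buf[row][col] for row in range(tu_size)]
            for col in range(tu_size)]


if __name__ == "__main__":
    # Label each coefficient with its arrival index to trace where it lands.
    columns = inverse_scan_to_columns(list(range(64)), tu_size=8)
    print(columns[0])   # column 0, outputted first
    print(columns[-1])  # last column, outputted last
```

In this sketch, the top entry of column 0 is the last coefficient to arrive in scan order, while the bottom entry of the last column is the first to arrive, so the read-out cannot begin until the whole TU has been written.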
For the reference HEVC video decoder mentioned above, the last column (i.e., column 31) includes data for the first sub-block. Therefore, the processing of the first sub-block cannot start until the last column is received. Consequently, the TC buffer size will be equal to or larger than the biggest TU size for performing IS. For example, the biggest TU size in the HEVC main profile is 32×32. Therefore, the TC buffer will have to be able to hold at least 64 sub-blocks of transform coefficients, i.e., 32×32×transform_coefficient_bitwidth (TC_bitwidth) bits. Furthermore, in order to achieve high system throughput, VLD to IS and IS to IQ/IT may have to be performed in parallel. The system may then have to be configured in a ping-pong design, and the TC buffer size will become twice as large. If the TC buffer is implemented using on-chip storage, such as DRAM or RAM, the TC buffer size will have a direct impact on the chip cost. The cost associated with the TC buffer will become much higher if the largest TU size goes to 64×64 or even 128×128. It is desirable to develop an inverse scan method that can reduce the TC buffer requirement.
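The buffer arithmetic above can be made concrete with a short calculation. In the following Python sketch, the function name `tc_buffer_bits` is hypothetical and the 16-bit coefficient width is an assumed example value for TC_bitwidth, which the text does not specify.

```python
def tc_buffer_bits(max_tu_size, tc_bitwidth, ping_pong=False):
    """Bits needed to hold one full TU of transform coefficients.

    A ping-pong design keeps two such buffers so that writing (VLD to IS)
    and reading (IS to IQ/IT) can proceed in parallel.
    """
    bits = max_tu_size * max_tu_size * tc_bitwidth
    return 2 * bits if ping_pong else bits


if __name__ == "__main__":
    TC_BITWIDTH = 16  # assumed example value, not taken from the text
    for size in (32, 64, 128):
        single = tc_buffer_bits(size, TC_BITWIDTH)
        double = tc_buffer_bits(size, TC_BITWIDTH, ping_pong=True)
        print(f"{size}x{size}: {single} bits single, {double} bits ping-pong")
```

With the assumed 16-bit width, a 32×32 TU needs 16,384 bits for a single buffer and 32,768 bits for a ping-pong pair. Each doubling of the TU dimension quadruples the requirement.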