Video compression systems employ block processing for most of the compression operations. A block is a group of neighboring pixels and may be treated as one coding unit in terms of the compression operations. Theoretically, a larger coding unit is preferred to take advantage of correlation among immediately neighboring pixels. Various video compression standards, e.g., Moving Picture Experts Group (“MPEG”)-1, MPEG-2, and MPEG-4, use block sizes of 4×4, 8×8, and 16×16 (referred to as a macroblock).
High efficiency video coding (“HEVC”) is a block-based hybrid spatial and temporal predictive coding scheme. HEVC partitions an input picture into square blocks referred to as coding tree units (“CTUs”) as shown in FIG. 1. Unlike prior coding standards, the CTU can be as large as 128×128 pixels. Each CTU can be partitioned into smaller square blocks called coding units (“CUs”). FIG. 2 shows an example of a CTU partition of CUs. A CTU 100 is first partitioned into four CUs 102. Each CU 102 may be further split into four smaller CUs 102 that are a quarter of the size of the CU 102. This partitioning process can be repeated based on certain criteria, such as limits to the number of times a CU can be partitioned. As shown, CUs 102-1, 102-3, and 102-4 are a quarter of the size of CTU 100. Further, a CU 102-2 has been split into four CUs 102-5, 102-6, 102-7, and 102-8.
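By way of illustration only, the recursive quadtree partitioning described above may be sketched as follows. The names CU, partition, leaves, and should_split, the 64×64 CTU size, and the depth limit are assumptions chosen for this example and do not come from any standard or reference encoder. The split rule is hard-coded to mirror the layout of FIG. 2, where the CTU is split once and only one of the four resulting CUs is split again.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CU:
    """A square coding unit at (x, y) with the given side length in pixels."""
    x: int
    y: int
    size: int
    children: List["CU"] = field(default_factory=list)


def partition(x: int, y: int, size: int, depth: int, max_depth: int,
              should_split: Callable[[int, int, int, int], bool]) -> CU:
    """Recursively split a square block into four quarter-size blocks.

    max_depth is the assumed limit on how many times a CU may be split.
    should_split is a hypothetical decision callback; a real encoder
    would typically decide based on a cost measure.
    """
    cu = CU(x, y, size)
    if depth < max_depth and should_split(x, y, size, depth):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                cu.children.append(
                    partition(x + dx, y + dy, half, depth + 1,
                              max_depth, should_split))
    return cu


def leaves(cu: CU) -> List[CU]:
    """Return the CUs that were not split further."""
    if not cu.children:
        return [cu]
    result: List[CU] = []
    for child in cu.children:
        result.extend(leaves(child))
    return result


def fig2_rule(x: int, y: int, size: int, depth: int) -> bool:
    # Split the CTU itself, then split only the second quadrant again.
    return depth == 0 or (depth == 1 and (x, y) == (32, 0))


root = partition(0, 0, 64, depth=0, max_depth=2, should_split=fig2_rule)
leaf_cus = leaves(root)
```

In this sketch, leaf_cus holds three CUs of size 32 and four CUs of size 16, which together cover the whole CTU.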
Each CU 102 may include one or more blocks, which may be referred to as prediction units (“PUs”). FIG. 3A shows an example of a CU partition of PUs. The PUs may be used to perform spatial prediction or temporal prediction. A CU can be either spatially or temporally predictively coded. If a CU is coded in intra mode, each PU of the CU can have its own spatial prediction direction. If a CU is coded in inter mode, each PU of the CU can have its own motion vectors and associated reference pictures.
Unlike prior standards where only one transform of 8×8 or 4×4 is applied to a macroblock, a set of block transforms of different sizes may be applied to a CU 102. For example, the CU partition of PUs 202 shown in FIG. 3A may be associated with a set of transform units (“TUs”) 204 shown in FIG. 3B. In FIG. 3B, PU 202-1 is partitioned into four TUs 204-5 through 204-8. Also, TUs 204-2, 204-3, and 204-4 are the same size as corresponding PUs 202-2 through 202-4. In most cases, each TU 204 includes one or more nonzero transform coefficients, but a TU 204 may include none (e.g., all of its coefficients are zero). Transform coefficients of the TU 204 can be quantized into one of a finite number of possible values. After the transform coefficients have been quantized, the quantized transform coefficients can be entropy coded to obtain the final compressed bits that can be sent to a decoder.
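By way of illustration only, the quantization step may be sketched with a uniform scalar quantizer. This is a simplified stand-in and not the quantizer of any particular standard; the function names quantize and dequantize, the step size of 8, and the coefficient values are assumptions made for this example.

```python
from typing import List

Block = List[List[float]]


def quantize(coeffs: Block, step: float) -> List[List[int]]:
    """Map each transform coefficient to an integer level.

    Uses round-half-away-from-zero on coefficient / step, so every
    coefficient lands on one of a finite set of values.
    """
    def level(c: float) -> int:
        magnitude = int(abs(c) / step + 0.5)
        return magnitude if c >= 0 else -magnitude

    return [[level(c) for c in row] for row in coeffs]


def dequantize(levels: List[List[int]], step: float) -> Block:
    """Reconstruct approximate coefficients from the integer levels."""
    return [[lv * step for lv in row] for row in levels]


# A hypothetical 2x2 block of transform coefficients.
tu_levels = quantize([[52.0, -7.0], [3.0, 0.4]], step=8)
tu_recon = dequantize(tu_levels, step=8)

# A block whose coefficients are all small quantizes to all zeros.
zero_levels = quantize([[1.0, -2.0], [0.5, 3.0]], step=8)
```

Here tu_levels is [[7, -1], [0, 0]], and zero_levels contains only zeros, which corresponds to a TU with no nonzero coefficients. The integer levels are what an entropy coder would then compress.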
Using the above block processing, scalable video coding supports decoders with different capabilities. An encoder generates multiple bitstreams for an input video. One of the output bitstreams, referred to as the base layer, can be decoded by itself, and this bitstream provides the lowest scalability level of the video output. To achieve a higher level of video output, the decoder can process the base-layer bitstream together with other output bitstreams, referred to as enhancement layers. One or more enhancement layers may be added to the base layer to generate higher scalability levels. One example is spatial scalability, where the base layer represents the lowest resolution video and the decoder can generate higher resolution video by combining the base-layer bitstream with additional enhancement-layer bitstreams. Thus, using additional enhancement-layer bitstreams produces a better-quality video output.
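By way of illustration only, spatial scalability with one base layer and one enhancement layer may be sketched as follows. The sketch operates on raw pixel arrays rather than compressed bitstreams, and it assumes 2×2 averaging for downsampling, nearest-neighbor upsampling, and an enhancement layer that carries the residual between the original picture and the upsampled base layer. These choices, and the names encode_layers and decode_layers, are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]


def downsample(pic: Picture) -> Picture:
    """Halve the resolution by averaging each 2x2 group (even dimensions assumed)."""
    h, w = len(pic), len(pic[0])
    return [[(pic[y][x] + pic[y][x + 1] + pic[y + 1][x] + pic[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample(pic: Picture) -> Picture:
    """Double the resolution by repeating each pixel horizontally and vertically."""
    out: Picture = []
    for row in pic:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_layers(pic: Picture) -> Tuple[Picture, Picture]:
    """Return (base layer, enhancement layer) for one picture."""
    base = downsample(pic)
    predicted = upsample(base)
    enhancement = [[p - q for p, q in zip(row, pred_row)]
                   for row, pred_row in zip(pic, predicted)]
    return base, enhancement


def decode_layers(base: Picture, enhancement: Optional[Picture] = None) -> Picture:
    """Decode the base layer alone, or combine it with the enhancement layer."""
    if enhancement is None:
        return base  # lowest scalability level
    predicted = upsample(base)
    return [[q + e for q, e in zip(pred_row, enh_row)]
            for pred_row, enh_row in zip(predicted, enhancement)]


original = [[10, 12, 50, 52],
            [14, 16, 54, 56],
            [90, 92, 30, 32],
            [94, 96, 34, 36]]
base_layer, enh_layer = encode_layers(original)
low_res = decode_layers(base_layer)
full_res = decode_layers(base_layer, enh_layer)
```

A decoder that handles only the base layer obtains the 2×2 picture low_res, while a decoder that also processes the enhancement layer recovers the 4×4 picture full_res. In this lossless sketch full_res equals the original; a practical codec would quantize both layers.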