Particular embodiments generally relate to video compression.
Video compression systems employ block processing for most of the compression operations. A block is a group of neighboring pixels and may be treated as one coding unit in terms of the compression operations. Theoretically, a larger coding unit is preferred to take advantage of correlation among immediate neighboring pixels. Various video compression standards, e.g., Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4, use block sizes of 4×4, 8×8, and 16×16 (referred to as a macroblock (MB)). The standards typically use a fixed transform size (e.g., 4×4 or 8×8) in a macroblock. However, if more than one transform size is used, then a macroblock level parameter may be required to indicate which transform size to use. Including this parameter increases the overhead because the macroblock level parameter needs to be encoded.
High efficiency video coding (HEVC) is also a block-based hybrid spatial and temporal predictive coding scheme. HEVC partitions an input picture into square blocks referred to as largest coding units (LCUs). Each LCU can be partitioned into smaller square blocks called coding units (CUs). FIG. 1a shows an example of an LCU partition of CUs. An LCU 100 is first partitioned into four CUs 102. Each CU 102 may also be further split into four smaller CUs 102 that are a quarter of the size of the CU 102. This partitioning process can be repeated based on certain criteria; for example, a limit may be imposed on the number of times a CU can be partitioned. As shown, CUs 102-1, 102-3, and 102-4 are a quarter of the size of LCU 100. Further, a CU 102-2 has been split into four CUs 102-5, 102-6, 102-7, and 102-8.
A quadtree data representation is used to describe how LCU 100 is partitioned into CUs 102. FIG. 1b shows a quadtree 104 of the LCU partition shown in FIG. 1a. Each node of quadtree 104 is assigned a flag of “1” if the node is further split into four sub-nodes and assigned a flag of “0” if the node is not split. The flag is called a split bit (e.g., 1) or stop bit (e.g., 0) and is coded in a compressed bitstream.
A node 106-1 includes a flag “1” at a top CU level because LCU 100 is split into 4 CUs. At an intermediate CU level, the flags indicate whether a CU 102 is further split into four CUs. In this case, a node 106-3 includes a flag of “1” because CU 102-2 has been split into four CUs 102-5-102-8. Nodes 106-2, 106-4, and 106-5 include a flag of “0” because these CUs 102 are not split. Nodes 106-6, 106-7, 106-8, and 106-9 are at a bottom CU level and hence, no flag bit of “0” or “1” is necessary for those nodes because corresponding CUs 102-5-102-8 are not split. The quadtree data representation for quadtree 104 shown in FIG. 1b may be represented by the binary data of “10100”, where each bit represents a node 106 of quadtree 104. The binary data indicates the LCU partitioning to the encoder and decoder, and this binary data needs to be coded and transmitted as overhead.
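The flag assignment described above can be illustrated with a minimal sketch. The function name `encode_quadtree`, the nested-list tree representation, and the depth-first traversal order are assumptions made for illustration; for the partition of FIG. 1a, a depth-first and a level-by-level traversal produce the same bits.

```python
def encode_quadtree(node, depth=0, max_depth=2):
    """Return the split/stop flags of a quadtree as a string of bits.

    node is None for a block that is not split, or a list of four
    child nodes for a block that is split (hypothetical representation).
    """
    if depth == max_depth:
        # Bottom CU level: the block cannot be split, so no flag is coded.
        return ""
    if node is None:
        return "0"  # stop bit
    # Split bit followed by the flags of the four children.
    return "1" + "".join(
        encode_quadtree(child, depth + 1, max_depth) for child in node
    )


# LCU 100 of FIG. 1a: CUs 102-1, 102-3, 102-4 are not split,
# CU 102-2 is split into CUs 102-5 through 102-8.
lcu = [None, [None, None, None, None], None, None]
print(encode_quadtree(lcu))  # 10100
```

Because the bottom level is known to both encoder and decoder, the sketch emits no bits for nodes 106-6 through 106-9, which is why five bits suffice for nine nodes.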
HEVC uses block transforms that are either square or non-square. Each CU 102 may include one or more prediction units (PUs). The PUs may be used to perform spatial prediction or temporal prediction.
FIG. 2A shows an example of a CU partition of PUs. As shown, a CU 102 has been partitioned into four PUs 202-1-202-4. Unlike prior standards where only one transform of 8×8 or 4×4 is applied to a macroblock, a set of block transforms of different sizes may be applied to a CU 102. For example, the CU partition of PUs 202 shown in FIG. 2A may be associated with a set of transform units (TUs) 204 shown in FIG. 2B. In FIG. 2B, PU 202-1 is partitioned into four TUs 204-5-204-8. Also, TUs 204-2, 204-3, and 204-4 are the same size as corresponding PUs 202-2-202-4. Because the size and location of each block transform within a CU may vary, another quadtree data representation, referred to as a residual quadtree (RQT), is needed to describe the TU partitioning. FIG. 2C shows an example of an RQT. The RQT is derived in a similar fashion as described with respect to quadtree 104 for the LCU partitioning. For example, each node of the RQT may include a flag of “1” if CU 102 is split into more than one TU 204. A node 206-1 includes a flag of “1” because CU 102 is split into four TUs 204. Also, node 206-2 has a flag of “1” because TU 204-1 is split into four TUs 204-5-204-8. All other nodes 206 have a flag of “0” because TUs 204-2, 204-3, and 204-4 are not split. For the RQT data representation, binary data of “11000” also has to be encoded and transmitted as overhead. Having to encode and transmit the RQT data representation may be undesirable due to the added overhead and complexity.
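To show what a decoder recovers from the RQT bits, the following sketch parses the binary data “11000” back into TU positions and sizes. The function name `decode_rqt`, the 32×32 CU size, the two-level depth limit, and the quadrant scan order (top-left, top-right, bottom-left, bottom-right) are all assumptions chosen for illustration and are not taken from the figures.

```python
def decode_rqt(bits, x=0, y=0, size=32, depth=0, max_depth=2):
    """Parse RQT flags into a list of (x, y, size) transform units.

    Returns the list of leaf TUs and the bits left unread.
    """
    if depth < max_depth:
        flag, bits = bits[0], bits[1:]
    else:
        flag = "0"  # bottom level: no flag is coded, the TU is a leaf
    if flag == "0":
        return [(x, y, size)], bits
    half = size // 2
    tus = []
    for dy in (0, half):
        for dx in (0, half):
            sub, bits = decode_rqt(bits, x + dx, y + dy, half,
                                   depth + 1, max_depth)
            tus.extend(sub)
    return tus, bits


tus, rest = decode_rqt("11000")
for tu in tus:
    print(tu)
```

Under these assumptions the five bits expand to seven TUs: four 8×8 TUs covering the first quadrant and three 16×16 TUs covering the remaining quadrants, which mirrors TUs 204-5 through 204-8 and TUs 204-2 through 204-4.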
A rate-distortion (RD) based approach is used to determine the coding units (CUs) within the LCU and the transform units within the CUs. The RD based approach may be costly in terms of additional complexity because every level of the quadtree 104 and the RQT is tested to determine if a node should be split. The RQT may be built from bottom to top, where the RD decision process starts from the smallest TU nodes (bottom TUs). A total RD cost of four TUs (children) is compared against the RD cost of their parent and a winner is then determined. If the winner is the parent, the node (parent) has no children. Otherwise, the node has four children and its RD cost is the sum of the costs of the four children. Then, the parent node is combined with its three siblings and compared with their parent node. The process repeats all the way to the CU level to yield the final RQT shape. To keep the overhead and complexity relatively low, constraints may be applied to the RQT structure, such as the maximum size of a TU and the depth of the RQT. For example, the maximum TU size is set equal to the CU size. Also, the depth of the RQT determines the minimum size of a TU relative to the maximum TU size. For example, a tree depth may be set to two or three levels. Limiting the depth limits the number of levels of partitioning that are available and the complexity of the RD decision.
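The bottom-to-top decision can be sketched as a recursion that resolves the four children of a node before comparing their summed cost against the parent. The names `best_partition` and `toy_cost`, the block sizes, and the cost values are hypothetical; a real encoder would measure the RD cost of each candidate TU, and this sketch also ignores the bits spent on the split flags themselves.

```python
def best_partition(x, y, size, min_size, rd_cost):
    """Return (cost, tree) for the block at (x, y).

    tree is None when the block is kept whole, or a list of four
    subtrees when splitting gives a lower total RD cost.
    """
    parent_cost = rd_cost(x, y, size)
    if size <= min_size:
        return parent_cost, None  # smallest TU: nothing to compare against
    half = size // 2
    # Resolve the four children first (bottom-up evaluation).
    children = [
        best_partition(x + dx, y + dy, half, min_size, rd_cost)
        for dy in (0, half)
        for dx in (0, half)
    ]
    split_cost = sum(cost for cost, _ in children)
    if parent_cost <= split_cost:
        return parent_cost, None  # parent wins, so the node has no children
    return split_cost, [tree for _, tree in children]


def toy_cost(x, y, size):
    """Hypothetical cost: large blocks are penalized over a detailed
    top-left region, and cost otherwise grows with block size."""
    penalty = 100 if (x < 16 and y < 16 and size > 8) else 0
    return size + penalty


cost, tree = best_partition(0, 0, 32, 8, toy_cost)
print(cost)  # 80
print(tree)  # [[None, None, None, None], None, None, None]
```

With a 32×32 block and a minimum size of 8×8, the tree has three levels and the cost function is evaluated for 21 candidate blocks (1 + 4 + 16). Each additional level multiplies the number of bottom-level candidates by four, which is the complexity that a depth limit is meant to contain.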
Issues may result with the current RQT restrictions. First, the current RQT uses a short tree depth that implies a relatively balanced tree. In a balanced tree, the nodes at the same level within the RQT have more or less the same split or stop bit. The RQT representation for a balanced tree may not be as efficient as the use of a block-based syntax where one TU size is applied to a CU or a PU, and each possible TU size is assigned a unique code word. For example, it would be more efficient to have a fixed transform size. Second, the maximum size of the TU is set to the CU size and the RD based decision approach is used to determine the TU sizes within a CU. The TU sizes and the positions of the TUs within a CU are then represented by the RQT. The RD decision process may therefore add significant complexity.