Particular embodiments generally relate to video compression.
In video sequences, a great degree of temporal redundancy may exist. That is, within a very short period of time, the shape(s) of foreground object(s) and the background within a picture may not vary very much, and pixels in the foreground objects may move in a similar manner. In object-based video coding, different parts of a picture can be coded and transmitted separately as video objects. In some cases, the motion information of different pixels in the same object is the same. However, many bits still need to be used to describe the arbitrary object shape, which reduces coding efficiency. Thus, the efficient representation of object motion is challenging.
High efficiency video coding (HEVC) is a block-based hybrid spatial and temporal predictive coding scheme. HEVC partitions an input picture into square blocks referred to as largest coding units (LCUs), which may be as large as 64×64 pixels. Theoretically, a larger coding unit is preferred to take advantage of correlation among immediate neighboring pixels. Each LCU can be partitioned into smaller square blocks called coding units (CUs). FIG. 1A shows an example of an LCU partition of CUs. An LCU 100 is first partitioned into four CUs 102. Each CU 102 may also be further split into four smaller CUs 102 that are a quarter of the size of the CU 102. This partitioning process can be repeated subject to certain criteria; for example, a limit on the number of times a CU can be partitioned may be imposed. As shown, CUs 102-1, 102-3, and 102-4 are a quarter of the size of LCU 100. Further, a CU 102-2 has been split into four CUs 102-5, 102-6, 102-7, and 102-8.
To allow for flexible motion representation and higher coding efficiency, a quadtree data representation is used to describe how LCU 100 is partitioned into CUs 102. FIG. 1B shows a quadtree 104 of the LCU partition shown in FIG. 1A. Each node of quadtree 104 is assigned a flag of “1” if the node is further split into four sub-nodes and a flag of “0” if the node is not split. The flag is called a split bit (e.g., 1) or a stop bit (e.g., 0) and is coded in a compressed bitstream.
A node 106-1 includes a flag “1” at a top CU level because LCU 100 is split into 4 CUs. At an intermediate CU level, the flags indicate whether a CU 102 is further split into four CUs. In this case, a node 106-3 includes a flag of “1” because CU 102-2 has been split into four CUs 102-5-102-8. Nodes 106-2, 106-4, and 106-5 include a flag of “0” because these CUs 102 are not split. Nodes 106-6, 106-7, 106-8, and 106-9 are at a bottom CU level and hence, no flag bit of “0” or “1” is necessary for those nodes because corresponding CUs 102-5-102-8 are not split. The partitioning process may continue all the way to 4×4 blocks. The quadtree data representation for quadtree 104 shown in FIG. 1B may be represented by the binary data of “10100”, where each bit represents a node 106 of quadtree 104. The binary data indicates the LCU partitioning to the encoder and decoder, and this binary data needs to be coded and transmitted as overhead.
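The flag assignment described for quadtree 104 can be illustrated with a minimal sketch. The class name `CUNode`, the function `split_flags`, the depth-first traversal order, and the `max_depth` limit of two levels are assumptions made for illustration only; the text above does not specify a traversal order, and for this particular partition a level-by-level order yields the same bits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CUNode:
    """Hypothetical quadtree node: either a leaf CU or a CU split into four."""
    children: List["CUNode"] = field(default_factory=list)

    @property
    def is_split(self) -> bool:
        return len(self.children) == 4


def split_flags(node: CUNode, depth: int = 0, max_depth: int = 2) -> str:
    """Return the split/stop bits for the subtree rooted at `node`.

    Nodes at `max_depth` (the bottom CU level) emit no bit, because they
    cannot be split any further.
    """
    if depth == max_depth:
        return ""
    if not node.is_split:
        return "0"  # stop bit
    # Split bit, followed by the bits of the four sub-nodes in order.
    return "1" + "".join(
        split_flags(child, depth + 1, max_depth) for child in node.children
    )


# Partition of FIG. 1A: the LCU is split into four CUs, and only the
# second CU (102-2) is split again into four smaller CUs.
lcu = CUNode([
    CUNode(),                                          # CU 102-1
    CUNode([CUNode(), CUNode(), CUNode(), CUNode()]),  # CU 102-2
    CUNode(),                                          # CU 102-3
    CUNode(),                                          # CU 102-4
])

flags = split_flags(lcu)
print(flags)
```

Under these assumptions the five emitted bits correspond to node 106-1 followed by nodes 106-2 through 106-5, and nodes 106-6 through 106-9 contribute no bits.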
Each CU 102 may include one or more prediction units (PUs). The PUs may be used to perform spatial prediction or temporal prediction. FIG. 2 shows an example of a CU partition of PUs 202. As shown, a CU 102 has been partitioned into four PUs 202-1-202-4. Spatial or temporal prediction coding may be performed over each PU 202. In inter-mode, motion parameters are coded and transmitted for each PU. The structure may require many bits for motion information, especially for irregularly shaped objects.
A spatial merge mode may be used to improve coding efficiency. The spatial merge mode may merge a current block with its neighboring block(s) to form a “region”. All the pixels within the region share the same motion parameters. Thus, there is no need to code and transmit motion parameters for each individual block of a region. Instead, for a region, only one set of motion parameters is coded and transmitted. The current block is allowed to merge with a spatially-located block that is neighboring the current block to the left or the top. An indicator is used to specify whether the current block is merged with an available neighboring block and, if so, whether the left neighboring block or the top neighboring block should be used in the spatial merge. The spatial merge mode is limited to merging with spatially-located blocks in the same frame.
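The sharing of motion parameters within a merged region can be sketched as follows. This is a simplified illustration, not the coding procedure itself: the block grid, the `merge` and `coded` dictionaries, the `(mv_x, mv_y, ref_idx)` tuple layout, and the function name `resolve_motion` are all hypothetical, and blocks are assumed to be processed in raster order so that the left and top neighbors are already resolved.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]            # (x, y) block coordinates, hypothetical
MotionParams = Tuple[int, int, int]   # (mv_x, mv_y, ref_idx), illustrative layout


def resolve_motion(
    width: int,
    height: int,
    merge: Dict[Position, Optional[str]],
    coded: Dict[Position, MotionParams],
) -> Dict[Position, MotionParams]:
    """Derive motion parameters for every block in raster order.

    merge[pos] is None (parameters are coded explicitly), "left", or "top".
    coded holds parameters only for the blocks that are not merged.
    """
    resolved: Dict[Position, MotionParams] = {}
    for y in range(height):
        for x in range(width):
            choice = merge.get((x, y))
            if choice is None:
                resolved[(x, y)] = coded[(x, y)]
            elif choice == "left" and x > 0:
                resolved[(x, y)] = resolved[(x - 1, y)]
            elif choice == "top" and y > 0:
                resolved[(x, y)] = resolved[(x, y - 1)]
            else:
                raise ValueError(f"neighbor {choice!r} unavailable at {(x, y)}")
    return resolved


# A 2x2 group of blocks forming one region: only block (0, 0) carries
# coded motion parameters, and the other three inherit them by merging.
merge = {(0, 0): None, (1, 0): "left", (0, 1): "top", (1, 1): "left"}
coded = {(0, 0): (3, -1, 0)}
resolved = resolve_motion(2, 2, merge, coded)
print(resolved)
```

In this sketch a single coded parameter set serves four blocks, which is the saving the spatial merge mode is intended to provide; the merge indicator for each block is the only additional information carried.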
A temporal merge mode may also be used to further improve coding efficiency. The temporal merge mode may enable a current block to use the motion parameters of its temporal neighboring block(s). Thus, there is no need to code and transmit motion parameters for each individual block merged by the temporal merge mode. Instead, only one set of motion parameters is coded and transmitted. The current block is allowed to merge with a temporally-located block from a previously encoded/decoded picture. An indicator is used to specify whether the current block is merged with an available temporal neighboring block.
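The role of the temporal merge indicator can be sketched in the same spirit. The sketch assumes, for illustration only, that the temporal neighboring block is the block at the same position in the previously decoded picture; the function name `motion_for_block`, its parameters, and the tuple layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]            # (x, y) block coordinates, hypothetical
MotionParams = Tuple[int, int, int]   # (mv_x, mv_y, ref_idx), illustrative layout


def motion_for_block(
    pos: Position,
    temporal_merge_flag: bool,
    prev_picture_motion: Dict[Position, MotionParams],
    coded_params: Optional[MotionParams] = None,
) -> MotionParams:
    """Pick the motion parameters for the current block.

    If the indicator is set, reuse the parameters stored for the block at
    the same position in the previous picture (an assumed choice of
    temporal neighbor); otherwise use the explicitly coded parameters.
    """
    if temporal_merge_flag:
        if pos not in prev_picture_motion:
            raise ValueError(f"no temporal neighbor available at {pos}")
        return prev_picture_motion[pos]
    if coded_params is None:
        raise ValueError("non-merged block requires coded motion parameters")
    return coded_params


# Motion stored from the previously decoded picture.
prev_motion = {(0, 0): (4, 0, 0), (1, 0): (4, 1, 0)}

merged = motion_for_block((1, 0), True, prev_motion)
explicit = motion_for_block((0, 0), False, prev_motion, coded_params=(-2, 3, 0))
print(merged, explicit)
```

For the merged block only the indicator needs to be signaled, whereas the non-merged block carries its own motion parameters.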