Video compression systems employ block processing for most of the compression operations. A block is a group of neighboring pixels and may be treated as one coding unit in terms of the compression operations. Theoretically, a larger coding unit is preferred to take advantage of correlation among immediate neighboring pixels. Various video compression standards, e.g., Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4, use block sizes of 4×4, 8×8, and 16×16, the last of which is referred to as a macroblock (MB).
High efficiency video coding (HEVC) is also a block-based hybrid spatial and temporal predictive coding scheme. HEVC partitions an input picture into square blocks referred to as coding tree units (CTUs) as shown in FIG. 1. Unlike prior coding standards, the CTU can be as large as 64×64 pixels. Each CTU can be partitioned into smaller square blocks called coding units (CUs). FIG. 2 shows an example of a CTU partitioned into CUs. A CTU 100 is first partitioned into four CUs 102. Each CU 102 may be further split into four smaller CUs 102, each a quarter of the size of the CU 102 being split. This partitioning process can be repeated subject to certain criteria; for example, a limit may be imposed on the number of times a CU can be partitioned. As shown, CUs 102-1, 102-3, and 102-4 are each a quarter of the size of CTU 100. Further, a CU 102-2 has been split into four CUs 102-5, 102-6, 102-7, and 102-8.
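The recursive quadtree split described above can be illustrated with a minimal Python sketch. It is not the HEVC reference implementation: the `Block` type, the `partition` function, the `should_split` callback, the 64×64 CTU size, and the 8×8 minimum CU size are all hypothetical choices made for illustration, and the placement of CU 102-2 in the top-right quadrant is an assumption about FIG. 2.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    """A square block: top-left corner (x, y), side length, quadtree depth."""
    x: int
    y: int
    size: int
    depth: int


def partition(block: Block,
              should_split: Callable[[Block], bool],
              max_depth: int,
              min_size: int = 8) -> List[Block]:
    """Return the leaf CUs obtained by recursively quartering `block`.

    Splitting stops when the depth limit is reached, when the children
    would be smaller than `min_size`, or when `should_split` declines.
    """
    half = block.size // 2
    if block.depth >= max_depth or half < min_size or not should_split(block):
        return [block]
    leaves: List[Block] = []
    for dy in (0, half):          # top row first, then bottom row
        for dx in (0, half):      # left column first, then right column
            child = Block(block.x + dx, block.y + dy, half, block.depth + 1)
            leaves.extend(partition(child, should_split, max_depth, min_size))
    return leaves


def split_like_fig2(block: Block) -> bool:
    """Split the CTU itself, then only its top-right quadrant (assumed 102-2)."""
    return block.depth == 0 or (block.depth == 1 and (block.x, block.y) == (32, 0))


ctu = Block(x=0, y=0, size=64, depth=0)
cus = partition(ctu, split_like_fig2, max_depth=2)
for cu in cus:
    print(cu)
```

With these assumptions the CTU yields seven leaf CUs: three of size 32×32 and four of size 16×16. In a real encoder the split decision would typically come from a cost measure rather than a fixed rule.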
Each CU 102 may include one or more blocks, which may be referred to as prediction units (PUs). FIG. 3 shows an example of a CU partitioned into PUs. The PUs may be used to perform spatial prediction or temporal prediction. A CU can be either spatially or temporally predictively coded. If a CU is coded in intra mode, each PU of the CU can have its own spatial prediction direction. If a CU is coded in inter mode, each PU of the CU can have its own motion vector(s) and associated reference picture(s).
In HEVC, motion vectors (MVs) are predictively coded in a temporal prediction process. For a current PU having one current motion vector and an associated reference index, a motion vector predictor (MVP) is derived from motion vectors of spatially neighboring or temporally collocated PUs of the current PU. The difference between the current motion vector and the MVP is then determined and coded. This reduces overhead because only the difference is sent instead of the full current motion vector. Also, in merge mode, a single motion vector may be applied to a group of spatially neighboring or temporally collocated PUs.
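The differential coding of a motion vector against its predictor can be sketched as follows. This is a minimal illustration that takes the MVP as given and does not model how it is derived from neighboring or collocated PUs. The names `MotionVector`, `encode_mvd`, and `decode_mv` are hypothetical, and the integer vector components are an assumption about units.

```python
from typing import NamedTuple


class MotionVector(NamedTuple):
    """Horizontal and vertical displacement in integer units (assumed)."""
    dx: int
    dy: int


def encode_mvd(mv: MotionVector, mvp: MotionVector) -> MotionVector:
    """Encoder side: the difference between the current MV and its predictor."""
    return MotionVector(mv.dx - mvp.dx, mv.dy - mvp.dy)


def decode_mv(mvd: MotionVector, mvp: MotionVector) -> MotionVector:
    """Decoder side: rebuild the current MV from the difference and the predictor."""
    return MotionVector(mvd.dx + mvp.dx, mvd.dy + mvp.dy)


current_mv = MotionVector(dx=5, dy=-3)
predictor = MotionVector(dx=4, dy=-1)

mvd = encode_mvd(current_mv, predictor)
print(mvd)                        # MotionVector(dx=1, dy=-2)
print(decode_mv(mvd, predictor))  # MotionVector(dx=5, dy=-3)
```

When the predictor is close to the current motion vector, the difference has small components, which is what makes it cheaper to code than the vector itself.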
Given a current PU in a current picture, an associated collocated PU resides in an associated collocated picture. The collocated PU is used as one of the candidates for the MVP or in a merge/skip mode for the current PU. The collocated picture is a reference picture specified in either a list0 or a list1. A flag may be set to indicate the list from which the collocated picture is taken. For example, the flag can be set to 1 to indicate that the reference picture that contains the collocated partition shall be taken from list0; otherwise, the reference picture shall be taken from list1.
Once an encoder or decoder determines the list that contains the collocated picture, the encoder or decoder uses the first reference picture in that list. That is, the reference picture with an index of 0 in list0 or list1 is selected. In some cases, the first reference picture in list0 or list1 may not be the optimal reference picture to use when performing a temporal prediction process for the current PU.
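The selection rule described above, a flag choosing between list0 and list1 followed by a fixed choice of index 0, can be sketched as follows. The function name `select_collocated_picture`, the parameter name `collocated_from_l0`, and the use of integers as stand-ins for reference pictures are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def select_collocated_picture(list0: Sequence[int],
                              list1: Sequence[int],
                              collocated_from_l0: int) -> int:
    """Return the collocated picture under the fixed-index rule.

    A flag value of 1 selects list0; any other value selects list1.
    The picture at index 0 of the selected list is always used.
    """
    ref_list = list0 if collocated_from_l0 == 1 else list1
    if not ref_list:
        raise ValueError("selected reference picture list is empty")
    return ref_list[0]


# Integers stand in for reference pictures (illustrative identifiers only).
list0: List[int] = [8, 4]
list1: List[int] = [16, 20]

print(select_collocated_picture(list0, list1, collocated_from_l0=1))  # 8
print(select_collocated_picture(list0, list1, collocated_from_l0=0))  # 16
```

Because the index is fixed at 0, the pictures at the other positions of either list can never be chosen as the collocated picture, which is the limitation noted above.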