Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. With a large number of video sequences associated with the views, multi-view video represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and the transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such straightforward techniques would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
To reduce the inter-view redundancy, disparity-compensated prediction (DCP) has been used as an alternative to motion-compensated prediction (MCP). MCP refers to inter-picture prediction that uses already coded pictures of the same view in a different access unit, while DCP refers to inter-picture prediction that uses already coded pictures of other views in the same access unit, as illustrated in FIG. 1. The three-dimensional/multi-view data consists of texture pictures (110) and depth maps (120). Motion-compensated prediction is applied to texture pictures or depth maps in the temporal direction (i.e., the horizontal direction in FIG. 1). Disparity-compensated prediction is applied to texture pictures or depth maps in the view direction (i.e., the vertical direction in FIG. 1). The vector used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP.
3D-HEVC is an extension of HEVC (High Efficiency Video Coding) that is being developed for encoding/decoding 3D video. One of the views is referred to as the base view or the independent view. The base view is coded independently of the other views as well as the depth data. Furthermore, the base view is coded using a conventional HEVC video coder.
In 3D-HEVC, a hybrid block-based motion-compensated DCT-like transform coding architecture is still utilized. The basic unit for compression, termed coding unit (CU), is a 2N×2N square block, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs). The PU size can be 2N×2N, 2N×N, N×2N, or N×N. When asymmetric motion partition (AMP) is supported, the PU size can also be 2N×nU, 2N×nD, nL×2N and nR×2N.
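The CU splitting and PU partitioning described above can be illustrated with a minimal Python sketch. The function names (`split_cu`, `pu_sizes`) and the `should_split` callback are hypothetical and introduced only for illustration. The quarter-size split used for the AMP modes follows the usual HEVC convention and is an assumption here, since the text above lists the mode names without giving their dimensions.

```python
def split_cu(x, y, size, min_size, should_split):
    """Recursively split a square CU into four smaller CUs.

    should_split(x, y, size) is a hypothetical encoder decision callback.
    Returns the leaf CUs as (x, y, size) tuples.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_cu(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]


def pu_sizes(cu_size, mode):
    """Return the (width, height) of each PU in a cu_size x cu_size CU (cu_size = 2N)."""
    n = cu_size // 2   # N
    q = cu_size // 4   # assumed AMP split point: one quarter of the CU side
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,
        "Nx2N": [(n, cu_size)] * 2,
        "NxN": [(n, n)] * 4,
        # Asymmetric motion partition (AMP) modes
        "2NxnU": [(cu_size, q), (cu_size, cu_size - q)],
        "2NxnD": [(cu_size, cu_size - q), (cu_size, q)],
        "nLx2N": [(q, cu_size), (cu_size - q, cu_size)],
        "nRx2N": [(cu_size - q, cu_size), (q, cu_size)],
    }
    return table[mode]
```

For example, under these assumptions `pu_sizes(32, "2NxnU")` yields one 32×8 PU above one 32×24 PU, and in every mode the PU areas add up to the CU area.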
In 3D-HEVC, a motion vector competition (MVC) based scheme is also applied to select a motion vector predictor/disparity vector predictor (MVP/DVP) among a given candidate set (or candidate list). There are three inter-prediction modes: Inter, Skip, and Merge. The Inter mode performs motion-compensated prediction/disparity-compensated prediction with transmitted motion vectors/disparity vectors (MVs/DVs), while the Skip and Merge modes utilize inference methods to select an MV or DV from a candidate list to obtain the motion information. The candidate list comprises candidates from spatial neighboring blocks located in the current picture, a temporal neighboring block located in a temporal collocated picture which is signaled in the slice header, or the corresponding block in an inter-view reference picture. These candidates are arranged in the candidate list according to a competition order, and one candidate in the list is selected as the MV/DV or MVP/DVP. When a PU is coded in Skip or Merge mode, no motion information is transmitted except for the index of the selected candidate. In the case of a PU coded in the Skip mode, the residual signal is also omitted.
For the Inter mode in HTM-4.0 (3D-HEVC based Test Model version 4.0), the Advanced Motion Vector Prediction (AMVP) scheme is used to select a motion vector predictor among an AMVP candidate set. As for the Merge and Skip modes, the Merge scheme is used to select a motion vector predictor among a Merge candidate set. Based on the rate-distortion optimization (RDO) decision, the encoder selects one final MVP/DVP within a given candidate set of MVPs/DVPs for Inter, Skip, or Merge modes and transmits the index of the selected MVP/DVP to the decoder. The selected MVP/DVP may be linearly scaled according to temporal distances or view distances.
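The linear scaling mentioned above can be sketched as follows. This is a simplified illustration only: it assumes the scaling factor is the ratio of two signed picture distances (temporal or view distances) and uses plain rounding, whereas an actual codec would use a clipped fixed-point procedure. The function and parameter names are hypothetical.

```python
def scale_mv(mv, cur_dist, cand_dist):
    """Linearly scale a candidate vector (mv_x, mv_y).

    cur_dist:  signed distance between the current picture and its reference
    cand_dist: signed distance between the candidate's picture and its reference
    Both distances are assumed inputs (temporal or view distances).
    """
    if cand_dist == 0 or cur_dist == cand_dist:
        return mv  # nothing to scale (or distance undefined): reuse as-is
    # Simplified rounding; not the fixed-point arithmetic of a real codec.
    return tuple(round(c * cur_dist / cand_dist) for c in mv)
```

Under these assumptions, a candidate vector that spans twice the distance of the current prediction is halved before it is used as a predictor.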
For the Inter mode of depth coding, the reference picture index is explicitly transmitted to the decoder. The MVP/DVP is then selected among the candidate set for a given reference picture index. As shown in FIG. 2, the MVP/DVP candidate set for the Inter mode in HTM-4.0 includes two spatial MVPs/DVPs, an inter-view candidate, and a temporal MVP/DVP. One spatial MVP/DVP candidate is selected from B0, B1 and B2, and the other spatial MVP/DVP candidate is selected from A0 and A1. The temporal MVP/DVP candidate is selected from TBR. If TBR is not available, TCT is used. The temporal blocks TBR and TCT are located in a temporal reference picture. The size of the MVP/DVP candidate set is fixed to 2 or 3 depending on whether the inter-view candidate is included.
In 3D-HEVC, if a particular block is encoded using the Merge mode, a Merge index is signaled to indicate which MVP/DVP candidate among the Merge candidate set is used for the block to be merged. To follow the essence of motion information sharing, each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate. For the temporal merging candidate, the reference picture index is set to zero and the MV is scaled according to the POC (picture order count) distances. As shown in FIG. 2, the Merge candidate set includes five spatial merging candidates, one inter-view candidate, one disparity candidate, one VSP (view synthesis prediction) candidate, and one temporal merging candidate. The size of the Merge candidate set is fixed to 6. The temporal candidate is based on the bottom-right block (TBR) of the temporally collocated block. If the bottom-right block (TBR) is not available, the center block (TCT) of the temporally collocated block is used. The candidate set for texture coding in 3D-HEVC is as follows:
Inter-view candidate, A1, B1, B0, A0, Disparity candidate (DV), B2, VSP candidate, Temporal candidate.
The candidates are inserted into the candidate list one by one according to the competition order shown above (i.e., from Inter-view candidate to temporal candidate). When the total number of candidates in the merging candidate list reaches 6 (with redundant candidates removed), no additional candidate will be inserted.
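The insertion procedure described above can be sketched in Python as follows. This is a minimal illustration, not the HTM-4.0 implementation: the dictionary-based candidate representation and the `(mv_x, mv_y, ref_idx)` tuples are hypothetical, and redundancy is assumed to mean identical motion information compared against every candidate already in the list, whereas a real codec may compare only selected pairs.

```python
MAX_MERGE_CANDS = 6

# Competition order from the text: inter-view candidate first, temporal last.
ORDER = ["inter_view", "A1", "B1", "B0", "A0",
         "disparity", "B2", "vsp", "temporal"]


def build_merge_list(available, t_br=None, t_ct=None, max_cands=MAX_MERGE_CANDS):
    """Build a merging candidate list.

    available: dict mapping a position name to motion information,
               e.g. a hypothetical (mv_x, mv_y, ref_idx) tuple, or None
               (or absent) when that candidate is unavailable.
    t_br, t_ct: motion information of the bottom-right and center temporal
                blocks; the center block is used only if t_br is unavailable.
    """
    cands = dict(available)
    cands["temporal"] = t_br if t_br is not None else t_ct

    merge_list = []
    for name in ORDER:
        info = cands.get(name)
        if info is None:
            continue  # candidate not available
        if any(info == existing for _, existing in merge_list):
            continue  # redundant candidate (assumed full comparison)
        merge_list.append((name, info))
        if len(merge_list) == max_cands:
            break  # list is full: no additional candidate is inserted
    return merge_list
```

In this sketch the Merge index signaled to the decoder would simply be the position of the chosen entry in the returned list, so candidates late in the competition order, such as the temporal candidate, are included only when earlier candidates are unavailable or redundant.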
For the Merge mode and the Skip mode, if the candidate list can result in a better MV/DV or MVP/DVP, more blocks may be coded in the Merge mode or the Skip mode. Accordingly, the coding efficiency may be improved. Therefore, it is desirable to develop a merging candidate list that can improve the coding efficiency.