Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are properly located so that each camera captures the scene from one viewpoint. Multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a straightforward approach would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
For 3D video, in addition to the conventional texture data associated with multiple views, depth data is often captured or derived as well. The depth data may be captured for video associated with one view or multiple views. The depth information may also be derived from images of different views. The depth data may be represented in lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction.
To share the previously coded texture information of adjacent views, a technique known as disparity-compensated prediction (DCP) has been included in the HTM (High Efficiency Video Coding (HEVC)-based Test Model) software test platform as an alternative to motion-compensated prediction (MCP). MCP refers to Inter-picture prediction that uses previously coded pictures of the same view, while DCP refers to Inter-picture prediction that uses previously coded pictures of other views in the same access unit. FIG. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP. The vector (110) used for DCP is termed the disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. FIG. 1 illustrates three MVs (120, 130 and 140) associated with MCP. Moreover, the DV of a DCP block can also be predicted by the disparity vector predictor (DVP) candidate derived from neighboring blocks or temporal collocated blocks that also use inter-view reference pictures.
The derivation of inter-view motion prediction is illustrated in FIG. 2. An estimated disparity vector (DV) (210) is derived for the current block (222) in the current picture (220). The corresponding block (232) in the base-view picture (230) is located by combining the position of the current block (222) with the estimated DV (210). An existence condition is then checked: whether the corresponding block (232) is Inter-coded and the Picture Order Count (POC) of the reference picture (240) is in the reference lists of the current block (222). If the existence condition is true, the MV (260) of the corresponding block (232) will be provided as the inter-view motion prediction for the current block (222), where the MV (260) of the corresponding block (232) is used by the current block (222) to point to a reference picture (250) in the same view as the current picture (220). Otherwise, the estimated DV itself (with the vertical component set to zero) can be regarded as a ‘Motion Vector Prediction (MVP)’, which is actually DV Prediction (DVP).
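The decision described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the HTM implementation: the `Block` class, the `inter_view_candidate` function, the `base_view_lookup` callable and the representation of the reference lists as a set of POC values are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple

Vector = Tuple[int, int]  # (horizontal, vertical)


@dataclass
class Block:
    """Hypothetical container for the motion information of a coded block."""
    is_inter: bool
    mv: Optional[Vector] = None
    ref_poc: Optional[int] = None  # POC of the picture the MV points to


def inter_view_candidate(
    cur_pos: Vector,
    estimated_dv: Vector,
    base_view_lookup: Callable[[Vector], Optional[Block]],
    cur_ref_pocs: Set[int],
) -> Tuple[str, Vector]:
    """Return (kind, vector) for the inter-view candidate of the current block.

    base_view_lookup is an assumed helper that maps a position in the
    base-view picture to the block found there (or None).
    """
    # Locate the corresponding block: current position plus estimated DV.
    corr_pos = (cur_pos[0] + estimated_dv[0], cur_pos[1] + estimated_dv[1])
    corr = base_view_lookup(corr_pos)

    # Existence condition: the corresponding block is Inter-coded and its
    # reference POC is in the reference lists of the current block.
    if corr is not None and corr.is_inter and corr.ref_poc in cur_ref_pocs:
        return ("inter_view_motion_prediction", corr.mv)

    # Otherwise fall back to the estimated DV with vertical component zeroed.
    return ("dvp", (estimated_dv[0], 0))
```

For example, with a base-view block stored at position (12, 8), a current block at (8, 8) and an estimated DV of (4, 0), the sketch returns the MV of the corresponding block when its reference POC is in the current reference lists, and the DVP (4, 0) otherwise.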
The estimated DV plays a critical role in the process of inter-view motion prediction. In the conventional HTM, the estimated DV is derived by checking whether spatial or temporal neighboring blocks have any available DV. If so, an available DV will be used as the estimated DV for the current block. If none of the neighboring blocks has an available DV, the conventional HTM adopts a technique named DV-MCP (Disparity Vector-Motion Compensated Prediction) to provide an estimated DV. The DV-MCP technique determines the estimated DV based on the depth map of the current block. If the DV-MCP method also fails to find an estimated DV, a zero DV is used as the default DV.
In HTM, Merge mode is provided for an Inter-coded block to allow the block to be “merged” with a neighboring block. For a selected block coded in Merge mode, the motion information can be determined from the coded neighboring blocks. A set of possible candidates in Merge mode comprises spatial neighbor candidates and a temporal candidate. Index information is transmitted to select one out of several available candidates. Therefore, only the residual information for the selected block needs to be sent. Skip mode is similar to Merge mode in that no motion information needs to be explicitly transmitted. For a block coded in Skip mode, there is also no need to explicitly transmit the residual information. The residual information can be inferred as default values, such as zero. In general, there are two types of Inter-coded blocks: Merge and non-Merge. When an Inter-coded block is not coded in Merge/Skip mode, the Inter-coded block is coded according to Advanced Motion Vector Prediction (AMVP). The MV candidate lists for a Merge coded block and an AMVP coded block are constructed differently.
In Three-Dimensional Video Coding (3DVC), an inter-view candidate is introduced into the MV candidate list. The inter-view candidate can be inter-view motion prediction or DV prediction depending on the existence condition for Merge coded blocks and depending on the target reference picture for AMVP coded blocks as mentioned before. The inter-view candidate is placed in the first candidate position (i.e., position 0) for Merge coded blocks and the third candidate position (i.e., position 2) for AMVP coded blocks. For AMVP coded blocks, the MV candidate list is constructed in the same way regardless of whether the target reference picture of the current block corresponds to an inter-view reference picture or a temporal reference picture. Similarly, for Merge coded blocks, the MV candidate list is constructed in the same way regardless of whether the inter-view candidate of the current block refers to an inter-view reference picture or a temporal reference picture.
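The fixed placement of the inter-view candidate can be illustrated with a minimal Python sketch. The function name `insert_inter_view_candidate`, the mode strings and the use of plain labels for candidates are assumptions made for illustration only; they do not come from the HTM software.

```python
from typing import List


def insert_inter_view_candidate(candidates: List[str], inter_view: str, mode: str) -> List[str]:
    """Place the inter-view candidate at the fixed position used for each mode.

    mode is "merge" (position 0) or "amvp" (position 2); both strings are
    hypothetical labels chosen for this sketch.
    """
    if mode == "merge":
        position = 0
    elif mode == "amvp":
        position = 2
    else:
        raise ValueError(f"unknown mode: {mode}")

    result = list(candidates)  # leave the caller's list untouched
    # list.insert appends when the list is shorter than the position.
    result.insert(position, inter_view)
    return result
```

With the spatial and temporal candidates labeled `["S0", "S1", "T"]`, the sketch yields `["IV", "S0", "S1", "T"]` for Merge and `["S0", "S1", "IV", "T"]` for AMVP. The position does not depend on which kind of reference picture is involved.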
For AMVP, the target reference picture is specified explicitly. For the MV candidate list constructed for AMVP coded blocks, the DV estimation process is invoked first to find an estimated DV. The AMVP derivation process will fill up the candidate list, where the candidate list includes spatial candidates, a temporal candidate and an inter-view candidate. The term candidate in this disclosure may refer to a DV candidate, an MV candidate or an MVP candidate. The spatial candidates are derived based on neighboring blocks as shown in FIG. 3A, where the neighboring blocks include Above_Left block (B2), Above block (B1), Above_Right block (B0), Left block (A1) and Below_Left block (A0). One spatial candidate is selected from B0-B2 and another spatial candidate is selected from A0 and A1. After the spatial MV candidates are derived, the inter-view candidate is checked to determine if it refers to the target reference picture. The temporal candidate is then derived based on temporal neighboring blocks as shown in FIG. 3B, where the temporal neighboring blocks include a collocated center block (BCTR) and Right_Bottom block (RB). In HTM, Right_Bottom block (RB) is checked first for the temporal candidate and, if no MV is found, the collocated center block (BCTR) is checked.
FIG. 3C shows a simplified flowchart of the AMVP candidate list derivation process. An estimated DV is received as a possible inter-view candidate as shown in step 310. The DVs from neighboring blocks are checked in step 320 through step 360 to derive spatial candidates. As shown in step 320, Below_Left block is checked and, if an MV is available, the first spatial MV candidate is derived. In this case, the process continues to derive the second spatial MV candidate. Otherwise, Left block is checked as shown in step 330. For the second spatial MV candidate, Above block is checked first. If an MV is available, the second spatial MV candidate is derived. Otherwise, the process further checks Above_Right block as shown in step 350. If an MV is available, the second spatial MV candidate is derived. Otherwise, the process further checks Above_Left block as shown in step 360. The inter-view candidate is checked to determine whether it refers to the target reference picture as shown in step 370. A temporal candidate is checked in step 380. If the temporal candidate exists, it is added to the MV candidate list for AMVP. The POC scaling checking is omitted in the following discussion. The spatial candidates, temporal candidate and inter-view candidate are all referred to as candidate members of the candidate list.
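The checking order of FIG. 3C can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the HTM code: the names `first_available` and `amvp_candidate_list` are hypothetical, each candidate is modeled as a (vector, reference picture id) pair, the inter-view candidate is assumed to be added only when it refers to the target reference picture, the temporal order (RB before BCTR) follows the HTM behavior described for FIG. 3B, and POC scaling, pruning and list padding are omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A candidate is modeled as (vector, reference picture id); an assumption
# made for this sketch only.
Candidate = Tuple[Tuple[int, int], int]


def first_available(
    neighbors: Dict[str, Optional[Candidate]], order: Sequence[str]
) -> Optional[Candidate]:
    """Return the candidate of the first neighbor in `order` that has one."""
    for name in order:
        candidate = neighbors.get(name)
        if candidate is not None:
            return candidate
    return None


def amvp_candidate_list(
    neighbors: Dict[str, Optional[Candidate]],
    inter_view: Optional[Candidate],
    target_ref: int,
) -> List[Candidate]:
    """Build a simplified AMVP candidate list in the order of FIG. 3C."""
    candidates: List[Candidate] = []

    # First spatial candidate: Below_Left (A0), then Left (A1).
    first = first_available(neighbors, ("A0", "A1"))
    # Second spatial candidate: Above (B1), Above_Right (B0), Above_Left (B2).
    second = first_available(neighbors, ("B1", "B0", "B2"))
    candidates.extend(c for c in (first, second) if c is not None)

    # Inter-view candidate: assumed kept only if it refers to the target picture.
    if inter_view is not None and inter_view[1] == target_ref:
        candidates.append(inter_view)

    # Temporal candidate: Right_Bottom (RB) first, then collocated center (BCTR).
    temporal = first_available(neighbors, ("RB", "BCTR"))
    if temporal is not None:
        candidates.append(temporal)

    return candidates
```

When both spatial candidates are found, the inter-view candidate lands at position 2 of the list, matching the placement described for AMVP coded blocks.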
An exemplary DV estimation process (400) is shown in FIG. 4. Neighboring blocks are checked one by one as shown in steps 410-450 of FIG. 4 to determine whether a DV is available in the neighboring block. Whenever an available DV is found, there is no need to further check the remaining neighboring blocks or to use DV-MCP. The estimated DV is considered as an ‘MV’ referring to an inter-view reference picture. After the spatial neighboring blocks are checked, if no available DV is found, the DV of the temporal neighboring block is checked in step 460 to determine whether a DV is available. The DV-MCP method is used to derive an estimated DV if none of the spatial and temporal neighboring blocks has an available DV. In this case, the depth map of the current block is used to derive the estimated DV as shown in step 470.
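The fallback chain of the DV estimation process can be sketched in Python as shown below. The function `estimate_dv` and its parameters are hypothetical; in particular, the order of the spatial neighbors is left to the caller, and the DV-MCP step is represented by an assumed callable that returns a depth-derived DV or `None`. The zero-DV default reflects the default used by the conventional HTM when DV-MCP also fails.

```python
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, int]


def estimate_dv(
    spatial_dvs: Sequence[Optional[Vector]],
    temporal_dv: Optional[Vector],
    dv_mcp: Optional[Callable[[], Optional[Vector]]] = None,
) -> Vector:
    """Return the estimated DV using the spatial, temporal, DV-MCP, zero chain.

    spatial_dvs lists the DVs of the spatial neighbors in checking order
    (None where a neighbor has no DV). dv_mcp is an assumed stand-in for the
    depth-map-based derivation and is only called when needed.
    """
    # Steps 410-450: stop at the first spatial neighbor with an available DV.
    for dv in spatial_dvs:
        if dv is not None:
            return dv

    # Step 460: temporal neighboring block.
    if temporal_dv is not None:
        return temporal_dv

    # Step 470: DV-MCP, invoked only if no neighbor provided a DV.
    if dv_mcp is not None:
        dv = dv_mcp()
        if dv is not None:
            return dv

    # Default when every source fails.
    return (0, 0)
```

Passing DV-MCP as a callable keeps the early-termination property of the process: the depth-based derivation is never evaluated once a neighboring DV has been found.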
If the target reference picture determined for an AMVP coded block is an inter-view reference picture, there might be redundancy between the neighboring block checking in the DV estimation process and that in the AMVP candidate list derivation process. Both processes check the availability of the motion information among the spatial and temporal neighboring blocks to determine if there is an ‘MV’ referring to the inter-view reference picture, where both DV and MV are considered part of the motion information associated with a block. In the worst case, all the neighboring blocks will be checked a second time in a different order, as shown in FIG. 5. The estimated DV derivation process (400) will determine the estimated DV. The estimated DV is used during the AMVP candidate list derivation process to fill up the needed candidates in the list. Moreover, the inter-view candidate is used as DVP instead of a candidate for the inter-view motion prediction when the target reference picture is an inter-view reference picture. The inter-view candidate in this case is based on DVs of spatial and temporal neighboring blocks. On the other hand, since the target reference picture is an inter-view reference picture, the spatial and temporal candidates of the MV candidate list for AMVP correspond to DVs of the spatial and temporal neighboring blocks pointing to the inter-view reference picture. Therefore, the inter-view candidate and the spatial/temporal candidates are derived based on the same motion information. Consequently, the inter-view candidate in this case may not be efficient.
Another issue with the conventional 3D video coding as described in the conventional HTM is related to the candidate list derivation for Merge mode. In Merge mode, the inter-view candidate is placed in the first candidate position in the candidate list. As mentioned before, the inter-view candidate can be used in the inter-view motion prediction or used for DVP, depending on the existence condition. In inter-view motion prediction, the inter-view candidate refers to a temporal reference picture. In the case of DVP, the DVP refers to an inter-view reference picture. It may not be efficient to place the inter-view candidate at the first candidate position when the inter-view candidate is used as DVP.