Three-dimensional (3D) television has been a technology trend in recent years that intends to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers with a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism. 3D video formats may also include depth maps associated with corresponding texture pictures. The depth maps also have to be coded in order to render three-dimensional or multi-view video.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, a working group, ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization), is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). To reduce the inter-view redundancy, a technique called disparity-compensated prediction (DCP) has been added as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to an Inter picture prediction that uses previously coded pictures of the same view in a different access unit (AU), while DCP refers to an Inter picture prediction that uses already coded pictures of other views in the same access unit, as shown in FIG. 1.
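The distinction between MCP and DCP can be illustrated with a minimal sketch. The `Picture` type, its field names and the `prediction_type` helper are hypothetical names introduced only for this illustration; they are not part of 3D-HEVC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    view_id: int  # view (camera position) the picture belongs to
    au: int       # index of the access unit containing the picture


def prediction_type(current: Picture, reference: Picture) -> str:
    """Classify an Inter picture prediction by where its reference picture lies."""
    if reference.view_id == current.view_id and reference.au != current.au:
        return "MCP"  # same view, different access unit
    if reference.view_id != current.view_id and reference.au == current.au:
        return "DCP"  # other view, same access unit
    raise ValueError("reference is neither a temporal nor an inter-view reference")
```

For example, a picture of view 1 in access unit 5 that refers to view 1 in access unit 4 is classified as MCP, while a reference to view 0 in access unit 5 is classified as DCP.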
The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (i.e., V0, V1 and V2 in FIG. 1). All video pictures and depth maps that belong to the same camera position are associated with the same viewID. The view identifiers are used for specifying the coding order inside the access units and for detecting missing views in error-prone environments. Within an access unit (e.g., access unit 110), the video picture (112) and the associated depth map, if present, with viewID equal to 0 are coded first. They are followed by the video picture (114) and depth map with viewID equal to 1, the video picture (116) and depth map with viewID equal to 2, and so on. The view with viewID equal to 0 (i.e., V0 in FIG. 1) is also referred to as the base view or the independent view. The base view is independently coded using a conventional HEVC video coder without the need of any depth map and without the need of video pictures from any other view.
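The coding order inside an access unit can be sketched as a sort on the view identifier. The tuple representation and the `coding_order` name are assumptions made for this illustration, as is placing the video picture before the depth map of the same view.

```python
def coding_order(components):
    """Order the (view_id, kind) components of one access unit.

    kind is "texture" or "depth". Components are ordered by ascending viewID;
    within a view the texture picture is placed before its depth map
    (an assumption of this sketch).
    """
    kind_rank = {"texture": 0, "depth": 1}
    return sorted(components, key=lambda c: (c[0], kind_rank[c[1]]))
```

A depth map that is not present is simply absent from the input list, so the same function covers texture-only views.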
As shown in FIG. 1, a motion vector predictor (MVP)/disparity vector predictor (DVP) can be derived for the current block from the inter-view blocks in the inter-view pictures. In the following, "inter-view blocks in inter-view picture" may be abbreviated as "inter-view blocks" and the derived candidates are termed inter-view candidates (i.e., inter-view MVPs/DVPs). Moreover, a corresponding block in a neighboring view, also termed an inter-view collocated block, is determined by using the disparity vector derived from the depth information of the current block in the current picture. For example, current block 126 in current picture 116 in view V2 is being processed. Block 122 and block 124 are located in the inter-view collocated pictures 0 and 1 (i.e., 112 and 114) respectively at the corresponding location of current block 126. Corresponding blocks 132 and 134 (i.e., inter-view collocated blocks) in the inter-view collocated pictures 0 and 1 (i.e., 112 and 114) can be determined by the disparity vectors 142 and 144 respectively.
In 3D-HEVC, inter-view residual prediction (IVRP) has been developed as a new coding tool in order to share the previously encoded residual information of reference views. The basic principle of the inter-view residual prediction is illustrated in FIG. 2. The inter-view residual prediction is based on a disparity vector (DV) derived for the current block (i.e., a prediction unit, PU). The DV can be derived from the spatial or temporal neighboring blocks of the current block according to 3D-HEVC. Alternatively, a disparity derivation technique based on motion-compensated prediction (MCP), named DV-MCP, can also be used to derive an estimated DV. In this case, blocks coded by MCP are also used for the disparity derivation process. When a neighboring block is an MCP coded block and its motion is predicted by inter-view motion prediction, the disparity vector used for the inter-view motion prediction represents a motion correspondence between the current picture and the inter-view reference picture. Such blocks are referred to as DV-MCP blocks.
FIG. 2 illustrates the process of inter-view residual prediction. As shown in FIG. 2, current PU 222 of current picture 220 in a dependent view (e.g., view 1, V1) is a block to be coded. The mapping from current block 222 to corresponding block 212 based on the derived disparity vector is indicated by the dashed arrow (240). In other words, the derived disparity vector is used to locate corresponding block 212 in reference-view picture 210 (i.e., view 0, V0). The reference-view picture (210) has already been coded when the current block (222) is encoded. Therefore, the residual picture (230) for the reference-view picture (210) is available. A residual block (232) at the corresponding location of block 212 is then used as the inter-view residual predictor for the current block (222).
The derivation of the location of the reference residual block for IVRP is shown in FIG. 3. Pixel location 310 is located in block 222 in the current view, and block 222 (i.e., a prediction unit, PU) is identified by its upper left corner pixel (320). The corresponding block (212) in the reference view (V0) is identified by first locating respective location 330 in the reference view (V0) corresponding to upper left corner location 320 of block 222 in the current view (V1). The DV (350) is then used to locate the corresponding upper left corner location 340 of the reference block (212) in the reference view. Accordingly, the reference block (212) in the reference view is identified. The corresponding location in the residual picture identifies the corresponding residual block. This residual block of the reference picture is used as the predictor for the residual block of the current block in the current view. If the disparity vector points to a sub-sample location, the residual prediction signal is obtained by interpolating the residual samples of the reference view using a bi-linear filter.
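The location derivation and the bi-linear interpolation described above can be sketched as follows. This is an illustration under stated assumptions rather than the normative 3D-HEVC process: the function names are hypothetical, the DV is assumed to be stored in quarter-sample units (the `precision` parameter), and positions outside the residual picture are clamped to its border.

```python
def bilinear_sample(residual, x, y):
    """Bi-linearly interpolate a residual picture (list of rows) at fractional (x, y)."""
    h, w = len(residual), len(residual[0])
    # Clamp to the picture so that the four neighbours always exist (sketch assumption).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * residual[y0][x0] + fx * residual[y0][x1]
    bottom = (1 - fx) * residual[y1][x0] + fx * residual[y1][x1]
    return (1 - fy) * top + fy * bottom


def residual_predictor(ref_residual, pu_x, pu_y, pu_w, pu_h, dv, precision=4):
    """Fetch the inter-view residual predictor for a PU.

    (pu_x, pu_y) is the upper left corner of the PU in the current view and
    dv = (dvx, dvy) is given in 1/precision sample units.
    """
    # Shift the collocated upper left corner by the DV to reach the reference block.
    ref_x = pu_x + dv[0] / precision
    ref_y = pu_y + dv[1] / precision
    return [[bilinear_sample(ref_residual, ref_x + i, ref_y + j)
             for i in range(pu_w)]
            for j in range(pu_h)]
```

With an integer-sample DV the interpolation weights reduce to 0 and 1, so the predictor is a plain copy of the residual block in the reference view.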
The usage of the inter-view residual prediction can be adaptively controlled at the prediction unit (PU) level. An IVRP On/Off control flag is signaled as part of the coding unit (CU) syntax when all the following conditions are true:
1. The current CU is a texture CU in a dependent view and the current slice is not an I-slice.
2. The current CU has at least one PU using motion-compensated prediction.
3. One or more transform units (TUs) covered or partially covered by the reference block in the reference view are non-Intra coded and contain at least one nonzero coded block flag (CBF).
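The three signaling conditions can be expressed as a single predicate. The data classes and field names below are hypothetical stand-ins for codec state, and condition 3 is read here as "at least one covered TU is both non-Intra coded and has a nonzero CBF".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransformUnit:
    is_intra: bool         # TU belongs to an Intra coded block
    has_nonzero_cbf: bool  # at least one coded block flag of the TU is nonzero


@dataclass
class CodingUnit:
    is_texture: bool
    in_dependent_view: bool
    slice_type: str          # "I", "P" or "B"
    pu_uses_mcp: List[bool]  # one entry per PU in the CU


def ivrp_flag_is_signaled(cu: CodingUnit, ref_tus: List[TransformUnit]) -> bool:
    """Return True when the IVRP On/Off control flag is present in the CU syntax.

    ref_tus holds the TUs covered or partially covered by the reference block
    in the reference view.
    """
    cond1 = cu.is_texture and cu.in_dependent_view and cu.slice_type != "I"
    cond2 = any(cu.pu_uses_mcp)
    cond3 = any((not tu.is_intra) and tu.has_nonzero_cbf for tu in ref_tus)
    return cond1 and cond2 and cond3
```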
If the IVRP On/Off control flag is signaled as 0 or not signaled for a CU, the IVRP is Off for the CU. In other words, the residual of the CU is conventionally coded using the HEVC transform coding. Otherwise, if the IVRP On/Off flag is signaled as 1 for a CU, then each PU in the CU will determine whether to use the IVRP or not according to the reference picture type as follows:
1. If a PU only uses motion-compensated prediction, the IVRP is enabled for the PU.
2. If a PU only uses disparity-compensated prediction, the IVRP is disabled for the PU.
3. If the current PU is bi-prediction and one direction is motion-compensated prediction and the other direction is disparity-compensated prediction, the IVRP is enabled. However, the reference residual used in IVRP is multiplied by ½.
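The per-PU decision can be summarized as a weight applied to the reference residual, where a weight of 0 means that IVRP is off. The function name and the string labels "MCP" and "DCP" are assumptions of this sketch; a flag that is not signaled is represented by `None`.

```python
def ivrp_weight(cu_flag, pu_ref_types):
    """Return the weight applied to the reference residual for one PU.

    cu_flag is the IVRP On/Off control flag of the CU (0, 1, or None when the
    flag is not signaled). pu_ref_types lists the prediction type of each
    prediction direction used by the PU, "MCP" or "DCP".
    """
    if not cu_flag:
        return 0.0  # flag is 0 or not signaled: conventional residual coding
    kinds = set(pu_ref_types)
    if kinds == {"MCP"}:
        return 1.0  # only motion-compensated prediction: IVRP enabled
    if kinds == {"DCP"}:
        return 0.0  # only disparity-compensated prediction: IVRP disabled
    if kinds == {"MCP", "DCP"}:
        return 0.5  # mixed bi-prediction: reference residual is halved
    raise ValueError("unexpected prediction types: %r" % (pu_ref_types,))
```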
For 3D-HEVC, an advanced residual prediction (ARP) method is proposed to improve the efficiency of IVRP, where the motion of a current view is applied to the corresponding block in a reference view. Furthermore, an additional weighting factor is introduced to compensate for the quality difference between different views. FIG. 4 illustrates the prediction structure of ARP in multi-view video coding. As shown in FIG. 4, block 410 represents the current block in the current view (view 1), while block 420 and block 430 correspond to the representation of the current block (410) in the reference view (view 0) at time Tj and the temporal prediction of the current block (410) from the same view (view 1) at time Ti respectively. Motion vector 450 corresponds to the motion from block 410 to block 430 of view 1. Since block 410 and block 420 are actually projections of the same object in two different views, these two blocks should share the same motion information. Therefore, the reference block (440) corresponding to the temporal prediction in view 0 at time Ti for block 420 can be located from block 420 by applying motion information (460), which is the same as motion vector 450. The residues of block 420 using motion information 460 are then multiplied by a weighting factor and used as the residual predictor for the current residues of block 410.
The main procedures of the ARP at the decoder side are described as follows:
1. Obtain an estimated disparity vector according to 3D-HEVC, where the estimated disparity vector points to a target reference view. The corresponding block in the reference picture of the reference view within the same access unit is located by the estimated disparity vector.
2. Re-use the motion information of the current block to derive the motion information for the reference block in the reference view. Apply motion compensation to the corresponding block (i.e., the reference block in the reference view) based on the same motion vector of the current block and the derived reference picture in the reference view for the reference block to derive a residue block for the corresponding block. The relationship among the current block, the corresponding block and the motion compensated block is shown in FIG. 5. The reference picture in the reference view (V0) which has the same POC (Picture Order Count) value as the reference picture of the current view (Vm) is selected as the reference picture of the corresponding block. Disparity vector (516) is the estimated DV for the current block (520) in view Vm to locate the corresponding block (510) in the reference view (V0). The current block 520 has motion vector 522 pointing to list0 refidx0 in Vm, which can be reused by the corresponding block (510). Motion compensation can be applied to the corresponding block (510) based on the same motion vector (i.e., 522) of current block 520 to derive residue block 512.
3. Apply a weighting factor to the residue block to get a weighted residue block and add the values of the weighted residue block to the predicted samples.
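The three decoder-side steps can be sketched on small sample arrays. This is a simplified illustration, not the normative process: the function and parameter names are hypothetical, the DV and MV are restricted to integer sample offsets, pictures are plain lists of rows, and positions outside a picture are clamped to its border.

```python
def fetch_block(picture, x, y, w, h):
    """Copy a w-by-h block with upper left corner (x, y), clamping to the picture."""
    rows, cols = len(picture), len(picture[0])
    return [[picture[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
             for i in range(w)]
            for j in range(h)]


def arp_prediction(cur_ref_pic, ref_view_pic, ref_view_ref_pic,
                   x, y, w, h, mv, dv, weight):
    """Predict a block with ARP using integer-sample vectors.

    cur_ref_pic:      temporal reference picture of the current view
    ref_view_pic:     reference-view picture in the same access unit
    ref_view_ref_pic: reference-view picture with the same POC as cur_ref_pic
    """
    # Step 1: the estimated DV locates the corresponding block in the reference view.
    cx, cy = x + dv[0], y + dv[1]
    corresponding = fetch_block(ref_view_pic, cx, cy, w, h)
    # Step 2: reuse the MV of the current block to motion-compensate the
    # corresponding block; the difference is the residue block.
    compensated = fetch_block(ref_view_ref_pic, cx + mv[0], cy + mv[1], w, h)
    # Ordinary motion-compensated prediction of the current block.
    predicted = fetch_block(cur_ref_pic, x + mv[0], y + mv[1], w, h)
    # Step 3: add the weighted residue block to the predicted samples.
    return [[predicted[j][i] + weight * (corresponding[j][i] - compensated[j][i])
             for i in range(w)]
            for j in range(h)]
```

Passing a weight of 0 leaves the motion-compensated prediction unchanged, which makes it easy to compare the prediction with and without the residual predictor.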
Three weighting factors are used in ARP, i.e., 0, 0.5 and 1. The one leading to minimal rate-distortion cost for the current CU is selected as the final weighting factor and the corresponding weighting index (0, 1 and 2 which correspond to weighting factor 0, 1, and 0.5, respectively) is transmitted in the bitstream at the CU level. All PU predictions in one CU share the same weighting factor. When the weighting factor is equal to 0, ARP is not used for the current CU.

SUMMARY
A method for three-dimensional or multi-view video coding is disclosed. The method first receives input data associated with a current block of a current picture in a current dependent view, wherein the current block is inter-time coded based on an inter-time reference block located by a motion vector (MV). The method determines estimated disparity vector (DV) candidates from neighboring DVs associated with neighboring blocks of the current block, applies an evaluation function to the estimated DV candidates to obtain evaluation results for the estimated DV candidates, and selects a final estimated DV from the estimated DV candidates based on the evaluation results. The method then determines an inter-view reference region in an inter-view reference picture based on the final estimated DV, and determines first pseudo residues, wherein the first pseudo residues correspond to first differences between the inter-view reference region and a pseudo reference region in a pseudo reference picture located by the MV, and wherein the inter-view reference picture and the pseudo reference picture are in a same reference view. Finally, the method applies encoding or decoding to the input data associated with residues of the current block utilizing the first pseudo residues.
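The selection of the final estimated DV and the formation of the first pseudo residues can be sketched as follows. Several points are assumptions of this illustration and are not stated in the text above: the evaluation function is passed in by the caller, a lower evaluation result is treated as better, vectors are integer sample offsets, the pseudo reference region is taken at the inter-view reference region displaced by the MV, and all function names are hypothetical.

```python
def select_final_dv(candidates, evaluate):
    """Pick the estimated DV candidate with the smallest evaluation result.

    evaluate is a caller-supplied evaluation function; treating lower results
    as better is an assumption of this sketch.
    """
    return min(candidates, key=evaluate)


def fetch_region(picture, x, y, w, h):
    """Copy a w-by-h region with upper left corner (x, y), clamping to the picture."""
    rows, cols = len(picture), len(picture[0])
    return [[picture[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
             for i in range(w)]
            for j in range(h)]


def pseudo_residues(inter_view_pic, pseudo_ref_pic, x, y, w, h, dv, mv):
    """Differences between the inter-view reference region and the pseudo reference region.

    Both pictures belong to the same reference view. The final estimated DV
    locates the inter-view reference region; the MV of the current block then
    locates the pseudo reference region.
    """
    ivx, ivy = x + dv[0], y + dv[1]
    inter_view_region = fetch_region(inter_view_pic, ivx, ivy, w, h)
    pseudo_region = fetch_region(pseudo_ref_pic, ivx + mv[0], ivy + mv[1], w, h)
    return [[inter_view_region[j][i] - pseudo_region[j][i] for i in range(w)]
            for j in range(h)]
```

Keeping the evaluation function as a parameter leaves the sketch independent of any particular evaluation criterion.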