Three-dimensional (3D) television has been a technology trend in recent years, aiming to bring viewers a more immersive viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, by contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with a sensation of realism. 3D video formats may also include depth maps associated with the corresponding texture pictures. The depth maps also have to be coded in order to render three-dimensional views or multi-views.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, the working group ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization) is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). In 3D coding, since all cameras capture the same scene from different viewpoints, a multi-view video contains a large amount of inter-view redundancy. To exploit the previously encoded residual information of adjacent views, the residual signal of the current block (prediction unit, PU) can be predicted from the residual signal of the corresponding blocks in the inter-view pictures, which are located by a disparity vector (DV).
FIG. 1 illustrates an example of advanced residual prediction (ARP) according to the current design of 3D-HEVC (HTM-9.0), where the residual signal in a current view using temporal prediction is predicted by a residual prediction signal in a reference view. The main procedures of ARP can be described as follows for the case that the current prediction unit (PU) uses temporal prediction (i.e., the reference picture is a temporal reference picture):
1. The temporal reference block (CurrRef 142) in a reference picture (140) of the current view (Vc) is located from the location of the current block (Curr 112) using the motion vector (denoted as mvLX, X=0 or 1) and the reference index of the current block in the current picture (110) of the current view (Vc).
2. The corresponding block (Base 122) in the reference picture (120) of the reference view, corresponding to the current block (Curr 112), is located from the location of the current block (Curr 112) using a derived disparity vector (DV) of the current block (Curr 112).
3. The temporal reference block (BaseRef 152) of the corresponding block (Base 122) in the reference view (Vr) is located by re-using the temporal motion information (i.e., mvLX and the reference index) of the current block (Curr 112).
4. To reduce the memory access bandwidth, according to the current 3D-HEVC (HTM-9.0) standard, the motion vector mvLX of the current block is scaled towards a fixed reference picture before motion compensation is performed when the weighting factor is not 0. Specifically, the fixed reference picture is defined as the first temporal reference picture of each reference picture list.
5. The residual predictor of the temporal residual signal of the current PU/block can be calculated as the difference between these two blocks in the reference view (i.e., Base−BaseRef). In other words, the current residual (Curr−CurrRef) is predicted by the residual predictor (Base−BaseRef).
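The residual predictor of step 5 can be sketched as a weighted block difference. The following is illustrative Python, not the HTM reference software; the block arrays and the `weight` parameter are assumptions for the sketch:

```python
import numpy as np

def temporal_arp_predictor(base, base_ref, weight):
    """Residual predictor for the temporal ARP case (illustrative sketch).

    base     -- corresponding block in the reference view (Base)
    base_ref -- its temporal reference block (BaseRef), located by re-using
                the current block's (scaled) motion vector mvLX
    weight   -- ARP weighting factor (0, 0.5, or 1 in 3D-HEVC)
    """
    base = np.asarray(base, dtype=np.int32)
    base_ref = np.asarray(base_ref, dtype=np.int32)
    # The current residual (Curr - CurrRef) is predicted by Base - BaseRef.
    return weight * (base - base_ref)
```

With a weighting factor of 0 the predictor vanishes, which corresponds to ARP being effectively disabled for the block.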
FIG. 2 illustrates an example of the ARP derivation for a PU/block coded using temporal prediction.
1. The current temporal prediction residual (Curr residual 210) is formed between the current block (Curr 112) and the corresponding temporal reference block (CurrRef 142).
2. The residual prediction signal (Residual pred 220) is formed between the corresponding block in the reference view (Base 122) and its temporal reference block (BaseRef 152).
3. The final residual is derived from the difference between the current residual and the residual prediction signal.
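The three steps above can be sketched end-to-end as follows (illustrative Python under the same block-array assumptions, not the HTM reference code):

```python
import numpy as np

def temporal_arp_final_residual(curr, curr_ref, base, base_ref, weight):
    """Final residual for the temporal ARP case (illustrative sketch):

        final = (Curr - CurrRef) - weight * (Base - BaseRef)
    """
    curr, curr_ref, base, base_ref = (
        np.asarray(b, dtype=np.int32) for b in (curr, curr_ref, base, base_ref))
    curr_residual = curr - curr_ref              # step 1: Curr residual
    residual_pred = weight * (base - base_ref)   # step 2: Residual pred
    return curr_residual - residual_pred         # step 3: final residual
```

Only the final residual is transform-coded, so a good residual predictor reduces the energy of the signal that must be transmitted.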
When the current PU uses inter-view prediction (i.e., the reference picture is an inter-view reference picture) instead of temporal prediction, the main procedures of ARP can be described as shown in FIG. 3.
1. The inter-view reference block (Base 322) of the reference view picture (320) in the reference view (Vr) is located by the disparity motion vector (330) of the current block (Curr 312) of the current picture (310) in the current view (Vc).
2. The temporal reference block (BaseRef 352) of the inter-view reference block (Base 322) in the reference view is located using the temporal motion vector (mvLX) and reference index, where the L0 motion information is used first; if the L0 motion information is not available, the L1 motion information is used.
3. A corresponding reference block (CurrRef 342) in the current view is located from the location of the temporal reference block (BaseRef 352) of the inter-view reference block (Base 322) in the reference view by re-using the disparity motion vector (330) of the current block (312).
4. To reduce the memory access bandwidth, in the current 3D-HEVC (HTM-9.0), the motion vector mvL0 (or mvL1) from the inter-view reference block (Base 322) is scaled towards a fixed reference picture before motion compensation is performed when the weighting factor is not 0. The fixed picture is defined as the first temporal reference picture of each reference picture list. However, when mvL0 from Base is invalid, the motion vector mvL1 from Base is used. If both mvL0 and mvL1 from Base are invalid, a zero vector is used and the reference picture is set to the first temporal reference picture of the prediction direction of the current block (list 0 or list 1). A motion vector from the inter-view reference block (Base 322) may be invalid if the inter-view reference block (Base 322) has no L0 MV, or if the list 0 prediction for the inter-view reference block (Base 322) is inter-view disparity compensated prediction.
5. The residual predictor of the inter-view residual signal of the current PU/block can be calculated as the difference between these two blocks in a reference time, i.e., another access unit (CurrRef−BaseRef).
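The motion-information fallback described in step 4 above can be sketched as a small selection routine (illustrative Python; representing an invalid motion vector as `None` is an assumption of this sketch):

```python
def select_base_motion(mv_l0, ref_l0, mv_l1, ref_l1, first_temporal_ref):
    """Choose the motion information of Base used for ARP motion
    compensation (illustrative sketch).

    Fallback order per the text: mvL0 first; if invalid, mvL1; if both are
    invalid, a zero vector pointing to the first temporal reference picture
    of the current block's prediction direction.
    """
    if mv_l0 is not None:
        return mv_l0, ref_l0            # mvL0 is valid: use it
    if mv_l1 is not None:
        return mv_l1, ref_l1            # fall back to mvL1
    return (0, 0), first_temporal_ref   # zero-vector fallback
```

Note that an MV can be "invalid" here even when present, e.g. when the list 0 prediction of Base is inter-view disparity compensated prediction; the caller would map such cases to `None` before invoking the sketch.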
FIG. 4 illustrates an example of the ARP derivation for a PU/block using inter-view prediction.
1. The current inter-view prediction residual (Curr residual 410) is formed between the current block (Curr 312) and the inter-view reference block (Base 322).
2. The residual prediction signal (Residual pred 420) is formed between the corresponding reference block (CurrRef 342) in the current view and the temporal reference block (BaseRef 352) of the inter-view reference block (Base 322) in the reference view.
3. The final residual is derived from the difference between the current residual and the residual prediction signal.
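Mirroring the temporal case, the inter-view derivation above can be sketched as follows (illustrative Python; block arrays and `weight` are assumptions of the sketch):

```python
import numpy as np

def interview_arp_final_residual(curr, base, curr_ref, base_ref, weight):
    """Final residual for the inter-view ARP case (illustrative sketch):

        final = (Curr - Base) - weight * (CurrRef - BaseRef)
    """
    curr, base, curr_ref, base_ref = (
        np.asarray(b, dtype=np.int32) for b in (curr, base, curr_ref, base_ref))
    curr_residual = curr - base                      # step 1: Curr residual
    residual_pred = weight * (curr_ref - base_ref)   # step 2: Residual pred
    return curr_residual - residual_pred             # step 3: final residual
```

The only structural difference from the temporal case is which pair of blocks forms the current residual and which pair forms its predictor.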
FIG. 5 illustrates a pictorial example of the ARP derivation for a PU/block using temporal prediction. Block 510 represents the current block in the current view (i.e., view 1); block 520 and block 530 denote the representation of current block 510 in the reference view (view 0) at time Tj and the temporal prediction of current block 510 from the same view (view 1) at time Ti, respectively. Motion vector 550 denotes the motion from current block 510 to block 530 at time Ti in the same view. Since current block 510 in view 1 and corresponding block 520 in view 0 represent projections of the same object in two different views, these two blocks should share the same motion information. Therefore, temporal prediction block 540 in view 0 at time Ti of corresponding block 520 in view 0 at time Tj can be located from corresponding block 520 in view 0 by applying the motion information of motion vector 550 (i.e., MV 560=MV 550). The residual of corresponding block 520 (i.e., the difference between blocks 520 and 540) is then multiplied by a weighting factor and is used along with the temporal prediction (i.e., block 530) to form the predictor for the current block (i.e., 510).
FIG. 6 illustrates an example of the ARP bi-prediction mode having two different motion vectors (L0 MV 640 and L1 MV 650) for the current block (612) of the current picture (610) in a current view (V1). The motion information of the current PU (612) in the current picture of the current view is applied to the corresponding block (662) in the reference picture (660) of the reference view (V0). The predicting residual signal is then generated by performing motion compensation based on the L0 and L1 motion information of the current PU. The first residual is generated between the corresponding block (662) and the L0 reference block (672) of the reference picture (670) in the reference view (V0) using the L0 MV. The second residual is generated between the corresponding block (662) and the L1 reference block (682) of the reference picture (680) in the reference view (V0) using the L1 MV. Only one clipping operation is employed during the generation of the L0 and L1 predicting signals: the L0 and L1 motion compensated prediction signals are each generated by an interpolation process without any clipping operation, the interpolated signal is then added to the predicting residual signal generated by ARP, and in the final stage the L0 and L1 prediction signals are added up, followed by a clipping operation, and output as the final result. In contrast, in the ARP scheme according to the current 3D-HEVC (i.e., HTM-9.0), two clipping operations are applied during the generation of the L0 or L1 predicting signal when the current PU is a uni-prediction PU. The L0 or L1 motion compensated prediction signal is first generated by an interpolation process followed by a clipping operation (clipping to the valid range of the input bit depth). The clipped signal is then added to the predicting residual signal generated by ARP, followed by a second clipping operation, and output as the final result.
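The difference in clipping order can be sketched per sample as follows (illustrative Python; the rounded average in the bi-prediction combine is an assumption of the sketch, standing in for the standard's weighted sample prediction):

```python
def clip(x, bit_depth=8):
    """Clip a sample to the valid range of the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, x))

def uni_pred_with_arp(interp, arp_resid, bit_depth=8):
    # Uni-prediction in HTM-9.0: clip after interpolation, add the ARP
    # predicting residual, then clip again -- two clipping operations.
    return clip(clip(interp, bit_depth) + arp_resid, bit_depth)

def bi_pred_with_arp(interp0, resid0, interp1, resid1, bit_depth=8):
    # Bi-prediction: interpolation without intermediate clipping, add the
    # ARP residual per list, combine L0/L1 (rounded average assumed here),
    # and clip only once at the end.
    p0 = interp0 + resid0
    p1 = interp1 + resid1
    return clip((p0 + p1 + 1) >> 1, bit_depth)
```

Because the intermediate clip in the uni-prediction path can discard signal range before the residual is added, the two paths are not interchangeable even when both lists carry identical motion information.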
Moreover, in ARP, the first temporal reference picture of each reference list in the reference view (i.e., V0) is selected as the reference picture of the corresponding block, as shown in FIG. 6. The motion vectors of the current PU are then scaled towards the selected reference picture of the corresponding block in the reference view to generate the predicting residual signal by performing motion compensation. FIG. 7 illustrates the case where the current PU is bi-predicted with an identical motion vector (740) for L0 and L1. Different predicting residual signals may nevertheless be generated for L0 and L1 because the L0 and L1 reference pictures differ. In this example, the L1 MV is scaled to point to the reference block (782) in the reference picture (780) in the opposite direction. Scaling a MV towards a reference picture in the opposite direction when a reference picture in the same direction is available may degrade prediction performance.
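The scaling towards the fixed reference picture follows the usual POC-distance proportionality. A simplified sketch (illustrative Python using floating-point rounding; the standard itself specifies clipped fixed-point arithmetic):

```python
def scale_mv_to_fixed_ref(mv, poc_curr, poc_ref, poc_fixed_ref):
    """Scale a motion vector towards the fixed (first temporal) reference
    picture by POC distance (simplified sketch).

    mv            -- (x, y) motion vector of the current PU
    poc_curr      -- POC of the current picture
    poc_ref       -- POC of the picture mv originally points to
    poc_fixed_ref -- POC of the fixed target reference picture
    """
    td = poc_curr - poc_ref        # original temporal distance
    tb = poc_curr - poc_fixed_ref  # distance to the fixed target
    if td == 0 or td == tb:
        return mv                  # nothing to scale
    s = tb / td
    return (round(mv[0] * s), round(mv[1] * s))
```

When the fixed reference picture lies on the opposite temporal side of the current picture, `tb` and `td` have opposite signs and the scaled MV flips direction, which is exactly the situation the FIG. 7 example warns about.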
According to the current 3D-HEVC (i.e., HTM-9.0), if the two motion vectors for a bi-prediction mode are the same, the bi-prediction mode cannot be simplified as a uni-prediction mode since the uni-prediction mode would not generate the same result as the bi-prediction mode. It is desirable to develop a new ARP procedure that can take advantage of uni-prediction as a simplified bi-prediction if the two motion vectors are the same.