Three-dimensional (3D) television has been a technology trend in recent years that intends to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that provides viewers with only a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism. 3D video formats may also include depth maps associated with corresponding texture pictures. The depth maps also have to be coded in order to render a three-dimensional view or multi-view.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, a working group, ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization) is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). In 3D and multi-view coding, since all cameras capture the same scene from different viewpoints, a multi-view video contains a large amount of inter-view redundancy. To share the previously encoded residual information of adjacent views, the residual signal for a current block (PU) can be predicted by the residual signal of one or more corresponding blocks, which are located by a disparity vector (DV), in the inter-view pictures.
FIG. 1 illustrates an example of advanced residual prediction (ARP) according to the current design of 3D-HEVC (HTM-9.0), where the residual signal in a current view using temporal prediction is predicted by a residual prediction signal in a reference view. The main procedures of ARP can be described as follows for the case that the current prediction unit (PU) uses temporal prediction (i.e., the reference picture is a temporal reference picture):
    1. The temporal reference block (CurrRef 142) in a reference picture (140) of the current view (Vc) is located from the location of the current block (Curr 112) using a motion vector (denoted as mvLX, X=0 or 1) and the reference index of the current block in the current picture (110) of the current view (Vc).
    2. The corresponding block (Base 122) in the reference picture (120) of the reference view corresponding to the current block (Curr 112) is located from the location of the current block (Curr 112) using a derived disparity vector (DV) of the current block (Curr 112).
    3. The temporal reference block (BaseRef 152) for the corresponding block (Base 122) in the reference view (Vr) is located by re-using the temporal motion information (i.e., mvLX and the reference index) of the current block (Curr 112).
    4. To reduce the bandwidth of memory access, the motion vector mvLX from the current block is scaled towards a fixed reference picture before performing motion compensation according to the current 3D-HEVC (HTM-9.0) standard when the weighting factor is not 0. Specifically, the fixed reference picture is defined as the first temporal reference picture of each reference picture list.
    5. The residual predictor of the temporal residual signal of the current PU/block can be calculated as the difference between these two respective blocks in the reference view (i.e., Base−BaseRef). In other words, the current residuals (Curr−CurrRef) are predicted by the reference residuals (Base−BaseRef).
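The relationship between the four blocks in step 5 can be sketched as follows. This is a minimal illustration, not the HTM-9.0 implementation: the function names, the representation of blocks as lists of rows, and the way the weighting factor multiplies the residual predictor are assumptions made for the example.

```python
def block_diff(a, b):
    """Sample-wise difference of two equally sized blocks (lists of rows)."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def temporal_arp_residual(curr, curr_ref, base, base_ref, weight=1):
    """Residual left after ARP for a temporally predicted block.

    curr, curr_ref: current block and its temporal reference (current view).
    base, base_ref: corresponding block and its temporal reference (reference view).
    weight: hypothetical weighting factor; 0 disables the residual predictor.
    """
    current_residual = block_diff(curr, curr_ref)      # Curr - CurrRef
    residual_predictor = block_diff(base, base_ref)    # Base - BaseRef
    return [
        [c - weight * p for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current_residual, residual_predictor)
    ]


curr = [[10, 12], [14, 16]]
curr_ref = [[9, 12], [13, 15]]
base = [[20, 22], [24, 26]]
base_ref = [[19, 21], [23, 26]]
remaining = temporal_arp_residual(curr, curr_ref, base, base_ref)
```

When the residual in the reference view resembles the residual in the current view, the remaining signal is smaller than (Curr−CurrRef) alone, which is the redundancy ARP is intended to exploit.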
       
When the current PU uses inter-view prediction (i.e., the reference picture is an inter-view reference picture) instead of temporal prediction, the main procedures of ARP can be described as follows, as shown in FIG. 2:
    1. The inter-view reference block (Base 222) of the reference view picture (220) in the reference view (Vr) is located by the disparity motion vector (DMV 230) of the current block (Curr 212) of the current picture (210) in the current view (Vc).
    2. The temporal reference block (BaseRef 252) of the inter-view reference block (Base 222) in the reference view is located using the temporal motion vector (mvLX) and reference index, where L0 motion information is used first; if L0 motion information is not available, L1 motion information is then used.
    3. A current reference block (CurrRef 242) with respect to the current block in the current view is located from the location of the temporal reference block (BaseRef 252) of the inter-view reference block (Base 222) in the reference view by re-using the disparity motion vector (DMV 230) of the current block (Curr 212).
    4. To reduce the bandwidth of memory access, in the current 3D-HEVC (HTM-9.0), the motion vector mvL0 (or mvL1) from the inter-view reference block (Base 222) is scaled towards a fixed reference picture before performing motion compensation when the weighting factor is not 0. The fixed reference picture is defined as the first temporal reference picture of each reference picture list. However, when mvL0 from Base is invalid, the motion vector mvL1 from Base will be used. If both mvL0 and mvL1 from Base are invalid, a zero vector will be used and the reference picture will be set to the first temporal reference picture of that prediction direction of the current block (list 0 or list 1). A motion vector from the inter-view reference block (Base 222) may be invalid if the inter-view reference block (Base 222) has no L0 MV, or if the list 0 prediction for the inter-view reference block (Base 222) is inter-view disparity compensated prediction.
    5. The current inter-view prediction residuals are calculated according to (Curr−Base). The residual predictor of the inter-view residual signal of the current PU/block can be calculated as the difference between these two respective blocks in a reference time, i.e., another access unit (CurrRef−BaseRef).
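The fallback order in step 4 (mvL0 first, then mvL1, then a zero vector) can be sketched as below. The `Motion` type and `select_base_motion` function are hypothetical names introduced for illustration, and applying the same validity test to list 1 as the text states for list 0 is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class Motion(NamedTuple):
    """Hypothetical container for the motion of Base in one reference list."""
    mv: Tuple[int, int]
    is_inter_view: bool  # True when the list uses disparity compensated prediction


def select_base_motion(l0: Optional[Motion], l1: Optional[Motion]):
    """Pick the temporal motion vector of Base used to locate BaseRef.

    A list's motion is treated as invalid when it is missing (None) or when
    it is inter-view (disparity compensated) prediction.
    Returns (mv, source), where source is "L0", "L1" or "zero".
    """
    for source, motion in (("L0", l0), ("L1", l1)):
        if motion is not None and not motion.is_inter_view:
            return motion.mv, source
    # Both invalid: zero vector; the caller would then use the first temporal
    # reference picture of the current block's prediction direction.
    return (0, 0), "zero"


mv, source = select_base_motion(Motion((5, 2), is_inter_view=True), Motion((3, -1), False))
```

In this call the list 0 motion of Base is disparity compensated, so the list 1 vector is selected.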
In ARP, the first temporal reference picture of each reference list in the reference view (i.e., V0) is selected as the reference picture of the corresponding block as shown in FIG. 1. The motion vectors of the current PU are then scaled towards the selected reference picture of the corresponding block in the reference view to generate the predicting residual signal by performing motion compensation. When the current PU is bi-prediction coded with identical motion for list 0 and list 1, different predicting residual signals may be generated for list 0 and for list 1 due to different reference pictures for list 0 and list 1, as shown in FIG. 3. Picture 310 corresponds to the current picture in view 1 (V1) and block 312 corresponds to the current prediction unit (PU). Disparity vector (DV) 314 of the current PU points to a corresponding block (332) in an inter-view reference picture (330) in the reference view (V0). In this example, the corresponding block (332) refers to the same reference picture in list 0 as the current block (312). However, the corresponding block (332) refers to a reference picture (350) in list 1 different from that of the current block (312), as shown in FIG. 3. The list-1 MV is scaled to the reference picture (350) in the opposite direction. Scaling an MV to the reference picture in the opposite direction when a reference picture in the same direction is available may degrade the prediction performance.
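The effect of scaling an MV towards a reference picture in the opposite temporal direction can be illustrated with a simplified sketch. It assumes plain proportional scaling by POC (picture order count) distance in floating point; the clipped fixed-point arithmetic of the actual codec is deliberately omitted, and the function name and POC values are hypothetical.

```python
def scale_mv(mv, cur_poc, orig_ref_poc, target_ref_poc):
    """Scale mv from its original reference picture to a target reference picture.

    Simplified model: the vector is multiplied by the ratio of POC distances.
    """
    td = cur_poc - orig_ref_poc    # distance to the picture the MV points to
    tb = cur_poc - target_ref_poc  # distance to the picture we scale towards
    if td == 0:
        raise ValueError("original reference must differ in POC from the current picture")
    scale = tb / td
    return (mv[0] * scale, mv[1] * scale)


# Current picture at POC 8, MV pointing to a past picture at POC 4.
same_direction = scale_mv((8, -4), cur_poc=8, orig_ref_poc=4, target_ref_poc=0)
opposite_direction = scale_mv((8, -4), cur_poc=8, orig_ref_poc=4, target_ref_poc=12)
```

Scaling towards the picture at POC 12 flips the sign of both components, so the scaled vector extrapolates the motion across the current picture rather than following the original trajectory, which is one way to see why prediction quality may suffer.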
According to the existing standard development, the first temporal reference picture in the reference picture list is used as the reference picture for the corresponding block. Thus, even a change of reference list reordering between slices in the same picture may cause the reference picture of the corresponding block to change. Frequent reference picture changes of the base view may create a performance burden for decoders, due to DRAM (dynamic random access memory) access limitations.
For high-performance decoders, optimization of DRAM accesses is critical to efficient implementations. Newer generations of DDR (double data rate) DRAM support much higher bandwidth. However, a system must be designed to use large burst sizes and to tolerate longer latencies in order to achieve bandwidth efficiency. Furthermore, since the temporal reference data (such as temporal reference pictures) is too large to store on-chip, the temporal reference data is usually stored in DRAM. The decoder has to read multiple LCUs (largest coding units) worth of data in a single burst to satisfy the memory access efficiency requirement. Often, a system uses a pipeline structure to achieve memory bandwidth efficiency. Accordingly, it has to fetch this data well before it is used by the LCU in order to mitigate the long memory latency. The practice of allowing the reference picture of the corresponding block to change from slice to slice will interrupt this pipeline model. In the worst case, this practice may even become the ultimate bottleneck of decoder performance.
In the current 3D-HEVC (High Efficiency Video Coding (HEVC) based three-dimensional coding), there is a checking mechanism to check whether the reference picture used in the current view (the first temporal reference picture in that list) has a corresponding reference picture with the same POC in the reference view, in order to generate the list X (X∈{0, 1}) predicting residual signal. Besides checking the POC value, it also checks whether the reference picture in the reference view is in the decoded picture buffer and marked as “used for reference” and whether the reference picture is a texture picture (i.e., a non-depth picture). In addition, it also checks the view index. This process is invoked when the current slice is a P or B slice. The derivation process for the target reference index for residual prediction according to the existing 3D-HEVC standard development is as follows.
The variables RpRefIdxL0 and RpRefIdxL1, corresponding to the reference picture index for ARP in list 0 and list 1 respectively, are set equal to −1, and the variables RpRefPicAvailFlagL0 and RpRefPicAvailFlagL1, corresponding to flags indicating whether the reference picture for ARP is available in list 0 and list 1 respectively, are set equal to 0. The following procedure applies for list X, where X∈{0, 1}:
    When X is equal to 0 or the current slice is a B slice, the following applies:
        For i in the range from 0 to num_ref_idx_lX_active_minus1, inclusive, the following applies:  (O-1)
            When PicOrderCnt(RefPicListX[i]) != PicOrderCntVal and RpRefPicAvailFlagLX == 0, the following applies:  (O-2)
                RpRefIdxLX = i  (H-32)
                RpRefPicAvailFlagLX = 1  (H-33)
The variable RpRefPicAvailFlag is set to (RpRefPicAvailFlagL0 || RpRefPicAvailFlagL1).
When RpRefPicAvailFlag is equal to 1, the following applies for X, where X∈{0, 1}:  (O-3)
    When X is equal to 0 or the current slice is a B slice, the following applies:  (O-4)
        For i in the range from 0 to NumActiveRefLayerPics−1, inclusive, the following applies:  (O-5)
            The variable refViewIdx is set equal to ViewIdx(RefPicListX[i]).  (O-6)
            The variable RefRpRefAvailFlagLX[refViewIdx] is set equal to 0.  (O-7)
            When RpRefPicAvailFlagLX is equal to 1 and there is a picture picA in the DPB (decoded picture buffer) with PicOrderCnt(picA) equal to PicOrderCnt(RefPicListX[RpRefIdxLX]), ViewIdx(picA) equal to refViewIdx, DepthFlag(picA) equal to 0 and marked as “used for reference”, RefRpRefAvailFlagLX[refViewIdx] is set equal to 1.  (O-8)
The loop corresponding to step (O-1) checks through all indexes of the reference pictures to determine whether a reference picture in the list is available. PicOrderCntVal in step (O-2) corresponds to the POC (picture order count) of the current picture. Variable num_ref_idx_lX_active_minus1 is related to the number of the reference pictures in the list. When a reference picture is available, the index of this reference picture is assigned to RpRefIdxLX as shown in step (H-32) and the available flag is set as shown in step (H-33). In the rest of the procedure (steps (O-3) through (O-8)), the process checks whether the reference picture in the reference view is in the decoded picture buffer and marked as “used for reference”, whether the reference picture is a texture picture (i.e., a non-depth picture), and the view index.
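The derivation process described above can be transcribed into a short sketch. It follows the steps literally, including indexing RefPicListX[i] in step (O-6); the `Picture` class, the function name and the dictionary-based return values are hypothetical, and each reference picture list is assumed to hold exactly its active entries (num_ref_idx_lX_active_minus1 + 1 pictures).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """Hypothetical picture record with the properties the process inspects."""
    poc: int
    view_idx: int
    is_depth: bool = False
    used_for_reference: bool = True


def derive_arp_reference(ref_pic_lists, current_poc, is_b_slice,
                         num_active_ref_layer_pics, dpb):
    rp_ref_idx = {0: -1, 1: -1}       # RpRefIdxLX
    rp_avail = {0: False, 1: False}   # RpRefPicAvailFlagLX
    for x in (0, 1):
        if x == 0 or is_b_slice:
            for i, pic in enumerate(ref_pic_lists[x]):            # (O-1)
                if pic.poc != current_poc and not rp_avail[x]:    # (O-2)
                    rp_ref_idx[x] = i                             # (H-32)
                    rp_avail[x] = True                            # (H-33)

    ref_rp_avail = {0: {}, 1: {}}     # RefRpRefAvailFlagLX[refViewIdx]
    if rp_avail[0] or rp_avail[1]:                                # (O-3)
        for x in (0, 1):
            if x == 0 or is_b_slice:                              # (O-4)
                for i in range(num_active_ref_layer_pics):        # (O-5)
                    ref_view_idx = ref_pic_lists[x][i].view_idx   # (O-6)
                    ref_rp_avail[x][ref_view_idx] = False         # (O-7)
                    if rp_avail[x]:                               # (O-8)
                        target_poc = ref_pic_lists[x][rp_ref_idx[x]].poc
                        ref_rp_avail[x][ref_view_idx] = any(
                            p.poc == target_poc
                            and p.view_idx == ref_view_idx
                            and not p.is_depth
                            and p.used_for_reference
                            for p in dpb
                        )
    return rp_ref_idx, rp_avail, ref_rp_avail


# P slice at POC 8 in view 1: list 0 holds an inter-view picture, then a temporal one.
list0 = [Picture(poc=8, view_idx=0), Picture(poc=4, view_idx=1)]
dpb = [Picture(poc=4, view_idx=0), Picture(poc=4, view_idx=0, is_depth=True)]
idx, avail, ref_avail = derive_arp_reference((list0, []), 8, False, 1, dpb)
```

In this example the first entry of list 0 with a POC different from the current picture is at index 1, and the reference view holds a texture picture with the same POC, so residual prediction from view 0 is flagged as available for list 0.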
As described above, the existing ARP according to HTM-9.0 has issues with memory access efficiency because the reference picture may change from slice to slice. The existing ARP according to HTM-9.0 may also have performance issues when an MV is scaled to the reference picture in the opposite direction while a reference picture in the same direction is available, as illustrated in FIG. 3. Therefore, it is desirable to develop a method for 3D or multi-view coding to overcome these issues.