Three-dimensional (3D) television has been a technology trend in recent years, aiming to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. Multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such an approach would result in poor coding performance. In order to improve coding efficiency, multi-view video coding typically exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
FIG. 1 shows an exemplary prediction structure used in the common test conditions for 3D video coding. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (i.e., V0, V1 and V2 in FIG. 1). All texture pictures and depth maps that belong to the same camera position are associated with the same viewId (i.e., view identifier). The view identifiers are used for specifying the coding order within the access units and detecting missing views in error-prone environments. An access unit includes all video pictures and depth maps corresponding to the same time instant. Inside an access unit, the video picture and, when present, the associated depth map having viewId equal to 0 are coded first, followed by the video picture and depth map having viewId equal to 1, etc. The view with viewId equal to 0 (i.e., V0 in FIG. 1) is also referred to as the base view or the independent view. The base view video pictures can be coded using a conventional HEVC video coder without dependence on other views.
The example shown in FIG. 1 corresponds to a view coding order from V0 (i.e., base view) to V1, followed by V2. The current block in the current picture being coded is in V2. According to HTM-6.0, the MVs of reference blocks in all previously coded views can be considered as inter-view candidates even if the inter-view pictures are not in the reference picture list of the current picture. In FIG. 1, frames 110, 120 and 130 correspond to a video picture or a depth map from views V0, V1 and V2 at time t1 respectively. Block 132 is the current block in the current view, and blocks 112 and 122 are the current blocks in V0 and V1 respectively. For current block 112 in V0, a disparity vector (116) is used to locate the inter-view collocated block (114). Similarly, for current block 122 in V1, a disparity vector (126) is used to locate the inter-view collocated block (124).
Illumination compensation (IC) is a technique to reduce the intensity differences between views caused by the different light fields of two views captured by different cameras at different locations. In HTM, a linear IC model is disclosed by Liu et al. (“3D-CE2.h: Results of Illumination Compensation for Inter-View Prediction”, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 2nd Meeting: Shanghai, CN, 13-19 Oct. 2012, Document: JCT3V-B0045) to compensate for the illumination discrepancy between different views. Parameters of the IC model are estimated for each PU using the nearest available reconstructed neighbouring pixels. Therefore, there is no need to transmit the IC parameters to the decoder. Whether to apply IC or not is decided at the coding unit (CU) level, and an IC flag is coded to indicate whether IC is enabled at the CU level. The flag is present only for the CUs that are coded using inter-view prediction. If IC is enabled for a CU and a PU within the CU is coded by temporal prediction (i.e., Inter prediction), the PU is inferred to have IC disabled. The linear IC model used in inter-view prediction is shown in eqn. (1):
p(i,j) = aIC·r(i+dvx, j+dvy) + bIC, where (i,j) ∈ PUc  (1)
where PUc is the current PU, (i, j) is the pixel coordinate in PUc, (dvx, dvy) is the disparity vector of PUc, p(i, j) is the prediction of PUc, r(·,·) is the reference picture of PUc from a neighboring view, and aIC and bIC are parameters of the linear IC model.
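The linear model of eqn. (1) can be illustrated with a minimal sketch. The function name, the argument layout, and the convention that i is the horizontal and j the vertical coordinate are assumptions made for illustration only; the parameters a_ic and b_ic are taken as given here, and boundary handling is omitted.

```python
def ic_predict(ref, pu_x, pu_y, pu_w, pu_h, dv_x, dv_y, a_ic, b_ic):
    """Apply p(i, j) = a_ic * r(i + dv_x, j + dv_y) + b_ic over one PU.

    ref is the inter-view reference picture as a list of rows (ref[y][x]).
    (pu_x, pu_y) is the top-left pixel of the PU (hypothetical layout).
    No picture-boundary checks are performed in this sketch.
    """
    pred = []
    for j in range(pu_y, pu_y + pu_h):          # vertical coordinate
        row = []
        for i in range(pu_x, pu_x + pu_w):      # horizontal coordinate
            row.append(a_ic * ref[j + dv_y][i + dv_x] + b_ic)
        pred.append(row)
    return pred


# Toy reference picture in which pixel (x, y) holds the value 10*y + x.
ref = [[10 * y + x for x in range(8)] for y in range(4)]
pred = ic_predict(ref, pu_x=2, pu_y=1, pu_w=2, pu_h=2,
                  dv_x=-1, dv_y=0, a_ic=2, b_ic=3)
print(pred)  # [[25, 27], [45, 47]]
```

The sketch uses plain arithmetic; the integer weight, shift and offset form of the same model appears in eqns. (4) to (7).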
To estimate parameters aIC and bIC for a PU, two sets of pixels, as shown in FIG. 2A and FIG. 2B, are used. As shown in FIG. 2A, the first set consists of reconstructed neighboring pixels in the left column and in the above row (shown as circles) of the current CU (indicated by the thick-lined box), where the current CU is the CU that contains the current PU. As shown in FIG. 2B, the other set of pixels corresponds to neighboring pixels (shown as circles) of a reference block (indicated by the thick-lined box) of the current CU. The reference block of the current CU is located by using the location of the current PU and the disparity vector of the current PU.
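The two pixel sets can be gathered as in the following sketch. It assumes, for illustration only, square blocks, pictures stored as lists of rows, and a reference block obtained by shifting the current block position by the disparity vector; the helper names are hypothetical, availability checks at picture boundaries are omitted, and the estimation of aIC and bIC from the two sets is not shown because it is not specified here.

```python
def neighbour_pixels(pic, x0, y0, size):
    """Return the above row followed by the left column of a size x size block.

    (x0, y0) is the top-left pixel of the block; pic is indexed as pic[y][x].
    Assumes x0 >= 1 and y0 >= 1 (no availability checks in this sketch).
    """
    above = [pic[y0 - 1][x] for x in range(x0, x0 + size)]
    left = [pic[y][x0 - 1] for y in range(y0, y0 + size)]
    return above + left


def ic_training_sets(rec_cur, rec_ref, cu_x, cu_y, cu_size, dv_x, dv_y):
    """Collect the neighbours of the current CU and of its reference block."""
    cur_set = neighbour_pixels(rec_cur, cu_x, cu_y, cu_size)
    # Assumption: the reference block is the current block shifted by (dv_x, dv_y).
    ref_set = neighbour_pixels(rec_ref, cu_x + dv_x, cu_y + dv_y, cu_size)
    return cur_set, ref_set


# Toy pictures: current view holds 10*y + x, reference view holds 100 + 10*y + x.
rec_cur = [[10 * y + x for x in range(6)] for y in range(6)]
rec_ref = [[100 + 10 * y + x for x in range(6)] for y in range(6)]
cur_set, ref_set = ic_training_sets(rec_cur, rec_ref, 2, 2, 2, -1, 0)
print(cur_set)  # [12, 13, 21, 31]
print(ref_set)  # [111, 112, 120, 130]
```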
An adaptive luminance compensation tool for inter-view video coding is disclosed by Mishurovskiy et al. (“CE2.A results on inter-view coding with adaptive luminance compensation,” Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 2nd Meeting: Shanghai, CN, 13-19 Oct. 2012, Document: JCT3V-B0031). This adaptive luminance compensation is only applied to P slices. A macroblock (MB) level flag is transmitted for a Skip MB, P16×16, P16×8, P8×16 and P8×8 MB to turn the adaptive luminance compensation On or Off.
In HTM-6.0, when the Inter prediction is uni-prediction, clipping is first applied to the interpolated pixel output (i.e., predSamplesL0[x][y] and predSamplesL1[x][y] for L0 (reference list 0) and L1 (reference list 1) respectively) from the DCT-based interpolation filter (DCTIF) to clip the pixel value to a valid range, as shown in eqn. (2) and eqn. (3) for L0 and L1 respectively. The interpolated pixels correspond to motion-compensated reconstructed pixels.
clipPredVal = Clip3(0, (1<<bitDepth)−1, (predSamplesL0[x][y]+offset1)>>shift1),  (2)
clipPredVal = Clip3(0, (1<<bitDepth)−1, (predSamplesL1[x][y]+offset1)>>shift1),  (3)
where Clip3(a,b,c) is a clipping function that clips the value c to the range from a to b, and offset1 and shift1 are rounding factors. As shown in eqn. (2) and eqn. (3), the interpolated pixel output is clipped to the range from 0 to the largest value that can be represented by bitDepth bits (i.e., (1<<bitDepth)−1).
The clipped pixels of the reference block are then processed by illumination compensation according to the linear IC model. After the IC operations, the pixel values are again clipped to a valid range, as shown in eqn. (4) and eqn. (5) for L0 and L1 respectively, and the clipped results are used as predictors.
predSamples[x][y] = !puIcFlagL0 ? clipPredVal : (Clip3(0, (1<<bitDepth)−1, (clipPredVal*icWeightL0)>>icShiftL0)+icOffsetL0)  (4)
predSamples[x][y] = !puIcFlagL1 ? clipPredVal : (Clip3(0, (1<<bitDepth)−1, (clipPredVal*icWeightL1)>>icShiftL1)+icOffsetL1)  (5)
where s=t?u:v means that if t is TRUE (or equal to 1), s=u; otherwise s=v. According to eqn. (4) and eqn. (5), when the IC flag for the PU (i.e., puIcFlagL0 or puIcFlagL1) is not TRUE, the clipped values based on eqn. (2) and eqn. (3) are used directly; otherwise the clipped values based on eqn. (2) and eqn. (3) are further illumination compensated and clipped. Therefore, when the IC flag is TRUE, the reconstructed reference pixels are clipped twice (i.e., in eqns. (2) and (4), or in eqns. (3) and (5)).
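The uni-prediction path for one list can be sketched as follows. The values shift1 = 14 − bitDepth and offset1 = 1 << (shift1 − 1) are assumptions borrowed from HEVC-style sample prediction and are not stated above; the function and argument names are illustrative, and the IC parameter values in the usage lines are arbitrary. The grouping of the second stage follows eqn. (4) as printed.

```python
def clip3(lo, hi, v):
    """Clip v to the range [lo, hi]."""
    return max(lo, min(hi, v))


def uni_pred_sample(pred_sample, bit_depth, ic_flag, ic_weight, ic_shift, ic_offset):
    """Uni-prediction of one sample: eqn. (2) followed by eqn. (4)."""
    shift1 = 14 - bit_depth            # assumption: HEVC-style rounding shift
    offset1 = 1 << (shift1 - 1)        # assumption: valid for bit_depth < 14
    max_val = (1 << bit_depth) - 1

    # First clipping, eqn. (2): round and clip the interpolated sample.
    clip_pred_val = clip3(0, max_val, (pred_sample + offset1) >> shift1)
    if not ic_flag:
        return clip_pred_val

    # Second clipping, eqn. (4), with the grouping as printed.
    return clip3(0, max_val, (clip_pred_val * ic_weight) >> ic_shift) + ic_offset


# 8-bit example with an interpolated sample of 6400 (100 at 14-bit precision).
print(uni_pred_sample(6400, 8, False, 40, 5, 3))  # 100
print(uni_pred_sample(6400, 8, True, 40, 5, 3))   # 128
```

With IC enabled, the sample passes through Clip3 twice, which is the double clipping noted above.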
For the bi-prediction case, neither the interpolated pixels for each list (i.e., L0 or L1) nor the illumination compensated outputs are clipped, as shown in eqn. (6) and eqn. (7). The illumination compensated pixels for each list are then averaged and clipped into a valid range, as shown in eqn. (8).
predVal0 = !puIcFlagL0 ? predSamplesL0[x][y] : (((predSamplesL0[x][y]*icWeightL0)>>icShiftL0)+(icOffsetL0<<shift1))  (6)
predVal1 = !puIcFlagL1 ? predSamplesL1[x][y] : (((predSamplesL1[x][y]*icWeightL1)>>icShiftL1)+(icOffsetL1<<shift1))  (7)
predSamples[x][y] = Clip3(0, (1<<bitDepth)−1, (predVal0+predVal1+offset2)>>shift2)  (8)
where icWeightL0, icShiftL0, icOffsetL0, icWeightL1, icShiftL1 and icOffsetL1 are IC parameters.
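The bi-prediction path of eqns. (6) to (8) can be sketched in the same way. The values shift1 = 14 − bitDepth, shift2 = 15 − bitDepth and offset2 = 1 << (shift2 − 1) are assumptions borrowed from HEVC-style sample prediction; the tuple layout of the IC parameters and the function names are illustrative only.

```python
def clip3(lo, hi, v):
    """Clip v to the range [lo, hi]."""
    return max(lo, min(hi, v))


def bi_pred_sample(sample_l0, sample_l1, bit_depth, ic_l0, ic_l1):
    """Bi-prediction of one sample following eqns. (6), (7) and (8).

    ic_l0 and ic_l1 are (flag, weight, shift, offset) tuples (hypothetical layout).
    """
    shift1 = 14 - bit_depth            # assumption: HEVC-style values
    shift2 = 15 - bit_depth
    offset2 = 1 << (shift2 - 1)

    def compensate(sample, ic):
        flag, weight, shift, offset = ic
        if not flag:
            return sample
        # No clipping here; the offset is scaled up to intermediate precision.
        return ((sample * weight) >> shift) + (offset << shift1)

    pred_val0 = compensate(sample_l0, ic_l0)   # eqn. (6)
    pred_val1 = compensate(sample_l1, ic_l1)   # eqn. (7)
    # Single clipping after averaging, eqn. (8).
    return clip3(0, (1 << bit_depth) - 1, (pred_val0 + pred_val1 + offset2) >> shift2)


ic_on = (True, 40, 5, 3)
ic_off = (False, 0, 0, 0)
print(bi_pred_sample(6400, 6400, 8, ic_off, ic_off))  # 100
print(bi_pred_sample(6400, 6400, 8, ic_on, ic_on))    # 128
```

In contrast to the uni-prediction path, only one clipping operation is applied, and it comes after the two lists are combined.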
As shown above, the derivation process for inter-view prediction with IC enabled differs between uni-prediction and bi-prediction. When the motion information associated with bi-prediction refers to the same reference block, an encoder may simplify the derivation process by using uni-prediction instead of bi-prediction. If the decoder still uses bi-prediction, the encoder and decoder may have different predictions, since one side uses uni-prediction and the other side uses bi-prediction. When the motion information associated with bi-prediction points to the same reference block, it is computationally more efficient to use uni-prediction instead of bi-prediction. The uni-prediction can be done by converting the original bi-prediction into uni-prediction using only list 0 prediction corresponding to the original list 0 MV and reference index. Due to the different IC derivation processes for uni-prediction and bi-prediction, a coding system may run into a mismatch issue when the motion information associated with bi-prediction refers to the same reference block. It is desirable to take advantage of the reduced computational complexity of uni-prediction without causing a mismatch issue when the motion information associated with bi-prediction refers to the same reference block.
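The mismatch can be illustrated numerically by feeding the same interpolated sample and the same IC parameters to both derivation processes. The sketch below assumes an 8-bit sample, HEVC-style rounding values (shift1 = 14 − bitDepth, shift2 = 15 − bitDepth, with offsets of half the rounding step), and arbitrary IC parameters; none of these values are stated above, and the function names are illustrative.

```python
def clip3(lo, hi, v):
    """Clip v to the range [lo, hi]."""
    return max(lo, min(hi, v))


BIT_DEPTH = 8                      # assumption for this example
MAX_VAL = (1 << BIT_DEPTH) - 1
SHIFT1 = 14 - BIT_DEPTH            # assumption: HEVC-style rounding values
OFFSET1 = 1 << (SHIFT1 - 1)
SHIFT2 = 15 - BIT_DEPTH
OFFSET2 = 1 << (SHIFT2 - 1)


def uni_with_ic(sample, weight, shift, offset):
    """Uni-prediction with IC: eqn. (2) then eqn. (4), grouping as printed."""
    clip_pred_val = clip3(0, MAX_VAL, (sample + OFFSET1) >> SHIFT1)
    return clip3(0, MAX_VAL, (clip_pred_val * weight) >> shift) + offset


def bi_with_ic(sample_l0, sample_l1, weight, shift, offset):
    """Bi-prediction with IC on both lists: eqns. (6), (7) and (8)."""
    val0 = ((sample_l0 * weight) >> shift) + (offset << SHIFT1)
    val1 = ((sample_l1 * weight) >> shift) + (offset << SHIFT1)
    return clip3(0, MAX_VAL, (val0 + val1 + OFFSET2) >> SHIFT2)


# Both lists refer to the same reference block, so both samples are identical.
sample = 6431
uni = uni_with_ic(sample, 40, 5, 3)
bi = bi_with_ic(sample, sample, 40, 5, 3)
print(uni, bi)  # 128 129
```

Under these assumptions the two paths differ by one for this sample, because the uni-prediction path rounds and clips to bitDepth precision before applying the IC weight, while the bi-prediction path keeps the intermediate precision until the final step. A side that converts to uni-prediction would therefore not reproduce the predictor of a side that stays with bi-prediction.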