Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. Multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a straightforward approach would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras.
FIG. 1 shows an exemplary prediction structure used in the common test conditions for 3D video coding. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (i.e., V0, V1 and V2 in FIG. 1). All texture pictures and depth maps that belong to the same camera position are associated with the same viewId (i.e., view identifier). The view identifiers are used for specifying the coding order within the access units and detecting missing views in error-prone environments. An access unit includes all video pictures and depth maps corresponding to the same time instant. Inside an access unit, the video picture and, when present, the associated depth map having viewId equal to 0 are coded first, followed by the video picture and depth map having viewId equal to 1, etc. The view with viewId equal to 0 (i.e., V0 in FIG. 1) is also referred to as the base view or the independent view. The base view video pictures can be coded using a conventional HEVC video coder without dependence on other views.
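The coding order inside an access unit described above can be illustrated with a minimal sketch. The function name and the (view_id, kind) tuple representation are hypothetical and are not taken from HTM; placing the video picture before the depth map of the same view is an assumption read from the wording above.

```python
def access_unit_coding_order(components):
    """Order the components of one access unit for coding.

    components: list of (view_id, kind) tuples, kind being "texture" or "depth".
    Components are ordered by increasing view_id; within one view the texture
    (video picture) is placed before the depth map (assumed from the text).
    """
    kind_rank = {"texture": 0, "depth": 1}
    return sorted(components, key=lambda c: (c[0], kind_rank[c[1]]))


if __name__ == "__main__":
    au = [(1, "depth"), (0, "depth"), (1, "texture"), (0, "texture"), (2, "texture")]
    for view_id, kind in access_unit_coding_order(au):
        print(view_id, kind)
```

In this sketch a view without a depth map (view 2 in the usage example) simply contributes its texture picture alone.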
The example shown in FIG. 1 corresponds to a view coding order from V0 (i.e., base view) to V1, followed by V2. The current block in the current picture being coded is in V2. According to HTM-6.0, all the motion vectors (MVs) of reference blocks in the previously coded views can be considered as inter-view candidates. In FIG. 1, frames 110, 120 and 130 correspond to a video picture or a depth map from views V0, V1 and V2 at time t1 respectively. Block 132 is the current block in the current view, and blocks 112 and 122 are the current blocks in V0 and V1 respectively. For current block 112 in V0, a disparity vector (116) is used to locate the inter-view collocated block (114). Similarly, for current block 122 in V1, a disparity vector (126) is used to locate the inter-view collocated block (124).
Illumination Compensation (IC)
Illumination compensation (IC) is a technique to reduce the intensity differences between views caused by the different light fields of two views captured by different cameras at different locations. In HTM, a linear IC model is disclosed by Liu et al. (“3D-CE2.h: Results of Illumination Compensation for Inter-View Prediction”, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 2nd Meeting: Shanghai, CN, 13-19 Oct. 2012, Document: JCT3V-B0045) to compensate for the illumination discrepancy between different views. Parameters of the IC model are estimated for each prediction unit (PU) using the nearest available reconstructed neighbouring pixels. Therefore, there is no need to transmit the IC parameters to the decoder. Whether to apply IC or not is decided at the coding unit (CU) level, and an IC flag is coded to indicate whether IC is enabled at the CU level. The flag is present only for the CUs that are coded using inter-view prediction. If IC is enabled for a CU and a PU within the CU is coded by temporal prediction (i.e., Inter prediction), the PU block is inferred to have IC disabled. The linear IC model used in inter-view prediction is shown in eqn. (1):
p(i, j) = aIC · r(i + dvx, j + dvy) + bIC, where (i, j) ∈ PUc   (1)
where PUc is the current PU, (i, j) is the pixel coordinate in PUc, (dvx, dvy) is the disparity vector of PUc, p(i, j) is the prediction of PUc, r(·,·) is the reference picture of the PU from a neighboring view, and aIC and bIC are parameters of the linear IC model.
To estimate parameters aIC and bIC for a PU, two sets of pixels, as shown in FIG. 2A and FIG. 2B, are used. As shown in FIG. 2A, the first set consists of reconstructed neighboring pixels in the left column and in the above row (shown as circles) of the current CU (indicated by the thick-lined box), where the current CU is the CU that contains the current PU. As shown in FIG. 2B, the second set corresponds to neighboring pixels (shown as circles) of a reference block (indicated by the thick-lined box) of the current CU. The reference block of the current CU is located by using the location of the current PU and the disparity vector of the current PU.
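As an illustration of the linear model in eqn. (1), the following is a minimal sketch rather than the HTM implementation. The text above does not state which estimator is used for aIC and bIC, so an ordinary least-squares fit over the two neighboring pixel sets is assumed here. All function names are hypothetical, and pictures are represented as dictionaries keyed by (i, j) coordinates purely for brevity.

```python
def estimate_ic_params(cur_neighbors, ref_neighbors):
    """Fit cur ~= a_ic * ref + b_ic over paired neighboring samples.

    Assumption: ordinary least squares; the source text does not name the estimator.
    cur_neighbors: reconstructed neighbors of the current CU (FIG. 2A set).
    ref_neighbors: neighbors of the reference block (FIG. 2B set), same order.
    """
    n = len(cur_neighbors)
    if n == 0 or n != len(ref_neighbors):
        return 1.0, 0.0  # fall back to the identity model (no compensation)
    sum_r = sum(ref_neighbors)
    sum_c = sum(cur_neighbors)
    sum_rr = sum(r * r for r in ref_neighbors)
    sum_rc = sum(r * c for r, c in zip(ref_neighbors, cur_neighbors))
    denom = n * sum_rr - sum_r * sum_r
    if denom == 0:
        # Flat reference neighborhood: keep unit gain, fit the offset only.
        return 1.0, (sum_c - sum_r) / n
    a_ic = (n * sum_rc - sum_r * sum_c) / denom
    b_ic = (sum_c - a_ic * sum_r) / n
    return a_ic, b_ic


def predict_pu(ref_picture, pu_coords, dv, a_ic, b_ic):
    """Apply eqn. (1): p(i, j) = a_ic * r(i + dvx, j + dvy) + b_ic for (i, j) in the PU."""
    dvx, dvy = dv
    return {(i, j): a_ic * ref_picture[(i + dvx, j + dvy)] + b_ic
            for (i, j) in pu_coords}


if __name__ == "__main__":
    a_ic, b_ic = estimate_ic_params([25, 45, 65, 85], [10, 20, 30, 40])
    print(a_ic, b_ic)
    print(predict_pu({(1, 0): 10, (2, 0): 20}, [(0, 0), (1, 0)], (1, 0), a_ic, b_ic))
```

Because both the encoder and the decoder can run the same estimation on reconstructed pixels, a sketch of this kind needs no transmitted parameters, which matches the observation above that the IC parameters are not sent to the decoder.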
An adaptive luminance compensation tool for inter-view video coding is disclosed by Mishurovskiy et al. (“CE2.A results on inter-view coding with adaptive luminance compensation,” Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 2nd Meeting: Shanghai, CN, 13-19 Oct. 2012, Document: JCT3V-B0031). This adaptive luminance compensation is only applied to P slices. A macroblock (MB) level flag is transmitted for a Skip MB, P16×16, P16×8, P8×16 and P8×8 MB to turn the adaptive luminance compensation On or Off.
Signaling of IC
Whether illumination compensation is used is signaled at the coding unit level. In Skip/Merge mode, ic_flag is conditionally sent depending on merge_idx and the slice segment header flag slice_ic_disable_merge_zero_idx_flag. If ic_flag is not sent in Merge mode, ic_flag is inferred to be 0. In the 3D-HEVC (Three-Dimensional Video Coding based on High Efficiency Video Coding) test model, HTM-7.0, a process known as NBDV (Neighboring Block Disparity Vector) is used to derive a disparity vector predictor. The disparity vector derived from NBDV is then used to fetch a depth block in the depth image of the reference view. The fetched depth block has the same size as the current prediction unit (PU), and it is then used to perform backward warping for the current PU.
When merge_idx is equal to 0, the temporal inter-view motion predictor candidate is typically used. Inter-view prediction is not used very often in this case. To reduce the overhead associated with signaling the ic_flag in this case of merge_idx being 0, illumination compensation is not allowed. This system configuration is indicated by setting the value of a control flag (e.g., slice_ic_disable_merge_zero_idx_flag) to 1 at the slice level. For some pictures in which inter-view prediction may be used frequently, the above assumption does not hold. Therefore, the merge_idx based ic_flag skipping is only applied under the condition that (POC % IntraPeriod) is not 0, where POC corresponds to Picture Order Count. This POC based decision is made by the encoder. The encoder can indicate the decision regarding whether to enable the ic_flag skipping in this case of merge_idx being 0 by sending a slice header flag (e.g., slice_ic_disable_merge_zero_idx_flag). This allows the encoder to control the condition depending on the coding structure or the sequence. In addition, for an inter-coded PU, illumination compensation is always disabled when Advanced Residual Prediction (ARP) is applied. Therefore, when the ARP weighting factor for an inter-coded PU is not equal to 0, the signaling of ic_flag is skipped and its value is set to 0.
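The signaling conditions described above can be summarized in a short sketch. It is a simplified reading of the text, not the HTM parsing process: the function names and the read_bit callback are hypothetical, and other conditions that apply in practice (for example, whether IC is enabled for the slice at all, or whether the CU uses inter-view prediction) are deliberately left out.

```python
def encoder_slice_ic_disable_merge_zero_idx_flag(poc, intra_period):
    """Encoder-side choice of the slice header flag.

    The merge_idx based skipping is applied only when (POC % IntraPeriod) != 0.
    intra_period is assumed to be a positive integer.
    """
    return 1 if poc % intra_period != 0 else 0


def is_ic_flag_signaled(is_merge, merge_idx,
                        slice_ic_disable_merge_zero_idx_flag, arp_weight):
    """Return True if ic_flag is present in the bitstream for this CU (simplified)."""
    if arp_weight != 0:
        return False  # ARP applied: ic_flag is skipped
    if is_merge and merge_idx == 0 and slice_ic_disable_merge_zero_idx_flag == 1:
        return False  # skipping enabled for merge_idx equal to 0
    return True


def parse_ic_flag(is_merge, merge_idx,
                  slice_ic_disable_merge_zero_idx_flag, arp_weight, read_bit):
    """Read ic_flag when it is signaled; otherwise infer it to be 0."""
    if is_ic_flag_signaled(is_merge, merge_idx,
                           slice_ic_disable_merge_zero_idx_flag, arp_weight):
        return read_bit()
    return 0


if __name__ == "__main__":
    flag = encoder_slice_ic_disable_merge_zero_idx_flag(poc=5, intra_period=24)
    print(flag, parse_ic_flag(True, 0, flag, 0, read_bit=lambda: 1))
```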
Encoding of IC
According to the current HTM, the encoder decides whether IC is enabled for the current slice/picture. The decision is made based on statistics of the pixels of the current picture and pixels of the inter-view reference picture. Therefore, the decision cannot be made until the statistics are collected, which introduces a latency of at least one slice when the IC control flag is signaled at the slice level.
In particular, according to the current HTM, the encoder first checks whether there is any inter-view reference picture in the reference picture list of the current slice/picture. If there is no inter-view reference picture in the reference list, IC is turned Off for the current slice/picture. If at least one inter-view reference picture exists in the reference list, the encoder derives two histograms of pixel values, one based on the current picture and one based on the inter-view reference picture. After the two histograms are derived, a summation of the absolute differences (SAD) between corresponding entries of the two histograms is calculated. If the SAD value is larger than a predefined threshold, IC is enabled for the current slice/picture. Otherwise, IC is disabled for the current slice/picture.
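The histogram-based decision described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the HTM code: the function names are hypothetical, samples are assumed to be integers in the range 0 to num_bins - 1, the threshold value is left to the caller because the text does not give one, and the first inter-view reference picture is used when several exist, which the text does not specify.

```python
def histogram(samples, num_bins=256):
    """Count occurrences of each sample value in [0, num_bins)."""
    counts = [0] * num_bins
    for s in samples:
        counts[s] += 1
    return counts


def decide_slice_ic(cur_samples, inter_view_refs, threshold, num_bins=256):
    """Return True if IC should be enabled for the current slice/picture.

    cur_samples: pixel values of the current picture (flat list).
    inter_view_refs: list of inter-view reference pictures, each a flat list of
        pixel values; an empty list means no inter-view reference is available.
    """
    if not inter_view_refs:
        return False  # no inter-view reference picture: IC is turned off
    ref_samples = inter_view_refs[0]  # assumption: use the first one
    h_cur = histogram(cur_samples, num_bins)
    h_ref = histogram(ref_samples, num_bins)
    sad = sum(abs(c - r) for c, r in zip(h_cur, h_ref))
    return sad > threshold


if __name__ == "__main__":
    print(decide_slice_ic([10, 10, 20], [[10, 20, 20]], threshold=1))
```

The sketch also makes the latency issue visible: both histograms require a full pass over the picture before the slice-level decision is available.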
Depth Lookup Table (DLT)
Depth lookup table (DLT) has been adopted into 3D-HEVC. Very often, there are only limited values appearing in the depth component. Therefore, DLT is a compact representation of the valid values in a block. When a CU is coded in Intra simplified depth coding (SDC) mode or depth map modeling (DMM) mode, DLT is used to map the valid depth values to DLT indexes. FIG. 3 demonstrates an example of DLT representation of depth values in a picture. While the range of depth values is from 0 to 255, only 5 depth values (i.e., 50, 108, 110, 112 and 200) appear in the picture. Accordingly, the DLT consists of 5 values with indexes from 0 to 4. The DLT is signaled in the picture parameter set (PPS) and it is up to the encoder to generate the DLT.
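The example of FIG. 3 can be reproduced with a minimal sketch. The function names are hypothetical, and storing the DLT entries in ascending order of depth value is an assumption that matches the example (values 50, 108, 110, 112 and 200 with indexes 0 to 4) but is not stated explicitly in the text.

```python
def build_dlt(depth_samples):
    """Collect the distinct depth values that appear, in ascending order (assumed)."""
    return sorted(set(depth_samples))


def depth_to_index_map(dlt):
    """Map each valid depth value to its DLT index."""
    return {value: index for index, value in enumerate(dlt)}


if __name__ == "__main__":
    # Depth samples of a picture in which only 5 of the 256 possible values appear.
    picture = [50, 200, 108, 110, 112, 50, 200, 110]
    dlt = build_dlt(picture)
    to_index = depth_to_index_map(dlt)
    print(dlt)
    print([to_index[d] for d in picture])
```

With only 5 entries, an index needs 3 bits instead of the 8 bits needed for a raw depth value in the range 0 to 255, which is the compactness the DLT is intended to provide.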
According to the current HTM, as many as 24 or more depth pictures in a sample picture set for a view are analyzed before the encoding process starts. All the depth values appearing in the sample picture set are included in the DLT for this view. This approach imposes a high encoding latency and cannot adapt well to a dynamic environment such as a scene change.
It is desirable to develop a method for IC and/or DLT coding that does not suffer from long latency for the IC and/or DLT design at the encoder side.