Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. In contrast, multi-view video is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers a sense of realism.
The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. Accordingly, the multiple cameras will capture multiple video sequences corresponding to multiple views. In order to provide more views, more cameras have been used to generate multi-view video with a large number of video sequences associated with the views. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space or the transmission bandwidth.
A straightforward approach would be to apply conventional video coding techniques to each single-view video sequence independently, disregarding any correlation among different views. Such a coding system would be very inefficient. To improve the efficiency of multi-view video coding, typical multi-view video coding exploits inter-view redundancy. Therefore, most 3D Video Coding (3DVC) systems take into account the correlation of video data associated with multiple views and depth maps. The standard development body, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), extended H.264/MPEG-4 AVC to multi-view video coding (MVC) for stereo and multi-view videos.
MVC adopts both temporal and spatial predictions to improve compression efficiency. During the development of MVC, several macroblock-level coding tools were proposed, including illumination compensation, adaptive reference filtering, motion skip mode, and view synthesis prediction. These coding tools are designed to exploit the redundancy between multiple views. The multi-view/3D coding tools that utilize inter-view motion information are briefly reviewed as follows.
3D video coding has been developed for encoding/decoding video of multiple views simultaneously captured by several cameras. A multi-view video contains a large amount of inter-view redundancy, since all cameras capture the same scene from different viewpoints. In 3D-AVC (3D video coding based on the Advanced Video Coding (AVC) standard), depth-based motion vector prediction (DMVP) is a coding tool that utilizes the motion information of reference views or disparity vectors to further improve the accuracy of motion vector predictors. The DMVP tool consists of two parts, direction-separated MVP (DS-MVP) for the Inter mode and disparity-based Skip and Direct modes, which are described as follows.
Direction-Separated MVP (DS-MVP)
Conventional median-based MVP of H.264/AVC is restricted to identical prediction directions (i.e., temporal or inter-view) of motion vector candidates. All available neighboring blocks are classified according to the direction of their prediction.
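As a minimal sketch of this classification, the Python below labels each neighboring block by comparing its reference picture with the current picture. It assumes that an inter-view reference has the same POC as the current picture but a different view index, while a temporal reference lies in the same view at a different POC. The `Neighbor` record and both function names are hypothetical and are not taken from the 3D-AVC Test Model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Neighbor:
    mv: Optional[Tuple[int, int]]   # None: intra-coded or unavailable
    ref_poc: Optional[int]          # POC of the referenced picture
    ref_view: Optional[int]         # view index of the referenced picture

def prediction_direction(nb: Neighbor, cur_poc: int, cur_view: int) -> Optional[str]:
    """Return 'inter-view', 'temporal', or None if the neighbor has no motion."""
    if nb.mv is None or nb.ref_poc is None:
        return None
    if nb.ref_poc == cur_poc and nb.ref_view != cur_view:
        return "inter-view"   # same time instant, different view
    return "temporal"         # same view, different time instant

def filter_by_direction(neighbors: List[Neighbor], target: str,
                        cur_poc: int, cur_view: int) -> List[Optional[Neighbor]]:
    """Keep neighbors predicted in the target direction; mark the rest unavailable (None)."""
    return [nb if prediction_direction(nb, cur_poc, cur_view) == target else None
            for nb in neighbors]
```

Only the neighbors that survive the filter for the current block's own prediction direction take part in the median, which is what keeps DS-MVP from mixing temporal and inter-view vectors.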
Inter-View Prediction
FIG. 1B illustrates an exemplary flowchart of the inter-view prediction process used to derive an inter-view motion predictor. The disparity vector is derived according to 3D-AVC Test Model 7 (Dmytro Rusanovskyy, et al., Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 5th Meeting: Vienna, AT, 27 Jul.-2 Aug. 2013, document: JCT3V-E1003). As shown in FIG. 1A and FIG. 1B, if the current block Cb uses an inter-view reference picture, the neighboring blocks that do not utilize inter-view prediction are marked as unavailable for MVP. If the target reference picture is an inter-view prediction picture, the inter-view motion vectors of the adjacent blocks around the current block Cb, such as A, B, and C in FIG. 1A, are employed in the derivation of the motion vector prediction. If motion information is not available from block C, block D is used instead.
As shown in FIG. 1B, when the target reference picture is an inter-view prediction picture, the inter-view motion vectors of the neighboring blocks are used to derive the inter-view motion vector predictor. In block 110 of FIG. 1B, inter-view motion vectors of the spatially neighboring blocks are used as input. The depth map associated with the current block Cb is also provided in block 160. The availability of the inter-view motion vector for blocks A, B and C is checked in block 120. If an inter-view motion vector is unavailable, the disparity vector for the current block is used to replace the unavailable inter-view motion vector, as shown in block 130. The disparity vector is derived from the maximum depth value of the associated depth block, as shown in block 170. The median of the inter-view motion vectors of blocks A, B and C is used as the inter-view motion vector predictor, as shown in block 140. Block D is used only when the inter-view motion vector associated with C is unavailable. Inter-view motion vector coding based on the motion vector predictor is performed, as shown in block 150.
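The inter-view MVP steps of FIG. 1B can be sketched as follows. This is an illustrative Python sketch rather than Test Model code: motion vectors are assumed to be integer (x, y) tuples, `None` marks an unavailable neighbor (including one that uses temporal prediction), and the disparity vector is passed in already derived from the maximum depth value. The median is taken component-wise, as in H.264/AVC median prediction; `median3` and `inter_view_mvp` are hypothetical names.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]

def median3(a: MV, b: MV, c: MV) -> MV:
    """Component-wise median of three motion vectors."""
    return (sorted((a[0], b[0], c[0]))[1], sorted((a[1], b[1], c[1]))[1])

def inter_view_mvp(mv_a: Optional[MV], mv_b: Optional[MV], mv_c: Optional[MV],
                   mv_d: Optional[MV], dv: MV) -> MV:
    """Inter-view MV predictor for a block that uses an inter-view reference."""
    if mv_c is None:
        mv_c = mv_d   # block D stands in for C only when C is unavailable
    # Block 130: any still-unavailable inter-view MV is replaced by the current DV.
    cands = [mv if mv is not None else dv for mv in (mv_a, mv_b, mv_c)]
    return median3(*cands)   # block 140
```

For example, `inter_view_mvp((4, 0), (6, 1), None, (5, 0), (10, 0))` takes D in place of C and returns the median `(5, 0)`.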
Inter (Temporal) Prediction
If Cb uses temporal prediction, neighboring blocks that use inter-view reference pictures are marked as unavailable for MVP. Motion vectors of the neighboring blocks marked as unavailable are replaced with the motion vector of a corresponding block in a reference view. The corresponding block is derived by applying a disparity vector (DV) to the coordinates of the current texture block. The disparity vector is derived as specified in 3D-AVC Test Model 7. If the corresponding block is not coded with inter prediction (i.e., no motion information is available), a zero vector is used. The flowchart of this process is depicted in FIG. 2A and FIG. 2B.
If the target reference picture is a temporal prediction picture, the temporal motion vectors of the adjacent blocks around the current block Cb, such as A, B, and C in FIG. 2A, are employed in the derivation of the motion vector prediction. In block 210 of FIG. 2B, temporal motion vectors of the spatially neighboring blocks are provided as input. The depth map associated with the current block Cb is also provided in block 260. The temporal motion vector of a neighboring block is checked in step 220. If the temporal motion vector of a neighboring block is unavailable, an inter-view motion vector is used, as shown in step 230. The inter-view motion vector is derived from the corresponding block located using a DV converted from the maximum disparity, as shown in step 270. The motion vector prediction is then derived as the median of the motion vectors of the adjacent blocks A, B, and C, as shown in step 240. Block D is used only when C is unavailable. The result is provided as the temporal MV predictor, as shown in step 250.
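A corresponding sketch of the temporal MVP of FIG. 2A and FIG. 2B is shown below, under the same assumptions as an illustrative Python model (integer (x, y) tuples, `None` for unavailable). The callback `ref_view_mv_at` is hypothetical: it is assumed to return the temporal MV stored at a sample position in the reference view, or `None` when that block is not inter-coded. One corresponding block, located by adding the DV to the current block's coordinates, supplies the substitute for every unavailable neighbor.

```python
from typing import Callable, Optional, Tuple

MV = Tuple[int, int]

def median3(a: MV, b: MV, c: MV) -> MV:
    """Component-wise median of three motion vectors."""
    return (sorted((a[0], b[0], c[0]))[1], sorted((a[1], b[1], c[1]))[1])

def temporal_mvp(mv_a: Optional[MV], mv_b: Optional[MV], mv_c: Optional[MV],
                 mv_d: Optional[MV], cur_xy: Tuple[int, int], dv: MV,
                 ref_view_mv_at: Callable[[int, int], Optional[MV]]) -> MV:
    """Temporal MV predictor for a block that uses a temporal reference."""
    if mv_c is None:
        mv_c = mv_d   # block D stands in for C only when C is unavailable
    x, y = cur_xy
    # Step 270: locate the corresponding block in the reference view via the DV.
    corr = ref_view_mv_at(x + dv[0], y + dv[1])
    # A corresponding block without motion information contributes a zero vector.
    substitute = corr if corr is not None else (0, 0)
    # Step 230: unavailable neighbors take the substitute MV.
    cands = [mv if mv is not None else substitute for mv in (mv_a, mv_b, mv_c)]
    return median3(*cands)   # step 240
```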
Disparity-Based Skip and Direct Modes
When the derived motion vector predictor provides a good prediction, the motion residue may be zero or very small, so that there is no need to transmit any motion information. This case is referred to as a Skip or Direct mode, in which no motion information needs to be coded. The motion information can be derived at the encoder and decoder sides through an identical process; therefore, only the prediction residue of the block needs to be coded. Furthermore, the reference block associated with the motion information may provide a prediction good enough for the current block that there is no need to transmit the block residue either. This case is referred to as a Skip mode. For the Direct mode, no motion information needs to be coded and only the block residue is coded. For inter-view prediction, the motion information for coding the current block Cb (310) in Skip/Direct modes is derived from the motion information of the corresponding block (340) in the base view. The correspondence between Cb and the corresponding block in the base view is established through a disparity vector (320) applied at the central sample (330) of block Cb, as shown in FIG. 3. The corresponding block (340) referenced by this vector (320) in the base view provides the motion information (reference index and motion vectors) for coding the current block Cb.
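The lookup of FIG. 3 can be sketched as a small coordinate computation: take the central sample of Cb, add the disparity vector, and find the block covering the resulting position in the base view. This Python sketch makes several assumptions not stated in the text: the DV is in integer sample units (real codecs use sub-sample precision), positions are clamped to the picture boundaries, and motion is stored on a grid of `grid` x `grid` samples. The function name and the `grid` parameter are hypothetical.

```python
from typing import Tuple

def corresponding_block_position(x0: int, y0: int, width: int, height: int,
                                 dv: Tuple[int, int], pic_w: int, pic_h: int,
                                 grid: int = 4) -> Tuple[int, int]:
    """Top-left of the base-view motion-grid unit covering Cb's centre shifted by the DV."""
    # Central sample (330) of the current block Cb (310).
    cx = x0 + width // 2
    cy = y0 + height // 2
    # Apply the disparity vector (320), clamped to the picture (assumption).
    rx = min(max(cx + dv[0], 0), pic_w - 1)
    ry = min(max(cy + dv[1], 0), pic_h - 1)
    # Snap to the assumed motion-storage grid to identify the corresponding block (340).
    return (rx // grid) * grid, (ry // grid) * grid
```

For a 16x16 block at (16, 16) and a DV of (-5, 0), the central sample (24, 24) maps to (19, 24), which falls in the grid unit starting at (16, 24).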
The disparity derivation procedure for the Skip/Direct mode according to 3D-AVC Test Model 7 is illustrated in FIG. 4. The depth associated with the current block Cb is received in step 410, and the maximum disparity of the depth is determined in step 420. The inter-view motion vector is then derived from the corresponding block located using a DV converted from the maximum disparity, as shown in step 430. If the corresponding block in the base view is not available, the direction-separated MVP derivation with reference index equal to zero is used. The inter-view MV is then used as the Skip/Direct candidate, as shown in step 440.
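The FIG. 4 procedure can be sketched in Python as follows. The depth-to-disparity conversion here is an assumption: it uses the widely used 8-bit inverse-depth quantization between a near plane `z_near` (depth value 255) and a far plane `z_far` (depth value 0), with focal length `f` in pixels and camera `baseline`, in floating point; a real codec would use integer arithmetic derived from the camera parameters. Rectified, horizontally aligned cameras are assumed, so the DV has no vertical component. `locate_motion` and `fallback` are hypothetical callbacks standing in for the corresponding-block lookup and the DS-MVP derivation with reference index zero.

```python
def depth_to_disparity(v, f, baseline, z_near, z_far):
    """Horizontal disparity (pixels) for an 8-bit depth sample v (assumed convention)."""
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f * baseline * inv_z

def skip_direct_candidate(depth_block, f, baseline, z_near, z_far,
                          locate_motion, fallback):
    """Skip/Direct candidate following the steps of FIG. 4 (sketch)."""
    # Steps 410-420: the largest depth value yields the maximum disparity.
    d_max = max(max(row) for row in depth_block)
    # Step 430: DV from the maximum disparity, rounded to whole samples.
    dv = (round(depth_to_disparity(d_max, f, baseline, z_near, z_far)), 0)
    motion = locate_motion(dv)
    # If the base-view corresponding block is unavailable, fall back to DS-MVP
    # with reference index 0; otherwise its motion is the candidate (step 440).
    return motion if motion is not None else fallback()
```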
In the current 3D-AVC Test Model version 9.0 (ATM-9.0), motion parameters such as the motion vector (MV) and reference picture index of the corresponding block in the base view or reference view are directly inherited by the current block. However, when the inherited MV is a conventional temporal MV rather than an inter-view MV, and the picture order count (POC) of the reference picture pointed to by the inherited reference picture index in the current view is not the same as the POC of the reference picture of the corresponding block in the base view (or reference view), the inherited reference picture index and the inherited MV for coding the current block may be misaligned. FIG. 5 illustrates an example of misalignment between the inherited reference picture index and the inherited MV for coding the current block. In FIG. 5, current block 512 of the current picture 510 is in a dependent view, and corresponding block 522 of reference picture 520 is in a base view or reference view. Block 512 inherits the motion parameters 524 (a motion vector and a reference picture index pointing to a reference picture 530). However, the corresponding reference picture 550, shown as a dashed box, is not in the reference picture list for the current block. The inherited reference picture index instead points to a reference picture 540 in list 1 for the current block, which has a different POC from the inherited reference picture (530). Likewise, when the POC of the reference picture of the reference block used to derive the motion parameters does not match any reference picture in the reference picture list of the corresponding direction (list0 or list1) of the current picture, the inherited reference picture index and the inherited MV for coding the current block are also misaligned.
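The two temporal-MV failure cases described above can be expressed as a simple check. This Python sketch only detects the condition; it is not a remedy taken from the Test Model. It assumes the current block's reference picture list for the relevant direction is given as a list of POCs, and the function name and return labels are hypothetical.

```python
from typing import List

def check_temporal_alignment(inherited_idx: int, cur_ref_pocs: List[int],
                             corr_ref_poc: int) -> str:
    """Classify whether an inherited (index, temporal MV) pair is consistent.

    cur_ref_pocs: POCs in the current block's reference list (list0 or list1).
    corr_ref_poc: POC of the picture referenced by the corresponding block.
    """
    if corr_ref_poc not in cur_ref_pocs:
        # The inherited MV points to a picture absent from the current list.
        return "poc-absent"
    if 0 <= inherited_idx < len(cur_ref_pocs) and cur_ref_pocs[inherited_idx] == corr_ref_poc:
        return "aligned"
    # The picture exists, but the inherited index selects a different POC.
    return "index-mismatch"
```

For instance, inheriting index 0 when the current list is `[4, 8]` and the corresponding block referenced POC 8 yields `"index-mismatch"`, the situation of FIG. 5 where the index lands on a picture with a different POC.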
Similarly, when the inherited MV is an inter-view MV, reusing the inherited reference picture index and MVs may cause misalignment between the inherited reference picture index and the inherited MV for coding the current block. This is because the view index of the reference picture pointed to by the inherited reference picture index in the current view may not be identical to the view index of the reference picture of the corresponding block in the base view (or reference view). Even when the two view indices are identical, directly reusing the inherited MVs (which are DVs) will still cause misalignment, because the inherited MVs/DVs should be scaled according to the view distance.
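To make the view-distance point concrete, the Python sketch below rescales an inherited DV in the way temporal MVs are rescaled by POC distance. The text does not specify the scaling rule; this sketch assumes disparity is proportional to the difference in view index, which holds only for equally spaced, parallel cameras. The function name and its parameters are hypothetical.

```python
from typing import Tuple

def scale_dv(dv: Tuple[int, int], cur_view: int, cur_ref_view: int,
             corr_view: int, corr_ref_view: int) -> Tuple[int, int]:
    """Rescale an inherited DV by the ratio of view distances (assumed linear model)."""
    src = corr_view - corr_ref_view   # view distance spanned by the inherited DV
    dst = cur_view - cur_ref_view     # view distance needed by the current block
    if src == 0:
        raise ValueError("inherited DV spans zero view distance")
    return (round(dv[0] * dst / src), round(dv[1] * dst / src))
```

Under this model, a DV of (8, 0) inherited from a block that referenced a neighboring view would need to become (16, 0) when the current block's reference picture is two views away.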
The problem becomes even worse when the inherited reference picture index is not valid for the current block. FIG. 6 illustrates an example where the inherited reference picture index exceeds the maximum reference picture index of the current block. In FIG. 6, current block 612 of the current picture 610 is in a dependent view, and corresponding block 622 of reference picture 620 is in a base view or reference view. Block 612 inherits the motion parameters 624, including the motion vector and a reference picture index equal to 1. However, the inherited reference picture index, corresponding to reference picture 640 shown as a dashed box, points beyond the last picture in the reference picture list for the current block (in this example, the size of the reference picture list for the current block is 1; therefore, the maximum reference picture index is 0). In this case, the encoder/decoder may crash due to a memory fault caused by the invalid reference picture index. Therefore, it is desirable to design a system that can avoid these issues.
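The FIG. 6 hazard is an out-of-range array access, which a bounds check on the inherited index exposes. The Python sketch below is illustrative only and does not describe how the Test Model handles the case: the reference list is modeled as a list of POCs, and `fetch_inherited_reference` is a hypothetical helper. In C, the unchecked read would touch memory past the list, producing the crash described above; Python would raise `IndexError` instead, or silently wrap around for negative indices, which the explicit check also rules out.

```python
from typing import List, Optional

def fetch_inherited_reference(ref_list: List[int], inherited_idx: int) -> Optional[int]:
    """Return the referenced picture (modeled as its POC), or None if the index is invalid."""
    if 0 <= inherited_idx < len(ref_list):
        return ref_list[inherited_idx]
    # Index past the end of the list (e.g. index 1 with a one-entry list, as in FIG. 6).
    return None
```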