Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are located so that each camera captures the scene from one viewpoint. Multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such straightforward techniques would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding always exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras. A disparity model, such as an affine model, is used to indicate the displacement of an object between two view frames. Furthermore, the motion vector for frames in one view can be derived from the motion vector for the respective frames in another view.
For 3D video, besides the conventional texture data associated with multiple views, depth data is often captured or derived as well. The depth data may be captured for video associated with one view or multiple views. The depth information may also be derived from images of different views. The depth data may be represented at a lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction. To share the previously encoded texture information of reference views, the concept of disparity-compensated prediction (DCP) has been added as an alternative to motion-compensated prediction (MCP). MCP refers to an Inter picture prediction that uses previously coded pictures of the same view in a different access unit, while DCP refers to an Inter picture prediction that uses already coded pictures of other views in the same access unit. The vector used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP.
The depth-based motion vector prediction method in 3DV-ATM version 2 (Test Model for AVC based 3D Video Coding) consists of two major tools. The first tool is direction-separated motion vector prediction for Inter mode and the second tool is depth-based motion vector competition for Skip and Direct modes. The motion vector for a current block can be predicted from the candidate motion vectors associated with neighboring blocks. FIG. 1A illustrates an example of MVP (motion vector predictor) derivation based on neighboring blocks, where block Cb corresponds to a current block and blocks A, B and C correspond to three spatially neighboring blocks. If the target reference picture is a temporal prediction picture, the motion vectors of the spatially neighboring blocks (i.e., blocks A, B, and C) are provided and the motion vectors are derived based on the texture data of the respective blocks. If a temporal motion vector for a neighboring block is unavailable, a zero vector is used as the MV (motion vector) candidate. The temporal motion vector predictor is then derived based on the median of the motion vectors of the adjacent blocks A, B, and C.
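The temporal case described above can be sketched as follows. This is a minimal illustration rather than the 3DV-ATM reference code: it assumes the median is taken component-wise (as in H.264/AVC median prediction), and the function name `median_temporal_mvp` and the use of `None` to mark an unavailable motion vector are hypothetical conventions introduced for this example.

```python
from typing import Optional, Sequence, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components


def median_temporal_mvp(neighbor_mvs: Sequence[Optional[MV]]) -> MV:
    """Median of the temporal MVs of blocks A, B and C.

    An unavailable temporal MV is passed as None and replaced by the
    zero vector before the median is taken.
    Assumption: the median is computed per component.
    """
    if len(neighbor_mvs) != 3:
        raise ValueError("expected exactly three neighbors (A, B, C)")
    candidates = [mv if mv is not None else (0, 0) for mv in neighbor_mvs]
    xs = sorted(mv[0] for mv in candidates)
    ys = sorted(mv[1] for mv in candidates)
    # With three candidates, the middle element of each sorted list is the median.
    return (xs[1], ys[1])


# Block B has no temporal MV, so it contributes (0, 0).
print(median_temporal_mvp([(4, -2), None, (1, 3)]))
```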
On the other hand, if the target reference picture is an inter-view prediction picture, the inter-view motion vectors of the neighboring blocks are used to derive the inter-view motion vector predictor. In block 110 of FIG. 1B, inter-view motion vectors of the spatially neighboring blocks are derived based on the texture data of the respective blocks. The depth map associated with the current block Cb is also provided in block 160. The availability of the inter-view motion vector for blocks A, B and C is checked in block 120. If an inter-view motion vector is unavailable, the disparity vector for the current block is used to replace the unavailable inter-view motion vector as shown in block 130. The disparity vector is derived from the maximum depth value of the associated depth block as shown in block 170. The median of the inter-view motion vectors of blocks A, B and C is used as the inter-view motion vector predictor. The conventional MVP procedure is then applied, where a final MVP is derived based on the median of the motion vectors of the inter-view MVPs or temporal MVPs, as shown in block 140. Motion vector coding based on the motion vector predictor is performed as shown in block 150.
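The inter-view case (blocks 120, 130 and 170) can be sketched in the same way. The text does not specify how a depth value is converted to a disparity, so the conversion is passed in as a caller-supplied function; `depth_to_disparity`, the zero vertical component of the disparity vector, and the component-wise median are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) components


def disparity_from_max_depth(
    depth_block: Sequence[Sequence[int]],
    depth_to_disparity: Callable[[int], int],
) -> MV:
    """Disparity vector derived from the maximum depth value of the block."""
    max_depth = max(max(row) for row in depth_block)
    # Assumption: purely horizontal disparity (vertical component is 0).
    return (depth_to_disparity(max_depth), 0)


def inter_view_mvp(
    neighbor_mvs: Sequence[Optional[MV]],
    depth_block: Sequence[Sequence[int]],
    depth_to_disparity: Callable[[int], int],
) -> MV:
    """Median of the inter-view MVs of A, B and C.

    An unavailable inter-view MV (None) is replaced by the disparity
    vector of the current block instead of a zero vector.
    """
    if len(neighbor_mvs) != 3:
        raise ValueError("expected exactly three neighbors (A, B, C)")
    dv = disparity_from_max_depth(depth_block, depth_to_disparity)
    candidates = [mv if mv is not None else dv for mv in neighbor_mvs]
    xs = sorted(mv[0] for mv in candidates)
    ys = sorted(mv[1] for mv in candidates)
    return (xs[1], ys[1])


# Hypothetical depth-to-disparity mapping, chosen only to make the example run.
toy_conversion = lambda depth: depth // 4
depth_cb = [[10, 20], [30, 40]]
print(inter_view_mvp([(6, 0), None, (12, 1)], depth_cb, toy_conversion))
```

In a real codec the conversion would depend on camera parameters and the depth quantization range, which are outside the scope of this sketch.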
Flowcharts of the process for Depth-based Motion Competition (DMC) in the Skip and Direct modes according to 3DV-ATM version 2 are shown in FIG. 2A and FIG. 2B respectively. The inputs to the process include motion data 210 associated with blocks A, B and C, and depth map 220 associated with block Cb and blocks A, B and C. The block configuration of Cb, A, B and C is shown in FIG. 1A. In the Skip mode, motion vectors {mv_i} of texture data blocks {A, B, C} are separated into respective temporal and inter-view groups (step 212) according to their prediction directions. The DMC is performed separately for temporal MVs (step 214) and inter-view MVs (step 222).
For each motion vector mv_i within a given group (temporal or inter-view), a motion-compensated depth block d(cb, mv_i) is derived, where the motion vector mv_i is applied to the position of d(cb) to obtain the depth block from the reference depth map pointed to by the motion vector mv_i. The similarity between d(cb) and d(cb, mv_i) is then estimated according to equation (2):
$$SAD(mv_i) = SAD\big(d(cb, mv_i),\, d(cb)\big). \qquad (2)$$
The mv_i that achieves the minimum sum of absolute differences (SAD) within a given group is selected as the optimal predictor for the group in a particular direction (mvp_dir), i.e.
$$mvp_{dir} = \arg\min_{mvp_{dir}} \big( SAD(mv_i) \big). \qquad (3)$$
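The per-group selection of equations (2) and (3) can be sketched as below. This is an illustrative sketch under stated assumptions: motion vectors are integer-pel, depth maps are plain nested lists, and a displaced block that falls outside the reference depth map is treated as an error rather than padded. The helper names `fetch_block`, `sad` and `best_in_group` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

MV = Tuple[int, int]
Block = List[List[int]]


def fetch_block(depth_map: Sequence[Sequence[int]], x: int, y: int, w: int, h: int) -> Block:
    """Return the w-by-h block whose top-left corner is (x, y)."""
    if x < 0 or y < 0 or y + h > len(depth_map) or x + w > len(depth_map[0]):
        raise ValueError("displaced block falls outside the reference depth map")
    return [list(row[x:x + w]) for row in depth_map[y:y + h]]


def sad(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_in_group(
    cur_depth: Sequence[Sequence[int]],
    ref_depth_map: Sequence[Sequence[int]],
    pos: Tuple[int, int],
    mvs: Sequence[MV],
) -> Optional[Tuple[MV, int]]:
    """Pick the MV of one group (temporal or inter-view) with minimum SAD.

    Returns (mv, sad) or None if the group is empty.
    """
    x, y = pos
    h, w = len(cur_depth), len(cur_depth[0])
    best = None
    for mv in mvs:
        candidate = fetch_block(ref_depth_map, x + mv[0], y + mv[1], w, h)  # d(cb, mv_i)
        cost = sad(candidate, cur_depth)                                    # equation (2)
        if best is None or cost < best[1]:                                  # equation (3)
            best = (mv, cost)
    return best


ref = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
cur = [[6, 7], [10, 11]]
print(best_in_group(cur, ref, (0, 0), [(0, 0), (1, 1), (2, 0)]))
```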
The predictor in the temporal direction (i.e., mvp_tmp) competes against the predictor in the inter-view direction (i.e., mvp_inter). The predictor that achieves the minimum SAD can be determined according to equation (4) for the Skip mode (step 232):
$$mvp_{opt} = \arg\min_{mvp_{dir}} \big( SAD(mvp_{tmp}),\, SAD(mvp_{inter}) \big). \qquad (4)$$
Finally, if the optimal MVP mvp_opt refers to another view (inter-view prediction), the following check is applied to the optimal MVP. In the case that the optimal MVP corresponds to “Zero-MV”, the optimal MVP is replaced by the “disparity-MV” predictor (step 234), and the derivation of the “disparity-MV” predictor is shown in equation (1):
$$D(cb) = \frac{1}{N} \sum_{i} D\big(cb(i)\big), \qquad (1)$$
where i is the index of a pixel within the current block Cb and N is the total number of pixels in Cb. The final MVP is used for the Skip mode as shown in step 236.
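The Skip-mode competition of equation (4), followed by the “Zero-MV” check and the averaging of equation (1), might be sketched as follows. Several details are assumptions not stated in the text: ties are resolved in favor of the temporal predictor, the averaged disparity is rounded to an integer, and the “disparity-MV” has a zero vertical component. The names `disparity_mv_from_block` and `skip_mode_mvp` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

MV = Tuple[int, int]
Scored = Tuple[MV, int]  # (motion vector, SAD)


def disparity_mv_from_block(per_pixel_disparity: Sequence[Sequence[float]]) -> MV:
    """Equation (1): average of the per-pixel disparities D(cb(i)) over Cb."""
    values = [v for row in per_pixel_disparity for v in row]
    average = sum(values) / len(values)
    # Assumptions: rounding to integer precision, purely horizontal disparity.
    return (round(average), 0)


def skip_mode_mvp(
    best_tmp: Optional[Scored],
    best_inter: Optional[Scored],
    disparity_mv: MV,
) -> Tuple[str, MV]:
    """Equation (4) plus the Zero-MV replacement (steps 232 and 234)."""
    candidates = []
    if best_tmp is not None:
        candidates.append(("temporal", best_tmp[0], best_tmp[1]))
    if best_inter is not None:
        candidates.append(("inter-view", best_inter[0], best_inter[1]))
    if not candidates:
        raise ValueError("no predictor available in either direction")
    # min() keeps the first entry on ties, so temporal wins a tie (assumption).
    direction, mv, _ = min(candidates, key=lambda c: c[2])
    if direction == "inter-view" and mv == (0, 0):
        mv = disparity_mv  # replace Zero-MV by the disparity-MV predictor
    return direction, mv


dmv = disparity_mv_from_block([[4, 6], [8, 10]])
print(skip_mode_mvp(((3, 1), 15), ((0, 0), 9), dmv))
```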
The flowchart of MVP derivation for the Direct mode of B slices is illustrated in FIG. 2B, which is similar to that for the Skip mode. However, DMC is performed over both reference picture lists (i.e., List 0 and List 1) separately (step 242). Therefore, for each prediction direction (temporal or inter-view), DMC produces two predictors (mvp0_dir and mvp1_dir) for List 0 and List 1 respectively (step 244 and step 254). The bi-direction compensated blocks (step 246 and step 256) associated with mvp0_dir and mvp1_dir are computed according to equation (5):
$$d(cb, mvp_{dir}) = \frac{d(cb, mvp0_{dir}) + d(cb, mvp1_{dir})}{2}. \qquad (5)$$
The SAD value between this bi-direction compensated block and Cb is calculated according to equation (2) for each direction separately. The MVP for the Direct mode is then selected from the available mvp_inter and mvp_tmp (step 262) according to equation (4). If the optimal MVP mvp_opt refers to another view (i.e., the MVP corresponds to inter-view prediction), the following check is applied to the optimal MVP. If the optimal MVP corresponds to “Zero-MV”, the “Zero-MV” in each reference list is replaced by the “disparity-MV” predictor (step 264), and the derivation of the “disparity-MV” predictor is shown in equation (1). The final MVP is used for the Direct mode as shown in step 266.
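For the Direct mode, the averaging of equation (5) and the SAD that is then computed for one prediction direction can be sketched as below. The equation does not state how the division by two is rounded, so this sketch keeps fractional values; the names `average_blocks` and `bi_directional_sad` are hypothetical, and the two input blocks are assumed to have already been fetched from the List 0 and List 1 reference depth maps.

```python
from typing import List, Sequence


def average_blocks(
    block_list0: Sequence[Sequence[int]],
    block_list1: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Equation (5): sample-wise average of the two compensated depth blocks."""
    return [
        [(p + q) / 2 for p, q in zip(row0, row1)]
        for row0, row1 in zip(block_list0, block_list1)
    ]


def bi_directional_sad(
    cur_depth: Sequence[Sequence[int]],
    block_list0: Sequence[Sequence[int]],
    block_list1: Sequence[Sequence[int]],
) -> float:
    """SAD between the bi-direction compensated block and the depth of Cb."""
    averaged = average_blocks(block_list0, block_list1)
    return sum(
        abs(a - c)
        for row_a, row_c in zip(averaged, cur_depth)
        for a, c in zip(row_a, row_c)
    )


d_list0 = [[10, 20], [30, 40]]  # d(cb, mvp0_dir)
d_list1 = [[20, 20], [10, 40]]  # d(cb, mvp1_dir)
d_cb = [[14, 22], [20, 40]]     # d(cb)
print(bi_directional_sad(d_cb, d_list0, d_list1))
```

The value returned for the temporal direction and the one returned for the inter-view direction would then be compared as in equation (4).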
As shown above, the disparity vector derivation from depth information is quite complicated for the Skip and Direct modes according to 3DV-ATM version 2. Furthermore, the disparity vector derivation from depth information is different between Inter mode and Skip/Direct mode. It is desirable to simplify the derivation process without noticeable impact on the performance.