Three-dimensional (3D) video presents the viewer's left and right eyes with views of a scene taken from slightly different directions; it can thus provide a viewing experience with depth perception that conventional 2D video lacks. At present, the common 3D video display is the stereoscopic display, which provides two views of a video. With the development of multimedia technology, multi-view display is becoming increasingly popular in the multimedia information industry because it offers 3D visual perception to the naked eye. However, increasing the number of views multiplies the video data and in turn places a great burden on transmission and storage. An effective coding strategy is needed to solve these problems. The latest 2D video coding standard is the High Efficiency Video Coding (HEVC) standard, which was officially approved in 2013. Meanwhile, standardization of 3D video coding has been in progress.
The multi-view plus depth (MVD) format in 3D-HEVC includes two or three texture videos and their corresponding depth maps, as shown in FIG. 1. Depth maps can be regarded as a set of greyscale image sequences in which the luminance component denotes the depth value. Depth maps can therefore effectively represent the location of objects in three-dimensional space. With depth values, depth image based rendering (DIBR) technology can be applied to synthesize virtual views between the original two or three views, which effectively addresses the large data volume caused by the increase in the number of views.
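As an illustration of how a luminance depth value can drive view synthesis, the following minimal sketch converts an 8-bit depth value to a scene depth and then to a horizontal disparity for a rectified, parallel camera pair. The inverse-depth quantization between a near and a far clipping plane is a convention commonly used with MVD data and is assumed here; it is not stated in the text above. The function names and the parameters z_near, z_far, focal_length and baseline are hypothetical.

```python
def depth_value_to_z(v, z_near, z_far, v_max=255):
    """Map a quantized depth value v (0..v_max) to a scene depth Z.

    Assumption: inverse depth is quantized linearly, so v = v_max maps to
    z_near (closest) and v = 0 maps to z_far (farthest).
    """
    inv_z = (v / v_max) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def disparity(v, focal_length, baseline, z_near, z_far):
    """Horizontal pixel shift between two parallel cameras: d = f * b / Z."""
    return focal_length * baseline / depth_value_to_z(v, z_near, z_far)


# Hypothetical camera setup: focal length in pixels, baseline and depths in metres.
for v in (0, 128, 255):
    print(v, disparity(v, focal_length=1000.0, baseline=0.05, z_near=1.0, z_far=100.0))
```

Under these assumptions a larger depth value means a closer object and therefore a larger pixel shift in the synthesized view, which is why errors in the depth map translate into geometric errors in the virtual view.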
Conventional video coding methods use rate distortion optimization techniques to make decisions in the motion vector and mode selection process by choosing the vector or mode with the least rate distortion cost. The rate distortion cost is calculated as J=D+λ·R, where J is the rate distortion cost, D is the distortion between the original data and the reconstructed data, λ is the Lagrangian multiplier, and R is the number of bits used. D is usually measured by calculating the sum of squared differences (SSD) or the sum of absolute differences (SAD) between the original data and the reconstructed data of the current video. Because depth maps are only used to synthesize virtual views and are not seen directly by the audience, applying the conventional video coding method to depth maps may not achieve satisfactory coding results. The distortion measure for depth maps also needs to consider distortions in the synthesized intermediate views.
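The cost J=D+λ·R and the selection of the least-cost candidate can be sketched as follows. This is a minimal illustration, not an encoder implementation: the Candidate class, the candidate names, the sample values and the bit counts are all hypothetical.

```python
from dataclasses import dataclass


def ssd(orig, recon):
    """Sum of squared differences between original and reconstructed samples."""
    return sum((a - b) ** 2 for a, b in zip(orig, recon))


def sad(orig, recon):
    """Sum of absolute differences between original and reconstructed samples."""
    return sum(abs(a - b) for a, b in zip(orig, recon))


@dataclass
class Candidate:
    name: str     # hypothetical label for a mode or motion vector
    recon: list   # reconstructed samples this candidate would produce
    bits: int     # R: number of bits needed to code this candidate


def rd_cost(orig, cand, lam, distortion=ssd):
    """J = D + lambda * R."""
    return distortion(orig, cand.recon) + lam * cand.bits


def best_candidate(orig, candidates, lam):
    """Return the candidate with the least rate distortion cost."""
    return min(candidates, key=lambda c: rd_cost(orig, c, lam))


original = [100, 102, 98, 101]
candidates = [
    Candidate("fine", [100, 101, 99, 101], bits=40),    # D = 2
    Candidate("coarse", [98, 104, 96, 103], bits=10),   # D = 16
]
print(best_candidate(original, candidates, lam=0.5).name)  # coarse (J = 21 vs 22)
print(best_candidate(original, candidates, lam=0.1).name)  # fine (J = 6 vs 17)
```

The two calls show the role of λ: a larger multiplier penalizes bits more heavily and favours the cheaper candidate, while a smaller one favours the candidate with lower distortion.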
In 3D-HEVC, Synthesized View Distortion Change (SVDC) is used as the distortion metric for rate distortion optimization in depth coding. SVDC is defined as the difference in distortion between two synthesized textures, as shown in FIG. 2. S′_{T,Ref} denotes a reference texture rendered from the original texture and the original depth map. S′_T denotes a texture rendered from the reconstructed texture and a depth map S_D, which is composed of the encoded depth map in already encoded blocks and the original depth map in the other blocks. S̃_T denotes a texture rendered from the reconstructed texture and a depth map S̃_D, which differs from S_D in that it contains the distorted depth map for the current block. SSD denotes the process of calculating the sum of squared differences. Therefore, the rate distortion optimization for depth maps in 3D-HEVC includes the view synthesis process.
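A minimal sketch of the SVDC computation follows. It assumes the usual formulation, in which SVDC is the SSD between the distorted synthesized texture and the reference minus the SSD between the current synthesized texture and the reference. The renderer toy_render is a hypothetical one-dimensional stand-in (integer pixel shift with simple hole filling), not the 3D-HEVC renderer, and all sample values are invented for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two sample rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_render(texture, depth, step=64):
    """Hypothetical 1-D renderer: shift each pixel left by depth // step.

    Later pixels overwrite earlier ones (no z-buffer); holes are filled
    with the nearest rendered value to the left (0 if there is none).
    """
    n = len(texture)
    out = [None] * n
    for x in range(n):
        target = x - depth[x] // step
        if 0 <= target < n:
            out[target] = texture[x]
    last = 0
    for x in range(n):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    return out


def svdc(rec_texture, depth_sd, depth_sd_tilde, ref_view, render=toy_render):
    """SVDC = SSD(render(rec, S~_D), ref) - SSD(render(rec, S_D), ref)."""
    s_t = render(rec_texture, depth_sd)              # current synthesized texture
    s_t_tilde = render(rec_texture, depth_sd_tilde)  # with distorted current block
    return ssd(s_t_tilde, ref_view) - ssd(s_t, ref_view)


orig_texture = [10, 20, 30, 40]
orig_depth = [0, 0, 64, 64]
rec_texture = [10, 20, 31, 40]

ref_view = toy_render(orig_texture, orig_depth)  # reference texture
depth_sd = [0, 0, 64, 64]        # current block (last two samples) still original
depth_sd_tilde = [0, 0, 0, 64]   # current block replaced by a distorted candidate

print(svdc(rec_texture, depth_sd, depth_sd_tilde, ref_view))  # 99
```

In this toy case a single wrong depth sample moves a pixel to the wrong position in the synthesized row, so the distortion change is far larger than the depth error alone would suggest.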
The original SVDC method includes warping, interpolating, hole filling and blending to obtain the synthesized views. The encoder then compares the two virtual views synthesized from the original depth maps and from the encoded depth maps, respectively. Finally, the sum of squared differences over the synthesized pixels is calculated. The whole process of the SVDC method is shown in FIG. 3. Because rendering is performed pixel by pixel, the method brings high coding complexity, which grows as video resolution increases. The present disclosure provides a fast rate distortion optimization method based on texture flatness for depth map coding in 3D-HEVC that decreases the coding complexity while largely maintaining the quality of the synthesized views.