The current 3D Video Coding (3DVC) standard established by the Moving Picture Experts Group (MPEG) aims to deliver multi-view 3D visual effects within the bandwidth limits of the current digital transmission environment. Compared with multi-view video coding (MVC), 3DVC does not have to record information for a large number of views; instead, it builds multiple views by view synthesis, which saves a large amount of data.
The overall structure of 3DVC relies mainly on so-called “view synthesis” to synthesize multiple virtual-view images using only the texture images (real-view images) of a few frames and the depth maps corresponding to those texture images. Take the Depth Image Based Rendering (DIBR) algorithm as an example: DIBR can use three groups of information (each group being a real-view image plus its corresponding depth map) to produce nine different view images, including both real-view images and virtual-view images. From whichever angle the audience views the display, the three-dimensional image can be perceived as long as the left eye and the right eye each receive the corresponding view image.
A texture image is a real-view image captured by a camera, whereas the depth map may be regarded as the corresponding 8-bit grey-level image. The pixel values of the depth map (between 0 and 255) represent the distance of objects in the scene from the video camera. The depth map shows the relationships between objects in spatial coordinates, independent of the actual texture information of the objects themselves.
For example, in a texture image, the pixels that correspond to larger depth values (lighter in color) are attributed to foreground objects, and the pixels that correspond to smaller depth values (darker in color) are attributed to the background. Put simply, the view synthesis process can be regarded as determining how far each pixel of the real-view image moves into the virtual-view image through so-called “view warping.” The distance that each pixel of the texture image warps is decided by the pixel value at the corresponding pixel coordinate of the depth map, called the “depth value” for short. According to view synthesis theory, the greater the depth value corresponding to a texture image pixel, the larger the warping offset of that pixel.
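The relationship between depth value and warping offset can be illustrated with a minimal sketch. The linear mapping and the `max_offset` parameter below are assumptions chosen only for illustration; the text states only that a larger depth value gives a larger offset, and actual DIBR implementations derive the offset from camera parameters.

```python
def warp_offset(depth_value, max_offset=8):
    """Map an 8-bit depth value (0..255) to a warping offset in pixels.

    Assumption: the offset grows linearly with the depth value, and
    max_offset (hypothetical) is the offset of the nearest foreground pixel.
    """
    if not 0 <= depth_value <= 255:
        raise ValueError("depth value must be in the range 0..255")
    return round(max_offset * depth_value / 255)


# A light (foreground) pixel warps farther than a dark (background) pixel.
foreground = warp_offset(230)
background = warp_offset(20)
print(foreground, background)
```

Under this assumed mapping, a depth value of 0 yields no displacement and a depth value of 255 yields the full `max_offset`.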
During view synthesis, pixels with larger depth values warp a greater distance and pixels with smaller depth values warp a smaller distance. Because the warping offsets differ, some pixels in the virtual-view image may be left with no value. These empty pixels are called “holes.” In general, the hole information is marked in a so-called “hole mask” at the corresponding pixel coordinates. The subsequent procedure then uses the hole information as a reference for the hole filling algorithm. In general, when the foreground/background regions of the texture image and the depth map are compared and do not match, boundary noise will be formed in the synthesized image.
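How differing offsets leave holes can be sketched on a single image row. This is a simplified illustration, not the method of any particular DIBR implementation: the linear offset mapping, the rightward warp direction, the `max_offset` parameter, and the rule that the pixel with the larger depth value wins when two pixels land on the same coordinate are all assumptions made for the example.

```python
def warp_row(texture_row, depth_row, max_offset=4):
    """Forward-warp one row and return (virtual_row, hole_mask).

    hole_mask[x] is 1 where no source pixel landed (a hole), else 0.
    Assumptions: linear depth-to-offset mapping, rightward shift, and
    foreground (larger depth value) wins on collisions.
    """
    width = len(texture_row)
    virtual_row = [None] * width
    virtual_depth = [-1] * width  # -1 marks "nothing written yet"

    for x, (color, depth) in enumerate(zip(texture_row, depth_row)):
        target = x + round(max_offset * depth / 255)
        # Keep the pixel only if it lands inside the row and is closer
        # to the camera than whatever was written there before.
        if 0 <= target < width and depth > virtual_depth[target]:
            virtual_row[target] = color
            virtual_depth[target] = depth

    hole_mask = [1 if d < 0 else 0 for d in virtual_depth]
    return virtual_row, hole_mask


texture = list("abcdefgh")
depth = [0, 0, 0, 255, 255, 0, 0, 0]  # two foreground pixels in the middle
row, mask = warp_row(texture, depth, max_offset=2)
print(row)
print(mask)
```

In this example the two foreground pixels move two positions to the right and cover background pixels there, while the positions they vacate receive no value and are flagged in the hole mask as `[0, 0, 0, 1, 1, 0, 0, 0]`. A hole filling step would then use that mask to decide which pixels to fill.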