1. Field
The following description relates generally to video coding, and more particularly, to a method and apparatus for encoding and decoding a depth map for images, and a 3D video coding method using the same.
2. Description of the Related Art
Due to increased interest in realistic media, research into realistic media is being actively conducted. Realistic media allows users to see, hear, and feel virtual environments as if they were the real world. Through realistic media, users can experience the same sense of reality and immersion in a virtual environment as in the real world. Realistic media is expected to be applied to various fields including advertising, exhibition, education, and medical treatment, as well as the broadcasting and communications fields.
One example of realistic media is multi-view pictures. Multi-view pictures include images that are acquired from multiple view-points with respect to a scene. A user can observe an object from various angles through multi-view images of the object. Representative video coding schemes for multi-view images include Multi-view Video Coding (MVC) and 3D Video Coding.
MVC has been developed to efficiently encode/decode multiple view images acquired from a plurality of cameras (for example, 4 or 8 cameras). MVC provides users with only the view-points of the cameras used for image capturing. Accordingly, in order for MVC to provide multi-view images with more view-points using encoded image data, the use of more cameras having different view-points is inevitable. However, increasing the number of cameras increases the amount of image data that must be processed, and since existing delivery media (broadcasting media, communication media, storage media, etc.) have bandwidth restrictions or storage capacity limitations, MVC can deliver only a limited number of view-points to users and cannot provide any images with view-points that differ from the view-points of the cameras.
3D video coding has been developed to overcome the drawback of MVC. 3D video coding supports creation of images with virtual view-points, as well as images with cameras' view-points. For the creation of images with virtual view-points, 3D video coding codes a depth map in addition to coding images with N view-points. Images with virtual view-points may be created by performing view-interpolation on images acquired from cameras having adjacent view-points and depth maps corresponding to the acquired images. 3D video coding supports a 3D display system such as Free View-Point Television (FTV).
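The warping step that underlies such view interpolation can be illustrated with a minimal sketch. It is not taken from this description: it assumes rectified, parallel cameras, so that a point at distance Z shifts horizontally by focal_px * baseline / Z pixels in the virtual view, and it processes a single scanline. The names warp_scanline, focal_px, and baseline are hypothetical. A full view interpolation would additionally merge the views warped from adjacent cameras and fill the remaining holes.

```python
def warp_scanline(colors, distances, focal_px, baseline):
    """Forward-warp one scanline to a virtual camera shifted by `baseline`.

    Assumption (not from the description): rectified, parallel cameras,
    so a point at distance Z moves horizontally by
    focal_px * baseline / Z pixels (its disparity).
    Distances are assumed to be strictly positive.
    Returns a list with None where no source pixel landed (a hole).
    """
    width = len(colors)
    out = [None] * width
    out_z = [float("inf")] * width  # z-buffer: keep the nearest surface
    for x, (color, z) in enumerate(zip(colors, distances)):
        target = x - round(focal_px * baseline / z)
        # Write the pixel only if it lands inside the scanline and is
        # nearer to the camera than whatever was written there before.
        if 0 <= target < width and z < out_z[target]:
            out[target] = color
            out_z[target] = z
    return out


# Pixel "d" is nearer (distance 5) than the others (distance 10), so it
# shifts further and occludes "c" in the virtual view.
virtual = warp_scanline(["a", "b", "c", "d"], [10, 10, 10, 5],
                        focal_px=10, baseline=1)
```

In this sketch, positions that receive no source pixel stay None, which is why images from cameras with adjacent view-points are needed to fill them.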
FIG. 1 is a view for explaining an example of rendering through an FTV system to which 3D video coding is applied. In the FTV system illustrated in FIG. 1, 5 cameras are used. Upon coding, the FTV system encodes image information acquired from the 5 cameras and depth information of each image. Then, the FTV system decodes the encoded image information and depth information, and uses the decoded image information and depth information for each camera's view-point to render an image with the camera's view-point and images with different view-points, that is, images with arbitrary view-points that are within a fan-shaped region (see FIG. 1).
In summary, 3D video coding, which can be applied to FTV systems, does not need all multi-view images to be acquired through cameras. Accordingly, while providing multi-view images with more view-points, 3D video coding is less constrained by bandwidth restrictions or storage capacity limitations than MVC. Furthermore, 3D video coding makes it possible to provide images with a user's desired view-points without particular restrictions.
However, MVC encodes/decodes only image information of images with specific view-points, whereas 3D video coding has to encode/decode depth maps as well as image information. That is, 3D video coding further requires processes to create and encode/decode depth maps and to reconstruct images with virtual view-points using the depth maps. Most research currently in progress on 3D video coding is focused on the creation of depth maps, rather than on the encoding/decoding of depth maps. This is because the encoding/decoding of depth maps has been considered to be sufficiently covered by existing methods (hereinafter referred to as image information coding methods) for coding image information (brightness, chrominance, etc.).
A depth map is a map which represents distances from a camera to objects at a certain view-point. The distances from the camera to objects may depend on the locations of the objects, that is, the spatial locations of the objects in the corresponding image. The depth map may also be represented in units of pixels, like image information. For example, a depth map may be created by expressing the distances from a camera to objects with a predetermined number of bits at the resolution of a current image.
However, since the real distances from a camera to objects may vary greatly from frame to frame, the distance from the camera to the object represented by each pixel, that is, the depth information, may be expressed by a relative value rather than an absolute value. For example, the nearest point Znear and the furthest point Zfar from the camera among the objects represented by the pixels of the same frame are measured, and the depth information is then expressed by relative values defined within the range between Znear and Zfar. For instance, if depth information is expressed by 8 bits, a pixel displaying an object at the nearest point Znear from the camera is set to have a value of “255”, a pixel displaying an object at the furthest point Zfar from the camera is set to have a value of “0”, and pixels displaying objects at all other distances between Znear and Zfar are set to have predetermined values between “0” and “255” based on their distances to the camera.
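The relative representation described above can be sketched as follows. The description does not state how the intermediate values are assigned, so this sketch assumes a linear mapping between Znear and Zfar; other mappings (for example, one based on inverse distance) are equally possible. The function names quantize_depth and depth_map_from_distances are hypothetical.

```python
def quantize_depth(z, z_near, z_far, bits=8):
    """Map a distance z to an integer depth level.

    The nearest point maps to the maximum level (255 for 8 bits) and the
    furthest point maps to 0, as in the description. The linear mapping
    in between is an assumption made for this sketch.
    """
    if z_far <= z_near:
        raise ValueError("z_far must be greater than z_near")
    max_level = (1 << bits) - 1
    z = min(max(z, z_near), z_far)  # clamp into [z_near, z_far]
    return round(max_level * (z_far - z) / (z_far - z_near))


def depth_map_from_distances(distances, bits=8):
    """Build a depth map for one frame from a 2D list of distances.

    Znear and Zfar are measured on the frame itself, so the resulting
    levels are relative to this frame only.
    """
    flat = [z for row in distances for z in row]
    z_near, z_far = min(flat), max(flat)
    return [[quantize_depth(z, z_near, z_far, bits) for z in row]
            for row in distances]


# A 2x2 frame whose distances range from 1.0 (nearest) to 5.0 (furthest).
depth_map = depth_map_from_distances([[1.0, 2.0], [4.0, 5.0]])
```

Because Znear and Zfar are measured per frame, the same level can correspond to different real distances in different frames; a decoder would need the values of Znear and Zfar to recover absolute distances.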
In this manner, a depth map represents distance information based on the distances between real objects and the camera. Meanwhile, in existing video encoding/decoding, only image information, such as brightness or chrominance information, RGB values, and so on, is subjected to encoding/decoding. Depth information and image information for a certain image may show similar characteristics, considering that an object displayed in the image may remain constant in its distance to the camera, in brightness, in color, etc., but there are many cases where depth information has little correlation with image information. For example, objects (or different sides of an object) that are represented with different values of image information may be represented with the same or similar depth information. Conversely, objects (or different sides of an object) that are represented with the same image information, such as brightness or chrominance, may be represented with different values of depth information.
However, since existing video coding methods have been developed with the aim of efficient compression in consideration of the characteristics of image information, they are not easily applied to efficiently encode depth maps, in which the depth information has characteristics different from those of the image information. When an image with a view-point different from a camera's view-point is reconstructed using a depth map, the accuracy of the depth map may have a direct influence on the quality of the finally reconstructed image. For these reasons, in order to maximize the advantage of 3D video coding over MVC, there is a need to develop a method of encoding/decoding depth maps efficiently in consideration of the unique characteristics of depth maps.