1. Technical Field
The present invention relates to a device and a method for estimating a depth map, and to a method for generating an intermediate image and a method for encoding a multi-view video using the same. More particularly, the present invention relates to a device and a method for estimating a depth map that reduce errors and complexity and are resistant to external influence. To this end, an area is divided into segments on the basis of similarity, a segment-unit initial depth map is acquired by using a three-dimensional warping method and a self adaptation function in which an extended gradient map is reflected, and the initial depth map is refined by performing a belief propagation method by the segment unit. The present invention further relates to a method for generating an intermediate image and a method for encoding a multi-view video that achieve smoother view switching and improved encoding efficiency by generating an intermediate image from the depth map and utilizing the intermediate image for encoding the multi-view video.
2. Related Art
As digital technology develops and different types of broadcasting media come into use through the fusion of broadcasting with communication, new broadcasting-related additional services that exploit the characteristics of digital technology are being created. Television is developing toward higher resolution and larger screens, but the TV screen itself remains two-dimensional. Therefore, a three-dimensional effect cannot be experienced through an existing screen.
Three-dimensional video processing technology is a core technology of the future IT service field. Competition to develop it has become keen with the progress toward an information-based industrial society. Three-dimensional video processing technology is an essential element for providing a high-quality image service in multimedia applications. Currently, in addition to the IT field, it is applied to various fields such as broadcasting, medical services, education, the military, games, and virtual reality. Moreover, it is becoming established as a core fundamental technology of future realistic three-dimensional multimedia that is commonly required in these fields. Therefore, research on three-dimensional video processing technology is being actively pursued, mainly in developed countries.
In general, there are two ways to define a three-dimensional video. First, the three-dimensional video may be defined as a video in which a user senses three-dimensional depth, because depth information is applied to an image so that a part of the image appears to project from the screen. Second, the three-dimensional video may be defined as a video in which the image becomes realistic to the user because the user is provided with multiple views. The three-dimensional video may be classified into a stereoscopic type, a multi-view type, an IP (Integral Photography) type, a multiple-view (omni and panorama) type, a hologram type, etc. according to the acquisition method, the depth impression, the display type, etc. Methods of representing the three-dimensional video generally include an image-based representation and a mesh-based representation.
Recently, depth image-based rendering (DIBR) has come into the spotlight as a method of representing the three-dimensional video. Depth image-based rendering is a method of creating scenes in different views by using reference images that carry information such as a depth or an angle difference for each corresponding pixel. Depth image-based rendering can easily render the shape of a three-dimensional model that is difficult and complex to represent, and it allows signal processing methods such as general image filtering to be applied, so that a high-quality three-dimensional video can be generated. Depth image-based rendering uses a depth image and a texture image acquired by a depth camera and a multi-view camera.
The depth image is an image that represents, in black-and-white (gray-level) values, the distance between an object positioned in a three-dimensional space and the camera photographing the object. The depth image is mainly used, together with depth information and camera parameters, in three-dimensional reconstruction technology or three-dimensional warping technology. The depth image is also applied to a free-view TV and a three-dimensional TV. The free-view TV allows a user to view the image not only in one fixed view but also in a view chosen according to the user's selection. The three-dimensional TV implements a realistic image by adding the depth image to an existing two-dimensional TV. The three-dimensional TV has been actively researched and developed in recent years.
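The three-dimensional warping mentioned above can be illustrated for a single pixel. The following is a minimal sketch only, assuming an ideal pinhole camera model, a target camera that shares the reference camera's intrinsic parameters, and a rigid transform of the form X_target = R * X_reference + t. The function names and the numeric values are hypothetical and are not taken from the present description.

```python
# Minimal sketch of three-dimensional warping for one pixel.
# Assumptions (not from the source text): pinhole camera, shared
# intrinsics (f, cx, cy), rigid transform X_target = R * X_reference + t.

def backproject(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the reference camera frame."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)


def transform(point, R, t):
    """Express a 3D point in the target camera frame: R * point + t."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


def project(point, f, cx, cy):
    """Project a 3D point in a camera frame onto that camera's image plane."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def warp_pixel(u, v, depth, f, cx, cy, R, t):
    """Warp one reference-view pixel into the target view using its depth."""
    point_ref = backproject(u, v, depth, f, cx, cy)
    point_tgt = transform(point_ref, R, t)
    return project(point_tgt, f, cx, cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical setup: the target camera sits 0.1 units to the right of the
# reference camera with the same orientation, so t = (-0.1, 0, 0).
u2, v2 = warp_pixel(320, 240, 2.0, f=500.0, cx=320.0, cy=240.0,
                    R=IDENTITY, t=(-0.1, 0.0, 0.0))
```

In this parallel arrangement the pixel moves only horizontally, by f * baseline / depth = 25 pixels, which is the disparity that a stereo matching algorithm would try to recover. With a rotated or non-parallel target camera, the same functions produce a displacement that is no longer purely horizontal.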
In order to achieve smooth view switching in the free-view TV and the three-dimensional TV, an improved intermediate image should be generated, and it is therefore important to estimate the depth map accurately. A stereo matching algorithm is used to estimate the depth map. However, when the known stereo matching algorithm is used, many errors occur in the vicinity of pixels at which the depth value is discontinuous. These errors cause the boundary between objects to be duplicated or obscured when the intermediate image is generated. In addition, since the known stereo matching algorithm searches adjacent images only in the horizontal direction in order to determine a disparity value, only images acquired in a parallel camera configuration or subjected to a rectification process can be used as an input. Accordingly, this method is limited in estimating the depth map for a multi-view image acquired in various camera configurations, such as a parallel camera configuration and a circular camera configuration. Moreover, the known stereo matching algorithm is suitable for a stereo image because it searches for the disparity value on a pixel-by-pixel basis, but for a multi-view image, which has a larger amount of data than the stereo image, searching for the disparity value on a pixel-by-pixel basis produces many errors and increases complexity.
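The horizontal, pixel-unit disparity search described above can be sketched for one scanline. This is an illustrative sketch of a generic window-based matcher and not the specific known algorithm referred to in this description. The sum-of-absolute-differences cost, the window size, the function names, and the sample intensities are all assumptions.

```python
# Minimal sketch of a conventional horizontal disparity search on one
# rectified scanline pair. The SAD cost and the window size are assumptions.

def sad(left, right, x, d, half):
    """Sum of absolute differences between the window centered at x in the
    left scanline and the window centered at x - d in the right scanline."""
    total = 0
    for k in range(-half, half + 1):
        xl = min(max(x + k, 0), len(left) - 1)       # clamp at the borders
        xr = min(max(x + k - d, 0), len(right) - 1)
        total += abs(left[xl] - right[xr])
    return total


def scanline_disparity(left, right, max_disp, half=1):
    """Pick, for every pixel, the horizontal shift with the lowest SAD cost."""
    disparities = []
    for x in range(len(left)):
        best_d, best_cost = 0, None
        for d in range(max_disp + 1):
            cost = sad(left, right, x, d, half)
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Hypothetical intensities: a textured object appears two pixels further
# left in the right view than in the left view.
left = [10, 10, 10, 10, 80, 90, 70, 10, 10, 10]
right = [10, 10, 80, 90, 70, 10, 10, 10, 10, 10]
disp = scanline_disparity(left, right, max_disp=3)
```

Because the search runs only along the row, the sketch is meaningful only for parallel or rectified cameras, and every pixel requires max_disp + 1 window comparisons, which is the per-pixel cost that grows with the amount of data in a multi-view image. In the uniform background the costs of several shifts tie, so the chosen disparity there is essentially arbitrary.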