The present disclosure relates to a hole filling method using estimated spatio-temporal background information, and a recording medium and apparatus for performing the same.
With industrial and academic advances in the three-dimensional (3D) video field, much research has been conducted on various systems and display devices that provide 3D content. Research has also been conducted on systems and display devices that enable a user to experience virtual reality without equipment such as 3D glasses.
To this end, a method of composing a virtual-viewpoint image through depth-image-based rendering, in which the virtual-viewpoint image is synthesized from actual viewpoint images, has been proposed to provide a free viewpoint. Depth-image-based rendering uses 3D warping, and thus holes are created in the virtual-viewpoint image. A small hole is created by an estimation error in a depth value, while a large hole is created where a region that is hidden in an actual viewpoint image becomes exposed in the virtual-viewpoint image, so that no actual pixel data exists for it.
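The formation of a large hole during warping can be illustrated with a minimal one-dimensional sketch. It assumes integer horizontal disparities in place of full 3D warping, and the names `warp_row` and `HOLE` are hypothetical. Pixels are written from far to near, so a nearer pixel (larger disparity) overwrites a farther one that lands on the same target position; positions that receive no sample remain holes.

```python
import numpy as np

HOLE = -1  # hypothetical marker for target pixels that receive no sample


def warp_row(texture, disparity):
    """Forward-warp one image row by per-pixel integer disparities.

    Simplified stand-in for 3D warping: each pixel x moves to x + disparity[x].
    Pixels are processed far-to-near (ascending disparity) so that nearer
    pixels overwrite farther ones; unfilled target positions stay HOLE.
    """
    texture = np.asarray(texture)
    disparity = np.asarray(disparity)
    width = len(texture)
    out = np.full(width, HOLE, dtype=int)
    order = np.argsort(disparity, kind="stable")  # farthest pixels first
    for x in order:
        tx = x + int(disparity[x])
        if 0 <= tx < width:
            out[tx] = texture[x]
    return out


# Background (value 10, disparity 0) with a foreground object (value 90,
# disparity 2) at positions 3..5. Shifting the foreground exposes positions
# 3 and 4, which were hidden behind it in the actual viewpoint.
print(warp_row([10, 10, 10, 90, 90, 90, 10, 10], [0, 0, 0, 2, 2, 2, 0, 0]))
# -> [10 10 10 -1 -1 90 90 90]
```

In this sketch the width of the exposed hole equals the disparity difference between the foreground and the background, which is why holes grow with depth discontinuities.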
An interpolation method and an in-painting method have been proposed as representative methods of filling such holes.
However, according to the interpolation method, geometric distortion and blurring occur along a boundary between a background region and a foreground region, and the blurring becomes more severe as the hole region increases in size.
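The blurring along the background/foreground boundary can be seen in a minimal sketch that fills holes by linear interpolation along a row. This is an illustrative assumption of a simple interpolation filler, not a specific prior-art method; the function name `interpolate_holes` is hypothetical.

```python
import numpy as np


def interpolate_holes(row, hole_value=-1):
    """Fill hole pixels by linear interpolation between known neighbours.

    Hypothetical helper: every pixel equal to hole_value is replaced by a
    value interpolated from the nearest known pixels on either side.
    """
    row = np.asarray(row, dtype=float)
    known = row != hole_value
    xs = np.arange(len(row))
    filled = row.copy()
    filled[~known] = np.interp(xs[~known], xs[known], row[known])
    return filled


# A disocclusion hole between background (10) and foreground (90).
filled = interpolate_holes([10, 10, 10, -1, -1, 90, 90, 90])
# Positions 3 and 4 receive about 36.7 and 63.3: a mix of background and
# foreground, although the exposed region should contain background only.
```

Because every interpolated pixel is a weighted mix of the two sides, a wider hole simply produces a longer ramp of mixed values, consistent with the blurring growing as the hole region increases in size.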
The in-painting method, on the other hand, fills holes by using the characteristics of the known region neighboring an unknown region, on the premise that the two regions share similar statistical properties or geometric structures.
It has been found that the in-painting method can fill holes effectively when combined with depth information that distinguishes a background region from a foreground region. However, its hole filling performance is limited when little information is available about the background and foreground regions within a hidden region.
Therefore, to create a satisfactory composite image from a virtual viewpoint, the accuracy with which the background region and the foreground region are separated in a hidden region is very important.
Various studies have been conducted on performing a hole filling process using temporal information to separate a foreground region and a background region in a hidden region.
For example, a method of determining a global threshold by using a background sprite of a depth image and separating a background region and a foreground region by using the global threshold has been proposed.
This method passively selects a hole filling operation for the separated foreground and background regions and then applies in-painting to them, so its hole filling performance varies greatly depending on the in-painting order.
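Separation by a global depth threshold can be sketched as follows. The source does not state how the global threshold is derived from the background sprite, so this sketch substitutes Otsu's method as an assumption, and it assumes an 8-bit depth map in which larger values are nearer to the camera. The names `otsu_threshold` and `separate_by_global_threshold` are hypothetical.

```python
import numpy as np


def otsu_threshold(depth):
    """Otsu threshold of an 8-bit depth image (assumed stand-in method).

    Returns t maximising the between-class variance of the classes
    {depth <= t} and {depth > t}.
    """
    hist = np.bincount(np.asarray(depth, dtype=np.uint8).ravel(), minlength=256)
    prob = hist.astype(float) / hist.sum()
    omega = np.cumsum(prob)                  # probability of class {<= t}
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean of class {<= t}
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # empty class: no valid split
    return int(np.argmax(sigma_b))


def separate_by_global_threshold(depth_sprite):
    """Split a depth background sprite into foreground/background masks.

    Assumes larger depth values are nearer, so foreground = depth > t.
    """
    t = otsu_threshold(depth_sprite)
    foreground = np.asarray(depth_sprite) > t
    return t, foreground, ~foreground


depth = np.array([[20, 20, 200], [20, 200, 200]], dtype=np.uint8)
t, fg, bg = separate_by_global_threshold(depth)
# t is 20; fg marks the three pixels with depth 200.
```

A single threshold applies to the whole frame, so any object whose depth lies on the wrong side of it is misclassified regardless of local context.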
As another example, a method of estimating global movement between consecutive frames in a group of pictures (GOP) and determining the in-painting order by using the updated temporal information has been proposed.
This method introduces a frame delay because its display order differs from its hole filling order, and it produces serious geometric distortion when objects in a frame move differently from one another.
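Global movement between consecutive frames can be estimated in several ways; the sketch below uses phase correlation as an assumed stand-in, not necessarily the technique of the cited method. It recovers a single integer translation for the whole frame, which also illustrates why such a model breaks down when objects in the frame move differently. The function name `global_shift` is hypothetical.

```python
import numpy as np


def global_shift(prev, curr):
    """Estimate one global integer translation (dy, dx) from prev to curr.

    Phase correlation: the normalised cross-power spectrum of two translated
    images is a pure phase ramp whose inverse FFT peaks at the shift.
    """
    f_prev = np.fft.fft2(prev)
    f_curr = np.fft.fft2(curr)
    cross = f_curr * np.conj(f_prev)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(0)
frame = rng.random((32, 32))
moved = np.roll(frame, shift=(2, 3), axis=(0, 1))
print(global_shift(frame, moved))  # -> (2, 3)
```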
More recently, methods that estimate consistent temporal background information and apply it to the hole filling process have been proposed to improve the accuracy of separating background and foreground regions.
As an example, a method of estimating a background region between consecutive virtual viewpoint images by using depth-image-based structural similarity and utilizing background information in the in-painting process has been proposed.
As another example, a Gaussian mixture model has been proposed for estimating a background sprite in a depth image.
However, these methods are limited in their ability to estimate background regions present in preceding images, which in turn limits how well in-painting can generate a satisfactory virtual viewpoint image.
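A Gaussian mixture model of the kind mentioned above can be sketched per pixel. The sketch assumes a simplified Stauffer-Grimson-style online update applied to one pixel's depth values over time; the class name `PixelDepthGMM`, its parameters, and the rule "background = component with the largest weight/sigma ratio" are illustrative assumptions rather than the cited method.

```python
import numpy as np


class PixelDepthGMM:
    """Hypothetical per-pixel mixture of K 1-D Gaussians over depth values.

    Simplified Stauffer-Grimson-style update (assumption): the mean and
    variance learning rate is taken equal to the weight learning rate alpha.
    """

    def __init__(self, k=3, alpha=0.05, init_var=225.0, min_var=4.0,
                 match_sigmas=2.5):
        self.alpha = alpha
        self.init_var = init_var
        self.min_var = min_var
        self.match_sigmas = match_sigmas
        self.w = np.zeros(k)               # weights (0 marks an unused slot)
        self.mu = np.zeros(k)              # component means
        self.var = np.full(k, init_var)    # component variances

    def update(self, x):
        dist = np.abs(x - self.mu) / np.sqrt(self.var)  # distance in sigmas
        matched = (self.w > 0) & (dist <= self.match_sigmas)
        if matched.any():
            i = int(np.argmin(np.where(matched, dist, np.inf)))
            self.w *= 1.0 - self.alpha      # decay every weight
            self.w[i] += self.alpha         # reinforce the matched component
            self.mu[i] += self.alpha * (x - self.mu[i])
            self.var[i] += self.alpha * ((x - self.mu[i]) ** 2 - self.var[i])
            self.var[i] = max(self.var[i], self.min_var)
        else:
            i = int(np.argmin(self.w))      # replace the weakest component
            self.w[i] = self.alpha
            self.mu[i] = x
            self.var[i] = self.init_var
        self.w /= self.w.sum()

    def background(self):
        """Mean of the component with the largest weight/sigma ratio."""
        score = np.where(self.w > 0, self.w / np.sqrt(self.var), -np.inf)
        return float(self.mu[int(np.argmax(score))])


model = PixelDepthGMM()
for d in [50] * 30 + [200] * 5 + [50] * 10:   # a foreground object passes by
    model.update(d)
print(model.background())  # -> 50.0
```

A briefly passing foreground creates only a low-weight component, so the background estimate is unaffected; a background seen only briefly before being covered would, conversely, lose weight and be forgotten.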
Accordingly, spatio-temporal information also needs to be considered to enhance accuracy of background information.
Similarity between textures, depth images, frames, or the like is used to estimate temporal background information.
For example, a codebook is used to estimate temporal similarity information by detecting codewords that correspond to texture and depth information.
However, the codebook has limitations in effectively and quickly estimating background pixels present in a preceding image because a fixed threshold value is used to select a codeword corresponding to background information.
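The effect of a fixed threshold on codeword selection can be illustrated with a minimal per-pixel codebook. The structure below (running-mean codewords, the fixed tolerance `match_eps`, and the fixed count `bg_min_count` used to select background codewords) is a hypothetical simplification, as is the assumption that a smaller depth value means farther from the camera.

```python
from dataclasses import dataclass, field


@dataclass
class Codeword:
    texture: float
    depth: float
    count: int = 1


@dataclass
class PixelCodebook:
    match_eps: float = 10.0   # hypothetical fixed matching tolerance
    bg_min_count: int = 20    # hypothetical fixed background-selection threshold
    words: list = field(default_factory=list)

    def update(self, texture, depth):
        """Match the sample to a codeword or create a new one."""
        for cw in self.words:
            if (abs(texture - cw.texture) <= self.match_eps
                    and abs(depth - cw.depth) <= self.match_eps):
                cw.count += 1
                # Running-mean update of the matched codeword.
                cw.texture += (texture - cw.texture) / cw.count
                cw.depth += (depth - cw.depth) / cw.count
                return
        self.words.append(Codeword(float(texture), float(depth)))

    def background(self):
        """Farthest codeword seen at least bg_min_count times, else None.

        Assumes a smaller depth value means farther from the camera.
        """
        candidates = [cw for cw in self.words if cw.count >= self.bg_min_count]
        if not candidates:
            return None
        return min(candidates, key=lambda cw: cw.depth)


# Background (texture 100, depth 30) is visible for only 5 preceding frames,
# then a foreground object (texture 180, depth 200) covers the pixel.
cb = PixelCodebook()
for _ in range(5):
    cb.update(100, 30)
for _ in range(40):
    cb.update(180, 200)
# cb.background() returns the foreground codeword (depth 200): the briefly
# seen background never reaches the fixed count and is not selected.
```

With a fixed count, a background that appeared only briefly in preceding images cannot be recovered, which matches the limitation described above.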