1. Field of the Invention
The present invention relates to a method and apparatus for interpolating a reference pixel in an annular image and for encoding/decoding an annular image.
2. Description of the Related Art
With the development of various digital technologies in hardware and software, the age of independent communication media has passed, and the Ubiquitous Age, in which people can receive any service online anywhere and at any time, is arriving. Users in the Ubiquitous Age desire to obtain and use information as freely as they use parts of their own bodies. As a prelude, interactive broadcasting and three-dimensional (3D) broadcasting are being researched and developed more actively than ever before. Interactive broadcasting and 3D broadcasting require 3D video camera systems such as stereoscopic camera systems, omnidirectional video systems, and multiview camera systems.
To standardize the compression of 3D images obtained from 3D video camera systems, the Moving Picture Experts Group (MPEG) has established 3D Audio Visual (3DAV) Exploration Experiments (EE)-1 through EE-4, which are currently under research. In 3DAV EE-1, research on omnidirectional-video compression is being performed, but only the topic of image transformation has been addressed.
Because conventional video compression methods such as MPEG-1, MPEG-2, MPEG-4, H.263, and H.264 were designed for two-dimensional (2D) images, they cannot be applied to the compression of 3D images, in particular to omnidirectional video. Unlike a general picture, an annular image created using a hyperboloid mirror contains 360° of view information and exhibits a unique circular distortion. Because of these characteristics, applying a conventional 2D video coding algorithm to an annular image degrades the efficiency of prediction and compression.
In a mirror-based omnidirectional camera system, an annular image is obtained by reflecting a scene off a mirror and capturing the reflected image with a camera. An annular image contains 360° of whole-view information. FIG. 1 illustrates an example of an annular image. The annular image can be captured using an omnidirectional image sensor, which, unlike conventional image sensors having a limited field of view (FOV), can receive 360° of whole-view information from the center of projection.
If a codec for general 2D images is applied to an annular image, the efficiency of prediction algorithms such as intraprediction and interprediction, which exploit spatial and temporal correlations, is degraded by the spatial distortion characteristic of the annular image, because the degree of distortion of the annular image is greater than that of a 2D image. For example, when an object moves vertically as shown in FIGS. 2A and 2B, the shape of the object is not distorted in a 2D image as long as the distance between the camera and the object remains constant. In an omnidirectional camera, however, the shape of the object is easily distorted by the characteristics of the hyperboloid mirror.
Since the annular image has a spatial distortion ratio related to the number π, spatial correlation degrades, significantly reducing the efficiency of interprediction. Since the shape of the object is not maintained but is severely distorted by temporal movement, as shown in FIGS. 2A and 2B, temporal correlation also degrades. For these reasons, it is difficult to match motion vectors accurately, and reference pixels having low spatial correlation are referred to during ½- or ¼-pixel interpolation, degrading coding efficiency.
In the following description, existing prediction methods will be introduced based on H.264, which is one of the 2D moving picture coding methods.
Interprediction creates a prediction model from at least one previously-encoded video frame or field using block-based motion compensation.
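Block-based motion compensation as described above rests on block matching: for each block of the current frame, a search over a previously coded reference frame finds the displacement that minimizes a distortion cost. The following is a minimal sketch using a full search with a sum-of-absolute-differences (SAD) cost; the function names, the 4×4 block size, and the search range are illustrative, not taken from the text.

```python
# Minimal sketch of block-based motion estimation: full search over a
# reference frame with a SAD cost. Frames are plain 2D lists of integers.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_search(cur, ref, bx, by, bsize=4, srange=2):
    """Find the integer motion vector (dx, dy) minimizing SAD for the
    bsize x bsize block of `cur` with top-left corner (bx, by)."""
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    best = (0, 0, float('inf'))
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + bsize > len(ref) or x + bsize > len(ref[0]):
                continue  # candidate block falls outside the reference frame
            cand = [row[x:x + bsize] for row in ref[y:y + bsize]]
            cost = sad(cur_block, cand)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best[:2]
```

A real codec would restrict the search pattern and reuse neighboring vectors as predictors, but the cost-minimization principle is the same.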
A partition or sub-partition of an inter-coded macroblock is predicted from a partition or sub-partition of the same size at a corresponding position in a reference image. The positions have ¼-pixel resolution for luminance components and ⅛-pixel resolution for chrominance components. Since samples corresponding to luminance and chrominance samples at sub-sample positions do not exist in the reference image, they are interpolated from adjacent coded samples.
Referring to FIG. 3A, a 4×4 block of the current frame is predicted from an area around the corresponding 4×4 block of a reference image. If both the horizontal and vertical components of the motion vector are integers (1, −1), appropriate reference block samples exist in the reference image, shown as gray dots in FIG. 3B. If one or both components of the motion vector are fractional (0.75, −0.5), the prediction samples, expressed by gray dots, are interpolated between adjacent samples (white dots) in the reference frame, as shown in FIG. 3C.
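A fractional motion vector such as (0.75, −0.5) above is stored in quarter-pel units and decomposed into an integer displacement plus a sub-pel fraction; because chroma has half the luma resolution, the same stored value acts as an eighth-pel chroma offset. A minimal sketch, with illustrative helper names:

```python
# Motion-vector components in quarter-pel units for luma:
# (0.75, -0.5) from the example above corresponds to (3, -2) in these units.

def split_quarter_pel(mv_q):
    """Split one quarter-pel MV component into (integer part, fraction 0..3).
    Arithmetic shift and masking keep negative values consistent."""
    return mv_q >> 2, mv_q & 3

def as_eighth_pel(mv_q):
    """Reinterpret the same value for chroma: since chroma has half the
    resolution, a quarter-pel luma offset is an eighth-pel chroma offset."""
    return mv_q >> 3, mv_q & 7

# Horizontal 0.75 -> integer 0, fraction 3/4; vertical -0.5 -> integer -1, fraction 2/4.
```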
Luminance interpolation in units of a ¼ pixel proceeds in two steps: a luminance sample at a ½-pixel position is obtained by applying a 6-tap filter with coefficients (1, −5, 20, 20, −5, 1) in the horizontal and vertical directions to luminance pixels at integer positions, and a ¼-pixel luminance sample is then obtained by averaging the samples at the integer and ½-pixel positions. Since a chrominance component has half the resolution of a luminance component, a ¼-pixel motion vector of the luminance component, when used for motion compensation of the chrominance component, corresponds to a ⅛-pixel motion vector. Thus, chrominance interpolation in units of a ⅛ pixel is required.
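For the chrominance components, H.264 obtains each ⅛-pel sample by bilinear interpolation of the four surrounding integer-position samples. A minimal sketch of that rule, with illustrative sample names:

```python
# Bilinear 1/8-pel chroma interpolation as specified by H.264: the predicted
# sample is a weighted average of the four surrounding integer samples
# A (top-left), B (top-right), C (bottom-left), D (bottom-right),
# where dx, dy in 0..7 are the eighth-pel offsets and +32 >> 6 rounds.

def chroma_interp(A, B, C, D, dx, dy):
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
            + (8 - dx) * dy * C + dx * dy * D + 32) >> 6

# At dx == dy == 0 the result is simply A.
```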
For interpolation, pixel values are calculated using the following equations. A pixel b in FIG. 4 is calculated using equation (1):

b = round((E − 5F + 20G + 20H − 5I + J)/32)  (1)
After the luminance component at a ½ pixel position is obtained using the 6-tap filter, a ¼ pixel a is calculated as follows:

a = round((G + b + 1)/2)
For example, referring to FIG. 5A, pixel a is obtained using pixel G and pixel b; referring to FIG. 5B, pixel d is obtained using pixel G and pixel h; and referring to FIG. 5C, pixel e is obtained using pixel b and pixel h.
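The half-pel and quarter-pel steps above can be sketched as follows. The sample labels follow FIG. 4: E, F, G, H, I, J are six consecutive integer-position luminance samples, b is the ½-pel sample between G and H, and a is the ¼-pel sample between G and b. The integer shift form and the 8-bit clipping are standard H.264 practice and are equivalent to the rounded divisions in the equations.

```python
# Sketch of the luma interpolation described above: the 6-tap filter
# (1, -5, 20, 20, -5, 1) produces a half-pel sample b from six
# integer-position samples, and the quarter-pel sample a is the
# rounded average of G and b.

def clip255(v):
    """Clip an interpolated value to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(E, F, G, H, I, J):
    """b = round((E - 5F + 20G + 20H - 5I + J)/32), as in equation (1);
    the +16 offset with >> 5 implements the rounding in integer form."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)

def quarter_pel(G, b):
    """a = (G + b + 1) >> 1, the rounded average of the integer sample G
    and the half-pel sample b."""
    return (G + b + 1) >> 1
```

On a flat region (all six samples equal), the filter reproduces the input value unchanged, since its coefficients sum to 32.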
As such, the conventional interpolation method uses reference pixels in the horizontal or vertical direction without considering the distortion characteristic of the image. When such interpolation is applied to an annular image, however, spatial correlation is degraded and pixels are not correctly predicted. For example, blocking effects arise, as shown in FIGS. 6A and 6B.
FIGS. 6A and 6B are reference diagrams illustrating blocking effects in interprediction of an annular image according to the prior art.
Since the basic processing unit of all moving-picture codecs is a 2D square block or macroblock, errors such as those shown in FIGS. 6A and 6B occur when an annular image having circular distortion is processed. FIGS. 6A and 6B show interprediction data of an annular image extracted from an H.264 baseline profile. Referring to FIG. 6B, a portion of the human face shape is empty. The blocking effects shown in FIGS. 6A and 6B occur because the coordinates from which the reference pixels of the ½-pixel and ¼-pixel 6-tap filtering used in interprediction are taken have low spatial correlation.