Information transmitted by a communication system or recorded in a storage apparatus includes images and video. Techniques for coding images (hereinafter including video) so that they can be transmitted and stored are conventionally known.
Known video coding schemes include Advanced Video Coding (AVC, also called H.264/Moving Picture Experts Group (MPEG)-4 AVC) and its successor, High-Efficiency Video Coding (HEVC) (Non-Patent Literature 1).
In these video coding schemes, a predictive image is typically generated on the basis of a locally decoded image obtained by coding and then decoding an input image, and a prediction residual (also referred to as a “difference image” or a “residual image”), which is obtained by subtracting the predictive image from the input image (original image), is coded. Methods of generating a predictive image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).
In the intra prediction, predictive images within a picture are generated sequentially on the basis of locally decoded images in the same picture.
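The subtraction described above can be illustrated with a minimal sketch. It assumes a lossless round trip (no transform or quantization, which real codecs apply to the residual), and the function names `residual` and `reconstruct` are hypothetical, chosen only for this illustration.

```python
def residual(original, predicted):
    """Per-pixel prediction residual: original image minus predictive image."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]


def reconstruct(predicted, resid):
    """Decoder side: add the residual back onto the predictive image."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, resid)]


original = [[52, 55], [61, 59]]
predicted = [[50, 54], [60, 60]]

res = residual(original, predicted)      # [[2, 1], [1, -1]]
decoded = reconstruct(predicted, res)    # equals original in this lossless sketch
```

The residual values are small when the prediction is good, which is why coding the residual tends to cost fewer bits than coding the original pixel values directly.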
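As an illustration of predicting from already decoded pixels in the same picture, the following sketch assumes a simplified DC-style predictor that fills a block with the mean of the decoded neighbors above and to the left. The function name `dc_predict` and the fallback value 128 are assumptions for this example, not details taken from the coding schemes cited above.

```python
def dc_predict(decoded, top, left, size):
    """Predict a size x size block at (top, left) from decoded neighbors."""
    neighbors = []
    if top > 0:
        # Row of decoded pixels directly above the block.
        neighbors += decoded[top - 1][left:left + size]
    if left > 0:
        # Column of decoded pixels directly to the left of the block.
        neighbors += [decoded[top + r][left - 1] for r in range(size)]
    if neighbors:
        n = len(neighbors)
        dc = (sum(neighbors) + n // 2) // n   # rounded integer mean
    else:
        dc = 128                              # assumed mid-gray fallback
    return [[dc] * size for _ in range(size)]


decoded = [[10, 10, 20, 20],
           [10, 10,  0,  0],
           [30, 30,  0,  0],
           [30, 30,  0,  0]]

# Neighbors are [20, 20] above and [10, 30] to the left, so the mean is 20.
block = dc_predict(decoded, top=1, left=2, size=2)   # [[20, 20], [20, 20]]
```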
In the inter prediction, predictive images are generated through inter-picture motion compensation. The decoded picture used to generate the predictive image in the inter prediction is called a reference picture.
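Motion compensation can be sketched as copying a displaced block from the reference picture. The example below assumes integer-sample motion vectors and clamps coordinates at the picture boundary; practical codecs also interpolate fractional-sample positions, which is omitted here. The name `motion_compensate` is hypothetical.

```python
def motion_compensate(reference, top, left, size, mv):
    """Build a predictive block by fetching pixels from the reference picture
    at the block position displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    height, width = len(reference), len(reference[0])
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            # Clamp so that vectors pointing outside the picture stay valid.
            y = min(max(top + r + dy, 0), height - 1)
            x = min(max(left + c + dx, 0), width - 1)
            row.append(reference[y][x])
        block.append(row)
    return block


# Reference picture whose pixel value encodes its position (row * 10 + column).
reference = [[r * 10 + c for c in range(4)] for r in range(4)]

# Block at (1, 1), displaced one row up and one column right.
pred = motion_compensate(reference, top=1, left=1, size=2, mv=(-1, 1))
# pred == [[2, 3], [12, 13]]
```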
Furthermore, a technique has been known that classifies videos related to each other into layers (hierarchical layers) and codes the videos to generate coded data from the videos. This technique is called a hierarchical coding technique. The coded data generated by the hierarchical coding technique is also called hierarchically coded data.
As a representative hierarchical coding technique, HEVC-based scalable HEVC (SHVC) has been known (Non-Patent Literature 2).
SHVC supports spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability. In the case of spatial scalability, for example, videos with different resolutions are classified into layers and coded to generate hierarchically coded data. For instance, an image downsampled from an original image to a desired resolution is coded as a lower layer. Next, inter-layer prediction is applied to the original image in order to remove inter-layer redundancy, and the image is coded as a higher layer.
As another representative hierarchical coding technique, HEVC-based multiview HEVC (MV-HEVC) has been known (Non-Patent Literature 3).
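The two-layer structure of spatial scalability can be sketched as follows. This is a simplified illustration, not the SHVC process: it assumes a fixed 2:1 resolution ratio, a 2x2 averaging filter for downsampling, nearest-neighbor upsampling for the inter-layer prediction, and lossless coding of both layers. All function names are hypothetical.

```python
def downsample2(img):
    """Lower layer: average each 2x2 block of the original (rounded)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1] + 2) // 4
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]


def upsample2(img):
    """Inter-layer prediction: repeat each lower-layer pixel 2x2 times."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 32, 40, 42],
            [34, 36, 44, 46]]

base = downsample2(original)          # lower layer: [[13, 23], [33, 43]]
prediction = upsample2(base)          # predictive image for the higher layer
enhancement = [[o - p for o, p in zip(orow, prow)]
               for orow, prow in zip(original, prediction)]

# A decoder holding both layers recovers the full-resolution image.
restored = [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, enhancement)]
```

Because the higher layer carries only the difference from the upsampled lower layer, the information already present in the lower layer is not coded twice.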
MV-HEVC supports view scalability. In the case of view scalability, videos corresponding to different viewpoints are classified into layers and coded to generate hierarchically coded data. For example, a video corresponding to a viewpoint serving as a basis (base view) is coded as a lower layer. Next, inter-layer prediction is applied to the videos corresponding to the different viewpoints, and the videos are coded as a higher layer.
Inter-layer prediction in SHVC and MV-HEVC includes inter-layer image prediction and inter-layer motion prediction. In the inter-layer image prediction, a decoded image on a lower layer is used to generate a predictive image. In the inter-layer motion prediction, motion information on a lower layer is used to derive a predictive value of motion information. A picture used for prediction in the inter-layer prediction is called an inter-layer prediction picture. A layer including the inter-layer prediction picture is called a reference layer. In the following description, a reference picture used for the inter prediction and a reference picture used for the inter-layer prediction are collectively and simply called reference pictures.
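Inter-layer motion prediction can be illustrated by scaling a lower-layer motion vector to the higher-layer resolution and coding only the difference from the actual vector. The sketch below assumes simple proportional scaling by the resolution ratio; the function names and the example resolutions are hypothetical and do not reproduce the derivation defined in SHVC or MV-HEVC.

```python
def scale_motion_vector(mv_low, low_size, high_size):
    """Derive a predicted higher-layer motion vector from a lower-layer one
    by scaling each component with the resolution ratio of the layers."""
    (mvx, mvy), (lw, lh), (hw, hh) = mv_low, low_size, high_size
    return (round(mvx * hw / lw), round(mvy * hh / lh))


def motion_vector_difference(mv_actual, mv_predicted):
    """Only this difference would need to be coded for the higher layer."""
    return (mv_actual[0] - mv_predicted[0], mv_actual[1] - mv_predicted[1])


# Lower layer 960x540, higher layer 1920x1080 (assumed example sizes).
mv_pred = scale_motion_vector((3, -2), (960, 540), (1920, 1080))   # (6, -4)
mvd = motion_vector_difference((7, -4), mv_pred)                    # (1, 0)
```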
In SHVC and MV-HEVC, any of the inter prediction, intra prediction, and inter-layer image prediction can be used to generate a predictive image.
One application of SHVC or MV-HEVC is a video application that considers a region of interest. For example, a video reproduction terminal typically reproduces a video of the entire area at a relatively low resolution. When a viewer of the video reproduction terminal designates a part of the displayed video as a region of interest, this region of interest is displayed on the reproduction terminal at a high resolution.
Such a video application can be achieved using hierarchically coded data in which the video of the entire area with a relatively low resolution is coded as lower layer coded data, while the high resolution video of the region of interest is coded as higher layer coded data. That is, when the entire area is reproduced, only the lower layer coded data is decoded and reproduced. When the high resolution video of the region of interest is reproduced, the higher layer coded data is transmitted in addition to the lower layer coded data. The application can thus be achieved with a narrower transmission band than in the case where coded data on the low resolution video and coded data on the high resolution video are both transmitted as independent streams. Furthermore, only the coded data corresponding to an area including the region of interest may be extracted from the higher layer and the lower layer and transmitted, which allows the transmission band to be reduced further.
When the video application described above extracts, from the higher layer and the lower layer, coded data covering only the region of interest, the positional relationship between the higher layer pixels and the lower layer pixels changes from that in the original coded data. This change reduces the prediction accuracy when higher layer pixel values are predicted from lower layer pixel values.
Non-Patent Literature 4 discloses a method that transmits inter-layer phase correspondence information in order to adjust the positional relationship between the higher layer pixels and the lower layer pixels, and that calculates the pixel positions on the lower layer corresponding to the respective pixels on the higher layer using the inter-layer phase correspondence information.
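The kind of position calculation referred to above can be sketched as follows. This is a simplified illustration under stated assumptions and does not reproduce the method or syntax of Non-Patent Literature 4: the parameter `offset_high` is a hypothetical stand-in for the transmitted correspondence information, positions are expressed in 1/16-sample units, and only the horizontal direction is shown.

```python
def lower_layer_position(x_high, high_width, low_width, offset_high=0):
    """Map a higher-layer sample column to the corresponding lower-layer
    position, returned as (integer sample, fraction in 1/16 samples).

    offset_high is the (assumed) column on the higher layer that corresponds
    to column 0 of the lower layer.
    """
    rel = x_high - offset_high
    # Scale by the resolution ratio, keeping 1/16-sample precision.
    pos_16 = (rel * low_width * 16 + high_width // 2) // high_width
    return divmod(pos_16, 16)


# Higher layer spans 8 columns for every 4 lower-layer columns (2:1 ratio).
aligned = lower_layer_position(5, high_width=8, low_width=4)
# aligned == (2, 8): lower-layer position 2 + 8/16 = 2.5

# Same higher-layer column when the layers are offset by two columns.
shifted = lower_layer_position(5, high_width=8, low_width=4, offset_high=2)
# shifted == (1, 8): lower-layer position 1.5
```

The two results differ by a full lower-layer sample, which shows why a decoder that ignores the changed positional relationship would fetch the wrong lower-layer pixels for prediction.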