Images and moving images are among the kinds of information transmitted over communication systems or recorded in storage devices. In the related art, image coding technologies are known for transmitting and storing such images (hereafter, the term "image" includes moving images).
As moving image coding schemes, AVC (H.264/MPEG-4 Advanced Video Coding) and its successor codec, High Efficiency Video Coding (HEVC), are known (see NPL 1).
In such a moving image coding scheme, normally, a predicted image is generated based on a local decoded image obtained by coding/decoding an input image, and a prediction residual obtained by subtracting the predicted image from the input image (original image) is coded. Inter-frame prediction (inter-prediction) and intra-frame prediction (intra-prediction) are examples of methods of generating a predicted image.
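The relation between the predicted image, the prediction residual, and the local decoded image can be sketched as follows. This is a minimal illustration only: the function name `code_block`, the uniform quantizer, and the step size `qstep` are assumptions introduced for the example and are not taken from AVC or HEVC, which additionally apply a transform and entropy coding.

```python
def code_block(original, predicted, qstep=4):
    """Quantize the prediction residual and rebuild the local decoded block.

    original, predicted: lists of pixel rows of equal size.
    qstep: hypothetical uniform quantization step (assumption of this sketch).
    """
    # Residual = original - predicted, then quantized to integer levels.
    levels = [[round((o - p) / qstep) for o, p in zip(orow, prow)]
              for orow, prow in zip(original, predicted)]
    # The decoder (and the encoder's local decoder) adds the dequantized
    # residual back onto the same predicted image.
    recon = [[p + l * qstep for p, l in zip(prow, lrow)]
             for prow, lrow in zip(predicted, levels)]
    return levels, recon


original = [[100, 104], [98, 110]]
predicted = [[101, 100], [98, 103]]
levels, recon = code_block(original, predicted)
```

Only `levels` would need to be transmitted, since the decoder can form the same `predicted` block from data it has already decoded.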
In intra-prediction, predicted images are sequentially generated in a picture based on a local decoded image in the same picture.
In inter-prediction, a predicted image is generated through inter-picture motion compensation. The decoded picture used to generate a predicted image through inter-prediction is referred to as a reference picture.
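As a rough sketch of inter-picture motion compensation, the following example predicts a block by copying displaced samples from a reference picture. The function name `motion_compensate`, the restriction to integer-sample motion vectors, and the border clamping are simplifying assumptions of this example; actual codecs also interpolate fractional sample positions.

```python
def motion_compensate(ref, x0, y0, w, h, mv):
    """Predict the w x h block at (x0, y0) from reference picture `ref`.

    mv = (dx, dy) is an integer motion vector. Positions falling outside
    the reference picture are clamped to the nearest border sample.
    """
    height, width = len(ref), len(ref[0])
    dx, dy = mv
    pred = []
    for y in range(y0, y0 + h):
        ry = min(max(y + dy, 0), height - 1)
        row = []
        for x in range(x0, x0 + w):
            rx = min(max(x + dx, 0), width - 1)
            row.append(ref[ry][rx])
        pred.append(row)
    return pred


# Toy 4x4 reference picture whose sample value encodes its position.
ref = [[10 * y + x for x in range(4)] for y in range(4)]
pred = motion_compensate(ref, 1, 1, 2, 2, (1, -1))
```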
There are also known technologies that generate coded data from a plurality of mutually related moving images by dividing the moving images into layers (hierarchies) and coding them; these technologies are referred to as hierarchical coding technologies. Coded data generated by the hierarchical coding technologies is also referred to as hierarchical coded data.
As a representative hierarchical coding technology, scalable HEVC (SHVC) based on HEVC is known (see NPL 2).
In SHVC, spatial scalability, temporal scalability, and SNR scalability are supported. For example, in the case of spatial scalability, moving images with a plurality of different resolutions are divided into layers and coded to generate hierarchical coded data. For example, an image obtained by down-sampling an original image to a desired resolution is coded as a lower layer. Next, inter-layer prediction is applied to remove redundancy between the layers, and then the original image is coded as a higher layer.
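The two-layer structure of spatial scalability can be illustrated with the following sketch. The 2:1 ratio, the 2x2 averaging down-sampler, the nearest-neighbor up-sampler, and the lossless lower layer are all assumptions chosen to keep the example short; SHVC itself uses different resampling filters.

```python
def downsample2(img):
    """Halve the resolution by averaging each 2x2 block (with rounding)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1] + 2) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample2(img):
    """Double the resolution by sample repetition (nearest neighbor)."""
    return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
            for y in range(2 * len(img))]


def enhancement_residual(original, base_decoded):
    """Higher-layer residual after inter-layer image prediction."""
    pred = upsample2(base_decoded)
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, pred)]


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 30, 40, 44],
            [30, 30, 40, 44]]
base = downsample2(original)                      # lower layer picture
residual = enhancement_residual(original, base)   # coded in the higher layer
```

The residual values are small compared with the original samples, which is the redundancy removal that inter-layer prediction is intended to achieve.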
As another representative hierarchical coding technology, multiview HEVC (MV-HEVC), which is based on HEVC, is known. MV-HEVC supports view scalability. In view scalability, moving images corresponding to a plurality of different viewpoints (views) are divided into layers and coded to generate hierarchical coded data. For example, a moving image corresponding to a viewpoint serving as a base (base view) is coded as a lower layer. Next, inter-layer prediction is applied, and then moving images corresponding to different viewpoints are coded as higher layers.
As inter-layer prediction in SHVC and MV-HEVC, there are inter-layer image prediction and inter-layer motion prediction. In the inter-layer image prediction, a predicted image is generated using a decoded image of a lower layer. In the inter-layer motion prediction, a prediction value of motion information is derived using motion information of a lower layer. A picture used for prediction in the inter-layer prediction is referred to as an inter-layer reference picture. A layer including the inter-layer reference picture is referred to as a reference layer. Hereinafter, a reference picture used for inter-prediction and a reference picture used for inter-layer prediction are collectively referred to simply as reference pictures.
The inter-layer image prediction includes a reference pixel position derivation process of deriving a pixel position on a lower layer which corresponds to the position of a prediction target pixel on a higher layer and a scale derivation process of deriving a scale corresponding to a magnification ratio in a scaling process applied to a picture of a lower layer.
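The two derivation processes can be sketched in one dimension as follows. The 16-bit fixed-point precision, the rounding, and the function names are assumptions of this example; the formulas are simplified and are not the normative SHVC derivation, which also handles offsets and sub-sample phase.

```python
SHIFT = 16  # fractional bits of the fixed-point scale (assumption of this sketch)


def derive_scale(ref_size, target_size):
    """Scale derivation: ref_size / target_size in fixed point, rounded to nearest."""
    return ((ref_size << SHIFT) + (target_size >> 1)) // target_size


def derive_ref_position(x_target, scale):
    """Reference pixel position derivation: lower-layer sample for x_target."""
    return (x_target * scale + (1 << (SHIFT - 1))) >> SHIFT


# 2:1 spatial scalability: a 960-sample-wide lower layer under a 1920-sample-wide higher layer.
scale_2x = derive_scale(960, 1920)
# 1.5:1 spatial scalability: a 1280-sample-wide lower layer under a 1920-sample-wide higher layer.
scale_1_5x = derive_scale(1280, 1920)
x_ref = derive_ref_position(300, scale_1_5x)
```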
In SHVC and MV-HEVC, any of inter-prediction, intra-prediction, and inter-layer image prediction can be used to generate a predicted image.
As one application using SHVC and MV-HEVC, there is a video application considering a region of interest. For example, a video reproduction terminal normally reproduces a video of an entire region at a relatively low resolution. In a case in which a part of a displayed video is designated as a region of interest by a viewer of a video reproduction terminal, the region of interest is displayed at a high resolution on the reproduction terminal.
A video application considering the foregoing region of interest can be realized using hierarchical coded data in which a video with a relatively low resolution of an entire region is coded as coded data of a lower layer and a video with a high resolution of a region of interest is coded as coded data of a higher layer. That is, in a case in which an entire region is reproduced, only coded data of the lower layer is decoded and reproduced. In a case in which a video with a high resolution of a region of interest is reproduced, coded data of the higher layer is transmitted in addition to the coded data of the lower layer. In this way, it is possible to realize the application with a smaller transmission bandwidth than in a case in which both coded data for a low-resolution video and coded data for a high-resolution video are transmitted. At this time, by extracting the coded data corresponding to a region including the region of interest from each of the higher layer and the lower layer and transmitting the extracted coded data, it is possible to further reduce the required transmission bandwidth.
In the foregoing video application considering a region of interest, a positional relation between pixels of the higher layer and pixels of the lower layer is changed in a case in which the coded data of the higher layer and the lower layer including the region of interest is generated. As a result, there is a problem in that prediction accuracy deteriorates in a case in which a pixel value of the higher layer is predicted based on a pixel value of the lower layer.
In SHVC (see NPL 2), a scaled reference layer offset is adopted as a parameter indicating a positional relation between pixels of a higher layer and pixels of a lower layer. The scaled reference layer offset is a set of offsets indicating the position, on a higher layer which is a target layer, of a predetermined region of a reference layer (for example, an entire reference layer picture).
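As an illustration of what such a set of offsets expresses, the following sketch converts four offsets into the rectangle that the reference layer region occupies in target-layer coordinates. The names `Offsets` and `scaled_reference_region` are hypothetical, and the sign convention (positive offsets point inward from the edges of the target picture, negative offsets place the region outside the picture) is an assumption of this example.

```python
from typing import NamedTuple


class Offsets(NamedTuple):
    """Hypothetical container for the four scaled reference layer offsets."""
    left: int
    top: int
    right: int
    bottom: int


def scaled_reference_region(pic_w, pic_h, off):
    """Return (x0, y0, width, height) of the reference layer region,
    expressed in target-layer sample coordinates."""
    return (off.left,
            off.top,
            pic_w - off.left - off.right,
            pic_h - off.top - off.bottom)


# Reference layer picture exactly covers a 1920x1080 target picture.
full = scaled_reference_region(1920, 1080, Offsets(0, 0, 0, 0))
# Target picture is a 640x360 window inside the area covered by the reference layer.
window = scaled_reference_region(640, 360, Offsets(-640, -360, -640, -360))
```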
NPL 3 discloses a method in which, in addition to the above-described scaled reference layer offset, a reference layer offset indicating the position of the region used for scale calculation on a lower layer is transmitted, and a reference pixel position or a scale is calculated using both the scaled reference layer offset and the reference layer offset. According to this method, the reference pixel positions (corresponding reference positions) or the scales match before and after extraction even in a case in which partial data corresponding to a region of interest is extracted from hierarchical coded data.
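The effect of using both kinds of offsets can be checked with a one-dimensional sketch. The picture widths, the extracted column ranges, the function names, and the sign convention (offsets are measured inward from the picture edges, so a region larger than the picture has negative offsets) are all assumptions made for this example and are not taken from NPL 3. Exact fractions are used instead of fixed-point arithmetic so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def region_width(pic_width, left, right):
    """Width of a region given offsets measured inward from the picture edges."""
    return pic_width - left - right


def map_position(x, tgt_w, ref_w, scaled=(0, 0), ref=(0, 0)):
    """Return (scale, reference position) for target-layer column x.

    scaled: (left, right) scaled reference layer offsets on the target layer.
    ref:    (left, right) reference layer offsets on the reference layer.
    """
    scale = Fraction(region_width(ref_w, *ref), region_width(tgt_w, *scaled))
    return scale, (x - scaled[0]) * scale + ref[0]


# Before extraction: 1920-wide higher layer, 960-wide lower layer.
scale_full, pos_full = map_position(700, 1920, 960)

# After extraction: higher-layer columns [640, 1280) and lower-layer
# columns [256, 704) are kept, so column 700 becomes column 60.
scale_naive, pos_naive = map_position(60, 640, 448)
scale_roi, pos_roi = map_position(60, 640, 448,
                                  scaled=(-640, -640), ref=(-256, -256))
```

Without offsets, the extracted pictures yield a scale of 7/10 and a wrong reference position. With both offsets, the scale remains 1/2 and the derived position, shifted back by the 256 removed columns, coincides with the position derived before extraction.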