Images and moving images are among the types of information transmitted in communication systems or recorded in storage devices. Image encoding technologies for transmitting or storing such images (hereinafter, the term “image” also includes a moving image) are known in the related art.
Known moving image encoding schemes include H.264/MPEG-4 Advanced Video Coding (AVC) and its successor codec, High Efficiency Video Coding (HEVC) (NPL 1).
In these moving image encoding schemes, a predicted image is generally generated on the basis of a locally decoded image obtained by encoding and then decoding an input image, and the prediction residual (referred to as a “difference image” or “residual difference image”) obtained by subtracting the predicted image from the input image (source image) is encoded. Methods for generating the predicted image include inter-frame prediction (inter prediction) and intra-frame prediction (intra prediction).
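The predictive coding loop described above can be sketched as follows. This is a minimal illustration only, assuming a one-dimensional block of samples, an already available predicted image, and a hypothetical uniform scalar quantizer with step size `step` standing in for the transform, quantization, and entropy coding that AVC and HEVC actually specify. It shows why the encoder predicts from the locally decoded image rather than from the source: encoder and decoder then hold identical reconstructions even though quantization is lossy.

```python
def quantize(values, step):
    # Hypothetical uniform scalar quantizer standing in for transform + quantization.
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def encode_block(source, predicted, step):
    # Prediction residual = source image - predicted image.
    residual = [s - p for s, p in zip(source, predicted)]
    levels = quantize(residual, step)  # these levels would be entropy-coded
    # Local decoding: the encoder rebuilds exactly what the decoder will see.
    local_decoded = [p + r for p, r in zip(predicted, dequantize(levels, step))]
    return levels, local_decoded


def decode_block(levels, predicted, step):
    return [p + r for p, r in zip(predicted, dequantize(levels, step))]


source = [100, 104, 98, 97]
predicted = [100, 100, 100, 100]
levels, enc_recon = encode_block(source, predicted, step=2)
dec_recon = decode_block(levels, predicted, step=2)
print(levels)     # [0, 2, -1, -2]
print(dec_recon)  # [100, 104, 98, 96]
```

The reconstruction differs from the source in the last sample because of quantization, but the encoder's locally decoded block and the decoder's output match, so later predictions built from them stay in sync.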
In intra prediction, the predicted images within a picture are generated sequentially on the basis of the locally decoded image of that same picture.
In inter prediction, the predicted image is generated by motion compensation between pictures. A previously decoded picture used to generate the predicted image in inter prediction is referred to as a reference picture.
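Motion compensation can be sketched as copying a displaced block out of the reference picture. In this minimal sketch, the function name `motion_compensate`, the integer-sample motion vector, and the clamping of out-of-picture coordinates to the nearest edge sample are simplifying assumptions; HEVC additionally uses fractional-sample interpolation filters and bi-prediction, which are not modeled here.

```python
def motion_compensate(ref, x0, y0, w, h, mv):
    """Hypothetical helper: return the w x h predicted block for the block at
    (x0, y0), fetched from reference picture `ref` (list of rows) displaced by
    the integer motion vector mv = (dx, dy). Coordinates outside the picture
    are clamped to the nearest edge sample."""
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for y in range(y0, y0 + h):
        ry = min(max(y + dy, 0), height - 1)
        row = []
        for x in range(x0, x0 + w):
            rx = min(max(x + dx, 0), width - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block


# Reference picture whose sample value encodes its position: 10 * row + column.
ref = [[10 * r + c for c in range(4)] for r in range(4)]
print(motion_compensate(ref, 1, 1, 2, 2, (1, 0)))  # [[12, 13], [22, 23]]
```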
Also known is hierarchical encoding technology, which generates encoded data from a plurality of related moving images by encoding them in layers (a hierarchy). Encoded data generated by hierarchical encoding technology is referred to as hierarchically encoded data.
A representative hierarchical encoding technology is Scalable HEVC (SHVC), which is based on HEVC (NPL 2).
SHVC supports spatial scalability, temporal scalability, and SNR scalability. In spatial scalability, hierarchically encoded data is generated by encoding a plurality of moving images of different resolutions in layers. For example, an image downsampled from the source image to a desired resolution is encoded as a lower layer. The source image is then encoded as a higher layer, with inter-layer prediction applied to remove inter-layer redundancy.
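The spatial scalability steps above can be illustrated with a small sketch. It assumes 2:1 scaling in each dimension, a 2x2 box-average filter for downsampling, nearest-neighbour upsampling, and a lower layer that is reconstructed without loss; these are hypothetical simplifications, since SHVC specifies its own resampling filters and the lower layer would normally be lossy-coded. The higher layer then only needs to carry the residual between the source image and the upsampled lower-layer image.

```python
def downsample2x(img):
    """2x2 box-average downsampling (integer); assumes even width and height."""
    return [
        [(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
          + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
         for x in range(len(img[0]) // 2)]
        for y in range(len(img) // 2)
    ]


def upsample2x(img):
    """Nearest-neighbour 2x upsampling."""
    return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
            for y in range(2 * len(img))]


def inter_layer_residual(source, lower_decoded):
    """Residual of the higher layer against the upsampled lower-layer image."""
    pred = upsample2x(lower_decoded)
    return [[s - p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(source, pred)]


source = [[10, 12, 20, 22],
          [14, 16, 24, 26],
          [30, 30, 40, 40],
          [30, 30, 40, 40]]
lower = downsample2x(source)                    # encoded as the lower layer
residual = inter_layer_residual(source, lower)  # coded in the higher layer
print(lower)     # [[13, 23], [30, 40]]
print(residual)  # [[-3, -1, -3, -1], [1, 3, 1, 3], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Most residual samples are small or zero, which is the redundancy that inter-layer prediction removes.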
Another representative hierarchical encoding technology is Multiview HEVC (MV-HEVC), which is also based on HEVC (NPL 3).
MV-HEVC supports view scalability. In view scalability, hierarchically encoded data is generated by encoding a plurality of moving images corresponding to different viewpoints (views) in layers. For example, a moving image corresponding to a base viewpoint (base view) is encoded as a lower layer. A moving image corresponding to a different viewpoint is then encoded as a higher layer, with inter-layer prediction applied.
Types of inter-layer prediction in SHVC and MV-HEVC include inter-layer image prediction and inter-layer motion prediction. In inter-layer image prediction, a decoded image of a lower layer is used to generate the predicted image. In inter-layer motion prediction, motion information of a lower layer is used to derive a predicted value of the motion information. A picture used for prediction in inter-layer prediction is referred to as an inter-layer reference picture, and the layer that contains the inter-layer reference picture is referred to as a reference layer. Hereinafter, the reference pictures used in inter prediction and those used in inter-layer prediction are collectively and simply referred to as reference pictures.
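For layers of different resolutions, inter-layer motion prediction can be sketched as scaling a lower-layer motion vector by the resolution ratio to obtain a predicted value for the higher layer. The function name `predict_mv_from_lower_layer` and the round-half-away-from-zero rule are assumptions for illustration; the actual motion mapping and rounding used in SHVC are defined in the standard and are not reproduced here.

```python
def _round_half_away(num, den):
    """Round num / den (den > 0) to the nearest integer, halves away from zero."""
    q = (2 * abs(num) + den) // (2 * den)
    return q if num >= 0 else -q


def predict_mv_from_lower_layer(lower_mv, lower_size, higher_size):
    """Hypothetical helper: scale a lower-layer motion vector (dx, dy) into
    higher-layer sample units using the width and height ratios.
    Sizes are (width, height) tuples."""
    dx, dy = lower_mv
    lw, lh = lower_size
    hw, hh = higher_size
    return (_round_half_away(dx * hw, lw), _round_half_away(dy * hh, lh))


# Lower layer 1280x720, higher layer 1920x1080 (ratio 1.5 in each direction).
print(predict_mv_from_lower_layer((5, -3), (1280, 720), (1920, 1080)))  # (8, -5)
```

With equal layer sizes the helper returns the vector unchanged; the encoder would then code only the difference between the actual higher-layer vector and this predicted value.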
In SHVC and MV-HEVC, any of inter prediction, intra prediction, and inter-layer image prediction can be used to generate the predicted image.
One type of application that uses SHVC or MV-HEVC is a video application that takes a region of interest into account. On a video reproducing terminal, for example, the entire region of a video is typically reproduced at a comparatively low resolution. When the viewer specifies part of the displayed video as a region of interest, the reproducing terminal displays that region of interest at a high resolution.
This application can be realized by using hierarchically encoded data in which the entire region of the video at a comparatively low resolution is encoded as lower layer encoded data and the region of interest of the video at a high resolution is encoded as higher layer encoded data. That is, when the entire region is reproduced, only the lower layer encoded data is decoded and reproduced; when the region of interest is reproduced at a high resolution, the higher layer encoded data is transmitted in addition to the lower layer encoded data. The application can thus be realized with a smaller transmission bandwidth than would be required to transmit both the encoded data of the low-resolution video and the encoded data of the high-resolution video.