A great advance has been made in digital technology. As a result, it has become very popular to take a high-resolution motion picture using a digital camera or a digital video camera. To store a digital motion picture in an efficient manner in a storage medium typified by a flash memory, the data is generally compressed (coded). H.264/MPEG-4 AVC (hereinafter referred to as H.264) is a technique widely used to code motion pictures.
A Joint Collaborative Team on Video Coding (JCT-VC) has been established by the ISO/IEC and the ITU-T to develop a coding standard with higher efficiency as a successor to the H.264 coding standard. More specifically, a High Efficiency Video Coding (hereinafter referred to as HEVC) standard is under development in the JCT-VC.
In the standardization of HEVC, various coding tools are under discussion, in terms of not only an improvement in coding efficiency but also other factors including implementability, processing time, and the like. Issues under discussion include parallel processing of coding/decoding, a technique of dividing a picture into slices along a horizontal direction to increase error resilience, a technique of dividing a picture into rectangular areas called tiles, and other techniques (NPL 1). Use of slices or tiles makes it possible to perform coding and decoding in parallel, which allows an increase in processing speed. Use of slices or tiles also allows a reduction in memory capacity necessary in the coding/decoding process. HEVC allows division into slices and division into tiles to be used in combination.
A technique called motion constrained tile sets (MCTS) is used to code a video sequence using the division into tiles such that only a particular tile set can be decoded, independently of the other tiles, from a coded stream of successive pictures (NPL 4). When a coded stream includes an MCTS SEI message, the video sequence is supposed to be coded so as to satisfy the following conditions.
(1) All pictures in the video sequence are coded such that the division into tiles is performed in the same manner.
(2) In MCTS coding, coding is performed without using a motion vector that refers to a pixel outside the tile set.
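The second condition can be illustrated with a minimal sketch. The names `Rect` and `reference_stays_inside` are hypothetical and are not taken from NPL 1 or NPL 4. The sketch assumes integer-pixel motion vectors and ignores the additional margin that sub-pixel interpolation filters would require in a real encoder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def reference_stays_inside(block: Rect, mv_x: int, mv_y: int, tile_set: Rect) -> bool:
    """Return True if the block displaced by (mv_x, mv_y) lies entirely in tile_set.

    Assumption: integer-pixel motion vectors; interpolation margins are ignored.
    """
    ref = Rect(block.left + mv_x, block.top + mv_y,
               block.right + mv_x, block.bottom + mv_y)
    return (tile_set.left <= ref.left and tile_set.top <= ref.top
            and ref.right <= tile_set.right and ref.bottom <= tile_set.bottom)


if __name__ == "__main__":
    tile_set = Rect(0, 0, 1024, 1024)
    block = Rect(1000, 500, 1016, 516)  # a 16x16 block near the right edge
    print(reference_stays_inside(block, 8, 0, tile_set))  # reference touches the edge
    print(reference_stays_inside(block, 9, 0, tile_set))  # reference crosses the edge
```

An encoder following this condition would discard any candidate motion vector for which such a check fails.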
In decoding of a coded stream, when the coded stream includes an MCTS SEI message, it is possible to extract only a tile set specified as MCTS from a sequence of pictures and quickly decode or play back the extracted MCTS tile set as a partial motion picture. Use of MCTS makes it possible to quickly decode only a region a user is interested in. Hereinafter, such a region of interest will also be referred to as an ROI.
An AVC (Advanced Video Coding) file format (NPL 2) is widely used as a media file format to store H.264 video data. It is expected that HEVC will provide a media file format similar to the AVC file format.
When a low-resolution device is used to play back a movie including a sequence of one or more high-resolution pictures each including, for example, 4096 pixels in a horizontal direction and 2048 pixels in a vertical direction (hereinafter referred to as 4096×2048 pixels), it may be advantageous to extract a particular area and play back only the extracted area. This may apply, for example, to a use case in which a face of a particular person is extracted from a scene including many people and the extracted face is displayed in an enlarged manner. In such a use case, if a whole picture area of a picture in a movie is first decoded and a partial area is extracted and displayed, a long decoding time (a delay time before the picture is displayed) and large power consumption are necessary. Thus, when a partial area is extracted and the extracted area is played back, the capability of dividing each picture into tiles and coding the resultant tiles, and, in a playback operation, decoding only particular tiles provides advantages in particular in terms of a reduction in delay time before the picture is displayed and a reduction in power consumption.
In the AVC file format described in NPL 2, coded data of each picture (denoted as sample data in NPL 2) is stored in units of coded data of slices. A one-byte header called a NAL header is added to the coded data of each slice, thereby converting it into NAL unit data. NAL stands for Network Abstraction Layer; a detailed description thereof may be found, for example, in Section 7.4.1 of NPL 1, and thus a further description is omitted here. Each NAL unit data is preceded by data indicating the length, in bytes, of that NAL unit data. Thus, in a process of playing back a media file written in the AVC file format, it is possible to access coded data of an arbitrary slice in a picture without decoding the slice.
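The length-prefixed layout described above can be sketched as follows. This is an illustrative sketch only: the function name `index_nal_units` is hypothetical, and the length field is assumed to be 4 bytes in big-endian order, whereas an actual file signals the length field size in its configuration data.

```python
def index_nal_units(sample: bytes, length_size: int = 4):
    """Return a list of (offset, size) pairs, one per NAL unit in a sample.

    Assumption: every NAL unit is preceded by a big-endian length field of
    length_size bytes. Only the length fields are read; no slice is decoded.
    """
    entries = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nal_size = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_size > len(sample):
            raise ValueError("NAL unit extends past end of sample")
        entries.append((pos, nal_size))
        pos += nal_size  # jump directly to the next length field
    return entries


if __name__ == "__main__":
    # Two dummy NAL units of 3 and 2 bytes (payload bytes are placeholders).
    sample = (3).to_bytes(4, "big") + b"\x65\xaa\xbb" + (2).to_bytes(4, "big") + b"\x41\xcc"
    print(index_nal_units(sample))
```

Because only the length fields are read, the cost of locating a slice is proportional to the number of NAL units rather than to the amount of coded data.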
In a case where coding is performed according to HEVC using a mode in which one slice is divided into a plurality of tiles, coding parameters necessary in decoding each tile are described in the header of the slice to which the tile belongs. Therefore, even in a case where only some of the tiles in a slice are decoded, it is necessary to decode the slice header of this slice.
In HEVC, it is possible to calculate the number of pixels in the horizontal direction and that in the vertical direction of a tile from coding parameters in a picture parameter set (PPS) described in Section 7.4.2.3 of NPL 1. More specifically, for example, it is possible to calculate the numbers of pixels in the horizontal and vertical directions for each tile from a parameter (num_tile_columns_minus1) indicating the number of tile columns minus 1, a parameter (num_tile_rows_minus1) indicating the number of tile rows minus 1, and the numbers of horizontal and vertical pixels in a sequence parameter set (SPS) described in NPL 1.
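The calculation described above can be sketched as follows for one direction (columns or rows). The sketch rests on assumptions not stated in the text: tiles are assumed to be uniformly spaced, tile boundaries are assumed to fall on coding tree block boundaries, and the coding tree block size is assumed to be 64 pixels. The function name `tile_sizes` is hypothetical.

```python
def tile_sizes(pic_size: int, num_tiles_minus1: int, ctb_size: int = 64):
    """Return the size in pixels of each tile column (or row).

    pic_size:          picture width (or height) in pixels, from the SPS
    num_tiles_minus1:  num_tile_columns_minus1 (or num_tile_rows_minus1), from the PPS
    ctb_size:          assumed coding tree block size in pixels
    Assumption: uniformly spaced tiles aligned to coding tree blocks.
    """
    num_tiles = num_tiles_minus1 + 1
    pic_size_in_ctbs = -(-pic_size // ctb_size)  # ceiling division
    sizes = []
    for i in range(num_tiles):
        start_ctb = (i * pic_size_in_ctbs) // num_tiles
        end_ctb = ((i + 1) * pic_size_in_ctbs) // num_tiles
        start_px = start_ctb * ctb_size
        # The last tile is clipped when the picture size is not a multiple of ctb_size.
        end_px = min(end_ctb * ctb_size, pic_size)
        sizes.append(end_px - start_px)
    return sizes


if __name__ == "__main__":
    # A 4096x2048 picture divided into 4 tile columns and 2 tile rows.
    print(tile_sizes(4096, 3))  # column widths
    print(tile_sizes(2048, 1))  # row heights
```

Under these assumptions, a 4096×2048 picture with four tile columns and two tile rows yields tiles of 1024×1024 pixels each.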
However, the numbers of pixels in the horizontal and vertical directions of each slice are not described in SPS or PPS, and thus acquisition of the numbers of pixels in the horizontal and vertical directions of each slice is possible only by decoding the slice of interest.
That is, when a particular tile in a picture is extracted and decoded, it is not possible to know the ordinal position of a slice in which the tile of interest to be decoded is included without decoding slices. Therefore, it is necessary to decode the whole picture area, which results in a long decoding time and large power consumption.
HEVC also allows a coding mode in which each picture is divided into tiles and slices such that a plurality of slices are included in one tile. However, as in the previous case, no way is provided to know, without decoding slices, which slice is to be decoded to obtain the tile of interest. Therefore, it is necessary to decode the whole picture area, which results in a long decoding time and large power consumption.
In view of the above, the present invention provides a technique of extracting a particular tile in a picture and decoding the extracted tile at an improved processing speed, with reduced power consumption, and with a reduced memory capacity.