H.264/Moving Picture Experts Group (MPEG)-4 Advanced Video Coding (AVC) (hereinafter referred to as H.264) is known as an encoding method for compression recording of a moving image.
In recent years, international standardization of a higher-efficiency encoding method has started as a successor to H.264, and the Joint Collaborative Team on Video Coding (JCT-VC) has been established between the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). In JCT-VC, standardization of High Efficiency Video Coding (hereinafter referred to as HEVC) is underway (refer to NPL 1).
HEVC adopts a technology called a tile division method, in which an image is divided into rectangular areas (tiles) so that the individual areas can be encoded and decoded independently. In addition, for the tile division method, a technique has been proposed in which motion constrained tile sets (hereinafter referred to as MCTS), each composed of one or more tiles, are encoded and decoded independently of the other tiles (refer to NPL 2). The proposal described in NPL 2 defines an MCTS that can be set for each sequence. In other words, the MCTS is arranged at the same relative position in each frame of the same sequence. In this proposal, when the MCTS in a frame to be processed is encoded or decoded, inter-frame prediction refers to a pixel group located at the same relative position as the MCTS in another frame. In other words, pixels outside that pixel group are not used as reference pixels in motion vector search. This ensures the independence of the encoding and decoding of the MCTS. The position in the image of each tile included in the MCTS is encoded in a supplemental enhancement information (SEI) message.
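The reference restriction described above can be sketched as a simple containment test. This is a minimal illustration, not the method of NPL 2: the names `Rect` and `mv_allowed` are hypothetical, motion vectors are assumed to be in whole pixels, and the extra margin that sub-pixel interpolation filters would need is ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned pixel rectangle (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def mv_allowed(mcts: Rect, block: Rect, mv: Tuple[int, int]) -> bool:
    """Accept a motion vector only if the reference block it points to
    stays inside the collocated MCTS area of the reference frame.

    Assumption: integer-pel motion vectors, one rectangular MCTS.
    """
    dx, dy = mv
    ref = Rect(block.x + dx, block.y + dy, block.w, block.h)
    return mcts.contains(ref)


# Example: a 64x64 MCTS at (64, 64) and a 16x16 block in its top-left corner.
mcts_area = Rect(64, 64, 64, 64)
current_block = Rect(64, 64, 16, 16)
inside = mv_allowed(mcts_area, current_block, (48, 48))   # touches the far corner
outside = mv_allowed(mcts_area, current_block, (-1, 0))   # crosses the left edge
```

An encoder following this idea would discard any motion vector candidate for which `mv_allowed` returns `False`, so that the MCTS never depends on pixels outside its collocated area.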
In the standardization of HEVC, extension to hierarchical coding is also being considered. In hierarchical coding, a tile to be encoded is encoded on a base layer and on an enhancement layer. The tiles encoded on the respective layers are multiplexed to generate a bit stream. In the hierarchical coding described above, the boundary positions of the tiles on the base layer and the boundary positions of the tiles on the enhancement layer can be set independently. Since encoding a tile on the enhancement layer requires reference to the corresponding tile on the base layer, the position of that tile on the base layer must be identified. Accordingly, use of tile_boundaries_aligned_flag as a Video Usability Information (VUI) parameter (vui_parameters) on the enhancement layer is proposed (refer to NPL 3). The tile_boundaries_aligned_flag is the encoded form of coincidence information indicating whether the tiles are arranged at the same relative positions on the respective layers. If the tile_boundaries_aligned_flag has a value of one, it is ensured that the boundary position of a tile on the enhancement layer coincides with the boundary position of the corresponding tile on the base layer. Since this allows the position of the tile on the base layer, which is referred to in the encoding and decoding of the tile on the enhancement layer, to be identified, the tile on the enhancement layer can be encoded and decoded independently, which enables high-speed encoding and decoding. The base layer is the highest-level layer and the succeeding enhancement layers are the lower-level layers.
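The coincidence information carried by tile_boundaries_aligned_flag can be illustrated with a small check. This is only a sketch under simplifying assumptions: both layers use evenly spaced tile grids, boundaries are expressed as exact fractions of the picture width and height so that layers of different resolution can be compared, and the function names are hypothetical. An actual encoder would work in coding-block units and may use non-uniform tile sizes.

```python
from fractions import Fraction


def normalized_boundaries(num_cols: int, num_rows: int):
    """Interior tile boundaries of an evenly spaced grid, as fractions of
    the picture width (columns) and height (rows)."""
    cols = {Fraction(i, num_cols) for i in range(1, num_cols)}
    rows = {Fraction(j, num_rows) for j in range(1, num_rows)}
    return cols, rows


def boundaries_aligned(base_grid, enh_grid) -> bool:
    """Illustrative stand-in for the condition signalled by
    tile_boundaries_aligned_flag == 1: both layers have tile boundaries
    at the same relative positions."""
    return normalized_boundaries(*base_grid) == normalized_boundaries(*enh_grid)


# Grids are (columns, rows).
same_layout = boundaries_aligned((4, 3), (4, 3))
different_layout = boundaries_aligned((2, 1), (4, 3))
```

With identical 4-by-3 grids on both layers the check succeeds; with a 2-by-1 base layer and a 4-by-3 enhancement layer it fails, which is the situation in which a decoder cannot locate a single corresponding base-layer tile from the enhancement-layer tile alone.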
However, the MCTS described in NPL 2 does not take hierarchical coding into account. Specifically, when the tile boundaries and the position of the MCTS can be set for each layer, the relative positions of the tiles on the respective layers may not coincide with each other. For example, when a certain tile on the enhancement layer is included in the MCTS and the tile at the position corresponding to the certain tile on the base layer is not included in the MCTS, it is necessary to decode, on the base layer, the surrounding tiles in addition to the tile at the position corresponding to the certain tile.
This will now be specifically described with reference to FIG. 13. FIG. 13 illustrates how a frame is divided into tiles. Referring to FIG. 13, reference numerals 1301 to 1310 each denote a frame. Each of the frames 1301 to 1310 includes 12 tiles of tile numbers 0 to 11. The tile with tile number 1 is hereinafter referred to as a tile 1. The same applies to the other tile numbers. For description, on the base layer, each frame is horizontally divided into two tiles and is not vertically divided. On the enhancement layer, each frame is horizontally divided into four tiles and vertically divided into three tiles. Thin-line boxes represent the boundaries of the tiles in FIG. 13.
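The relationship between the two tile layouts can be made concrete with a short calculation. The sketch below assumes that tiles are numbered in raster order (left to right, then top to bottom) and that both grids are evenly spaced; these assumptions and the function names are illustrative and are not stated in the text. It finds the base-layer tile that covers the center of a given enhancement-layer tile.

```python
def tile_to_col_row(tile: int, num_cols: int):
    """Raster-order tile number -> (column, row). Assumes raster numbering."""
    return tile % num_cols, tile // num_cols


def collocated_base_tile(enh_tile: int, enh_grid, base_grid) -> int:
    """Base-layer tile covering the center of an enhancement-layer tile.

    Grids are (columns, rows) and are assumed to be evenly spaced.
    Integer arithmetic is used so that no rounding error can occur.
    """
    enh_cols, enh_rows = enh_grid
    base_cols, base_rows = base_grid
    col, row = tile_to_col_row(enh_tile, enh_cols)
    # Center of the tile is at (2*col + 1) / (2*enh_cols) of the width.
    base_col = ((2 * col + 1) * base_cols) // (2 * enh_cols)
    base_row = ((2 * row + 1) * base_rows) // (2 * enh_rows)
    return base_row * base_cols + base_col


# Enhancement layer: 4 columns x 3 rows; base layer: 2 columns x 1 row.
base_tile_for_5 = collocated_base_tile(5, (4, 3), (2, 1))
```

Under these assumptions, the tile 5 on the enhancement layer sits in the second column and second row, which falls in the left half of the frame and therefore maps to the tile 0 on the base layer.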
Each of the frames 1301, 1303, 1305, 1307, and 1309 indicates the frame of a layer at a time t. The frame 1301 indicates the frame on the base layer at the time t. The frame 1305 indicates the frame on a first enhancement layer at the time t. The frame 1303 indicates the frame obtained by enlarging the reconstructed image resulting from local decoding of the frame 1301 to the resolution of the first enhancement layer. The frame 1309 indicates the frame on a second enhancement layer at the time t. The frame 1307 indicates the frame obtained by enlarging the decoded image of the frame 1305 to the resolution of the second enhancement layer.
Each of the frames 1302, 1304, 1306, 1308, and 1310 indicates the frame of each layer at a time t+delta. The frame 1302 indicates the frame on the base layer at the time t+delta. The frame 1306 indicates the frame on the first enhancement layer at the time t+delta. The frame 1304 indicates the frame resulting from enlargement of the decoded image of the frame 1302 to the resolution of the first enhancement layer. The frame 1310 indicates the frame on the second enhancement layer at the time t+delta. The frame 1308 indicates the frame resulting from enlargement of the decoded image of the frame 1306 to the resolution of the second enhancement layer.
The tile 5 in each of the frames on the enhancement layers (the frames 1305, 1306, 1309, and 1310) is assumed here to be a tile in the MCTS. Referring to FIG. 13, each bold-line box indicates a tile belonging to the MCTS or the position corresponding to that tile.
Referring to FIG. 13, the tile 5 in the frame 1306 on the first enhancement layer is required to be decoded in order to decode the MCTS (the tile 5) in the frame 1310 on the second enhancement layer. In addition, the tile 0 in the frame 1302 on the base layer is required to be decoded in order to decode the tile 5 in the frame 1306 on the first enhancement layer. Furthermore, decoding the tile 0 in the frame 1302 on the base layer requires inter-frame prediction with reference to the frame 1301, so all the tiles in the frame 1301 are required to be decoded.
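The chain of dependencies described above can be traced mechanically. The following sketch encodes only the three dependencies stated in this paragraph as a small table and walks it; the data structure and the function name `required_areas` are hypothetical and serve purely as an illustration.

```python
# Each key is (frame, area); each value lists what must be decoded first.
# Only the dependencies stated in the description of FIG. 13 are modeled.
DEPENDENCIES = {
    ("1310", "tile 5"): [("1306", "tile 5")],      # second -> first enhancement layer
    ("1306", "tile 5"): [("1302", "tile 0")],      # first enhancement -> base layer
    ("1302", "tile 0"): [("1301", "all tiles")],   # inter-frame prediction on base layer
}


def required_areas(target, deps):
    """Return every area that must be decoded before `target`,
    in the order in which the dependencies are discovered."""
    found = []
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, []):
            if dep not in found:
                found.append(dep)
                stack.append(dep)
    return found


needed = required_areas(("1310", "tile 5"), DEPENDENCIES)
```

For the MCTS in the frame 1310, the walk reaches the tile 5 in the frame 1306, then the tile 0 in the frame 1302, and finally all the tiles in the frame 1301, which shows how a single tile on the second enhancement layer pulls in a whole base-layer frame.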
In other words, in the related art, decoding the MCTS on the second enhancement layer at the time t+delta requires decoding an area other than the area indicating the position of the tile 5 in the frame 1302 on the base layer at the time t+delta (the area denoted by broken lines in the frame 1304). Accordingly, when a certain tile is encoded and decoded using the MCTS or the like in hierarchical coding, there is a problem in that it is not possible to independently encode and decode only the tiles corresponding to the position of the MCTS.