In recent years, work on international standardization of a more efficient coding method as a successor to H.264/MPEG-4 AVC (hereinafter, H.264) has begun. To this end, the Joint Collaborative Team on Video Coding (JCT-VC) has been established between ISO/IEC and ITU-T. JCT-VC is pursuing standardization of High Efficiency Video Coding (hereinafter, HEVC). ITU-T issued the standard as the H.265 coding method in June, 2013 (NPL 1: ITU-T H.265 (April 2013) High efficiency video coding). For the standardization of HEVC, the addition of functions, such as hierarchical coding and range enhancement, continues to be discussed.
In the standardization of HEVC, various coding tools have been developed and high coding efficiency has been achieved. In particular, unlike H.264 of the related art, HEVC divides an image into rectangular tiles so that coding and decoding can be performed on a tile basis. As an improvement on this tile dividing method, a prediction limit is imposed on a tile so that the tile can be coded and decoded independently of other tiles (NPL 2: Contributed by JCT-VC, JCTVC-M0235 Internet <http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/>). This technique defines, on a sequence basis, tiles that can be independently coded and decoded. Tile sets that can be independently coded and decoded are called motion-constrained tile sets (hereinafter, abbreviated as MCTS). Inter-frame prediction refers only to the tile set at the relatively equivalent position in another frame, and no prediction is performed from tiles outside the tile set. Accordingly, independence of coding and decoding is ensured. The positions of the tiles included in MCTS are coded in a Supplemental Enhancement Information (SEI) message.
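The prediction limit described above can be illustrated with a minimal sketch in Python. The function name, the rectangle representation, and the use of whole-sample motion vectors are assumptions made only for this illustration; they are not taken from NPL 2, and the margin needed by fractional-sample interpolation is ignored.

```python
def reference_inside_tile_set(block, motion_vector, tile_set_rect):
    """Return True if the block referenced by the motion vector lies
    entirely inside the tile set at the same position in the reference frame.

    block          -- (x, y, width, height) of the block being predicted
    motion_vector  -- (dx, dy) in whole samples (assumption: no sub-sample MVs)
    tile_set_rect  -- (x0, y0, x1, y1), with x1 and y1 exclusive
    """
    bx, by, bw, bh = block
    dx, dy = motion_vector
    x0, y0, x1, y1 = tile_set_rect
    rx, ry = bx + dx, by + dy  # top-left corner of the referenced block
    return x0 <= rx and y0 <= ry and rx + bw <= x1 and ry + bh <= y1
```

An encoder that keeps only motion vectors for which such a check holds never references image data outside the tile set, which is the property that makes the tile set independently decodable.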
Meanwhile, as described above, extension to hierarchical coding is also discussed for the standardization of HEVC. One proposal is a technique that fixes the positions of tile division in spatial-resolution hierarchical coding or the like (NPL 3: Contributed by JCT-VC, JCTVC-M0202 Internet <http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/>). The technique provides a tile_boundaries_aligned_flag code in the Video Usability Information (VUI) parameters (vui_parameters). This code represents tile-position alignment information indicating whether or not the relative positions of tiles in the respective hierarchical layers are aligned. If this code is 1, it is assured that the position of the boundary of a tile in an enhancement layer is aligned with the position of the boundary of the corresponding tile in a base layer. Accordingly, since the position of the image in the base layer that is referenced when decoding the tile in the enhancement layer can be specified, decoding can be performed at high speed. Herein, the base layer is regarded as the highest layer, and the successive enhancement layers are regarded as lower layers.
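What it means for the relative positions of tile boundaries to be aligned between layers can be sketched as follows. This is only an illustration under stated assumptions: the function name and the boundary lists are hypothetical, boundaries are given in samples along one axis, and the check is not the normative semantics of tile_boundaries_aligned_flag.

```python
def boundaries_aligned(base_bounds, base_size, enh_bounds, enh_size):
    """Return True if every tile boundary sits at the same relative
    position in the base layer and in the enhancement layer.

    base_bounds, enh_bounds -- boundary positions along one axis, in samples
    base_size, enh_size     -- frame size along that axis, in samples
    """
    if len(base_bounds) != len(enh_bounds):
        return False
    # Cross-multiply so that b / base_size == e / enh_size is compared
    # exactly, without floating-point rounding.
    return all(b * enh_size == e * base_size
               for b, e in zip(base_bounds, enh_bounds))
```

For instance, base-layer column boundaries at 0, 480, 960, 1440 and 1920 in a 1920-sample-wide frame match enhancement-layer boundaries at 0, 960, 1920, 2880 and 3840 in a 3840-sample-wide frame under this check.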
In hierarchical coding, it is required to acquire a desired image at high speed by coding the tiles in the enhancement layer so that they can be independently decoded, and by decoding independently on a tile basis.
However, hierarchical coding is not considered in the MCTS described in NPL 2. That is, in the present standard, MCTS is set on a sequence basis. In a coding method having a plurality of layers, such as hierarchical coding, it is unclear how MCTS is handled in each layer. Since MCTS is conceived on a sequence basis, it may be considered, for example, that MCTS is set only in the base layer or that MCTS is set in all layers. In the former case, it is difficult to satisfy a request for partial reading of the high-resolution enhancement layer by using MCTS. In the latter case, since the tiles in the base layer and the tiles in the enhancement layer are both included in MCTS, tile division has to be performed even for the base layer.
A specific description is given with reference to FIG. 15. FIG. 15 shows tile division. Reference signs 1501 to 1510 denote frames. Each frame is formed of tiles numbered 0 to 11. The frame 1501 represents a frame of a base layer at a time t. The frame 1505 represents a frame of an enhancement first hierarchical layer at the time t. The frame 1503 represents a frame in which decoded image data of the frame 1501 is enlarged to the resolution of the enhancement first hierarchical layer. The frame 1509 represents a frame of an enhancement second hierarchical layer at the time t. The frame 1507 represents a frame in which decoded image data of the frame 1505 is enlarged to the resolution of the enhancement second hierarchical layer. The frame 1502 represents a frame of the base layer at a time t+delta. The frame 1506 represents a frame of the enhancement first hierarchical layer at the time t+delta. The frame 1504 represents a frame in which decoded image data of the frame 1502 is enlarged to the resolution of the enhancement first hierarchical layer. The frame 1510 represents a frame of the enhancement second hierarchical layer at the time t+delta. The frame 1508 represents a frame in which decoded image data of the frame 1506 is enlarged to the resolution of the enhancement second hierarchical layer. For description, it is assumed that each frame is divided into four in the horizontal direction and into three in the vertical direction. Thin lines in the drawing represent the boundaries of tiles.
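The tile division assumed for FIG. 15 can be sketched as follows. The function name is hypothetical, and the sketch assumes that tiles are numbered in raster order (left to right, then top to bottom) and are spaced uniformly in units of samples; neither assumption is stated for FIG. 15 itself.

```python
def tile_rect(tile_number, frame_width, frame_height, cols=4, rows=3):
    """Return (x0, y0, x1, y1) of a tile, with x1 and y1 exclusive.

    Assumes raster-order numbering and uniform spacing; the defaults give
    the four-by-three division with tile numbers 0 to 11.
    """
    if not 0 <= tile_number < cols * rows:
        raise ValueError("tile number out of range")
    row, col = divmod(tile_number, cols)
    x0 = col * frame_width // cols
    x1 = (col + 1) * frame_width // cols
    y0 = row * frame_height // rows
    y1 = (row + 1) * frame_height // rows
    return (x0, y0, x1, y1)
```

Under these assumptions, the tiles numbered 5 and 6 are the two horizontally adjacent tiles at the center of the middle row, and the same tile number covers the same relative area in every layer regardless of resolution.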
Herein, for description, it is assumed that MCTS includes the tile with the tile number 5 and the tile with the tile number 6. In FIG. 15, the area surrounded by thick lines is MCTS. Hence, to decode the tile with the tile number 5 in the frame 1510 of the enhancement second hierarchical layer, only the tile with the tile number 5 in the frame 1506 of the enhancement first hierarchical layer and the tile with the tile number 5 in the frame 1502 of the base layer need to be decoded, so that the necessary tiles can be decoded at high speed.
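The chain of tiles decoded in this example can be sketched as follows. The layer indices (0 for the base layer, 1 and 2 for the enhancement first and second hierarchical layers) and the function name are assumptions made for illustration; the frame reference signs are those of FIG. 15 at the time t+delta. Only the inter-layer dependency at a single time instant is modeled, not inter-frame prediction.

```python
# Frame reference signs of FIG. 15 at the time t+delta, keyed by an
# illustrative layer index (0 = base layer).
FRAMES_AT_T_PLUS_DELTA = {0: 1502, 1: 1506, 2: 1510}


def decode_order(target_layer, tile_number, frames=FRAMES_AT_T_PLUS_DELTA):
    """List the (frame, tile number) pairs to decode, base layer first,
    in order to reconstruct one tile of the target layer."""
    return [(frames[layer], tile_number) for layer in range(target_layer + 1)]
```

Calling decode_order(2, 5) lists the tile with the tile number 5 in the frames 1502, 1506 and 1510 in that order, so three tiles are decoded instead of three whole frames.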
When the technique is used for monitoring cameras or the like, in general, the base layer is decoded, videos from a plurality of cameras are monitored, and if an abnormal state is detected, the abnormal area is enlarged and displayed. In MCTS, since inter-frame prediction is limited to image data within a tile set, coding efficiency is decreased. For example, if a subject, such as a person, enters from outside MCTS, image data outside MCTS is not referenced, and hence inter-frame prediction of the subject cannot be performed. Also, since the base layer has a sufficiently smaller image size than the enhancement layer and has a small code amount, its processing cost for decoding is markedly lower than that of the enhancement layer. Owing to this, the increase in speed obtained by improving parallel processing through tile division is small. Hence, setting MCTS may cause a problem in which the code amount is increased in a hierarchical layer with a low resolution, such as the base layer, which should have a small code amount.
Accordingly, to address the above-described problem, the present invention improves the image quality and increases the coding efficiency in a higher hierarchical layer, such as a base layer, while ensuring independence of coding and decoding of a tile even during hierarchical coding. A tile that can be independently decoded, such as a tile included in MCTS, is called an independent decoding tile, and a group of independent decoding tiles, such as MCTS, is called an independent decoding tile set.