The disclosed embodiments of the present invention relate to video/image processing, and more particularly, to a method and apparatus for accessing data of a multi-tile encoded picture stored in a buffering apparatus.
VP8 is an open video compression format released by Google®. Like many modern video compression schemes, VP8 is based on decomposition of frames into square subblocks of pixels, prediction of such subblocks using previously constructed blocks, and adjustment of such predictions (as well as synthesis of unpredicted blocks) using a discrete cosine transform (DCT). In one special case, however, VP8 uses a Walsh-Hadamard transform (WHT) instead of the commonly used DCT.
WebP is an image format developed by Google® on the basis of VP8. Specifically, WebP is based on VP8's intra-frame coding and uses a container based on the resource interchange file format (RIFF). In addition, WebP was announced as a new open specification that provides lossy compression for photographic images. In a large-scale study of 900,000 web images, WebP images were found to be 39.8% smaller than Joint Photographic Experts Group (JPEG) images of similar quality. Webmasters, web developers and browser developers can therefore use the WebP format to create smaller, better-looking images that help to improve the user's web browsing experience.
In accordance with the VP8/WebP specification, the input to a VP8/WebP decoder is a sequence of compressed frames whose order matches their order in time. Besides, every compressed frame has multiple partitions included therein. As the VP8/WebP bitstream is configured to transmit compressed frames each having a plurality of partitions included therein, how to efficiently buffer and decode each compressed frame of a multi-partition VP8/WebP bitstream becomes an important issue in this technical field.
As proposed in the High-Efficiency Video Coding (HEVC) specification, one picture can be partitioned into multiple tiles. FIG. 19 is a diagram illustrating tiles adopted in the HEVC specification. FIG. 20 is a diagram illustrating a conventional decoding order of the tiles shown in FIG. 19. As shown in FIG. 19, one picture 10 is partitioned into a plurality of tiles T11′-T13′, T21′-T23′, T31′-T33′ separated by row boundaries (i.e., horizontal boundaries) HB1′, HB2′ and column boundaries (i.e., vertical boundaries) VB1′, VB2′. Inside each tile, largest coding units (LCUs)/treeblocks (TBs) are raster scanned, as shown in FIG. 20. For example, the LCUs/TBs in the same tile T11′, which are indexed in order by Arabic numerals, are decoded sequentially. Inside each multi-tile picture, tiles are raster scanned, as shown in FIG. 20. For example, the tiles T11′-T13′, T21′-T23′ and T31′-T33′ are decoded sequentially. Specifically, one picture can be uniformly partitioned by tiles or partitioned into specified LCU-column-row tiles. A tile is a partition which has vertical and horizontal boundaries, and it is always rectangular with an integer number of LCUs/TBs included therein.
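By way of illustration only, the two-level raster scan described above (tiles in raster order within the picture, LCUs/TBs in raster order within each tile) may be sketched as follows. The function name `tile_decode_order` and the layout used in the example (tile column widths of 4, 6 and 3 LCUs, tile row heights of 3 LCUs) are assumptions made for this sketch; the widths were chosen only so that the resulting first-LCU indices agree with the indices 1, 13, 31 and 40 quoted in this description, and the actual dimensions of FIG. 19 are not reproduced here.

```python
from itertools import accumulate


def tile_decode_order(col_widths, row_heights):
    """Return picture-level (x, y) LCU coordinates in decoding order.

    Tiles are visited in raster order, and the LCUs inside each tile are
    also visited in raster order. All sizes are given in LCUs.
    """
    # Left edge of every tile column and top edge of every tile row.
    col_starts = [0] + list(accumulate(col_widths))[:-1]
    row_starts = [0] + list(accumulate(row_heights))[:-1]

    order = []
    for y0, h in zip(row_starts, row_heights):      # tile rows, top to bottom
        for x0, w in zip(col_starts, col_widths):   # tiles, left to right
            for y in range(y0, y0 + h):             # LCU rows inside the tile
                for x in range(x0, x0 + w):         # LCUs inside the LCU row
                    order.append((x, y))
    return order


# Hypothetical 3x3 tile layout (an assumption, not taken from FIG. 19).
order = tile_decode_order([4, 6, 3], [3, 3, 3])

# 1-based decoding index of every LCU, keyed by its picture coordinates.
decode_index = {xy: i + 1 for i, xy in enumerate(order)}

# First LCU of the tiles in the first tile row and of the first tile in
# the second tile row.
print(decode_index[(0, 0)], decode_index[(4, 0)],
      decode_index[(10, 0)], decode_index[(0, 3)])  # prints: 1 13 31 40
```

Under this assumed layout, the LCU immediately to the right of the first tile's top-left corner region, at picture coordinates (4, 0), is decoded thirteenth rather than fifth, which is the practical difference between a tile-based scan and a plain picture-level raster scan.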
In accordance with HEVC specification, there are two types of tiles, independent tiles and dependent tiles. As to the independent tiles, they are treated as sub-pictures/sub-streams. Hence, encoding/decoding LCUs/TBs of an independent tile (e.g., motion vector prediction, intra prediction, deblocking filter (DF), sample adaptive offset (SAO), adaptive loop filter (ALF), entropy coding, etc.) does not need data from other tiles. Besides, assume that data of the LCUs/TBs is encoded/decoded using arithmetic coding such as a context-based adaptive binary arithmetic coding (CABAC) algorithm. Regarding each independent tile, the CABAC statistics are initialized/re-initialized at the start of the tile, and the LCUs outside the tile boundaries of the tile are regarded as unavailable. For example, the CABAC statistics at the first LCU/TB indexed by “1” in the tile T11′ would be initialized when decoding of the tile T11′ is started, the CABAC statistics at the first LCU/TB indexed by “13” in the tile T12′ would be re-initialized when decoding of the tile T12′ is started, the CABAC statistics at the first LCU/TB indexed by “31” in the tile T13′ would be re-initialized when decoding of the tile T13′ is started, and the CABAC statistics at the first LCU/TB indexed by “40” in the tile T21′ would be re-initialized when decoding of the tile T21′ is started.
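As a minimal sketch of the rule that LCUs outside the boundaries of an independent tile are regarded as unavailable, the following example tests whether a neighboring LCU may be referenced. The `Tile` type, the `neighbor_available` function and the tile geometry are hypothetical names and values introduced for illustration; the sketch checks tile membership only and deliberately omits the further condition that a neighbor must already have been decoded.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    """Rectangular tile described in LCU units (hypothetical helper type)."""
    x0: int  # left edge
    y0: int  # top edge
    w: int   # width
    h: int   # height

    def contains(self, x: int, y: int) -> bool:
        return (self.x0 <= x < self.x0 + self.w
                and self.y0 <= y < self.y0 + self.h)


def neighbor_available(tile: Tile, x: int, y: int) -> bool:
    """For an independent tile, only LCUs inside the same tile are usable."""
    return tile.contains(x, y)


# Assumed geometry for a tile in the position of T12': 6x3 LCUs at (4, 0).
t12 = Tile(x0=4, y0=0, w=6, h=3)

# The left neighbor of the tile's first LCU lies in the adjacent tile.
print(neighbor_available(t12, 3, 0))  # prints: False
# The left neighbor of the tile's second LCU lies inside the tile.
print(neighbor_available(t12, 4, 0))  # prints: True
```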
However, encoding/decoding LCUs/TBs of a dependent tile (e.g., motion vector prediction, intra prediction, DF, SAO, ALF, entropy coding, etc.) has to consider data provided by other tiles. Hence, vertical and horizontal buffers are required for successfully decoding a multi-tile encoded picture/compressed frame having dependent tiles included therein. Specifically, the vertical buffer is used for buffering decoded information of LCUs/TBs of an adjacent tile beside a vertical boundary (e.g., a left vertical boundary) of a currently decoded tile, and the horizontal buffer is used for buffering decoded information of LCUs/TBs of another adjacent tile beside a horizontal boundary (e.g., a top horizontal boundary) of the currently decoded tile. As a result, the buffer size for decoding the multi-tile encoded picture/compressed frame would be large, leading to higher production cost. Besides, assume that data of the LCUs/TBs is encoded/decoded using arithmetic coding such as a CABAC algorithm. Regarding a dependent tile, the CABAC statistics may be initialized at the start of the tile or inherited from another tile. For example, the CABAC statistics at the first LCU/TB indexed by “1” in the tile T11′ would be initialized when decoding of the tile T11′ is started, the CABAC statistics at the first LCU/TB indexed by “13” in the tile T12′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “12” in the tile T11′ when decoding of the tile T12′ is started, the CABAC statistics at the first LCU/TB indexed by “31” in the tile T13′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “30” in the tile T12′ when decoding of the tile T13′ is started, and the CABAC statistics at the first LCU/TB indexed by “40” in the tile T21′ would be inherited from the CABAC statistics at the last LCU/TB indexed by “39” in the tile T13′ when decoding of the tile T21′ is started.
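The CABAC handling and the boundary buffering described above may be illustrated with the following sketch, which is offered under stated assumptions rather than as a definitive implementation. The function names `cabac_sources` and `boundary_buffer_lcus` are hypothetical. The first function models only the inheritance case given in the example above (a dependent tile may alternatively initialize its statistics at the start of the tile). The second function lists only the LCUs directly beside the left and top boundaries of the currently decoded tile and ignores corner neighbors.

```python
def cabac_sources(tile_sizes, dependent):
    """Describe where each tile obtains its initial CABAC statistics.

    tile_sizes lists the number of LCUs per tile in decoding order.
    Returns one (first_lcu, inherited_from) pair per tile, using 1-based
    LCU indices. inherited_from is None when the statistics are
    initialized or re-initialized at the start of the tile.
    """
    sources = []
    first = 1
    for i, size in enumerate(tile_sizes):
        if dependent and i > 0:
            # Inherit from the last LCU of the previously decoded tile.
            sources.append((first, first - 1))
        else:
            sources.append((first, None))
        first += size
    return sources


def boundary_buffer_lcus(x0, y0, w, h):
    """List the neighboring-tile LCUs whose decoded information is buffered.

    The current tile has its top-left LCU at (x0, y0) and spans w x h LCUs.
    Returns (vertical, horizontal): the LCUs beside the left vertical
    boundary and the LCUs beside the top horizontal boundary.
    """
    vertical = [(x0 - 1, y) for y in range(y0, y0 + h)] if x0 > 0 else []
    horizontal = [(x, y0 - 1) for x in range(x0, x0 + w)] if y0 > 0 else []
    return vertical, horizontal


# Tile sizes implied by the first-LCU indices 1, 13, 31 and 40; the size of
# the fourth tile (12) is an assumption.
sizes = [12, 18, 9, 12]
print(cabac_sources(sizes, dependent=False))
print(cabac_sources(sizes, dependent=True))

# Assumed geometry for a tile in the position of T22': 6x3 LCUs at (4, 3).
vertical, horizontal = boundary_buffer_lcus(4, 3, 6, 3)
print(len(vertical), len(horizontal))  # prints: 3 6
```

In this sketch the vertical buffer grows with the tile height and the horizontal buffer grows with the tile width, which gives a rough indication of why dependent tiles increase the buffer requirement compared with independent tiles.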
Regarding the Joint Photographic Experts Group extended range (JPEG-XR) specification, one picture can be partitioned into specified macroblock-column-row tiles. A tile is a partition which has vertical and horizontal boundaries, and it is always rectangular with an integer number of macroblocks (MBs) included therein. Inside each tile, MBs are raster scanned. Inside each multi-tile picture, tiles are raster scanned. In accordance with the JPEG-XR specification, there are two types of tiles, hard tiles and soft tiles. As to the hard tiles, they are treated as sub-pictures. Hence, encoding/decoding MBs of a hard tile does not need data from other tiles. However, encoding/decoding MBs of a soft tile has to consider data provided by other tiles. For example, in soft tiles, overlap filtering may be applied across tile boundaries.
As the multi-tile HEVC/JPEG-XR bitstream is configured to transmit encoded/compressed frames each having a plurality of tiles included therein, how to efficiently buffer and decode each encoded/compressed frame of the multi-tile HEVC/JPEG-XR bitstream becomes an important issue in this technical field.