The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
A video encoder is typically implemented by dividing each frame of original video data into blocks of pixels. In existing standards for video compression (e.g., MPEG1, MPEG2, H.261, H.263, and H.264) these blocks would normally be of size 16×16 and be referred to as macroblocks (MB). In the future HEVC/H.265 standard, the blocks would typically be larger (e.g., 64×64) and might be rectangular, for instance at frame boundaries.
Typically, the blocks are processed and/or transmitted in raster scan order, i.e. from the top row of blocks to the bottom row of blocks, and from left to right within each row of blocks.
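The raster scan order described above may be illustrated by the following minimal sketch. The function name `raster_scan_blocks` and its parameters are hypothetical and chosen only for illustration. Blocks at the right and bottom frame boundaries are clipped, which yields the rectangular blocks mentioned above.

```python
def raster_scan_blocks(frame_width, frame_height, block_size):
    """Yield (x, y, width, height) for each block in raster scan order.

    Rows of blocks are visited from top to bottom, and blocks within a
    row from left to right. Blocks that would extend past the frame are
    clipped, so boundary blocks may be rectangular.
    """
    for y in range(0, frame_height, block_size):
        for x in range(0, frame_width, block_size):
            width = min(block_size, frame_width - x)
            height = min(block_size, frame_height - y)
            yield (x, y, width, height)
```

For example, a 40×16 frame with 16×16 blocks produces two square blocks followed by one 8×16 block at the right boundary.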
For each block of original pixel data, the encoding is typically performed in the following steps:
1. Produce prediction pixels using reconstructed pixel values from i) the previous frame (inter prediction), or ii) previously reconstructed pixels in the current frame (intra prediction). Depending on the prediction type, the block is classified as an inter block or an intra block.
2. Compute the difference between each original pixel and the corresponding prediction pixel within the block.
3. Apply a two-dimensional transform to the difference samples, resulting in a set of transform coefficients.
4. Quantize each transform coefficient to an integer number.
5. Perform lossless entropy coding of the quantized transform coefficients.
6. Apply a two-dimensional inverse transform to the quantized transform coefficients to compute a quantized version of the difference samples.
7. Add the prediction to the quantized difference samples to form the reconstructed pixels for the current block.
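The per-block steps listed above may be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any particular standard: it assumes a square block, an orthonormal DCT-II as the two-dimensional transform, and a single uniform quantizer step `qstep` with rescaling before the inverse transform. The prediction is taken as given, and the entropy coding step is omitted. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def encode_block(original, prediction, qstep):
    """Return (quantized levels, reconstructed pixels) for one square block."""
    n = len(original)
    c = dct_matrix(n)
    ct = transpose(c)
    # Difference between original and prediction pixels.
    residual = [[original[r][k] - prediction[r][k] for k in range(n)]
                for r in range(n)]
    # Forward two-dimensional transform: C * residual * C^T.
    coeffs = matmul(matmul(c, residual), ct)
    # Quantize each coefficient to an integer level.
    levels = [[int(round(v / qstep)) for v in row] for row in coeffs]
    # (Lossless entropy coding of `levels` would happen here.)
    # Rescale and inverse transform: C^T * dequantized * C.
    dequant = [[level * qstep for level in row] for row in levels]
    recon_residual = matmul(matmul(ct, dequant), c)
    # Add the prediction to form the reconstructed pixels.
    reconstructed = [[prediction[r][k] + recon_residual[r][k] for k in range(n)]
                     for r in range(n)]
    return levels, reconstructed
```

The encoder reconstructs the block from the quantized levels rather than from the original pixels so that its reference data matches what a decoder is able to reproduce.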
Moreover, with reference to FIG. 1, a current frame as well as a prediction frame are input to a subtractor 9. The subtractor 9 is provided with input from an intra prediction processing path 3 and a motion compensation processing path 5, the selection of which is controlled by switch 7. Intra prediction processing is selected for finding similarities within the current image frame, and is thus referred to as “intra” prediction. Motion compensation has a temporal component and thus involves analysis between successive frames that is referred to as “inter” prediction.
The output of the switch 7 is subtracted from the pixels of the current frame in the subtractor 9, prior to being subjected to a two-dimensional transform process 13. The transformed coefficients are then subjected to quantization in a quantizer 15 and then passed to an entropy encoder 17. Entropy encoding removes redundancies without losing information, and is referred to as a lossless encoding process. Subsequently, the encoded data is arranged in network packets via a packetizer, prior to being transmitted in a bit stream.
However, the output of the quantizer 15 is also applied to an inverse transform and used for assisting in prediction processing. The output is applied to a deblocking filter 8, which suppresses some of the sharpness in the edges to improve clarity and better support prediction processing. The output of the deblocking filter 8 is applied to a frame memory 6, which holds the processed image pixel data in memory for use in subsequent motion processing.
The corresponding decoding process for each block can be described as follows (as indicated in FIG. 2). After entropy decoding 22 (to produce the quantized transform coefficients) and two-dimensional inverse transformation 26 on the quantized transform coefficients to provide a quantized version of the difference samples, the resultant image is reconstructed after adding the inter prediction and intra prediction data previously discussed.
Some of the encoder and decoder processing steps will now be described in more detail. In video encoders, blocks can be divided into sub-blocks. Typically, the blocks are of fixed (square) size, while the sub-blocks can be of various (e.g., rectangular) shapes. Also, the partitioning into sub-blocks will typically vary from one block to another.
Inter prediction is typically achieved by deriving a set of motion vectors for each sub-block. The motion vectors define the spatial displacement between the original pixel data and the corresponding reconstructed pixel data in the previous frame. Thus, the amount of data that needs to be transmitted to a decoder can be greatly reduced if a feature in a first frame can be identified to have moved to another location in a subsequent frame. In this situation, a motion vector may be used to efficiently convey the information about the feature that has changed position from one frame to the next.
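One way a motion vector might be derived is sketched below. The sketch assumes an exhaustive search over a small window and a sum of absolute differences (SAD) as the matching cost. Both are common choices but are assumptions here, and the function names are hypothetical. Frames are represented as lists of rows of pixel values.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size) for c in range(size))

def find_motion_vector(current, reference, x, y, size, search_range):
    """Return (dx, dy, cost) of the best match for the block at (x, y).

    (dx, dy) is the displacement from the block position in the current
    frame to the best matching block in the reference (previous) frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, x, y, reference, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]
```

When the match is exact, the cost is zero and only the two vector components need to be conveyed for the sub-block instead of its pixel data.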
Intra prediction is typically achieved by deriving an intra direction mode for each sub-block. The intra direction mode defines the spatial displacement between the original pixel data and the previously reconstructed pixel data in the current frame.
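As an illustration only, the following sketch forms an intra prediction for a square sub-block from the previously reconstructed pixels directly above it (`top`) and directly to its left (`left`). The three modes shown (vertical, horizontal, and DC) and the string names used to select them are assumptions made for this example. Practical codecs define additional directional modes.

```python
def intra_predict(top, left, mode):
    """Return an n x n prediction block from neighboring reconstructed pixels.

    top  : n reconstructed pixels from the row above the sub-block
    left : n reconstructed pixels from the column left of the sub-block
    mode : "vertical", "horizontal", or "dc" (hypothetical mode names)
    """
    n = len(top)
    if mode == "vertical":
        # Each column repeats the pixel above it.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Each row repeats the pixel to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Every pixel is the rounded mean of all neighboring pixels.
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unknown intra mode: " + mode)
```

An encoder would typically evaluate each candidate mode and select the one whose prediction leaves the smallest difference to the original pixels.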
Both motion vectors and intra direction modes are encoded and transmitted to the decoder as side information for each sub-block. In order to reduce the number of bits used for this side information, encoding of these parameters depends on the corresponding parameters of previously processed sub-blocks.
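The dependency on previously processed sub-blocks may be illustrated with differential coding of motion vectors. In this sketch the predictor is simply the motion vector of the immediately preceding sub-block, which is an assumption made for brevity. Practical codecs may derive the predictor from several neighboring sub-blocks. The function names are hypothetical.

```python
def encode_mv_differences(motion_vectors):
    """Replace each (x, y) vector by its difference from the previous one."""
    predictor = (0, 0)
    diffs = []
    for mv in motion_vectors:
        diffs.append((mv[0] - predictor[0], mv[1] - predictor[1]))
        predictor = mv
    return diffs

def decode_mv_differences(diffs):
    """Invert encode_mv_differences by accumulating the differences."""
    predictor = (0, 0)
    vectors = []
    for dx, dy in diffs:
        mv = (predictor[0] + dx, predictor[1] + dy)
        vectors.append(mv)
        predictor = mv
    return vectors
```

When neighboring sub-blocks move similarly, the differences cluster around zero and can be represented with fewer bits. The sketch also shows that a sub-block cannot be decoded without the parameters of the sub-block that precedes it.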
Typically, some form of adaptive entropy coding is used. The adaptation makes the entropy encoding/decoding for a sub-block dependent on previously processed sub-blocks. Entropy encoding is lossless encoding that reduces the number of bits that are needed to convey the information to a receiving site.
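The effect of adaptation can be sketched with a simple frequency-count model. The sketch does not produce a bit stream. It only estimates the ideal code length, in bits, that an adaptive coder would spend when each symbol's probability is estimated from the symbols processed before it. The model, with counts initialized to one, and the function name are assumptions for illustration.

```python
import math

def adaptive_code_length(symbols, alphabet):
    """Estimate the ideal adaptive code length of `symbols` in bits."""
    counts = {s: 1 for s in alphabet}  # start with a uniform model
    total_bits = 0.0
    for s in symbols:
        total = sum(counts.values())
        # Ideal cost of a symbol with estimated probability p is -log2(p).
        total_bits += -math.log2(counts[s] / total)
        counts[s] += 1  # adapt the model to what has been seen so far
    return total_bits
```

For a run of eight identical symbols from a two-symbol alphabet, the estimate is about 3.2 bits, compared with 8 bits for a fixed one-bit-per-symbol code. The estimate for each symbol depends on all earlier symbols, which is the dependency referred to above.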
Many video encoding/decoding systems and methods apply a deblocking filter (8 in FIG. 2) across boundaries between blocks. Moreover, a deblocking filter is applied to blocks in decoded video to improve visual quality and prediction performance by smoothing the sharp edges which can form between blocks when block coding techniques are used. The filter aims to improve the appearance of decoded pictures.
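A greatly simplified sketch of smoothing across a block boundary is given below. It is not the deblocking filter of any standard. It operates on one row of pixels, adjusts only the two pixels adjacent to the boundary, and leaves large steps untouched on the assumption that they represent genuine image edges rather than coding artifacts. The function name and the `threshold` parameter are hypothetical.

```python
def deblock_edge(row, edge, threshold):
    """Return a copy of `row` with the boundary at index `edge` smoothed.

    The boundary lies between row[edge - 1] and row[edge]. Steps whose
    magnitude is at least `threshold` are treated as real edges and kept.
    """
    out = list(row)
    p0, q0 = row[edge - 1], row[edge]
    step = q0 - p0
    if 0 < abs(step) < threshold:
        delta = int(step / 4)  # move each side a quarter of the step
        out[edge - 1] = p0 + delta
        out[edge] = q0 - delta
    return out
```

Because the filter reads pixels on both sides of the boundary, it introduces a dependency between neighboring blocks.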
The AVC/H.264 standard for video compression supports two mechanisms for parallel processing of blocks: Slices and Slice groups.
Slices
A slice in AVC/H.264 is defined as a number of consecutive blocks in raster scan order. The use of slices is optional on the encoder side, and the information about slice boundaries is sent to the decoder in the network transportation layer or in the bit-stream as a unique bit pattern.
The most important feature for slice design in AVC/H.264 is to allow transportation of compressed video over packet-based networks. Typically, one slice of compressed video data is transported as one packet. To ensure resilience to packet loss, each slice is independently decodable. As recognized by the present inventor, this requirement implies that all dependencies between blocks of different slices are broken. In addition, key parameters for the entire slice are transported in a slice header.
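The relationship between slices and packets may be sketched as follows. Given the compressed size of each block in raster scan order, consecutive blocks are grouped into slices so that each slice fits within one packet payload. The function name and parameters are hypothetical, and slice header overhead is ignored for simplicity.

```python
def pack_blocks_into_slices(block_sizes, max_payload):
    """Group consecutive block indices into slices of at most max_payload bytes.

    A block larger than max_payload is placed in a slice of its own.
    """
    slices = []
    current, used = [], 0
    for index, size in enumerate(block_sizes):
        # Start a new slice when the next block would overflow the packet.
        if current and used + size > max_payload:
            slices.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        slices.append(current)
    return slices
```

Each resulting slice contains only consecutive blocks, consistent with the definition of a slice given above.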
Slice Groups
Slice groups in AVC/H.264 define a partitioning of the blocks within a frame. The partitioning is signalled in the picture header. Blocks are processed and transmitted in raster-scan order within a slice group. Also, as recognized by the present inventor, since a slice cannot span more than one slice group, dependencies are broken between slice groups in the same manner as between slices.
As recognized by the present inventor, slice groups are different from “tiles” (as will subsequently be discussed in detail) in at least two important aspects. First, with slice groups, blocks are transmitted in raster-scan order within the slice group. Having to decode a bit stream that uses raster-scan order within a slice group is a highly undesirable requirement for many decoders, especially those using a single core. This is because pixels are best decoded in the same order as they are rendered and stored in memory for rendering on a display device. In the extreme case, a bit stream with slice groups could force the decoder to decode each frame one column (of blocks) at a time rather than one row (of blocks) at a time. Secondly, slice groups can specify non-contiguous partitions of a frame (e.g., a checkerboard pattern). Having to decode, e.g., all the “white” blocks before all the “black” blocks of a checkerboard pattern, or even more sophisticated patterns, places an even worse burden on a decoder.
Because of these difficulties in implementing generic slice groups in AVC/H.264 decoders, the latest revision of the AVC/H.264 standard introduced a new profile (constrained profile) which disallowed the use of slice groups in the bit stream. Decoders not being able to decode slice groups could then claim compliance with the constrained profile (instead of the baseline profile). With tiles (as will be discussed in the detailed description), blocks are transmitted in raster-scan order within the frame, which is the optimal transmission order for most single core decoders.
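The burden imposed by a checkerboard slice group pattern can be seen by listing the order in which blocks would be transmitted. The following sketch assumes two slice groups assigned by the parity of the block row and column indices. The function name is hypothetical.

```python
def checkerboard_transmission_order(cols, rows):
    """Return (row, col) block positions in checkerboard slice group order.

    All blocks of the first group are sent in raster scan order within
    that group, followed by all blocks of the second group.
    """
    groups = ([], [])
    for r in range(rows):
        for c in range(cols):
            groups[(r + c) % 2].append((r, c))
    return groups[0] + groups[1]
```

For a frame of three block columns and two block rows, the resulting order is (0, 0), (0, 2), (1, 1), (0, 1), (1, 0), (1, 2). The decoder must therefore traverse the frame twice and cannot complete the first row of blocks until the second group arrives, whereas raster scan order within the frame would complete each row before starting the next.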