Often video data is transmitted as an encoded bitstream, and accordingly an apparatus receiving that encoded bitstream will need to perform a decoding operation in order to derive the pixel data for each video frame of a video image encoded in that bitstream. As shown schematically in FIG. 1, each frame 10 can be considered to be formed of a series of macroblocks (MBs) 20 arranged in a series of rows and columns. Each macroblock represents a block of adjacent pixels, for example a rectangular 16×16 array of pixels. However, for the purposes of the present application, the term macroblock is not intended to imply any particular size of block, and accordingly larger or smaller arrays of pixels can be represented by each macroblock.
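By way of illustration only, the division of a frame into macroblocks can be sketched as follows. The function names, the default 16×16 block size and the raster-order indexing of macroblocks are assumptions made for this sketch and are not taken from FIG. 1.

```python
def macroblock_grid(frame_width, frame_height, mb_size=16):
    """Return (columns, rows) of macroblocks needed to cover a frame.

    Integer ceiling division is used so that a frame whose dimensions are
    not a multiple of mb_size still gets a final, partially filled block.
    """
    cols = -(-frame_width // mb_size)
    rows = -(-frame_height // mb_size)
    return cols, rows


def macroblock_origin(mb_index, cols, mb_size=16):
    """Top-left pixel (x, y) of a macroblock, assuming raster-order indexing."""
    row, col = divmod(mb_index, cols)
    return col * mb_size, row * mb_size
```

For example, under these assumptions a 1920×1080 frame is covered by 120 columns and 68 rows of 16×16 macroblocks, the final row being only partly occupied by picture data.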
The encoded bitstream will contain a sequence of macroblocks in encoded form, and each of those macroblocks will need to be decoded in order to derive the pixel data for each video frame of a video image. Such a bitstream is illustrated in FIG. 2, where the bitstream 100 comprises a sequence of macroblocks 105, 110, 115, 120, 125, 130 in encoded form. Due to the content represented by each macroblock, macroblocks may be of significantly different sizes, as illustrated schematically in FIG. 2. Further, there will be dependencies between various macroblocks, meaning that the content represented by one macroblock can only be fully decoded once one or more other macroblocks have been decoded. Accordingly, it is known to split the decoding operation into two stages. A first stage is a parsing stage, in which each of the encoded macroblocks in the bitstream is partially decoded in its received sequence in order to remove any bitstream encoding dependencies between the macroblocks. This creates a partially decoded bitstream in an intermediate form which is then input to a second stage of the decode operation referred to as a pipe stage, where individual macroblocks are fully decoded in order to determine the pixel data represented by those macroblocks. Due to the removal of bitstream encoding dependencies, the intermediate form is a representation which can be interpreted without being read in the same macroblock order as the original bitstream. It should be noted however that there are typically still other types of dependencies, such as intra prediction pixel dependencies, which are not removed by the parsing stage.
Once the parsing stage has been performed to generate the intermediate form where any bitstream encoding dependencies between the macroblocks have been removed, the amount of time taken to process each macroblock within the pipe stage, for example to perform inverse transform operations, motion compensation operations, etc., is predictable, i.e. relatively fixed. However, the time taken to parse any particular macroblock can vary significantly, since the amount of processing required to perform the parsing operation will vary from macroblock to macroblock depending on the content and hence the sizes of those macroblocks. Thus, if the parsing stage is to be run in synchronisation with the pipe stage, then the parser circuitry needs to be able to cope in real time with the significant variation in complexity of the parsing operation on a macroblock by macroblock basis, leading to a significant increase in the cost and complexity of the parser circuitry.
Accordingly, as shown in FIG. 3, it is known to provide a buffer to hold the output from the parser circuitry prior to it being forwarded to the pipe circuitry, in order to allow the activities of the parser circuitry to be decoupled in time from the activities of the pipe circuitry. Thus, as shown in FIG. 3, the parser circuitry 55 receives the encoded bitstream 50 and performs a partial decode operation in order to produce an intermediate form 60 in which any bitstream encoding dependencies between the macroblocks have been removed. A buffer 65 is then provided to buffer the intermediate form for multiple macroblocks. Thereafter the intermediate form for selected macroblocks 70 can be read out to the pipe circuitry 75, where a full decode operation can be performed.
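The decoupling provided by the buffer 65 can be modelled in software as a bounded queue placed between a producer (the parser) and a consumer (the pipe). The following is a minimal sketch under stated assumptions rather than a description of any actual circuitry: the names `parse_stage`, `pipe_stage`, `decode` and `buffer_depth` are hypothetical, and the string operations merely stand in for the partial and full decode operations.

```python
import queue
import threading


def parse_stage(bitstream, buffer):
    """Serially 'parse' each encoded macroblock into an intermediate form."""
    for index, encoded in enumerate(bitstream):
        # Stand-in for the partial decode; real parsing time would vary
        # with the size of each encoded macroblock.
        intermediate = {"index": index, "payload": encoded.upper()}
        buffer.put(intermediate)  # blocks while the buffer is full
    buffer.put(None)  # sentinel marking the end of the stream


def pipe_stage(buffer, output):
    """Consume intermediate-form macroblocks and 'fully decode' them."""
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is None:
            break
        # Stand-in for inverse transform, motion compensation, etc.
        output.append((item["index"], item["payload"][::-1]))


def decode(bitstream, buffer_depth=4):
    """Run both stages concurrently, decoupled by a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_depth)
    output = []
    parser = threading.Thread(target=parse_stage, args=(bitstream, buffer))
    pipe = threading.Thread(target=pipe_stage, args=(buffer, output))
    parser.start()
    pipe.start()
    parser.join()
    pipe.join()
    return output
```

In this model a larger `buffer_depth` allows the parser to run further ahead of the pipe and so absorb more variation in parsing time, at the cost of holding more intermediate data.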
Providing buffering between the parser circuitry and the pipe circuitry alleviates the earlier mentioned constraints on the parser circuitry, allowing for a simpler and more cost-effective implementation of the parser circuitry. Furthermore, whilst the parsing operation is an inherently serial operation in which each macroblock in the bitstream is subjected in turn to the parsing operation, the full decode operation performed by the pipe circuitry need not be performed in such a sequential manner. Indeed, as shown by the schematic illustration 80 in FIG. 3, multiple non-sequential macroblocks can be processed in parallel within the pipe circuitry, the illustration 80 showing a two-row parallel progression where one macroblock in a first row and another macroblock in a second row (the location of that macroblock being staggered with respect to the location of the first macroblock) are processed in parallel. Such row-parallel progression gives rise to significant performance benefits, as is understood in the art. In particular, macroblocks can take a variety of different forms, for example I-macroblocks and P-macroblocks. For an I-macroblock, the macroblock immediately to the left needs to have been decoded before it is possible to fully decode the I-macroblock, and hence it would not be possible to decode two adjacent I-macroblocks in a particular row in parallel. Further, for P-macroblocks, the decoding of these macroblocks involves the use of reference frame data when performing motion compensation, and often that reference frame data is locally cached. By processing macroblocks in parallel in the way illustrated by the element 80 in FIG. 3, improved locality within the cache can be achieved (i.e. there is a significant re-use of the cached data for the macroblocks being processed).
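The staggered two-row progression can be illustrated by generating the order in which macroblocks would be processed. This is a sketch only: the function name `two_row_schedule`, the default stagger of two macroblocks, and the choice to complete one pair of rows before starting the next are assumptions made for illustration and are not specified by the illustration 80.

```python
def two_row_schedule(cols, rows, stagger=2):
    """Return a list of steps; each step lists the (row, col) macroblocks
    processed in parallel during that step.

    Rows are taken in pairs. Within a pair, the lower row trails the upper
    row by `stagger` columns, so the two macroblocks processed together are
    never horizontally adjacent in the same row.
    """
    steps = []
    for top in range(0, rows, 2):
        bottom = top + 1
        for t in range(cols + stagger):
            step = []
            if t < cols:
                step.append((top, t))
            lower_col = t - stagger
            if bottom < rows and 0 <= lower_col < cols:
                step.append((bottom, lower_col))
            if step:
                steps.append(step)
    return steps
```

Within each row the macroblocks are still visited strictly from left to right, so the left-neighbour requirement noted above for I-macroblocks is respected while two macroblocks are in flight at once for most steps.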
The article “Evaluation of Data-Parallel Splitting Approaches for H.264 Decoding” by F Seitner et al, MoMM 2008, Nov. 24-26, 2008, Linz, Austria, describes a variety of techniques for arranging multiple processing units to perform the pipe stage of a video decode operation on multiple macroblock rows in parallel. However, as discussed in section 5.3, all of the data obtained by parsing the bitstream (i.e. the intermediate form referred to in FIG. 3) must be kept in a buffer until no longer required by the pipe circuitry, and this can lead to buffers of significant size due to the large amount of intermediate data that needs to be retained. In particular, if the buffer is retained internally within the decode engine (i.e. the unit providing the parser circuitry and pipe circuitry), the size of the buffer required is likely to have a significant impact on the area and hence cost of the decoder. If instead the buffer is provided externally to the decode engine, then this can give rise to significant bandwidth issues due to the need to write the intermediate data from the parser circuitry into the buffer and then to read that data from the buffer when subsequently required by the pipe circuitry.
The article “A Multi-Standards HDTV Video Decoder for Blu-Ray Disc Standard” by N Minegishi et al, Mitsubishi Electric Research Laboratories, TR2008-004, April 2008, describes an HDTV video decoder that employs a data compression method to reduce memory data usage and access bandwidth. As described earlier, the decode operation is divided into two parts, namely a parse stage (referred to in the article as the VLC decode section) and a pipe stage (referred to in the article as a pixel operation section), and the intermediate form output from the parse stage is compressed prior to storage in an external buffer. The intermediate form is then later decompressed when retrieved by the pipe stage. The compression technique described is based upon exponential Golomb codes.
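For reference, the general principle of an exponential Golomb code can be sketched as follows. This shows only the standard order-0 code for unsigned integers; it is not a reproduction of the specific compression scheme used in the cited article, and the function names and the use of character strings to represent bits are assumptions made for readability.

```python
def exp_golomb_encode(value):
    """Encode a non-negative integer as an order-0 exponential Golomb code.

    The codeword is the binary form of (value + 1), preceded by one zero
    for every bit after the leading one.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at `pos`.

    Returns (value, next_pos), where next_pos is the index of the first
    bit after the decoded codeword.
    """
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end
```

Small values receive short codewords (0 is encoded as the single bit `1`, and 3 as `00100`), which is why such codes suit data dominated by small magnitudes.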
Whilst compression of the intermediate form can alleviate the earlier mentioned problems with regard to buffer size and bandwidth requirements, a trade-off needs to be made between the complexity and cost introduced by the compression, and the reduction in buffer size and bandwidth that the compression achieves.
Accordingly, it would be desirable to provide an improved mechanism for decoupling the parsing stage and the pipe stage of a video decoding operation, whilst alleviating the above described disadvantages of the known buffering techniques.