Field of the Invention
The present invention relates to decoding an encoded video bitstream, in particular where the encoded video bitstream represents frames of video data encoded in rows of macroblocks.
Description of the Prior Art
Contemporary video encoding techniques, such as those represented by the H.264 “Advanced Video Coding” standard, provide for highly efficient encoding of video data. Consequently, the process of decoding an encoded video bitstream will typically comprise several stages, from initially interpreting the bit pattern of the bitstream, through extracting information related to individual macroblocks, to reconstructing an entire frame of video data on the basis of those macroblocks.
In some known video decoders, the video decoding process is split into two phases: a first parsing phase, in which the received encoded video bitstream is initially interpreted in order to generate macroblock information, and a second, pipelined reconstruction phase, in which the macroblock information is processed and combined to reconstruct individual frames of video data.
Whilst it may be the case that the macroblocks in an encoded video bitstream are represented in that bitstream in the same order as those macroblocks appear in the frames of video data (i.e. in raster scan order), video encoding standards such as H.264 permit a Flexible Macroblock Order (FMO), wherein the order in which the macroblocks are encoded in the encoded video bitstream does not correspond to raster scan order. This may for example occur when a frame of video data is encoded in more than one slice, wherein the macroblocks of those slices are interleaved with one another. For example, a checkerboard pattern of slices is possible, wherein alternate macroblocks belong to two separate slices of encoded video data. In other words, a video decoding apparatus receiving such an encoded video bitstream will first receive information related to the odd (say) numbered macroblocks of the frame, followed by information related to the even numbered macroblocks of that frame. Such interleaving of slices has advantages in terms of error resilience, because even if a slice of video data is lost in transmission, a reasonable approximation to the original frame of video data may nonetheless be reconstructed by interpolating from the surrounding received macroblocks to generate the missing data. For this reason, these techniques find application in environments where the transmission medium is known to be lossy, but where the absolute quality of the reconstructed video data is of lesser importance, such as in mobile video conferencing.
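As an illustration only, the following Python sketch models a two-slice checkerboard on a small frame and lists the raster scan indices of the macroblocks in the order in which they would arrive in the bitstream, one slice after the other. The function names, the (x + y) % 2 assignment rule and the 4 x 2 frame size are assumptions chosen for illustration; they are not taken from the H.264 standard text or from any particular decoder.

```python
def checkerboard_slice_map(width_mbs, height_mbs):
    """Hypothetical helper: assign each macroblock (by raster index) to slice 0 or 1
    so that horizontally and vertically adjacent macroblocks lie in different slices."""
    return [((i % width_mbs) + (i // width_mbs)) % 2
            for i in range(width_mbs * height_mbs)]


def bitstream_order(slice_map, slice_sequence=(0, 1)):
    """Hypothetical helper: raster indices in arrival order, assuming every
    macroblock of one slice is transmitted before any macroblock of the next."""
    return [i for s in slice_sequence
            for i, owner in enumerate(slice_map) if owner == s]


if __name__ == "__main__":
    smap = checkerboard_slice_map(4, 2)
    print(smap)                   # [0, 1, 0, 1, 1, 0, 1, 0]
    print(bitstream_order(smap))  # [0, 2, 5, 7, 1, 3, 4, 6]
```

The second list shows that, in this sketch, a decoder receives macroblocks 0, 2, 5 and 7 before it receives macroblock 1, even though macroblock 1 is the second macroblock in raster scan order.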
However, allowing FMO in the encoded video bitstream presents the video decoding apparatus with an increased level of complexity. In particular, a problem arises because parsing of the encoded video bitstream is most efficient when it is carried out in bitstream order. This can for example be because each slice of encoded video may be encoded with reference to its own earlier content; for example, a given macroblock row in one slice may refer to the previous row in that slice. Hence, for the parser, it is most efficient if the identified items of macroblock information are written to memory in bitstream order, such that, as the parsing process continues, the parser may easily refer to earlier identified macroblocks of the same slice. Furthermore, handling each slice as it is received enables the parser to maintain a consistent context for its entropy decoder, thus avoiding the extra bandwidth and processing associated with context switching between slices.
On the other hand, the reconstruction of the identified macroblocks into the frames of video data taking place in the reconstruction pipeline happens most efficiently when it is carried out in raster scan order. This is because it is the raster scan order that defines where the macroblocks lie in relation to one another in the final frames of video data and sequential access to these macroblocks typically permits the most efficient use of, for example, the frame buffer and the motion cache.
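The following Python sketch is offered only as an illustration, under the assumption that the parser writes one record per macroblock to consecutive memory slots in bitstream order. It computes which slot a raster scan order consumer, such as the reconstruction pipeline, would have to read for each successive macroblock. The function name and the example ordering (a two-slice checkerboard on a frame four macroblocks wide and two high) are hypothetical.

```python
def raster_read_slots(arrival_order):
    """Hypothetical helper.

    arrival_order[k] is the raster index of the k-th macroblock produced by the
    parser, which is assumed to store that macroblock's record in slot k.
    Returns slots, where slots[r] is the slot holding raster macroblock r.
    """
    slots = [None] * len(arrival_order)
    for slot, raster_index in enumerate(arrival_order):
        slots[raster_index] = slot
    return slots


if __name__ == "__main__":
    # Two interleaved slices: slice 0 holds macroblocks 0, 2, 5, 7; slice 1 the rest.
    arrival = [0, 2, 5, 7, 1, 3, 4, 6]
    print(raster_read_slots(arrival))  # [0, 4, 1, 5, 6, 2, 7, 3]
```

In this example, reading the records in raster scan order jumps back and forth through the parser's output rather than stepping through it sequentially, which is the mismatch between the two preferred orders described above.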
FIG. 1 schematically illustrates a known video decoding apparatus 100 in which a video engine 110 comprising a parser 120 and a reconstruction pipeline 130 receives an encoded video bitstream and decodes it, typically for display. The video engine 110 is coupled to memory 140, which the parser 120 and reconstruction pipeline 130 make use of in performing their parts in the decoding operation. As can be seen in FIG. 1, both the parser 120 and reconstruction pipeline 130 are arranged with feedback paths, wherein some of the information they output is fed back as an input (NB these feedback paths are merely schematic; the use of previously output information as an input typically takes place by means of an access to memory 140). It is in particular these feedback loops that cause the parser 120 to operate most efficiently in bitstream order and the reconstruction pipeline 130 to operate most efficiently in raster scan order.
One approach to resolving this mismatch is to configure the parser to parse the encoded video bitstream in raster scan order. Although this simplifies the overall control of the video decoder, it has the disadvantage that parsing cannot start until the bitstream for the entire frame has been received, which increases the latency of the video decoder. Furthermore, when interleaved slices are received, switching between different slices on a macroblock-by-macroblock basis involves the above-mentioned context switching in the entropy decoder and consequently increases memory access bandwidth.
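To make the context switching cost more concrete, the following Python sketch counts how often consecutive macroblocks belong to different slices when a frame is visited in a given order, using this count as a rough proxy for the number of entropy decoder context switches. The proxy, the function name and the 4 x 2 checkerboard example are assumptions for illustration; actual switching costs depend on the decoder implementation.

```python
def slice_switches(slice_map, visit_order):
    """Hypothetical helper: count changes of slice between consecutively visited
    macroblocks. slice_map[i] is the slice owning raster macroblock i."""
    owners = [slice_map[i] for i in visit_order]
    return sum(1 for prev, cur in zip(owners, owners[1:]) if prev != cur)


if __name__ == "__main__":
    # Two-slice checkerboard on a frame 4 macroblocks wide and 2 high.
    slice_map = [0, 1, 0, 1, 1, 0, 1, 0]
    raster = list(range(len(slice_map)))
    arrival = [0, 2, 5, 7, 1, 3, 4, 6]  # all of slice 0, then all of slice 1
    print(slice_switches(slice_map, raster))   # 6
    print(slice_switches(slice_map, arrival))  # 1
```

In this example, parsing in raster scan order changes slice at almost every macroblock, whereas parsing in bitstream order changes slice only once, at the boundary between the two slices.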
Alternatively, it would be possible to allow the parser to operate in bitstream order, and also to perform some of the reconstruction in bitstream order, namely decoding slices into pixels, followed by running the deblocking as a second pass. However, this has the disadvantage that the access to the frame buffer (which is the highest bandwidth access in the process) is no longer sequential and therefore inefficient. Furthermore the motion cache in the reconstruction pipeline is also poorly utilised.
Some background information on the technological issues involved can be found in the Wikipedia article “Arbitrary slice ordering” (retrieved from http://en.wikipedia.org/wiki/Arbitrary_slice_ordering on 26 Mar. 2010). The paper “Macroblock-level decoding and deblocking method and its pipeline implementation in H.264 decoder SOC design” Wang S. et al. (Journal—Zhejiang University Science A 2007, Vol. 8, Number 1, pages 36-41) is concerned with the problems raised by FMO and presents a multi-stage video decoder which allows the later phases to run in raster scan order. However, the approach taken involves searching the input bitstream which can be very costly in the case of an entropy encoded bitstream (e.g. CABAC), and involves switching between slices which is expensive in terms of memory accesses.
Accordingly, it would be desirable to provide a technique which enables each stage of a video decoder to operate in its most efficient configuration, avoiding the above-described disadvantages of handling data in an order preferred by another part of the video decoder.