Video picture data can be compressed for storage and transmission using various compression standards (e.g., MPEG-2, MPEG-4, H.264 and VC-1 (formerly known as VC-9)). A video picture is divided into macroblocks that are encoded and placed in a compressed bit stream. The H.264 standard provides a macroblock-adaptive frame-field (MBAFF) encoding mode. In the MBAFF encoding mode, pairs of vertically adjacent macroblocks are encoded and placed consecutively in the bit stream. As used herein, the term “macroblock (pair)” (or macroblock-pair) refers to a pair of vertically adjacent macroblocks for H.264 MBAFF and a single macroblock for non-H.264 or non-MBAFF encoding.
Advanced video encoder/decoders (CODECs), such as H.264, VC-1, etc., allow improved compression by exploiting dependencies between macroblock (pair) rows. For example, motion vector prediction between macroblock (pair) rows reduces the number of bits needed to compress motion vectors. Statistical context between macroblock (pair) rows reduces the number of bits used when context-based entropy coding is used (e.g., as in context-adaptive binary arithmetic coding (CABAC) or context-adaptive variable length coding (CAVLC) in H.264 and some header VLCs in VC-1). Intra prediction (H.264) or AC prediction (VC-1 and MPEG-4) across macroblock (pair) rows reduces the number of bits used to encode intra macroblocks. Deblocking between macroblock (pair) rows improves the subjective quality of the picture being encoded and makes the picture being encoded a better reference for other pictures.
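As an illustration of why prediction between rows saves bits, the following is a minimal sketch of a median-style motion vector predictor of the kind used in H.264 in the common case. The function names and the tuple representation of motion vectors are assumptions made for illustration; neighbor availability rules and partition-specific special cases are deliberately ignored.

```python
def median_mv_predictor(left, above, above_right):
    """Component-wise median of three neighboring motion vectors.

    Each argument is an (x, y) tuple. The 'above' and 'above_right'
    neighbors lie in the previous macroblock (pair) row, so they are
    only usable when that row belongs to the same slice.
    """
    neighbors = (left, above, above_right)
    xs = sorted(v[0] for v in neighbors)
    ys = sorted(v[1] for v in neighbors)
    return (xs[1], ys[1])


def mv_residual(mv, predictor):
    """Difference actually written to the bit stream (hypothetical helper)."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


pred = median_mv_predictor((4, 0), (5, 1), (6, -1))
residual = mv_residual((5, 1), pred)
```

When neighboring motion is similar, the residual is small and therefore cheaper to entropy code than the full motion vector.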
For H.264, VC-1, MPEG-2 and MPEG-4 processing, the technique of motion vector prediction between macroblock (pair) rows cannot be used across slice boundaries. For H.264 and VC-1, the technique of statistical context between macroblock (pair) rows cannot be used across slice boundaries. For H.264, VC-1 and MPEG-4, the technique of intra prediction or AC prediction across macroblock (pair) rows cannot be used across slice boundaries. Therefore, in order to get the advantage of the above techniques, multiple macroblock (pair) rows need to be coded within the same slice. For VC-1, deblocking cannot be used across slice boundaries, whereas for H.264 deblocking may be performed across slice boundaries.
Referring to FIG. 1, a block diagram illustrating a conventional parallel encoder 10 is shown. The parallel encoder 10 includes a picture slicer 12, an arbitrary number of encoder chips 14a-14n, and a multiplexer 16. The picture slicer 12 divides a video picture in the video signal VIDEO into a number of slices or strips (i.e., slice 0 through slice n) equal to the number of chips. Each of the encoder chips 14a-14n encodes a corresponding one of the slices, slice 0 through slice n. The multiplexer 16 combines the encoded slices to produce a compressed bit stream.
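The division performed by the picture slicer can be sketched as follows. This is only an illustrative sketch: the function name, the use of macroblock (pair) row indices, and the policy of giving any leftover rows to the topmost strips are assumptions, not details taken from the conventional encoder described above.

```python
def split_into_strips(num_mb_rows, num_chips):
    """Divide macroblock (pair) rows into one horizontal strip per chip.

    Returns a list of (first_row, end_row) tuples, with end_row exclusive.
    Leftover rows are assigned to the topmost strips (assumed policy).
    """
    base, extra = divmod(num_mb_rows, num_chips)
    strips = []
    start = 0
    for chip in range(num_chips):
        height = base + (1 if chip < extra else 0)
        strips.append((start, start + height))
        start += height
    return strips


# Example: 68 macroblock rows (a 1088-line coded picture with 16-line
# macroblocks) divided among five encoder chips.
strips = split_into_strips(68, 5)
```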
Referring to FIG. 2, a block diagram is shown illustrating a conventional parallel encoding scheme using a conventional parallel encoder with five chips. Each picture is broken up into five pieces (i.e., horizontal strips or slices) that are the full width of the picture. During each picture time, each horizontal strip is encoded in normal (raster) order by the corresponding encoder chip. The strips are encoded in parallel (i.e., at the same time). Since the chips are encoding in parallel, the bottom of the area encoded by chip i is not available when chip i+1 starts encoding. Therefore, the slices cannot span chip boundaries and deblocking cannot be performed across chip boundaries.
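The limitation described above can be made concrete with a short sketch that lists the macroblock (pair) rows whose upper neighbor row is not yet encoded when the chips start in parallel. The function name and the (first_row, end_row) strip representation are hypothetical and chosen only for illustration.

```python
def rows_without_upper_context(strips):
    """Rows that cannot use the row above when all strips start together.

    'strips' is a list of (first_row, end_row) tuples in top-to-bottom
    order, with end_row exclusive. The first row of every strip except
    the topmost one depends on the last row of the strip above, which
    chip i has not yet produced when chip i+1 begins encoding.
    """
    return [first for (first, end) in strips[1:] if end > first]


# Five strips covering 68 macroblock rows (assumed example layout).
example_strips = [(0, 14), (14, 28), (28, 42), (42, 55), (55, 68)]
blocked_rows = rows_without_upper_context(example_strips)
```

Each row returned marks a forced slice boundary, so inter-row prediction, statistical context and deblocking are lost at that row in the conventional scheme.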
It would be desirable to implement a method and/or apparatus to provide an efficient parallel video encoder that allows whole pictures to be deblocked and/or compressed as a single slice.