The invention relates to video data processing systems and methods, and in particular to video coding (encoding/decoding) systems and methods.
Commonly-used video coding methods are based on MPEG (Moving Picture Experts Group) standards such as MPEG-2, MPEG-4 (MPEG-4 Part 2) or H.264 (MPEG-4 Part 10). Such coding methods typically employ three types of frames: I- (intra), P- (predicted), and B- (bidirectional) frames. An I-frame is encoded spatially using data only from that frame (intra-coded). P- and B-frames are encoded using data from the current frame and/or other frames (inter-coded). Inter-encoding involves encoding differences between frames, rather than the full data of each frame, in order to take advantage of the similarity of spatially and/or temporally proximal areas in typical video sequences. Some encoding methods also use intra-frame predictions to encode data differentially with respect to prediction data from the same frame.
Each frame is typically divided into multiple non-overlapping rectangular blocks. Blocks of 16×16 pixels are commonly termed macroblocks. Other block sizes supported by the H.264 standard include 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. To encode a block predictively, a typical MPEG encoder searches for a similar block (a prediction) in one or more reference frames. If a similar block is found, the MPEG encoder stores residual data representing differences between the current block and the similar block, as well as one or more motion vectors identifying the difference in position between the blocks. The residual data is converted to the frequency domain using a transform such as a discrete cosine transform (DCT). The resulting frequency-domain data is quantized and variable-length (entropy) coded before storage/transmission. During decoding, the data of a block of interest is generated by summing decoded residual and prediction data.
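The predictive encoding and decoding steps described above may be illustrated by the following minimal sketch. It assumes an exhaustive (full) search over a small displacement range and a sum-of-absolute-differences (SAD) matching cost; neither choice is mandated by the standards named above. The function names and the parameters `n` and `search_range` are hypothetical, and the transform, quantization, and entropy coding stages are omitted.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def find_prediction(cur, ref, bx, by, n, search_range):
    """Return the motion vector (dx, dy) of the best-matching block in `ref`.

    Every displacement within +/- search_range samples is tried (full
    search); candidates extending past the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def compute_residual(cur, ref, bx, by, mv, n):
    """Residual = current block minus the prediction block selected by mv."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(n)] for j in range(n)]


def reconstruct_block(ref, bx, by, mv, residual, n):
    """Decoder side: sum the prediction data and the decoded residual."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] + residual[j][i]
             for i in range(n)] for j in range(n)]
```

In this sketch the residual is kept lossless, so `reconstruct_block` returns exactly the samples of the current block; in an actual encoder the residual would pass through the transform and quantization stages, and the reconstruction would generally be approximate.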
A video sequence may be encoded as a series of complete frames (progressive sampling), or as a sequence of interlaced fields (interlaced sampling). An interlaced field includes either the odd-numbered or the even-numbered lines in a frame. A video encoder may encode macroblocks in a frame DCT mode, in which each block is frequency-transformed as a whole, or in a field DCT mode, in which the luminance samples from field 1 are placed in the top half of the macroblock and the samples from field 2 are placed in the bottom half of the macroblock before the frequency-domain transform. In a field motion compensation mode, the data of the two fields in a macroblock may be motion-compensated separately; in such a mode, each macroblock has two associated motion compensation vectors, one for each field. The type of encoding (field or frame) may be specified for each frame or slice. In a macroblock-adaptive frame/field (MBAFF) encoding mode, the type of encoding (field or frame) is specified at the macroblock level. In MBAFF encoding, data may be encoded using vertically-stacked macroblock pairs, 16×32 pixels each. Each macroblock pair may be encoded in a frame mode, with the two macroblocks in the pair encoded separately, or in a field mode, with the 16×16 field 1 of the macroblock pair and the 16×16 field 2 of the macroblock pair encoded separately.
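The field/frame line arrangements described above may be illustrated by the following minimal sketch. It assumes that field 1 holds the lines at even zero-based indices (0, 2, 4, ...) and field 2 the remaining lines; the text above does not fix this assignment. A macroblock or macroblock pair is represented simply as a list of rows, and all function names are hypothetical.

```python
def to_field_order(mb_rows):
    """Field DCT arrangement: field 1 lines in the top half of the
    macroblock, field 2 lines in the bottom half.

    Assumes field 1 = rows at even zero-based indices (an assumption)."""
    return mb_rows[0::2] + mb_rows[1::2]


def to_frame_order(field_rows):
    """Inverse of to_field_order: re-interleave the two fields."""
    half = len(field_rows) // 2
    frame_rows = []
    for top, bottom in zip(field_rows[:half], field_rows[half:]):
        frame_rows.append(top)
        frame_rows.append(bottom)
    return frame_rows


def split_macroblock_pair(pair_rows, field_mode):
    """Split a 16x32 MBAFF macroblock pair (32 rows) into two 16x16 units.

    Frame mode: upper macroblock and lower macroblock.
    Field mode: field 1 lines and field 2 lines of the whole pair."""
    if field_mode:
        return pair_rows[0::2], pair_rows[1::2]
    half = len(pair_rows) // 2
    return pair_rows[:half], pair_rows[half:]
```

Under these assumptions, either mode yields two 16-row units from a 32-row macroblock pair; only the assignment of lines to the two units differs.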
Searching for a prediction for a current macroblock is commonly performed in a search window, which is a sub-area of a reference frame. A search window may be a horizontal stripe or band vertically centered about the current macroblock position, and may include tens or hundreds of macroblocks. Reading prediction data from memory and writing it to memory may therefore require relatively high memory bandwidth.
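The size of such a search window may be estimated with the following minimal sketch. It assumes a rectangular window measured in whole macroblocks, clipped to the frame boundaries, and 8-bit luminance samples (256 bytes per macroblock). The window half-width and half-height parameters and the function names are hypothetical, and are not taken from any particular encoder.

```python
MB_SIZE = 16  # macroblock width and height, in pixels


def search_window(mb_x, mb_y, frame_w_mbs, frame_h_mbs, half_w_mbs, half_h_mbs):
    """Return (x0, y0, x1, y1), the inclusive macroblock coordinates of a
    band centered on macroblock (mb_x, mb_y) and clipped to the frame."""
    x0 = max(0, mb_x - half_w_mbs)
    x1 = min(frame_w_mbs - 1, mb_x + half_w_mbs)
    y0 = max(0, mb_y - half_h_mbs)
    y1 = min(frame_h_mbs - 1, mb_y + half_h_mbs)
    return x0, y0, x1, y1


def window_macroblocks(window):
    """Number of macroblocks covered by the window."""
    x0, y0, x1, y1 = window
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def window_luma_bytes(window, bytes_per_sample=1):
    """Luminance bytes in the window (8-bit samples assumed by default)."""
    return window_macroblocks(window) * MB_SIZE * MB_SIZE * bytes_per_sample
```

For example, under these assumptions a window extending 8 macroblocks to each side and 2 macroblocks above and below a macroblock in the interior of a frame covers 17 × 5 = 85 macroblocks, or 21,760 bytes of luminance data. Fetching a window of this size anew for each macroblock of a frame would multiply that figure by the number of macroblocks in the frame, which suggests why memory bandwidth is a concern.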