The invention relates to video data processing systems and methods, and in particular to video coding (encoding and/or decoding) systems and methods.
Commonly used video encoding methods are based on MPEG (Moving Picture Experts Group) standards such as MPEG-2, MPEG-4 (MPEG-4 Part 2) or H.264 (MPEG-4 Part 10, or AVC). Such encoding methods typically employ three types of frames: I- (intra), P- (predicted), and B- (bidirectional) frames. An I-frame is encoded spatially using data only from that frame (intra-coded). P- and B-frames are encoded using data from the current frame and other frames (inter-coded). Inter-coding involves encoding differences between frames, rather than the full data of each frame, in order to take advantage of the similarity of neighboring frames in typical video sequences. A P-frame employs data from one or more preceding frames in display order. A B-frame employs data from preceding and/or subsequent frames. Frames used as a reference in encoding other frames are commonly termed anchor or reference frames. In methods using the MPEG-2 standard, I- and P-frames can serve as anchor frames. In methods using the H.264 standard, I-, P-, and B-frames can serve as anchor frames. In methods using the H.264 standard, each macroblock in a frame may be predicted from a corresponding macroblock in any one of a number (e.g. 16) of anchor frames, and/or from another macroblock in the same frame. Different macroblocks in a frame may be encoded with reference to macroblocks in different anchor frames.
Inter-coded (P- and B-) frames may include both intra-coded and inter-coded blocks. For any given block of an inter-coded frame, the encoder may calculate the bit cost of encoding the block both as an intra-coded block and as an inter-coded block. In some instances, for example in parts of fast-changing video sequences, inter-encoding may not provide encoding cost savings for some blocks, and such blocks can be intra-encoded. If inter-encoding provides the desired encoding cost savings for a block, the block is inter-encoded.
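The per-block mode decision described above can be sketched as follows. This is a minimal illustration, assuming the encoder has already estimated a bit cost for each mode. The function names and the tie-breaking rule (ties go to intra) are hypothetical and are not taken from any standard.

```python
def choose_block_mode(intra_bits, inter_bits):
    """Pick the coding mode for one block from two estimated bit costs.

    Assumption: inter-coding is selected only when it is strictly cheaper;
    otherwise the block is intra-coded.
    """
    return "inter" if inter_bits < intra_bits else "intra"


def choose_frame_modes(block_costs):
    """Apply the decision to every block of an inter-coded frame.

    block_costs is a list of (intra_bits, inter_bits) pairs, one per block.
    """
    return [choose_block_mode(intra, inter) for intra, inter in block_costs]


if __name__ == "__main__":
    # Hypothetical costs: the second block lies in a fast-changing region,
    # so inter-coding offers no saving there.
    print(choose_frame_modes([(900, 240), (850, 1100), (400, 400)]))
```

A practical encoder would typically weigh distortion as well as bit cost; the sketch considers bit cost alone, as the paragraph above does.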
Each frame is typically divided into multiple non-overlapping rectangular blocks. Blocks of 16×16 pixels are commonly termed macroblocks. Other block sizes used in encoders using the H.264 standard include 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. For each block in a frame, an encoder may search for a corresponding, similar block in that frame's anchor frames or in the frame itself. If a sufficiently similar block is not found, the current block is encoded non-predictively, without reference to external data. If a similar block is found, the MPEG encoder stores residual data representing differences between the current block and the similar block, as well as motion vectors identifying the difference in position between the blocks. The residual data is converted to the frequency domain using a transform such as a discrete cosine transform (DCT). The resulting frequency-domain data is quantized and variable-length (entropy) coded before storage/transmission.
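The search for a similar block, and the resulting motion vector and residual, can be illustrated with a short sketch. It assumes an exhaustive (full) search over a small window and uses the sum of absolute differences (SAD) as the similarity measure; both choices, along with all function and parameter names, are illustrative assumptions rather than requirements of any standard. Frames are represented as lists of rows of integer sample values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur, ref, top, left, size, search_range):
    """Full search for the best match of a current-frame block in ref.

    Returns (motion_vector, cost, residual). The motion vector (dy, dx) is
    the displacement from the current block position to the matching block
    in the reference frame (a convention assumed for this sketch).
    """
    block = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            candidate = get_block(ref, y, x, size)
            cost = sad(block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    cost, mv, candidate = best
    # Residual: sample-wise difference between the block and its prediction.
    residual = [[c - p for c, p in zip(row_c, row_p)]
                for row_c, row_p in zip(block, candidate)]
    return mv, cost, residual
```

An encoder following the description above could compare the returned cost against a threshold to decide whether the match is sufficiently similar; the residual would then be transformed, quantized, and entropy coded.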
Quantizing the data involves reducing the precision used to represent various frequency coefficients, usually through division and rounding operations. Quantization can be used to exploit the human visual system's different sensitivities to different frequencies by representing coefficients for different frequencies with different precisions. Quantization is generally lossy and irreversible. A quantization scale factor or quantization parameter QP can be used to control system bitrates as the visual complexity of the encoded images varies. Such bitrate control can be used to maintain buffer fullness within desired limits, for example. The quantization parameter is used to scale a quantization table, and thus the quantization precision. Higher quantization precisions lead to locally increased bitrates, and lower quantization precisions lead to decreased bitrates.
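The division-and-rounding operation and the effect of the quantization parameter can be illustrated with a small sketch. It assumes a simplified linear scaling in which each table entry is multiplied by QP; actual standards define their own, more detailed arithmetic. The table values and function names are hypothetical.

```python
def quantize(coeffs, table, qp):
    """Quantize frequency coefficients by division and rounding.

    Each coefficient is divided by its table entry scaled by qp
    (a simplified linear scaling assumed for this sketch).
    """
    return [[round(c / (q * qp)) for c, q in zip(row_c, row_q)]
            for row_c, row_q in zip(coeffs, table)]


def dequantize(levels, table, qp):
    """Reconstruct approximate coefficients from quantized levels."""
    return [[level * q * qp for level, q in zip(row_l, row_q)]
            for row_l, row_q in zip(levels, table)]


if __name__ == "__main__":
    coeffs = [[90, 37], [-21, 4]]   # hypothetical transform coefficients
    table = [[8, 16], [16, 32]]     # coarser steps for higher frequencies
    for qp in (1, 2):
        levels = quantize(coeffs, table, qp)
        print(qp, levels, dequantize(levels, table, qp))
```

In this sketch, a larger QP yields smaller quantized levels, which generally cost fewer bits to entropy code, and the reconstructed coefficients differ from the originals, reflecting the lossy nature of quantization.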
Designers of video encoding/decoding systems normally balance multiple constraints, including system bandwidth, channel error rates, distortion/image quality, various syntactical and other constraints imposed by video encoding standards, and/or processing and power resources required on the encoder and decoder sides. Video decoders, typically used for playback, tend to be used in higher numbers than encoders, which are used for recording or other video encoding. Moreover, playback devices having video decoders are often of lower cost than recording or other devices including encoders. Video encoders tend to be more complex and costly than video decoders, and the computing resources available on the decoder side are often scarcer than those available on the encoder side. As a result, system designers often try particularly hard to minimize the processing resources required by video decoders. At the same time, emerging applications including mobile wireless video devices pose new challenges to system designers attempting to maximize perceived image quality in environments with limited bandwidth and available processing power.
Some video encoding/decoding systems allow the encoder to skip transmission of certain data, which is to be recovered by the decoder. For example, in the article “Geometric-Structure-Based Error Concealment with Novel Applications in Block-Based Low-Bit-Rate Coding,” IEEE Transactions on Circuits and Systems for Video Technology, 9(4):648-665, June 1999, Zeng et al. describe a system in which the encoder intentionally skips transmission of certain macroblocks to the decoder. The decoder uses a pre-set spatial directional interpolation scheme to recover the data of a skipped macroblock using data from neighboring macroblocks.