The invention relates to data processing systems and methods, and in particular to video encoding systems and methods.
Commonly used video encoding methods are based on MPEG (Moving Picture Experts Group) standards such as MPEG-2, MPEG-4 (MPEG-4 Part 2), or H.264 (MPEG-4 Part 10). Such encoding methods typically employ three types of frames: I- (intra), P- (predicted), and B- (bidirectional) frames. An I-frame is encoded spatially using data only from that frame (intra-coded). P- and B-frames are encoded using data from the current frame and other frames (inter-coded). Inter-coding involves encoding differences between frames, rather than the full data of each frame, in order to take advantage of the similarity of neighboring frames in typical video sequences. A P-frame employs data from one other frame, often a preceding frame in display order. A B-frame employs data from two other frames, which may be preceding and/or subsequent frames. Frames used as a reference in encoding other frames are commonly termed anchor frames. In methods using the MPEG-2 standard, I- and P-frames can serve as anchor frames. In methods using the H.264 standard, I-, P-, and B-frames can serve as anchor frames, and each macroblock in a frame may be predicted from a corresponding macroblock in any one of a number (e.g. 16) of anchor frames, and/or from another macroblock in the same frame. Different macroblocks in a frame may be encoded with reference to macroblocks in different anchor frames.
Inter-coded (P- and B-) frames may include both intra-coded and inter-coded blocks. For any given block in an inter-coded frame, the encoder may calculate the bit cost of encoding the block as an intra-coded block and as an inter-coded block. In some instances, for example in parts of fast-changing video sequences, inter-encoding may not provide encoding cost savings for some blocks, and such blocks can be intra-encoded. If inter-encoding provides the desired encoding cost savings for a block, the block is inter-encoded.
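The per-block choice between intra- and inter-coding described above can be illustrated with a minimal sketch. The function name `choose_block_mode` and the `min_savings_bits` threshold are hypothetical and introduced only for illustration; the bit costs are assumed to have been estimated elsewhere by the encoder.

```python
def choose_block_mode(intra_bits, inter_bits, min_savings_bits=1):
    """Pick a coding mode for one block of an inter-coded frame.

    intra_bits / inter_bits: estimated bit costs of each mode (assumed inputs).
    min_savings_bits: hypothetical threshold for the "desired" savings.
    """
    savings = intra_bits - inter_bits
    # Inter-code only when it saves at least the desired number of bits;
    # otherwise fall back to intra-coding the block.
    return "inter" if savings >= min_savings_bits else "intra"


if __name__ == "__main__":
    print(choose_block_mode(intra_bits=120, inter_bits=80))
    print(choose_block_mode(intra_bits=120, inter_bits=130))
```

A practical encoder would typically weigh distortion as well as bit cost; this sketch considers bit cost alone, as in the paragraph above.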
Each frame is typically divided into multiple non-overlapping rectangular blocks. Blocks of 16×16 pixels are commonly termed macroblocks. Other block sizes used in encoders using the H.264 standard include 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. For each block in a frame, an encoder may search for a corresponding, similar block in that frame's anchor frames or in the frame itself. If a sufficiently similar block is not found, the current block is intra-coded. If a similar block is found, the encoder stores residual data representing differences between the current block and the similar block, as well as motion vectors identifying the difference in position between the blocks. The difference data is converted to the frequency domain using a transform such as a discrete cosine transform (DCT). The resulting frequency-domain data is quantized and variable-length (entropy) coded before storage or transmission.
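The block search and residual computation described above can be sketched as follows. This is an illustrative exhaustive search using the sum of absolute differences (SAD) as the similarity measure; the SAD metric, the function names, the 4×4 block size, and the small search range are assumptions chosen for brevity, not details taken from any particular standard or encoder. Frames are represented as lists of rows of luma samples.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def motion_search(cur, ref, cx, cy, size=4, search_range=2):
    """Exhaustively search ref around (cx, cy); return (dx, dy, best_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def residual(cur, ref, cx, cy, mv, size=4):
    """Difference between the current block and its motion-compensated match."""
    dx, dy = mv
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(size)]
            for j in range(size)]


if __name__ == "__main__":
    # Synthetic 12x12 reference frame; the current frame is the same content
    # shifted so that cur[y][x] == ref[y - 1][x + 2] wherever that is defined.
    ref = [[x + 2 * y + x * y for x in range(12)] for y in range(12)]
    cur = [[ref[y - 1][x + 2] if y >= 1 and x + 2 < 12 else 0
            for x in range(12)] for y in range(12)]
    dx, dy, cost = motion_search(cur, ref, cx=4, cy=4)
    print("motion vector:", (dx, dy), "SAD:", cost)
    print("residual:", residual(cur, ref, 4, 4, (dx, dy)))
```

In this sketch the motion vector is the offset from the current block's position to the matching block in the reference frame. An encoder would compare the best SAD against some threshold to decide whether the match is "sufficiently similar"; that threshold is left out here because the text does not specify one.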
Quantizing the data involves reducing the precision used to represent various frequency coefficients, usually through division and rounding operations. Quantization can be used to exploit the human visual system's different sensitivities to different frequencies by representing coefficients for different frequencies with different precisions. Quantization is generally lossy and irreversible. A quantization scale factor or quantization parameter QP can be used to control system bitrates as the visual complexity of the encoded images varies. Such bitrate control can be used to maintain buffer fullness within desired limits, for example. The quantization parameter is used to scale a quantization table, and thus the quantization precision. Higher quantization precisions lead to locally increased bitrates, and lower quantization precisions lead to decreased bitrates.
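The division-and-rounding operation and the effect of the quantization scale can be illustrated with a minimal sketch. The linear multiplier `qscale`, the 2×2 table, and the coefficient values are hypothetical simplifications for illustration; actual standards define their own tables and their own mapping from the quantization parameter to the step size.

```python
def round_half_away(x):
    """Round to the nearest integer, with ties rounded away from zero."""
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def quantize(coeffs, table, qscale):
    """Divide each coefficient by its scaled table entry and round."""
    return [[round_half_away(c / (q * qscale)) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels, table, qscale):
    """Approximate reconstruction; the rounding error is not recoverable."""
    return [[lv * q * qscale for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]


if __name__ == "__main__":
    # Hypothetical frequency coefficients, low frequencies at the top left.
    coeffs = [[100, -37], [12, 3]]
    # Hypothetical table: coarser steps (lower precision) at higher frequencies.
    table = [[8, 16], [16, 32]]
    for qscale in (1, 2):
        levels = quantize(coeffs, table, qscale)
        print(qscale, levels, dequantize(levels, table, qscale))
```

With these inputs, `qscale = 1` yields levels `[[13, -2], [1, 0]]` and `qscale = 2` yields `[[6, -1], [0, 0]]`. The larger scale produces smaller and more numerous zero levels, which cost fewer bits after entropy coding, while the reconstructed values (`[[104, -32], [16, 0]]` and `[[96, -32], [0, 0]]`, respectively) deviate further from the original coefficients.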
Determining a quantization parameter for each block to be encoded can be a computationally intensive process. The choice of quantization parameters affects both system bitrates and distortion, and optimizing quantization parameter choices to achieve desired bitrates and desired distortion characteristics simultaneously may require computationally complex steps. Such computational complexity may be of particular concern in systems subject to power limitations, such as mobile video devices.