The invention relates to data processing systems and methods, and in particular to video encoding systems and methods.
Commonly used video encoding methods are based on MPEG (Moving Picture Experts Group) standards such as MPEG-2, MPEG-4 (MPEG-4 Part 2) or H.264 (MPEG-4 Part 10). Such encoding methods typically employ three types of frames: I- (intra), P- (predicted), and B- (bidirectional) frames. An I-frame is encoded spatially using data only from that frame (intra-coded). P- and B-frames are encoded using data from the current frame and other frames (inter-coded). Inter-coding involves encoding differences between frames, rather than the full data of each frame, in order to take advantage of the similarity of neighboring frames in typical video sequences. A P-frame employs data from one other frame, often a preceding frame in display order. A B-frame employs data from two other frames, which may be preceding and/or subsequent frames. Frames used as a reference in encoding other frames are commonly termed anchor frames. In methods using the MPEG-2 standard, I- and P-frames can serve as anchor frames. In methods using the H.264 standard, I-, P-, and B-frames can serve as anchor frames.
Each frame is typically divided into multiple non-overlapping rectangular blocks. Blocks of 16×16 pixels are commonly termed macroblocks. Other block sizes used in encoders using the H.264 standard include 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels. For each block in an inter-frame, a typical MPEG encoder searches for a corresponding, similar block in that inter-frame's anchor frame(s). If a sufficiently similar block is not found in the anchor frames, then the current block is intra-coded. If a similar block is found, the MPEG encoder stores residual data representing differences between the current block and the similar block in the anchor frame, as well as motion vectors identifying the difference in position between the blocks. The difference data is converted to the frequency domain using a transform such as a discrete cosine transform (DCT). The resulting frequency-domain data is quantized and variable-length (entropy) coded before storage/transmission.
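The block search and residual computation described above can be sketched as follows. This is a minimal illustration, not the method of any particular encoder: it assumes an exhaustive (full) search over a small window and uses the sum of absolute differences (SAD) as the similarity measure. The function names `sad`, `best_match`, and `residual`, and the representation of frames as lists of pixel rows, are hypothetical choices made for the example.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def best_match(cur, ref, bx, by, n, search_range):
    """Exhaustively search `ref` within +/- search_range pixels of (bx, by).

    Returns (cost, dx, dy) for the lowest-SAD candidate, where (dx, dy)
    is the motion vector, or None if no candidate fits inside the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the anchor frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def residual(cur, ref, bx, by, dx, dy, n):
    """Per-pixel difference between the current block and its match."""
    return [
        [cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i] for i in range(n)]
        for j in range(n)
    ]
```

In a real encoder the residual would then be transformed, quantized, and entropy coded, and a block whose best SAD exceeds some threshold would be intra-coded instead; both steps are omitted here.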
Quantizing the data involves reducing the precision used to represent various frequency coefficients, usually through division and rounding operations. Quantization can be used to exploit the human visual system's different sensitivities to different frequencies by representing coefficients for different frequencies with different precisions. Quantization is generally lossy and irreversible. A quantization scale factor MQuant or quantization parameter Q can be used to control system bitrates as the visual complexity of the encoded images varies. Such bitrate control can be used to maintain buffer fullness within desired limits, for example. The quantization parameter is used to scale a quantization table, and thus the quantization precision. Higher quantization precisions lead to locally increased bitrates, and lower quantization precisions lead to decreased bitrates.
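As a rough illustration of quantization by division and rounding, the sketch below scales a quantization table by a single scale factor and applies it to a small block of frequency coefficients. The table values, the 2×2 block size, and the names `quantize`, `dequantize`, and `round_half_away` are assumptions made for the example; actual standards define their own tables and rounding rules.

```python
def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def quantize(coeffs, table, scale):
    """Divide each coefficient by its scaled table entry and round.

    A larger `scale` means a coarser step size (lower precision).
    """
    return [
        [round_half_away(c / (q * scale)) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(levels, table, scale):
    """Reconstruct approximate coefficients from quantized levels."""
    return [
        [lv * q * scale for lv, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, table)
    ]


# Hypothetical 2x2 block; coarser steps are used for higher frequencies.
coeffs = [[100, 37], [-21, 5]]
table = [[8, 16], [16, 32]]

fine = quantize(coeffs, table, scale=1)    # [[13, 2], [-1, 0]]
coarse = quantize(coeffs, table, scale=2)  # [[6, 1], [-1, 0]]
restored = dequantize(fine, table, scale=1)  # [[104, 32], [-16, 0]]
```

The reconstructed block differs from the original coefficients, which is the lossy, irreversible behavior noted above. Doubling the scale factor yields smaller levels, which generally cost fewer bits after entropy coding.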
MPEG frames are typically organized in groups-of-pictures (GOPs). A GOP includes at least one I-frame, which is normally the first frame in the GOP. A closed GOP is one in which all predictions take place within the GOP; inter-frames do not use data from frames outside the GOP. Some MPEG applications may also use an open GOP structure, such as I-B-I-B-I- . . . . A closed GOP structure facilitates separating a bit stream into independently-decodable discrete parts.
FIG. 1 illustrates an exemplary frame sequence 20, shown in display order, including a closed GOP 22 and an immediately subsequent I-frame 24. GOP 22 includes an I-frame 26a and a number of subsequent B- and P-frames. A first P-frame 26c is inter-coded with reference to I-frame 26a, while a second P-frame 26d is inter-coded with reference to first P-frame 26c. A B-frame 26b is inter-coded with reference to I-frame 26a and P-frame 26c. 
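The closed-GOP property can be checked mechanically: every reference made by a frame in the GOP must point to another frame in the same GOP. The sketch below applies that check to the reference pattern described for FIG. 1, using the figure's reference numerals as frame labels. The function name `is_closed_gop`, the dictionary layout, and the extra frame labeled `"b_last"` are hypothetical and serve only to illustrate the idea.

```python
def is_closed_gop(gop_frames, references):
    """Return True if no frame in the GOP references a frame outside it.

    gop_frames: iterable of frame labels belonging to the GOP.
    references: dict mapping a frame label to the labels it is coded from.
    """
    members = set(gop_frames)
    return all(
        ref in members
        for frame in members
        for ref in references.get(frame, [])
    )


# Reference pattern of the frames named in the description of FIG. 1.
references = {
    "26a": [],              # I-frame: intra-coded, no references
    "26b": ["26a", "26c"],  # B-frame
    "26c": ["26a"],         # first P-frame
    "26d": ["26c"],         # second P-frame
}
gop_22 = ["26a", "26b", "26c", "26d"]
closed = is_closed_gop(gop_22, references)  # True

# Hypothetical variation: a trailing B-frame that also uses the
# following I-frame 24 would make the GOP open.
open_refs = dict(references, b_last=["26d", "24"])
still_closed = is_closed_gop(gop_22 + ["b_last"], open_refs)  # False
```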
Inter-coded (P- and B-) frames may include both intra-coded and inter-coded blocks. For any given inter-frame block, the encoder can calculate the bit cost of encoding the block as an intra-coded block or as an inter-coded block. In some instances, for example in parts of fast-changing video sequences, inter-encoding may not provide encoding cost savings for some blocks, and such blocks can be intra-encoded. If inter-encoding provides desired encoding cost savings for a block, the block is inter-encoded.
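The per-block choice between intra- and inter-encoding can be sketched as a simple cost comparison. The example assumes the bit costs have already been estimated by some other means; the function names and the `min_savings` threshold, which stands in for the "desired" savings mentioned above, are hypothetical.

```python
def choose_block_mode(intra_bits, inter_bits, min_savings=0):
    """Pick 'inter' only if it saves more than `min_savings` bits."""
    return "inter" if intra_bits - inter_bits > min_savings else "intra"


def choose_frame_modes(block_costs, min_savings=0):
    """Apply the per-block decision to (intra_bits, inter_bits) pairs."""
    return [
        choose_block_mode(intra, inter, min_savings)
        for intra, inter in block_costs
    ]


# Hypothetical costs for three blocks of one inter-frame.
modes = choose_frame_modes([(900, 300), (400, 450), (500, 500)])
# modes == ["inter", "intra", "intra"]
```

Practical encoders typically weigh distortion as well as bit cost when making this decision; that refinement is left out of the sketch.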
Each inter-encoded block in a P-frame may be encoded with reference to a block in a preceding or subsequent frame, in display (temporal) order. Each inter-encoded block in a B-frame may be encoded with reference to one or two other frames. The reference frames can be before and/or after the frame to be encoded, in display order. If two reference frames are used, several temporal order combinations are possible: one past and one future reference frame, two past reference frames, and two future reference frames.
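The three temporal combinations for a block with two reference frames can be enumerated with a short helper. Frames are identified here by their display-order index; the function name `classify_b_references` is a hypothetical choice for this sketch.

```python
def classify_b_references(current, ref_a, ref_b):
    """Classify two reference frames relative to the current frame.

    All arguments are display-order indices. A reference equal to
    `current` is rejected, since a frame cannot reference itself.
    """
    refs = (ref_a, ref_b)
    if current in refs:
        raise ValueError("a frame cannot be its own reference")
    past = sum(1 for r in refs if r < current)
    if past == 2:
        return "two past"
    if past == 1:
        return "one past, one future"
    return "two future"
```

For example, a frame at display index 5 coded from frames 4 and 6 falls in the "one past, one future" case, while the same frame coded from frames 2 and 4 falls in the "two past" case.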
While MPEG-based video coding methods can yield remarkably accurate representations of motion pictures, the human visual system can sometimes detect imperfections or artifacts in MPEG-encoded video sequences. Such artifacts can become particularly noticeable at relatively low system bandwidths.