The video coding standard H.264/MPEG-4 AVC is intended to achieve much higher coding efficiency than previous video coding standards, at the expense of increased encoding complexity. Macroblock adaptive frame/field (MBAFF) coding is one of the distinguishing features of this standard. MBAFF allows either the frame or the field structure to be chosen for each macroblock pair, and it improves coding efficiency, especially for interlaced video sequences. However, a straightforward implementation of MBAFF, which finds the optimal macroblock prediction modes with both structures and only then selects the better structure, doubles the computational complexity of encoding. The present invention accelerates this decision. First, it infers the more suitable structure from the sum of absolute differences (SAD) between the picture samples and their mean. Second, it exploits the observed correlation between the optimal macroblock prediction modes found with the inferred structure and the probability that the non-inferred structure is actually better than the inferred one. The present invention can significantly reduce the computational complexity at the cost of a slight degradation in coding efficiency.
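The SAD-from-mean measure mentioned above can be illustrated with a minimal sketch. It assumes a macroblock pair is given as 32 rows of 16 luma samples, that the field-structure macroblocks are the even and odd lines of the pair, and, as a hypothetical decision rule not stated in this text, that the structure whose two macroblocks have the smaller total SAD is inferred. The names `sad_from_mean`, `split_pair` and `infer_structure` are illustrative only.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of luma samples


def sad_from_mean(block: Block) -> float:
    """Sum of absolute differences between every sample and the block mean."""
    samples = [s for row in block for s in row]
    mean = sum(samples) / len(samples)
    return sum(abs(s - mean) for s in samples)


def split_pair(pair: Block, field: bool) -> Tuple[Block, Block]:
    """Split a macroblock pair (32 rows of 16 samples) into top and bottom MBs."""
    if len(pair) != 32:
        raise ValueError("a macroblock pair has 32 rows")
    if field:
        # Field structure: top MB = even lines (top field), bottom MB = odd lines.
        return pair[0::2], pair[1::2]
    # Frame structure: top MB = upper 16 lines, bottom MB = lower 16 lines.
    return pair[:16], pair[16:]


def infer_structure(pair: Block) -> str:
    """Hypothetical rule: infer the structure whose two MBs have the smaller total SAD."""
    frame_sad = sum(sad_from_mean(mb) for mb in split_pair(pair, field=False))
    field_sad = sum(sad_from_mean(mb) for mb in split_pair(pair, field=True))
    # Assumption: a tie is resolved in favor of the frame structure.
    return "field" if field_sad < frame_sad else "frame"
```

For a pair whose even and odd lines differ strongly (typical "combing" from motion in interlaced content), each field macroblock is nearly uniform, so the field structure yields the smaller SAD; for a pair whose upper and lower halves differ but whose lines are otherwise consistent, the frame structure wins.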
The H.264/MPEG-4 AVC standard supports both interlaced and progressive video sequences. An interlaced video sequence consists of fields, each sampled at half the vertical resolution of a frame. Two temporally consecutive fields are interleaved line by line (i.e., horizontal line by horizontal line) to form a frame. To support interlaced video sequences, the standard allows either the frame or the field structure to be chosen on a frame-by-frame basis. Encoding progressive video sequences with the frame structure and interlaced video sequences with the field structure generally maximizes coding efficiency. Recent video coding standards commonly divide each picture into square macroblocks (MBs) of 16×16 pixels and code the picture macroblock by macroblock.
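The line-by-line interleaving of two fields into a frame can be sketched as follows. This assumes pictures are lists of sample rows and that the top field occupies the even-numbered lines (0, 2, 4, ...) of the frame; which field comes first in time is not modeled. The function names are hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]  # rows of samples


def weave(top_field: Picture, bottom_field: Picture) -> Picture:
    """Interleave two fields line by line: top field -> even lines, bottom -> odd."""
    if len(top_field) != len(bottom_field):
        raise ValueError("fields must have the same number of lines")
    frame: Picture = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


def separate(frame: Picture) -> Tuple[Picture, Picture]:
    """Recover the two half-height fields from an interleaved frame."""
    return frame[0::2], frame[1::2]
```

Each field has half as many lines as the woven frame, which is the "half vertical resolution" described above.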
The H.264/MPEG-4 AVC standard allows the frame/field structure to be chosen at the macroblock level as well as at the frame level. This is called macroblock adaptive frame/field (MBAFF) coding. When MBAFF is enabled, two vertically neighboring macroblocks form a macroblock pair, as illustrated in FIG. 1A. The top and bottom macroblocks are illustrated for the frame structure in FIG. 1B and for the field structure in FIG. 1C. MBAFF improves coding efficiency by allowing some macroblock pairs to use a structure different from that of the video sequence as a whole.
The H.264/MPEG-4 AVC standard is intended to achieve higher coding efficiency than previous video coding standards through a wide variety of intra and inter prediction modes, together with other coding tools. Table 1 gives a brief classification and explanation of the prediction modes.
TABLE 1
Prediction Modes in H.264/MPEG-4 AVC

| Prediction Mode | Description |
|---|---|
| B-Direct | Inter prediction applied; no motion information coded |
| Inter_16x16 | Inter prediction applied to the whole macroblock |
| Inter_16x8 | Macroblock partitioned into two 16 × 8 blocks; inter prediction then applied to each block |
| Inter_8x16 | Macroblock partitioned into two 8 × 16 blocks; inter prediction then applied to each block |
| Inter_8x8 | Macroblock partitioned into four 8 × 8 sub-macroblocks; each sub-macroblock further sub-divided into 8 × 4, 4 × 8 or 4 × 4 blocks, or kept as 8 × 8; inter prediction then applied to each block |
| B/P_Skip | Inter prediction applied; neither motion information nor residual data is coded |
| Intra_16x16 | Intra prediction applied to the whole 16 × 16 macroblock |
| Intra_4x4 | Macroblock partitioned into sixteen 4 × 4 blocks; intra prediction then applied to each block |
| I_PCM | Macroblock samples are coded directly, without prediction or transform compression |
The encoder must find the optimal prediction mode among this large variety to achieve the best coding efficiency. Inter prediction includes finding the optimal macroblock/sub-macroblock partitioning and the optimal motion vector for each block. Finding the optimal prediction mode for a macroblock is computationally expensive: in most cases, when the encoder runs on a PC, this task consumes more than 90% of the total encoding time. During encoding, all possible prediction modes for a macroblock are tested, and the best one is chosen based on a cost measure. The cost function generally has the form:

$$\mathrm{COST} = D + \lambda \cdot R \qquad (1)$$

where D is the distortion between the pixels of the macroblock to be coded and its prediction, usually measured as the SAE (sum of absolute errors) or the SSE (sum of squared errors); R is the number of bits required to convey the prediction information, such as the prediction mode and motion vectors; and λ is a parameter that trades off distortion against bit length. Encoding a macroblock determines the optimal prediction mode and the associated minimal COST.
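Equation (1) and the "test every mode, keep the cheapest" selection can be sketched as below. This is a minimal illustration, assuming SAE as the distortion D and assuming the caller already has, for each candidate mode, its predicted samples and bit count R; in a real encoder these come from intra prediction and motion search. The candidate dictionary and function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def sae(orig: Sequence[int], pred: Sequence[int]) -> int:
    """Sum of absolute errors between original and predicted samples."""
    return sum(abs(o - p) for o, p in zip(orig, pred))


def sse(orig: Sequence[int], pred: Sequence[int]) -> int:
    """Sum of squared errors between original and predicted samples."""
    return sum((o - p) ** 2 for o, p in zip(orig, pred))


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Equation (1): COST = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(orig: Sequence[int],
              candidates: Dict[str, Tuple[Sequence[int], int]],
              lam: float) -> Tuple[str, float]:
    """Return (mode, minimal COST) over the candidates, using SAE as D."""
    best: Optional[Tuple[str, float]] = None
    for mode, (pred, bits) in candidates.items():
        cost = rd_cost(sae(orig, pred), bits, lam)
        if best is None or cost < best[1]:
            best = (mode, cost)
    if best is None:
        raise ValueError("no candidate modes")
    return best
```

With a small λ the mode with the lower distortion tends to win even if it costs more bits; with a large λ a cheap mode such as a skip mode becomes attractive despite a larger error.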
FIG. 2 is a flowchart of a straightforward, exhaustive method of determining the frame/field structure of a macroblock pair. The method begins at begin block 201. It searches for the optimal prediction modes of the top and bottom macroblocks with the frame structure in block 202 and with the field structure in block 203. Decision block 204 evaluates the resulting costs, where $\mathrm{COST}_{T,FRM}$ is the COST of the top macroblock with the frame structure, $\mathrm{COST}_{B,FRM}$ is the COST of the bottom macroblock with the frame structure, $\mathrm{COST}_{T,FLD}$ is the COST of the top macroblock with the field structure, and $\mathrm{COST}_{B,FLD}$ is the COST of the bottom macroblock with the field structure. The method then selects the better structure, i.e., the one with the smaller sum of top and bottom COSTs: the field structure in block 205 or the frame structure in block 206. A post-process block 207 checks the skip prediction mode, as described below. Block 208 encodes the macroblock pair with the structure chosen in decision block 204. This exhaustive method must find the optimal prediction modes of the top and bottom macroblocks twice, once with each of the frame and field structures, and the result for the worse structure is discarded regardless of its computational cost. Hence, a method that decides the structure early, with only a slight degradation in coding efficiency, is desirable.
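Blocks 202 to 206 of FIG. 2 can be sketched as follows to make the doubled workload explicit. The sketch assumes a hypothetical callback `search(mb, structure)` that performs the full mode search for one macroblock ("T" or "B") under one structure ("FRM" or "FLD") and returns its minimal COST; it also assumes that a tie keeps the frame structure. The skip check of block 207 and the encoding of block 208 are omitted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: full mode search for one macroblock under one structure,
# returning the minimal COST of equation (1).
ModeSearch = Callable[[str, str], float]


def decide_exhaustive(search: ModeSearch) -> Tuple[str, float]:
    """Exhaustive frame/field decision for one macroblock pair (FIG. 2 sketch)."""
    # Blocks 202 and 203: four full mode searches per macroblock pair.
    cost_frm = search("T", "FRM") + search("B", "FRM")
    cost_fld = search("T", "FLD") + search("B", "FLD")
    # Decision block 204; assumption: a tie keeps the frame structure.
    if cost_fld < cost_frm:
        return "field", cost_fld   # block 205
    return "frame", cost_frm       # block 206


def make_table_search(costs: Dict[Tuple[str, str], float],
                      calls: List[Tuple[str, str]]) -> ModeSearch:
    """Test helper: returns precomputed costs and records every search call."""
    def search(mb: str, structure: str) -> float:
        calls.append((mb, structure))
        return costs[(mb, structure)]
    return search
```

Whichever structure wins, four searches are always performed, two of which are thrown away; an early-decision method aims to skip the searches for the non-inferred structure whenever possible.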