Inter and intra coding methods can both be used to encode interframes (P and B frames) in video compression standards. Intra coding uses spatial correlation while inter coding uses temporal correlation from previously coded frames. In general, inter coding is used for macroblocks that are well predicted from previous pictures, and intra coding is used for macroblocks that are not well predicted from previous pictures, or for macroblocks with low spatial activity. Typically, an encoder may make an inter/intra coding decision for each macroblock, slice, picture, field, and/or frame based on coding efficiency and subjective quality considerations. In the JVT/H.264/MPEG AVC (“H.264”) standard, inter coding allows various block partitions and multiple reference pictures to be used for predicting a macroblock.
The H.264 standard uses tree-structured hierarchical macroblock partitions. Inter-coded 16×16 pixel macroblocks may be further broken into macroblock partitions, of sizes 16×8, 8×16, or 8×8. Macroblock partitions of 8×8 pixels are also known as sub-macroblocks. Sub-macroblocks may be further broken into sub-macroblock partitions, of sizes 8×4, 4×8, and 4×4. An encoder may select how to divide the macroblock into partitions and sub-macroblock partitions based on the characteristics of a particular macroblock, in order to maximize compression efficiency and subjective quality.
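The partition tree described above can be illustrated with a short sketch. It is a minimal illustration, not part of the standard: the names `count_blocks` and `motion_vector_count` are hypothetical, and the sketch assumes one motion vector per block (single-list prediction).

```python
# Partition shapes as (width, height), taken from the sizes named in the text.
MACROBLOCK = (16, 16)
SUB_MACROBLOCK = (8, 8)
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def count_blocks(parent, part):
    """Number of `part`-sized blocks needed to tile a `parent`-sized block."""
    pw, ph = parent
    w, h = part
    if pw % w or ph % h:
        raise ValueError("partition does not tile the parent block")
    return (pw // w) * (ph // h)


def motion_vector_count(mb_partition, sub_partitions=None):
    """Blocks carrying their own motion vector (assumes one vector per block).

    For the 8x8 macroblock partition, `sub_partitions` lists the shape chosen
    for each of the four sub-macroblocks.
    """
    if mb_partition not in MB_PARTITIONS:
        raise ValueError("unknown macroblock partition")
    if mb_partition != SUB_MACROBLOCK:
        return count_blocks(MACROBLOCK, mb_partition)
    if sub_partitions is None or len(sub_partitions) != 4:
        raise ValueError("8x8 mode needs a shape for each of 4 sub-macroblocks")
    for sp in sub_partitions:
        if sp not in SUB_MB_PARTITIONS:
            raise ValueError("unknown sub-macroblock partition")
    return sum(count_blocks(SUB_MACROBLOCK, sp) for sp in sub_partitions)
```

For example, a macroblock split entirely into 4×4 blocks carries 16 motion vectors under this assumption, which is one reason small partitions are costly in rate.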
Furthermore, the H.264 standard also supports INTRA, SKIP and DIRECT modes. Intra modes allow three types: INTRA4×4, INTRA16×16, and INTRA8×8, the last of which is available only in the Fidelity Range Extensions. INTRA4×4 and INTRA8×8 support 9 prediction modes: vertical; horizontal; DC; diagonal down/left; diagonal down/right; vertical-left; horizontal-down; vertical-right; and horizontal-up prediction. INTRA16×16 supports 4 prediction modes: vertical; horizontal; DC; and plane prediction.
Multiple reference pictures may be used for inter-prediction, with a reference picture index coded to indicate which of the multiple reference pictures is used. In P pictures (or P slices), only single directional prediction is used, and the allowable reference pictures are managed in list 0. In B pictures (or B slices), two lists of reference pictures are managed, list 0 and list 1. In B pictures (or B slices), single directional prediction using either list 0 or list 1 is allowed, or bi-prediction using both list 0 and list 1 is allowed. When bi-prediction is used, the list 0 and the list 1 predictors are averaged together to form a final predictor.
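The averaging of the two predictors can be sketched as follows. This is a minimal illustration assuming integer sample values held in nested lists; the rounding offset (adding 1 before halving) and the name `bi_predict` are assumptions of this sketch, since the text only states that the predictors are averaged.

```python
def bi_predict(pred_l0, pred_l1):
    """Average the list 0 and list 1 predictors sample by sample.

    Assumption: the average is rounded to the nearest integer, with ties
    rounded up, by adding 1 before the right shift.
    """
    if len(pred_l0) != len(pred_l1):
        raise ValueError("predictors must have the same number of rows")
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred_l0, pred_l1)
    ]
```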
Thus, in the H.264 standard, four different types of inter-picture prediction are supported for B slices: list 0, list 1, bi-predictive, and direct prediction. List 0 prediction indicates that the prediction is based on a picture of the first reference picture buffer, whereas list 1 prediction uses a picture of the second reference picture buffer. In the bi-predictive mode, the prediction signal is built by using both the list 0 and list 1 prediction signals. The direct prediction mode is inferred from previously transmitted syntax element(s) and can be either list 0 or list 1 prediction or bi-predictive. B slices allow various block partitions (more specifically 16×16, 16×8, 8×16, and 8×8 for a macroblock) to be used for predicting a 16×16 macroblock. Additionally, for each block partition, the prediction mode (list 0, list 1, bi-predictive) can be chosen separately. For a block coded in direct prediction, if no error signal is transmitted, then the coding is also referred to as SKIP mode and the block can be coded very efficiently.
For the H.264 standard, each macroblock partition may have an independent reference picture index, prediction type (list 0, list 1, bipred), and an independent motion vector. Each sub-macroblock partition may have independent motion vectors, but all sub-macroblock partitions in the same sub-macroblock use the same reference picture index and prediction type.
For inter-coded macroblocks, besides the above macroblock partitions, P frames also support SKIP mode, while B frames support both SKIP mode and DIRECT mode. In SKIP mode, neither motion nor residual information is encoded. The motion information for a SKIP macroblock is the same as a motion vector predictor specified by the picture/slice type (P or B) and other information such as sequence and slice level parameters, and is related to other temporally or spatially adjacent macroblocks and to the macroblock's own position within the slice. In contrast, in DIRECT mode, no motion information is encoded, but the prediction residue is encoded. Both macroblocks and sub-macroblocks support DIRECT mode.
As for mode decision, inter pictures need to support both inter and intra modes. Intra modes include INTRA4×4 and INTRA16×16. For P pictures, inter modes include SKIP and 16×16, 16×8, 8×16 and sub-macroblock 8×8 partitions. 8×8 further supports 8×8, 8×4, 4×8 and 4×4 partitions. For B pictures, both list 0 and list 1 and DIRECT mode are considered for both macroblocks and sub-macroblocks.
In the prior art, a Rate-Distortion Optimization (RDO) framework is used for mode decision. For inter modes, motion estimation is separately considered from mode decision. Motion estimation is first performed for all block types of inter modes, then the mode decision is made by comparing the cost of each inter mode and intra mode. The mode with the minimal cost is selected as the best mode.
A conventional procedure to encode one macroblock s in a P- or B-picture (hereinafter the “conventional macroblock encoding procedure”) is summarized as follows.
In a first step of the conventional macroblock encoding procedure, the following are given: the last decoded pictures, the Lagrangian multipliers λMODE and λMOTION, and the macroblock quantizer QP.
In a second step of the conventional macroblock encoding procedure, motion estimation and reference picture selection are performed by minimizing

$$J(\mathrm{REF}, m(\mathrm{REF}) \mid \lambda_{\mathrm{MOTION}}) = \mathrm{SA(T)D}\bigl(s, c(\mathrm{REF}, m(\mathrm{REF}))\bigr) + \lambda_{\mathrm{MOTION}} \cdot \bigl(R(m(\mathrm{REF}) - p(\mathrm{REF})) + R(\mathrm{REF})\bigr)$$

for each reference picture and motion vector of a possible macroblock mode. In the preceding equation, m is the current motion vector being considered, REF denotes the reference picture, p is the motion vector used for the prediction during motion vector coding, c(REF, m(REF)) is the candidate macroblock that is determined by REF and m(REF), R(m−p) represents the bits used for coding the motion vector, and R(REF) is the bits for coding the reference picture. SA(T)D denotes the Sum of Absolute (Transform) Differences between the original signal and the reference signal predicted by the motion vector.
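The motion cost of the second step can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the encoder's actual implementation: the distortion is a plain SAD (no transform), the motion vector rate R(m−p) is approximated by the length of a signed Exp-Golomb codeword per component, and the function names and the `ref_bits` parameter are hypothetical.

```python
def sad(orig, cand):
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(abs(a - b) for ro, rc in zip(orig, cand) for a, b in zip(ro, rc))


def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb codeword for integer v.

    Assumption: used here only as a stand-in rate estimate for one motion
    vector difference component.
    """
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def motion_cost(orig, cand, mv, pred_mv, ref_bits, lam_motion):
    """J = SAD(s, c) + lambda_MOTION * (R(m - p) + R(REF))."""
    mvd_bits = (se_golomb_bits(mv[0] - pred_mv[0])
                + se_golomb_bits(mv[1] - pred_mv[1]))
    return sad(orig, cand) + lam_motion * (mvd_bits + ref_bits)
```

An encoder following the text would evaluate such a cost for every candidate motion vector and reference picture and keep the minimum.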
In a third step of the conventional macroblock encoding procedure, the macroblock prediction mode is chosen by minimizing

$$J(s, c, \mathrm{MODE} \mid \mathrm{QP}, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid \mathrm{QP}) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid \mathrm{QP}),$$

given QP and λMODE when varying MODE. SSD denotes the Sum of Square Differences between the original signal and the reconstructed signal. R(s,c,MODE|QP) is the number of bits associated with choosing MODE, including the bits for the macroblock header, the motion and all DCT coefficients. MODE indicates a mode out of the set of potential macroblock modes:
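$$\text{P-frame: } \mathrm{MODE} \in \{\mathrm{INTRA}\,4{\times}4,\ \mathrm{INTRA}\,16{\times}16,\ \mathrm{SKIP},\ 16{\times}16,\ 16{\times}8,\ 8{\times}16,\ 8{\times}8,\ 8{\times}4,\ 4{\times}8,\ 4{\times}4\},$$

$$\text{B-frame: } \mathrm{MODE} \in \{\mathrm{INTRA}\,4{\times}4,\ \mathrm{INTRA}\,16{\times}16,\ \mathrm{BIDIRECT},\ \mathrm{DIRECT},\ \mathrm{FWD}\,16{\times}16,\ \mathrm{FWD}\,16{\times}8,\ \mathrm{FWD}\,8{\times}16,\ \mathrm{FWD}\,8{\times}8,\ \mathrm{FWD}\,8{\times}4,\ \mathrm{FWD}\,4{\times}8,\ \mathrm{FWD}\,4{\times}4,\ \mathrm{BAK}\,16{\times}16,\ \mathrm{BAK}\,16{\times}8,\ \mathrm{BAK}\,8{\times}16,\ \mathrm{BAK}\,8{\times}8,\ \mathrm{BAK}\,8{\times}4,\ \mathrm{BAK}\,4{\times}8,\ \mathrm{BAK}\,4{\times}4\}.$$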
INTRA4×4 includes the modes:

$$\mathrm{MODE} \in \{\text{vertical},\ \text{horizontal},\ \text{DC},\ \text{diagonal-down/left},\ \text{diagonal-down/right},\ \text{vertical-left},\ \text{horizontal-down},\ \text{vertical-right},\ \text{horizontal-up}\}$$

and INTRA16×16 includes the modes:

$$\mathrm{MODE} \in \{\text{vertical},\ \text{horizontal},\ \text{DC},\ \text{plane}\}.$$
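The third step amounts to an exhaustive search over the mode set with a Lagrangian cost. The following is a minimal sketch, assuming the SSD and bit count of each candidate mode have already been measured by encoding and reconstructing the macroblock; the function names and the example numbers are hypothetical.

```python
def rd_cost(ssd, bits, lam_mode):
    """J = SSD + lambda_MODE * R for one candidate mode."""
    return ssd + lam_mode * bits


def select_mode(candidates, lam_mode):
    """Return the mode with the minimal rate-distortion cost.

    `candidates` maps a mode name to a (ssd, bits) pair.
    """
    if not candidates:
        raise ValueError("no candidate modes")
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam_mode))


# Illustrative numbers only: a larger multiplier favors cheaper modes.
example = {"SKIP": (900, 1), "16x16": (400, 40), "INTRA4x4": (350, 120)}
```

With these illustrative numbers, `select_mode(example, 5)` picks 16×16, while `select_mode(example, 20)` picks SKIP, showing how the multiplier trades distortion against rate.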
With respect to the conventional macroblock encoding procedure, a conventional fast mode selection was introduced that could considerably reduce the complexity of mode decision with little impact on quality. It relies on the observation that the mode decision error surface is more likely to be monotonic, so that examining certain modes first may make it simpler to find the best mode. If mode decision for a given mode is not performed, then motion estimation for that mode is also not performed; motion estimation is the most costly part of encoding even if a fast motion estimation algorithm is used. More specifically, in this approach SKIP and 16×16 modes were examined first. According to their distortion relationship (i.e., J(SKIP)<J(16×16)) and the availability of residual, a further decision was made whether or not to terminate the search. If the search was not terminated, J(8×8) and J(4×4) were also computed. Based on the relationship of J(16×16), J(8×8), and J(4×4), additional decisions were made to determine which of the remaining block sizes should be tested. For example, if the distortion is monotonic (i.e., J(16×16)>J(8×8)>J(4×4) or J(16×16)<J(8×8)<J(4×4)), then that relationship determined which additional partitions should be examined. In the first case, for example, only small partitions (8×4 and 4×8) are tested, while in the second case only 16×8 and 8×16 are examined. If the distortion is not monotonic, then all possible modes are tested.
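The monotonicity test in this fast mode selection can be sketched as follows. This is a minimal illustration covering only the choice of remaining partitions; the earlier SKIP/16×16 termination check is omitted because the text does not specify it precisely, and the function name is hypothetical.

```python
def remaining_partitions(j16x16, j8x8, j4x4):
    """Pick the partitions still to be tested from three measured costs."""
    if j16x16 > j8x8 > j4x4:
        # Cost falls as blocks shrink: test only the small partitions.
        return ["8x4", "4x8"]
    if j16x16 < j8x8 < j4x4:
        # Cost rises as blocks shrink: test only the large partitions.
        return ["16x8", "8x16"]
    # Not monotonic: test every remaining partition.
    return ["16x8", "8x16", "8x4", "4x8"]
```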
In a different conventional fast mode decision approach, additional conditions were introduced based on the distortion values (see FIG. 1) and the relationships between different modes (see FIG. 2), which allowed the search to terminate even faster without much impact in quality.
Turning to FIG. 1, a method for motion vector and mode decision based on distortion values is generally indicated using the reference numeral 100. The method 100 includes a start block 102 that passes control to a function block 104. The function block 104 checks SKIP mode and 16×16 mode, and passes control to a decision block 106. The decision block 106 determines whether or not the distortion in SKIP mode, J(SKIP), is less than the distortion in 16×16 mode, J(16×16), and whether or not 16×16 mode has any residue. If the distortion in SKIP mode is not less than the distortion in 16×16 mode and/or 16×16 mode has a residue, then control is passed to a function block 108. Otherwise, if the distortion in SKIP mode is less than the distortion in 16×16 mode and 16×16 mode has no residue, then control is passed to a decision block 126.
The function block 108 checks 8×8 mode for a current (i.e., currently evaluated) 8×8 sub-partition, and passes control to a decision block 110 and to a function block 114. The decision block 110 determines whether or not 8×8 mode has the same motion information as 16×16 mode for the current 8×8 sub-partition. If 8×8 mode does not have the same motion information as 16×16 mode for the current 8×8 sub-partition, then control is passed to a function block 112. Otherwise, if 8×8 mode has the same motion information as 16×16 mode for the current 8×8 sub-partition, then control is passed to the function block 114.
The function block 112 checks 16×8 and 8×16 sub-partitions, and passes control to function block 114.
The function block 114 checks 4×4 mode for a current 4×4 sub-partition, and passes control to a decision block 116 and to a function block 120. The decision block 116 determines whether or not 4×4 mode has the same motion information as 8×8 mode for the current 4×4 sub-partition. If 4×4 mode does not have the same motion information as 8×8 mode for the current 4×4 sub-partition, then control is passed to a function block 118. Otherwise, if 4×4 mode has the same motion information as 8×8 mode for the current 4×4 sub-partition, then control is passed to a function block 120.
The function block 118 checks 8×4 and 4×8 sub-partitions, and passes control to function block 120.
The function block 120 checks intra modes, and passes control to a function block 122. The function block 122 selects the best mode from among the evaluated modes, and passes control to an end block 124. The end block 124 ends the macroblock encoding.
The decision block 126 determines whether or not SKIP mode has the same motion information as 16×16 mode for a current (i.e., currently evaluated) 16×16 macroblock. If SKIP mode does not have the same motion information as 16×16 mode for the current 16×16 macroblock, then control is passed to the function block 108. Otherwise, if SKIP mode has the same motion information as 16×16 mode for the current 16×16 macroblock, then control is passed to the function block 120.
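The control flow of method 100 can be summarized in code. The following is a simplified sketch, not the method itself: it collapses the per-sub-partition comparisons into one boolean per level, treats the motion-information comparisons as precomputed inputs, and uses hypothetical names throughout. Block numbers from FIG. 1 appear in the comments.

```python
def fig1_modes_to_check(j_skip, j_16x16, residue_16x16,
                        skip_same_as_16x16=False,
                        same_8x8_as_16x16=False,
                        same_4x4_as_8x8=False):
    """Return the ordered list of modes evaluated for one macroblock."""
    checked = ["SKIP", "16x16"]                      # block 104
    skip_wins = j_skip < j_16x16 and not residue_16x16   # block 106
    if skip_wins and skip_same_as_16x16:             # block 126
        checked.append("INTRA")                      # block 120
        return checked
    checked.append("8x8")                            # block 108
    if not same_8x8_as_16x16:                        # block 110
        checked += ["16x8", "8x16"]                  # block 112
    checked.append("4x4")                            # block 114
    if not same_4x4_as_8x8:                          # block 116
        checked += ["8x4", "4x8"]                    # block 118
    checked.append("INTRA")                          # block 120
    return checked
```

The best mode (block 122) would then be selected from the returned list by comparing the costs of the evaluated modes.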
Turning to FIG. 2, a method for motion vector and mode decision based on relationships between different modes is generally indicated using the reference numeral 200. The method 200 includes a start block 202 that passes control to a function block 204. The function block 204 checks SKIP mode and 16×16 mode, and passes control to a decision block 206. The decision block 206 determines whether or not MC2>T1, where MC2=min(J(SKIP), J(16×16)), the minimum distortion between SKIP mode and 16×16 mode, and T1 is the first threshold. If MC2<=T1, then control is passed to a decision block 208. Otherwise, if MC2=min(J(SKIP), J(16×16))>T1, then control is passed to a function block 210.
The decision block 208 determines whether or not MC2 is greater than T2 (a second threshold). If MC2 is not greater than T2, then control is passed to function block 210. Otherwise, if MC2 is greater than T2, then control is passed to a function block 218.
The function block 210 checks other inter modes, and passes control to a function block 212. The function block 212 checks other non-tested intra modes, and passes control to a function block 214. The function block 214 selects the best mode from among the evaluated modes, and passes control to an end block 216. The end block 216 ends the macroblock encoding.
The function block 218 checks the INTRA4×4 DC mode, and passes control to a decision block 220. The decision block 220 determines whether or not J(INTRA4×4 DC) is less than a*MC2+b, where a and b are constants. If J(INTRA4×4 DC) is not less than a*MC2+b, then control is passed to function block 210 and function block 212. Otherwise, if J(INTRA4×4 DC) is less than a*MC2+b, then control is passed to the function block 212.
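The threshold tests of method 200 can likewise be sketched in code. This is a minimal sketch that follows the branch directions exactly as written in the description above; the function name, the grouping of modes into "OTHER_INTER" and "OTHER_INTRA", and the treatment of J(INTRA4×4 DC) as a precomputed input are assumptions of this illustration.

```python
def fig2_modes_to_check(j_skip, j_16x16, j_intra4x4_dc, t1, t2, a, b):
    """Return the ordered list of mode groups evaluated for one macroblock."""
    checked = ["SKIP", "16x16"]                  # block 204
    mc2 = min(j_skip, j_16x16)
    check_other_inter = True
    if mc2 <= t1 and mc2 > t2:                   # blocks 206 and 208
        checked.append("INTRA4x4_DC")            # block 218
        if j_intra4x4_dc < a * mc2 + b:          # block 220
            check_other_inter = False            # skip block 210
    if check_other_inter:
        checked.append("OTHER_INTER")            # block 210
    checked.append("OTHER_INTRA")                # block 212
    return checked
```

The only path that avoids the remaining inter modes, and therefore their motion estimation, is the one where the INTRA4×4 DC cost falls below a*MC2+b.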
Inter mode decision is associated with motion estimation, various block sizes and multiple reference picture selection. Intra mode decision is associated with various block types and multiple spatial prediction mode selection. Therefore, mode decision for interframes imposes a heavy burden on the encoder.
Accordingly, it would be desirable and highly advantageous to have a method and apparatus for performing a fast mode decision for interframes that lessens the burden on the encoder.