Video data is generally processed and transferred in the form of bit streams. Typical video compression encoders gain much of their compression efficiency by forming a reference picture prediction of a picture to be encoded and encoding the difference between the current picture and that prediction. The more closely the prediction correlates with the current picture, the fewer bits are needed to compress that picture, and the more efficient the process becomes. It is therefore desirable to form the best possible reference picture prediction.
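As a rough illustration of why prediction quality matters, the Python sketch below treats a picture as a small grid of luma samples and uses the sum of absolute differences (SAD) of the residual as a crude stand-in for the number of coded bits. The helper names `residual`, `residual_cost`, and `reconstruct` are hypothetical. The sketch skips the transform, quantization, and entropy coding a real encoder applies, so here the residual is reconstructed losslessly.

```python
from typing import List

Picture = List[List[int]]  # rows of luma samples (illustrative assumption)


def residual(current: Picture, prediction: Picture) -> Picture:
    """Sample-wise difference that the encoder would go on to code."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]


def residual_cost(res: Picture) -> int:
    """Sum of absolute differences: a crude proxy for the bits needed."""
    return sum(abs(v) for row in res for v in row)


def reconstruct(prediction: Picture, res: Picture) -> Picture:
    """Decoder side: prediction plus residual recovers the picture."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, res)]


current = [[100, 102], [98, 101]]
good_pred = [[100, 101], [99, 101]]  # closely correlated prediction
poor_pred = [[80, 120], [60, 140]]   # poorly correlated prediction

print(residual_cost(residual(current, good_pred)))  # 2
print(residual_cost(residual(current, poor_pred)))  # 115
```

With these toy values the close prediction leaves a residual cost of 2, while the poor one leaves 115, even though both reconstruct the same picture once the residual is added back.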
In many video compression standards, including Moving Picture Experts Group (“MPEG”)-1, MPEG-2 and MPEG-4, the motion between a previous reference picture and the current picture is estimated to form a motion compensated version of the previous reference picture. The motion compensated version of the previous reference picture is used as a prediction for the current picture, and only the difference between the current picture and the prediction is coded.
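A minimal sketch of forming a motion-compensated prediction for one block is shown below. It assumes integer-pixel motion vectors (no sub-pixel interpolation) and clamps reference coordinates at the picture border; the function name `motion_compensate` and the `(dx, dy)` vector convention are illustrative assumptions, not taken from any standard's text.

```python
from typing import List, Tuple

Picture = List[List[int]]
MV = Tuple[int, int]  # (dx, dy) displacement into the reference picture


def motion_compensate(ref: Picture, x: int, y: int, w: int, h: int,
                      mv: MV) -> Picture:
    """Return the w x h block of ref displaced by mv from (x, y).

    Coordinates that fall outside the reference picture are clamped to the
    nearest edge sample (a simple edge-padding assumption).
    """
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for j in range(h):
        ry = min(max(y + j + dy, 0), height - 1)
        row = []
        for i in range(w):
            rx = min(max(x + i + dx, 0), width - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block


ref = [[10 * r + c for c in range(4)] for r in range(4)]
print(motion_compensate(ref, 0, 0, 2, 2, (1, 1)))   # [[11, 12], [21, 22]]
print(motion_compensate(ref, 0, 0, 2, 2, (-1, 0)))  # [[0, 0], [10, 10]]
```

The returned block is what the encoder subtracts from the co-located block of the current picture; only that difference, plus the motion vector, needs to be coded.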
Motion estimation plays an important role in current video coding systems and is generally the most computationally complex part of the encoder. Most current video coding standards employ a block matching algorithm. A popular motion estimation method is the full search strategy, which, for each block, evaluates every candidate displacement within a search window. Unfortunately, the complexity of the full search strategy is extremely high, especially for advanced video coding standards such as H.264, which employ multiple reference pictures and multiple block types. Several fast search algorithms have been proposed, such as the three-step search, the new three-step search, diamond search, zonal search, hierarchical or multi-resolution search, and combinations thereof. These algorithms reduce complexity by reducing the number of search points. Unfortunately, they tend to become trapped in local minima of the error surface, so their prediction quality is generally worse than that of the full search strategy.
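To make this complexity trade-off concrete, the sketch below compares an exhaustive full search with the classic three-step search on a synthetic picture pair, using SAD as the matching criterion. The helper names, the 4×4 block size, the ±7 search range, and the synthetic texture are assumptions chosen for illustration. Out-of-picture candidates are not handled, so the block is placed far enough from the border.

```python
from typing import List, Tuple

Picture = List[List[int]]
MV = Tuple[int, int]  # (dx, dy) displacement into the reference picture


def block_sad(cur: Picture, ref: Picture, bx: int, by: int, n: int,
              mv: MV) -> int:
    """SAD between the n x n block at (bx, by) in cur and its match in ref."""
    dx, dy = mv
    return sum(abs(cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx])
               for j in range(n) for i in range(n))


def full_search(cur: Picture, ref: Picture, bx: int, by: int, n: int,
                p: int) -> Tuple[MV, int, int]:
    """Test every displacement in [-p, p] x [-p, p]; return (mv, cost, points)."""
    candidates = [(block_sad(cur, ref, bx, by, n, (dx, dy)), (dx, dy))
                  for dy in range(-p, p + 1) for dx in range(-p, p + 1)]
    cost, mv = min(candidates)  # lowest SAD wins; ties broken by mv tuple
    return mv, cost, len(candidates)


def three_step_search(cur: Picture, ref: Picture, bx: int, by: int, n: int,
                      step: int = 4) -> Tuple[MV, int, int]:
    """Test the 8 neighbours at the current step around the best point so
    far, move there, and halve the step (4, 2, 1 covers a +/-7 window)."""
    center: MV = (0, 0)
    best_cost = block_sad(cur, ref, bx, by, n, center)
    points = 1
    while step >= 1:
        best_mv = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre cost is already known
                mv = (center[0] + dx, center[1] + dy)
                cost = block_sad(cur, ref, bx, by, n, mv)
                points += 1
                if cost < best_cost:
                    best_mv, best_cost = mv, cost
        center = best_mv
        step //= 2
    return center, best_cost, points


def pattern(x: int, y: int) -> int:
    return (x * x + 3 * y * y) % 251  # synthetic texture (assumption)


ref = [[pattern(x, y) for x in range(32)] for y in range(32)]
# The current picture is the reference shifted so the true motion is (3, -2).
cur = [[pattern(x + 3, y - 2) for x in range(32)] for y in range(32)]

fs_mv, fs_cost, fs_points = full_search(cur, ref, 12, 12, 4, 7)
ts_mv, ts_cost, ts_points = three_step_search(cur, ref, 12, 12, 4)
print(fs_mv, fs_cost, fs_points)  # (3, -2) 0 225
print(ts_mv, ts_cost, ts_points)
```

For a ±7 window the full search evaluates 15 × 15 = 225 candidate positions, while the three-step search evaluates 1 + 3 × 8 = 25. Because every three-step candidate lies inside the full-search window, its best cost can never be lower than the full search's, and on an irregular error surface it may settle on a local minimum instead of the true displacement.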
Block motion estimation is employed by most current video coding standards to reduce the bit rate. Although block motion estimation for video coding has been well explored, few algorithms have been proposed for multi-reference picture and multi-block type selection, such as may be used in the H.263++ and JVT/H.264/MPEG-4 AVC standards.
In the JVT/H.264 standard, various modes are provided for motion compensation. Each motion-compensated macroblock mode corresponds to a fixed-size block. A 16×16 macroblock can be partitioned into blocks of 16×16, 16×8, 8×16, or 8×8 samples, and each 8×8 block can be further sub-partitioned into blocks of 8×4, 4×8, or 4×4 samples. Seven block types are therefore supported in total. The prediction signal for each predictive-coded m×n block is obtained by displacing an area of the corresponding reference picture by a translational motion vector, which is coded differentially from a motion vector predictor. JVT/H.264 also supports multi-picture motion-compensated prediction; that is, more than one previously coded picture can be used as a reference for building the prediction signal of predictive-coded blocks. Accordingly, during motion estimation the encoder must decide which block type and which reference picture to select. This multi-reference picture and multi-block type selection makes motion searching more complicated.
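The seven block types can be enumerated directly, along with one way a motion vector is coded differentially. In H.264 the general-case predictor is the component-wise median of the motion vectors of the left, above, and above-right neighbouring blocks. The sketch below is a simplification: the helper names are hypothetical, motion vector units are left abstract, and the special cases the standard defines (unavailable neighbours, directional prediction for 16×8 and 8×16 partitions, skipped blocks) are deliberately omitted.

```python
from typing import List, Tuple

MV = Tuple[int, int]

# (width, height) of the macroblock partitions and 8x8 sub-partitions.
MB_PARTITIONS: List[Tuple[int, int]] = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS: List[Tuple[int, int]] = [(8, 4), (4, 8), (4, 4)]
BLOCK_TYPES = MB_PARTITIONS + SUB_PARTITIONS


def blocks_per_macroblock(width: int, height: int) -> int:
    """Motion vectors needed if a whole 16x16 macroblock uses this block size."""
    return (16 // width) * (16 // height)


def median3(p: int, q: int, r: int) -> int:
    return sorted((p, q, r))[1]


def median_mv_predictor(a: MV, b: MV, c: MV) -> MV:
    """Component-wise median of left (a), above (b), above-right (c) vectors."""
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


def mv_difference(mv: MV, pred: MV) -> MV:
    """Difference that is entropy-coded instead of the full motion vector."""
    return (mv[0] - pred[0], mv[1] - pred[1])


for w, h in BLOCK_TYPES:
    print(f"{w}x{h}: {blocks_per_macroblock(w, h)} motion vector(s) per macroblock")

pred = median_mv_predictor((4, -2), (6, 0), (3, 1))
print(pred, mv_difference((5, -1), pred))  # (4, 0) (1, -1)
```

Summing the per-macroblock counts (1 + 2 + 2 + 4 + 8 + 8 + 16 = 41) shows that an encoder searching every block type performs block matching for 41 blocks per macroblock, before multiple reference pictures are even considered.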
Multiple reference pictures are used in video encoding to achieve better compression. For example, the JVT/H.264 standard permits the use of up to 15 reference frames. Typically, motion vectors for a predicted block are estimated against every reference picture, and the best apparent prediction is then chosen for the block. In this example, the computational burden of the motion estimation process might therefore be up to 15 times that of a single reference picture. This burden can discourage the use of more reference frames, increase encoder complexity, or slow software encoding.
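The linear growth in cost can be sketched as follows. `search_all_references` repeats a single-reference search for every reference picture and keeps the lowest-cost result, and `full_search_points` counts block-matching evaluations (of differing block sizes) for one macroblock when every block type is searched exhaustively. Both names, the callback shape, and the ±16 search range are assumptions for illustration; the toy costs exist only to exercise the selection logic.

```python
from typing import Callable, Optional, Sequence, Tuple

MV = Tuple[int, int]
# Hypothetical per-reference search: ref_idx -> (best mv, best cost, points tested).
SearchFn = Callable[[int], Tuple[MV, int, int]]


def search_all_references(num_refs: int,
                          search: SearchFn) -> Tuple[Tuple[int, MV, int], int]:
    """Search each reference picture in turn and keep the cheapest result."""
    best: Optional[Tuple[int, MV, int]] = None
    total_points = 0
    for ref_idx in range(num_refs):
        mv, cost, points = search(ref_idx)
        total_points += points  # work grows linearly with num_refs
        if best is None or cost < best[2]:
            best = (ref_idx, mv, cost)
    if best is None:
        raise ValueError("num_refs must be at least 1")
    return best, total_points


def full_search_points(block_types: Sequence[Tuple[int, int]],
                       search_range: int, num_refs: int) -> int:
    """Block-matching evaluations for one 16x16 macroblock, all block types."""
    per_block = (2 * search_range + 1) ** 2
    blocks = sum((16 // w) * (16 // h) for w, h in block_types)
    return blocks * per_block * num_refs


BLOCK_TYPES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
print(full_search_points(BLOCK_TYPES, 16, 1))   # 44649
print(full_search_points(BLOCK_TYPES, 16, 15))  # 669735

toy_costs = [50, 20, 35]  # made-up costs, one per reference picture
best, points = search_all_references(3, lambda r: ((r, -r), toy_costs[r], 1089))
print(best, points)  # (1, (1, -1), 20) 3267
```

Under these assumptions a single reference picture already costs 44,649 evaluations per macroblock, and 15 references raise that to 669,735, which is the up-to-15-fold growth the text describes.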
Unfortunately, because the motion estimation of a predicted block is applied to all reference pictures, the resulting slow computations degrade system performance and the complex computations raise system cost. What is needed, therefore, is a method for increasing the speed of the motion estimation process when multiple reference pictures are used.