Video data is generally processed and transferred in the form of bit streams. Typical video compression coders and decoders (“CODECs”) gain much of their compression efficiency by forming a reference picture prediction of a picture to be encoded, and encoding the difference between the current picture and the prediction. The more closely the prediction is correlated with the current picture, the fewer bits are needed to compress that picture, which increases the efficiency of the process. Thus, it is desirable for the best possible reference picture prediction to be formed.
In many video compression standards, including Moving Picture Experts Group (“MPEG”)-1, MPEG-2, and MPEG-4, the motion between a previous reference picture and the current picture is estimated to form a motion compensated version of the previous reference picture. The motion compensated version of the previous reference picture is used as a prediction for the current picture, and only the difference between the current picture and the prediction is coded.
Motion estimation plays an important role in current video coding systems, and is generally the most computationally complex part of the encoder. The block matching algorithm is employed by most current video coding standards. A full search strategy, which estimates the amount of motion on a block-by-block basis by evaluating every candidate position in the search range, is a popular motion estimation method. Unfortunately, the complexity of the full search strategy is extremely high, especially for advanced video coding standards such as H.264, which employ multiple reference pictures and multiple block types. Several fast search algorithms, such as the three step search, new three step search, diamond search, zonal search, hierarchical or multi-resolution search, or combinations thereof, have been proposed. Such algorithms reduce the complexity by reducing the number of search points. Unfortunately, they tend to become trapped in local minima on the error surface. Thus, their performance is generally worse than that of the full search strategy.
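The full search strategy described above may be illustrated by the following minimal sketch. It assumes pictures stored as two-dimensional lists of luma samples, the sum of absolute differences (“SAD”) as the matching error, and a motion vector convention in which (dx, dy) points from the block position in the current picture to the matching position in the reference picture. The function names and parameters are hypothetical and are not taken from any standard or reference encoder.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` at the same position displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every displacement in [-search_range, +search_range] in both
    directions and return (best_motion_vector, best_sad).

    Illustrative only: candidates that would read outside the reference
    picture are skipped rather than padded."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width and
                      0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            # Keep the first candidate that achieves the lowest error so far.
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Because every one of the up to (2 × search_range + 1)² candidates is evaluated for each block, the result is the minimum of the error surface within the search range, and the cost grows quadratically with the search range. The fast search algorithms named above visit only a subset of these candidates.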
Block motion estimation is employed by most current video coding standards to reduce the bit rate. Block motion estimation for video coding has been well explored, but few algorithms have been proposed for multi-reference picture and multi-block type selection, such as in H.263++ and JVT/H.264/MPEG-4 AVC.
In H.264, various modes are provided for motion compensation. Each motion-compensated macroblock mode corresponds to a fixed block size. A macroblock can be partitioned into blocks of 16×16, 16×8, 8×16, or 8×8. An 8×8 block can be further sub-partitioned into blocks of 8×4, 4×8, or 4×4. Thus, seven block types are supported in total. The prediction signal for each predictive-coded m×n block is obtained by displacing an area of the corresponding reference picture, which is specified by a translational motion vector that is differentially coded from a motion vector predictor. H.264 also supports multi-picture motion-compensated prediction. That is, more than one previously coded picture can be used as a reference for building the prediction signal of predictive-coded blocks. Accordingly, for motion estimation, the encoder has to decide which block type and which reference picture should be selected. This multi-reference picture and multi-block type selection makes motion searching more complicated.
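To give a sense of how the block types and reference pictures multiply the search effort, the following sketch counts block-level error evaluations for one macroblock under an exhaustive search. It is a naive upper bound under stated assumptions: every block of every type is searched independently in every reference picture, with no reuse of partial results between block types. The function name and the example values of the search range and number of reference pictures are illustrative assumptions only.

```python
# The seven block types listed above, as (width, height).
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS = [(8, 4), (4, 8), (4, 4)]
BLOCK_TYPES = MACROBLOCK_PARTITIONS + SUB_PARTITIONS


def full_search_evaluations(search_range, num_refs):
    """Return {block_type: evaluations} for one 16x16 macroblock.

    Assumes an exhaustive search over a square window of
    (2 * search_range + 1) ** 2 candidates per block and per reference."""
    points = (2 * search_range + 1) ** 2
    counts = {}
    for (w, h) in BLOCK_TYPES:
        blocks_per_macroblock = (16 // w) * (16 // h)
        counts[(w, h)] = blocks_per_macroblock * points * num_refs
    return counts


if __name__ == "__main__":
    # Hypothetical settings: search range of +/-16 samples, 5 reference pictures.
    counts = full_search_evaluations(search_range=16, num_refs=5)
    for block_type, evaluations in counts.items():
        print(block_type, evaluations)
    print("total", sum(counts.values()))
```

Smaller blocks require more evaluations per macroblock, although each evaluation covers fewer samples. The count also scales linearly with the number of reference pictures.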
Currently, full search (“FS”) and several fast search algorithms have been proposed for motion searching, such as the three step search, new three step search, diamond search, zonal search, and hierarchical search. Among these, generally only the full search achieves optimal solutions. Thus, what is needed is a method for reducing complexity relative to the full search algorithm while still achieving optimal solutions.