Motion Estimation (ME) and Compensation is an important technique to exploit the temporal correlations among successive frames in a video sequence. Almost all current video compression standards such as MPEG-1/2/4 and H.26x employ a hybrid of block-based motion compensated prediction and transform coding for representing variations in picture content due to moving objects. In block-based motion estimation, a current frame is divided into rectangular blocks and an attempt is made to match each current block with a block from a reference frame, which would serve as the predictor of the current block. The difference between this predictor block and the current block is then encoded and transmitted. The (x,y) offset of the current block from the predictor block is characterized as a motion vector. A significant improvement in compression efficiency is achieved since usually the ‘difference block’ has a much lower energy or information content than the original block.
The improvement in compression efficiency, however, comes at the cost of a significant increase in complexity, since the process of matching a current block with a predictor block almost always involves a search algorithm. The reference frame is searched for the best possible match to the current block within a search window located around the position of the block in the current frame. For each search location, some metric, typically the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD) between the pixels of the two blocks, is calculated. The block that produces the smallest value of the metric is then selected as the predictor block. A full search strategy typically involves testing all the available blocks in the search range, leading to a high computational complexity. The complexity of the search algorithm thus depends on the size of the search area (amongst other things).
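The full search described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: frames are assumed to be 2-D lists of luma samples, and the names `sad`, `full_search`, the block size `n`, and the search range `p` are hypothetical. The returned vector here points from the current block to the predictor block; the sign convention is an assumption of the sketch.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of Absolute Differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every candidate within +/-p pixels of (bx, by) and return
    ((dx, dy), cost) for the candidate with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, bx, by, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Every candidate costs n x n absolute differences, so the work per block grows with the number of positions in the search window.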
The algorithms aimed at reducing the number of calculations for motion estimation can be classified as pel-recursive, block-based, or object-based. The pel-recursive methods require a significant number of operations per frame, as calculations have to be done on every pixel. The object-based methods involve separate operations for object recognition, which adds computational complexity. It has been observed that the computational complexity could be reduced if efficient block-based search techniques could be designed.
Many attempts at reducing the complexity of ME have centered on Fast Motion Estimation (FME) algorithms, which seek to reduce the number of search candidates required to find a ‘good match’ while causing minimal degradation in the predicted video quality compared to the exhaustive search. Several block-based motion estimation algorithms that are computationally faster than the full search have been investigated and developed. The three-step search (TSS), new three-step search (NTSS), four-step search (4SS), block-based gradient descent search (BBGDS), diamond search (DS), hexagon-based search (HEXBS), and Unsymmetrical-cross Multi-Hexagon-grid Search (UMHexagonS) are a few such FME algorithms. In addition, various FME methods are also disclosed in U.S. Pat. Nos. 6,668,020, 6,542,547, 6,414,997, 6,363,117, 6,269,174, 6,259,737, 6,128,047, 5,778,190, 5,706,059, and 5,557,341. In general, these methods are carried out in the spatial domain and depend on the shape and size of the search pattern and on the efficient choice of the search center to increase the speed of the motion vector search. The disadvantage, however, is that these techniques may fall into a local distortion minimum and fail to identify the best predictor block. Also, the reduction in the number of search points depends on the shape of the search pattern.
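As one illustration of how an FME algorithm reduces the number of search candidates, the three-step search named above can be sketched as follows. This is a simplified sketch under stated assumptions, not the method of any cited patent: frames are 2-D lists, SAD is the matching metric, the initial step of 4 corresponds to a search range of +/-7 pixels, and the name `three_step_search` and its parameters are hypothetical.

```python
def three_step_search(cur, ref, bx, by, n, step=4):
    """Coarse-to-fine search: test the 8 neighbours of the current centre
    at the given step, move the centre to the best one, halve the step.
    Returns ((dx, dy), cost, number_of_candidates_evaluated)."""
    h, w = len(ref), len(ref[0])

    def cost_at(dx, dy):
        rx, ry = bx + dx, by + dy
        # Candidates partly outside the reference frame are not tested.
        if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
            return None
        return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                   for j in range(n) for i in range(n))

    cx, cy = 0, 0
    best = cost_at(0, 0)
    evaluated = 1
    while step >= 1:
        best_dx, best_dy = cx, cy
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre was already evaluated
                c = cost_at(cx + dx, cy + dy)
                if c is None:
                    continue
                evaluated += 1
                if c < best:
                    best, best_dx, best_dy = c, cx + dx, cy + dy
        cx, cy = best_dx, best_dy
        step //= 2
    return (cx, cy), best, evaluated
```

With an initial step of 4 the sketch evaluates at most 25 candidates, compared with 225 for a full search over the same +/-7 range. Because each stage only samples the window coarsely, the result can be a suboptimal match when the distortion surface is not well behaved, which is the local-minimum risk noted above.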
While FME algorithms can reduce the complexity of the ME process by a factor of 10 or better, they nonetheless suffer from the fact that—like the full search algorithm—their complexity is proportional to the size of the search area. For a highly complex encoder—such as the latest H.264/MPEG-4 AVC encoder—a factor of 10 or 20 improvement may still not be sufficient for real-time performance. There is therefore a need for an alternative mechanism that can perform motion estimation at much lower complexity, but without sacrificing compression efficiency.
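To make the dependence on search-area size concrete, the following sketch counts the work done by a full search. It is an upper-bound estimate only: it ignores clipping of the search window at frame borders, counts one absolute difference per pixel per candidate, and the function names and the example frame and block sizes are assumptions chosen for illustration.

```python
def full_search_candidates(p):
    """Number of candidate positions in a +/-p pixel search window."""
    return (2 * p + 1) ** 2


def sad_ops_per_frame(width, height, block, p):
    """Upper bound on absolute-difference operations for one frame:
    blocks per frame x candidates per block x pixels per block."""
    blocks = (width // block) * (height // block)
    return blocks * full_search_candidates(p) * block * block


if __name__ == "__main__":
    # Doubling the search range roughly quadruples the candidate count.
    for p in (7, 16, 32):
        print(p, full_search_candidates(p))
    # Example: a 352 x 288 frame, 16 x 16 blocks, +/-16 pixel range.
    print(sad_ops_per_frame(352, 288, 16, 16))
```

Dividing such a count by 10 or 20 still leaves a workload that grows quadratically with the search range, which is the limitation described above.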