As multimedia applications become more and more popular, video compression techniques are becoming increasingly important. Many video compression standards have been developed, such as MPEG-4 and H.264/AVC. The main principle of these standards is to eliminate redundancy between successive frames in order to reduce the storage requirement and the amount of data to be transmitted. Motion estimation plays an important role in video coding because it reduces temporal redundancy by exploiting the similarity between successive frames.
FIG. 1 is a diagram illustrating a block-matching motion estimation scheme according to the prior art. First, a current frame 100 of size W×H is divided into a plurality of current blocks of size N×N. For a current block 104, a search window 112 of size (N+SRH−1)×(N+SRV−1) is established in a reference frame 110 (e.g., the previous frame or the following frame). The current block 104 is compared with the candidate blocks in the search window 112, and a block 114 is identified as the block in the search window 112 that best matches the current block 104 in the current frame 100. Next, the difference (i.e., the residual) between the two blocks 104 and 114 and a motion vector 120 denoting the displacement of the block 104 with respect to the block 114 are calculated. The residual and the motion vector 120 can then be used to represent the full block 104, thereby removing the redundancy and achieving data compression; this process is the so-called motion estimation. In other words, the purpose of motion estimation is to estimate the motion vector and the resulting residual of each current block so as to represent the entire current frame. However, since a large number of candidate blocks need to be compared, motion estimation is a compute-intensive operation with high bandwidth requirements.
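By way of illustration only, the block-matching scheme of FIG. 1 may be sketched as follows. The sketch rests on several assumptions that are not stated above: the sum of absolute differences (SAD) is used as the matching criterion, the search window is centred on the position of the current block, SRH and SRV are read as the numbers of candidate positions in the horizontal and vertical directions, and frames are plain lists of pixel rows. The function names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n current block at (bx, by)
    and the candidate block at (rx, ry) in the reference frame.
    SAD is an assumed matching criterion; the text does not name one."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, srh, srv):
    """Compare the current block with every candidate block in a search window
    of (n + srh - 1) x (n + srv - 1) pixels and return
    (motion_vector, residual, best_cost)."""
    height, width = len(ref), len(ref[0])
    # Assumption: the window is centred on the current block position.
    wx0 = bx - (srh - 1) // 2
    wy0 = by - (srv - 1) // 2
    best = None
    for dy in range(srv):
        for dx in range(srh):
            rx, ry = wx0 + dx, wy0 + dy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    cost, rx, ry = best
    # Displacement of the current block with respect to the best-matching block.
    motion_vector = (bx - rx, by - ry)
    # Residual: per-pixel difference between the current and best-matching blocks.
    residual = [[cur[by + j][bx + i] - ref[ry + j][rx + i] for i in range(n)]
                for j in range(n)]
    return motion_vector, residual, cost
```

Under these assumptions, each current block requires SRH×SRV block comparisons of N×N pixels each, which indicates why an exhaustive search is compute-intensive.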
FIG. 2 illustrates a hardware architecture of a video coding system 200, in which the reference and current frames are stored in an external memory 220, and the data required for the motion estimation are loaded via an external bus 230 into an internal memory 212 and then processed by a computation engine 214 (such as an embedded processor). Therefore, during the motion estimation processing, the required candidate blocks in a search window of the reference frames are frequently transferred between the external memory 220 and the internal memory 212 via the external bus 230 for data matching computation, which consumes a large amount of memory bandwidth. Typically, the size of the search window 112 depends on the display resolution and/or the compression standard. The larger the search window 112, the greater the amount of data that must be loaded into the internal memory 212, and the higher the memory bandwidth requirement.
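The relationship between the search window size and the amount of data loaded may be illustrated by the following rough estimate. It is a sketch under stated assumptions rather than a description of any particular system: the full search window is assumed to be loaded once for every current block with no reuse of overlapping data, one byte per pixel is assumed by default, and the function names are hypothetical.

```python
def search_window_pixels(n, srh, srv):
    """Number of pixels in a search window of size (N+SRH-1) x (N+SRV-1)."""
    return (n + srh - 1) * (n + srv - 1)


def reference_bytes_per_frame(w, h, n, srh, srv, bytes_per_pixel=1):
    """Estimated reference data loaded for one W x H current frame.
    Assumption: every N x N current block loads its whole search window,
    and overlapping windows share no data."""
    blocks_x = -(-w // n)  # ceiling division: blocks per row
    blocks_y = -(-h // n)  # ceiling division: blocks per column
    return blocks_x * blocks_y * search_window_pixels(n, srh, srv) * bytes_per_pixel
```

For example, with N = 16, a window with SRH = SRV = 64 contains 79×79 = 6241 pixels, whereas a window with SRH = SRV = 32 contains 47×47 = 2209 pixels, so under these assumptions the larger window requires nearly three times as much data per current block.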
Therefore, it is desirable to provide a method for performing motion estimation that is capable of reducing memory bandwidth requirements.