As multimedia applications grow in popularity, video compression techniques are becoming increasingly important. Many video compression standards, such as MPEG-4 and H.264/AVC, have been developed. A main principle of these standards is to eliminate redundancy among successive frames so as to reduce storage requirements and the amount of data to be transmitted. Motion estimation, which reduces temporal redundancy by exploiting the similarity among successive frames, plays an important role in achieving compression in video coding.
FIG. 1 is a diagram illustrating a block-matching motion estimation scheme according to the prior art. First, a current frame 100 of W×H pixels is divided into a plurality of current blocks of N×N pixels each. For a current block 104, a search window 112 of (N+SRH−1)×(N+SRV−1) pixels is established in a reference frame 110 (e.g., the previous frame or the following frame), so that the window contains SRH×SRV candidate block positions. The current block 104 is compared with the candidate blocks in the search window 112, and a block 114 is identified as the candidate that best matches the current block 104. Next, the difference (i.e., the residual) between the two blocks 104 and 114 is calculated, together with a motion vector 120 denoting the displacement of the block 104 with respect to the block 114. The residual and the motion vector 120 can then be used to represent the block 104, thereby removing redundancy and achieving data compression; this process is the so-called motion estimation. In other words, the purpose of motion estimation is to estimate the motion vector and the resulting residual of each current block and to use this information to represent the entire current frame. However, since a large number of candidate blocks must be compared, motion estimation is computationally intensive and has high bandwidth requirements.
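The full-search form of this block-matching scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular standard: the sum of absolute differences (SAD) is assumed as the matching criterion, frames are assumed to be same-sized 2-D lists of 8-bit luma samples, the search window is assumed to be centered on the current block, and the helper names `sad` and `full_search` are hypothetical. The returned `(dx, dy)` is the offset of the best-matching reference block relative to the current block; sign conventions for motion vectors vary between codecs.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]  # frame[y][x], assumed 8-bit luma samples


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between the N x N block at (cx, cy) in cur
    and the N x N block at (rx, ry) in ref."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, cx: int, cy: int, n: int,
                srh: int, srv: int) -> Tuple[Tuple[int, int], Frame]:
    """Return (motion_vector, residual) for the N x N current block at (cx, cy).

    The search window spans (n + srh - 1) x (n + srv - 1) pixels, i.e. srh
    horizontal and srv vertical candidate positions (assumed centered on the
    block). Candidates not fully inside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-(srv // 2), srv - srv // 2):
        for dx in range(-(srh // 2), srh - srh // 2):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    # Residual = current block minus best-matching reference block.
    residual = [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(n)]
        for j in range(n)
    ]
    return best_mv, residual


# Usage: build a current frame whose content is the reference frame moved so
# that the block at (4, 4) matches the reference block at (5, 2).
rng = random.Random(0)
ref_frame = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
cur_frame = [[ref_frame[(y - 2) % 16][(x + 1) % 16] for x in range(16)]
             for y in range(16)]
mv, residual = full_search(cur_frame, ref_frame, 4, 4, n=4, srh=5, srv=5)
print(mv)  # (1, -2)
```

The nested loops evaluate up to SRH×SRV candidates per current block, each costing N×N absolute differences, which illustrates why exhaustive block matching is computationally intensive.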
FIG. 2 shows a hardware architecture of a video coding system 200. The reference frames and current frames are stored in an external memory 220, and the data required for motion estimation are loaded from the external memory 220 into an internal memory 212 via an external bus 230 to be processed by a computation engine 214 (such as an embedded processor). Consequently, while motion estimation is performed, the candidate blocks in the search window of the reference frames are repeatedly transferred from the external memory 220 to the internal memory 212 via the external bus 230 for matching computation, which consumes a large amount of memory bandwidth. The size of the search window 112 typically depends on the display resolution and/or the specifications of the compression standard. The larger the search window 112, the more data must be loaded into the internal memory 212, and the higher the memory bandwidth requirement.
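The relationship between search window size and external bus traffic can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that each current block fetches its entire search window from the external memory 220 with no reuse of the data shared by overlapping neighboring windows, that only 8-bit luma samples (1 byte per pixel) are loaded, and that W and H are multiples of N. The function names `window_bytes` and `frame_load_bytes` and the 1920×1088 example are hypothetical.

```python
def window_bytes(n: int, srh: int, srv: int, bytes_per_pixel: int = 1) -> int:
    """Bytes in one (N+SRH-1) x (N+SRV-1) search window."""
    return (n + srh - 1) * (n + srv - 1) * bytes_per_pixel


def frame_load_bytes(w: int, h: int, n: int, srh: int, srv: int,
                     bytes_per_pixel: int = 1) -> int:
    """Reference bytes loaded per frame if every N x N current block fetches
    its whole search window (assumption: no reuse between windows)."""
    blocks = (w // n) * (h // n)
    return blocks * window_bytes(n, srh, srv, bytes_per_pixel)


if __name__ == "__main__":
    W, H, N = 1920, 1088, 16  # hypothetical frame and block size
    for sr in (33, 65):  # SRH = SRV = 33 (offsets -16..16) and 65 (-32..32)
        total = frame_load_bytes(W, H, N, sr, sr)
        print(f"SRH=SRV={sr}: {total:,} bytes/frame, "
              f"{total / (W * H):.0f}x the frame size")
```

Under these assumptions the 48×48-pixel window loads 18,800,640 bytes per frame (9 times the frame's own size), while the 80×80-pixel window loads 52,224,000 bytes (25 times), showing how the bandwidth requirement grows with the search window.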
Therefore, it is desirable to have a motion estimation method capable of reducing memory bandwidth requirements.