1. Field of the Invention
The present invention relates generally to video coding technology, and more particularly to a block matching method for fast motion estimation.
2. The Prior Art
To save storage space for image data and to reduce the bandwidth needed to transmit it, original image data is often compressed. When the image data is to be displayed, the compressed data is restored to displayable image data by a decompression process. The compression process is known as coding, and the decompression process is known as decoding.
The H.264/AVC video coding standard is a widely used coding method, often applied to compress images transmitted over networks. An H.264/AVC image data coding system includes motion estimation, motion compensation, block coding, and variable-length coding, which together generate the P-frame bitstream, i.e., the compressed data. Of these, motion estimation consumes a large share of system resources such as memory space, computation time, and power. Typically, motion estimation may account for 76% of memory accesses, 77% of memory bandwidth, and 78% of computation time. It is therefore highly desirable to improve the efficiency of motion estimation and thereby the overall coding efficiency.
In motion estimation, a search window is selected from a reference frame according to a current block in a current frame. A block-matching algorithm (BMA) is then used to find the best matched block among all reference blocks in the search window, which yields the corresponding motion vector for subsequent variable-length coding. The BMA typically selects as the best matched block the reference block with the minimum sum of absolute differences (SAD), where the SAD is defined by the following equation.
$$\mathrm{SAD}(i,j) = \sum_{m=0}^{15} \sum_{n=0}^{15} \left| X(m,n) - Y(m+i,\, n+j) \right|$$
In the equation, X(m, n) represents the image data of the current block at coordinates (m, n), and Y(m+i, n+j) represents the image data of the reference block at coordinates (m+i, n+j). Here (i, j) is the candidate displacement, with i its horizontal component and j its vertical component; i and j are integers.
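A minimal full-search sketch of the SAD criterion, assuming samples held in nested Python lists indexed exactly as in the equation (first index paired with i, second with j) and offsets measured from the top-left corner of the search window rather than its center. The helper names `sad` and `full_search` are hypothetical; the block size is a parameter, so the 16×16 case of the equation is `n=16`.

```python
def sad(cur, win, i, j, n=16):
    """SAD(i, j) = sum over m, k in [0, n) of |X(m, k) - Y(m + i, k + j)|.

    cur is the current block X and win is the search window Y, both nested
    lists indexed [first][second] exactly as in the equation.
    """
    total = 0
    for m in range(n):
        for k in range(n):  # k plays the role of the equation's n
            total += abs(cur[m][k] - win[m + i][k + j])
    return total


def full_search(cur, win, n=16):
    """Evaluate SAD at every candidate offset and keep the minimum.

    Offsets (i, j) are measured from the window's top-left corner
    (an assumption of this sketch). Ties keep the first offset in scan order.
    Returns (min_sad, i, j).
    """
    best = None
    for i in range(len(win) - n + 1):
        for j in range(len(win[0]) - n + 1):
            s = sad(cur, win, i, j, n)
            if best is None or s < best[0]:
                best = (s, i, j)
    return best


# Usage with a small 2x2 block so the result is easy to check by hand:
window = [[4 * a + b for b in range(4)] for a in range(4)]  # 4x4 window, values 0..15
block = [row[2:4] for row in window[1:3]]                   # [[6, 7], [10, 11]]
print(full_search(block, window, n=2))                      # (0, 1, 2)
```

Because every sample in the toy window is distinct, the only zero-SAD position is the one the block was copied from, which makes the result easy to verify.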
FIG. 1 is a schematic diagram illustrating a conventional video coding hardware system. Referring to FIG. 1, a conventional video coding hardware system 1 includes an encoder 10 for searching for a best matched block in the search window of the reference frame. The encoder 10 loads data stored in an external memory 17 via an external bus 19 and a memory interface 16; this data is the data of the reference blocks in the search window. The encoder 10 includes an encoding engine 11, an internal memory 12, and a computation engine 13. The internal memory 12 stores the data loaded from the external memory 17. The computation engine 13 performs the computations that produce the SADs. The encoding engine 11 identifies the best matched block, i.e., the block with the minimum SAD among the SADs obtained by the computation engine 13.
Referring to FIG. 2, there is shown a schematic diagram illustrating the search window of the conventional BMA. As shown in FIG. 2, the search window 50 has a width of SRH+N−1, a height of SRV+N−1, a horizontal search range SRH, and a vertical search range SRV. A reference block 61 located at the center of the search window 50 is an N×N block. All of these values are measured in pixels, with SRH=2PH and SRV=2PV.
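To make the geometry concrete, the sketch below derives the window size and the cost of an exhaustive search, using SRH + N − 1 for the width and SRV + N − 1 for the height. The values N = 16 and PH = PV = 16 in the usage line are illustrative choices only, not taken from the source, and `window_geometry` is a hypothetical helper.

```python
def window_geometry(n, ph, pv):
    """Search-window dimensions from FIG. 2, all in pixels.

    Returns (width, height, candidate_positions, abs_diff_ops).
    """
    srh, srv = 2 * ph, 2 * pv              # SRH = 2PH, SRV = 2PV
    width = srh + n - 1                    # SRH + N - 1
    height = srv + n - 1                   # SRV + N - 1
    # An N x N block fits at (width - N + 1) x (height - N + 1) offsets = SRH x SRV.
    candidates = (width - n + 1) * (height - n + 1)
    # Each candidate costs N * N absolute differences in the SAD equation.
    ops = candidates * n * n
    return width, height, candidates, ops


print(window_geometry(16, 16, 16))  # (47, 47, 1024, 262144)
```

Under these assumed parameters, a full search of one window evaluates 1024 candidate positions and 262,144 absolute differences for a single current block.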
The H.264/AVC video coding standard also specifies a multiple reference frames scheme for motion estimation, which provides a standard operation for further compressing the image data.
Referring to FIG. 3, there is shown a conventional multiple reference frames scheme. As shown in FIG. 3, for a current block 30 of the current frame 20, the BMA is used to find a best matched block in each of a first search window 51, a second search window 52, a third search window 53, and a fourth search window 54, located in a first reference frame 41, a second reference frame 42, a third reference frame 43, and a fourth reference frame 44, respectively. The current block 30 is at time t, and the first, second, third, and fourth search windows are at times t−T, t−2T, t−3T, and t−4T, respectively, where T is the frame time interval, i.e., the time between two consecutive frames. For example, time t−4T precedes time t−3T by one frame time interval T.
Referring to FIG. 4, there is shown a flow chart illustrating the conventional multiple reference frames scheme. As shown in FIG. 4, the current block is first loaded at step S10. At step S12, the search window of a reference frame is loaded. At step S14, the best matched block in the search window is found according to the BMA. At step S16, if the best matched blocks in the search windows of all reference frames have not yet been obtained, the flow goes to step S18, where the search window of the next reference frame is loaded, and then returns to step S14. Otherwise, the flow ends at step S20.
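The conventional flow of FIG. 4 can be sketched as a loop over reference frames. This is a software model under stated assumptions, not the hardware of FIG. 1: Python lists stand in for the external memory 17 and the internal memory 12, and the hypothetical counter `pixels_loaded` tallies the data that steps S12 and S18 would move over the external bus 19. The function names are likewise hypothetical.

```python
def sad(cur, win, i, j, n):
    """Sum of absolute differences between block cur and win at offset (i, j)."""
    return sum(abs(cur[m][k] - win[m + i][k + j]) for m in range(n) for k in range(n))


def full_search(cur, win, n):
    """Exhaustive BMA over one window; returns (min_sad, i, j), first in scan order on ties."""
    return min(
        (sad(cur, win, i, j, n), i, j)
        for i in range(len(win) - n + 1)
        for j in range(len(win[0]) - n + 1)
    )


def multi_reference_me(current_block, external_windows, n):
    """Model of FIG. 4 for one current block (S10 is the current_block argument).

    external_windows holds one search window per reference frame, ordered
    t - T, t - 2T, ... . Returns (best match per reference frame, pixels loaded).
    """
    results = []
    pixels_loaded = 0
    for window in external_windows:
        # S12 (first frame) / S18 (later frames): copy the whole window into internal memory.
        internal = [row[:] for row in window]
        pixels_loaded += len(internal) * len(internal[0])
        # S14: find the best matched block in this window with the BMA.
        results.append(full_search(current_block, internal, n))
        # S16: the loop continues while reference frames remain unsearched.
    # S20: end.
    return results, pixels_loaded


windows = [[[4 * a + b + r for b in range(4)] for a in range(4)] for r in range(4)]
matches, loaded = multi_reference_me([[6, 7], [10, 11]], windows, n=2)
print(matches[:3], loaded)  # [(0, 1, 2), (0, 1, 1), (0, 1, 0)] 64
```

Every window is copied in full before it is searched, so the loaded-pixel count grows linearly with the number of reference frames regardless of where the matches lie.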
It should be noted that loading a search window at steps S12 and S18 means transferring all of its data from the external memory 17 into the internal memory 12. Step S14 then finds the best matched block according to the BMA. The flow shown in FIG. 4 thus completes the motion estimation of one current block, and the motion estimation of the entire frame is achieved by repeating these steps for every current block in the current frame. Unfortunately, steps S12 and S18 increase the bandwidth required for data transmission. In particular, to complete the motion estimation of a single current block, all of the first search window 51, the second search window 52, the third search window 53, and the fourth search window 54 must be loaded, so more data must be transmitted over the external bus 19, consuming more power. This seriously degrades the performance of electronic products, especially battery-powered ones.
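A back-of-the-envelope estimate of this traffic: each current block pulls one full window of (SRH + N − 1) × (SRV + N − 1) pixels per reference frame. The frame size (CIF, 352 × 288), 8-bit luma samples (one byte per pixel), PH = PV = 16, and the absence of any reuse between overlapping windows of neighbouring blocks are illustrative assumptions, and `external_traffic` is a hypothetical helper.

```python
def external_traffic(frame_w, frame_h, n, ph, pv, num_refs, bytes_per_pixel=1):
    """Bytes fetched over the external bus per frame when every search window is reloaded.

    Assumes no reuse of overlapping windows between neighbouring blocks and
    frame dimensions that are multiples of n.
    Returns (blocks_per_frame, bytes_per_block, bytes_per_frame).
    """
    blocks = (frame_w // n) * (frame_h // n)
    window_px = (2 * ph + n - 1) * (2 * pv + n - 1)   # (SRH + N - 1) x (SRV + N - 1)
    per_block = num_refs * window_px * bytes_per_pixel
    return blocks, per_block, blocks * per_block


# Hypothetical CIF luma frame, 16x16 blocks, PH = PV = 16, four reference frames as in FIG. 3.
print(external_traffic(352, 288, 16, 16, 16, 4))  # (396, 8836, 3499056)
```

Under these assumptions, about 3.5 MB crosses the bus per frame, roughly 34.5 times the 101,376 bytes of the frame itself, which illustrates how quickly the traffic of steps S12 and S18 grows with the number of reference frames.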
Therefore, a high-performance block-matching VLSI architecture with low memory bandwidth is highly desirable.