FIG. 1 is a block diagram of a typical video encoder 30 having a motion estimation engine 32. The motion estimation engine 32 encodes the incoming video signal 34 using intra-coded frames (I-Frames) 36 to generate one or more predictive-coded frames (P-Frames) 38. An I-Frame 36 is typically generated by compressing a single frame of the incoming video signal 34. The P-Frame 38 then provides additional compression for subsequent frames by making reference to the data in the previous frame instead of compressing an entire frame of data. For instance, a P-Frame 38 may include only data indicating how the pixel data has changed from the previous frame (Δ Pixels) and one or more motion vectors to identify the motion between frames.
In order to generate a P-Frame 38, the motion estimation engine 32 typically compares blocks of pixel data from the current frame 40 with blocks of data from a previously generated frame of data, referred to as the reference frame 42. For each block in the current frame 40, the motion estimation engine 32 attempts to find the best-fit pixel match among the blocks in the reference frame 42. In this way, the P-Frame 38 only needs to include the small pixel difference (Δ Pixels) between the matched blocks and a motion vector to identify where the block was located in the reference frame 42. An example of this process is further illustrated in FIGS. 2A-2C.
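By way of illustration only, the relationship between a matched block, its motion vector, and the Δ Pixels may be sketched as follows. The function names (get_block, encode_block, decode_block), the list-of-rows frame layout, and the sample pixel values are assumptions made for this sketch and do not appear in the figures; a practical encoder would also transform and entropy-code the differences.

```python
def get_block(frame, x, y, size):
    """Return the size x size block whose upper-left pixel is at column x, row y."""
    return [row[x:x + size] for row in frame[y:y + size]]


def encode_block(current_block, reference_frame, x, y, mv):
    """Represent a current block located at (x, y) as a motion vector plus
    per-pixel differences from the reference block the vector points to."""
    dx, dy = mv
    size = len(current_block)
    ref = get_block(reference_frame, x + dx, y + dy, size)
    delta = [[c - r for c, r in zip(cur_row, ref_row)]
             for cur_row, ref_row in zip(current_block, ref)]
    return mv, delta


def decode_block(reference_frame, x, y, mv, delta):
    """Rebuild the current block from the reference frame, the motion
    vector, and the pixel differences."""
    dx, dy = mv
    size = len(delta)
    ref = get_block(reference_frame, x + dx, y + dy, size)
    return [[r + d for r, d in zip(ref_row, delta_row)]
            for ref_row, delta_row in zip(ref, delta)]


# Hypothetical 4x4 reference frame and a 2x2 current block located at (1, 1).
reference_frame = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
current_block = [[51, 61], [90, 101]]

# Assume the best match was found one pixel to the left in the reference frame.
mv, delta = encode_block(current_block, reference_frame, 1, 1, (-1, 0))
rebuilt = decode_block(reference_frame, 1, 1, mv, delta)
```

In this sketch only the motion vector and the small difference values need to be carried for the block; the decoder recovers the original pixels by adding the differences back to the referenced block.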
FIG. 2A depicts an example block 50 within a current frame 52 of pixel data. Also shown in FIG. 2A is a predicted motion vector (PMV) 54 that provides an estimate of where the block 50 was likely located in the reference frame. As illustrated, a motion vector 54 typically points from a corner pixel of the current block 50 to a corner pixel of the reference block 56. Methods for calculating a predicted motion vector (PMV) 54 are known in the art and are beyond the scope of the instant application.
Based on the predicted motion vector (PMV) 54, a search area 60 is selected within the reference frame 62, as illustrated in FIG. 2B. As shown, the search area 60 may include all of the blocks surrounding the reference block 56 identified by the predicted motion vector (PMV) 54. The current block 50 is then compared with reference blocks at every pixel location within the search area 60 in order to identify the motion vector location within the search area 60 with the closest pixel match. This comparison is typically performed by calculating a sum of absolute differences (SAD) for each motion vector location within the search area 60, and selecting the motion vector location with the lowest SAD as the best match. It should be understood that other functions, such as motion vector cost, may also be used in this motion vector selection process.
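The integer-pixel search described above may be sketched, under stated assumptions, as an exhaustive SAD comparison. The names sad and full_search, the search_range parameter, and the synthetic frame are hypothetical; the sketch uses SAD alone and omits motion vector cost and any other selection functions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def full_search(current_block, reference_frame, center, search_range):
    """Compare the current block against every integer pixel location within
    +/- search_range of center (the location indicated by the predicted
    motion vector) and return the best location and its SAD."""
    size = len(current_block)
    height, width = len(reference_frame), len(reference_frame[0])
    cx, cy = center
    best_location, best_cost = None, None
    for y in range(cy - search_range, cy + search_range + 1):
        for x in range(cx - search_range, cx + search_range + 1):
            # Skip candidates that would extend past the frame boundary.
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            candidate = [row[x:x + size] for row in reference_frame[y:y + size]]
            cost = sad(current_block, candidate)
            if best_cost is None or cost < best_cost:
                best_location, best_cost = (x, y), cost
    return best_location, best_cost


# Hypothetical 8x8 reference frame in which every pixel value is distinct.
reference_frame = [[8 * y + x for x in range(8)] for y in range(8)]
# Take the current block directly from location (3, 2) so a perfect match exists.
current_block = [row[3:5] for row in reference_frame[2:4]]

location, cost = full_search(current_block, reference_frame,
                             center=(2, 2), search_range=2)
```

Here the predicted location (2, 2) is off by one pixel, and the search recovers the exact location (3, 2) with a SAD of zero.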
FIG. 2C shows an example of a reference block 63 that has been identified from the search area 60 as being the closest pixel match with the current block. The reference block is identified by a motion vector 65 that points to the integer pixel location 64 in the upper left corner of the reference block 63. The identified reference block 63 includes an array of integer pixels (e.g., 4×4, 16×16, etc.) that most closely matches the array of pixels in the current block.
A more precise match to the current block can be obtained by performing a fractional pixel expansion around the integer pixels in the reference block 63 and then comparing the resultant fractional blocks with the current block to identify the closest match. Fractional movement of the integer pixels is performed by shifting the entire block of integer pixels up, down, left, and right in fractional increments in order to find a better match with the current block than the integer pixels provide. FIG. 3 illustrates an example of a fractional pixel expansion 300 of an integer pixel 301. In this example, the integer pixel 301 has been expanded to its quarter-pixel 302 and half-pixel 303 locations. As shown, this results in forty-eight half and quarter pixels 302, 303 for each integer pixel (the number may vary depending on the video standard). By performing this fractional pixel expansion on each integer pixel in the identified reference block 63, forty-eight fractional blocks result, each of which may be compared with the current block in order to define a more precise motion vector. This process is commonly referred to as fractional motion estimation.
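The count of forty-eight fractional locations can be reproduced with a short sketch, assuming that the expansion covers quarter-pixel offsets from -3/4 to +3/4 in each dimension. That assumption is consistent with the count given above but is not taken from FIG. 3 itself, and the function name quarter_pel_offsets is hypothetical.

```python
from fractions import Fraction


def quarter_pel_offsets():
    """Enumerate (dx, dy) offsets in quarter-pixel steps from -3/4 to +3/4."""
    steps = [Fraction(n, 4) for n in range(-3, 4)]  # seven steps per dimension
    return [(dx, dy) for dy in steps for dx in steps]


offsets = quarter_pel_offsets()
# Every offset other than (0, 0) is a fractional location.
fractional = [offset for offset in offsets if offset != (0, 0)]
```

Under this assumption the 7 x 7 grid yields forty-nine candidate locations, of which one is the integer pixel itself and forty-eight are fractional.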
Typically, fractional motion estimation is performed by processing one integer pixel location in the reference block at a time to generate the fractional pixel data and to accumulate partial SAD values. After all of the fractional pixel locations have been processed, the resulting forty-nine (48 fractional and one integer) SADs (or fewer, depending on the video standard) are compared and the fractional motion vector with the lowest SAD is selected. A disadvantage of this approach is that it results in many of the fractional pixel expansions being performed multiple times. For example, a (neg x, pos y) expansion for a particular integer pixel will include most of the same fractional pixels as a (neg x, neg y) expansion of the integer pixel below it, thus wasting clock cycles and memory accesses.
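The per-pixel approach and its redundancy may be sketched as follows. This is an illustration under stated assumptions rather than a description of any particular encoder: bilinear interpolation is substituted for the longer interpolation filters that video standards typically specify, the reference block is assumed to lie at least one pixel inside the frame boundary, and the names bilinear and fractional_sads are hypothetical. A counter records how often each fractional sample position is generated.

```python
import math
from collections import Counter
from fractions import Fraction

QUARTER_STEPS = [Fraction(n, 4) for n in range(-3, 4)]  # -3/4 ... +3/4


def bilinear(frame, x, y):
    """Sample the frame at fractional (x, y) using bilinear interpolation
    (a stand-in for a standard-specific interpolation filter)."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    x1 = min(x0 + 1, len(frame[0]) - 1)
    y1 = min(y0 + 1, len(frame) - 1)
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def fractional_sads(current_block, reference_frame, x, y):
    """Process one integer pixel of the reference block at a time, expanding
    it to all 49 locations and accumulating a partial SAD per offset."""
    size = len(current_block)
    sads = {(dx, dy): Fraction(0) for dy in QUARTER_STEPS for dx in QUARTER_STEPS}
    samples = Counter()  # how many times each sample position is generated
    for j in range(size):
        for i in range(size):
            for (dx, dy) in sads:
                pos = (x + i + dx, y + j + dy)
                samples[pos] += 1
                value = bilinear(reference_frame, pos[0], pos[1])
                sads[(dx, dy)] += abs(current_block[j][i] - value)
    return sads, samples


# Hypothetical 8x8 reference frame and a 4x4 current block that is the
# reference block at (2, 2) shifted by (+1/4, -1/2) pixel.
reference_frame = [[x * x + 3 * y * y for x in range(8)] for y in range(8)]
true_offset = (Fraction(1, 4), Fraction(-1, 2))
current_block = [[bilinear(reference_frame, 2 + i + true_offset[0],
                           2 + j + true_offset[1])
                  for i in range(4)] for j in range(4)]

sads, samples = fractional_sads(current_block, reference_frame, 2, 2)
best = min(sads, key=sads.get)    # offset with the lowest accumulated SAD
total = sum(samples.values())     # samples generated
unique = len(samples)             # distinct sample positions
```

For this 4×4 example the sketch generates 784 fractional samples at only 361 distinct positions, which illustrates the repeated expansions noted above.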