FIG. 1 is a block diagram of a typical video encoder 30 having a motion estimation engine 32. The motion estimation engine 32 encodes the incoming video signal 34 by using intra-coded frames (I-Frames) 36 to generate one or more predictive-coded frames (P-Frames) 38. An I-Frame 36 is typically generated by compressing a single frame of the incoming video signal 34. The P-Frame 38 then provides more compression for subsequent frames by making reference to the data in the previous frame instead of compressing an entire frame of data. For instance, a P-Frame 38 may only include data indicating how the pixel data has changed from the previous frame (Δ Pixels) and one or more motion vectors to identify the motion between frames.
In order to generate a P-Frame 38, the motion estimation engine 32 typically compares 16×16 macroblocks of pixel data from the current frame 40 with 16×16 macroblocks of data from a previously generated frame of data, referred to as the reference frame 42. The motion estimation engine 32 attempts to find the best fit pixel match between each macroblock in the current frame 40 and each macroblock in the reference frame 42. In this way, the P-Frame only needs to include the small pixel difference (Δ Pixels) between the matched macroblocks and a motion vector to identify where the macroblock was located in the reference frame 42. An example of this process is further illustrated in FIGS. 2A and 2B.
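By way of illustration only, the relationship between a matched pair of macroblocks, the pixel difference (Δ Pixels) and the motion vector may be sketched as follows. The function names `residual_block` and `reconstruct_block`, the representation of a frame as a list of pixel rows, and the (dy, dx) motion vector convention are assumptions made for this sketch and do not describe any particular encoder.

```python
def residual_block(current, reference, top, left, mv, size=16):
    """Delta pixels between the current macroblock at (top, left) and the
    reference macroblock displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    return [
        [current[top + r][left + c] - reference[top + dy + r][left + dx + c]
         for c in range(size)]
        for r in range(size)
    ]


def reconstruct_block(reference, top, left, mv, residual):
    """Rebuild the current macroblock from the reference frame, the motion
    vector, and the delta pixels (the inverse of residual_block)."""
    dy, dx = mv
    size = len(residual)
    return [
        [reference[top + dy + r][left + dx + c] + residual[r][c]
         for c in range(size)]
        for r in range(size)
    ]
```

When the match is good, the values returned by `residual_block` are small, which is why a P-Frame built from them may be compressed more than a complete frame of pixel data.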
FIG. 2A depicts an example macroblock 50 within a current frame 52 of pixel data. Also shown in FIG. 2A is a predicted motion vector (PMV) 54 that provides an estimate of where the macroblock 50 was likely located in the reference frame. As illustrated, a motion vector 54 typically points from a corner pixel of the current macroblock 50 to a corner pixel of the reference macroblock 56. Methods for calculating a predicted motion vector (PMV) 54 are known in the art and are beyond the scope of the instant application.
Based on the predicted motion vector (PMV) 54, a search area 60 is selected within the reference frame 62, as illustrated in FIG. 2B. As shown, the search area 60 may include all of the macroblocks surrounding the reference macroblock 56 identified by the predicted motion vector (PMV) 54. The current macroblock 50 is then compared with reference macroblocks at every pixel location within the search area 60 in order to identify the motion vector location within the search area 60 with the closest pixel match. This comparison is typically performed by calculating a sum of absolute differences (SAD) for each motion vector location within the search area 60, and selecting the motion vector location with the lowest SAD as the best match. It should be understood that other factors, such as motion vector cost, may also be used in this selection process.
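The SAD comparison described above may be sketched as follows. This is a minimal illustration and not a definitive implementation: the helper names `sad` and `select_best` are hypothetical, and the optional motion vector cost (a weight multiplied by the distance of a candidate from the predicted motion vector) is only one assumed way in which such a cost could be included in the selection.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def select_best(candidates, pmv=(0, 0), mv_weight=0):
    """Pick the motion vector with the lowest cost.

    candidates: iterable of ((dy, dx), sad_value) pairs.
    The cost is the SAD plus an assumed, optional penalty of mv_weight times
    the L1 distance between the candidate and the predicted motion vector.
    """
    def cost(item):
        (dy, dx), sad_value = item
        return sad_value + mv_weight * (abs(dy - pmv[0]) + abs(dx - pmv[1]))

    return min(candidates, key=cost)[0]
```

With `mv_weight` left at zero, the selection reduces to choosing the motion vector location with the lowest SAD.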
The calculations performed by a typical motion estimation engine to identify the best fit pixel match between a current macroblock and a search area in a reference frame are often among the most clock cycle, resource and power consuming processes performed by a video encoder. For example, in the case of 16×16 macroblocks, 256 pixel differences need to be calculated to determine the SAD for every motion vector within the search area. The system resources required to perform these calculations may thus be substantially affected by the way in which this data is loaded into local memory and processed by the motion estimation engine.
FIGS. 3 and 4 illustrate two prior art methods for processing the pixel data from a search area to identify the best fit pixel match with a current macroblock. In these examples, each pixel in the search area (illustrated by white circles) represents a potential motion vector. For each potential motion vector, a SAD is calculated between the current macroblock and the reference macroblock starting at the pixel location identified by the potential motion vector. The arrows in FIGS. 3 and 4 illustrate example scan patterns showing how the reference macroblocks are accessed from memory and processed by a typical motion estimation engine.
With reference first to FIG. 3, this example shows the straightforward, exhaustive approach to processing macroblocks of pixel data in a search area 70. A typical search starts with the potential motion vector 72 in the top left corner of the search area, scans horizontally across each row (or vertically down each column), and then moves to the next row (or column) and repeats the process. At each potential motion vector within the search area, the motion estimation engine will typically read a 16×16 macroblock of reference pixel data from a local cache, calculate the SAD, compare the SAD with a running minimum to track the best fit pixel match, and then move on to the next potential motion vector. This approach is simple, but requires a full macroblock of reference pixel data to be accessed from memory for every potential motion vector.
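The scan pattern of FIG. 3 may be sketched as follows, with a counter added to show how many reference pixels are fetched. The function name `raster_scan_search`, the list-of-rows frame representation, and the description of the search area by the top left candidate location and the number of candidate rows and columns are assumptions made for this illustration.

```python
def raster_scan_search(current_mb, reference, area_top, area_left, area_h, area_w):
    """Row-by-row search over every candidate location in the search area.

    Returns ((best_sad, y, x), pixels_read), where pixels_read counts the
    reference pixels fetched; a full macroblock is fetched per candidate.
    """
    n = len(current_mb)
    best = None
    pixels_read = 0
    for y in range(area_top, area_top + area_h):
        for x in range(area_left, area_left + area_w):
            # Fetch an n x n reference macroblock for this candidate.
            block = [row[x:x + n] for row in reference[y:y + n]]
            pixels_read += n * n
            cost = sum(
                abs(a - b)
                for cur_row, ref_row in zip(current_mb, block)
                for a, b in zip(cur_row, ref_row)
            )
            # Track the running minimum SAD and its location.
            if best is None or cost < best[0]:
                best = (cost, y, x)
    return best, pixels_read
```

The counter grows by one full macroblock for every candidate, which reflects the memory access cost noted above.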
FIG. 4 illustrates another example search pattern that is somewhat more efficient than the pattern shown in FIG. 3. In this example, the motion estimation engine utilizes a shift register to store enough reference pixel data to process multiple motion vectors from a single stride of data once the shift register is full. This approach reduces the number of times that the memory needs to be accessed. In the illustrated example, the shift register is wide enough to process four macroblocks of reference pixel data. For instance, in the case of 16×16 macroblocks, four horizontally adjacent reference macroblocks span 16 + 3 = 19 pixel columns, so a 19×16 shift register would provide sufficient storage to process four reference macroblocks before a new stride of data is needed from the reference data cache.
Using the scan pattern illustrated in FIG. 4, the reference pixel data needed to process the reference macroblocks for the first four motion vectors 80 in the top left corner of the search area 82 is initially loaded into the shift register of the motion estimation engine. The motion estimation engine then calculates the SAD for each of these four initial reference macroblocks, and compares each SAD with a minimum to track the best fit pixel match. A new stride of data is then loaded into the shift register to process the first four motion vectors in the next row of the search area 82. This process is repeated until the last row 84 in the search area 82 is processed, after which the scan pattern returns to the top row of the search area 82 to process another column of four motion vectors. This scan pattern is repeated until SADs have been calculated for each potential motion vector in the search area 82.
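The scan pattern of FIG. 4 may be sketched as follows. This is a simplified software model only: the shift register is modeled with a `collections.deque` holding one stride of reference pixels per entry, and the name `strip_scan_search`, the `strip` parameter, and the pixel read counter are assumptions introduced for the illustration.

```python
from collections import deque


def strip_scan_search(current_mb, reference, area_top, area_left,
                      area_h, area_w, strip=4):
    """Search the area in vertical strips of `strip` candidates per row.

    Returns ((best_sad, y, x), pixels_read), where pixels_read counts the
    reference pixels loaded into the modeled shift register.
    """
    n = len(current_mb)
    best = None
    pixels_read = 0
    for x0 in range(area_left, area_left + area_w, strip):
        k = min(strip, area_left + area_w - x0)   # candidates in this strip
        width = n + k - 1                         # e.g. 16 + 4 - 1 = 19
        window = deque(maxlen=n)                  # shift register: n strides
        # Refill the register at the top of the strip (first n - 1 strides).
        for y in range(area_top, area_top + n - 1):
            window.append(reference[y][x0:x0 + width])
            pixels_read += width
        for y in range(area_top, area_top + area_h):
            # One new stride per row; the oldest stride is shifted out.
            window.append(reference[y + n - 1][x0:x0 + width])
            pixels_read += width
            for dx in range(k):
                cost = sum(
                    abs(current_mb[r][c] - window[r][dx + c])
                    for r in range(n)
                    for c in range(n)
                )
                if best is None or cost < best[0]:
                    best = (cost, y, x0 + dx)
    return best, pixels_read
```

In this model, only one stride is loaded per row once the register is full, and the register is refilled from the cache each time the scan returns to the top of the search area for the next strip.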
With the scan pattern shown in FIG. 4, the reference data cache only needs to be accessed to add a single stride of data to the shift register when the scan pattern shifts from one row to the next and to completely refill the shift register when the scan pattern moves from the bottom to the top of the search area. It will be appreciated that this approach will significantly reduce the number of memory accesses compared to the scan pattern of FIG. 3. However, the scan pattern shown in FIG. 4 still requires a high percentage of the reference pixels to be read from memory multiple times. Consequently, it is desirable to provide an efficient scan pattern that would reduce the number of memory accesses needed to process all of the potential motion vectors in a search area.
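The difference in memory traffic between the two scan patterns may be estimated with the following sketch. The pixel counting model (one full macroblock fetched per candidate for FIG. 3; one register refill per strip plus one stride per subsequent row for FIG. 4) and the function names are assumptions made for this illustration, as is the 32×32 candidate search area used in the note that follows.

```python
def raster_reads(area_w, area_h, n=16):
    """Pixels fetched when a full n x n macroblock is read per candidate."""
    return area_w * area_h * n * n


def strip_reads(area_w, area_h, n=16, strip=4):
    """Pixels fetched when the area is scanned in vertical strips using a
    shift register that holds n strides of (n + strip - 1) pixels each."""
    full, rem = divmod(area_w, strip)
    widths = [n + strip - 1] * full + ([n + rem - 1] if rem else [])
    # Each strip loads (n - 1) strides to refill, then one stride per row.
    return sum((area_h + n - 1) * w for w in widths)


def unique_pixels(area_w, area_h, n=16):
    """Distinct reference pixels covered by all candidates in the area."""
    return (area_w + n - 1) * (area_h + n - 1)
```

Under these assumptions, a search area of 32×32 candidate motion vectors with 16×16 macroblocks gives 262,144 pixel reads for the pattern of FIG. 3 and 7,144 for the pattern of FIG. 4, against 2,209 distinct reference pixels, so that each reference pixel is still read about 3.2 times on average with the pattern of FIG. 4.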