One widespread method of capturing and/or displaying images is pixel-based image capture and/or display. As electronic imaging resolution has increased along with the demand for “real time” video display, the need for quick and efficient processing of video image data has also grown.
The processing of video data is generally well known in the art. Typically, data for video is defined by a series of frame images of an X×Y pixel resolution. To facilitate efficient processing, pixel (pel) data for each frame is customarily handled in groups, or blocks. The size of the blocks can be varied. Typically, larger block sizes are better for noise immunity, but smaller block sizes are better for tracking fine detail and handling rotation.
For various types of video processing, such as video encoding, frame rate conversion, super-resolution, etc., it is desirable to perform motion estimation by finding where blocks of image data (or altered versions thereof) that appear in a first frame are located in a second frame. Conventionally, the image data of a search block located at a particular coordinate location in a first frame is searched for in a second frame within a search area that surrounds the search block's relative frame location, although the entire second frame could be searched.
For example, FIG. 1 illustrates a search block 12 at a particular location within a first frame 10. Typically, a search area 16 surrounding the search block's relative frame location in a second frame 14 is analyzed to find a corresponding block 18 that best matches the search block 12. The position of the corresponding block 18 as shown in FIG. 1 is by way of example; the actual location of the “best matching” block may be anywhere within the search area 16. The displacement of the corresponding block 18 from the relative frame location of the search block 12 indicates object movement within the image, and a motion vector is defined for the search block 12 based on that displacement.
Block searches for a search block 12 within a search area 16 are typically performed using a pixel based comparison of similarly sized blocks, such as a sum of the absolute value of differences (SAD) calculation that compares the pel values of the search block 12 with the values of the correspondingly positioned pels of each similarly sized block within the search area 16. Various other comparison methods are known in the art, such as, for example, the sum squared absolute difference (SSAD). However, the SAD metric is widely used due to its relative simplicity and hardware support such as amd_sad( ) extensions in OpenCL.
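By way of illustration only, the two comparison metrics named above can be sketched in Python as follows. The function names `sad` and `ssad` and the representation of a block as a list of rows of pel values are assumptions made for this example, and SSAD is interpreted here as the sum of squared differences.

```python
def _check_same_size(block_a, block_b):
    """Raise ValueError unless both blocks have identical dimensions."""
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same dimensions")


def sad(block_a, block_b):
    """Sum of absolute differences between correspondingly positioned pels."""
    _check_same_size(block_a, block_b)
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def ssad(block_a, block_b):
    """Sum of squared differences (assumed meaning of SSAD in this sketch)."""
    _check_same_size(block_a, block_b)
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )
```

A lower value of either metric indicates a closer match, and a value of zero indicates identical blocks. The squared form penalizes large single-pel differences more heavily than SAD does, at the cost of a multiplication per pel.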
Once a corresponding block 18 is selected based upon the SAD or other comparison metric calculation, a motion vector is assigned to the search block 12 reflecting the relative frame coordinate displacement between the search block 12 and the corresponding block 18. Where the corresponding block 18 in the second frame 14 is in the same relative location as the search block 12 in the first frame 10, a zero vector will result for that block indicating no relative motion of the portion of an image represented by the pels of the search block 12.
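The search and vector assignment described above can be sketched, purely as an illustrative example, as an exhaustive SAD search in Python. The function names, the `search_range` parameter (the number of whole pels searched in each direction around the search block's relative location), and the list-of-rows frame representation are assumptions of this sketch rather than features of any particular implementation.

```python
def extract_block(frame, top, left, height, width):
    """Return the height x width block whose top-left pel is at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def find_motion_vector(first, second, top, left, height, width, search_range):
    """Exhaustively search the second frame around (top, left).

    Returns ((dy, dx), best_sad), where (dy, dx) is the displacement of the
    best matching block from the search block's relative frame location.
    """
    search_block = extract_block(first, top, left, height, width)
    rows, cols = len(second), len(second[0])
    best_cost, best_vector = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that extend beyond the frame boundary.
            if y < 0 or x < 0 or y + height > rows or x + width > cols:
                continue
            cost = sad(search_block, extract_block(second, y, x, height, width))
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost
```

In this sketch, a block that has not moved between the two frames yields the zero vector `(0, 0)` with a SAD of zero, while a block that appears one row lower and two columns to the right in the second frame yields `(1, 2)`.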
It is common to partition the entire first frame into blocks and to assign a motion vector to each of the blocks using this type of searching process. Such processing is calculation intensive and becomes more calculation intensive and time sensitive as frame sizes, resolutions, and frame rates increase.
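To give a rough sense of scale, the following Python sketch counts the pel difference operations needed for an exhaustive whole-pel search of every block in a frame. The function name and all parameters are hypothetical, and the count is an upper bound that ignores candidate positions clipped at the frame edges as well as any partial blocks.

```python
def full_search_cost(frame_w, frame_h, block_w, block_h, search_range, fps):
    """Estimate the work of an exhaustive block search (upper bound).

    Returns (blocks, candidates, diffs_per_frame, diffs_per_second).
    """
    # Whole blocks only; leftover rows or columns are ignored in this sketch.
    blocks = (frame_w // block_w) * (frame_h // block_h)
    # Candidate positions within +/- search_range pels in each direction.
    candidates = (2 * search_range + 1) ** 2
    # Each candidate compares every pel of the block once.
    diffs_per_frame = blocks * candidates * block_w * block_h
    return blocks, candidates, diffs_per_frame, diffs_per_frame * fps


if __name__ == "__main__":
    # Assumed figures: 1920x1080 frames, 16x16 blocks, +/-16 pel search, 60 fps.
    print(full_search_cost(1920, 1080, 16, 16, 16, 60))
```

Under these assumed figures, the estimate is on the order of 2.2 billion pel differences per frame pair, which illustrates why the cost grows quickly with frame size, search range, and frame rate.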
Searching is commonly performed on a whole pel granularity basis to produce motion vectors with integer-pixel precision. For example, for X×Y sized frames, an M×N subset of the X×Y array of pel data of the first frame 10 can be selected as the search block 12, which is searched for within an I×J search area 16 that is a subset of the X×Y array of pel data of the second frame 14. The resulting motion vector indicates that the search block 12 is displaced a whole number of vertical and/or horizontal pixels from the determined corresponding block 18 in the second frame 14.
In some instances, it is desirable to obtain motion vectors having higher than integer precision. To achieve sub-pixel precision motion estimation, the frames may be up-sampled. For example, to achieve half-pixel resolution motion estimation for X×Y pixel frames, both the first frame 10 and the second frame 14 may be up-sampled in both the X and Y dimensions with a scaling ratio of two to produce pel data arrays that are 2X×2Y in size. To search an area equivalent to the I×J whole pixel search area 16 within the second frame 14 at the higher granularity, the search includes the half pixel values produced by the up-sampling; that is, it is performed over a 2I×2J array of pel values within the 2X×2Y pel value array produced from up-sampling the second frame 14. Such added complexity can significantly impact performance and/or add to the cost of a processing component by requiring greatly increased bandwidth and/or parallel processing capacity in order to meet timing requirements desired in processing sequential video frames.
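The up-sampling approach described above can be sketched in Python as follows. This is a minimal illustration only: the interpolation filter is not specified above, so bilinear interpolation with edge replication is assumed, and the function names and the `search_range` parameter (given in whole pels) are hypothetical.

```python
def upsample2x(frame):
    """Up-sample a frame by two in each dimension (assumed bilinear filter).

    Even output positions copy the original pels; odd positions hold the
    interpolated half-pel values. The last row and column are replicated.
    """
    rows, cols = len(frame), len(frame[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for y in range(2 * rows):
        y0 = y // 2
        y1 = min(y0 + (y % 2), rows - 1)
        for x in range(2 * cols):
            x0 = x // 2
            x1 = min(x0 + (x % 2), cols - 1)
            out[y][x] = (
                frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]
            ) / 4.0
    return out


def block(frame, top, left, height, width):
    """Return the height x width block whose top-left pel is at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def half_pel_motion_vector(first, second, top, left, h, w, search_range):
    """Search the up-sampled second frame in half-pel steps.

    Returns ((dy, dx), best_sad) with dy and dx expressed in whole pels,
    so a value of 0.5 denotes a half-pel displacement.
    """
    up1, up2 = upsample2x(first), upsample2x(second)
    ref = block(up1, 2 * top, 2 * left, 2 * h, 2 * w)
    best_cost, best_vec = None, None
    # Each step in the up-sampled array corresponds to half a pel.
    for dy in range(-2 * search_range, 2 * search_range + 1):
        for dx in range(-2 * search_range, 2 * search_range + 1):
            y, x = 2 * top + dy, 2 * left + dx
            if y < 0 or x < 0 or y + 2 * h > len(up2) or x + 2 * w > len(up2[0]):
                continue
            cost = sad(ref, block(up2, y, x, 2 * h, 2 * w))
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy / 2, dx / 2)
    return best_vec, best_cost
```

Compared with a whole-pel search over the same area, this sketch evaluates roughly four times as many candidate positions, each over a block containing four times as many pel values, which is one way to see the added bandwidth and processing demands noted above.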