1. Field of the Application
Generally, this application relates to the coding of video signals. More specifically, it relates to systems and methods for efficient, variable block size motion estimation used in video coding.
2. Description of the Related Art
The typical video display, such as a cathode ray tube (CRT) display or a liquid crystal display (LCD), provides a visible picture that is made up of a sequence of individual images, or frames, indexed by time. The sequence of frames is displayed, for example, at a rate of 30 frames per second. Because each frame includes a large amount of raw information, the sequenced video data is associated with a very high bandwidth requirement. This high bandwidth requirement typically necessitates efficient video compression coding to facilitate more efficient handling of the sequenced video data.
FIG. 1 illustrates the typical video sequence breakdown as is known in the art today. As shown in FIG. 1, a typical video sequence 100 includes a time indexed stream of individual frames 120, 170-190. A particular frame 120 is made up of many slices of data, for example, slice 121. Slice 121 can typically be partitioned into a set of macro-blocks 124-128, where each macro-block is typically 16 picture elements (or pixels) by 16 pixels. Thus, frame 120 can be thought of as a grid of macro-blocks. Representative macro-block 127 can be further sub-divided, for example, into two 16×8 sub-blocks, such as 127b, or into four 8×8 sub-blocks, such as 127a2. Frames can be broken down into many different sizes, shapes and configurations of macro-blocks and sub-blocks.
The typical video sequence is characterized by a high degree of correlation between successive frames. Given the highly correlated nature of consecutive frames, a very simple block-motion model can offer a reasonably good description of the video process. FIG. 2 illustrates a very simple block-motion model as is typically used in the art today. In this simple block-motion model, a current frame 211 includes two current macro-blocks 221, 231. Each current macro-block 221, 231 can be thought of as being shifted from a previous macro-block 220, 230 of a previous frame 210. The shift of each current macro-block 221, 231 can be indicated by a motion vector 250 associated with their respective previous macro-block 220, 230. Thus, each macro-block in a current frame can be viewed as arising from a shifted location of that same macro-block in the previous frame, where the amount of shift is designated by its motion vector.
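The block-motion model above can be illustrated with a minimal sketch. It assumes frames are stored as lists of pixel rows and that a motion vector (dy, dx) gives the displacement from the current block position to its predictor in the previous frame. The function name, the argument order, and the sign convention are hypothetical choices made for illustration only.

```python
def predict_block(prev_frame, top, left, mv, size=16):
    """Return the size x size block of prev_frame that the motion vector points to.

    Assumption: (top, left) is the upper, left-hand pixel of the current block and
    mv = (dy, dx) is the shift to the predictor block in the previous frame.
    No bounds checking is done; the shifted block must lie inside prev_frame.
    """
    dy, dx = mv
    rows = prev_frame[top + dy : top + dy + size]
    return [row[left + dx : left + dx + size] for row in rows]


# Small example: a 6x6 frame whose pixel value encodes its (row, column) position.
prev = [[10 * r + c for c in range(6)] for r in range(6)]
block = predict_block(prev, top=2, left=2, mv=(-1, 1), size=2)
```

Here `block` holds the pixels at rows 1 to 2 and columns 3 to 4 of the previous frame, that is, the 2×2 block one row up and one column to the right of the current block position.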
Commercial grade video compression systems known in the art today, including well-known industry standards such as International Telecommunication Union (ITU) H.261, H.263 and H.264, as well as International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) MPEG-1, MPEG-2 and MPEG-4, utilize the above properties associated with the video process to efficiently represent and compress video data. In recent years, advances have been made in how motion vector data is represented in the video bit-stream syntax, starting from H.261, where a single motion vector at full pixel resolution was allowed per macro-block, to H.264, where multiple motion vectors (as many as one per 4×4 sub-block, or 16 per macro-block) at quarter-pixel resolution and pointing to multiple reference frames are used.
In general, these video coding systems deploy a motion estimation module that searches for, and measures, the most likely motion vector for the current block (e.g., macro-block, sub-block, full frame, etc.) of data being analyzed. Typically, the motion estimation module is the most computation-intensive, and thus power-hungry, module of a particular video coding system. Once the motion vector is estimated by the motion estimation module, the motion vector is encoded to form the compressed bit-stream, along with any residue information between the current block of data and the predictor or reference block (i.e., the preceding block from the preceding frame) pointed to by the motion vector. Typically, the lower the amount of residue information, the better the motion estimation.
Accordingly, sophisticated motion estimation algorithms attempt to offer a video quality similar to that of performing an exhaustive pixel-by-pixel motion search, but with a much lower complexity than the exhaustive search. One conventionally popular, high-performance motion estimation algorithm is called the unrestricted center biased diamond search (UCBDS) algorithm. The UCBDS algorithm generally compares a 16×16 macro-block in the current frame to a selection of 16×16 macro-blocks in the reference (or preceding) frame until the ‘best’ motion vector is determined. The most common metric used to compare various motion vector positions is the sum of absolute differences (SAD) metric. However, other metrics are known in the art and at least some of them can also benefit from certain embodiments disclosed herein. As the name suggests, SAD is obtained by adding the absolute values of the differences between the current and reference pixel values over all of the pixels of the blocks under analysis. That is, for a 16×16 block SAD comparison between a current frame block and a reference frame block, the value of each of the 256 pixels in the current block is subtracted from the value of the corresponding pixel in the reference block. The absolute values of these 256 pixel value differences are then added together to yield the SAD value for that block comparison. As one might expect, the lower the SAD value, the better the motion vector position.
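The SAD computation described above can be sketched in a few lines. The sketch assumes each block is a list of equally sized pixel rows. The function name `sad` is a hypothetical label, not part of any standard.

```python
def sad(cur_block, ref_block):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(cur_block, ref_block)
        for c, r in zip(cur_row, ref_row)
    )


# 2x2 example: |1-2| + |2-2| + |3-0| + |4-8| = 1 + 0 + 3 + 4 = 8
example_sad = sad([[1, 2], [3, 4]], [[2, 2], [0, 8]])
```

For a 16×16 macro-block the same function sums 256 absolute differences. Two identical blocks yield a SAD of zero.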
Typically, the UCBDS algorithm selects a starting position in the current block, e.g., the upper, left-hand integer pixel, and then selects a corresponding seed position in the reference block with which to begin the SAD comparisons. The seed position in the reference block will be the starting location (i.e., the upper, left-hand pixel) for defining the reference block in the reference frame. The UCBDS algorithm compares the SAD value at the seed position to the SAD values at positions in a diamond pattern that is centered around the seed position, where the points of the diamond pattern are at locations that are plus or minus two (+/−2) pixels away from the seed position (i.e., for a total of nine SAD calculations). If the SAD value at the seed position is the best, then the UCBDS algorithm is complete for that block and the motion vector is determined. However, if one of the pixel locations on the diamond pattern results in the best SAD value, then that location is set as a new seed position and its SAD value is compared to the SAD values at locations defined by a new diamond pattern centered around that new seed position. This process continues until the new seed position has the best SAD value as compared to the SAD values at locations on the diamond pattern around it. On average, the UCBDS algorithm is known to finish searching and comparing after calculating the SAD values at about 18-20 integer pixel locations, resulting in a good motion vector for a particular block.
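The iterative loop described above can be sketched as follows. This is a simplified illustration of only the steps given in this description (a seed plus eight diamond points, re-centered until the seed is best), not a complete or authoritative UCBDS implementation. The names `DIAMOND`, `block_sad` and `diamond_search`, the (dy, dx) vector convention, and the choice to treat out-of-frame candidates as infinitely costly are all assumptions made for the sketch.

```python
# Eight points at a city-block distance of two pixels from the center (assumed layout).
DIAMOND = [(-2, 0), (-1, -1), (-1, 1), (0, -2), (0, 2), (1, -1), (1, 1), (2, 0)]


def block_sad(cur_frame, ref_frame, top, left, dy, dx, size):
    """SAD between the current block at (top, left) and the reference block shifted by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur_frame[top + y][left + x] - ref_frame[top + dy + y][left + dx + x])
    return total


def diamond_search(cur_frame, ref_frame, top, left, size=16, seed=(0, 0)):
    """Re-center the diamond on the best candidate until the center itself is best."""
    height, width = len(ref_frame), len(ref_frame[0])

    def cost(mv):
        dy, dx = mv
        inside = 0 <= top + dy <= height - size and 0 <= left + dx <= width - size
        # Assumption: candidates that fall outside the reference frame are never chosen.
        return block_sad(cur_frame, ref_frame, top, left, dy, dx, size) if inside else float("inf")

    best_mv, best_cost = seed, cost(seed)
    while True:
        candidates = [(best_mv[0] + oy, best_mv[1] + ox) for oy, ox in DIAMOND]
        cand_cost, cand_mv = min((cost(mv), mv) for mv in candidates)
        if cand_cost >= best_cost:  # the center is at least as good as every diamond point
            return best_mv, best_cost
        best_mv, best_cost = cand_mv, cand_cost


# Synthetic example: the current frame is the reference frame shifted by (3, 1).
ref = [[10 * y + x for x in range(12)] for y in range(12)]
cur = [[10 * (y + 3) + (x + 1) for x in range(12)] for y in range(12)]
found_mv, found_sad = diamond_search(cur, ref, top=4, left=4, size=4)
```

In this synthetic example the search moves from the seed (0, 0) to (2, 0) and then to (3, 1), where the SAD is zero. Because the cost must strictly decrease at every re-centering, the loop always terminates.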
Where the motion vectors are desired to be represented with half-pixel accuracy, a conventional strategy for determining a half-pixel motion vector consists of first determining a good integer pixel resolution motion vector (as above, with the UCBDS algorithm) and then searching and comparing the SAD values at the eight half-pixel locations around the good integer pixel motion vector location to find a good half-pixel motion vector location. Likewise, when motion vectors are desired to be represented with quarter-pixel accuracy, first the good integer pixel motion vector location is determined, which is followed by determining the good half-pixel motion vector location, which in turn is followed by searching and comparing the SAD values at the eight quarter-pixel locations around the good half-pixel motion vector location to find a good quarter-pixel motion vector location.
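The staged refinement described above can be sketched as follows, assuming motion vectors are expressed in quarter-pixel units (so an integer pixel step is 4 units, a half-pixel step is 2, and a quarter-pixel step is 1). The `cost` argument is a hypothetical callable that returns the SAD for a candidate vector; how sub-pixel reference values are interpolated is outside the scope of this sketch.

```python
# The eight neighbors surrounding a center position.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def refine(center, step, cost):
    """Return the cheapest of the center and its eight neighbors at the given step size."""
    candidates = [center] + [(center[0] + dy * step, center[1] + dx * step) for dy, dx in NEIGHBORS]
    # Note: min() re-evaluates the center; a real encoder would reuse its known SAD.
    return min(candidates, key=cost)


def subpel_search(int_mv, cost):
    """Refine an integer-pixel vector to half-pixel and then quarter-pixel accuracy."""
    start = (int_mv[0] * 4, int_mv[1] * 4)  # convert to quarter-pixel units
    half = refine(start, 2, cost)           # eight half-pixel neighbors
    return refine(half, 1, cost)            # eight quarter-pixel neighbors


# Toy cost with its minimum at (7, -3) in quarter-pixel units, i.e. (1.75, -0.75) pixels.
def toy_cost(mv):
    return abs(mv[0] - 7) + abs(mv[1] + 3)


best_quarter_mv = subpel_search((2, -1), toy_cost)
```

With the toy cost, the half-pixel stage keeps the integer position (8, -4) and the quarter-pixel stage then moves to (7, -3).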
As used herein, integer pixel and pixel (i.e., without qualification) refer to the actual, picture element location within a particular block (i.e., macro-block, sub-block, etc.) or frame, while sub-pixel (i.e., partial-pixel, half-pixel, quarter-pixel, and the like) refers to a fictitious picture element located a fraction of the way (i.e., one, two or three quarters of the way, and the like) between two integer pixel locations (i.e., horizontally, vertically and/or diagonally). A sub-pixel location is typically associated with a value given by a weighted averaging of the values of the nearby integer locations. For example, in the MPEG-4 video compression standard, the vertical half-pixel location between two vertically adjacent integer pixels with values A and B, respectively, is given by equation (1):

round((A+B)/2),  (1)

where “round” denotes a rounding operation. Of course, equation (1) is for illustrative purposes only and is in no way intended to limit the scope of this application.
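Equation (1) can be illustrated with integer arithmetic. The text does not specify how ties are rounded, so the sketch below assumes that halves round up, implemented as (A + B + 1) // 2. That rounding rule and the function name `half_pixel_value` are assumptions for illustration only.

```python
def half_pixel_value(a, b):
    """Half-pixel value between two adjacent integer pixels, per equation (1).

    Assumption: exact halves round up, e.g. (10 + 13) / 2 = 11.5 becomes 12.
    Python's built-in round() is avoided because it rounds halves to even.
    """
    return (a + b + 1) // 2


half_even_sum = half_pixel_value(10, 12)  # (10 + 12) / 2 = 11 exactly
half_odd_sum = half_pixel_value(10, 13)   # 11.5 rounds up to 12 under the assumption
```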
A naïve algorithm to determine quarter-pixel accuracy would exhaustively search all eight half-pixel locations followed by exhaustively searching all eight quarter-pixel locations before providing a good quarter-pixel motion vector location. This can result in as many as sixteen extra searches and SAD comparisons, on top of the original 18-20 integer pixel searches. Additionally, more computations are required to determine the values corresponding to all of the half- and quarter-pixel locations. Furthermore, when variable block size motion vectors and multiple reference frames are allowed, such a search must potentially be carried out for each variety of block size and reference frame used. In summary, the possibility of sub-pixel, variable block sized, motion vectors can significantly increase the complexity of the motion estimation module.
Therefore, what are needed are systems and methods for performing the computations associated with motion estimation for variable block size, sub-pixel motion vectors in a more efficient manner, for example, by using a common core of SAD computations that are capable of quickly calculating different block size motion vector resolutions, at least partially in parallel.