Motion estimation is fundamental to a variety of video compression technologies. For example, video technologies such as the Moving Picture Experts Group (MPEG) standards MPEG-1, MPEG-2, and MPEG-4, and H.264/MPEG-4 advanced video coding (AVC) (hereinafter referred to as H.264, the standard of which is expressly incorporated by reference herein) achieve compression through motion estimation and other techniques. A video sequence includes a series of image frames. In the series of frames, a current frame may be predicted from previously encoded frames known as reference frames. This is sometimes referred to as inter-frame prediction.
Patterns corresponding to objects and background tend to “move” within the video frames to form corresponding objects or background from one video frame to the next. An object in the current frame may generally correspond to the same object in the reference frame, but may be in a different location. A video frame is usually divided into blocks or macroblocks. The size of the macroblock is typically 16×16 pixels, but blocks as small as 4×4 pixels, for example, can be used for motion estimation, according to the various standards. Each macroblock (or smaller block within each macroblock) in the current frame is compared to regions of the same size, which can also be referred to as macroblocks, in the reference frame to locate the best matching macroblock. In other words, video encoders use motion estimation to search the previously encoded frames to find the area that best matches the macroblock of the current frame that is currently being coded.
The matching of one macroblock with another is based on the output of a cost function. The macroblock that results in the least cost is the one that matches the closest to the current macroblock. Some examples of cost functions include a sum of absolute differences (SAD), a mean squared error (MSE), and a sum of absolute transformed differences (SATD), among others. To minimize computational costs, motion estimation may be performed in a predefined search window within a frame. However, larger motions within the video require larger search windows, and hence, lead to higher computational costs.
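As an illustration of two of the cost functions named above, the following is a minimal sketch of SAD and MSE. It assumes blocks are represented as equal-sized lists of rows of integer pixel values; the function names and the sample values are hypothetical and chosen only for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def mse(block_a, block_b):
    """Mean squared error between two equal-sized blocks."""
    pixel_count = sum(len(row) for row in block_a)
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )
    return squared_error / pixel_count


# Hypothetical 2x2 blocks (real macroblocks are typically 16x16).
current = [[10, 12], [14, 16]]
candidate = [[11, 12], [10, 20]]

print(sad(current, candidate))  # 9
print(mse(current, candidate))  # 8.25
```

A lower value from either function indicates a closer match; an identical block yields a cost of zero.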
Motion estimation is an important function of any hybrid video encoder, and can affect many aspects of the encoder, including cost, size, computational intensity, and compression ratio. The H.264 standard allows video encoders to search multiple previous frames using a search window size of +/−512 quarter pixels, for example, in each direction for high definition (HD) video sequences. This amounts to more than one million search locations per reference frame. Typically, video encoders perform motion estimation on full pixels first and then refine the best motion vector by searching the neighboring half or quarter pixels. This reduces the number of search locations to about 65,000. Even with only 65,000 search locations to evaluate, motion estimation is difficult and computationally expensive to implement.
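The figures quoted above can be checked with a short calculation. This sketch assumes a square window that includes the zero displacement and both end points in each direction, and that 512 quarter pixels correspond to 128 full pixels; the helper name is hypothetical.

```python
def search_locations(radius):
    """Count candidate positions in a square window of +/- radius steps."""
    side = 2 * radius + 1  # radius steps each way, plus the zero offset
    return side * side


quarter_pel = search_locations(512)    # +/-512 quarter-pixel steps
full_pel = search_locations(512 // 4)  # +/-128 full-pixel steps

print(quarter_pel)  # 1050625
print(full_pel)     # 66049
```

Under these assumptions the quarter-pixel window holds slightly more than one million positions, and the full-pixel window holds roughly 65,000, before any half or quarter pixel refinement is added.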
During an exhaustive or full search, the cost function may be applied to the macroblock for each possible search location within the search window, and a motion vector may be associated with each cost and search location. Motion vectors representing the displacement of best matching macroblocks in the reference frame with respect to the corresponding macroblocks in the current frame may be determined. The motion vectors are used to compress video sequences by encoding the changes to an image from one frame to the next rather than encoding the image or frame. The encoders can use the motion vectors to identify the selected region in one of the previous frames and the error between the selected region and the actual macroblock in the current frame.
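The exhaustive search described above can be sketched as follows. This is an illustrative sketch only, assuming frames stored as lists of rows of integers, SAD as the cost function, and full-pixel displacements; the function names, parameters, and synthetic frames are hypothetical.

```python
def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Evaluate every displacement within +/- radius and keep the cheapest."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad_at(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 8x8 frames: the current frame equals the reference frame
# sampled one pixel to the right and two pixels up.
ref = [[7 * x + 13 * y + 50 for x in range(8)] for y in range(8)]
cur = [[7 * (x + 1) + 13 * (y - 2) + 50 for x in range(8)] for y in range(8)]

mv, cost = full_search(cur, ref, bx=2, by=2, n=4, radius=2)
print(mv, cost)  # (1, -2) 0
```

The two nested displacement loops make the cost grow with the square of the window radius, which is why larger windows quickly become expensive.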
FIG. 1 is a schematic diagram showing video frames 100 (i.e., video frames 1 and 2) including a search window 105 and macroblocks 110/115 according to a conventional motion estimation approach. A current macroblock 110 in a current video frame 2 is compared to a reference macroblock 115 in a reference video frame 1. After searching each possible location within the search window 105 for the best matching macroblock 115 in the reference frame, a displacement between the current macroblock 110 and the best matching macroblock 115 may be determined, which is represented by a motion vector 120. This determination can be made by exhaustively comparing the costs of the possible motion vectors associated with each of the search locations in the search window. Consequently, after the results of the cost functions are compared, the current macroblock 110 is encoded as the motion vector 120 in view of the best matching macroblock 115 in the reference frame. The motion vector 120 includes a horizontal component (Δx) and a vertical component (Δy). The position of the current macroblock 110 has in essence shifted from the position of the reference macroblock 115. Performing a full search of the search window 105 is very computationally expensive, either in resources or time or both.
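To illustrate how a block can be represented by a motion vector together with the remaining error, the sketch below forms a prediction from the reference frame, computes the residual, and rebuilds the block. It is a simplified illustration: the names and sample values are hypothetical, and the transform, quantization, and entropy coding steps that real encoders apply to the residual are omitted.

```python
def predict_block(ref, bx, by, mv, n):
    """Fetch the n x n reference block displaced by mv from (bx, by)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


def residual(cur_block, pred):
    """Per-pixel error between the current block and its prediction."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur_block, pred)]


def reconstruct(pred, res):
    """Rebuild the current block from the prediction and the residual."""
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, res)]


# Hypothetical 4x4 reference frame and 2x2 current block located at (1, 1).
ref = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
cur_block = [[12, 14], [22, 23]]
mv = (1, 0)  # (delta x, delta y) assumed to come from a prior search

pred = predict_block(ref, bx=1, by=1, mv=mv, n=2)
res = residual(cur_block, pred)
print(pred)  # [[12, 13], [22, 23]]
print(res)   # [[0, 1], [0, 0]]
print(reconstruct(pred, res) == cur_block)  # True
```

A good match leaves a residual that is mostly zeros, which is what makes the motion vector plus residual cheaper to encode than the block itself.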
FIG. 2 is a schematic diagram showing a conventional approach to searching within search window 105 that is an alternative to the full search. A number of positions (search locations) around a center 200 are tested and the position of minimum distortion, or in other words, the position having the least cost, becomes the center of the next stage. For example, eight search locations around the center 200, and the center position 200 itself, are initially tested, indicated by the circled numbers 1. The search location yielding the lowest cost of the nine tested search locations is selected as the next center 205. Next, eight new locations around the new center 205, and the new center 205 itself, are tested, indicated by the circled numbers 2. The area covered by the second set of search locations is smaller than that covered by the first set because each new set is a refinement of the previous search. The search location yielding the lowest cost is selected as the next center 210. The method can continue until the search converges on a single search location. Such a search is often referred to as a “square” search or a “three step search” as described, for example, in U.S. Pat. Publication 2005/0207494. A variation on this type of search includes a “diamond” search, as described for example in U.S. Pat. Publication 2006/0203912.
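The staged refinement described above can be sketched as follows. This is a generic illustration rather than the method of either cited publication. It assumes a caller-supplied cost function that maps a displacement (dx, dy) to a number, an initial step of 4, and a step that halves after each stage; all names and the toy cost are hypothetical.

```python
def step_search(cost, start=(0, 0), step=4):
    """Test the center and its eight neighbors at the current step,
    recenter on the cheapest position, halve the step, and repeat."""
    center = start
    best = cost(center)
    while step >= 1:
        best_pos = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                pos = (center[0] + dx, center[1] + dy)
                c = cost(pos)
                if c < best:  # strict comparison keeps the center on ties
                    best, best_pos = c, pos
        center = best_pos
        step //= 2
    return center, best


def toy_cost(pos):
    """Stand-in cost with a single minimum at displacement (5, -3)."""
    return abs(pos[0] - 5) + abs(pos[1] + 3)


print(step_search(toy_cost))  # ((5, -3), 0)
```

With steps of 4, 2, and 1, the search can reach any displacement within +/−7 while testing at most 25 distinct positions, compared with 225 for a full search of the same range. The toy cost has a single minimum, so the search converges on it; with real image data the cost surface can have several local minima, and the search may settle on one that is not the global best.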
FIG. 3 is a schematic diagram showing another conventional approach to searching within search window 105. The search begins from a center location 300 in the search window 105 and expands in a spiral pattern. The expansion can increment gradually, thereby providing an exhaustive search of the search locations in search window 105. Alternatively, the spiral search can have increasing distances from the center location 300. Such a search is described, for example, in U.S. Pat. No. 6,418,166 to Shou-jen.
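One way to generate the gradually expanding spiral ordering is sketched below. It is an assumption-laden illustration and not the method of the cited patent: it walks a square spiral outward from the center, with y increasing downward, and covers every offset within the given radius exactly once. The function name is hypothetical.

```python
def spiral_offsets(radius):
    """Return (dx, dy) offsets in square-spiral order, center first,
    covering every position within +/- radius exactly once."""
    total = (2 * radius + 1) ** 2
    x = y = 0
    dx, dy = 1, 0  # start by moving right
    offsets = [(0, 0)]
    run = 1        # current run length; grows after every two turns
    while len(offsets) < total:
        for _ in range(2):
            for _ in range(run):
                if len(offsets) == total:
                    return offsets
                x, y = x + dx, y + dy
                offsets.append((x, y))
            dx, dy = -dy, dx  # turn 90 degrees
        run += 1
    return offsets


print(spiral_offsets(1))
# [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
```

Evaluating the cost function at these offsets in order visits small displacements first. A variant with increasing distances from the center could, for example, scale or skip offsets as the spiral grows, at the cost of no longer testing every location.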
A full or exhaustive search of all possible search locations, although accurate in finding the best locations of the macroblocks and providing good video quality, is nevertheless time-consuming and computationally expensive. Other conventional searches, such as the square or spiral searches discussed above, may evaluate fewer search locations than a full search because only part of the search window is examined. But the resulting video quality associated with these searches is degraded. Further, video sequences having larger motions require larger search windows; and the larger the search window, the more expensive the process of motion estimation becomes. Embodiments of the invention address these and other limitations in the prior art.