A video typically comprises a number of still images (“frames”) presented in sequence, one after another. In digital videos, each frame may be digitally encoded as a series of bits (or bytes). However, resource limitations (e.g. storage space and/or network bandwidth) often place a cap on the total number of bits that can be used to represent each frame, which can effectively limit the overall quality of the video. Thus, one of the main goals of video encoding has been to encode the video in a way which meets a target bitrate while maximizing video quality.
One way of accomplishing this is to encode only the “differences” between each of the frames. For example, “motion” is often isolated to certain regions of a frame at any given time. In other words, not every pixel of a given frame will be changed in the next frame. Thus, rather than re-encoding every pixel of every frame, which would require a very high bitrate, only the pixel differences between consecutive frames are encoded.
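The idea of encoding only the pixels that change can be illustrated with a minimal sketch. The function name `changed_pixels`, the list-of-rows frame layout, and the sample pixel values are assumptions made for illustration only and are not drawn from any particular codec.

```python
def changed_pixels(prev_frame, curr_frame):
    """Return (row, col, delta) for every pixel that differs between two frames."""
    diffs = []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if p != q:
                diffs.append((r, c, q - p))
    return diffs


# Hypothetical 3x4 grayscale frames: "motion" is confined to two pixels.
prev_frame = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
curr_frame = [
    [10, 10, 10, 10],
    [10, 50, 60, 10],
    [10, 10, 10, 10],
]

diffs = changed_pixels(prev_frame, curr_frame)
total_pixels = len(curr_frame) * len(curr_frame[0])
print(f"{len(diffs)} of {total_pixels} pixels changed: {diffs}")
```

In this toy case only two of the twelve pixels need to be described for the second frame, which is the saving the differencing approach relies on.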
FIG. 1 illustrates a method of motion estimation. The method of FIG. 1 comprises frames 110 and 120, a frame element 122, and a macroblock 123. Frame 120 corresponds to the frame currently being encoded, while frame 110 corresponds to the frame that was encoded immediately before it. The macroblock 123 comprises a plurality of adjacent pixels within frame 120, on which motion estimation is currently being performed. Motion estimation is the process of finding the “best match” from frame 110 for the macroblock 123 in the frame 120. The frame 110 is searched at several search points within a search region 111, and the pixels at each search point are compared with the pixels in the macroblock 123. Search points are represented with motion vectors, and a best motion vector 115 indicates the relative pixel displacement in the horizontal and vertical directions between the location of the best match block 113 in frame 110 and the relative location of the current macroblock 123. Once the best match 113 is found, block-based video compression algorithms will encode the pixel differences between the current macroblock 123 and the best match block 113, rather than encoding the actual pixels themselves. Since a relatively good match can often be found in natural video scenes, this technique drastically reduces the amount of data that needs to be encoded into the bitstream, even after accounting for the extra bits used to encode the motion vectors themselves. The decoder then adds these differences to the best match 113, which is extracted using the encoded motion vector. This process is known as “motion compensation”.
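The encoder and decoder roles described above can be sketched as a round trip. This is an illustrative sketch only: the helper names (`extract_block`, `residual`, `reconstruct`), the `(dy, dx)` ordering of the motion vector, the 2x2 block size, and the sample pixel values are all assumptions, and the motion vector is taken as already known rather than searched for.

```python
def extract_block(frame, top, left, size):
    """Copy a size x size block whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def residual(current_block, reference_block):
    """Encoder side: pixel differences between the current block and its match."""
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current_block, reference_block)]


def reconstruct(reference_frame, top, left, motion_vector, diff, size):
    """Decoder side: fetch the match via the motion vector, then add the differences."""
    dy, dx = motion_vector  # assumed convention: vertical, then horizontal displacement
    best = extract_block(reference_frame, top + dy, left + dx, size)
    return [[b + d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(best, diff)]


# Hypothetical previously encoded frame (the role of frame 110).
reference_frame = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Hypothetical current macroblock located at row 1, column 1 of the current frame.
current_block = [[3, 5], [7, 8]]
top, left, size = 1, 1, 2
motion_vector = (-1, 1)  # best match lies one row up and one column right

best_match = extract_block(reference_frame, top - 1, left + 1, size)
diff = residual(current_block, best_match)
decoded = reconstruct(reference_frame, top, left, motion_vector, diff, size)
print(diff, decoded)
```

Only `motion_vector` and `diff` would need to be transmitted here; the decoder recovers the macroblock from the reference frame it already holds.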
FIG. 2 illustrates a method of encoding a macroblock using motion estimation. Referring back to the example of FIG. 1, the macroblock 223 corresponds to the macroblock 123 of frame 120, and the macroblock 213 corresponds to the best match block 113 of frame 110. Block 130 represents the difference between the macroblocks 223 and 213 which, in this case, is a block of zeroes. Thus, the encoder will only need to encode this block of zeroes, and will store it into the bitstream along with a corresponding motion vector. These will then be used by the decoder to reconstruct a macroblock that corresponds to macroblock 223. Many video compression algorithms provide very efficient ways of encoding zeroes (i.e. fewer bits are required). Thus, better matches produced by the motion estimation process will result in fewer bits being encoded into the bitstream.
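Why a block of zeroes is cheap to encode can be seen with a toy run-length encoder. Run-length coding is used here purely as a stand-in; it is an assumption for illustration and not the entropy coder of any specific compression algorithm. The function name `run_length_encode` and both sample difference blocks are hypothetical.

```python
def run_length_encode(values):
    """Collapse consecutive equal values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs


# Flattened 4x4 difference blocks (hypothetical values).
perfect_match_diff = [0] * 16
poor_match_diff = [3, -2, 0, 5, 1, 0, 0, -4, 2, 2, -1, 0, 7, 0, 0, 1]

perfect_runs = run_length_encode(perfect_match_diff)
poor_runs = run_length_encode(poor_match_diff)
print(len(perfect_runs), "run(s) for the perfect match")
print(len(poor_runs), "run(s) for the poor match")
```

The all-zero block collapses to a single run, whereas the poorly matched block needs many more symbols, which mirrors the relationship between match quality and bit count described above.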
When looking for the best motion vector, the metric that would ideally be minimized is the total number of bits produced when encoding the entire video sequence. However, the motion vector chosen when encoding the current macroblock can affect the number of bits used by future macroblocks in unforeseen ways. Thus, it is extremely difficult to calculate the impact that choosing a particular motion vector for a single macroblock has on the size of the entire video sequence. One possible approach is to minimize the number of bits required to encode just the current macroblock. However, this can also be too computationally expensive, so a reasonable approximation is to use a simple distortion metric, such as the sum of absolute differences (SAD), between the pixels in the two blocks.
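The SAD metric mentioned above simply sums the absolute value of each pixel difference between two equally sized blocks, so a lower value indicates a closer match. A minimal sketch follows; the function name `sad` and the sample blocks are assumptions chosen for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


# Hypothetical 2x2 macroblock and two candidate blocks from the previous frame.
macroblock = [[10, 12], [14, 16]]
candidate_exact = [[10, 12], [14, 16]]
candidate_close = [[11, 10], [14, 20]]

sad_exact = sad(macroblock, candidate_exact)
sad_close = sad(macroblock, candidate_close)
print(sad_exact, sad_close)
```

Because SAD needs only subtractions, absolute values, and additions, it is far cheaper to evaluate than actually encoding each candidate to count its bits.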
Further complicating the motion estimation problem is the sheer number of operations required to do an exhaustive search for the best block match, even if an approximation metric such as SAD is used. In addition, a large amount of data memory must be frequently accessed during such a search. As a result, a straightforward algorithm (i.e. one that searches for the best match by comparing every possible macroblock location in the previous frame to the macroblock being encoded in the current frame; also known as a “brute-force” full search) would perform poorly on an embedded processor that might not have a cache large enough to hold all of the pixels from the previous frame. Thus, there remains a need to search for a best match both efficiently and accurately. The increasing popularity and performance of parallel processors further necessitates a means for video coding which takes full advantage of such parallel processing capabilities.
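The cost of the brute-force full search can be made concrete with a short sketch. The function name `full_search`, the `(dy, dx)` motion vector ordering, the square search window of plus or minus `search_range` pixels, and the synthetic frame are all assumptions for illustration; SAD is used as the matching metric.

```python
def full_search(ref, cur_block, top, left, search_range):
    """Try every displacement in the window; return (best_mv, best_sad, candidates_tried)."""
    size = len(cur_block)
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, tried = None, None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sum(abs(ref[y + i][x + j] - cur_block[i][j])
                       for i in range(size) for j in range(size))
            tried += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost, tried


# Synthetic 8x8 previous frame in which every pixel value is unique.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
# The current 2x2 macroblock sits at (2, 2) but its content matches ref at (3, 4).
cur_block = [row[4:6] for row in ref[3:5]]

best_mv, best_cost, tried = full_search(ref, cur_block, top=2, left=2, search_range=2)
print(best_mv, best_cost, tried)

# Rough operation count for an assumed 16x16 macroblock and a +/-16 pixel window.
candidates = (2 * 16 + 1) ** 2
abs_diffs_per_macroblock = candidates * 16 * 16
print(candidates, abs_diffs_per_macroblock)
```

Even under these modest assumed parameters, a single macroblock requires 1,089 candidate comparisons and 278,784 absolute-difference operations, each touching pixels of the previous frame, which indicates why an exhaustive search strains both compute and memory bandwidth.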