1. Field of the Invention
The present invention relates to motion estimation for video compression. More specifically, an experimental design for motion estimation for video compression is disclosed.
2. Description of the Prior Art
Video has become an integral part of everyday electronic devices. While bandwidth and processing power continue to increase rapidly, these increases have only served to raise the demand for higher quality video products, which, in turn, require even more bandwidth and even more processing power.
Video compression standards have long been available to lower the required bandwidth or, alternatively, to increase the amount of video data that can be stored on storage media of a given size. In line with these goals, motion estimation is widely used in video compression standards such as MPEG-1, MPEG-2, and MPEG-4, among others.
The conventional methods for motion estimation are well known to those skilled in the art. Please refer to FIG. 1. In general, each frame goes through a process in which a current video frame is read into memory. A small reference block is located within a larger search window of the current frame, and a motion vector is generated estimating the direction of motion of the reference block within the search window. This motion vector is used in conjunction with information from the previous frame to generate an estimated image frame with respect to the current frame. The estimated image frame is then subtracted from the current frame, which effectively removes duplicated imagery and greatly reduces the amount of data that must be saved in the output file.
Because the estimated image frame is subtracted from the current frame and only the difference is saved, it is obvious that the more accurate the estimation is, the smaller the output file is. The accuracy of the estimated image frame to a large degree depends on the accuracy of the motion vector. The accuracy of the motion vector in turn depends on the accuracy of locating the reference block within the search window.
It is generally accepted that the reference block can be located within the search window most accurately using a full search as shown in FIG. 1. A full search consists of comparing the reference block sequentially with every possible location within the search window. For each location, the comparison is done by summing the absolute values of the differences between the brightness of each pixel in the reference block and the brightness of the corresponding pixel in the current search location. The location with the lowest sum of absolute differences is considered the best match and is selected to be used to calculate the motion vector.
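By way of illustration only, the full search described above may be sketched as follows. The function names, the row-major list-of-lists layout for brightness values, and the synthetic test data are assumptions made for this sketch and do not appear in the figures.

```python
def sad(reference, window, top, left):
    """Sum of absolute brightness differences at one candidate location."""
    n = len(reference)
    return sum(
        abs(reference[r][c] - window[top + r][left + c])
        for r in range(n)
        for c in range(n)
    )


def full_search(reference, window):
    """Compare the reference block with every possible location in the window.

    Returns (top, left, sad) for the location with the lowest sum.
    """
    n = len(reference)
    span = len(window) - n + 1  # candidate positions along each axis
    best = None
    for top in range(span):
        for left in range(span):
            cost = sad(reference, window, top, left)
            if best is None or cost < best[2]:
                best = (top, left, cost)
    return best


# Hypothetical data: a 16 by 16 window of synthetic brightness values,
# with the 8 by 8 reference block copied from row 5, column 3.
window = [[(7 * r + 13 * c) % 256 for c in range(16)] for r in range(16)]
reference = [row[3:11] for row in window[5:13]]

print(full_search(reference, window))  # (5, 3, 0)
```

In this sketch the motion vector would be derived from the returned position relative to the original position of the reference block; that step is omitted here.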
While the full search method is the most accurate, its problem is the number of calculations that need to be performed. For example, in FIG. 1, the reference block is shown as 8 by 8 pixels and the search window is shown as 16 by 16 pixels. In this case, a full search requires comparing 81 possible locations (9 horizontal positions by 9 vertical positions) with the reference block, each requiring calculating and summing 64 absolute values, before the best match can be found. Obviously, using more than one reference block or larger search windows drives the number of required calculations upward prohibitively. Because decoding the compressed video usually must be done on a real time basis and involves similar processing steps in reverse, a tradeoff is made to balance the accuracy of the motion vector against the amount of arithmetic processing necessary, and therefore speed, when encoding or decoding motion estimation compressed video.
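The counts given above follow directly from the block and window sizes. The short sketch below reproduces them; the function name is hypothetical, and the second call uses larger sizes chosen purely to illustrate how quickly the cost grows.

```python
def full_search_cost(block_size, window_size):
    """Return (candidate locations, absolute differences) for one full search."""
    positions_per_axis = window_size - block_size + 1
    locations = positions_per_axis ** 2
    return locations, locations * block_size ** 2


print(full_search_cost(8, 16))   # (81, 5184): the FIG. 1 case
print(full_search_cost(16, 32))  # (289, 73984): hypothetical larger sizes
```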
To reduce the number of calculations required, some algorithms compare the reference block with only selected, representative locations within the search window instead of every possible location. While this results in some loss of quality and in a larger file, the gain in speed is dramatic and the quality is still acceptable. The quality of the video is defined as being acceptable if the original and the processed images are indistinguishable to an average viewer at a distance of 6 times the height of the image. Examples of this method include a three-step search, a four-step search, and a hierarchical search.
A common approach to the compression problem involves generating three copies, or layers, of each frame. The first layer is identical to the original image. The second layer is the original image at one-half resolution. The third layer is the original image at one-fourth resolution. The third layer is searched first to find the best match of the reference block as described above, and the center point of this best match is used as the central starting position of a second search in the higher resolution second layer. Similarly, the results of the second search yield a central starting position for a third search in the first layer. The position of the best match to the reference block in the first layer is used to calculate the motion vector.
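The three-layer search described above may be sketched as follows, under several assumptions that are not taken from the prior art itself: each lower resolution layer is produced by averaging 2 by 2 groups of pixels, the third layer is searched exhaustively, each finer layer is searched only within one pixel of the scaled starting position, and positions are tracked by the top-left corner of the block rather than its center point. All names are hypothetical.

```python
def sad(block, image, top, left):
    """Sum of absolute brightness differences at one candidate location."""
    n = len(block)
    return sum(
        abs(block[r][c] - image[top + r][left + c])
        for r in range(n)
        for c in range(n)
    )


def halve(image):
    """Halve the resolution by averaging each 2 by 2 group (assumed filter)."""
    size = len(image) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) // 4
            for c in range(size)
        ]
        for r in range(size)
    ]


def best_match(block, image, candidates):
    """Return the in-bounds candidate (top, left) with the lowest SAD."""
    limit = len(image) - len(block)
    valid = [(t, l) for t, l in candidates if 0 <= t <= limit and 0 <= l <= limit]
    return min(valid, key=lambda p: sad(block, image, p[0], p[1]))


def hierarchical_search(reference, window, radius=1):
    """Search the third layer first, then refine in the second and first."""
    # Index 0 is the first layer (full resolution), index 2 the third layer.
    refs = [reference, halve(reference), halve(halve(reference))]
    wins = [window, halve(window), halve(halve(window))]

    # Exhaustive search in the quarter-resolution third layer.
    span = len(wins[2]) - len(refs[2]) + 1
    coarse = [(t, l) for t in range(span) for l in range(span)]
    top, left = best_match(refs[2], wins[2], coarse)

    # Each finer layer starts from the scaled position of the previous match.
    for layer in (1, 0):
        top, left = 2 * top, 2 * left
        nearby = [
            (top + dt, left + dl)
            for dt in range(-radius, radius + 1)
            for dl in range(-radius, radius + 1)
        ]
        top, left = best_match(refs[layer], wins[layer], nearby)
    return top, left


# Hypothetical data: a smooth brightness gradient, with the 8 by 8
# reference block copied from row 4, column 8 of the 16 by 16 window.
window = [[5 * r + 11 * c for c in range(16)] for r in range(16)]
reference = [row[8:16] for row in window[4:12]]

print(hierarchical_search(reference, window))  # (4, 8)
```

With these sizes the sketch evaluates 9 locations in the third layer and at most 9 in each of the other two layers, rather than the 81 of a full search. Because the lower resolution layers discard detail, a search of this kind is not guaranteed to return the same position as a full search.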
To reduce the number of calculations necessary, the exact search process performed on any layer varies from method to method; however, a full search is not necessary. Usually, a reduced number of search blocks within the search window is selected and sampled according to the particular method used. FIG. 2 illustrates an example of one such prior art method. In FIG. 2, an 8 by 8 pixel reference block is compared with 9 equally spaced 8 by 8 pixel search blocks within the 16 by 16 pixel search window. In this method, the number of search locations has been reduced from the 81 locations required by the full search in FIG. 1 to only the 9 locations (on each layer) shown in FIG. 2, a dramatic improvement, while still producing video of acceptable quality. However, the number of calculations required for each video frame is still quite high, and any further reduction in the number of search locations reduces processor load, allowing higher resolution or larger images to be encoded and decoded in real time.
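A reduced search of the kind shown in FIG. 2 may be sketched as follows. The sketch assumes that "equally spaced" means three positions along each axis at offsets of 0, 4, and 8 pixels, which is one reading consistent with 9 search blocks of 8 by 8 pixels in a 16 by 16 pixel window. The function names and test data are hypothetical.

```python
def sad(block, image, top, left):
    """Sum of absolute brightness differences at one candidate location."""
    n = len(block)
    return sum(
        abs(block[r][c] - image[top + r][left + c])
        for r in range(n)
        for c in range(n)
    )


def spaced_candidates(block_size, window_size, per_axis=3):
    """Equally spaced (top, left) positions; the spacing is an assumption."""
    step = (window_size - block_size) // (per_axis - 1)
    offsets = [i * step for i in range(per_axis)]
    return [(t, l) for t in offsets for l in offsets]


def sparse_search(reference, window, per_axis=3):
    """Evaluate only the spaced candidates and return (top, left, sad)."""
    candidates = spaced_candidates(len(reference), len(window), per_axis)
    scored = [(t, l, sad(reference, window, t, l)) for t, l in candidates]
    return min(scored, key=lambda m: m[2])


# Hypothetical data: the reference block is copied from row 4, column 8,
# which happens to coincide with one of the 9 sampled locations.
window = [[5 * r + 11 * c for c in range(16)] for r in range(16)]
reference = [row[8:16] for row in window[4:12]]

print(len(spaced_candidates(8, 16)))     # 9
print(sparse_search(reference, window))  # (4, 8, 0)
```

When the true position falls between sampled locations, a search of this kind returns only the nearest sampled match, which is the source of the quality loss noted above.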