The present invention relates generally to the compression of digital data. More specifically, the present invention relates to methods and apparatus for providing sub-pixel motion estimation for encoding a digital video signal.
A substantial amount of digital data must be transmitted in digital television systems and the like. A digital television signal includes video, audio, and other data (such as Electronic Programming Guide (EPG) data, and the like). In order to provide for efficient broadcast of such digital signals, it is advantageous to compress the digital signals to minimize the amount of data that must be transmitted.
The video portion of the television signal comprises a sequence of video “frames” that together provide a moving picture. In digital television systems, each line of a video frame is defined by a sequence of digital data bits, or pixels (also referred to herein as “pels”). Each video frame is made up of two fields, each of which contains one half of the lines of the frame. For example, a first or odd field will contain all the odd numbered lines of a video frame, while a second or even field will contain the even numbered lines of that video frame. A large amount of data is required to define each video frame of a television signal. For example, approximately 7.4 megabits of data are required to provide one video frame of a National Television System Committee (NTSC) television signal. This assumes a 640 pixel by 480 line display is used with 8 bits of intensity value for each of the primary colors red, green, and blue. High definition television requires substantially more data to provide each video frame. In order to manage this amount of data, the data must be compressed.
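The 7.4 megabit figure follows directly from the stated display assumptions. The arithmetic can be checked with the short sketch below; the function name `frame_bits` and its parameters are illustrative only and do not appear in the original text.

```python
def frame_bits(width, height, bits_per_component=8, components=3):
    """Uncompressed size, in bits, of one frame with the given dimensions."""
    return width * height * bits_per_component * components


# 640 pixels x 480 lines, 8 bits each for red, green, and blue.
bits = frame_bits(640, 480)
megabits = bits / 1_000_000  # 7,372,800 bits, i.e. roughly 7.4 megabits
```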
Digital video compression techniques enable the efficient transmission of digital video signals over conventional communication channels. Such techniques use compression algorithms that take advantage of the correlation among adjacent pixels in order to derive a more efficient representation of the important information in a video signal. The most powerful compression systems not only take advantage of spatial correlation, but can also utilize similarities among adjacent frames to further compact the data. In such systems, motion compensation (also known as differential encoding) is used to transmit only the difference between an actual frame and a prediction of an actual frame. The prediction is derived from a previous (or future) frame of the same video sequence. In such motion compensation systems, motion vectors are derived, for example, by comparing a block of pixel data from a current frame to similar blocks of data in a previous frame. A motion estimator determines how a block of data from the previous frame should be adjusted in order to be used in the current frame.
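The differential encoding described above may be illustrated by the following minimal sketch. It assumes frames stored as lists of rows of integer intensities and a full pel motion vector `(dx, dy)`; the function names are hypothetical and are not taken from any standard or reference software.

```python
def predict_block(ref, bx, by, n, mv):
    """Copy the n x n block of the reference frame displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


def residual_block(cur, ref, bx, by, n, mv):
    """Difference between the current block and its motion compensated prediction."""
    pred = predict_block(ref, bx, by, n, mv)
    return [[cur[by + y][bx + x] - pred[y][x] for x in range(n)] for y in range(n)]


def reconstruct_block(ref, bx, by, n, mv, residual):
    """Decoder side: prediction plus transmitted residual restores the block."""
    pred = predict_block(ref, bx, by, n, mv)
    return [[pred[y][x] + residual[y][x] for x in range(n)] for y in range(n)]
```

When the motion vector is well chosen, the residual values are small and can be represented with fewer bits than the original block, which is the source of the compression gain.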
Motion compensation is extensively used in video codecs as a means to exploit temporal redundancy between frames (and/or fields) of video. Most standards-based video decoders (e.g., those implementing Moving Picture Experts Group (MPEG) standards 1 and 2) allow one or two translational motion vectors (MV) per block of pixels. These MVs are computed by a motion estimation (ME) process in the video encoder. The most reliable ME algorithm, a full search block matching algorithm (FS-BMA), is widely used in reference software as a benchmark. FS-BMA has high computational complexity since it attempts to match every possible candidate in the search area, thereby making it impractical for a real-time video encoder. Various “fast” search algorithms have been proposed and utilized in real-time encoders. Most of these techniques sacrifice search quality by using only a subset of the search area in order to reduce the total number of searches. However, most of the existing fast algorithms focus on full pel resolution ME and are not applicable to half pel resolution. In order to obtain the final half pel MV, the encoder performs a full search at half pel positions around the full pel result from the fast algorithm.
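A full search block matching algorithm at full pel resolution can be sketched as follows. The sketch assumes the sum of absolute differences (SAD) as the matching criterion, which the text above does not specify, and it skips candidates that fall outside the reference frame. The names `sad` and `full_search` are illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of ref displaced from that position by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Evaluate every full pel candidate within +/- search_range and return
    (cost, dx, dy) for the best match, or None if no candidate is valid."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that extend past the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate in the window is visited, so the cost grows with the square of the search range; this is the complexity that the “fast” algorithms attempt to avoid.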
To minimize motion compensated differences, video codecs should generally utilize a dense motion field and fine MV resolution. A single MV with half pel accuracy for every 8×8 block of pixels is typically employed by modern video codecs. A MV for a smaller block size (2×2 and 4×4) with higher accuracy (up to an eighth of a pixel) is useful for tracking the motion of small objects, and such systems are currently being developed in next generation video codecs. The complexity of ME is more pronounced when the MV has sub-pixel resolution since the number of search points increases exponentially as the MV resolution increases. To deal with these additional search points, most real-time encoders adopt a hierarchical approach which does not perform FS-BMA at all sub-pixel search points. Instead, only search points that coincide with a full pel position are searched first. Search points at half pel positions surrounding the best matched candidate from a full pel search point are then searched. This process is repeated until the desired accuracy is reached. The complexity of sub-pixel ME is quite significant since most encoders perform a full search at this level even though a fast ME algorithm for full pel ME may be applied. For example, in the baseline ME method described below, this amounts to 18 sub-pixel search positions for every 16×16 block.
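The growth in search points can be made concrete with a rough count. The sketch below assumes a square search window of ±R full pels and a resolution of 1/2^m pel after m refinement levels; it further assumes that each hierarchical level evaluates only the 8 new neighbors of the current best candidate, the center having been evaluated at the previous level. These counting conventions and function names are assumptions made for illustration.

```python
def exhaustive_points(search_range, levels):
    """Candidates on the complete grid with step 1 / 2**levels pel
    inside a window of +/- search_range full pels."""
    steps_per_pel = 2 ** levels
    return (2 * search_range * steps_per_pel + 1) ** 2


def hierarchical_points(search_range, levels):
    """Full pel full search followed by 8 new neighbors per refinement level."""
    return (2 * search_range + 1) ** 2 + 8 * levels


# With search_range = 16: the exhaustive count grows from 1089 (full pel)
# to 4225 (half pel) to 16641 (quarter pel), while the hierarchical count
# grows only from 1089 to 1097 to 1105.
counts = [(exhaustive_points(16, m), hierarchical_points(16, m)) for m in range(3)]
```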
The baseline ME method, which is routinely used in reference software implementations, consists of three main tasks, i.e., a full pel search for a 16×16 block (a 16×16 block is commonly known as a macroblock, or “MB”), a half pel search for a 16×16 block, and a half pel search for an 8×8 sub-block. FIG. 1 shows an example of such a prior art baseline method (boundary effects are ignored in FIG. 1). In FIG. 1, “X” denotes search points from a first task; “+” denotes search points from a second task; and “O” denotes search points from a third task. The first task (16×16 full pel search) matches the current block with every candidate at the full pel position in the search window in the reference frame to find a best matched block. The best matched block from the first task is denoted as 10 in FIG. 1. The search window for the first task is centered at the same coordinate as the current block and is extended in each direction by an amount indicated by the user. The second task (16×16 half pel search) matches the current block with every candidate block at the half pel position in the search window to locate a new best matched half pel block, denoted as 20 in FIG. 1. The search window center of the second task search is at the position of the best matched block 10 from the first task, and each side is extended by one half pel for a total of nine candidate blocks (i.e., the nine search points indicated by “+” in FIG. 1). The third task (8×8 half pel search) matches four sub-blocks of the current block (obtained by dividing the current 16×16 block into four equal 8×8 sub-blocks) with every candidate at the half pel position in their respective windows to obtain four best matched 8×8 half pel sub-blocks (designated 30, 32, 34, and 36 in FIG. 1). The search window centers for each third task search are at the positions of the corresponding sub-blocks of the best matched block 20 from the second task, and each side is extended by one half pel (+/−1*0.5 pel) for a total of nine candidate blocks.
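The nine-candidate half pel search used in the second and third tasks can be sketched as follows. The sketch assumes that half pel reference samples are formed by bilinear averaging of the neighboring full pel samples and that SAD is the matching criterion; neither is stated in the description above. Motion vectors are held in half pel units, and all names are hypothetical.

```python
def sample_half_pel(ref, x2, y2):
    """Reference sample at position (x2 / 2, y2 / 2), by bilinear averaging.
    At full pel positions the four averaged samples coincide."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0


def block_cost(cur, ref, bx, by, n, mvx2, mvy2):
    """SAD between the n x n block of cur at (bx, by) and the reference block
    displaced by (mvx2 / 2, mvy2 / 2) pels."""
    total = 0.0
    for y in range(n):
        for x in range(n):
            pred = sample_half_pel(ref, 2 * (bx + x) + mvx2, 2 * (by + y) + mvy2)
            total += abs(cur[by + y][bx + x] - pred)
    return total


def half_pel_refine(cur, ref, bx, by, n, center_mv2):
    """Search the center and its eight half pel neighbors (nine candidates).
    center_mv2 and the returned vector are in half pel units.
    Boundary effects are ignored, as in FIG. 1."""
    cx2, cy2 = center_mv2
    best = None
    for dy2 in (-1, 0, 1):
        for dx2 in (-1, 0, 1):
            cost = block_cost(cur, ref, bx, by, n, cx2 + dx2, cy2 + dy2)
            if best is None or cost < best[0]:
                best = (cost, cx2 + dx2, cy2 + dy2)
    return best
```

Under these assumptions, the second task corresponds to one call with n = 16 centered on twice the full pel vector of block 10, and the third task corresponds to four calls with n = 8, one per sub-block, each centered on the half pel vector of block 20.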
It would be advantageous to provide a ME algorithm which reduces the number of searches and computations performed as compared to the prior art ME process, while improving or maintaining the search quality. It would be further advantageous to reduce the number of searches and computations by discarding redundant search points between at least two of the searches performed in the baseline method described above (i.e., by discarding redundant search points between the first and third tasks, the second and third tasks, or the first and second tasks). It would be still further advantageous to provide for a ME process which is easily extendible to higher sub-pixel resolutions, such as one half pel, one quarter pel, one eighth pel, and beyond.
The methods and apparatus of the present invention provide the foregoing and other advantages.