The invention relates to apparatus and methods for encoding video and image data, and in particular, to apparatus and methods for performing dual prime motion estimation.
The emergence of multimedia computing is driving a need for digitally transmitting and receiving high quality motion video. The high quality motion video consists of high resolution images, each of which requires a large amount of space in a system memory or on a data storage device. Additionally, about 30 of these high resolution images need to be processed and displayed per second in order for a viewer to experience an illusion of motion. As a transfer of large, uncompressed streams of video data is time consuming and costly, data compression is typically used to reduce the amount of data transferred per image.
In motion video, much of the image data remains constant from one frame to another frame. Therefore, video data may be compressed by first describing a reference frame and then describing subsequent frames in terms of changes from the reference frame. Standards from an organization called the Moving Picture Experts Group (MPEG) have evolved to support high quality, full motion video. A first standard (MPEG-1) has been used mainly for video coding at rates of about 1.5 megabits per second. To meet more demanding applications, a second standard (MPEG-2) provides for high quality video compression, typically at coding rates of about 3-10 megabits per second.
An example of the MPEG compression process is discussed next. Generally, a first frame may not be described relative to any other frame. Hence, only intra (I) frame or non-predictive coding is performed on the first frame. When a second frame is received, the second frame may be described in terms of the I frame and a first forward predicted (P) frame. The compression of the received second frame is delayed until receipt of the first P frame by a processing system. In a similar manner, a third frame is also described in terms of the first I and P frames. The first P frame is formed by predicting a fourth received frame using the first I frame as a reference. Upon computation of the first P frame, the motion estimation engine can process the second and third received frames as bidirectionally (B) predicted frames by comparing blocks of these frames to blocks of the first I and P frames.
One primary operation performed by the motion estimation engine is block matching. The block matching process identifies a block of image data that should be used as a predictor for describing the current target block. To identify the proper predictor, tokens containing blocks of picture elements (pel) such as a 16×16 pel block describing the current macroblock are received and compared against the content of a search window.
The block matching process computes a mean absolute difference (MAD) between data stored in the target block and blocks at various offsets in the search window. In this process, corresponding data from the two blocks being compared are subtracted, and the sum of the absolute values of the pel differences is calculated. Because every candidate block has the same number of pels, this sum ranks the candidates in the same order as the mean. The smaller the MAD, the better the match between the blocks. The motion estimation engine keeps track of the smallest MAD computed during the search process to determine which of the blocks in the search window is the best match to the input token. A motion vector describing the offset between the current target block and the best match block is then computed. The motion vector is subsequently sent back to a host processor in the form of an output token.
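The block matching search described above can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function names `sad` and `full_search` are hypothetical, blocks are plain lists of rows of pel values, and the returned offset is the position of the best match inside the search window (a motion vector would be that offset minus the target block's own position).

```python
def sad(target, window, dx, dy):
    """Sum of absolute pel differences between target and the window block at (dx, dy)."""
    return sum(
        abs(t - window[dy + r][dx + c])
        for r, row in enumerate(target)
        for c, t in enumerate(row)
    )


def full_search(target, window):
    """Exhaustively score every offset at which target fits inside window.

    Returns (dx, dy, score) for the lowest score; ties keep the first offset found.
    """
    n_rows, n_cols = len(target), len(target[0])
    best = None
    for dy in range(len(window) - n_rows + 1):
        for dx in range(len(window[0]) - n_cols + 1):
            score = sad(target, window, dx, dy)
            if best is None or score < best[2]:
                best = (dx, dy, score)
    return best


# Usage: a 6x6 window with distinct pel values and a 2x2 target copied from offset (1, 2).
window = [[10 * r + c for c in range(6)] for r in range(6)]
target = [row[1:3] for row in window[2:4]]
best_dx, best_dy, best_score = full_search(target, window)
```

Dividing the score by the number of pels in the block would give the mean absolute difference; the ranking of candidates is unchanged.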
Although the motion estimation process may be a full, exhaustive block matching search, a multiple step hierarchical search to either a full or a half pixel search resolution is generally performed. In the hierarchical search approach, a best matching block is first found using a low resolution macroblock containing fewer data points than the full resolution image. Once the best matching block has been found, a full resolution search in the vicinity of the best matching block can be performed. This sequence reduces the total number of computations that must be performed by the motion estimation engine as fewer individual pel comparisons are performed in the reduced resolution image. Hence, the appropriate macroblock from which to compute the motion vector is more quickly determined.
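A two-level version of the hierarchical search might look like the sketch below. It rests on several assumptions that are not in the text: the low resolution images are formed by dropping every other pel and line without filtering, the refinement covers a small integer-pel radius around the scaled-up coarse result, and no half-pel step is performed. All function names are hypothetical.

```python
def sad(target, ref, x, y):
    """Sum of absolute pel differences between target and the ref block at (x, y)."""
    return sum(
        abs(t - ref[y + r][x + c])
        for r, row in enumerate(target)
        for c, t in enumerate(row)
    )


def best_offset(target, ref, xs, ys):
    """Lowest-score (x, y, score) among candidate offsets that fit inside ref."""
    max_x = len(ref[0]) - len(target[0])
    max_y = len(ref) - len(target)
    best = None
    for y in ys:
        for x in xs:
            if 0 <= x <= max_x and 0 <= y <= max_y:
                score = sad(target, ref, x, y)
                if best is None or score < best[2]:
                    best = (x, y, score)
    return best


def decimate(image):
    """Halve resolution by keeping every other pel and line (no filtering)."""
    return [row[::2] for row in image[::2]]


def hierarchical_search(target, ref, radius=1):
    """Coarse search on decimated data, then a full resolution search near the result."""
    small_t, small_r = decimate(target), decimate(ref)
    cx, cy, _ = best_offset(
        small_t, small_r,
        range(len(small_r[0]) - len(small_t[0]) + 1),
        range(len(small_r) - len(small_t) + 1),
    )
    # One coarse pel spans two full resolution pels, so scale the offset by 2.
    return best_offset(
        target, ref,
        range(2 * cx - radius, 2 * cx + radius + 1),
        range(2 * cy - radius, 2 * cy + radius + 1),
    )
```

With a 4×4 target in a 12×12 reference, the coarse pass scores 25 offsets of 4 pels each and the refinement scores at most 9 offsets of 16 pels, compared with 81 offsets of 16 pels for an exhaustive full resolution search.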
Moreover, MPEG-2 introduces the concept of dual prime motion compensation in Section 7.6.3.6 of the MPEG-2 video (H.262) specification. In dual prime motion compensation, a macroblock prediction is computed from an average of two previous field references, one with the same parity (top to top or bottom to bottom fields) and one with the opposite parity (top to bottom or bottom to top fields). Motion vectors for each field are coded as a common motion vector for the same parity fields and a small differential motion vector for the opposite parity fields. Ideally, the search of the target and reference fields is to be coordinated to minimize the overall error.
Once the motion vector for a macroblock is known relative to a reference field of the same parity, it is extrapolated or interpolated to obtain a prediction of the motion vector for the opposite parity reference field. This prediction is adjusted by adding a small shift to account for a half-pel vertical offset between the two fields. Then, small horizontal and vertical corrections (+1, 0, −1) coded in the bitstream are added. In calculating the pel values of the prediction, motion-compensated predictions from the two reference fields are averaged to reduce noise in the data.
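The derivation of the opposite parity vector can be illustrated with the sketch below. It is modeled on the scale, shift and correct sequence described above, under stated assumptions: vector components are integers in half-pel units, the temporal scaling is expressed as a factor `m`/2, halves are rounded away from zero, and the vertical field shift is passed in as `e`. The names `m`, `e`, `scale_round` and `opposite_parity_vector`, and the rounding rule, are assumptions for illustration; the normative values and arithmetic should be taken from Section 7.6.3.6 of the specification.

```python
def scale_round(component, m):
    """Scale a vector component by m/2, rounding halves away from zero (assumed rule)."""
    scaled = component * m
    # Adding 1 to positive values before the floor shift rounds x.5 up;
    # negative values are floored by the shift, which rounds -x.5 down.
    return (scaled + (1 if scaled > 0 else 0)) >> 1


def opposite_parity_vector(mv, m, e, dmv):
    """Predict the opposite parity vector from the same parity vector mv.

    mv  -- (x, y) same parity vector
    m   -- temporal scale numerator (m/2 is the scale factor)
    e   -- vertical shift compensating the offset between the two fields
    dmv -- (x, y) differential corrections, each limited to -1, 0 or +1
    """
    if any(d not in (-1, 0, 1) for d in dmv):
        raise ValueError("differential corrections must be -1, 0 or +1")
    x = scale_round(mv[0], m) + dmv[0]
    y = scale_round(mv[1], m) + e + dmv[1]
    return (x, y)


# Usage: interpolation (m = 1) and extrapolation (m = 3) of a coded vector.
near = opposite_parity_vector((3, -3), m=1, e=-1, dmv=(0, 1))
far = opposite_parity_vector((2, 1), m=3, e=1, dmv=(-1, 0))
```

A factor below one (m = 1) corresponds to interpolating toward a nearer field, and a factor above one (m = 3) to extrapolating toward a farther field.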
If a field picture is being coded, the coded motion vector is applied to the reference field of the same parity. Obtaining the motion vector for the opposite parity field between the two fields generally involves interpolation. If a frame picture is being coded, the coded motion vector is applied to fields of the same parity, but a single vector is used for both fields. Obtaining motion vectors for the opposite parity fields in a frame picture involves both interpolation and extrapolation, but the computation is otherwise the same.
As the dual prime motion estimation process is computationally intensive, dedicated hardware is typically used to perform this function. The hardware may be coupled tightly to a processor or, alternatively, may be implemented as hard wired control logic. A tight coupling with the processor allows flexibility in implementing the search process. Flexibility is desirable, as different video applications would benefit from different levels of processing.
Although flexible, the tight coupling approach does not provide as high performance as the hard wired approach. The hard wired approach delivers high performance as it minimizes the burden of motion estimation searching on the processor and thus releases processing cycles otherwise needed for the encoding process. However, the performance associated with the hard wired approach is achieved at the expense of reductions in the flexibility of handling complex search operations.
An apparatus performs dual prime motion estimation based on an average of previous field references in a flexible, yet high performance manner. The apparatus has a command memory for storing a motion estimation command list segment which in turn contains a search command for specifying a merged search operation over one or more search positions. The apparatus also has a score memory for storing the result of each merged search operation. The score memory is initialized when the merged search operation is initiated. During the search operation, the score memory accumulates the result of each search position. The apparatus also has a search engine connected to the command memory and to the score memory for determining from the score memory a search position with the lowest score. The search engine then generates dual prime motion estimation outputs in the form of motion estimation result list segments.
Implementations of the invention include the following. The search command has a merge bit to select a merged search operation. Accordingly, when the merge bit of a current search command is set and the merge bit of a previous search command is cleared, indicating the start of a merged search operation, the score memory is initialized to the score for each search position. Moreover, when the merge bit of the previous search command is set, indicating that the merged search operation is in progress, the score memory accumulates the result for each search position. Further, when the merge bit of the current search command is cleared and the merge bit of the previous command is set, indicating the end of the merged search operation, a search result is generated by locating in the score memory a search position with the lowest accumulated score while the score is being generated.
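The merge bit behavior can be modeled in software as in the sketch below. This is an illustrative model only: the command format (a merge bit paired with a list of per-position scores), the function name `run_command_list` and the returned tuples are hypothetical, and a command whose own and previous merge bits are both cleared is assumed to be an ordinary, unmerged search.

```python
def run_command_list(commands):
    """Process (merge_bit, scores) commands and return one result per completed search.

    scores holds one score per search position; every command within a merged
    operation is assumed to cover the same positions in the same order.
    Each result is (best_position_index, accumulated_score).
    """
    score_memory = []
    previous_merge = False
    results = []
    for merge, scores in commands:
        if previous_merge:
            # Merged operation in progress: accumulate into the score memory.
            score_memory = [acc + s for acc, s in zip(score_memory, scores)]
        else:
            # Start of a merged operation or a standalone search: initialize.
            score_memory = list(scores)
        if not merge:
            # End of the operation: report the position with the lowest score.
            best = min(range(len(score_memory)), key=score_memory.__getitem__)
            results.append((best, score_memory[best]))
        previous_merge = merge
    return results


# Usage: three commands merged over three search positions, then one plain search.
results = run_command_list([
    (True, [5, 2, 9]),
    (True, [1, 6, 1]),
    (False, [3, 3, 0]),
    (False, [4, 1, 7]),
])
```

In the usage example no single command favors position 0, yet it has the lowest accumulated score (9), which is the kind of outcome a merged search is meant to capture.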
Implementations of the invention additionally include the following. The apparatus may estimate a common vector using merged search operations on fields spaced two temporal units apart at a first pel grid resolution with a common input velocity estimate, a merged search operation on adjacent fields at a second pel grid resolution with a first scaled common input velocity estimate, and a merged search operation on fields spaced three temporal units apart at a third pel grid resolution with a second scaled common input velocity estimate. The first pel grid resolution is a ½ pel grid resolution, the second pel grid resolution is a ¼ pel grid resolution, the third pel grid resolution is a ¾ pel grid resolution, the first scaled common input estimate is ½ of the common input velocity estimate, and the second scaled common input estimate is 3/2 of the common input velocity estimate.
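The relationship between field spacing, pel grid resolution and velocity scaling stated above can be tabulated as in the following sketch. The table layout and the names `COMMON_VECTOR_SEARCHES` and `scaled_estimates` are hypothetical; only the numeric values come from the text.

```python
from fractions import Fraction

# (field spacing in temporal units, pel grid resolution, scale on the common velocity estimate)
COMMON_VECTOR_SEARCHES = [
    (2, Fraction(1, 2), Fraction(1, 1)),
    (1, Fraction(1, 4), Fraction(1, 2)),
    (3, Fraction(3, 4), Fraction(3, 2)),
]


def scaled_estimates(velocity):
    """Map each field spacing to the velocity estimate used for that merged search."""
    vx, vy = velocity
    return {
        spacing: (vx * scale, vy * scale)
        for spacing, _grid, scale in COMMON_VECTOR_SEARCHES
    }


# Usage: a common input velocity estimate of (4, -2).
estimates = scaled_estimates((4, -2))
```

In each row the scale equals the field spacing divided by two and the grid resolution equals the spacing divided by four, so both grow in proportion to the temporal distance being searched.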
The apparatus also may estimate a differential vector using a merged search operation on adjacent fields at a pel grid resolution with a first scaled common input velocity estimate and a predetermined search range, and a merged search operation on fields spaced three temporal units apart at the pel grid resolution with a second scaled common input velocity estimate and the predetermined search range. The pel grid resolution is a ½ pel grid resolution, the first scaled common input velocity estimate is ½ of the common input velocity estimate, the second scaled common input velocity estimate is 3/2 of the common input velocity estimate and the predetermined search range is one.
Implementations of the invention further include the following. The search engine compensates for differing search grid resolutions between same and opposite parity motion vector searches. Moreover, temporally adjacent fields are compensated by duplicating a search result to four score memory positions.
Advantages of the present invention include the following. The apparatus off-loads much of the dual prime motion estimation processing from the processor while allowing the processor to retain full control of critical search parameters, including the number of levels, search regions and range, target size, horizontal and vertical decimation, and field versus frame search, among others. Moreover, the dual prime search operation takes advantage of information not available to a heuristic approach and is thus more efficient. Further, the dual prime search operation is faster than a brute force approach which exhaustively searches all possible vectors in arriving at a search position with a minimal score. Thus, flexibility and high performance are maintained by the invention.