The invention relates to apparatus and methods for encoding video and image data, and in particular, to apparatus and methods for performing motion estimation.
The emergence of multimedia computing is driving a need for digitally transmitting and receiving high quality motion video. The high quality motion video consists of a plurality of high resolution images, each of which requires a large amount of space in a system memory or on a data storage device. Additionally, about 30 of these high resolution images need to be processed and displayed per second in order for a viewer to experience an illusion of motion. As a transfer of large, uncompressed streams of video data is time consuming and costly, data compression is typically used to reduce the amount of data transferred per image.
In motion video, much of the image data remains constant from one frame to another frame. Therefore, video data may be compressed by first describing a reference frame and then describing subsequent frames in terms of changes from the reference frame. Standards from an organization called the Moving Picture Experts Group (MPEG) have evolved to support high quality, full motion video. A first standard (MPEG-1) has been used mainly for video coding at rates of about 1.5 megabits per second. To meet more demanding applications, a second standard (MPEG-2) provides for high quality video compression, typically at coding rates of about 3-10 megabits per second.
An example of the MPEG compression process is discussed next. Generally, a first frame may not be described relative to any other frame. Hence, only intra (I) frame or non-predictive coding is performed on the first frame. When a second frame is received, the second frame may be described in terms of the I frame and a first forward predicted (P) frame. The compression of the received second frame is delayed until receipt of the first P frame by a processing system. In a similar manner, a third frame is also described in terms of the first I and P frames. The first P frame is formed by predicting a fourth received frame using the first I frame as a reference. Upon computation of the first P frame, the motion estimation engine can process the second and third received frames as bidirectionally (B) predicted frames by comparing blocks of these frames to blocks of the first I and P frames.
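The reordering described above, in which B frames wait for the next I or P frame before they can be compressed, can be illustrated with a minimal sketch. The function name `coding_order` and the single-letter frame labels are assumptions made for illustration; the sketch models only the ordering, not the compression itself.

```python
def coding_order(frame_types):
    """Return display indices in the order the frames can be encoded.

    frame_types lists "I", "P" or "B" in display (arrival) order.
    B frames are held back until the next I or P frame has been emitted,
    because they are predicted from the references on both sides.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append(index)      # must wait for the next reference
        else:
            order.append(index)          # I or P frame is encoded first
            order.extend(pending_b)      # then the B frames that preceded it
            pending_b.clear()
    order.extend(pending_b)              # trailing B frames with no later reference
    return order


# Four received frames: I, then two B frames, then the first P frame.
print(coding_order(["I", "B", "B", "P"]))  # [0, 3, 1, 2]
```

With the four received frames of the example, the fourth frame (the first P frame) is processed before the second and third frames, which matches the delay described above.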
One primary operation performed by the motion estimation engine is block matching. The block matching process identifies a block of image data that should be used as a predictor for describing the current target block. To identify the proper predictor, tokens containing blocks of picture elements (pel) such as a 16×16 pel block describing the current macroblock are received and compared against the content of a search window.
The block matching process computes a mean absolute difference (MAD) between data stored in the target block and blocks at various offsets in the search window. In this process, corresponding data from the two blocks being compared are subtracted, and the sum of the absolute values of the pel differences is calculated. The smaller the MAD, the better the match between the blocks. The motion estimation engine keeps track of the smallest MAD computed during the search process to determine which of the blocks in the search window is the best match to the input token. A motion vector describing the offset between the current frame and the best match block is then computed. The motion vector is subsequently sent back to a host processor in the form of an output token.
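The block matching step can be sketched in software as follows. This is an illustrative model only, not the hardware described here: the function names `sad` and `full_search` are assumptions, blocks are plain lists of rows, and the returned offset is measured from the upper-left corner of the search window. The sketch ranks candidates by the sum of absolute differences; dividing that sum by the constant number of pels in the block would give the mean and would not change which candidate wins.

```python
def sad(target, reference, top, left):
    """Sum of absolute pel differences between the target block and the
    same-sized reference block whose upper-left corner is at (top, left)."""
    return sum(
        abs(target[r][c] - reference[top + r][left + c])
        for r in range(len(target))
        for c in range(len(target[0]))
    )


def full_search(target, window):
    """Test every offset in the search window.

    Returns (best_sad, (dy, dx)) for the first offset with the smallest sum.
    """
    rows = len(window) - len(target) + 1
    cols = len(window[0]) - len(target[0]) + 1
    best = None
    for dy in range(rows):
        for dx in range(cols):
            score = sad(target, window, dy, dx)
            if best is None or score < best[0]:
                best = (score, (dy, dx))   # keep track of the smallest sum so far
    return best


# A 4x4 search window and a 2x2 target copied from row 1, column 2 of it.
window = [[4 * r + c for c in range(4)] for r in range(4)]
target = [row[2:4] for row in window[1:3]]
print(full_search(target, window))  # (0, (1, 2))
```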
Although the motion estimation process may be a full, exhaustive block matching search, a multiple step hierarchical search to either a full or a half pixel search resolution is generally performed. In the hierarchical search approach, a best matching block is first found using a low resolution macroblock containing fewer data points than the full resolution image. Once the best matching block has been found, a full resolution search in the vicinity of the best matching block can be performed. This sequence reduces the total number of computations that must be performed by the motion estimation engine as fewer individual pel comparisons are performed in the reduced resolution image. Hence, the appropriate macroblock from which to compute the motion vector is more quickly determined.
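A two-level version of the hierarchical search can be sketched as below. The details are assumptions chosen for illustration: the low resolution image is formed by keeping every second pel in each direction, and the full resolution stage examines only offsets within one pel of the scaled-up coarse result. The names `decimate`, `search` and `hierarchical_search` are hypothetical.

```python
def sad(target, reference, top, left):
    """Sum of absolute pel differences at the given reference offset."""
    return sum(
        abs(target[r][c] - reference[top + r][left + c])
        for r in range(len(target))
        for c in range(len(target[0]))
    )


def decimate(block, step=2):
    """Keep every step-th pel horizontally and vertically."""
    return [row[::step] for row in block[::step]]


def search(target, window, candidates):
    """Return (best_sad, (dy, dx)) over the given candidate offsets."""
    best = None
    for dy, dx in candidates:
        score = sad(target, window, dy, dx)
        if best is None or score < best[0]:
            best = (score, (dy, dx))
    return best


def hierarchical_search(target, window, step=2, radius=1):
    # Stage 1: exhaustive search on the decimated (low resolution) data.
    small_t, small_w = decimate(target, step), decimate(window, step)
    coarse = [
        (y, x)
        for y in range(len(small_w) - len(small_t) + 1)
        for x in range(len(small_w[0]) - len(small_t[0]) + 1)
    ]
    _, (cy, cx) = search(small_t, small_w, coarse)

    # Stage 2: full resolution search only near the scaled-up coarse offset.
    max_dy = len(window) - len(target)
    max_dx = len(window[0]) - len(target[0])
    fine = [
        (y, x)
        for y in range(max(0, cy * step - radius), min(max_dy, cy * step + radius) + 1)
        for x in range(max(0, cx * step - radius), min(max_dx, cx * step + radius) + 1)
    ]
    return search(target, window, fine)


# An 8x8 search window and a 4x4 target copied from row 2, column 4 of it.
window = [[8 * r + c for c in range(8)] for r in range(8)]
target = [row[4:8] for row in window[2:6]]
print(hierarchical_search(target, window))  # (0, (2, 4))
```

In this example the coarse stage compares 2×2 blocks at nine offsets and the fine stage compares 4×4 blocks at six offsets, whereas an exhaustive full resolution search would compare 4×4 blocks at all twenty-five offsets.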
As the motion estimation process is compute-intensive, dedicated hardware is typically used to perform this function. The hardware may be coupled tightly to a processor or, alternatively, may be implemented as hard-wired control logic. A tight coupling with the processor allows flexibility in implementing the search process. Flexibility is desirable, as different video applications would benefit from different levels of processing.
Although flexible, the tight coupling approach does not provide as high performance as the hard-wired approach. The hard-wired approach delivers high performance as it minimizes the burden of motion estimation searching on the processor and thus releases processing cycles otherwise needed for the encoding process. However, the performance associated with the hard-wired approach is achieved at the expense of reduced flexibility in handling complex search operations.
An apparatus performs motion estimation based on a reference image and a target image in a flexible, yet high performance manner. The apparatus has a command memory for storing a motion estimation command list segment and a search engine connected to the command memory. The search engine retrieves and processes commands stored in the command list segment. The search engine in turn has a reference window memory containing one or more reference data segments, a target memory containing one or more target data segments, and a data path engine for generating a sum of absolute differences between data in the reference window memory and data stored in the target memory. A result memory receives outputs from the motion estimation search engine in the form of motion estimation result list segments.
In one aspect of the invention, each of the reference window memory, target memory and result memory is double-buffered so that motion estimation operations can proceed concurrently with data transfers associated with the next motion estimation operation.
In another aspect, the apparatus allows reference fetches to be shared by up to four adjacent search targets in a split search command. This is accomplished by fetching a reference window common to the adjacent search targets prior to performing the motion estimation. By reducing unnecessary data transfers over the bus, performance is enhanced while bus contention is reduced.
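The benefit of a shared reference fetch can be estimated with a small calculation. The numbers below are assumptions chosen for illustration (four horizontally adjacent 16×16 targets, a search range of 8 pels in every direction, frame boundaries ignored), and the helper names are hypothetical; the actual saving depends on the target size and search range in use.

```python
def search_window(x, y, size, search_range):
    """Rectangle (left, top, right, bottom) covering one target plus its search range."""
    return (x - search_range, y - search_range,
            x + size + search_range, y + size + search_range)


def union(rects):
    """Smallest rectangle that contains every rectangle in rects."""
    lefts, tops, rights, bottoms = zip(*rects)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def area(rect):
    """Number of pels inside the rectangle."""
    left, top, right, bottom = rect
    return (right - left) * (bottom - top)


# Four adjacent 16x16 targets in one row, each searched over +/- 8 pels.
targets = [(x, 0) for x in (0, 16, 32, 48)]
windows = [search_window(x, y, 16, 8) for x, y in targets]

separate = sum(area(w) for w in windows)   # one reference fetch per target
shared = area(union(windows))              # one common fetch for all four

print(separate, shared)  # 4096 2560
```

Under these assumed parameters the common window holds 2560 pels, compared with 4096 pels when each target fetches its own overlapping window.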
In another aspect of the invention, commands in the command list segment and results in the result list segment share an identical format. The size of each command in the command list and each result in the result list is also identical. The identical format and size allow the result generated by a previous search to be reused as a part of the command of the next hierarchical search.
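The reuse of a result as the next command can be pictured with a single record type that serves both roles. The field names below are hypothetical and do not reflect the actual command layout, which is not given here; the sketch also assumes a 2:1 decimation between levels, so the vector found at the coarse level is doubled for the next finer level, and that target coordinates are kept in full resolution pels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchRecord:
    """One record layout used for both commands and results (hypothetical fields)."""
    target_x: int    # target position in full resolution pels
    target_y: int
    center_dx: int   # command: search center; result: best vector found
    center_dy: int
    level: int       # hierarchy level, 0 being full resolution
    score: int = 0   # result only: best sum of absolute differences


def next_command(result, scale=2):
    """Turn the result of one level into the command for the next finer level."""
    return replace(
        result,
        center_dx=result.center_dx * scale,
        center_dy=result.center_dy * scale,
        level=result.level - 1,
        score=0,
    )


coarse_result = SearchRecord(target_x=32, target_y=16,
                             center_dx=3, center_dy=-1, level=1, score=412)
fine_command = next_command(coarse_result)
print(fine_command.center_dx, fine_command.center_dy, fine_command.level)  # 6 -2 0
```

Because the record type is the same on both sides, only the fields that change between levels are touched; the remaining fields carry over unchanged.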
Advantages of the present invention include the following. The apparatus off-loads much of the motion estimation processing from the processor while allowing the processor to retain full control of critical search parameters, including the number of levels, search regions and range, target size, horizontal and vertical decimation, field versus frame search, among others. Thus, flexibility and high performance are maintained.
The double buffering of the reference window memory, the target memory and result memory results in a performance advantage, as the motion estimation processing can operate on data stored in one set of the double-buffered memory devices while the other set can load data from a system memory. Hence, latencies associated with the system memory access are hidden from the operation of the motion estimation engine.
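The effect of double buffering on total processing time can be modeled with a short timing sketch. The model is an assumption made for illustration: each block needs one load from system memory followed by one search, loads are serialized on the bus, searches are serialized on the engine, and a buffer can be reloaded only after the search that used it has finished. The function names and the time units are hypothetical.

```python
def single_buffered_time(n_blocks, load_time, compute_time):
    """With one buffer, every load must finish before its search starts,
    and the next load cannot begin until that search is done."""
    return n_blocks * (load_time + compute_time)


def double_buffered_time(n_blocks, load_time, compute_time):
    """With two buffers, block i is loaded while block i-1 is being searched."""
    load_done = [0] * n_blocks
    compute_done = [0] * n_blocks
    for i in range(n_blocks):
        prev_load = load_done[i - 1] if i >= 1 else 0
        # The buffer used by block i was last used by block i-2.
        buffer_free = compute_done[i - 2] if i >= 2 else 0
        load_done[i] = max(prev_load, buffer_free) + load_time
        prev_compute = compute_done[i - 1] if i >= 1 else 0
        compute_done[i] = max(load_done[i], prev_compute) + compute_time
    return compute_done[-1]


# Four blocks, 3 time units per load and 5 time units per search.
print(single_buffered_time(4, 3, 5))  # 32
print(double_buffered_time(4, 3, 5))  # 23
```

In this example only the first load is exposed; every later load completes while the engine is still searching the previous block, so the total is one load plus four searches.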
Additionally, the common format in commands and results allows the results of the current search to be used as part of the next search command in a hierarchical search sequence or for motion compensation. The ease of reusing the fields of the current result eliminates unnecessary processing of intermediate search results, thus enhancing performance. Moreover, the split search command capability allows the reference and target fetches to be shared by all targets in the split search command. This feature reduces unnecessary fetches and bus loading by up to a factor of four.