The present invention relates to a method and apparatus for efficient motion estimation of a digital video image wherein memory capacity and processing requirements are reduced. The invention is suitable for use in coding compressed digital video images such as those conforming to the MPEG-2 standard.
Digital transmission of television signals can deliver video and audio services of much higher quality than previous analog techniques. Digital transmission schemes are particularly advantageous for signals that are broadcast by satellite to cable television affiliates and/or directly to home satellite television receivers. Such signals can also be transmitted via a cable television network. Additionally, with the development of digital video storage media such as Digital Video Disks (DVDs), consumers now have the capability to store and retrieve compressed digital video in their homes.
Video compression techniques enable the efficient transmission and storage of digital video signals. Such techniques use compression algorithms that take advantage of the correlation among adjacent pixels in order to derive a more efficient representation of the important information in a video signal. The most powerful compression systems not only take advantage of spatial correlation, but can also utilize similarities among adjacent frames to further compact the data. In such systems, differential encoding is used to transmit only the difference between an actual frame and a prediction of the actual frame. The prediction is based on information derived from a previous frame of the same video sequence.
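The differential encoding idea described above can be illustrated with a minimal sketch. The function names `residual` and `reconstruct` are hypothetical and chosen only for illustration; frames are modeled as plain lists of pixel rows, and no transform, quantization, or entropy coding is shown.

```python
def residual(actual, predicted):
    """Per-pixel difference between the actual frame and its prediction."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]


def reconstruct(predicted, diff):
    """Decoder side: add the transmitted difference back to the prediction."""
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(predicted, diff)]


# Illustrative 2x2 "frames": the prediction is wrong at one pixel only,
# so the difference signal is mostly zeros and is cheap to represent.
actual = [[10, 12], [11, 13]]
predicted = [[10, 11], [11, 13]]
diff = residual(actual, predicted)
```

The better the prediction, the closer the difference signal is to all zeros, which is why accurate motion estimation directly reduces the amount of data to be transmitted.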
In motion compensation systems, motion vectors are derived by comparing a portion (i.e., macroblock) of pixel data from a current frame to similar portions of the previous frame. The previous frame in transmission order can be either previous or subsequent in display order. A motion estimator determines how the corresponding motion vector in the previous frame should be adjusted in order to be used in the current frame. Such systems are very effective in reducing the amount of data to be transmitted.
However, conventional motion estimation techniques are very computationally intensive and memory-intensive. For example, a 16×16 pixel macroblock in a frame which is currently being coded may be compared to a 128×96 or 128×64 search window of a previous or subsequent frame to determine which 16×16 pixel comparison region (e.g., block) in the search window most closely matches the macroblock. The criterion for the best match may be defined for each comparison region by summing the absolute values of the pixel differences, or the squares of the pixel differences, for each region, and selecting the region with the lowest error.
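The conventional exhaustive search described above can be sketched as follows. This is a minimal illustration using the sum of absolute differences as the error measure, not an implementation taken from this document; the names `sad` and `full_search` are hypothetical, and the block and search window are modeled as plain lists of pixel rows.

```python
def sad(block, window, top, left):
    """Sum of absolute pixel differences between the block and the
    equally sized comparison region of the window at (top, left)."""
    return sum(
        abs(block[r][c] - window[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def full_search(block, window):
    """Test every comparison region that fits inside the window and
    return ((top, left), error) for the region with the lowest error."""
    bh, bw = len(block), len(block[0])
    wh, ww = len(window), len(window[0])
    best_pos, best_err = None, None
    for top in range(wh - bh + 1):
        for left in range(ww - bw + 1):
            err = sad(block, window, top, left)
            if best_err is None or err < best_err:
                best_pos, best_err = (top, left), err
    return best_pos, best_err


# Illustrative data: a 12x16 window with distinct pixel values and a
# 4x4 block copied from position (3, 5), so the exact match has error 0.
window = [[r * 16 + c for c in range(16)] for r in range(12)]
block = [row[5:9] for row in window[3:7]]
position, error = full_search(block, window)
```

Replacing `abs(...)` with a squared difference gives the alternative error measure mentioned in the text. Either way, every pixel of the block is visited for every candidate region, which is the source of the cost discussed next.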
Thus, sufficient memory is required to store the data for every pixel in the current macroblock and the search region. Furthermore, 16×16=256 difference calculations must be made for each comparison region, with 128×64=8,192 different comparison regions, for a total of 256×8,192=2,097,152 difference and accumulation calculations per macroblock. Additionally, with 1,350 macroblocks per frame under the NTSC video standard (e.g., 45×30 macroblocks), it can be seen that the processing and memory storage requirements can become very burdensome. This is incompatible with the opposing requirement to provide low-cost video compression hardware, in particular, for consumer applications.
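The arithmetic above can be checked with a short calculation. The variable names are hypothetical; the per-macroblock figures are those given in the text, while the per-frame total is not stated in the text and is obtained here simply by multiplying the stated figures.

```python
# Figures taken from the text.
MACROBLOCK_SIDE = 16                 # 16x16 pixel macroblock
COMPARISON_REGIONS = 128 * 64        # comparison regions, as counted in the text
MACROBLOCKS_PER_FRAME = 45 * 30      # NTSC example from the text

ops_per_region = MACROBLOCK_SIDE * MACROBLOCK_SIDE           # 256
ops_per_macroblock = ops_per_region * COMPARISON_REGIONS     # 2,097,152

# Derived here, not stated in the text: the same cost for a whole frame.
ops_per_frame = ops_per_macroblock * MACROBLOCKS_PER_FRAME   # 2,831,155,200

print(f"{ops_per_macroblock:,} calculations per macroblock")
print(f"{ops_per_frame:,} calculations per frame")
```

At roughly 2.8 billion difference and accumulation calculations per frame, the burden on low-cost hardware follows directly from the stated figures.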
Various schemes have been developed to attempt to reduce the processing and memory storage requirements of motion estimation circuitry. For example, it is possible to reduce the size of the search window. However, this may reduce coding efficiency and/or reduce image quality, in particular for fast motion scenes, where it is likely that the best match region is outside the reduced search window. Alternatively, hierarchical schemes adaptively vary the size of the current macroblock to find the macroblock size which results in the least amount of data being transmitted. However, such multi-pass adaptive schemes have high processing and memory storage requirements.
Additionally, subsampling and averaging schemes may be used to effectively reduce the size of the current macroblock or search window, but this can reduce image quality due to the lost pixel information. Moreover, further computations are required.
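The trade-off noted above can be illustrated with a minimal averaging sketch. This is one possible averaging scheme chosen for illustration, not a scheme specified in this document; the name `average_2x2` is hypothetical, and integer (floor) averaging is an assumption.

```python
def average_2x2(pixels):
    """Shrink a block by replacing each non-overlapping 2x2 group of
    pixels with its integer (floor) average."""
    return [
        [
            (pixels[r][c] + pixels[r][c + 1]
             + pixels[r + 1][c] + pixels[r + 1][c + 1]) // 4
            for c in range(0, len(pixels[0]) - 1, 2)
        ]
        for r in range(0, len(pixels) - 1, 2)
    ]


# A 16x16 macroblock shrinks to 8x8, i.e. one quarter of the pixels.
macroblock = [[r * 16 + c for c in range(16)] for r in range(16)]
reduced = average_2x2(macroblock)

# Lost pixel information: a high-contrast checkerboard and a flat gray
# patch become indistinguishable after averaging.
checkerboard = average_2x2([[0, 255], [255, 0]])
flat_gray = average_2x2([[127, 127], [127, 127]])
```

The reduced block needs fewer comparisons per candidate region, but the averaging step itself is additional computation, and distinct patterns can collapse to the same value, which is the loss of pixel information referred to above.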
Accordingly, it would be desirable to provide a motion estimation system with reduced computational and memory storage requirements. The system should be compatible with both frame (e.g., progressive) and field (e.g., interlaced) mode digital video. The system should reduce computational and memory storage requirements while also maintaining a satisfactory video image. The system should further be compatible with existing video compression techniques such as those conforming to the MPEG-2 standard.
The system should reduce the amount of pixel data from a current macroblock which is used for motion estimation, and should likewise reduce the amount of pixel data from a search window which is used for motion estimation. The required computations and memory storage capacity should be reduced by 50% or more.
The present invention provides a system having the above and other advantages.