The invention relates to video signal compression and, in particular, to a method and apparatus that perform motion estimation in the DCT domain in the course of compressing a video signal.
Generally, a motion picture signal has a high correlation between portions of the motion picture signal representing two consecutive pictures, except when scene changes occur. Thus a motion picture signal can be compressed by determining the differences in the pixel data between consecutive pictures of the motion picture signal, and then encoding these differences. However, if the picture represented by the motion picture signal includes moving portions, the number of difference data can be large. For this reason, block motion estimation is preferably used.
In block motion estimation, the current picture is divided into blocks and the location in another picture, called a reference picture, of a reference block that most closely matches the current block is determined. The differences between the current block and the reference block are then calculated and encoded. The close match between the current block and the reference block ensures that the encoding process generates only a small number of data. The spatial displacement in the reference picture between the reference block and a block corresponding in location to the current block in the current picture is represented by a motion vector. The motion vectors determined for the blocks of the current picture may also be subject to encoding.
Block motion estimation is often the most computationally demanding processing performed in standard motion picture signal compression schemes, such as MPEG and H.263. See, for example, D. le Gall, MPEGxe2x80x94Video Compression Standard for Multimedia Applications, 34 COMMUNICATIONS OF THE ACM, 47-58, (1991 April). Block motion estimation may involve a search for the best of the potential reference blocks located in a certain search range of the reference picture as the reference block. The reference block is the best of the potential reference blocks in the sense that a given distance measure between it and the current block has a minimum value. However, in many applications, the computational complexity of the just-described full search method is too high.
One alternative is to perform a suboptimal search, which may be performed by conducting a partial search in which only a subset of the potential reference blocks is tested, or may be performed iteratively searching at finer resolutions. These approaches significantly reduce the computational burden of motion estimation at the expense of increasing the motion estimation error, and thus increasing the number of data generated when the differences between the current block and the reference block are encoded.
Another alternative with a lower computational burden is to perform a full search in the frequency domain. In this case, the spatial domain cross-correlation function between the current block and the potential reference blocks is calculated in the frequency domain using element-by-element multiplication. The current block and the potential reference blocks may be transformed to the frequency domain using a discrete Fourier transform (DFT), and, in particular, by using the fast Fourier transform (FFT) algorithm. Using a DFT works best for large blocks, which make the DFT complexity small compared to the overall complexity. However, for blocks of the size used in MPEG coders, the complexity of the transformation processing is relatively high, which makes this approach unattractive. Two factors contribute to the high complexity of the DFT in this application: (i) The spatial domain blocks have to be padded with zeroes to turn the cyclic correlation induced by the DFT into linear correlation, and (ii) the DFT generates complex numbers, even though the input blocks and the final outputs are all real numbers.
Two other transform domain methods have recently been suggested. In Frequency-Domain Motion Estimation Using a Complex Lapped Transform, 2 IEEE TRANS. ON IMAGE PROCESSING, 2-17, (1993 January), R. W. Young and N. G. Kingsbury proposed using the complex lapped transform (CLT), which is a Fourier-based overlapping transform that relaxes the need for zero padding. This method has reduced complexity compared to the full search in the spatial domain and the DFT-based frequency domain scheme. However, the CLT motion estimation algorithm does not calculate the exact cross-correlation between the blocks, but rather an approximation to a smooth (filtered) cross-correlation version of it. Young and Kingsbury claim that smoothing decreases the effect of noise on the motion estimation. However, Koc and Liu have pointed out that smoothing also makes the estimation less accurate for blocks containing blur edges.
The second transform-domain motion estimation algorithm is that proposed by U. V. Koc and K. J. R. Liu in DCT-Based Motion Estimation, ISR TR 95-1, University of Maryland; Discrete-Cosine /Sine-Transform Based Motion Estimation, PROC. IEEE INT""L CONF. ON IMAGE PROCESSING (ICIP), III-771-775, (1994 November); Low-Complexity Motion Estimation Scheme Utilizing Sinusoidal Orthogonal Principle, PROC. IEEE WORKSHOP ON VISUAL SIGNAL PROCESSING AND COMMUNICATIONS, 57-62, (1994 September); and Adaptive Overlapping Approach for DCT-Based Motion Estimation, PROC. IEEE INT""L CONF. ON IMAGE PROCESSING (ICIP), (1995 October). Koc and Liu""s algorithm, called DXT, is based on the concept of so-called DCT/DST Pseudo-Phase. The phase-correlation method for motion estimation is translated from the DFT domain to the discrete cosine transform (DCT) and discrete sine transform (DST) domains. Since the DCT and DST are real, they contain no phase component. However, by combining the information encapsulated in both transforms, DXT produces pseudo-phase terms, with properties similar to those of the Fourier phase components. Even though DXT is elegant, and may be useful for large transform coefficient blocks, edge effects prevent DXT from producing accurate results for transform coefficient blocks of the size commonly used in MPEG coders.
Accordingly, what is needed is a frequency-domain motion estimation scheme that calculates the exact linear cross-correlation between the current block and the potential reference blocks of the reference picture, that does not require zero padding to be applied to the input blocks before transformation, that does not involve manipulating complex numbers, and that operates fast enough to be used for real-time motion estimation.
The invention provides a method for determining a motion vector between a current block and a reference block of a reference frame. The reference block is a block located in a search region of the reference picture that most closely matches the current block. The search region is divided into search region blocks. In the method, the current block is orthogonally transformed using DCT/DST transforms of a first type without prior zero padding of the current block to generate a current quadruple of transform coefficient blocks. The current quadruple is processed together with a reference quadruple of transform coefficient blocks generated from four of the search region blocks to generate a quadruple of processed transform coefficient blocks. The quadruple of processed transform coefficient blocks is inversely transformed using inverse DCT/DST transforms of a second type to generate a block of exact cross-correlations between the current block and the search region. The motion vector is determined from the block of exact cross-correlations.
The invention also provides an apparatus for determining a motion vector between a current block and a reference block of a reference frame. The apparatus comprises an orthogonal transform module, a processing module, an inverse transform module and a motion vector determining module. The orthogonal transform module is connected to receive the current block and operates to generate a current quadruple of transform coefficient blocks by orthogonally transforming the current block using DCT/DST transforms of a first type without prior zero padding of the current block. The processing module is connected to receive the current quadruple from the orthogonal transform module, and additionally to receive a reference quadruple of transform coefficient blocks generated from four of the search region blocks. The processing module operates to generate a quadruple of processed transform coefficient blocks from the current quadruple and the reference quadruple. The inverse transform module is connected to receive the quadruple of processed transform coefficient blocks from the processing module, and operates to generate a block of exact cross-correlations between the current block and the search region by inversely transforming the quadruple of processed transform coefficient blocks using inverse DCT/DST transforms of a second type. The motion vector determining module is connected to receive the block of exact cross-correlations and operates to determine the motion vector therefrom.
The inverse DCTs and DSTs of the second type may be of a type different from the DCTs and DSTs of the first type. In a preferred embodiment, the forward transforms are of Type IIe and the inverse transforms are of Type Ie according to Wang""s classifications set forth by S. A. Martucci, in Symmetric Convolution and Discrete Sine and Cosine Transforms, SP-41 IEEE Trans. on Signal Processing, 1038-1051, (1994 May) and K. R. Rao, and P. Yip, DISCRETE COSINE TRANSFORM: ALGORITHMS, ADVANTAGES, APPLICATIONS, Academic Press (1990).
The motion estimation apparatus and method according to the invention calculate the exact cross-correlation between the current block and potential reference blocks in the reference picture using only forward and inverse DCTs and DSTs, thus avoiding the need to perform complex-number arithmetic operations. Moreover, the motion estimation apparatus and method according to the invention do not apply zero padding to the input blocks or the blocks into which the search region is divided before these blocks are subject to forward DCT and DST processing. Thus, the forward DCT and DST processing performed in the motion estimation apparatus and method according to the invention is of minimal complexity.
Since the DCT/DST-based motion estimation method and apparatus according to the invention calculate the exact linear cross-correlation, they do not suffer from edge effects, and produce results that are identical to those obtained by a full search in the spatial domain using the mean squared error (MSE) criterion.
When four frame stores are provided to store the quadruples of transform coefficients generated from the reference picture, the number of computational operations required by the DCT/DST-based motion estimation method and apparatus according to the invention is about 9% of that required by a full search using the MSE criterion, and 24% of that required by a full search using the mean absolute difference (MAD) criterion. When such a buffer is not provided, the above numbers of computational operations change to 10-64% (MSE) and 26-64% (MAD).
Tests show that the DCT/DST-based motion estimation method and apparatus according to the invention operate significantly faster than the above-described transform domain methods for practically the same memory requirements and provide substantially higher accuracy.