The present invention relates to a bi-directional motion estimation method and apparatus thereof in a moving image video codec system having a low bit-rate, and more particularly, to a motion estimation method and apparatus thereof, for compensating a constant domain using only motion vectors, by performing a bi-directional motion estimation in units of filtered objects.
In today's information society, there is a tendency toward larger amounts of information to be received and processed. Accordingly, in order to utilize existing transmission bands more efficiently, data compression is essential. Specifically, with regard to digital video, which requires very large memory capacities, compression enables the efficient storage, detection and transmission of such large quantities of information. Therefore, many video data compression technologies have been developed.
Video data compression technology (coding methods) can be classified as being lossy or lossless according to whether information is lost when the method is employed. This technology can be further divided into an intra-frame coding method, by which the spatial redundancy present in a still image is removed, and an inter-frame coding method, by which the temporal redundancy present in a moving image is removed.
Video data compression can also be classified according to whether the technique is a "first generation" or a "second generation" coding method. In first generation coding, information loss is small and international standards are in the course of being established. First generation coding methods include spatial coding (e.g., pulse code modulation, differential pulse code modulation or delta modulation), transform coding (e.g., Karhunen-Loeve, Fourier, Haar, Hadamard, sine or cosine), hybrid coding which combines the spatial and transform coding techniques, and motion-compensated coding which is used for moving pictures. In second generation coding, specific image characteristics are used in conjunction with the human visual system itself. Second generation coding methods include pyramid coding, anisotropic nonstationary predictive coding, contour-texture oriented techniques, and directional decomposition based coding.
Among the above-mentioned methods, the motion-compensated coding method is used for high-definition television (HDTV) broadcasting systems and standardized schemes of the Moving Picture Experts Group (MPEG). Motion estimation methods used in motion-compensated coding include a pel-recursive algorithm and a block matching algorithm. Even though the pel-recursive algorithm is more precise, block matching is widely used for moving image systems in view of real-time processing and simplified circuit implementation. In using the block matching algorithm, an image is partitioned into blocks having a constant size, e.g., 16×16 or 8×8, and then a motion vector for each block is obtained using a minimum absolute error. The block matching algorithm (disclosed in U.S. Pat. Nos. 5,151,784, 5,060,064 and 4,864,394) is used for the MPEG-1 and MPEG-2 standards.
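The block matching procedure described above can be illustrated by the following minimal sketch. It is not the implementation of the cited patents or of any MPEG encoder. It assumes an exhaustive (full) search over a small window and takes the sum of absolute differences as the "minimum absolute error" criterion. The function names `sad` and `block_match`, the parameters `n` and `search`, and the sign convention (the vector points from a block of the current frame to its best match in the reference frame) are all hypothetical choices made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, n=8, search=4):
    """Full-search block matching (illustrative assumption, see lead-in).

    Frames are lists of rows of integer pixel values with equal size.
    Returns {(bx, by): (dx, dy)} for every n x n block of `cur`.
    """
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that fall outside the reference frame.
                    if by + dy < 0 or by + dy + n > h:
                        continue
                    if bx + dx < 0 or bx + dx + n > w:
                        continue
                    err = sad(cur, ref, bx, by, dx, dy, n)
                    # Prefer the smaller error; break ties by shorter vector.
                    key = (err, abs(dx) + abs(dy))
                    if best is None or key < best[0]:
                        best = (key, (dx, dy))
            vectors[(bx, by)] = best[1]
    return vectors


# Usage: the current frame is the reference frame shifted right by 2 pixels.
ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(16)] for y in range(16)]
vectors = block_match(cur, ref, n=8, search=4)
print(vectors[(8, 8)])  # (-2, 0): the match lies 2 pixels to the left in ref
```

Because one vector is assigned to each whole block, a sketch of this kind has no way to represent two objects moving differently inside the same block.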
In addition, there has been proposed a method by which a spatial coordinate is changed into a frequency coordinate using a fast Fourier transform (FFT) coefficient and then the motion is estimated using the peak distribution of frequency data (see "Motion Detection Using 3-D FFT Spectrum," by Arica Kojima, Norihoko Sakurai and Junichi Kishikami, in '93 ICASSP, April 1993). Similarly, a motion estimation method using a wavelet transform (WT) technique has also been proposed (see "Motion Estimation with Wavelet Transform and the Application to Motion-compensated Interpolation," by C. K. Cheong, K. Aizawa, T. Saito and M. Hatori, in '93 ICASSP, April 1993).
All of these methods have the advantage that they can estimate motion with relative precision for most video sequences. However, with the block matching methods, it is not possible to find a correct motion vector if objects of contrasting motion are contained in a given block. Further, the FFT and WT methods result in a waste of processing time and an overly complex transformation of the spatial coordinates. Also, since the structural variation of a moving object through an image is not considered, object-based motion cannot be estimated precisely.
Due to these drawbacks, the above methods cannot be adopted for the digital video compression of next-generation moving image communication systems such as video telephones, video conferencing and other types of audio-video communication using an integrated services digital network (ISDN).