1. Field of the Invention
The present invention relates to a moving picture compression coding method, and more particularly to a motion estimation method for image compression coding, and a computer-readable recording medium on which a program for implementing the motion estimation method is recorded.
2. Description of the Related Art
Typically, mobile phone service was limited to a voice service in second generation wireless networks because such networks have a narrow bandwidth. Third generation wireless networks such as the IMT (International Mobile Telecommunication)-2000 service were therefore developed to provide users with a moving image service as well as the voice service. With the continuing development of mobile technology, more users expect to see visual information, as distinct from voice information, on their mobile phones, such that next generation mobile communication must incorporate technology for implementing such a moving image service on mobile phones.
Typically, the size of image data is much larger than that of voice or character data. Consequently, the total size of such image data must be compressed down to a prescribed level. Unless the size of the image data is compressed down to the prescribed level, it is impossible to process the image data in real time.
There have been proposed a variety of applications for compressing image data, thereby making it possible for image signals to be stored or transmitted in real time. There are a variety of international standards for image compression, for example, JPEG (Joint Photographic Experts Group) serving as a still image standard, and the moving picture standards MPEG (Moving Picture Experts Group)1 for TV broadcasting, MPEG2 for satellite broadcasting, and MPEG4 for low bit rate transmission. Particularly, the MPEG4 is an international standard for compression coding of digital image and audio data at transfer rates below 64 kbps. It is therefore a compression coding standard for image or video data having an ultralow transfer rate and a high compression rate as compared to the MPEG1 or MPEG2, and is mainly applied to mobile telecommunications.
In this case, such image data compression is achieved by removing redundant data, i.e., data made superfluous by similarities within and between the data representing the image information.
There are various kinds of such data redundancy, i.e., spatial redundancy and stochastic redundancy within one frame, and temporal redundancy between frame images. The spatial redundancy is based on similarities between the values of adjacent pixels within a frame image, indicating that an arbitrary pixel has a value similar to those of its adjacent pixels, and is reduced by a DCT (Discrete Cosine Transform).
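By way of illustration, the spatial compaction performed by the DCT can be sketched as follows. This is a naive 2D DCT-II in Python on a toy 8×8 block; the function name and the sample data are hypothetical and are not taken from any standard. A perfectly uniform (maximally spatially redundant) block collapses into a single DC coefficient, with every other coefficient essentially zero.

```python
import math

def dct_2d(block):
    """Naive 2D DCT-II of an N x N block (illustrative sketch, not optimized)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * s
    return out

# A perfectly "redundant" (constant) 8x8 block compacts into a single DC term.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

A real encoder follows the transform with quantization, which discards the small high-frequency coefficients that carry little visual information.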
The stochastic redundancy is based on the statistical distribution of symbols within a frame image: the symbols are stochastically and unevenly distributed on the frame image, so that some symbols occur far more frequently than others. It is reduced by VLC (Variable Length Coding), which serves as an entropy coding method. The VLC allocates to each symbol a codeword whose bit length is related to the symbol's frequency of occurrence, so that frequent symbols consume fewer bits.
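The principle can be sketched with a toy prefix code; the code table below is purely illustrative and is not an actual MPEG VLC table.

```python
# Toy variable-length (prefix) code table -- illustrative only, not an
# actual MPEG VLC table.  The most frequent symbol gets the shortest codeword.
vlc_table = {"a": "0", "b": "10", "c": "110", "d": "111"}

def vlc_encode(symbols):
    return "".join(vlc_table[s] for s in symbols)

def vlc_decode(bits):
    inverse = {code: sym for sym, code in vlc_table.items()}
    decoded, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:               # prefix property: no codeword is a
            decoded.append(inverse[buf]) # prefix of another, so match greedily
            buf = ""
    return decoded

encoded = vlc_encode("aaabbc")  # frequent 'a' costs only 1 bit per occurrence
```

Because no codeword is a prefix of another, the bit stream can be decoded unambiguously without any separators, and the total bit count is smaller than a fixed-length code whenever the symbol distribution is uneven.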
Finally, the temporal redundancy is based on similarities between a current frame image and a previous frame image, and is reduced by ME/MC (Motion Estimation/Motion Compensation). In more detail, the ME detects a motion vector between a current frame image and a previous frame image, a new frame image is generated by an MC operation using the detected motion vector, and the new frame image is subtracted from the current frame image to remove the data common to the two images, in such a way that the temporal redundancy is reduced.
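The removal of temporal redundancy can be sketched as follows; the frames, the small block size, and the helper names below are hypothetical toy values, not taken from the standard. When the motion vector is correct, the motion-compensated prediction matches the current block exactly and the residual that remains to be coded is all zeros.

```python
N = 4  # toy block size; MPEG4 motion estimation actually uses 16x16 macroblocks

def block_at(frame, top, left):
    """Extract the N x N block whose upper-left corner is (top, left)."""
    return [[frame[top + i][left + j] for j in range(N)] for i in range(N)]

def residual(cur_block, pred_block):
    """Prediction error remaining after motion compensation; only this
    residual (plus the motion vector) needs to be coded further."""
    return [[cur_block[i][j] - pred_block[i][j] for j in range(N)]
            for i in range(N)]

# Toy frames: the current frame is the previous frame shifted right by one
# pixel, i.e., the true motion vector points one pixel to the left.
prev = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[prev[r][max(c - 1, 0)] for c in range(8)] for r in range(8)]

# With the correct motion vector applied, the predicted block matches the
# current block exactly, so the residual is all zeros.
pred = block_at(prev, 2, 1)                 # block displaced by the vector
res = residual(block_at(cur, 2, 2), pred)
```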
FIG. 1 is a view illustrating a block diagram of a conventional system for transmitting/receiving image data. Referring to FIG. 1, the conventional system for transmitting/receiving image data includes an image data transmitter 100 for compressing image data, and transmitting a compression image signal of the image data; a satellite 1 for receiving the image signal from the image data transmitter 100, and transmitting it to a receiver; and an image data receiver 200 for receiving the image signal from satellite 1, decompressing the image signal, and restoring the original image data.
The image data transmitter 100 includes a MPEG source encoder 110 for compressing video data (VD) and audio data (AD), a text encoder 130 for compressing text data (TD), a channel encoder 150 for performing a channel encoding operation on the encoded data of the MPEG source encoder 110 and the text encoder 130 so that channel noise can be detected and corrected at the receiver, and a RF (Radio Frequency) unit 170 for modulating the encoded data of the channel encoder 150 and transmitting it over a first antenna ANT1. In this manner, the signal transmitted from the transmitter 100 is relayed to the receiver 200 over the satellite 1.
The image data receiver 200 includes a baseband processor 210 for demodulating an image signal received from the satellite 1 over a second antenna ANT2, thereby restoring the image signal to baseband image data, a channel decoder 220 for detecting errors in the image data received from the baseband processor 210, correcting the errors, and performing image restoration, and a MPEG decoder 230 for decompressing the compressed image data received from the channel decoder 220 and restoring the original image data. The TD is also restored, since the TD is likewise an input to the channel encoder 150.
FIG. 2 is a view illustrating a detailed block diagram of the MPEG source encoder 110 contained in the image data transmitter 100 shown in FIG. 1. Referring to FIG. 2, the MPEG source encoder 110 includes an 8×8 blocking unit 111 for dividing one frame image Vin into 8×8 blocks, a subtracter 112 for subtracting a generated frame image from a current frame image received from the 8×8 blocking unit 111, an 8×8 DCT (Discrete Cosine Transform) unit 113 for performing a DCT operation on the current frame image received from the subtracter 112, an 8×8 quantizer 114 for quantizing a frame image received from the 8×8 DCT unit 113, a VLC unit 115 for performing a VLC operation on a current frame image received from the 8×8 quantizer 114, an 8×8 dequantizer 117 for dequantizing a frame image received from the 8×8 quantizer 114, an 8×8 IDCT (Inverse Discrete Cosine Transform) unit 118 for performing an IDCT operation on a frame image received from the 8×8 dequantizer 117, an adder 119 for adding a frame image received from the 8×8 IDCT unit 118 and a generated frame image, a frame memory 120 for storing a frame image received from the adder 119, a 16×16 blocking unit 123 for dividing the entirety of one frame of image data into 16×16 blocks, a motion estimator 121 for estimating a motion vector by comparing a pixel value of a current frame image received from the 16×16 blocking unit 123 with that of a previous frame image received from the frame memory 120, a motion compensator 122 for applying a motion vector received from the motion estimator 121 to the frame image of the frame memory 120 and generating a new frame image, and a multiplexer (MUX) 116 for multiplexing image data received from the VLC unit 115 and the motion vector received from the motion estimator 121.
One frame image has any of a variety of resolutions, such as 720×480 and 1920×1080, etc. The motion estimator 121 for estimating a motion vector between a current frame image and a previous frame image treats each such frame image as a group of 16×16 pixel blocks, and processes the frame image in block units.
The motion estimator 121 compares a pixel value of a current frame image F(t) with that of a previous frame image F(t−1) to estimate a moving direction of an image, i.e., a motion vector, and will hereinafter be described in more detail.
FIG. 3 is a view illustrating an exemplary blocked frame image. The 16×16 blocking unit 123 shown in FIG. 2 divides one frame image into 16×16 blocks according to the MPEG4 standard. An example of such 16×16 blocked frame image is shown in FIG. 3. As shown in FIG. 3, an overall frame image is divided into 16×16 blocks, and the overall frame image is denoted by a group of the 16×16 blocks such as B11, B12 . . . B1m, B21, B22 . . . Bn1 . . . Bnm.
FIG. 4 is a view illustrating a current frame image formed by partially blocking the frame image of FIG. 3, and depicts a current frame image F(t) composed of 9 partial blocks, wherein 8 partial blocks are arranged to surround an exemplary block whose motion from the previous frame to the current frame is to be estimated. FIG. 5 is a view illustrating a previous frame image having block B(t−1)22 corresponding to block B(t)22 of the current frame image of FIG. 4. FIG. 5 depicts a previous frame image F(t−1) composed of 9 partial blocks, wherein 8 partial blocks are arranged to surround an arbitrary block B(t−1)22 corresponding to the current block B(t)22 shown in FIG. 4. A dotted line shown in FIG. 5 indicates a search window SRW within which a block matching the current block B(t)22 is sought. The search window SRW is determined depending on the movable range between two successive frame images of a sequence comprising about 24 frames per second. The SRW extends beyond the corresponding block B(t−1)22 by ±(block size/2).
Referring to FIG. 5, the motion estimator 121 shown in FIG. 2 compares the current block B(t)22 of FIG. 4 with each block comprising the SRW of FIG. 5. The motion estimator 121 establishes such a comparison process in the direction from the upper left end to the lower right end, as shown in FIG. 6a. The comparison process of the motion estimator 121 is established along the same direction as the electron gun scanning direction of a cathode ray tube, as shown in FIG. 6b. The motion estimator 121 finds the matching block most similar to the current block in order to estimate a motion vector. This algorithm for finding the most similar matching block is called a block matching algorithm.
In such a block matching algorithm, each pixel value within a block is used as a comparison value between blocks; in more detail, a pixel value of a current block of a current frame image is compared to that of a corresponding block of a previous frame image. The block matching algorithm subtracts a pixel value of a corresponding block from that of the current block, finds the block having the least error (or the least difference), calculates the position vector of that matching block relative to the current block, and thereby estimates a motion vector.
In the meantime, the motion vector is generally estimated in light of two factors: prevention of image quality degradation and high estimation speed.
There are a variety of block matching algorithms, and one of them in particular is called a full search algorithm. This full search algorithm is used as the quality reference for other algorithms because its resultant estimated image has the best image quality. However, 24 bits in total are needed to represent a color pixel because 8 bits are assigned to each of the 3 primary colors (i.e., R, G, and B). Thus, provided that such a full search algorithm is applied to all pixel values, a large number of calculations are required, such that it is impossible to implement a real-time system over a very wide search range.
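The exhaustive character of the full search can be sketched as follows; this is an illustrative Python implementation on hypothetical toy frames, with an assumed small block size and search range, and with boundary handling simplified for brevity.

```python
def full_search(cur, prev, cy, cx, n, rng):
    """Full search sketch: evaluate the SAD of every candidate displacement
    in the [-rng, +rng]^2 window and keep the minimum -- the quality
    reference among block matching algorithms, but also the slowest."""
    def sad(dy, dx):
        # Sum of absolute differences between the current block at (cy, cx)
        # and the previous-frame block displaced by (dy, dx).
        return sum(abs(cur[cy + i][cx + j] - prev[cy + dy + i][cx + dx + j])
                   for i in range(n) for j in range(n))

    best_mv, best_cost = (0, 0), sad(0, 0)
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            cost = sad(dy, dx)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# Hypothetical frames: the current frame equals the previous frame shifted
# down by 1 and right by 2, so the true motion vector is (-1, -2).
prev = [[r * 16 + c for c in range(16)] for r in range(16)]
cur = [[(r - 1) * 16 + (c - 2) for c in range(16)] for r in range(16)]
mv, cost = full_search(cur, prev, 6, 6, 4, 3)
```

Even in this toy setting, a search range of ±rng requires (2·rng+1)² SAD evaluations per block, which is the cost the high-speed algorithms below seek to avoid.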
To solve this problem, there have been recently proposed a variety of high-speed matching algorithms, for example, a method for reducing the number of search points using a UESA (Unimodal Error Surface Assumption), a method using a multi-resolution, a method for moving a reference point using a correlation between adjacent motion vectors and performing a VSR (Variable Search Range) function, and a method for obtaining a calculation value in a block matching process, etc.
Particularly, a representative one of the methods for reducing the number of search points using the UESA is the TSS (Three Step Search) method, which will hereinafter be described in more detail.
FIG. 7 is an exemplary view illustrating a SRW (shown in FIG. 5) of a predetermined size. Referring to FIG. 7, the SRW is extended in all directions around a reference block having 16×16 pixels, such that a search window larger than the reference block is formed. In this case, the SRW is typically extended by (block size/2) in each direction from the reference block, and FIG. 7 indicates this extension as a prescribed value of ±7 for the convenience of description.
FIGS. 8a˜8i are exemplary views illustrating search points within a SRW determined by the conventional TSS method. The TSS method does not determine whether a current block is matched with all blocks of the SRW, but determines whether the current block is matched with one or more of 9 blocks among all blocks as shown in FIGS. 8a˜8i. In this case, center points of these 9 blocks to be searched within the SRW are indicated as reference numerals 1˜9 as shown in FIGS. 8a˜8i. 
FIGS. 9a˜9c are exemplary views illustrating search points adapted for explaining the conventional TSS method. Referring to FIGS. 9a˜9c, search points which may each be a center point of a block in case of using the full search method within a SRW are indicated by small circles, and 9 search points determined by the TSS method are assigned with reference numerals 1˜9, respectively.
The aforementioned TSS method for such motion estimation will hereinafter be described with reference to FIGS. 9a˜9c. 
Firstly, a SAD (Sum of Absolute Differences) is mainly used in the process of finding a desired matching block, in light of calculation complexity and system performance. Here, the SAD is defined as the sum of the absolute differences (or errors) between the pixel values of a current block of a current frame image F(t) and the pixel values of a block corresponding to the current block on a SRW of a previous frame image F(t−1), and is represented as the following Eq. 1:
SAD(x, y) = Σ(i=0 to n−1) Σ(j=0 to n−1) |Ic(k+i, l+j) − Ip(k+x+i, l+y+j)|   [Eq. 1]

where Ic(k+i, l+j) is a pixel value of a current frame image block, Ip(k+x+i, l+y+j) is a pixel value of a corresponding block on a SRW of a previous frame image, ‘x’ and ‘y’ are coordinates within the SRW, ‘k’ and ‘l’ are coordinates within a corresponding block, and ‘n’ is the size of a matching block.
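Eq. 1 can be transcribed directly into code; the (row, column) indexing convention and the toy frames below are assumptions made for illustration.

```python
def sad(cur, prev, k, l, x, y, n):
    """Eq. 1 transcribed directly: sum over the n x n block of absolute
    differences between the current block at (k, l) and the block on the
    previous frame's search window displaced by the candidate (x, y)."""
    return sum(abs(cur[k + i][l + j] - prev[k + x + i][l + y + j])
               for i in range(n) for j in range(n))

# Toy previous frame; the current frame is identical (no motion).
prev = [[r * 10 + c for c in range(8)] for r in range(8)]
cur = [row[:] for row in prev]

zero = sad(cur, prev, 2, 2, 0, 0, 4)   # perfect match: SAD is 0
off = sad(cur, prev, 2, 2, 1, 0, 4)    # one-row mismatch: each of the
                                       # 16 pixels differs by 10
```

A candidate displacement whose SAD is zero (or minimal) identifies the matching block, and hence the motion vector.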
Also, the TSS method determines a motion vector through first to third search processes using the UESA, under which the error value at each search point is assumed to increase monotonically with its distance from the global minimum. Firstly, the first search process calculates the SADs at the nine search points 1˜9 shown in FIG. 9a. When the search point of minimum SAD calculated by the first search process is point ‘2’ of FIG. 9a, the second search process calculates the SADs at the nine search points 21˜29 of FIG. 9b, centered on search point ‘2’. Subsequently, when the search point of minimum SAD calculated by the second search process is point ‘22’ of FIG. 9b, the third search process calculates the SADs at nine search points centered on search point ‘22’, and the search point having the minimum SAD is thereby determined as the motion vector.
The aforesaid TSS method searches 9 search points in each of the first to third search processes. In more detail, all 9 search points are searched in the first search process, the 8 search points other than the one point already calculated by the first search process are searched in the second search process, and the 8 search points other than the one point already calculated by the second search process are searched in the third search process. Therefore, the TSS method searches 25 search points in total. However, considering that 24 bits are needed to represent the value of one pixel, the TSS method still has an excessively long search time, such that it is impossible to perform real-time processing of images using software.
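The three search processes above can be sketched as follows; the frame contents, the small block size, and the function names are hypothetical, and boundary checks are omitted for brevity. The step sizes 4, 2, 1 cover the ±7 search range of FIG. 7, and exactly 9 + 8 + 8 = 25 SADs are evaluated.

```python
def three_step_search(cur, prev, cy, cx, n=4):
    """TSS sketch: examine 9 points around the current best point with step
    sizes 4, 2, 1 (a +/-7 range), keeping the minimum-SAD point after each
    round -- 25 SAD evaluations instead of 225 for a full +/-7 search."""
    def sad(dy, dx):
        return sum(abs(cur[cy + i][cx + j] - prev[cy + dy + i][cx + dx + j])
                   for i in range(n) for j in range(n))

    center = (0, 0)
    best_cost = sad(0, 0)
    evaluated = 1
    for step in (4, 2, 1):
        best = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dy == 0 and dx == 0:
                    continue  # the center was evaluated in an earlier round
                cand = (center[0] + dy, center[1] + dx)
                cost = sad(*cand)
                evaluated += 1
                if cost < best_cost:
                    best_cost, best = cost, cand
        center = best  # recenter the next, finer round on the minimum so far
    return center, best_cost, evaluated

# Hypothetical frames: the current frame is the previous frame shifted left
# by 3 pixels, so the true motion vector is (0, -3).
prev = [[r * 20 + c for c in range(32)] for r in range(32)]
cur = [[r * 20 + (c - 3) for c in range(32)] for r in range(32)]
mv, cost, evaluated = three_step_search(cur, prev, 12, 12)
```

Note that the coarse-to-fine refinement is only guaranteed to find the global minimum when the UESA holds; on an error surface with multiple local minima the TSS may converge to the wrong vector, which is the accuracy trade-off paid for the reduced search count.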