1. Field of the Invention
The present invention relates to a method for encoding a video signal through discrete cosine transform and motion estimation and an apparatus therefor, and more particularly, to a method for encoding a video signal for simplifying motion estimation by referring to discrete cosine transform coefficients and an apparatus therefor.
2. Description of the Related Art
In general, a video signal is compressed by two methods: intraframe compression and interframe compression. Intraframe compression compresses the information within a single video frame, and the discrete cosine transform (DCT) is one intraframe compression technique. The DCT removes the correlation among data through a two-dimensional transform: an input frame is divided into blocks, and the image of each block is transformed from the spatial domain to the frequency domain. The transformed data tend to cluster on one side, in the low-frequency region. Spatial redundancy is then removed by having a quantizer quantize only the clustered data.
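The energy compaction and quantization described above can be illustrated with a short sketch. It assumes an 8×8 block, a synthetic brightness ramp as the image content, and a single uniform quantizer step of 16; the helper names `dct2` and `quantize` are illustrative and not taken from any standard. The forward transform uses the same C(u) normalization as the N×N inverse transform used for decoding, so the total energy of the block is preserved and can be compared before and after the transform.

```python
import math

N = 8  # illustrative block size

def c(k):
    # Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block):
    """Forward N x N DCT, evaluated directly from the double-sum definition."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = (2 / n) * c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    # Uniform scalar quantizer: small coefficients collapse to zero
    return [[int(round(v / step)) for v in row] for row in coeffs]

# A smooth 8x8 block (a brightness ramp), typical of natural image content
block = [[16 * x + 8 * y for y in range(N)] for x in range(N)]
F = dct2(block)

total = sum(p * p for row in block for p in row)          # pixel-domain energy
low = F[0][0] ** 2 + F[1][0] ** 2 + F[0][1] ** 2          # energy in the 3 lowest frequencies
print(round(F[0][0], 6))
print(round(low / total, 4))
levels = quantize(F, 16)
print(sum(1 for row in levels for q in row if q != 0), "of", N * N, "coefficients survive")
```

For this smooth block nearly all of the energy lands in the DC coefficient and the two lowest-frequency AC coefficients, and only a handful of quantized levels remain nonzero.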
Interframe compression removes temporal redundancy by encoding an image on the basis of differences in corresponding pixel values between consecutive video frames. In temporally consecutive images, people or objects typically move in the central part of the screen while the background stays unchanged, and this characteristic can be exploited to remove temporal redundancy. That is, whether the screen does not change at all or changes only partly, the amount of data can be significantly reduced by not encoding the similar portions and instead referring to the previous image. This technique is referred to as motion estimation (ME). ME is used for interframe compression in almost all image encoding standards, including H.261, MPEG-1, MPEG-2, H.263, and Moving Picture Experts Group (MPEG)-4.
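A minimal sketch of the frame-difference idea, assuming two tiny synthetic 8×8 frames in which a 2×2 object shifts one pixel to the right over a static background. All pixel values and the helper names `make_frame` and `frame_difference` are hypothetical. Plain differencing already leaves most pixels at zero; motion estimation goes further by compensating for the displacement itself.

```python
def make_frame(obj_row, obj_col, size=8, bg=50, fg=200):
    """Hypothetical frame: uniform background with a 2x2 bright object."""
    frame = [[bg] * size for _ in range(size)]
    for r in range(obj_row, obj_row + 2):
        for c in range(obj_col, obj_col + 2):
            frame[r][c] = fg
    return frame

def frame_difference(cur, prev):
    # Pixel-by-pixel difference between consecutive frames
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, prev)]

prev = make_frame(2, 2)
cur = make_frame(2, 3)          # the object moved one pixel to the right
diff = frame_difference(cur, prev)
nonzero = sum(1 for row in diff for d in row if d != 0)
print(nonzero, "of", len(diff) * len(diff[0]), "difference values are nonzero")
```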
FIG. 1 illustrates a conventional encoding system 100 for compressing a digital video signal, for example, an MPEG-2 image encoding system. A conventional method for compressing an image through the DCT and ME will now be described with reference to FIG. 1.
An input video signal in the form of frames is input to a frame memory 101. Each frame is stored in the frame memory 101 as consecutive blocks of pixel data so that it can be processed block by block. A block commonly has a size of 8×8 to 16×16 pixels. For convenience of explanation, an 8×8-pixel block will be described; however, the present invention can also be applied to blocks of other sizes.
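Storing a frame as consecutive blocks amounts to partitioning it in raster order. The sketch below assumes frame dimensions that are multiples of the block size and represents a frame as a list of rows; `split_into_blocks` and the sample frame are hypothetical.

```python
def split_into_blocks(frame, n=8):
    """Return the n x n blocks of a frame in raster (left-to-right, top-to-bottom) order."""
    h, w = len(frame), len(frame[0])
    if h % n or w % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, h, n):
        for left in range(0, w, n):
            blocks.append([row[left:left + n] for row in frame[top:top + n]])
    return blocks

# A hypothetical 16-row x 32-column frame of 8-bit samples
frame = [[(r * 32 + c) % 256 for c in range(32)] for r in range(16)]
blocks = split_into_blocks(frame)
print(len(blocks))   # 2 block rows x 4 block columns
```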
A DCT 103 performs a discrete cosine transform, block by block, on the input video signal read from the frame memory 101 and generates DCT coefficients. A quantizer 105 quantizes the generated DCT coefficients. A bit rate controller 117 controls the bit rate by determining the quantization table that the quantizer 105 uses, so that a target transmission bit rate is met. The quantized DCT coefficients are zigzag scanned and input to a variable length coder 107, which converts them into variable length coded data and outputs the data as a continuous encoded bit stream through a bit stream generator (not shown).
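The zigzag scan and the hand-off to the variable length coder can be sketched as follows. As simplifying assumptions, a single scalar quantizer step stands in for the quantization table chosen by the bit rate controller 117, and the scanned levels are grouped into (run, level) pairs, the kind of symbol an MPEG-style variable length coder maps to codewords; the codeword tables themselves are not shown, and the coefficient values are made up.

```python
def zigzag_order(n=8):
    """(row, col) visiting order of the conventional zigzag scan."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col indexes an anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                    # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def quantize(coeffs, step):
    return [[int(round(v / step)) for v in row] for row in coeffs]

def run_level_pairs(levels, order):
    """(zero-run, nonzero level) pairs; trailing zeros are left to an end-of-block marker."""
    pairs, run = [], 0
    for r, c in order:
        v = levels[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[2][0], coeffs[5][5] = 336.0, -25.0, 9.0, 3.0
levels = quantize(coeffs, 16)
pairs = run_level_pairs(levels, zigzag_order())
print(pairs)   # [(0, 21), (0, -2), (1, 1)]
```

Because the scan visits low frequencies first, the nonzero levels are packed at the start of the sequence and the long tail of zeros costs almost nothing.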
The output of the quantizer 105 is also input to an inverse quantizer 109. The DCT coefficients output from the inverse quantizer 109 are inverse discrete cosine transformed by an inverse discrete cosine transform (IDCT) unit 111 and become recovered pixel data in units of blocks. The recovered pixel data are stored in a frame memory 113. All blocks of a video frame are sequentially recovered and stored in the frame memory 113, and the recovered image frame stored there is used as the reference frame for ME.
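The feedback path through the inverse quantizer 109 and the IDCT 111 can be mimicked in a few lines. This sketch assumes an orthonormal 8×8 DCT written in matrix form (F = D f Dᵀ), a uniform quantizer step instead of a quantization table, and a synthetic ramp block; the function names are hypothetical. It shows that the recovered block differs from the input, which is why the encoder keeps the recovered frame rather than the original as its ME reference.

```python
import math

N = 8

def dct_matrix(n):
    # Row k holds the orthonormal basis vector sqrt(2/n) * C(k) * cos((2x+1)k*pi / 2n)
    return [[math.sqrt(2 / n) * (1 / math.sqrt(2) if k == 0 else 1.0)
             * math.cos((2 * x + 1) * k * math.pi / (2 * n)) for x in range(n)]
            for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

D = dct_matrix(N)
Dt = transpose(D)

def dct2(block):          # F = D f D^T
    return matmul(matmul(D, block), Dt)

def idct2(coeffs):        # f = D^T F D
    return matmul(matmul(Dt, coeffs), D)

def local_decode(block, step):
    """Encoder feedback path: quantizer -> inverse quantizer -> IDCT."""
    levels = [[round(v / step) for v in row] for row in dct2(block)]   # quantizer 105
    dequantized = [[q * step for q in row] for row in levels]          # inverse quantizer 109
    return idct2(dequantized)                                          # IDCT 111

block = [[16 * x + 8 * y for y in range(N)] for x in range(N)]
recon = local_decode(block, step=16)
errors = [recon[x][y] - block[x][y] for x in range(N) for y in range(N)]
rms = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"max |error| = {max(abs(e) for e in errors):.2f}, RMS error = {rms:.2f}")
```

Because the transform is orthonormal, the RMS pixel error can never exceed half the quantizer step.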
After all blocks of a first video frame have been processed by the encoding system 100, a second video frame is input to the encoding system 100. A motion estimator 115 searches a search region of the reference frame stored in the frame memory 113 for the region most similar to a first macro block (MB) of the second frame. In general, the search region includes a plurality of candidate MBs. The motion estimator 115 moves a reference region of the same pixel size as the MB vertically and horizontally within the search region in units of half pixels and compares the pixels of the MB with those of the reference region. The MB commonly has a size of 8×8 or 16×16 pixels. Any of various common search algorithms may be used, such as the full search block matching algorithm (FBMA), three step search (TSS), diamond search, and hierarchical motion estimation or block matching techniques. A motion vector (MV) is then determined, representing the relationship between the position of the most similar reference region found in the reference frame and the position of the MB of the second frame.
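A full search block matching (FBMA) sketch using the sum of absolute differences (SAD) as the matching criterion. For brevity it searches integer-pixel positions only (half-pixel refinement is omitted), assumes a ±4-pixel search range and 8×8 MBs, and uses a synthetic textured frame; these parameters and the helper names are illustrative.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between a block and a same-sized region of ref."""
    n = len(block)
    return sum(abs(block[i][j] - ref[top + i][left + j]) for i in range(n) for j in range(n))

def full_search(cur_frame, ref_frame, mb_top, mb_left, n=8, search=4):
    """Exhaustive integer-pel block matching; returns ((dx, dy), best_sad)."""
    block = [row[mb_left:mb_left + n] for row in cur_frame[mb_top:mb_top + n]]
    h, w = len(ref_frame), len(ref_frame[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = mb_top + dy, mb_left + dx
            if top < 0 or left < 0 or top + n > h or left + n > w:
                continue                      # candidate lies outside the reference frame
            cost = sad(block, ref_frame, top, left)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

def pattern(r, c):
    return (7 * r + 13 * c) % 256             # textured synthetic image

ref = [[pattern(r, c) for c in range(32)] for r in range(32)]
# Current frame: pixel (r, c) shows the content of reference pixel (r - 1, c + 2)
cur = [[pattern(r - 1, c + 2) for c in range(32)] for r in range(32)]
mv, cost = full_search(cur, ref, mb_top=8, mb_left=8)
print(mv, cost)
```

The cost grows with the square of the search range times the MB area, which is the burden that fast searches such as TSS or diamond search try to reduce.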
An adder 119 obtains the difference between the first MB of the second frame and the most similar reference region of the reference frame. The difference is encoded by the DCT 103, the quantizer 105, and the variable length coder 107 together with the MV. Although the difference and the MV are described here as being obtained by separate apparatuses and separate processes, they can also be obtained in a single process. The quantized difference is also passed through the inverse quantizer 109 and the IDCT 111 and is stored in the frame memory 113 as recovered pixel data for the ME of the next frame. These processes are applied sequentially to all blocks of the second frame.
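The role of the adder 119, and the matching step on the decoding side, can be sketched as below. The motion vector, block position, and frame contents are hypothetical, and the residual is shown before the DCT and quantization so that reconstruction is exact.

```python
def extract(frame, top, left, n):
    return [row[left:left + n] for row in frame[top:top + n]]

def residual(cur_block, pred_block):
    # Role of adder 119: current MB minus the motion-compensated prediction
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur_block, pred_block)]

def reconstruct(pred_block, res_block):
    # Decoder side: prediction plus (decoded) residual
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred_block, res_block)]

ref = [[(3 * r + 5 * c) % 64 for c in range(16)] for r in range(16)]
mv = (1, 2)                                          # (dx, dy), assumed output of the motion estimator
mb_top, mb_left, n = 4, 4, 4
pred = extract(ref, mb_top + mv[1], mb_left + mv[0], n)
cur_block = [[p + 2 for p in row] for row in pred]   # same content, slightly brighter
res = residual(cur_block, pred)
print(res[0])
```

Only the small residual and the MV need to be encoded; the decoder rebuilds the MB from its own copy of the reference frame.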
As mentioned above, the reference frame used for ME is not the original image frame but a frame recovered by decoding the already encoded, that is, quantized, DCT coefficients. This minimizes the mismatch between the encoding system and the decoding system: the decoding system, which receives the encoded image data, can reconstruct its reference frames only through these same processes, so the encoding system uses the same recovered data. The N×N inverse discrete cosine transform used for the decoding process is as follows.
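Why the encoder predicts from recovered data rather than the original can be seen in a one-dimensional analogy (a scalar DPCM loop, not the block-based system of FIG. 1). Each sample is predicted from a reference, only the quantized residual is sent, and the decoder accumulates what it receives. The sample values, the step of 1.0, and the function names are arbitrary assumptions.

```python
def quantize(r, step):
    return step * round(r / step)

def transmit(samples, step, closed_loop):
    """Send quantized prediction residuals; return what the decoder reconstructs."""
    enc_ref = 0.0      # encoder's prediction reference
    dec_ref = 0.0      # decoder's reconstruction (its only possible reference)
    decoded = []
    for s in samples:
        q = quantize(s - enc_ref, step)   # the only value sent to the decoder
        dec_ref += q
        decoded.append(dec_ref)
        # Closed loop: the encoder knows q, so it can track dec_ref exactly.
        # Open loop: the encoder predicts from the original sample instead.
        enc_ref = dec_ref if closed_loop else s
    return decoded

samples = [0.4 * i for i in range(50)]     # slowly brightening pixel over 50 frames
for closed in (False, True):
    out = transmit(samples, step=1.0, closed_loop=closed)
    err = max(abs(a - b) for a, b in zip(samples, out))
    print("closed" if closed else "open", "loop, max error:", round(err, 2))
```

With the open loop the decoder drifts further from the source with every frame, whereas the closed loop keeps the error within half a quantizer step.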
$$ f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2x+1)u\pi}{2N}\,\cos\frac{(2y+1)v\pi}{2N} $$
wherein
$$ C(u),\,C(v) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} \qquad \text{[Equation 1]} $$
and F(u,v) is a reference frame function that provides decoded DCT coefficients from the decoding of the previously encoded (quantized) DCT coefficients, and u and v are coordinates in the DCT block.
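Equation 1 can be evaluated literally, or, because the cosine kernel is separable, as N one-dimensional inverse transforms along u followed by N along v. The sketch below implements both (the function names and the test coefficients are hypothetical) and checks that they agree; the separable form needs about 2N³ multiply-accumulates per block, in line with an O(N³) cost.

```python
import math

def C(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def idct2_direct(F):
    """Evaluate Equation 1 term by term for every pixel (x, y)."""
    n = len(F)
    f = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (C(u) * C(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            f[x][y] = (2 / n) * s
    return f

def idct1(coeffs):
    """1-D inverse DCT with the matching normalization (sqrt(2/N) per dimension)."""
    n = len(coeffs)
    return [math.sqrt(2 / n) * sum(C(k) * coeffs[k]
                                   * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                                   for k in range(n))
            for x in range(n)]

def idct2_separable(F):
    """Same result as Equation 1: transform along u for each v, then along v."""
    n = len(F)
    partial = [idct1([F[u][v] for u in range(n)]) for v in range(n)]  # partial[v][x]
    return [idct1([partial[v][x] for v in range(n)]) for x in range(n)]

F = [[((3 * u - 5 * v) % 11) - 5.0 for v in range(8)] for u in range(8)]
a, b = idct2_direct(F), idct2_separable(F)
print(max(abs(a[x][y] - b[x][y]) for x in range(8) for y in range(8)))
```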
[Equation 1] has a computational complexity of O(N^3). Because all of the quantized DCT coefficients must be inverse discrete cosine transformed by [Equation 1], more operations are required than when the original image frame is used as the reference frame, and the efficiency of the encoding method deteriorates. In addition, because the motion estimator 115 compares all of the pixels of the current MB with the search region of the reference frame, the time and the number of operations required for motion estimation increase.
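To give a sense of scale for the search cost, the sketch below counts the absolute differences a full search performs. The parameters are assumptions chosen for illustration: 16×16 MBs, a ±16-pixel integer search range, and a CIF frame of 352×288 pixels; half-pixel refinement and border clipping are ignored.

```python
def full_search_abs_diffs(mb=16, search=16):
    """Absolute differences needed to full-search one macroblock."""
    candidates = (2 * search + 1) ** 2   # every integer displacement in [-search, +search]^2
    return candidates * mb * mb           # one |a - b| per pixel per candidate

def abs_diffs_per_frame(width, height, mb=16, search=16):
    macroblocks = (width // mb) * (height // mb)
    return macroblocks * full_search_abs_diffs(mb, search)

per_mb = full_search_abs_diffs()
per_frame = abs_diffs_per_frame(352, 288)
print(per_mb, per_frame)   # 278784 110398464
```

Under these assumptions a single frame already requires about 1.1 × 10⁸ absolute differences, before any DCT or IDCT work.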
A portable system such as a mobile terminal has limited processing capability and limited power supply capacity, so the large number of operations required for ME is a heavy burden for it. At the same time, transmitting a moving picture over a radio channel generates a very large amount of data, while the usable frequency band is limited. To transmit such large moving picture data over a limited frequency band, it is essential to reduce the amount of transmitted data using ME. Therefore, it is necessary both to compress moving picture data using ME, so as to reduce the amount of transmitted data, and to reduce the excessive number of operations that ME requires.