1. Field of the Invention
The present invention relates to an apparatus for compression-encoding a moving picture on the basis of H.264, etc., and more particularly, to an apparatus and method for compression-encoding a moving picture directed to preventing image quality deterioration while minimizing the amount of calculation performed for rate-distortion optimization (RDO).
2. Discussion of Related Art
Digital image data is used in video conferencing, high-definition televisions (HDTVs), video-on-demand (VOD) receivers, moving picture experts group (MPEG) image-supporting personal computers, video game systems, digital ground wave broadcast receivers, digital satellite broadcast receivers, cable TVs (CATVs), and so on. However, characteristics of images and conversion of an analog signal into a digital signal yield a large amount of digital image data. Thus, the digital image data is not used as is but rather is compressed by an efficient compression method.
Three main compression methods are used to compress digital image data. These are a method of reducing temporal redundancy, a method of reducing spatial redundancy, and a compression method using stochastic properties of generation codes. A representative method of reducing temporal redundancy is a motion estimation and compensation method, which is used in most moving picture compression standards such as MPEG, H.263, etc.
The motion estimation and compensation method is used to search for the portion of a previous or next reference screen that is most similar to a particular portion of a current screen, and transmit only difference components between the two portions. In the motion estimation and compensation method, the more precisely motion vectors are searched for, the less difference component data there is to transmit, thus providing a way to efficiently reduce the amount of data. However, searching for the most similar portion of the previous or next screen requires a considerably long estimation time and a large amount of calculation.
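The search described above can be sketched as an exhaustive integer-pixel block-matching search using SAD. This is a minimal illustration only, not the method of any particular encoder; the function names `sad` and `full_search`, the list-of-rows frame layout, and the square search window are assumptions made for the example.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (cx, cy) and the size x size block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, cx, cy, size, search_range):
    """Try every integer displacement within +/- search_range and return
    ((dx, dy), sad) for the candidate with the smallest SAD.
    Candidates that fall outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

The two nested displacement loops make the cost of the search grow with the square of the search range, which is the calculation burden the paragraph above refers to.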
An H.264 codec performs the search using a cost function based on RDO instead of using a conventional sum of absolute difference (SAD)-based method. The cost function employed in H.264 uses a rate-distortion (RD) cost calculated by adding the number of encoded coefficients multiplied by a Lagrangian multiplier to a conventional SAD value. Here, the number of encoded coefficients is replaced with a value proportional to a quantized coefficient and then multiplied by a fixed Lagrangian multiplier to determine a compensation cost in order to perform the search.
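As a rough sketch of the cost function described above, the RD cost can be written as the SAD plus a rate term weighted by the Lagrangian multiplier. The names `rd_cost` and `best_candidate`, and the representation of a candidate as a (name, SAD, rate estimate) tuple, are hypothetical and chosen only for illustration.

```python
def rd_cost(sad_value, rate_estimate, lagrange_multiplier):
    """RD cost = distortion (SAD) + lambda * rate estimate."""
    return sad_value + lagrange_multiplier * rate_estimate


def best_candidate(candidates, lagrange_multiplier):
    """Return the name of the candidate with the lowest RD cost.
    Each candidate is a (name, sad_value, rate_estimate) tuple."""
    name, _, _ = min(
        candidates,
        key=lambda c: rd_cost(c[1], c[2], lagrange_multiplier),
    )
    return name
```

With a multiplier of zero the selection reduces to a pure SAD comparison; a larger multiplier favors candidates that need fewer bits even if their SAD is somewhat higher.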
In order to simultaneously obtain high compression efficiency and high image quality, conventional moving picture encoding is performed in large block units of 16×16 or 8×8, whereas H.264 moving picture encoding selects the mode having the lowest cost among eight different block modes.
However, in order to determine eight different block modes, various encoding operations as well as integer-pixel and sub-pixel searches all must be performed separately for each mode. Consequently, more calculations are required and more time is taken in comparison with a conventional moving picture encoding algorithm. In order to embody a moving picture encoding apparatus for an Internet protocol (IP)-TV, it is necessary to be able to reduce calculation time by minimizing calculations for determining a block mode without deteriorating image quality.
FIG. 1 is a block diagram showing the constitution of a general H.264 moving picture encoder. Among the blocks, a motion estimator comprises an integer-pixel estimator for estimating an integer-pixel-specific motion vector, and a sub-pixel estimator for estimating optimum half-pixel and quarter-pixel-specific motion vectors on the basis of the estimated integer-pixel-specific motion vector.
The illustrated conventional H.264 encoder comprises a motion estimation (ME) module 22, a motion compensation (MC) module 24, an intra mode estimation (IME) module 32, an intra prediction (IP) module 34, a dequantization (IQ) module 58, an inverse discrete cosine transform (IDCT) module 56, an entropy encoding module 64, a deblocking filter 92, frame memories 12, 14 and 18, and so on.
The motion estimation module 22 performs a function of detecting a motion vector from several reference images and a macroblock mode determination function of searching for the optimum macroblock type having the minimum bit rate and errors. The motion compensation module 24 functions to obtain a compensation image from a reference image according to the motion vector and macroblock mode type detected by the motion estimation module 22. In FIG. 1, the motion compensation module 24 is limited to obtaining differences between two compared images, and the subsequent process for obtaining a compensation image is continued by a discrete cosine transform (DCT) module 52 and a quantization module 54.
In intra-coding of a macroblock, the intra mode estimation module 32 functions to select the optimum intra prediction mode by performing prediction on adjacent blocks. The intra prediction module 34 functions to obtain an intra-predicted compensation image from previously coded adjacent blocks using the selected intra prediction mode. The intra mode estimation module 32 performs a similar function to the motion estimation module 22 in inter mode and thus is referred to as a motion estimation module in intra mode. Likewise, the intra prediction module 34 performs a similar function to the motion compensation module 24 in inter mode and thus is referred to as a motion compensation module in intra mode.
The DCT module 52 performs 4×4 DCT, the quantization module 54 quantizes coefficients transformed by the DCT module 52, and the IDCT module 56 and the dequantization module 58 respectively perform the reverse of operations performed by the DCT module 52 and the quantization module 54.
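The transform and quantization round trip described above can be illustrated with a floating-point sketch. It uses a textbook orthonormal 4×4 DCT-II and a single uniform quantization step; an actual H.264 encoder uses an integer approximation of the transform with the scaling folded into quantization, so the code below is an assumption-laden illustration rather than the standard's arithmetic. All function names are hypothetical.

```python
import math

N = 4  # block size assumed for this sketch


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


A = dct_matrix()


def forward(block):
    """2-D forward transform: A * X * A^T."""
    return matmul(matmul(A, block), transpose(A))


def inverse(coeffs):
    """2-D inverse transform: A^T * Y * A."""
    return matmul(matmul(transpose(A), coeffs), A)


def quantize(coeffs, qstep):
    """Uniform quantization: divide by the step and round to an integer level."""
    return [[round(c / qstep) for c in row] for row in coeffs]


def dequantize(levels, qstep):
    """Reverse of quantize, up to the rounding loss."""
    return [[level * qstep for level in row] for row in levels]
```

Without quantization, `inverse(forward(block))` reproduces the block up to floating-point error; the rounding in `quantize` is the only lossy step in this sketch.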
The operation result Dn′ of the IDCT module 56 is a restored image that has not yet passed through the deblocking filter 92. The entropy encoding module 64 performs entropy coding using bit allocation based on the probability of the occurrence of quantized DCT coefficients. The deblocking filter module 92 functions to improve the quality of the restored images obtained through the IDCT module 56, and the improved-quality images are stored in the frame memory module 18 to be used as references for subsequently input images.
Unlike conventional MPEG-1, MPEG-2 and MPEG-4 standards, the H.264 standard has several reference images, and a plurality of previously encoded images as well as an immediately previous frame can be used as the reference images. This is called multiple reference frames.
Similar to the conventional MPEG standards, the H.264 standard performs encoding in slices including an I_slice, a P_slice, a B_slice, an SI_slice and an SP_slice. For convenience of description, a slice can be regarded as a single frame. That is, the I_slice, the P_slice and the B_slice are almost the same as an I_picture, a P_picture and a B_picture of the conventional MPEG standards.
The H.264 standard defines an inter mode representing each macroblock making up currently input frame data by a motion vector and a difference value with respect to previous frame data, and an intra mode representing each macroblock by a motion vector and a difference value with respect to the same frame data. According to macroblock size, P16×16, P16×8, P8×16, P8×8, P8×4, P4×8 and P4×4 modes exist in the inter mode, and I16×16 and I4×4 modes exist in the intra mode. An H.264 encoding apparatus selects the mode that provides high compression efficiency, that is, the mode having a low cost.
Here, the method of selecting an optimal block mode is an RDO technique. A motion estimation and mode decision algorithm using RDO can improve the bit rate by 5 to 10% at the cost of a 30 to 40% reduction in encoding speed.
Therefore, in general, the H.264 standard performs motion estimation and compensation for all the modes in the sequence illustrated in FIG. 2 (alternatively, an I16×16 mode may be processed after B-slice check), calculates compensation costs, and compares the calculated compensation costs to determine an optimum mode for received frame data.
FIG. 3 is a conceptual diagram illustrating an encoding process performed by an H.264 moving picture encoding apparatus using conventional RDO. All the blocks illustrated in FIG. 3 can be embodied as separate pieces of hardware, but this increases hardware load. Generally, at least two of blocks 1200 to 1700 for mode decision are embodied as sequential operations of one piece of hardware. In this sense, FIG. 3 is merely a conceptual diagram. In FIG. 3, a motion estimation block and a motion compensation block may have structures shown in FIGS. 4 and 5, and an encoding block 1100 may have the structure of the H.264 encoder shown in FIG. 1.
In FIG. 3, a B_slice check block 1200 is directed to calculating an estimation value in a skip mode, in which a previous frame and a next frame are processed in divided blocks whose motion vector is the zero vector. Three inter mode prediction blocks 1300, 1400 and 1500 respectively operating according to P16×16, P16×8/P8×16 and P8×8 or less modes are directed to calculating inter mode-specific prediction bit values from the continuous parts of a moving picture. Two intra prediction blocks 1600 and 1700 respectively operating according to I16×16 and I4×4 modes are directed to calculating intra mode-specific prediction bit values from the non-continuous parts of a moving picture. Alternatively, the P8×8 mode may be further classified into P8×8, P8×4, P4×8 and P4×4 modes to be processed.
As illustrated in FIG. 3, in the moving picture encoding apparatus according to conventional art, with respect to input frame data, the 6 prediction bit calculators 1200 to 1700 calculate prediction bit values for 7 modes, respectively (the 16×8/8×16 mode integer-predictor calculates two prediction bit values for 16×8 and 8×16 modes).
A mode decision block 1900 examines the 7 prediction bit values and selects the most appropriate mode, and the final encoding block 1100 converts input frame data according to the determined mode.
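The comparison performed by the mode decision block can be sketched as a minimum search over the 7 per-mode values. The mode labels, the dictionary layout, and the function name `decide_mode` are assumptions for illustration (with "SKIP" standing in for the value produced by the B_slice check); the cost figures in the usage note are invented sample numbers, not measured data.

```python
# Labels for the 7 values compared by the mode decision step (hypothetical names).
MODES = ["SKIP", "P16x16", "P16x8", "P8x16", "P8x8", "I16x16", "I4x4"]


def decide_mode(costs):
    """Return the mode whose calculated cost is lowest.
    `costs` maps each mode label to its prediction cost value."""
    missing = [m for m in MODES if m not in costs]
    if missing:
        raise ValueError(f"missing cost for modes: {missing}")
    return min(MODES, key=lambda m: costs[m])
```

For example, given sample costs of 900, 410, 395, 402, 430, 700 and 650 for the modes in the order listed, `decide_mode` returns "P16x8"; every one of the 7 values has to be computed before this comparison can run.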
It can be seen in FIGS. 3 to 5 that when the standard H.264 moving picture encoding apparatus shown in FIG. 1 uses RDO, RDO is substantially performed once per block mode. In addition, in motion estimation with respect to 5 reference images, RDO must be performed once per reference image, thus significantly increasing the amount of calculation for moving picture encoding.
When each block is embodied as a separate hardware module, hardware cost goes up. On the other hand, when the blocks are embodied in one hardware module for prediction value calculation, the hardware module performs predicted value calculation 8 times (once for each of the 7 modes and once for a determined mode). Consequently, it can be seen that the amount of calculation for moving picture encoding is considerably large.
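The calculation count stated above can be checked with simple arithmetic. The sketch below assumes 16×16 macroblocks and a hypothetical 1280×720 frame in the usage note; the frame size and the function names are illustrative assumptions and do not come from the description.

```python
MACROBLOCK_SIZE = 16  # macroblock edge length in pixels, assumed for this sketch


def evaluations_per_macroblock(num_modes):
    """One predicted value calculation per mode, plus one for the determined mode."""
    return num_modes + 1


def macroblocks_per_frame(width, height):
    """Number of macroblocks needed to cover the frame (rounding up at the edges)."""
    cols = (width + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE
    rows = (height + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE
    return cols * rows


def evaluations_per_frame(width, height, num_modes):
    """Total predicted value calculations for one frame on a single shared module."""
    return macroblocks_per_frame(width, height) * evaluations_per_macroblock(num_modes)
```

With 7 modes, `evaluations_per_macroblock(7)` gives the 8 calculations mentioned above, and a 1280×720 frame (3600 macroblocks) would need 28800 calculations per frame under these assumptions.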