The invention relates to methods and devices for controlling quantization scales of video signals, and more particularly, to methods and devices for controlling quantization scales of video signals in a video encoding device.
According to the MPEG 2 standard (MPEG stands for Moving Picture Experts Group), an image is compressed by eliminating spatial redundancies through chrominance subsampling, the discrete cosine transform (DCT) and quantization, and by eliminating temporal redundancies, which arise from the similarity between frames, through motion compensation (MC).
Generally, there are chromatic or geometrical similarities within a frame as well as between frames. To eliminate spatial redundancies, the important information must be identified and the less important information removed. Experiments show that the human eye is more sensitive to luminance than to chrominance. The MPEG 2 standard, which denotes luminance (a.k.a. luma) by Y and chrominance (a.k.a. chroma) by Cr and Cb, therefore reduces the amount of data by subsampling the chrominance. The MPEG 2 standard defines three sampling modes (4:2:0, 4:2:2, and 4:4:4), which correspond to three different chroma sampling frequencies. For instance, in 4:2:0 mode a macro block of 16*16 pixels is sampled as 4 Y blocks, 1 Cr block and 1 Cb block, each block being 8*8 pixels. Reducing the chroma sampling frequency in this way improves data compression.
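The block counts implied by the three sampling modes can be tabulated with a short sketch. The helper names below are hypothetical and only illustrate the arithmetic: a 16*16 macro block always contributes four 8*8 Y blocks, and the number of Cb and Cr blocks depends on how strongly the chroma is subsampled.

```python
# Illustrative sketch: 8x8 block counts per 16x16 macro block for the three
# MPEG 2 chroma sampling modes. Names are hypothetical helpers.
CHROMA_FORMATS = {
    "4:2:0": (4, 1, 1),  # (Y, Cb, Cr): chroma halved horizontally and vertically
    "4:2:2": (4, 2, 2),  # chroma halved horizontally only
    "4:4:4": (4, 4, 4),  # no chroma subsampling
}

SAMPLES_PER_BLOCK = 8 * 8


def blocks_per_macroblock(fmt: str) -> int:
    """Total number of 8x8 blocks (Y + Cb + Cr) in one macro block."""
    y, cb, cr = CHROMA_FORMATS[fmt]
    return y + cb + cr


def samples_per_macroblock(fmt: str) -> int:
    """Total number of samples one macro block carries in the given mode."""
    return blocks_per_macroblock(fmt) * SAMPLES_PER_BLOCK


if __name__ == "__main__":
    for fmt in CHROMA_FORMATS:
        print(fmt, blocks_per_macroblock(fmt), samples_per_macroblock(fmt))
```

Running it lists 6, 8 and 12 blocks per macro block for 4:2:0, 4:2:2 and 4:4:4 respectively, so a 4:2:0 macro block carries half the samples (384 versus 768) of a 4:4:4 one.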
Video data is a continuous series of still frames, which the human eye perceives as a moving picture owing to persistence of vision. Consecutive frames are separated by a very short time interval, so neighboring frames usually differ only slightly. The MPEG 2 standard therefore eliminates the temporal redundancies arising from this similarity between frames by motion compensation (MC). The methods described above are well known to those skilled in the art.
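Motion compensation relies on first estimating, for each block, where it came from in a reference frame. The MPEG 2 standard specifies how motion vectors are coded, not how an encoder finds them; the sketch below assumes an exhaustive (full-search) block matching with a sum-of-absolute-differences (SAD) cost, which is one common choice rather than the method of any particular encoder. All names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[row][column] of luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block whose top-left corner
    is (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                n: int, search: int) -> Tuple[int, int]:
    """Test every displacement within +/-search pixels and return the motion
    vector (dx, dy) with the smallest SAD. Candidates that would read outside
    the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best_cost = None
    best_mv = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


if __name__ == "__main__":
    # Reference frame with a simple ramp texture; the current frame is the
    # same picture shifted 2 pixels to the right.
    ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
    cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(16)] for y in range(16)]
    print(full_search(cur, ref, bx=4, by=4, n=4, search=3))  # (-2, 0)
```

Only the motion vector and the (usually small) prediction residual then need to be coded, instead of the whole block.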
Please refer to FIG. 1, which shows a conventional video encoder 10. The video encoder 10 includes a DCT device 12, a motion estimator and compensator 14, a quantizer 16, a variable length encoder (VLE) 18, and a rate controller 20. The video encoder 10 uses the DCT device 12 and the quantizer 16 to eliminate spatial redundancies, and the motion estimator and compensator 14 to eliminate temporal redundancies, thereby compressing the digital video data. The compressed data is then encoded by the VLE 18 and sent to a system multiplexer (not shown), which outputs the data as a transport stream or program stream as defined by the MPEG 2 standard.
The DCT device 12 performs a DCT on every block (each block being 8*8 pixels) sampled from a macro block, transforming the video data from the spatial domain to the frequency domain. The DCT is a mathematically reversible operation. The DCT coefficients obtained by transforming a block, whether luma or chroma, form an 8*8 two-dimensional matrix. Because colors and brightness generally change only gradually within a frame, the coefficients representing higher spatial frequencies are small or even 0. The DCT by itself does not reduce the data volume; rather, it transforms the data into a form in which redundancies can be found more easily.
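The behavior described above can be checked with a direct implementation of the orthonormal two-dimensional DCT-II and its inverse. This is a minimal sketch written for clarity in the O(N^4) textbook form; practical encoders use fast factorized algorithms, and the function names are hypothetical.

```python
import math
from typing import List

N = 8
Block = List[List[float]]  # block[row][column]


def _scale(k: int) -> float:
    """Orthonormal DCT-II scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos: int, freq: int) -> float:
    """Cosine basis value for sample position pos and frequency index freq."""
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block: Block) -> Block:
    """Forward 2-D DCT-II of an 8x8 block; result[u][v], u = vertical frequency."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coef: Block) -> Block:
    """Inverse 2-D DCT; undoes dct2 up to floating-point rounding."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


if __name__ == "__main__":
    # A smooth block: brightness rises gently down the rows and across the columns.
    block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
    coef = dct2(block)
    print(round(coef[0][0], 1))  # 884.0 (8 x the block mean of 110.5)
    restored = idct2(coef)
    print(max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N)) < 1e-9)
```

For this smooth block, every coefficient with both a nonzero horizontal and a nonzero vertical frequency is zero to within rounding, and the inverse transform restores the block, which illustrates both the energy compaction and the reversibility noted above.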
Subsequently, the quantizer 16 quantizes the DCT coefficients to further compress the video data. Quantization reduces the number of bits used to describe each coefficient; that is, each coefficient is expressed in a coarser unit. Quantization turns values close to 0 into 0 and reduces the number of nonzero coefficients, which improves data compression. Quantization is a lossy form of compression: the quantized data is not identical to the original. Therefore, the distortion resulting from compression depends on the choice of quantization scale.
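The effect of quantization on individual coefficients can be sketched with a simplified uniform quantizer. As an assumption for illustration, the step size is taken as 2 * qscale with rounding half away from zero; the actual MPEG 2 quantization formulas also involve a weighting matrix and differ between intra and non-intra blocks, and the function names are hypothetical.

```python
import math
from typing import List


def quantize(coefs: List[float], qscale: int) -> List[int]:
    """Map each DCT coefficient to an integer level.

    Simplified uniform quantizer (an assumption, not the exact MPEG 2 formula):
    step size = 2 * qscale, rounding half away from zero.
    """
    step = 2 * qscale
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coefs]


def dequantize(levels: List[int], qscale: int) -> List[int]:
    """Reconstruct approximate coefficient values from the integer levels."""
    step = 2 * qscale
    return [level * step for level in levels]


if __name__ == "__main__":
    coefs = [618.0, -47.0, 15.0, 3.0, -2.0, 0.4, 0.0, 9.0]
    levels = quantize(coefs, qscale=4)
    print(levels)                        # [77, -6, 2, 0, 0, 0, 0, 1]
    print(dequantize(levels, qscale=4))  # [616, -48, 16, 0, 0, 0, 0, 8]
```

The small coefficients 3.0, -2.0 and 0.4 become 0, and the reconstructed values differ from the originals, which is exactly the lossy behavior described above.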
The rate controller 20 in the video encoder 10 adjusts the quantization scale of the quantizer 16 so that the output bit rate of the video encoder 10 stays within a predetermined range. The rate controller 20 sets the quantization scale per macro block; that is, all blocks sampled from a macro block share the same quantization scale.
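A per-macro-block rate controller can be sketched as a feedback loop on a virtual buffer: when more bits have been produced than the budget allows, the quantization scale rises. The sketch below is loosely modeled on the buffer-feedback step of the MPEG-2 Test Model 5 (TM5); it omits TM5's activity-based adaptive quantization and its picture-level bit allocation, and the bit model bits = complexity / qscale is a hypothetical stand-in for actually coding each macro block.

```python
from typing import List


def per_macroblock_qscales(complexities: List[float], bit_rate: float,
                           picture_rate: float) -> List[int]:
    """Choose a quantization scale (1..31) for each macro block of one picture.

    Loosely TM5-style buffer feedback; the per-picture target is simply an
    equal share of the bit rate, and complexity / qscale is a hypothetical
    model of the bits each macro block produces.
    """
    n = len(complexities)
    target = bit_rate / picture_rate     # bit budget for this picture
    r = 2.0 * bit_rate / picture_rate    # reaction parameter
    fullness = 10.0 * r / 31.0           # initial virtual-buffer fullness
    produced = 0.0                       # bits produced so far in the picture
    scales = []
    for j, complexity in enumerate(complexities):
        # Over budget so far -> fuller buffer -> coarser quantization.
        d = fullness + produced - target * j / n
        q = min(31, max(1, round(d * 31.0 / r)))
        scales.append(q)
        produced += complexity / q       # hypothetical bit model
    return scales


if __name__ == "__main__":
    busy = per_macroblock_qscales([2000.0] * 1350, 2_000_000, 30)
    calm = per_macroblock_qscales([10.0] * 1350, 2_000_000, 30)
    print(busy[0], busy[-1], calm[0], calm[-1])
```

With complex macro blocks the scale climbs from its starting value of 10, and with simple ones it falls toward 1. Note that such a controller may change the scale at almost every macro block.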
After quantization by the quantizer 16, the video encoder 10 reads out the coefficients of the quantized two-dimensional matrix in a specific scan order (for example, a zigzag scan), turning the matrix into a one-dimensional series whose runs of consecutive 0's are as long as possible, in order to optimize data compression. Subsequently, the VLE 18 compresses the one-dimensional series and outputs a compressed bit stream, called an encoded bit stream.
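The serialization step can be sketched as a classic zigzag scan followed by grouping the series into (run, level) pairs, i.e. the number of zeros preceding each nonzero level. This is an illustrative sketch with hypothetical names; the actual variable-length code tables that the VLE 18 applies to these pairs are not modeled.

```python
from typing import List, Tuple


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, column) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column (one anti-diagonal)
        rows = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 0:          # even diagonals run bottom-left to top-right
            rows.reverse()
        order.extend((r, s - r) for r in rows)
    return order


def scan(block: List[List[int]]) -> List[int]:
    """Flatten a quantized 2-D block into a 1-D series in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level_pairs(series: List[int]) -> List[Tuple[int, int]]:
    """Encode the series as (run of preceding zeros, nonzero level) pairs.
    Trailing zeros are dropped; an end-of-block code would stand in for them."""
    pairs, run = [], 0
    for value in series:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[1][0], block[0][2] = 12, -3, 1
    print(run_level_pairs(scan(block)))  # [(0, 12), (1, -3), (2, 1)]
```

Because the nonzero levels cluster at low frequencies, the 61 trailing zeros of this block collapse into a single end-of-block marker.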
Lowering the quantization scale of the quantizer 16 lowers the compression ratio, which improves image quality but raises the bit rate of the encoded bit stream output by the VLE 18. Conversely, raising the quantization scale of the quantizer 16 increases the compression ratio, which degrades image quality but lowers the bit rate of the encoded bit stream output by the VLE 18. In the encoded bit stream, each quantization scale applied during encoding is represented by, for example, 7–8 bits, which the decoder needs in order to decode the encoded bit stream.
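The trade-off between image quality and bit rate can be illustrated by quantizing one hypothetical set of coefficients at several scales. The sketch again assumes a simplified uniform quantizer (step = 2 * qscale) and uses the number of nonzero levels only as a rough proxy for the bits the VLE would spend; the coefficient values are made up for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical DCT coefficients of one block, chosen only for illustration.
COEFS: List[float] = [618.0, -47.0, 35.0, 15.0, -12.0, 9.0, 6.0, -5.0, 3.0, 2.0, -1.0, 0.5]


def quantize_level(c: float, qscale: int) -> int:
    """Simplified uniform quantizer (step = 2 * qscale, round half away from zero)."""
    step = 2 * qscale
    return int(math.copysign(math.floor(abs(c) / step + 0.5), c))


def tradeoff(qscale: int) -> Tuple[int, float]:
    """Return (number of nonzero levels, squared reconstruction error) at qscale.

    The nonzero count is only a rough proxy for the bits the VLE would spend.
    """
    step = 2 * qscale
    levels = [quantize_level(c, qscale) for c in COEFS]
    nonzero = sum(1 for level in levels if level != 0)
    error = sum((c - level * step) ** 2 for c, level in zip(COEFS, levels))
    return nonzero, error


if __name__ == "__main__":
    for q in (1, 4, 16, 31):
        print(q, tradeoff(q))
```

As the scale grows from 1 to 31, the number of nonzero levels drops (11, 8, 3, 3) while the squared error rises (7.25, 59.25, 859.25, 1483.25): fewer bits, lower fidelity.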
Consequently, when the output bit rate of the video encoder 10 must be kept within a predetermined range and the quantization scale changes too frequently, a large number of bits in the encoded bit stream output by the video encoder 10 is spent recording quantization scales. For instance, under the NTSC standard there are 30 frames per second, and each frame has 1350 macro blocks. If the output bit rate of the video encoder 10 is 2 Mbps, each compressed macro block receives on average only about 49.4 bits (2,000,000 / (30 × 1350) ≈ 49.38). If the rate controller 20 changes the quantization scale for every macro block, about 14% (7/49.38 ≈ 0.14) of the output bit rate is used merely to record the quantization scale. Such an allocation of the bit rate is inefficient.
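The arithmetic of this example can be reproduced directly. The sketch assumes a 720*480 frame (which gives the 1350 macro blocks of 16*16 pixels cited above) and takes the 7-bit cost per quantization scale from the example; the function names are hypothetical.

```python
def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macro blocks in a frame (dimensions assumed multiples of 16)."""
    return (width // 16) * (height // 16)


def bits_per_macroblock(bit_rate: float, frame_rate: float, mbs_per_frame: int) -> float:
    """Average share of the output bit rate available to one macro block."""
    return bit_rate / (frame_rate * mbs_per_frame)


def qscale_overhead(bits_per_qscale: float, bit_rate: float,
                    frame_rate: float, mbs_per_frame: int) -> float:
    """Fraction of the bit rate spent if every macro block carries a new scale."""
    return bits_per_qscale / bits_per_macroblock(bit_rate, frame_rate, mbs_per_frame)


if __name__ == "__main__":
    mbs = macroblocks_per_frame(720, 480)                      # 1350
    print(round(bits_per_macroblock(2_000_000, 30, mbs), 2))   # 49.38
    print(round(qscale_overhead(7, 2_000_000, 30, mbs), 3))    # 0.142
```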
Therefore, in the conventional video encoder 10, the more frequently the rate controller 20 changes the quantization scale, the more bits are wasted on recording it; the bit rate remaining for the video data itself is thus limited, and the image quality is unsatisfactory.
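The dependence on how often the scale changes can be sketched with a simple cost model. As a hypothetical simplification, a macro block whose quantization scale differs from that of the previous macro block costs 7 bits (the figure used in the example above), and an unchanged scale is assumed to cost nothing; the actual MPEG 2 macro block header syntax is not modeled.

```python
from typing import List


def qscale_signaling_bits(scales: List[int], bits_per_change: int = 7) -> int:
    """Bits spent recording quantization scales for a run of macro blocks.

    Hypothetical cost model: a macro block whose scale differs from the previous
    one pays bits_per_change bits; an unchanged scale is assumed to cost nothing.
    """
    changes = sum(1 for prev, cur in zip(scales, scales[1:]) if cur != prev)
    return changes * bits_per_change


def bits_left_for_video(budget: int, scales: List[int], bits_per_change: int = 7) -> int:
    """Bits of the budget that remain for coding the video data itself."""
    return budget - qscale_signaling_bits(scales, bits_per_change)


if __name__ == "__main__":
    budget = 494                                  # about 10 macro blocks at 49.4 bits each
    frequent = [8, 9, 8, 9, 8, 9, 8, 9, 8, 9]     # scale changes at every macro block
    stable = [8, 8, 8, 8, 8, 9, 9, 9, 9, 9]       # scale changes once
    print(bits_left_for_video(budget, frequent))  # 431
    print(bits_left_for_video(budget, stable))    # 487
```

Under this model, nine changes in ten macro blocks consume 63 bits of a 494-bit budget, while a single change consumes only 7.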