1. Field of the Invention
The present invention relates to digital video encoding. More specifically, the present invention relates to methods of video encoding using variable bit rate to improve the video quality of an encoded video stream.
2. Discussion of Related Art
Due to the advancement of semiconductor processing technology, integrated circuits (ICs) have greatly increased in functionality and complexity. With increasing processing and memory capabilities, many formerly analog tasks are being performed digitally. For example, images, audio and even full motion video can now be produced, distributed, and used in digital formats.
FIG. 1(a) is an illustrative diagram of a digital video stream 100. Digital video stream 100 comprises a series of individual digital images 100_0 to 100_N. Each digital image of a video stream is often called a frame. For full motion video, a video frame rate of 30 images per second is desired. As illustrated in FIG. 1(b), a digital image 100_Z comprises a plurality of picture elements (pixels). Specifically, digital image 100_Z comprises Y rows of X pixels. For clarity, pixels in a digital image are identified using a 2-dimensional coordinate system. As shown in FIG. 1(b), pixel P(0,0) is in the top left corner of digital image 100_Z. Pixel P(X-1,0) is in the top right corner of digital image 100_Z. Pixel P(0,Y-1) is in the bottom left corner and pixel P(X-1,Y-1) is in the bottom right corner. Typical image sizes for digital video streams include 720×480, 640×480, 320×240 and 160×120.
FIG. 2 shows a typical digital video system 200, which includes a video capture device 210, a video encoder 220, a video channel 225, a video decoder 230, a video display 240, and an optional video storage system 250. Video capture device 210, typically a video camera, provides a video stream to video encoder 220. Video encoder 220 digitizes and encodes the video stream and sends the encoded digital video stream over channel 225 to video decoder 230. Video decoder 230 decodes the encoded video stream from channel 225 and displays the video images on video display 240. Channel 225 could be, for example, a local area network, the internet, telephone lines with modems, or any other communication connection. Video decoder 230 could also receive a video data stream from video storage system 250. Video storage system 250 can be, for example, a video compact disk system, a hard disk storing video data, or a digital video disk system. The video stream from video storage system 250 could have been previously generated using a video capture device and a video encoder. However, some video streams may be artificially generated using computer systems.
A major problem with digital video system 200 is that channel 225 is typically limited in bandwidth. As explained above, a full-motion digital video stream can comprise 30 images a second. Using an image size of 640×480, a full motion video stream would have approximately 9.2 million pixels per second (640 × 480 × 30). In a full color video stream each pixel comprises three bytes of color data. Thus, a full motion video stream would require a transfer rate of roughly 27.6 megabytes a second over channel 225. For internet applications, most users can only support a bandwidth of approximately 56 kilobits per second. Thus, to facilitate digital video over computer networks, such as the internet, digital video streams must be compressed.
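The raw data rate of an uncompressed stream follows directly from the image size, the frame rate, and the number of bytes per pixel. The following is a minimal sketch of that arithmetic. The function name raw_video_rate and the variable names are illustrative and do not appear in the figures.

```python
def raw_video_rate(width, height, fps, bytes_per_pixel=3):
    """Return (pixels per second, bytes per second) for uncompressed video."""
    pixels_per_second = width * height * fps
    bytes_per_second = pixels_per_second * bytes_per_pixel
    return pixels_per_second, bytes_per_second


# 640x480 full color video at 30 frames per second
pixels_per_second, bytes_per_second = raw_video_rate(640, 480, 30)

# Assumed channel: 56 kilobits per second, as for a modem connection
channel_bits_per_second = 56_000

# Factor by which the raw stream exceeds the channel capacity
compression_needed = bytes_per_second * 8 / channel_bits_per_second

print(f"{pixels_per_second:,} pixels per second")
print(f"{bytes_per_second:,} bytes per second")
print(f"raw stream exceeds channel by a factor of about {compression_needed:,.0f}")
```

The last figure gives a sense of how far the raw stream is from fitting the channel, which is why compression alone at modest ratios is not sufficient for such connections.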
Most video compression algorithms, such as MPEG2 and MPEG4, reduce the bandwidth requirement of a digital video stream by not sending redundant information across channel 225. For example, as shown in FIG. 3, a digital video stream includes digital images 301 and 302. Digital image 301 includes a video object 310_1 and a video object 340_1 on a blank background. Digital image 302 includes a video object 310_2, which is the same as video object 310_1, and a video object 340_2, which is the same as video object 340_1. Rather than sending data for all the pixels of digital image 301 and digital image 302, a digital video stream could be encoded to simply send the information that video object 310_1 from digital image 301 has moved three pixels to the left and two pixels down and that video object 340_1 from digital image 301 has moved one pixel down and four pixels to the left. Thus, rather than sending all the pixels of image 302 across channel 225, video encoder 220 can send digital image 301 and the movement information, usually encoded as a two dimensional motion vector, regarding the objects in digital image 301 to video decoder 230. Video decoder 230 can then generate digital image 302 using digital image 301 and the motion vectors supplied by video encoder 220. Similarly, additional digital images in the digital video stream containing digital images 301 and 302 can be generated from additional motion vectors.
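The decoder side of this idea can be sketched as follows. This is a simplified illustration only: it assumes a frame stored as a list of rows, an object described by a rectangular bounding box, and a motion vector that keeps the object inside the frame. The function name move_object, the frame dimensions, and the pixel values are hypothetical.

```python
def move_object(frame, box, motion, background=0):
    """Return a new frame with the pixels inside `box` shifted by `motion`.

    box    = (x, y, width, height) of the object in `frame`
    motion = (dx, dy), positive dx is right, positive dy is down
    Assumes the moved object stays inside the frame.
    """
    x0, y0, w, h = box
    dx, dy = motion
    out = [row[:] for row in frame]  # copy so the reference frame is untouched
    # Clear the object's old position to the background value
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            out[y][x] = background
    # Copy the object's pixels from the reference frame to the new position
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            out[y + dy][x + dx] = frame[y][x]
    return out


# Reference frame: 8 pixels wide, 6 rows, blank background, one 2x2 object
frame_301 = [[0] * 8 for _ in range(6)]
for y in range(1, 3):
    for x in range(4, 6):
        frame_301[y][x] = 7

# Motion vector: three pixels to the left and two pixels down
frame_302 = move_object(frame_301, box=(4, 1, 2, 2), motion=(-3, 2))

for row in frame_302:
    print(row)
```

Only the bounding box and the two-component motion vector need to be transmitted for the second frame, rather than all 48 pixel values.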
However, most full motion video streams do not contain simple objects such as video objects 310_1 and 340_1. Object recognition in real life images is a very complicated and time-consuming process. Thus, motion vectors based on video objects are not really suitable for encoding digital video data streams. However, it is possible to use motion vector encoding with artificial video objects. Rather than finding distinct objects in a digital image, the digital image is divided into a plurality of macroblocks. A macroblock is a number of adjacent pixels with a predetermined shape and size. Typically, a rectangular shape is used so that a rectangular digital image can be divided into an integer number of macroblocks. FIG. 4 illustrates a digital image 410 that is divided into a plurality of square macroblocks. For clarity, macroblocks are identified using a 2-dimensional coordinate system. As shown in FIG. 4, macroblock MB(0,0) is in the top left corner of digital image 410. Macroblock MB(X-1,0) is in the top right corner of digital image 410. Macroblock MB(0,Y-1) is in the bottom left corner and macroblock MB(X-1,Y-1) is in the bottom right corner. Calculation of motion vectors is well known in the art and is not an integral part of the present invention.
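The division of an image into macroblocks and the macroblock coordinate system can be sketched as follows. The 16×16 macroblock size is an assumption chosen for illustration (it is the size used in MPEG2), and the function names are hypothetical.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of square macroblocks covering the image.

    Raises ValueError if the image cannot be divided into an integer
    number of macroblocks of the given size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("image size is not a multiple of the macroblock size")
    return width // mb_size, height // mb_size


def macroblock_origin(col, row, mb_size=16):
    """Return the pixel coordinates of the top left pixel of MB(col, row)."""
    return col * mb_size, row * mb_size


cols, rows = macroblock_grid(720, 480)
print(cols, rows)                                 # macroblock columns and rows
print(macroblock_origin(0, 0))                    # top left macroblock
print(macroblock_origin(cols - 1, rows - 1))      # bottom right macroblock
```

Note that not every image size divides evenly. For example, a 160×120 image is not a multiple of 16 in height, so an encoder would need padding or a different macroblock size in that case.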
In general, frames produced using motion estimation from only preceding frames are called predicted frames (P Frames). Frames produced using motion estimation from both preceding and succeeding frames are called bi-directional frames (B Frames). Frames that do not use information from preceding or succeeding frames are called intra frames (I Frames). In terms of data size, intra frames require more data than predicted frames, which require more data than bi-directional frames. However, the quality of each succeeding image calculated using motion estimation degrades. Thus, an encoded video stream is typically arranged as multiple groups of pictures (GOPs). Each group of pictures can be decoded without reference to another group of pictures. Thus, each group of pictures starts with an intra frame and may include additional intra frames spaced periodically throughout the group of pictures to maintain picture quality.
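One possible arrangement of frame types within a single group of pictures can be sketched as follows. The repeating pattern of two B Frames between reference frames is a common convention assumed here for illustration and is not prescribed by the text. The function name gop_pattern and its parameters are hypothetical.

```python
def gop_pattern(gop_size, b_frames=2):
    """Return the frame types of one group of pictures in display order.

    The group starts with an I Frame. Every (b_frames + 1)-th frame after
    it is a P Frame, and the frames in between are B Frames. The last
    frame is forced to be a P Frame so that no B Frame needs a succeeding
    frame from another group of pictures.
    """
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")
        elif i % (b_frames + 1) == 0 or i == gop_size - 1:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern(12))        # one I Frame, then B and P Frames
print(gop_pattern(4, 0))      # no B Frames at all
```

Forcing the final frame to be a P Frame keeps the sketch consistent with the requirement that each group of pictures can be decoded on its own.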
FIG. 6 is a simplified block diagram of a typical video encoder 600. Video encoder 600 includes a motion estimation unit 610, a discrete cosine transformation unit 620, a quantizer 630, and a run-length coder 640. Motion estimation unit 610 performs motion estimation on an input video stream I_VS to generate predicted frames and bi-directional frames. Motion estimation unit 610 typically includes an embedded decoder to ensure that the encoding can be properly decoded. Discrete cosine transformation unit 620 transforms each frame into the frequency domain, which provides more efficient data storage for video streams. Quantizer 630 reduces the magnitude of the transform coefficients of each frame to reduce the amount of data required for each frame. The quantization step is a “lossy” operation in that the original frame information cannot be reproduced from the quantized transform coefficients. The amount of quantization can be controlled by adjusting a frame quantization parameter F_MQUANT. The quantized coefficients are then run-length encoded to form an encoded video stream E_VS. The size and therefore the bit rate of encoded video stream E_VS can be controlled using frame quantization parameter F_MQUANT.
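The effect of the quantization parameter on data size and on loss can be illustrated with a simplified sketch. It assumes a single uniform step size standing in for frame quantization parameter F_MQUANT, a made-up list of transform coefficients, and a basic (zero run, value) run-length code. Actual encoders use quantization matrices and entropy coding tables that are not modeled here, and all function names are hypothetical.

```python
def quantize(coeffs, step):
    """Divide each transform coefficient by the step size and round."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Scale quantized levels back up, as a decoder would."""
    return [q * step for q in levels]


def run_length_encode(levels):
    """Encode levels as (zero_run, value) pairs, dropping trailing zeros."""
    pairs, run = [], 0
    for value in levels:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


# Hypothetical transform coefficients, largest at low frequency
coeffs = [620, -43, 19, 7, -5, 3, 0, 1]

for step in (4, 16):
    levels = quantize(coeffs, step)
    restored = dequantize(levels, step)
    pairs = run_length_encode(levels)
    print(f"step={step}: {len(pairs)} pairs, restored={restored}")
```

A larger step size produces more zero levels and therefore fewer run-length pairs, but the restored coefficients differ more from the originals. This is the trade-off between bit rate and picture quality that the quantization parameter controls.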
As explained above, video compression is generally needed because channel 225 (FIG. 2) has limited bandwidth. Thus, some video encoders use a constant bit rate (CBR) scheme so that the encoded video stream can be transferred across channel 225. However, constant bit rate compression generally causes varying picture quality, especially when comparing a complex picture with a simple picture or during scene changes, which do not have the benefits of motion estimation. Another approach is to use a variable bit rate and buffering to fully utilize channel 225. The average of the variable bit rate over time must be close to a target bit rate that channel 225 can handle. Using variable bit rate, bits that simple pictures do not need to achieve a desired picture quality can be saved for use on complex pictures. Thus, in theory, the overall picture quality of the encoded video stream can be improved using variable bit rate encoding. However, allocating the bits to each frame to achieve the desired picture quality level can be very complex. Hence, there is a need for a method or system to control the frame quantization parameter to allocate bits to the frames of a video stream to achieve high picture quality.
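The difference between constant and variable bit rate allocation can be illustrated with a deliberately naive sketch. It assumes that a numeric complexity score is already known for each frame and distributes a fixed bit budget in proportion to that score. This proportional rule is only an illustration of the general idea and is not the method of the present invention. The function names and the example values are hypothetical.

```python
def cbr_allocation(num_frames, target_bps, fps):
    """Constant bit rate: every frame receives the same number of bits."""
    return [target_bps / fps] * num_frames


def vbr_allocation(complexities, target_bps, fps):
    """Variable bit rate: split the same total budget by frame complexity.

    The total over all frames equals the constant bit rate total, so the
    average bit rate still matches the target bit rate.
    """
    budget = target_bps / fps * len(complexities)
    total_complexity = sum(complexities)
    return [budget * c / total_complexity for c in complexities]


# Hypothetical complexity scores: two simple frames, a scene change, a busy frame
complexities = [1, 1, 4, 2]

cbr = cbr_allocation(len(complexities), target_bps=300_000, fps=30)
vbr = vbr_allocation(complexities, target_bps=300_000, fps=30)

print("CBR bits per frame:", cbr)
print("VBR bits per frame:", vbr)
print("totals:", sum(cbr), sum(vbr))
```

Both schemes spend the same total number of bits, but the variable scheme moves bits from the simple frames to the complex ones. A practical scheme must also respect buffer limits and estimate complexity before encoding, which is where the difficulty described above arises.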