1. Industrial Field of Utilization
The present invention relates to very low bit rate coding of video and associated audio, and specifically to the dynamic allocation of the video bit rate according to the instantaneous bit rate consumption of the audio signal in a constant bit rate system.
2. Background and Prior Art
Digital video and associated audio coding plays a key role in industrial applications of digital signal storage, processing, transmission and distribution. Various digital coding standards have been, and are being, developed by standardization bodies. For moving picture and associated audio coding, a typical coding scheme involves three parts, namely video coding, audio coding and system multiplexing.
The current transform coding algorithms adopted by the MPEG phase one and phase two standards involve such techniques as transformation, quantization and variable length coding. To increase the coding efficiency, predictive coding techniques such as inter picture prediction, motion estimation and compensation are used. A picture can be encoded in intra picture mode, by reducing only the spatial redundancy within the picture itself; a picture coded in this way is usually referred to as an I picture. A picture can also be encoded in inter picture mode. If a picture is predicted only from a previous picture, it is referred to as a P picture. If a picture is predicted from both a previous picture and a future picture, it is referred to as a bi-directional predictive coded picture, usually called a B picture. A main feature of the current video coding scheme is that the bit rate of the coded bitstream is held constant by means of a FIFO buffer. The fullness of the buffer is used to adjust the quantization step and the bits available for the I, B and P pictures so as to control the bit rate of each coded picture.
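The buffer-based control described above can be illustrated with a minimal sketch. It is not taken from any standard or from the present invention: the linear mapping from buffer fullness to quantization step, the step range of 1 to 31, the toy model in which the bits produced by a picture are inversely proportional to the step, and all function and parameter names are assumptions made for illustration only.

```python
def quantizer_step(buffer_bits, buffer_size, q_min=1, q_max=31):
    """Map FIFO fullness linearly onto the quantization step range.

    Assumption: a fuller buffer calls for a coarser (larger) step so that
    the following pictures produce fewer bits. The 1..31 range is assumed.
    """
    fullness = min(max(buffer_bits / buffer_size, 0.0), 1.0)
    return round(q_min + fullness * (q_max - q_min))


def simulate(picture_bits_at_step_1, channel_bits_per_picture, buffer_size):
    """Run pictures through the FIFO and return the step chosen for each.

    Toy model (assumption): a picture that needs `c` bits at step 1
    produces `c // q` bits at step `q`.
    """
    buffer_bits = buffer_size // 2  # start with a half-full buffer
    steps = []
    for complexity in picture_bits_at_step_1:
        q = quantizer_step(buffer_bits, buffer_size)
        produced = complexity // q
        # The channel drains a fixed number of bits per picture period.
        buffer_bits += produced - channel_bits_per_picture
        buffer_bits = min(max(buffer_bits, 0), buffer_size)
        steps.append(q)
    return steps


if __name__ == "__main__":
    print(simulate([64000, 64000, 64000], 6400, 20000))
```

In this sketch the output rate seen by the channel stays constant while the quantization step, and therefore the picture quality, fluctuates with the buffer state.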
Many audio coding algorithms have emerged with the introduction of digital communication links, mobile communication, entertainment and multimedia services. In particular, various very low bit rate speech coding algorithms have been standardized to reduce the transmission bit rate or the memory capacity required by voice storage systems. Toll quality speech can be obtained from the recently standardized CCITT G.728 16 kbits/s coder. Communications quality can be obtained using the USA Federal Standard 1016 4.8 kbits/s voice coder. The Vector Sum Excited Linear Predictive Coder standardized for North American and Japanese cellular communications can provide near toll quality at 8.0 kbits/s operation. These algorithms generally fall into the class of speech coders known as Code Excited or Vector Excited Linear Prediction coders and have typically been designed for constant bit rate transmission. The number of bits per audio frame is also kept constant.
Variable bit rate speech coding has been considered only to a very limited extent, for packet switching networks, digital speech interpolation systems and digital communication multiplication equipment systems.
The main function of the system encoder is to provide the necessary and sufficient information to synchronize the decoding and presentation of video and audio information, while at the same time ensuring that the decoders' buffers for coded data neither overflow nor underflow. Coding the system layer information includes packetizing the data into packets and creating time stamps for the packet headers. Two time stamps are used: the presentation time stamp (PTS), which indicates when the presentation unit of an audio frame or video picture should be played or broadcast, and the decoding time stamp (DTS), which indicates when an audio frame or video picture should be decoded. The PTS and DTS share a common time base, called the system clock reference (SCR), which unifies the measurement of timing and ensures correct synchronization and buffer management. In a fixed constant bit rate environment, the system expects fixed bit rate audio and video for correct buffer management.
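The relationship between the two time stamps can be sketched as follows for a short picture sequence. This is a simplified illustration only: the 90 kHz tick resolution follows the MPEG-1 system layer, but the one-picture reordering delay, the handling of trailing B pictures, and the names `coded_order` and `time_stamps` are assumptions introduced here.

```python
CLOCK_HZ = 90_000  # time stamp resolution used by the MPEG-1 system layer


def coded_order(display_types):
    """Return display indices in coding order.

    Each I or P picture is moved ahead of the B pictures that precede it
    in display order, because those B pictures are predicted from it.
    """
    order, pending_b = [], []
    for idx, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B pictures (toy handling, assumption)
    return order


def time_stamps(display_types, picture_rate):
    """Return {display index: {"dts": ticks, "pts": ticks}}.

    Assumption: decoding takes one picture period per picture, and
    presentation is delayed by one period whenever B pictures are present.
    """
    ticks = CLOCK_HZ // picture_rate
    delay = 1 if "B" in display_types else 0
    stamps = {}
    for position, idx in enumerate(coded_order(display_types)):
        stamps[idx] = {"dts": position * ticks, "pts": (idx + delay) * ticks}
    return stamps


if __name__ == "__main__":
    for idx, stamp in sorted(time_stamps("IBBP", 10).items()):
        print(idx, stamp)
```

Under these assumptions a B picture is decoded and presented at the same instant, whereas an I or P picture is decoded before the B pictures that depend on it and presented later.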
The coding scheme described above has the following problems, which prevent us from achieving the very high compression ratio and coding efficiency that are the key factors for various very low bit rate video and associated audio coding applications.
The inefficiency of the current coding scheme arises from:
1) Constant audio bit rate
The current audio coding scheme encodes an audio signal at a constant bit rate. If the audio information is expressed as a complexity measure that reflects the signal intensity and frequency distribution, this complexity is found to vary with time. For example, in the videophone application, when one party to the communication is talking, the other party is usually listening and silent. This means that there are moments when the listener does not make any voice input. In addition, there are silent moments even while a person is talking. A constant bit rate audio encoder wastes bandwidth during those silent moments. For very low bit rate coding applications, it is possible that the audio occupies a similar or even higher bandwidth than the video. The problem to be solved here is to use a variable bit rate audio encoder for audio coding and to save the bits for video coding use.
2) Constant video bit rate
The current video coding scheme provides a constant bit rate bitstream output by using a FIFO buffer at the end of the encoder. The instantaneous bit usage and the buffer fullness are used to adjust the bits for each picture and the quantization step. The latter is used to control the bit rate of the next encoded macroblock within the picture. This bit rate control process is carried out within the video encoder itself and is independent of the bit usage of the audio encoder. How to make use of the bit saving from the audio encoder to improve the video coding quality is the problem to be solved.
3) Video dynamic bit rate control
The current video coding scheme implements bit rate control by allocating a certain number of bits to each picture and also adjusting the quantization step for each macroblock of a picture. There are cases in which more bits are available than are needed to bring the quantization step down to its minimal value. In such a situation, the extra bits cannot be used efficiently. Another problem is whether the minimal quantization step is the best way to encode a picture in the predetermined encoding mode, i.e., I picture, P picture or B picture. In particular, when the current picture is set to be encoded as a B picture, will the smaller quantization step give better coded picture quality than a P picture or even an I picture, if the amount of bits available allows either P picture or I picture coding? The above discussion can be summarized as the problem of how to dynamically select the I, P and B picture coding modes.
A second problem under this item relates to very low bit rate coding, where the picture rate of the coding scheme is usually less than the rate required by real time video. For the videophone application, as an example, the picture rate is usually set to around 10 pictures per second. Because of this reduced picture rate, jerky motion effects appear if the objects in the picture move too fast. The reduced picture rate also makes predictive coding difficult when the scene changes. How to make efficient use of the available bits to insert an additional picture beyond the pre-determined picture rate is another issue to be addressed to improve the picture coding quality.
4) System multiplexer
The current system multiplexer accepts only constant bit rate audio and video bitstreams and multiplexes them into a constant bit rate system bitstream. There is no mechanism for controlling the audio and video encoders so as to allocate bandwidth dynamically between the two encoders.
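The bit saving at stake in items 1) to 4) can be made concrete with a small arithmetic sketch. All figures are hypothetical: a 24 kbits/s channel, 20 ms audio frames, 160 bits per active speech frame (corresponding to 8.0 kbits/s), and 20 bits per silent frame. The function names are likewise assumptions, and the sketch illustrates only the accounting, not any particular allocation method.

```python
def audio_bits_for_frame(is_speech_active, active_bits=160, silent_bits=20):
    """Bits spent by a hypothetical variable bit rate audio encoder on one frame."""
    return active_bits if is_speech_active else silent_bits


def video_budget(channel_bits, audio_bits, overhead_bits=0):
    """Bits left for video in one frame period of a constant bit rate channel."""
    budget = channel_bits - overhead_bits - audio_bits
    if budget < 0:
        raise ValueError("audio and overhead exceed the channel capacity")
    return budget


def allocate(activity, channel_bits_per_frame, overhead_bits=0):
    """Video bit budget per frame period for a sequence of speech activity flags."""
    return [
        video_budget(channel_bits_per_frame, audio_bits_for_frame(active), overhead_bits)
        for active in activity
    ]


if __name__ == "__main__":
    # 24 kbits/s channel and 20 ms frames give 480 bits per frame period.
    activity = [True, False, False, True]
    dynamic = allocate(activity, 480)
    constant = [video_budget(480, 160)] * len(activity)
    print(dynamic)                       # [320, 460, 460, 320]
    print(sum(dynamic) - sum(constant))  # 280
```

With these assumed figures, the two silent frame periods free 280 bits that a constant bit rate audio encoder would have consumed; a multiplexer that cannot signal this saving to the video encoder leaves those bits unused.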