Digital and online videos and their applications have become very popular over the past few years. With the emergence of fast communication technologies and multimedia applications, digital video codecs are used in many areas and systems, such as DVDs (Digital Video Discs) employing the MPEG-2 (Moving Picture Experts Group-2) format, VCDs (Video Compact Discs) employing the MPEG-1 (Moving Picture Experts Group-1) format, emerging satellite and terrestrial broadcast systems, and the Internet.
More specifically, this popularity of video applications has led to interesting developments in video codecs, which compress and decompress video data. Video compression involves a trade-off between video quality and compression rate, i.e. the amount of data that must be transmitted or, equivalently, the bitrate needed to represent the video.
Other factors are also considered, such as the complexity of the encoding and decoding algorithms, robustness to data losses and errors, the state of the art in compression algorithm design, and end-to-end delay (for example, in a videoconferencing application).
Several video coding standards exist, each designed for a particular type of application. For example, the H.263 standard, published by the ITU (International Telecommunication Union), is a video coding and compression standard for low bitrates, such as in the range of 20-30 kbps (kilobits per second). In particular, it supports video coding for videoconferencing and video telephony applications.
The H.263 standard specifies the format and content of the encoded data stream; it therefore sets the requirements that the encoder and decoder must meet without prescribing a specific design or structure for the encoder and decoder themselves.
In video compression, pictures are typically coded as one of two kinds of frames: Intra frames and Inter frames. Inter frames are further divided into two categories: P-frames (Predictive frames) and B-frames (Bi-predictive or Bi-directional frames). An Intra frame represents a whole picture coded independently of any other picture; Intra frames therefore consume a lot of bandwidth, since the content of the whole picture must be encoded. To compress the video and thus save bandwidth, Inter frames encode and transmit only the differences between a picture and one or more reference pictures. These differences are represented by the P-frames and the B-frames. For example, the background of two consecutive pictures usually does not change, so it does not need to be encoded again. B-frames differ from P-frames in that they are bi-directionally predicted, i.e. predicted from both the previous and the next pictures.
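As a toy illustration of why Inter coding saves bits, the Python sketch below forms the difference between a current picture and a reference picture when only a small region changes. It assumes zero-motion prediction (the co-located reference pixel is the predictor), which is a simplification: real P-frames use motion-compensated prediction, and B-frames would additionally draw on a subsequent picture. The helper names `residual` and `nonzero_count` are hypothetical.

```python
from typing import List

Picture = List[List[int]]


def residual(current: Picture, reference: Picture) -> Picture:
    """Pixel-wise difference between the current picture and its reference
    (zero-motion prediction, a simplifying assumption)."""
    return [
        [c - r for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


def nonzero_count(picture: Picture) -> int:
    """Number of samples that still carry information after prediction."""
    return sum(1 for row in picture for value in row if value != 0)


# A static 4x4 "background" with one changed pixel in the current picture.
reference = [[10] * 4 for _ in range(4)]
current = [row[:] for row in reference]
current[1][2] = 50

diff = residual(current, reference)
print(nonzero_count(diff), diff[1][2])  # 1 40
```

Only one of the 16 residual samples is non-zero, which is the intuition behind encoding differences rather than whole pictures.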
Furthermore, when a video is compressed, each picture is divided into macroblocks (MBs), and processing is performed macroblock by macroblock. Each macroblock generally represents a block of 16 by 16 pixels.
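The following sketch shows one way to split a single picture plane into 16 by 16 macroblocks in raster order. It assumes a luminance-only picture whose dimensions are multiples of 16 and ignores chroma; the function name `macroblocks` is hypothetical. QCIF (176 by 144 pixels), a common H.263 picture format, is used as the example.

```python
from typing import Iterator, List, Tuple

Picture = List[List[int]]
MB_SIZE = 16  # macroblock width and height in pixels


def macroblocks(picture: Picture, size: int = MB_SIZE) -> Iterator[Tuple[int, int, Picture]]:
    """Yield (row, column, block) for each size x size macroblock, in raster order."""
    height, width = len(picture), len(picture[0])
    if height % size or width % size:
        raise ValueError("picture dimensions must be multiples of the macroblock size")
    for y in range(0, height, size):
        for x in range(0, width, size):
            block = [row[x:x + size] for row in picture[y:y + size]]
            yield y // size, x // size, block


# QCIF luminance plane: 176 x 144 pixels.
qcif = [[0] * 176 for _ in range(144)]
blocks = list(macroblocks(qcif))
print(len(blocks))  # 99 (11 columns x 9 rows)
```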
A video encoder generally includes a motion estimation module, a motion compensation module, a DCT (Discrete Cosine Transform) module, and a quantizing module.
The motion estimation module predicts which areas of a previous frame have moved into the current frame, so that those areas do not need to be re-encoded.
The motion compensation module compensates for the movement of those areas from the previous frame into the current frame, using the estimated motion to build a prediction of the current frame.
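Motion estimation and compensation can be illustrated with full-search block matching using the sum of absolute differences (SAD), a common textbook technique; the standards do not mandate any particular search method. The sketch below is integer-pel only, and the helper names (`sad`, `full_search`, `compensate`) and the convention that (dx, dy) points from the current block to its best-matching reference block are assumptions for illustration.

```python
from typing import List, Tuple

Picture = List[List[int]]


def sad(cur: Picture, ref: Picture, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (cx, cy) in the
    current picture and the n x n block at (rx, ry) in the reference picture."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Picture, ref: Picture, cx: int, cy: int, n: int,
                search_range: int) -> Tuple[int, int, int]:
    """Motion estimation: return (best_sad, dx, dy) over all displacements
    within +/- search_range that keep the candidate block inside the reference."""
    height, width = len(ref), len(ref[0])
    best = (sad(cur, ref, cx, cy, cx, cy, n), 0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if cost < best[0]:
                    best = (cost, dx, dy)
    return best


def compensate(ref: Picture, cx: int, cy: int, dx: int, dy: int, n: int) -> Picture:
    """Motion compensation: fetch the displaced reference block used as the prediction."""
    return [row[cx + dx:cx + dx + n] for row in ref[cy + dy:cy + dy + n]]


# Reference with unique pixel values; the current picture is the reference shifted.
ref = [[y * 32 + x for x in range(32)] for y in range(32)]
cur = [[(y + 1) * 32 + (x + 2) for x in range(32)] for y in range(32)]
print(full_search(cur, ref, cx=8, cy=8, n=8, search_range=4))  # (0, 2, 1)
```

Full search is exhaustive and therefore expensive; practical encoders often use faster search patterns, but the cost function and the role of the resulting vector are the same.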
DCTs are generally used to transform a block of pixels into “spatial frequency coefficients”. They operate on a two-dimensional block of pixels, such as an 8 by 8 block within a macroblock. Because the DCT compacts most of the energy of a typical picture block into a few low-frequency coefficients, a few DCT coefficients are generally sufficient to reconstruct a close approximation of the original block.
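The sketch below evaluates the orthonormal two-dimensional DCT-II directly from its definition, for clarity rather than speed; real codecs use fast, integer-friendly implementations. The 8 by 8 block size and the function name `dct_2d` are assumptions for illustration. A flat block shows energy compaction in its extreme form: all of the energy lands in the single DC coefficient.

```python
import math
from typing import List

Block = List[List[float]]


def dct_2d(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an n x n block (direct O(n^4) evaluation)."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (
                        block[y][x]
                        * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    )
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


flat = [[100.0] * 8 for _ in range(8)]
c = dct_2d(flat)
print(round(c[0][0], 6))  # 800.0; every other coefficient is ~0
```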
The quantizing module quantizes the DCT coefficients. For example, it sets near-zero DCT coefficients to zero and quantizes the remaining non-zero DCT coefficients.
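A dead-zone uniform quantizer captures the behavior described above: small coefficients fall into the dead zone and become zero, and larger ones are mapped to integer levels. The sketch uses a step size of 2·QP, as in H.263, but its rounding rule is an illustrative assumption rather than normative H.263 behavior, and the names `quantize` and `quantize_block` are hypothetical.

```python
from typing import List


def quantize(coef: int, qp: int) -> int:
    """Dead-zone uniform quantizer with step 2*qp (illustrative only).

    Magnitudes below 2*qp + qp//2 map to level 0, so near-zero
    coefficients are set to zero."""
    magnitude = max(abs(coef) - qp // 2, 0) // (2 * qp)
    return -magnitude if coef < 0 else magnitude


def quantize_block(coeffs: List[List[int]], qp: int) -> List[List[int]]:
    """Apply the quantizer to every coefficient of a block."""
    return [[quantize(c, qp) for c in row] for row in coeffs]


row = [100, -100, 19, 20, -3, 0]
print([quantize(c, 8) for c in row])  # [6, -6, 0, 1, 0, 0]
```

A larger QP widens both the step and the dead zone, producing more zero levels and fewer bits at the cost of precision; this is the knob that rate control adjusts.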
One of the limitations in video coding comes from the capacity of communication channels, which can only transmit a limited number of bits per second. In many channels, such as ISDN (Integrated Services Digital Network) and POTS (Plain Old Telephone Service), the bitrate is constant.
However, depending on the efficiency of the compression algorithms and the motion complexity of the videos, the bit budget and bitrate needed to encode and transmit the videos may vary and, in particular, increase. Rate control is therefore needed to adapt the bitrate required to encode videos of various complexities to the bitrate of the communication channel used to transmit them.
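On a constant-bitrate channel, the per-frame bit budget is simply the channel bitrate divided by the frame rate, and the simplest form of rate control nudges the quantization parameter after each frame. The sketch below is a generic feedback loop, not TMN8 or any method of the cited references; the 10% tolerance, the step of one, and the function names are arbitrary assumptions. The QP range of 1 to 31 is that of H.263.

```python
QP_MIN, QP_MAX = 1, 31  # H.263 quantization parameter range


def target_bits_per_frame(bitrate_bps: float, frame_rate: float) -> float:
    """Bits available per frame on a constant-bitrate channel."""
    return bitrate_bps / frame_rate


def next_qp(qp: int, bits_used: float, target_bits: float, tolerance: float = 0.1) -> int:
    """Coarsen quantization after an oversized frame, refine it after an
    undersized one, and leave it unchanged within +/- tolerance of the target."""
    if bits_used > target_bits * (1 + tolerance):
        qp += 1
    elif bits_used < target_bits * (1 - tolerance):
        qp -= 1
    return max(QP_MIN, min(QP_MAX, qp))


target = target_bits_per_frame(24_000, 10)  # 24 kbps at 10 frames/s
print(target)                     # 2400.0
print(next_qp(10, 3000, target))  # 11
```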
The rate control algorithm currently used with the H.263 standard is TMN8 (Test Model Near-Term version 8). In general terms, this algorithm only ensures that an average bitrate is met; it cannot control both an average target bitrate and a maximum bitrate.
The article entitled “Rate Control in DCT Video Coding for Low-Delay Communications”, by Jordi Ribas-Corbera, 1999, hereinafter referred to as Reference 1, discloses the algorithm used by TMN8 rate control to ensure that each frame meets the target frame size corresponding to the target average bitrate. More specifically, the TMN8 rate control algorithm computes image statistics to determine appropriate QP (Quantization Parameter) values and updates them for each frame so as to meet the target frame size. Unfortunately, this control is only approximate, and the resulting frame size is often significantly larger or smaller than the target frame size.
Furthermore, in TMN8 rate control, when the given target bitrate is exceeded, the encoder skips a certain number of frames to compensate for the overflow. This, of course, degrades the quality of the communication and of the video.
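Frame skipping can be illustrated with a leaky-bucket model of the encoder buffer, which drains at R/F bits per frame interval (R the channel bitrate, F the frame rate). In the sketch below a frame is skipped whenever the buffer fullness exceeds a threshold. This is a generic illustration of the behavior described above; the update rule, the threshold, and the function name `encode_with_skipping` are assumptions, not TMN8's exact specification.

```python
from typing import List, Sequence


def encode_with_skipping(frame_bits: Sequence[float], bitrate_bps: float,
                         frame_rate: float, threshold_bits: float) -> List[str]:
    """Return "code" or "skip" for each frame. frame_bits[i] is the size frame i
    would have if coded; a skipped frame adds no bits to the buffer."""
    drain = bitrate_bps / frame_rate  # bits sent on the channel per frame interval
    fullness = 0.0
    decisions = []
    for bits in frame_bits:
        if fullness > threshold_bits:
            decisions.append("skip")
            fullness = max(fullness - drain, 0.0)
        else:
            decisions.append("code")
            fullness = max(fullness + bits - drain, 0.0)
    return decisions


# A large first frame (e.g. an Intra frame) at 24 kbps and 10 frames/s.
print(encode_with_skipping([12000, 2400, 2400, 2400, 2400, 2400], 24_000, 10, 2400))
# ['code', 'skip', 'skip', 'skip', 'code', 'code']
```

The three skipped frames after the oversized first frame are exactly the kind of quality loss the paragraph above refers to.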
Another rate control method, maximum bitrate-based rate control, shows improvements over TMN8 rate control. This method is described in “An improved video rate control for video coding standards”, by Stéphane Coulombe, 2007, PCT/CA2007/002242, hereinafter referred to as Reference 2. It is structured to meet both average and maximum bitrates. However, it is derived from a particular definition of the maximum bitrate, namely the maximum number of bits that can be transmitted within one second. Applications such as video streaming do not follow this definition.
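Under the one-second definition of the maximum bitrate, a sequence can be checked by sliding a window of F consecutive frames (one second at F frames per second) over the frame sizes and comparing the largest window sum with the average bitrate. The sketch assumes a constant integer frame rate; the function names are hypothetical.

```python
from typing import Sequence


def max_bits_in_one_second(frame_bits: Sequence[int], frame_rate: int) -> int:
    """Largest number of bits in any window of frame_rate consecutive frames,
    i.e. in any one-second interval at a constant integer frame rate."""
    window = frame_rate
    if len(frame_bits) <= window:
        return sum(frame_bits)
    current = sum(frame_bits[:window])
    best = current
    for i in range(window, len(frame_bits)):
        current += frame_bits[i] - frame_bits[i - window]  # slide by one frame
        best = max(best, current)
    return best


def average_bitrate(frame_bits: Sequence[int], frame_rate: int) -> float:
    """Average bits per second over the whole sequence."""
    return sum(frame_bits) * frame_rate / len(frame_bits)


bits = [10, 10, 10, 40, 10, 10]
print(max_bits_in_one_second(bits, 3), average_bitrate(bits, 3))  # 60 45.0
```

The example sequence meets an average of 45 bits per second while one of its one-second windows carries 60 bits, which is why average and maximum bitrates must be controlled separately.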
In applications such as video streaming, basic buffer-based rate control methods can be used, and they show improvements over TMN8 rate control. Such a buffer-based rate control method is presented in Reference 2. A basic buffer-based rate controller allocates a large number of bits to the Intra frame and then distributes the bits left unused by the encoding of the Intra frame over the following Inter frames, in order to optimize the occupancy of the video buffering verifier over a certain number of frames. However, although the basic buffer-based rate controller was shown to work well for several video sequences, it exhibited problems with sequences containing a lot of motion and many scene changes, such as those found in movie and video trailers. In such sequences, allocating a large number of bits to or around the Intra frames and a nearly constant number of bits to the remaining frames was not a good strategy.
The video buffering verifier (VBV) is a model of a hypothetical decoder buffer that must not overflow or underflow when fed with a conforming video bitstream. In the present invention, the video buffering verifier refers to the VBV in the case of MPEG-4 coding (see Annex D of the MPEG-4 video coding standard), the Hypothetical Reference Decoder of H.263 (see Annex B of the H.263 standard), or any other buffer model of a hypothetical decoder.
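The overflow and underflow conditions of a hypothetical decoder buffer can be sketched as follows: bits arrive at a constant rate, and each frame's bits are removed instantaneously when it is decoded. This is a simplified model, not the normative MPEG-4 Annex D or H.263 Annex B procedure, which define their own timing and parameters; the function name `vbv_check` and its arguments are assumptions.

```python
from typing import Sequence, Tuple


def vbv_check(frame_bits: Sequence[float], bitrate_bps: float, frame_rate: float,
              buffer_size: float, initial_fullness: float) -> Tuple[str, int]:
    """Simplified hypothetical decoder buffer. Returns ("ok", -1) or
    ("underflow" | "overflow", index of the offending frame)."""
    arrival_per_frame = bitrate_bps / frame_rate
    fullness = initial_fullness
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            return "underflow", i      # frame not fully received when needed
        fullness -= bits               # decoder removes the frame instantly
        fullness += arrival_per_frame  # bits arriving before the next frame
        if fullness > buffer_size:
            return "overflow", i       # buffer cannot hold the incoming bits
    return "ok", -1


print(vbv_check([6000, 2400, 2400], 24_000, 10, 10_000, 8000))  # ('ok', -1)
```

A buffer-based rate controller uses this occupancy directly: the fuller or emptier the modeled buffer, the fewer or more bits the next frame can safely use.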
In the article by Bo Xie and Wenjun Zeng entitled “A sequence-based rate control framework for consistent quality real-time video”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, pp. 56-71, 2006, the authors exploit a frame complexity metric, the mean absolute difference (MAD), in a buffer-based video rate controller. Their rate control framework achieves more consistent quality across video sequences. It uses a sequence-based (as opposed to GOP (Group of Pictures)-based) bit allocation model to track the non-stationary characteristics of the video source. They showed that their rate control solution can produce significantly better PSNR (Peak Signal-to-Noise Ratio) performance (in terms of both average value and consistency across scenes), as well as temporally smoother video with less quality flicker and motion jerkiness, than MPEG-4 Annex L frame-level rate control. Xie and Zeng claim that their rate control solution is robust to various sequences, bitrates and frame rates, and that it has been used in commercial products. However, apart from the usual checks for buffer overflow and underflow, they do not take the buffer level into account. Not acting on the actual buffer level can increase the number of dropped frames: for instance, when the buffer level is high, the frame being coded is more likely to cause an overflow and thus be dropped. Moreover, Xie and Zeng neither use the position of Intra frames (when these occur at regular intervals) nor take into account both a maximum bitrate and an average bitrate in their rate control method; they consider only a maximum bitrate equal to the average bitrate.
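To make the MAD complexity metric concrete, the sketch below computes it between co-located pixels of two pictures and then splits a bit budget across frames in proportion to their MAD. Both choices are assumptions for illustration: Xie and Zeng may compute MAD on a motion-compensated residual, and the proportional rule (`allocate_bits`, a hypothetical name) is not their bit allocation model.

```python
from typing import List, Sequence


def mean_absolute_difference(current: List[List[int]], reference: List[List[int]]) -> float:
    """Mean absolute pixel difference between two equally sized pictures."""
    total, count = 0, 0
    for cur_row, ref_row in zip(current, reference):
        for c, r in zip(cur_row, ref_row):
            total += abs(c - r)
            count += 1
    return total / count


def allocate_bits(budget: float, mads: Sequence[float]) -> List[float]:
    """Split a bit budget across frames in proportion to their MAD
    (hypothetical rule; equal split if every MAD is zero)."""
    total = sum(mads)
    if total == 0:
        return [budget / len(mads)] * len(mads)
    return [budget * m / total for m in mads]


print(mean_absolute_difference([[1, 2], [3, 4]], [[1, 1], [1, 1]]))  # 1.5
print(allocate_bits(9000, [1.0, 2.0, 0.0]))  # [3000.0, 6000.0, 0.0]
```

Note that this allocation ignores the buffer level entirely, which mirrors the limitation discussed above.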
Therefore, there is still a need to overcome the above-discussed rate control problems for video sequences that include a lot of motion and many scene changes. Accordingly, a buffer-based device and method capable of improving rate control are sought.