The present invention relates to an improved system and method for digital video processing. More particularly, the present invention relates to dual pass encoding rate control algorithms with look-ahead capabilities for handling unpredictable changes in video statistics, including special events such as scene changes, dissolves, fades, flashes, explosions, jerky motion, and the like. The present invention also relates to an improved statistical multiplexing algorithm having look-ahead capabilities for the handling of special events. Also disclosed is a fine-tuning bit rate control algorithm for dynamically adjusting, at the macroblock level, the quantizer level during encoding of a picture.
Digital television offers viewers high quality video entertainment with features such as pay-per-view, electronic program guides, video-on-demand, weather and stock information, as well as Internet access and related features. Video images, packaged in an information stream, are transmitted to the user via a broadband communication network over a satellite, cable, or terrestrial transmission medium. Due to bandwidth and power limitations, efficient transmission of film and video demands extensive use of compression and formatting techniques. Protocols developed by the Moving Picture Experts Group (MPEG), such as MPEG-1 and MPEG-2, maximize bandwidth utilization for film and video transmission by adding a temporal component to a spatial compression algorithm.
Rate control is critical during encoding and transcoding of digital video programs in a multi-program transmission environment, where several programs are multiplexed and transmitted over a single communication channel. Since these programs share a limited channel capacity, the aggregate bit rate of the programs must be no greater than the communication channel rate. A goal of such bit rate adjustment is to meet the constraint on the total bit rate of the multiplexed stream, while also maintaining a satisfactory video quality for each program.
It is often necessary to adjust the bit rate of digital video programs that are provided, e.g., to subscriber terminals in a cable television network or the like. For example, a first group of signals may be received at a headend via a satellite transmission. The headend operator may wish to forward selected programs to the subscribers while adding programs (e.g., commercials or other content) from a local source, such as storage media or a local live feed. Additionally, the programs must often fit within an overall available channel bandwidth. It may also be desirable to change the relative quality level of a program by allocating more or fewer bits to it during encoding or transcoding.
Accordingly, the statistical multiplexer (statmux), or encoder, has been developed; it includes a number of encoders for encoding uncompressed digital video signals at a specified bit rate. The statistical remultiplexer (statremux), or transcoder, has also been developed; it handles pre-compressed video bit streams by re-compressing them at a specified bit rate. Moreover, the functions of a statmux and a statremux may be combined when it is desired to transcode pre-compressed data while also coding uncompressed data for transport in a common output bitstream. Uncompressed programs are coded for the first time, while compressed programs are re-encoded, typically at a different bit rate.
These statmux and statremux devices evaluate statistical information of the source video being encoded and allocate bits for coding the different video channels accordingly. For example, video channels carrying hard-to-compress video, such as a fast motion scene, can be allocated more bits, while channels with relatively easy-to-compress scenes, such as scenes with little motion, can be allocated fewer bits.
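The allocation principle above, together with the constraint that the aggregate bit rate must not exceed the channel rate, can be illustrated with a minimal sketch. It assumes each program reports a single non-negative "need" value (for example, a complexity measure) and that the channel rate is split in proportion to those needs; the function name `allocate_bits` and the equal-share fallback are hypothetical choices, not a specific statmux implementation.

```python
from typing import List, Sequence


def allocate_bits(channel_rate_bps: float, needs: Sequence[float]) -> List[float]:
    """Split a channel rate among programs in proportion to their need.

    The returned rates always sum to channel_rate_bps (up to float rounding),
    so the aggregate never exceeds the channel capacity.
    """
    if not needs:
        return []
    total_need = sum(needs)
    if total_need <= 0:
        # No statistics available: fall back to an equal share per program.
        share = channel_rate_bps / len(needs)
        return [share] * len(needs)
    # Harder-to-compress programs (larger need) get a larger share.
    return [channel_rate_bps * n / total_need for n in needs]


# A fast-motion program with three times the need of two static programs
# receives 6 Mb/s of a 10 Mb/s channel; the others get 2 Mb/s each.
rates = allocate_bits(10_000_000, [3.0, 1.0, 1.0])
```

A real statistical multiplexer would typically also enforce per-program minimum and maximum rates and decoder buffer limits; those are omitted here for brevity.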
For MPEG applications, a statmux or statremux must accommodate three different picture or frame types (i.e., I, P, and B frames), which usually require quite different numbers of bits because of the different nature of their temporal processing. Each individual image in a sequence of images on film or video is referred to as a frame. Each frame is made up of a large number of picture elements (pixels) that define the image. Within each frame, redundant pixels describe like parts of a scene, e.g., a blue sky. Various types of compression algorithms have been used to remove such redundant spatial elements, thereby decreasing the bandwidth required for image transmission. Sequences of frames on film or video also often contain pixels that are very similar or identical from frame to frame. In order to maximize bandwidth utilization, compression and motion compensation protocols, such as MPEG, are typically used to reduce this redundancy between adjacent frames.
Frames referenced by an encoder for the purpose of predicting motion of images within adjacent frames are called anchor frames. These anchor frames can be of type Intra-frame (I-frame) or Predicted-frame (P-frame). I-frames are made up of groups of pixels (macroblocks) that are coded without reference to other frames, while P-frames contain references to previously encoded frames within a sequence of frames. A third type of frame, referred to as a Bi-directional frame (B-frame), contains macroblocks predicted from previously encountered frames and macroblocks predicted from frames that follow the frame currently being analyzed. This entails a type of look-ahead scheme that describes the currently analyzed image in terms of an upcoming image. Both B-frame and P-frame encoding reduce duplication of pixels by calculating motion vectors that relate macroblocks in the current frame to matching regions in a reference frame, resulting in reduced bandwidth requirements. MPEG-2 encoding and MPEG-1 encoding differ in their support of frame slices. In MPEG-2, slices are runs of consecutive macroblocks within a single macroblock row of a frame that can be individually referenced. Typically, slices are of the same type, i.e., all P-frame encoded or all I-frame encoded. The choice of encoding type for a particular frame depends upon the complexity of that image.
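Because a B-frame is predicted partly from an anchor frame that follows it in display order, that anchor must be encoded before the B-frames that depend on it, so coding order differs from display order. A minimal sketch, assuming the group of pictures is described as a string of frame types in display order; the helper name `coding_order` is hypothetical.

```python
from typing import List, Tuple


def coding_order(display_types: str) -> List[Tuple[int, str]]:
    """Reorder display-order frame types into coding order.

    Each B frame is held back until the next anchor (I or P) has been
    emitted, since the B frame is predicted from that future anchor.
    Returns (display_index, frame_type) pairs in coding order.
    """
    out: List[Tuple[int, str]] = []
    pending_b: List[Tuple[int, str]] = []
    for idx, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append((idx, ftype))
        else:
            # Emit the anchor first, then the B frames that precede it in
            # display order but need it as a backward reference.
            out.append((idx, ftype))
            out.extend(pending_b)
            pending_b.clear()
    # Trailing B frames with no following anchor in this string are emitted
    # last; a real encoder would wait for the next GOP's anchor instead.
    out.extend(pending_b)
    return out


order = coding_order("IBBPBBP")
types_in_coding_order = "".join(t for _, t in order)
```

For the display sequence `IBBPBBP`, the P-frame at display position 3 is coded immediately after the I-frame, ahead of the two B-frames that reference it.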
In MPEG-2 digital video systems, the complexity of a video frame is measured by the product of the quantization level used to encode that frame and the number of bits used for coding the frame. This means the complexity of a frame is not known until it has been encoded. As a result, complexity information always lags behind the actual encoding process; obtaining it in advance requires buffering a number of frames prior to encoding, which adds expense and complexity.
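The complexity measure described above can be written as X = Q × S, where S is the number of bits used to code the frame and Q is its quantization level. A minimal sketch, assuming (as in the MPEG-2 Test Model 5 convention) that the frame's quantization level is the average of the quantizer values used for its macroblocks; the function name `frame_complexity` is hypothetical.

```python
from typing import Sequence


def frame_complexity(mb_quantizers: Sequence[float], frame_bits: int) -> float:
    """Complexity X = (average macroblock quantizer) * (bits used for the frame).

    Both inputs are only known after the frame has been encoded, which is
    why complexity information lags behind the encoding process.
    """
    if not mb_quantizers:
        raise ValueError("at least one macroblock quantizer is required")
    avg_q = sum(mb_quantizers) / len(mb_quantizers)
    return avg_q * frame_bits


# Four macroblocks with quantizers 8, 8, 12, 12 (average 10) and a frame that
# used 400,000 bits give a complexity of 4,000,000.
x = frame_complexity([8, 8, 12, 12], 400_000)
```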
Furthermore, the selection between I-frame and P-frame encoding typically requires multiple encoding passes on a single frame to determine the complexity of each alternative. If P-frame encoding results in a greater complexity than would be realized using I-frame encoding, then I-frame encoding is selected. Ideally, an anchor frame should be coded twice in the first pass encoder to generate the complexity measure for both the I and P cases, but computational overhead typically rules out such an approach. From a bandwidth utilization viewpoint, it would be most effective to code P-frames except where the image complexity calls for I-frame encoding, e.g., at scene changes. One problem with encoding a single frame multiple times is the added computational load, which reduces the throughput of the encoder. Another problem with this approach is the inherent inefficiency of encoding the same frame twice.
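The decision rule in the preceding paragraph can be sketched as follows, assuming both the I-coded and P-coded complexity of each anchor frame are already available (which is exactly what makes the conventional approach expensive). The function names and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_anchor_type(complexity_i: float, complexity_p: float) -> str:
    """Select I coding when P coding would be more complex than I coding."""
    return "I" if complexity_p > complexity_i else "P"


def classify_anchors(pairs: Sequence[Tuple[float, float]]) -> List[str]:
    """Apply the rule to a sequence of (I complexity, P complexity) pairs."""
    return [choose_anchor_type(ci, cp) for ci, cp in pairs]


# Ordinary frames predict well, so their P complexity is far below their I
# complexity; at a scene change the prediction fails and P exceeds I.
decisions = classify_anchors([(5.0e6, 1.2e6), (5.1e6, 1.4e6), (4.8e6, 6.0e6)])
```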
Commonly assigned co-pending patent application Ser. No. 09/929,983 entitled “First Pass Encoding of I and P Frame Complexity For Compressed Digital Video”, filed Aug. 15, 2001, incorporated herein and made a part hereof by reference, discloses an improved complexity encoding system with effective scene change detection. Co-pending patent application Ser. No. 09/929,983 discloses encoding methods and apparatus for alternately encoding both I-frame and P-frame macroblocks within a single frame. By doing so, both I and P encoding complexity can be computed without encoding the same frame twice. This arrangement allows the I-frame decision to be made at the second pass encoder instead of at the first pass encoder, thus taking advantage of a look-ahead pipeline to more effectively align the I-frames with scene changes. This method also reduces the computational encoding complexity.
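The idea of interleaving intra-coded and inter-coded macroblocks within one frame, so that both complexities come from a single encoding, can be illustrated as follows. The sketch assumes a checkerboard assignment (macroblocks whose row-plus-column index is even are coded as I, odd as P) and extrapolates each half to a full-frame bit count by scaling; the pattern, the scaling rule, and the name `estimate_i_p_complexity` are hypothetical and are not taken from the co-pending application.

```python
from typing import Dict, Sequence


def estimate_i_p_complexity(
    mb_bits: Sequence[int], mb_q: Sequence[float], width_mbs: int
) -> Dict[str, float]:
    """Estimate full-frame I and P complexity from one interleaved encoding.

    mb_bits[k] and mb_q[k] are the bits and quantizer of macroblock k in
    raster order; width_mbs is the frame width in macroblocks.
    """
    totals = {"I": {"bits": 0, "q": 0.0, "count": 0},
              "P": {"bits": 0, "q": 0.0, "count": 0}}
    for k, (bits, q) in enumerate(zip(mb_bits, mb_q)):
        row, col = divmod(k, width_mbs)
        mode = "I" if (row + col) % 2 == 0 else "P"  # hypothetical checkerboard
        totals[mode]["bits"] += bits
        totals[mode]["q"] += q
        totals[mode]["count"] += 1

    n = len(mb_bits)
    result: Dict[str, float] = {}
    for mode, t in totals.items():
        if t["count"] == 0:
            raise ValueError(f"no macroblocks were coded as {mode}")
        # Scale the half-frame bit count up to a whole frame, then apply X = Q * S.
        est_frame_bits = t["bits"] * n / t["count"]
        avg_q = t["q"] / t["count"]
        result[mode] = avg_q * est_frame_bits
    return result


# A 2x2-macroblock frame: positions (0,0) and (1,1) are intra coded.
est = estimate_i_p_complexity([1000, 200, 300, 1200], [10, 10, 10, 10], 2)
```

With these inputs the intra half used 2,200 bits and the inter half 500 bits, giving estimated complexities of 44,000 (I) and 10,000 (P) without coding the frame twice.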
It would be advantageous to provide methods and apparatus for improving rate control and statistical multiplexing using the first pass encoding process of the dual pass encoding scheme disclosed in the aforementioned co-pending application, in order to improve the handling of special events in video sequences. In particular, it would be advantageous to combine statistics from first pass encoding and second pass encoding to improve rate control. It would be further advantageous to selectively sum the bit count (complexity measure) of the frames in the look-ahead pipeline of a dual pass encoder to generate the need parameter, in order to ensure that video quality does not deteriorate during a transition from complex video to simple video. When selectively summing the complexity measurements of the frames in the look-ahead pipeline, it would be advantageous to weight the bit count (complexity measure) of these frames such that frames closer to the current frame receive a higher weighting. It would be still further advantageous to scale the statmux need parameter by a Scene Change Multiplier in order to improve the video quality of flashes when multiple scene changes are detected in the look-ahead pipeline. Finally, it would be advantageous to provide dynamic fine-tuning of the bit rate at the macroblock level during encoding of a picture to improve video quality.
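The three mechanisms named above (a distance-weighted look-ahead need, a Scene Change Multiplier, and macroblock-level quantizer fine-tuning) can be sketched as below. This is an illustration under stated assumptions, not the method of the invention: the linear weights, the two-scene-change threshold, the default multiplier of 1.5, and the proportional quantizer correction are all hypothetical, as are the function names. The quantizer is clamped to 1..31, the range of the MPEG-2 `quantiser_scale_code`.

```python
from typing import Sequence


def weighted_lookahead_need(complexities: Sequence[float]) -> float:
    """Weighted sum of per-frame complexity over the look-ahead window.

    Index 0 is the current frame. Hypothetical linear weights: with N frames
    in the window, frame k gets weight (N - k) / N, so nearer frames count more.
    """
    n = len(complexities)
    return sum((n - k) / n * c for k, c in enumerate(complexities))


def apply_scene_change_multiplier(
    need: float, scene_change_flags: Sequence[bool], multiplier: float = 1.5
) -> float:
    """Scale the need when two or more scene changes lie in the window."""
    if sum(1 for flag in scene_change_flags if flag) >= 2:
        return need * multiplier
    return need


def fine_tune_quantizer(
    base_q: int,
    bits_used: int,
    target_bits: int,
    mbs_done: int,
    mbs_total: int,
    gain: float = 1.0,
    q_min: int = 1,
    q_max: int = 31,
) -> int:
    """Adjust the quantizer mid-picture from the running bit-budget error.

    Spending more than the pro-rata share of target_bits raises the
    quantizer (coarser coding); spending less lowers it (finer coding).
    """
    if target_bits <= 0 or mbs_total <= 0:
        raise ValueError("target_bits and mbs_total must be positive")
    expected = target_bits * mbs_done / mbs_total
    error = (bits_used - expected) / target_bits
    q = round(base_q * (1.0 + gain * error))
    return max(q_min, min(q_max, q))


# Complex video followed by simple video: 8 + 6 + 0.5 + 0.25 = 14.75.
need = weighted_lookahead_need([8.0, 8.0, 1.0, 1.0])
# Two scene changes (e.g., a flash) in the window scale the need by 1.5.
boosted = apply_scene_change_multiplier(need, [False, True, False, True])
# Halfway through the picture with 10% of the budget overspent: Q goes 10 -> 11.
q_next = fine_tune_quantizer(10, 60_000, 100_000, 50, 100)
```

Because the weighted need still includes the current, complex frames at full weight, it falls gradually rather than abruptly when simpler content appears later in the window.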
The methods and apparatus of the present invention provide the foregoing and other advantages.