An ensemble of video encoder systems is illustrated in FIG. 1. The ensemble 10 comprises a plurality of video encoder systems 20-i, i=0, 1, 2, . . . , P-1. Each encoder system receives uncompressed video at an input 22-i and outputs a compressed digital video bitstream at an output 24-i. Illustratively, each encoder system 20-i outputs a compressed digital bit stream which complies with the syntax specified in the MPEG-2 video specification (ISO/IEC 13818-2; Recommendation ITU-T H.262 (1995 E) produced by ISO, the contents of which are incorporated herein by reference).
The digital bit streams output by the encoder systems 20-i are multiplexed together by the system multiplexer 30 and the multiplexed bitstream is transmitted to one or more decoders via the common channel 40. A system controller 50 controls the system multiplexer 30. The system controller 50 allocates a fraction of the bandwidth of the channel 40 to each of the encoder systems. The system controller 50 may be implemented as a microprocessor or microcontroller. The present invention relates to a novel method for allocating the bandwidth of the channel 40 among the encoder systems 20-i.
A video encoder system 20 is illustrated in FIG. 2. The system 20 includes a preprocessor 14, a video encoder 16, a rate buffer 18 and a rate controller 19. The preprocessor 14 receives a digital video signal comprising a sequence of frames from a source 12. The source 12 of video is, for example, a video camera, a telecine machine which converts a sequence of film images into a sequence of video frames, or another device which outputs a sequence of video frames. The preprocessor 14 performs a variety of functions to place the sequence of video frames into a format in which the frames can be compressed by the encoder. For example, where the video source is a telecine machine which outputs 30 frames per second, the preprocessor 14 converts the video signal into 24 frames per second for compression in the encoder 16 by detecting and eliminating duplicate fields produced by the telecine machine.
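Illustratively, the duplicate-field elimination mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method actually used by the preprocessor 14: the function names, the representation of a field as a flat list of pixel values, and the fixed difference threshold are all hypothetical. The sketch drops any field that matches the field two positions earlier (the previous field of the same parity), which is where a 3:2 pull-down places its repeats.

```python
def field_difference(a, b):
    """Sum of absolute pixel differences between two equally sized fields."""
    return sum(abs(p - q) for p, q in zip(a, b))


def drop_repeated_fields(fields, threshold=0):
    """Return the fields with 3:2 pull-down repeats removed.

    A field is treated as a repeat when it differs from the field two
    positions earlier (same parity) by no more than `threshold`.
    """
    kept = []
    for i, field in enumerate(fields):
        if i >= 2 and field_difference(field, fields[i - 2]) <= threshold:
            continue  # duplicate produced by the telecine; skip it
        kept.append(field)
    return kept


# Four film frames A-D, each with a top (t) and bottom (b) field.
A_t, A_b = [10, 10], [11, 11]
B_t, B_b = [20, 20], [21, 21]
C_t, C_b = [30, 30], [31, 31]
D_t, D_b = [40, 40], [41, 41]

# 3:2 pull-down: 4 film frames become 10 fields (5 video frames).
telecined = [A_t, A_b, A_t, B_b, B_t, C_b, C_t, C_b, D_t, D_b]
restored = drop_repeated_fields(telecined)
```

In this toy input, ten fields are reduced to eight, i.e., five video frames back to four film frames. A static scene would also match under this simple test, so a practical detector would likely track the repeating five-frame cadence as well; that refinement is omitted here.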
In addition, the preprocessor may spatially scale each frame of the source video so that it has a format which meets the parameter ranges specified by the encoder 16.
The video encoder 16 is preferably an encoder which utilizes a video compression algorithm to provide an MPEG-2 compatible bit stream. The MPEG-2 bit stream has six layers of syntax: a sequence layer (random access unit, context), a Group of Pictures layer (random access unit, video coding), a picture layer (primary coding layer), a slice layer (resynchronization unit), a macroblock layer (motion compensation unit), and a block layer (DCT unit). A group of pictures (GOP) is a set of frames which starts with an I-frame and includes a certain number of P and B frames. The number of frames in a GOP may be fixed. A macroblock in a video frame illustratively comprises four 8×8 pixel blocks of luminance information and two 8×8 pixel blocks of chrominance information.
The encoder distinguishes between three kinds of frames (i.e., pictures): I, P, and B. The coding of I frames results in the most bits. In an I-frame, each macroblock is coded as follows. Each 8×8 block of pixels in a macroblock undergoes a discrete cosine transform (DCT) to form an 8×8 array of transform coefficients. The transform coefficients are then quantized with a variable quantizer matrix. The resulting quantized DCT coefficients are zig-zag scanned to form a sequence of DCT coefficients. The DCT coefficients are then organized into run, level pairs. The run, level pairs are then entropy encoded. It should be noted that the quantizer matrix used to quantize each macroblock is multiplied by a scale factor which can vary from one macroblock to the next.
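The intra coding steps described above (transform, quantization, zig-zag scan, and run, level pairing) can be sketched as follows. This is a simplified illustration, not the arithmetic mandated by MPEG-2: all function names are hypothetical, the quantizer is a plain divide-and-round, the DC coefficient is treated like any other coefficient, and the final entropy coding stage is omitted.

```python
import math

N = 8  # block dimension


def dct_2d(block):
    """Orthonormal 8x8 DCT-II of a block given as a list of rows."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs


def quantize(coeffs, matrix, scale):
    """Divide each coefficient by its matrix entry times the scale factor."""
    return [[int(round(coeffs[u][v] / (matrix[u][v] * scale)))
             for v in range(N)] for u in range(N)]


def zigzag_order():
    """(row, col) positions in zig-zag order, low frequencies first."""
    cells = [(r, c) for r in range(N) for c in range(N)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_level_pairs(sequence):
    """Pair each nonzero level with the count of zeros that precede it."""
    pairs, run = [], 0
    for level in sequence:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs  # trailing zeros are left to an end-of-block code


def code_intra_block(block, matrix, scale):
    """Transform, quantize, scan, and pair one 8x8 block."""
    quantized = quantize(dct_2d(block), matrix, scale)
    scanned = [quantized[r][c] for r, c in zigzag_order()]
    return run_level_pairs(scanned)


# A flat block has all of its energy in the DC coefficient.
flat_block = [[16] * N for _ in range(N)]
flat_matrix = [[16] * N for _ in range(N)]
pairs = code_intra_block(flat_block, flat_matrix, scale=1)
```

For the flat block the result is the single pair (0, 8); doubling `scale` halves the level, which illustrates how the per-macroblock scale factor trades precision for bits.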
In a P-frame, a decision is made to code each macroblock as an I macroblock, which is then encoded according to the technique described above, or to code the macroblock as a P macroblock. For each P macroblock, a prediction of the macroblock is obtained from a previous video frame. The prediction is identified by a motion vector which indicates the translation between the macroblock to be coded in the current frame and its prediction in the previous frame. (A variety of block matching algorithms can be used to find the particular macroblock in the previous frame which is the best match with the macroblock to be coded in the current frame. This "best match" macroblock becomes the prediction for the current macroblock.) The predictive error between the predictive macroblock and the current macroblock is then coded using the DCT, quantization, zig-zag scanning, run, level pair encoding, and entropy encoding. In order to do predictive encoding of this type, the video encoder 16 inherently includes a decoder. This decoder decodes a frame which is compressed by the encoder. The decoded frame is then stored and used to make the motion compensated predictions described above.
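One of the many possible block matching algorithms referred to above is an exhaustive search that minimizes the sum of absolute differences (SAD). The following sketch is illustrative only; the function names, the SAD criterion, and the search range are assumptions, and the text does not state which matching algorithm the encoder 16 uses.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Exhaustively search ref around (cx, cy); return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 16x16 frames: the current frame is the reference frame
# shifted 2 pixels right and 1 pixel down.
ref_frame = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur_frame = [[ref_frame[y - 1][x - 2] if (y >= 1 and x >= 2) else 0
              for x in range(16)] for y in range(16)]

match = best_motion_vector(cur_frame, ref_frame, cx=4, cy=4, size=4, search=3)
```

Here the search recovers the vector (-2, -1) with zero cost, so the predictive error for this block would be all zeros. In an encoder the reference would be the locally decoded previous frame rather than the original, for the reason given above.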
In the coding of a B-frame, a decision has to be made as to the coding of each macroblock. The choices are (a) intracoding (as in an I macroblock), (b) unidirectional backward predictive coding using a subsequent frame to obtain a motion compensated prediction, (c) unidirectional forward predictive coding using a previous frame to obtain a motion compensated prediction, and (d) bidirectional predictive coding wherein a motion compensated prediction is obtained by interpolating a backward motion compensated prediction and a forward motion compensated prediction. In the cases of forward, backward, and bidirectional motion compensated prediction, the predictive error is encoded using DCT, quantization, zig-zag scanning, run, level pair encoding and entropy encoding.
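The choice among the motion compensated modes (b) through (d) can be illustrated with the following sketch. It assumes, hypothetically, that the mode with the smallest sum of absolute differences is selected and that the bidirectional prediction is a rounded average of the two unidirectional predictions. The intracoding choice (a) is left out because it requires a different cost measure, and all names are illustrative.

```python
def block_sad(a, b):
    """Sum of absolute differences between two blocks (lists of rows)."""
    return sum(abs(p - q)
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def interpolate(forward, backward):
    """Rounded average of the forward and backward predictions."""
    return [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(forward, backward)]


def choose_prediction_mode(current, forward, backward):
    """Return (mode, residual) for the lowest-error prediction."""
    candidates = {
        "forward": forward,      # prediction from a previous frame
        "backward": backward,    # prediction from a subsequent frame
        "bidirectional": interpolate(forward, backward),
    }
    mode = min(candidates, key=lambda name: block_sad(current, candidates[name]))
    prediction = candidates[mode]
    residual = [[c - p for c, p in zip(row_c, row_p)]
                for row_c, row_p in zip(current, prediction)]
    return mode, residual


# A block whose brightness lies midway between its two references.
forward_pred = [[10, 10], [10, 10]]
backward_pred = [[20, 20], [20, 20]]
current_block = [[15, 15], [15, 15]]
mode, residual = choose_prediction_mode(current_block, forward_pred, backward_pred)
```

In this example the interpolated prediction matches the current block exactly, so the bidirectional mode is chosen and the residual that would be passed to the DCT stage is zero.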
B frames require the fewest bits when encoded, followed by P frames, with I frames requiring the most. Thus, the greatest degree of compression is achieved for B frames. For each of the I, P, and B frames, the number of bits resulting from the encoding process can be controlled by controlling the quantization step size. A macroblock of pixels or pixel errors which is coded using a large quantizer step size results in fewer bits than if a smaller quantizer step size is used.
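The effect of the quantizer step size can be seen in a small numerical sketch. The coefficient values are made up for illustration, the quantizer simply truncates toward zero, and the count of nonzero levels is used only as a rough proxy for the number of bits produced by the later coding stages.

```python
def quantize(coefficients, step):
    """Uniform quantization that truncates toward zero."""
    return [int(c / step) for c in coefficients]


def nonzero_count(levels):
    """Number of levels that would still need to be coded."""
    return sum(1 for level in levels if level != 0)


# Hypothetical transform coefficients, largest magnitudes first.
coefficients = [120, -45, 30, -12, 7, 5, -3, 2]

fine = quantize(coefficients, step=4)     # small step size
coarse = quantize(coefficients, step=16)  # large step size
```

With the small step size six levels remain nonzero, while with the large step size only three remain and their magnitudes are smaller, so fewer bits are needed at the cost of a larger reconstruction error.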
After encoding by the video encoder, the bit stream is stored in the encoder rate buffer 18. Then, the encoded bits are transmitted via the system multiplexer 30 and channel 40 (see FIG. 1) to a decoder, where they are received in a decoder buffer.
Each individual encoder system 20 has its own rate controller 19. The purpose of the rate controller 19 is to maximize the perceived quality of the encoded video sequence when it is decoded at a decoder by intelligently allocating the number of bits used to encode each frame. The sequence of bit allocations to successive frames preferably ensures that a channel bit rate assigned by the system controller 50 of FIG. 1 is maintained and that decoder buffer exceptions (overflow or underflow) are avoided. The process of allocating bits to individual frames takes into account the frame type (I, P or B) and scene dependent coding complexity. To accomplish rate control at each individual encoder system 20, the rate controller 19 receives input information indicating the occupancy of the rate buffer 18. The rate controller 19 executes a rate control algorithm and feeds back control signals to the encoder 16 (and possibly to the preprocessor 14, as well) to control the number of bits generated by the encoder for succeeding frames.
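The feedback loop from the rate buffer 18 to the encoder 16 can be illustrated with a deliberately simple sketch in which the quantizer scale grows linearly with buffer fullness. This is not the rate control algorithm of the rate controller 19, which also weighs frame type and coding complexity; the function names, the linear mapping, and the scale range of 1 to 31 are assumptions made for the example.

```python
def update_occupancy(occupancy, frame_bits, bit_rate, frame_rate, buffer_size):
    """Add the bits of a newly coded frame and drain one frame interval."""
    drained = bit_rate / frame_rate  # bits removed by the channel per frame
    return min(max(occupancy + frame_bits - drained, 0.0), buffer_size)


def next_quantizer_scale(occupancy, buffer_size, min_scale=1, max_scale=31):
    """Map buffer fullness (0..1) linearly onto the quantizer scale range."""
    fullness = min(max(occupancy / buffer_size, 0.0), 1.0)
    return min_scale + round(fullness * (max_scale - min_scale))


# Hypothetical numbers: a 10,000-bit buffer drained at 3,000 bits/s, 30 frames/s.
occupancy = update_occupancy(1000, frame_bits=500, bit_rate=3000,
                             frame_rate=30, buffer_size=10000)
scale = next_quantizer_scale(occupancy, buffer_size=10000)
```

As the buffer fills, the scale rises and succeeding frames are coded more coarsely; as it empties, the scale falls. A change in the channel bit rate assigned by the system controller 50 enters through `bit_rate`.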
Several rate control algorithms are known. An illustrative rate control algorithm is disclosed in U.S. patent application Ser. No. 08/573,933 entitled RATE CONTROL FOR A VIDEO CONTROLLER, filed for Michael Perkins and David Arnstein on even date herewith and assigned to the assignee hereof. The contents of this related application are incorporated herein by reference.
Dynamic rate optimization or statistical multiplexing is a method for making more effective use of the bandwidth of a single communication channel through which several MPEG-2 or other digital video bit streams are transmitted. A satellite transponder is an example of such a communication channel. Such a channel typically has a bandwidth of 40 MHz which is shared among 10-15 bit streams. A dynamic rate optimizing or statistical multiplexing strategy dynamically allocates the channel bandwidth among several encoder systems that share the communication channel. Each encoder system 20-i periodically (e.g., once per frame interval) transmits a measure of video quality via a communication path to the system controller 50. The system controller 50 determines adjusted bit rate allocations for the individual video encoder systems 20. The system controller 50 transmits the adjusted bit rate allocations to the rate controllers of the individual encoder systems 20 in order to drive the video quality measurements to equality. Naturally, the sum of these bit rate allocations must equal the total channel bit rate. The rate controllers at the individual encoder systems take into account the adjustments in allocation of channel bit rate when allocating bits to the frames that are encoded by the encoder systems.
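One way a system controller might drive the reported quality measures toward equality while keeping the allocations summing to the channel bit rate is sketched below. This is an assumed proportional update, not the allocation method of any particular system: the function name, the `gain` parameter, and the convention that a larger measure means better quality are all hypothetical, and the measures are assumed to be positive.

```python
def adjust_allocations(rates, qualities, total_rate, gain=0.05):
    """Shift bit rate toward encoders whose quality is below the mean.

    The result is rescaled so that the allocations sum to total_rate.
    """
    mean_quality = sum(qualities) / len(qualities)
    raw = []
    for rate, quality in zip(rates, qualities):
        deviation = (mean_quality - quality) / mean_quality
        raw.append(max(rate + gain * total_rate * deviation, 0.0))
    factor = total_rate / sum(raw)  # enforce the channel constraint
    return [r * factor for r in raw]


# Three encoders sharing a hypothetical 12 Mbit/s channel equally.
current_rates = [4e6, 4e6, 4e6]
reported_quality = [30.0, 35.0, 40.0]  # e.g., one measure per frame interval
new_rates = adjust_allocations(current_rates, reported_quality, total_rate=12e6)
```

The encoder reporting the lowest quality receives a larger share and the one reporting the highest quality a smaller share. Repeating the update each reporting interval moves the measures toward one another.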
In conventional video encoder ensembles, the quality measure output by each individual video encoder system to the system controller is a Peak Signal-to-Noise Ratio (PSNR). The encoder can output this information because, as indicated above, it has available both the original and the decoded frames. The system controller determines channel bandwidth allocation adjustments so as to equalize the average PSNR across all of the video encoder systems. More bit rate is allocated to encoders with a small PSNR and less bit rate is allocated to encoders with a large PSNR.
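For reference, PSNR is commonly computed from the mean squared error between a source frame and its decoded version as 10·log10(peak²/MSE). The sketch below assumes 8-bit samples (peak value 255) and frames given as lists of rows; the function name and the sample values are illustrative.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    errors = [(o - d) ** 2
              for row_o, row_d in zip(original, decoded)
              for o, d in zip(row_o, row_d)]
    mse = sum(errors) / len(errors)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


# Every decoded sample is off by one, so the mean squared error is 1.
original_frame = [[100, 102], [98, 101]]
decoded_frame = [[101, 101], [99, 100]]
quality_db = psnr(original_frame, decoded_frame)
```

Because the measure depends only on the pixel-wise error, two frames with the same error energy receive the same PSNR regardless of how visible that error is to a viewer.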
The problem with using PSNR as a quality measure is that the "masking effect" is not taken into account. A frame with a lot of visual complexity can hide coding artifacts from the viewer when the frame is decoded and displayed. It is well known that a quality measure in the form of a PSNR takes into account neither the masking effect nor subjective image quality.
It is an object of the present invention to provide a dynamic rate optimizing or statistical multiplexing process for an ensemble of video encoder systems, which overcomes the shortcomings of the above-described prior art dynamic rate optimizing or statistical multiplexing process.
In particular, it is an object of the invention to allocate bandwidth to a plurality of video encoder systems which share a common transmission channel in a manner which takes into account the masking effect and subjective image quality.