The invention relates generally to video encoding which utilizes motion-compensated video compression techniques, and more particularly to rate control in multi-resolution video encoding applications.
Motion video sequences typically contain a significant amount of intra-frame or “spatial” redundancy as well as inter-frame or “temporal” redundancy. Video compression techniques take advantage of this spatial and temporal redundancy to significantly reduce the amount of information bandwidth required to transmit, store and process video sequences. Existing standards for digital video compression include, for example, H.261, H.263, Motion-JPEG, MPEG-1 and MPEG-2. Transmission of compressed digital video can take place over many types of transmission facilities, and with many available bandwidths. For example, in a multipoint transmission application, two or more receivers of a compressed video bitstream may each have different available bandwidths with which to receive the video data. It is generally desirable in such an application to allow a receiver with a high bandwidth to receive higher resolution video than a receiver with a low bandwidth, rather than limiting all of the receivers to the low bandwidth. In these and other similar applications, a given video sequence is encoded at multiple resolutions.
The MPEG-2 standard implements multi-resolution video encoding through a process known as spatial scalability. This involves encoding a base layer of the video at a lower resolution and one or more enhancement layers at higher resolutions. The base layer is then transmitted to all receivers in a multipoint transmission application, and the enhancement layer or layers are transmitted only to the higher bandwidth receivers. However, MPEG-2 spatial scalability requires the higher bandwidth receiver to decode two or more layers, which increases the computational complexity of the decoding process. In addition, the bandwidth required for transmitting two or more layers is generally higher than that required for transmitting a single bitstream encoded at the higher resolution. Additional detail regarding these and other aspects of the MPEG-2 standard is provided in “Information Technology Generic Coding of Moving Pictures and Associated Audio Information: Video,” ISO/IEC DIS 13818-2, which is incorporated herein by reference.
FIG. 1 shows a conventional multi-resolution encoding system 10. A video sequence in Common Intermediate Format (CIF) is supplied directly to a first standard video encoder 12 and also to a downsampler 14. The first standard video encoder 12 encodes the CIF video sequence to generate a CIF bitstream. The downsampler 14 converts the CIF video sequence to a Quarter-CIF (QCIF) video sequence. A second standard video encoder 16 encodes the QCIF video sequence to generate a QCIF bitstream. The two encoders 12, 16 operate substantially independently, and generally do not share rate control information.
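The role of the downsampler 14 can be illustrated with a minimal sketch. The text above does not specify the downsampling filter, so the 2x2 box average used here is an assumption chosen for simplicity, and the function name `downsample_2x` is hypothetical. The frame dimensions are the standard luma sizes for CIF (352x288) and QCIF (176x144).

```python
CIF_WIDTH, CIF_HEIGHT = 352, 288    # luma dimensions of a CIF frame
QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # luma dimensions of a QCIF frame


def downsample_2x(frame):
    """Halve each dimension by averaging non-overlapping 2x2 blocks.

    `frame` is a list of rows of integer luma samples with even dimensions.
    The box filter is an assumed choice, not one taken from the text.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            # Rounded mean of the four samples in the 2x2 block.
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Synthetic CIF-sized luma plane used only to exercise the function.
cif_frame = [[(x + y) % 256 for x in range(CIF_WIDTH)] for y in range(CIF_HEIGHT)]
qcif_frame = downsample_2x(cif_frame)
```

A practical downsampler would typically apply a better anti-aliasing filter and process the chroma planes as well.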
FIG. 2 shows one of the standard video encoders 12, 16 of FIG. 1 in greater detail. The CIF or QCIF video sequence is applied via a signal combiner 20 to a discrete cosine transform (DCT) generator 22 which generates DCT coefficients for macroblocks of frames in the sequence. These coefficients are applied to a quantizer 24, and the resulting quantized coefficients may be zig-zag scanned and run-amplitude coded before being applied to a variable-length coder (VLC) 26. The output of the VLC 26 is an encoded bitstream. Rate control is provided by a rate control processor 28. The DCT, quantization and variable-length coding operations of FIG. 2 are designed to remove spatial redundancy within a given video frame in the sequence.
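The quantization, zig-zag scanning and run-amplitude steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the encoder of FIG. 2: the uniform step size of 2 x QP, the truncating quantizer, and the names `quantize`, `zigzag_order` and `run_amplitude` are all hypothetical, and the DCT and variable-length coding stages are omitted.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom,
        # even diagonals run bottom to top.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def quantize(block, qp):
    """Quantize coefficients with an assumed uniform step of 2 * qp."""
    step = 2 * qp
    return [[int(coef / step) for coef in row] for row in block]  # truncates toward zero


def run_amplitude(levels):
    """Convert a block of quantized levels into (zero_run, level) pairs.

    Trailing zeros are dropped, as an end-of-block code would replace them.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(levels)):
        if levels[r][c] == 0:
            run += 1
        else:
            pairs.append((run, levels[r][c]))
            run = 0
    return pairs


# Hypothetical DCT coefficients for one 8x8 block.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 100, -33, 17

fine = run_amplitude(quantize(block, qp=4))     # [(0, 12), (0, -4), (1, 2)]
coarse = run_amplitude(quantize(block, qp=16))  # [(0, 3), (0, -1)]
```

The coarser quantizer zeroes out the smallest coefficient and shrinks the remaining levels, leaving fewer and smaller symbols for the variable-length coder.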
Temporal or inter-frame redundancy is removed in the encoder of FIG. 2 through a process of inter-frame motion estimation and predictive coding. For example, MPEG-2 video frames may be either intra-coded (I) frames, forward-only predictive (P) frames or bidirectionally-predictive (B) frames. An I frame is encoded using only the spatial compression techniques noted above, while a P frame is encoded using “predictive” macroblocks selected from a single reference frame. A given B frame is encoded using “bidirectionally-predictive” macroblocks generated by interpolating between a pair of predictive macroblocks selected from two reference frames, one preceding and the other following the B frame. In the encoder of FIG. 2, the output of the quantizer 24 is applied to an inverse quantizer 30 and then to an inverse DCT generator 32. The output of the inverse DCT generator 32 is processed over one or more frames by a motion compensator 34 and motion estimator 36. The motion compensator 34 generates motion vectors which are combined with a subsequent frame in signal combiner 20 so as to reduce inter-frame redundancy and facilitate encoding.
A conventional video encoder such as that shown in FIG. 2 generally attempts to match the bitrate of the compressed video stream to a desired transmission bandwidth. The quantization parameter (QP) used in the quantizer 24 generally has a substantial effect on the resultant bitrate: a large QP performs coarse quantization, reducing the bitrate and the resulting video quality, while a small QP performs finer quantization, which leads to a higher bitrate and higher resulting image quality. The rate control processor 28 thus attempts to find a QP that is high enough to restrain the bitrate, but with the best possible resulting image quality. In general, it is desirable to maintain consistent image quality throughout a video sequence, rather than having the image quality vary widely from frame to frame. Both the MPEG-2 simulation model and the H.263 test model suggest rate control techniques for selecting the QP.
Approaches for implementing this type of rate control are described in greater detail in, for example, A. Puri and R. Aravind, “Motion-Compensated Video Coding with Adaptive Perceptual Quantization,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 1, No. 4, pp. 351-361, December 1991, and W. Ding and B. Liu, “Rate Control of MPEG Video Coding and Recording by Rate-Quantization Modeling,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 1, pp. 12-20, February 1996, both of which are incorporated by reference herein. These approaches generally first select a target bitrate for each frame type (i.e., I frames, P frames and B frames), and the encoder attempts to assign the same number of bits to each frame of the same type. A frame-wide QP is then determined for each frame in an attempt to match the target bitrate for that frame. The approach described in the Puri and Aravind reference determines the frame-wide QP by using an activity measure, the frame variance. The approach described in the Ding and Liu reference generates a rate-quantization model. In either approach, the encoder may also vary the QP for individual macroblocks based on local activity measures.
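The first step of these approaches, selecting a per-frame target for each frame type, might be sketched as below. The cited references are not reproduced here: the bit budget, the frame counts and the relative weights are hypothetical values chosen for illustration, and `frame_type_targets` is an assumed helper name. Every frame of a given type receives the same target, as the text describes.

```python
def frame_type_targets(budget_bits, counts, weights):
    """Split a bit budget so each frame of a given type gets the same target.

    `counts` maps frame type to the number of such frames in the group;
    `weights` maps frame type to an assumed relative cost per frame.
    """
    total_weight = sum(counts[t] * weights[t] for t in counts)
    return {t: budget_bits * weights[t] / total_weight for t in counts}


# Hypothetical group of 15 frames with assumed I:P:B cost ratios of 4:2:1.
counts = {"I": 1, "P": 4, "B": 10}
weights = {"I": 4.0, "P": 2.0, "B": 1.0}
targets = frame_type_targets(600_000, counts, weights)
```

With these assumed weights an I frame is allotted twice the bits of a P frame and four times the bits of a B frame, and the per-frame targets sum back to the overall budget.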
A significant problem with these and other conventional rate control techniques is that they can be computationally intensive, particularly for high resolution video sequences. For example, the approach in the Ding and Liu reference performs multi-pass encoding, that is, an entire frame is encoded more than once using different QPs in order to find a QP that results in an actual bitrate closer to the target bitrate. This type of multi-pass encoding can be very computationally intensive, and substantially reduces the efficiency of the encoding process.
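The cost of multi-pass encoding can be made concrete with a small sketch. The search strategy of the cited reference is not reproduced here; the binary search below is an assumed strategy, `toy_encode` is a hypothetical stand-in for a full frame encode, and the QP range of 1 to 31 is likewise an assumption. Each call to `encode` represents one complete encoding pass over the frame.

```python
def multipass_select_qp(encode, target_bits, qp_min=1, qp_max=31):
    """Return (qp, passes): the smallest QP whose output fits target_bits.

    Assumes the bit count never increases as QP increases. If no QP in the
    range fits, qp_max is returned.
    """
    passes = 0
    lo, hi = qp_min, qp_max
    while lo < hi:
        mid = (lo + hi) // 2
        passes += 1  # every probe is a full encode of the frame
        if encode(mid) <= target_bits:
            hi = mid       # fits: try a finer quantizer
        else:
            lo = mid + 1   # too many bits: quantize more coarsely
    return lo, passes


def toy_encode(qp, complexity=400_000):
    """Hypothetical encoder whose output size falls off as 1/QP."""
    return complexity // qp


qp, passes = multipass_select_qp(toy_encode, target_bits=30_000)
# qp == 14 after 5 full encoding passes
```

Even with an efficient search, the frame is encoded five times in this example, and the cost of each pass grows with the resolution of the frame.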
The invention provides a multi-resolution video encoding system which improves the computational efficiency associated with encoding a video sequence in two or more different resolutions. An illustrative embodiment includes a first encoder for encoding the sequence at a first resolution, and a second encoder for encoding the sequence at a second resolution, where the second resolution is higher than the first resolution. Information obtained from encoding the sequence at the first resolution is used to provide rate control for the sequence at the second resolution. This information may include, for example, a relationship between a quantization parameter selected for an image at the first resolution and an actual output bitrate generated by encoding the image using the selected quantization parameter.
An exemplary rate control process implemented in the above-described illustrative embodiment may first determine target bitrates for different types of images at each of the first and second resolutions. The target bitrates may be set independently for each of the first and second resolutions, or alternatively maintained in a fixed ratio. The process then utilizes a rate-quantization model to select a quantization parameter for use with a given one of the images of the sequence at the first resolution. The selected quantization parameter is the quantization parameter which best matches the target bitrate for the first resolution. An estimated bitrate is determined for the image at the first resolution to be encoded using the selected quantization parameter, by dividing the target bitrate for the second resolution by a factor. The rate-quantization model is then used to determine a quantization parameter for an image at the second resolution, by finding the best quantization parameter for encoding the image at the first resolution to achieve the estimated number of bits for the image. The above-noted factor may be updated as the sequence is encoded by, for example, recomputing it as a moving average of the ratio between: (1) an actual number of bits used when encoding the image at the second resolution using the determined quantization parameter, and (2) the number of bits which the rate-quantization model estimates will be required for encoding the image at the second resolution.
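One possible reading of the process summarized above is sketched below. Several elements are assumptions made for illustration and are not taken from the text: the 1/QP form of the rate-quantization model, the QP range of 1 to 31, the initial factor of 4.0 (the pixel-count ratio between CIF and QCIF), the five-entry averaging window, and all names. In this sketch the model is fitted to the image at the first resolution, so the model's bit estimate at the determined quantization parameter serves as the denominator of the ratio.

```python
from collections import deque


def model_bits(complexity, qp):
    """Hypothetical rate-quantization model: bits fall off as 1/QP."""
    return complexity / qp


def best_qp(complexity, target_bits, qp_range=range(1, 32)):
    """QP whose modeled bit count is closest to target_bits."""
    return min(qp_range, key=lambda qp: abs(model_bits(complexity, qp) - target_bits))


class CrossResolutionRateControl:
    """Reuse the low-resolution model to pick the high-resolution QP."""

    def __init__(self, factor=4.0, window=5):
        self.factor = factor                          # assumed initial value
        self.ratios = deque([factor], maxlen=window)  # moving-average window

    def select_qps(self, low_complexity, low_target, high_target):
        qp_low = best_qp(low_complexity, low_target)
        # Map the high-resolution target onto the low-resolution model.
        estimated_low_bits = high_target / self.factor
        qp_high = best_qp(low_complexity, estimated_low_bits)
        return qp_low, qp_high

    def update(self, actual_high_bits, low_complexity, qp_high):
        # Ratio of actual high-resolution bits to the model's estimate.
        self.ratios.append(actual_high_bits / model_bits(low_complexity, qp_high))
        self.factor = sum(self.ratios) / len(self.ratios)


rc = CrossResolutionRateControl()
qp_low, qp_high = rc.select_qps(low_complexity=100_000,
                                low_target=10_000, high_target=32_000)
# Suppose the high-resolution encode then actually produced 36,000 bits.
rc.update(actual_high_bits=36_000, low_complexity=100_000, qp_high=qp_high)
```

The high-resolution image is encoded only once, with the quantization parameter read off the model already built for the low-resolution image; the factor then drifts toward the observed ratio as more images are encoded.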
The invention improves the computational efficiency of multi-resolution video encoding by using information generated during the encoding of lower resolution images to facilitate the encoding of higher resolution images. Unlike the conventional spatial scalability approach described previously, the invention can allow each receiver in a multipoint transmission application to choose its own single-resolution video stream, such that each receiver makes the most efficient use of its own available bandwidth. The invention can be used with a variety of video encoding standards, including H.261, H.263, Motion-JPEG, MPEG-1 and MPEG-2. These and other features and advantages of the present invention will become more apparent from the accompanying drawings and the following detailed description.