1. Field of the Invention
This invention relates to methods of coding video signals for digital storage and/or transmission of such signals using joint rate control for multiple video objects based on a quadratic rate-distortion model.
More particularly, this invention relates to a method of encoding video employing a joint rate control algorithm for multiple video object coding. The algorithm is based on the VM7 rate control scheme as described in the MPEG-4 Video Verification Model V7.0, ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Associated Audio, MPEG 97/N1642, April 1997, Bristol, U.K.
The method follows a framework similar to that proposed previously by the current inventors in their parent application, with a change in the method of target distribution and the introduction of a tool that takes object shape into account in the rate control process. These modifications contribute to more homogeneous quality among video objects and better buffer regulation. As a whole, the method provides an effective means of coding multiple video objects so that the buffer is well regulated and bits are appropriately distributed, while remaining flexible in deciding the necessary compromise between spatial and temporal quality.
2. Description of the Prior Art
A basic method for compressing the bandwidth of digital color video signals, which has been adopted by the Moving Picture Experts Group (MPEG), utilizes Discrete Cosine Transform (DCT) techniques. In addition, the MPEG approach employs motion compensation techniques.
The MPEG standard achieves high data compression rates by developing information for a full frame of the image only at intervals. The full image frames, or intra-coded pictures, are called "I-frames" and contain the full frame information independent of any other frames. Between the I-frames, there are so-called B-frames and P-frames, which store only the image differences that occur relative to reference anchor frames.
More specifically, each frame of a video sequence is partitioned into smaller blocks of pixel data, and each block is subjected to the discrete cosine transform to convert the statistically dependent spatial domain picture elements (pixels) into independent frequency domain DCT coefficients.
That is, the blocks of data, encoded according to intraframe coding (I-frames), consist of matrices of Discrete Cosine Coefficients. Respective 8×8 or 16×16 blocks of pixels are subjected to a Discrete Cosine Transform (DCT) to provide a coded signal. The coefficients are subjected to adaptive quantization, and then are run-length and variable-length encoded. Hence, respective blocks of transmitted data may include fewer than an 8×8 matrix of codewords. Macroblocks of intraframe encoded data will include, in addition to the DCT coefficients, information such as the level of quantization employed, a macroblock address or location indicator, and a macroblock type, the latter information being referred to as "header" or "overhead" information.
Blocks of data encoded according to P or B interframe coding also consist of matrices of Discrete Cosine Coefficients. In this instance, however, the coefficients represent residues or differences between a predicted 8×8 pixel matrix and the actual 8×8 pixel matrix. These coefficients are subjected to quantization and run- and variable-length coding. In the frame sequence, I and P frames are designated anchor frames. Each P frame is predicted from the most recently occurring anchor frame. Each B frame is predicted from one or both of the anchor frames between which it is disposed. The predictive coding process involves generating displacement vectors which indicate which block of an anchor frame most closely matches the block of the predicted frame currently being coded. The pixel data of the matched block in the anchor frame is subtracted, on a pixel-by-pixel basis, from the block of the frame being encoded, to develop the residues. The transformed residues and the vectors comprise the coded data for the predictive frames. As with intraframe coded frames, the macroblocks include quantization, address and type information.
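The predictive coding step described above can be illustrated with a minimal sketch. It assumes an exhaustive search over a small window and a sum-of-absolute-differences matching cost; neither choice is specified in the text, and the function names (block_sad, find_vector, residue_block), the 4×4 block size and the search range are hypothetical and used only for illustration.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block at (bx, by)
    and the anchor-frame block displaced by (dx, dy). Assumed cost metric."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def find_vector(cur, ref, bx, by, n=4, search=2):
    """Exhaustive search for the displacement vector with the lowest cost.
    Returns (cost, dx, dy); candidates falling outside the anchor frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def residue_block(cur, ref, bx, by, dx, dy, n=4):
    """Pixel-by-pixel difference between the current block and the matched block."""
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(n)] for j in range(n)]


# Synthetic frames: the current frame is the anchor frame shifted by one pixel
# in each direction, so the block at (2, 2) matches the anchor block at (3, 3).
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[16 * (y + 1) + (x + 1) for x in range(8)] for y in range(8)]
cost, dx, dy = find_vector(cur, ref, bx=2, by=2)
residues = residue_block(cur, ref, 2, 2, dx, dy)
```

In this synthetic case the match is exact, so the residues are all zero and only the vector would need to be coded; real content generally leaves nonzero residues that are then transformed and quantized.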
The transformed coefficients are usually energy-concentrated, so that a few of the coefficients in a block carry most of the picture information. The coefficients are quantized in a known manner to effectively limit their dynamic range, and the results are then run-length and variable-length encoded for application to a transmission medium.
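The transform, quantization and run-length stages described above can be sketched as follows. This is an illustration under stated assumptions, not the coding defined by the standard: it uses an orthonormal DCT-II, a single uniform quantizer step (qscale) in place of adaptive quantization, a conventional zigzag scan, and (run, level) pairs without the variable-length code tables. All function names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, qscale):
    """Uniform quantization with one step size (a simplifying assumption)."""
    return [[int(round(c / qscale)) for c in row] for row in coeffs]


def zigzag(matrix):
    """Read the matrix along anti-diagonals, alternating direction."""
    order = sorted(((r, c) for r in range(N) for c in range(N)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[r][c] for r, c in order]


def run_length(levels):
    """(zero_run, level) pairs; trailing zeros are dropped (end of block implied)."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


def encode_block(block, qscale):
    return run_length(zigzag(quantize(dct_2d(block), qscale)))


# A flat block concentrates all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
pairs = encode_block(flat, qscale=16)
```

For the flat block, the 64 pixels reduce to a single (run, level) pair, which is why a coded block may contain far fewer than an 8×8 matrix of codewords.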
In a recent proposal for implementing the latest coding verification model (VM), which is described in "MPEG-4 Video Verification Model Version 5.0", distributed by the Ad hoc group on MPEG-4 video VM editing to its members under the designation ISO/IEC JTC1/SC29/WG11 MPEG 96/N1469, November 1996, the contents of which are incorporated herein by reference, representatives of the David Sarnoff Research Center proposed "A New Rate Control Scheme Using Quadratic Rate Distortion Model". The MPEG-4 video coding format will produce a variable bit rate stream at the encoder from frame to frame (as was the case with prior schemes). Since the variable bit rate stream is to be transmitted over a fixed rate channel, a channel buffer is employed to smooth out the bit stream. In order to prevent the buffer from overflowing or underflowing, rate control of the encoding process is required.
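The role of the channel buffer can be illustrated with a minimal simulation. It assumes that the encoder deposits each frame's bits into the buffer and that the fixed rate channel drains a constant number of bits per frame interval; the function name simulate_buffer, its parameters and the clamping behavior on overflow or underflow are hypothetical choices made for this sketch.

```python
def simulate_buffer(frame_bits, channel_bits_per_frame, buffer_size, start=0):
    """Track buffer occupancy frame by frame.

    Returns the final occupancy and a list of (frame_index, event) tuples,
    where event is "overflow" or "underflow". Occupancy is clamped to
    [0, buffer_size] after each event (an assumption of this sketch).
    """
    occupancy = start
    events = []
    for index, bits in enumerate(frame_bits):
        occupancy += bits - channel_bits_per_frame
        if occupancy > buffer_size:
            events.append((index, "overflow"))
            occupancy = buffer_size
        elif occupancy < 0:
            events.append((index, "underflow"))
            occupancy = 0
    return occupancy, events


# One large frame followed by two empty ones, with a 100-bit-per-frame channel.
final, events = simulate_buffer([500, 0, 0], channel_bits_per_frame=100,
                                buffer_size=300)
```

Without rate control, a single large frame can overflow the buffer, as in the example; rate control aims to choose quantizers so that no such events occur.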
In the recent Sarnoff proposal, before the encoding process begins for a given set of frames (pictures), a target bit rate for each frame is calculated. This accommodates the fact that the output of the encoder is constrained to a fixed bit rate, while the bit rate resulting from picture encoding can vary over a relatively wide range (if left uncorrected), depending on the content of the image frame. According to the proposal, the distortion measure associated with each frame is assumed to be the average quantization scale of the frame, and the rate-distortion function is modeled as a second-order function of the inverse of the distortion measure. Before the actual encoding process begins, the target bit rate of the image is estimated from the number of bits left for coding the group of images, as well as the number of frames still to be encoded. The authors mention implementing their scheme at the picture level and also note a possibility for extending their scheme to the macroblock level.
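A second-order model in the inverse of the quantization scale Q can be written as R = X1/Q + X2/Q², and the sketch below shows how a quantizer might be derived from a target under that form. The parameter names x1 and x2, the even split of remaining bits across remaining frames, and the clamping range of 1 to 31 are assumptions made for illustration; the text does not specify how the model parameters are obtained or updated.

```python
import math


def target_bits(bits_left, frames_left):
    """Initial per-frame target: remaining bits spread evenly over remaining
    frames (a simplifying assumption of this sketch)."""
    if frames_left <= 0:
        raise ValueError("no frames left to encode")
    return bits_left / frames_left


def solve_quantizer(target, x1, x2, q_min=1.0, q_max=31.0):
    """Solve target = x1/Q + x2/Q**2 for Q and clamp it to [q_min, q_max].

    Assumes x1 > 0 and x2 >= 0. With t = 1/Q the model becomes
    x2*t**2 + x1*t - target = 0, whose positive root is used.
    """
    if target <= 0:
        return q_max  # no bits available: use the coarsest quantizer
    if x2 == 0:
        inv_q = target / x1  # model degenerates to first order
    else:
        inv_q = (-x1 + math.sqrt(x1 * x1 + 4.0 * x2 * target)) / (2.0 * x2)
    return min(q_max, max(q_min, 1.0 / inv_q))


# 1400 bits left for 10 frames gives a 140-bit target; with x1 = 1000 and
# x2 = 4000 the model 1000/Q + 4000/Q**2 = 140 is satisfied at Q = 10.
frame_target = target_bits(1400, 10)
q = solve_quantizer(frame_target, x1=1000.0, x2=4000.0)
```

In practice the model parameters would be re-estimated from the bits actually produced by previously coded frames, which this sketch does not attempt.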
It has also been known that when a block (macroblock) contains an edge boundary of an object, the energy in that block after transformation, as represented by the DCT coefficients, includes a relatively large DC coefficient (top left corner of the matrix) and randomly distributed AC coefficients throughout the matrix. A non-edge block, on the other hand, is usually characterized by a similarly large DC coefficient (top left corner) and a few (e.g. two) adjacent AC coefficients which are substantially larger than the other coefficients associated with that block. This information relates to image changes in the spatial domain and, when combined with image difference information obtained from comparing successive frames (i.e. temporal differences), provides factors for distinguishing one video object (VO) from another.
As shown in FIG. 1 (a sample video scene), one or more video objects (VO₁, VO₂, VOᵢ) may be contained in an image frame or plane (VOP) and, in each successive frame, the relative positioning of video objects may be expected to change, denoting motion. At the same time, this motion assists in defining the objects.
Under the MPEG-4 VM, additional objectives of content-based manipulation and independent bit stream coding have been imposed to provide added functionality at the decoder end of the system. These MPEG-4 objectives complicate the process of predicting target bit rates for each frame and impose additional processing requirements on it, as a result of added overhead information such as the coding of shape information within the MPEG-4 encoder. The foregoing characteristics of the MPEG-4 VM, as well as information regarding identification of individual VO's, are explained in greater detail in the above-referenced manual.
It is an object of the present invention to provide an adaptive video coding method which is particularly suitable for an MPEG-4 encoder and other encoding schemes.
It is a further object of the present invention to provide an adaptive video coding method for use in accordance with MPEG-4 VM wherein individual video objects (VO's) are taken into account in providing an improved bit rate control system making use of relative motion, size, variance and shape of each VO.