A video coder is utilized to code a video signal such as an HDTV signal for transmission via a telecommunication channel to a remote location, where the video signal is reconstructed by a decoder. To transmit a video signal over a telecommunication channel with a minimum of bandwidth, a video coder which achieves a high compression ratio is utilized.
Highly compressive video coders typically employ coding algorithms which utilize three compression modes: intraframe coding, motion compensated predictive coding, and motion compensated interpolative coding.
The different compression modes are assigned to frames of video as follows. In general, a sequence of video frames is divided into groups of N frames called groups of pictures (GOPs). The first frame of each GOP is intra-frame coded. The two other compression modes, predictive and interpolative, are assigned to frames according to the expression N=ML+1, where L is the number of predictive frames in each GOP and M is the distance in frames between an intra-frame and a predictive frame or between two predictive frames. Therefore, the number M-1 is the number of interpolative frames between predictive frames. Each group of adjacent predictive and interpolative frames forms a sub-group of pictures (SGOP). The number of frames in each SGOP is M and the number of SGOPs in each GOP is L.
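The frame assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the function name gop_frame_types is hypothetical, frames are listed in display order, and each predictive frame is assumed to close its SGOP.

```python
def gop_frame_types(M, L):
    """Return the coding mode of each frame in one GOP of N = M*L + 1 frames.

    M: distance in frames between an intra/predictive frame and the next
       predictive frame (so M - 1 interpolative frames lie in between).
    L: number of predictive frames (and of SGOPs) in the GOP.
    """
    n = M * L + 1
    types = []
    for i in range(n):
        if i == 0:
            types.append("intra")          # first frame of the GOP
        elif i % M == 0:
            types.append("predictive")     # every M-th frame after the first
        else:
            types.append("interpolative")  # the M - 1 frames in between
    return types


if __name__ == "__main__":
    # M = 3, L = 2 gives N = 7 frames: one intra frame followed by two SGOPs.
    print(gop_frame_types(3, 2))
```

With M=3 and L=2 the GOP holds 7 frames: one intra frame, followed by two SGOPs of two interpolative frames and one predictive frame each.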
Intra-frame coding requires no picture information beyond the frame itself. The pixels comprising a frame to be intra-frame coded are divided into small blocks such as 8×8 or 16×16 blocks. Each block is coded by a coding circuit which carries out an orthogonal transformation such as a discrete cosine transform (DCT). A quantizer then quantizes the transform coefficients resulting from the orthogonal transform.
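As a rough illustration of the transform and quantization steps for one block, the sketch below computes a two-dimensional DCT directly from its definition and applies a single uniform step size. The orthonormal DCT-II scaling, the single step size, and the function names dct_2d and quantize_uniform are assumptions made for the example, not details taken from the text.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize_uniform(coeffs, step):
    """Map each coefficient to the nearest integer multiple of `step`."""
    return [[round(c / step) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat_block = [[100] * 8 for _ in range(8)]   # a perfectly smooth 8x8 block
    levels = quantize_uniform(dct_2d(flat_block), step=16)
    print(levels[0])  # only the first (DC) entry is non-zero for a flat block
```

A perfectly flat block concentrates all of its energy in the single DC coefficient, which is why smooth regions tend to need few code bits after quantization.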
Unlike intra-frame coding, the two other coding modes require prediction or interpolation from one or two neighboring frames. To efficiently reduce the temporal redundancy, some method of motion compensation must be utilized. For each block of pixels in a frame to be predictively coded, the best match block from the nearest previous intra or predictive frame is identified and a corresponding motion vector is obtained. The best match block is used as a motion compensated prediction for the current block. A criterion involving an activity index comparison is used to determine if the current block should be motion compensated. If the test is positive, the current block is coded with a predictive mode, wherein the predictive error (i.e. the difference) between the current block and its motion compensated prediction is DCT coded in conjunction with a quantizer. Otherwise the current block is coded with an intramode, wherein the data in the block is directly DCT transformed and quantized.
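The search for a best match block can be sketched as an exhaustive block-matching search. The text does not name a matching criterion, so the sum of absolute differences (SAD) used here is an assumption, as are the function names and the search-window parameter; the activity-index test for choosing between predictive mode and intramode is not modeled.

```python
def sad(cur, cx, cy, ref, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_match(cur, ref, bx, by, size, search):
    """Full search in `ref` around block (bx, by) of `cur`.

    Returns (cost, dx, dy), where (dx, dy) is the motion vector of the
    lowest-cost candidate inside a +/- `search` pixel window.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, bx, by, ref, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def prediction_error(cur, ref, bx, by, dx, dy, size):
    """Difference between the current block and its motion compensated prediction."""
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(size)] for j in range(size)]
```

In predictive mode it is the output of prediction_error, rather than the block itself, that would be passed on to the DCT and quantizer.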
The processing of interpolative frames is similar, except that an interpolative frame requires bidirectional prediction. Therefore, motion estimation and compensation are performed twice, once with a previous and once with a subsequent intra or predictive frame. Based on a particular criterion, one of three candidates (forward prediction, backward prediction, or interpolation of both) is selected as the motion compensated prediction. As in predictive frames, either a predictive mode or an intramode is used to code each block depending on a decision criterion involving an activity index.
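The three-way selection for an interpolative frame might look like the sketch below. The selection criterion is not specified in the text; choosing the candidate with the smallest sum of absolute differences, and forming the interpolation as a simple average of the two predictions, are both assumptions, and the function names are hypothetical.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))


def select_prediction(cur, fwd, bwd):
    """Pick forward, backward, or interpolated prediction for one block.

    cur: current block; fwd / bwd: motion compensated predictions from the
    previous and the subsequent intra or predictive frame.
    Returns (name, prediction_block) for the lowest-SAD candidate.
    """
    interp = [[(p + q) / 2 for p, q in zip(row_f, row_b)]
              for row_f, row_b in zip(fwd, bwd)]
    candidates = [("forward", fwd), ("backward", bwd), ("interpolated", interp)]
    return min(candidates, key=lambda cand: block_sad(cur, cand[1]))


if __name__ == "__main__":
    cur = [[10, 10], [10, 10]]
    fwd = [[8, 8], [8, 8]]
    bwd = [[12, 12], [12, 12]]
    print(select_prediction(cur, fwd, bwd)[0])
```

In this toy case the current block lies exactly midway between the two predictions, so the interpolated candidate has zero error and is selected.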
In all three compressive coding modes, the transform coefficients resulting from the DCT transform are quantized. To take advantage of lower human visual sensitivity at higher DCT frequencies, larger quantizer step sizes are used for the DCT coefficients at these frequencies. The quantizer can be represented by a matrix comprising the elements q(m,n). Each element or parameter q(m,n) represents the quantizer step size for the DCT coefficient having the location (m,n) in transform space.
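A quantizer matrix with larger step sizes at higher frequencies can be sketched as follows. The linear rule q(m,n) = base + slope*(m+n) and its default values are purely illustrative assumptions; the text only states that step sizes grow with frequency.

```python
def make_step_matrix(n=8, base=10, slope=4):
    """Build q(m, n): step sizes that grow with the frequency index m + n."""
    return [[base + slope * (m + k) for k in range(n)] for m in range(n)]


def quantize(coeffs, q):
    """Divide each DCT coefficient by its own step size and round."""
    return [[round(c / s) for c, s in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q)]


def dequantize(levels, q):
    """Reconstruct coefficients as decoder-side multiples of the step size."""
    return [[lv * s for lv, s in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q)]


if __name__ == "__main__":
    q = make_step_matrix()
    coeffs = [[103.0] * 8 for _ in range(8)]     # same magnitude everywhere
    recon = dequantize(quantize(coeffs, q), q)
    # The low-frequency corner is reconstructed more accurately than the
    # high-frequency corner because its step size is smaller.
    print(q[0][0], recon[0][0], q[7][7], recon[7][7])
```

Because the reconstruction error of each coefficient is bounded by half its step size, coarser steps at high frequencies trade accuracy where the eye is less sensitive for fewer code bits.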
After being quantized, the DCT coefficients are then variable length coded or fixed length coded.
The number of code bits required to code a frame varies significantly from frame to frame. Interpolative frames require the least number of code bits and intra-frames require the most code bits.
Although in some cases it may be possible to handle this fluctuation in bit rates in the telecommunication channel or network which transmits the compressively coded video signal, it is preferable for the coding circuit itself to smooth out the fluctuations so as to allow for a simpler network configuration. To smooth out the fluctuations in bit rates, a rate buffer is utilized as an interface between the coding circuit and the network.
To control the bit rate transmitted from the rate buffer into the network to a desired level, it is necessary to control the buffer content, i.e., it is necessary to maintain the fraction of the buffer which is occupied within predetermined limits. To control the buffer content, information about the buffer content is fed back to the coding circuit to control the rate at which bits are generated by the coding circuit. The number of code bits generated by the coding circuit is controlled by controlling the quantizer step sizes. In general, smaller quantizer step sizes result in more code bits and larger quantizer step sizes result in fewer code bits. In addition, it should be noted that in general, busy image regions result in the generation of more code bits than smooth image regions.
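The feedback loop described above can be illustrated with a toy simulation. Everything specific in it is an assumption made for the sketch: the linear mapping from buffer occupancy to step size, the model "bits = complexity / step" standing in for the coding circuit, and all names and default values.

```python
def simulate_buffer(complexities, channel_bits, buffer_size, q_min=2, q_max=62):
    """Simulate buffer-driven control of the quantizer step size.

    complexities: per-unit image "busyness" values (hypothetical measure).
    channel_bits: bits drained from the buffer into the network per unit.
    Returns a list of (step, bits_generated, buffer_fullness) tuples.
    """
    fullness = buffer_size / 2          # start from a half-full buffer
    history = []
    for complexity in complexities:
        occupancy = fullness / buffer_size
        # Fuller buffer -> larger step size -> fewer code bits generated.
        step = q_min + occupancy * (q_max - q_min)
        bits = complexity / step        # assumed model of the coding circuit
        fullness = min(max(fullness + bits - channel_bits, 0), buffer_size)
        history.append((step, bits, fullness))
    return history


if __name__ == "__main__":
    # Busy image regions followed by smooth ones.
    for step, bits, fullness in simulate_buffer(
            [64000] * 5 + [16000] * 5, channel_bits=1000, buffer_size=8000):
        print(f"step={step:5.1f}  bits={bits:7.1f}  fullness={fullness:7.1f}")
```

Running the example shows the step size rising while the busy units fill the buffer, so the step size in force when the smooth units begin is set by the buffer state rather than by the image content.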
A variety of rate buffer control strategies are disclosed in the prior art (see, e.g., CCITT SG XV, "Description of Reference Model 8 (RM8)," Doc. 525, June 1989; ISO/MPEG, "MPEG Video Simulation Model 3," Doc. 90/041, July 1990; W. H. Chen and W. K. Pratt, "Scene Adaptive Coder," IEEE Trans. Comm., Vol. COM-32, pp. 225-232, Mar. 1984). In these strategies, the quantizer step sizes are determined based on the current buffer content without regard for the transitions in video quality caused by changes to the quantizer. Such strategies require a priori knowledge of a mapping between buffer content and particular quantizer step sizes. While such a strategy may pilot the buffer content to a normal state relatively quickly, the quality within frames or between frames is not uniform. Consider a frame where the first part is busy and the second part smooth. Because busy image sections result in the generation of more code bits than smooth image sections, coding the first part of this frame will fill the rate buffer significantly. As a result, the quantizer step sizes used for the second part of the frame may have to be quite large so as to reduce the number of code bits generated and thereby maintain the buffer content at a desired level. Because large quantizer step sizes are utilized for the second part of the frame, the visual quality will be worse for this part of the frame when it is reconstructed at a receiver. This problem is particularly severe because the second part of the frame is smooth, and human perception is particularly sensitive to degradation in the smooth and still sections of an image.
In view of the foregoing, it is an object of the present invention to provide a method for coding a video image including a rate buffer control strategy which overcomes the problems associated with conventional rate buffer control strategies, such as sharp changes in video quality and changes in video quality within particular frames. More specifically, it is an object of the invention to provide a rate buffer control strategy which enables a substantially constant bit rate from the rate buffer into the network, despite the fluctuations in bits received from the coding circuit as a result of alternating between intra-frame, predictive, and interpolative coding.