1. Field of the Invention
This invention generally relates to digital image communication processes and, more particularly, to a system and method for adaptively controlling the bit rate when transcoding between compressed video protocols.
2. Description of the Related Art
Compressed digital video is widely used in multimedia applications, and many digital video coding standards exist. Because different applications and environments have different video stream requirements, digital video bitstreams must often be converted from one compressed format into another. This process is called video transcoding. The format change may involve the bit rate, the frame size, or even the compression standard.
Conventional rate control uses a one-pass process. The encoder makes assumptions about the picture types without knowledge of the whole sequence, and it controls bit allocation and quantization based only on the pictures already coded. As a result, the overall bit allocation is not optimal.
As noted in U.S. Pat. No. 6,310,915, in the MPEG-2 standard pictures are both spatially and temporally encoded. Each picture is first divided into non-overlapping macroblocks. Each macroblock includes a 16×16 array of luminance samples and the 8×8 blocks of chrominance samples overlaid thereon. A decision is made to encode the macroblock either as an inter macroblock, in which case it is both temporally and spatially encoded, or as an intra macroblock, in which case it is only spatially encoded. A macroblock is temporally encoded by an inter-picture motion compensation operation. In such an operation, a prediction macroblock is identified for the macroblock to be motion compensated and is subtracted from it to produce a prediction error macroblock. The prediction macroblock originates in another picture, called a reference picture, or may be an interpolation of multiple prediction macroblocks, each originating in a different reference picture. The prediction macroblock need not have precisely the same spatial coordinates (pixel row and column) as the macroblock from which it is subtracted. In fact, it can be spatially offset therefrom. A motion vector identifies the prediction macroblock by its spatial shift and by the reference picture from which it originates. (When the prediction macroblock is an interpolation of multiple prediction macroblocks, a motion vector is obtained for each prediction macroblock to be interpolated.)
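The motion compensation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated simplifications, not the normative MPEG-2 process. It handles luminance only, integer-sample motion vectors, and a single frame-based reference. It also finds the motion vector by exhaustive block matching with a sum-of-absolute-differences criterion, which is a common encoder choice assumed here; the text does not specify a search method. The function names `prediction_error`, `full_search`, and `bidirectional_prediction` are hypothetical.

```python
import numpy as np

MB = 16  # luminance macroblock size in samples


def prediction_error(current, reference, mb_row, mb_col, mv):
    """Subtract the prediction macroblock addressed by mv from the current macroblock.

    mv = (dy, dx) is an integer-sample offset into the reference picture.
    Assumption: half-sample vectors, chrominance, and field prediction are not modeled.
    """
    y, x = mb_row * MB, mb_col * MB
    dy, dx = mv
    h, w = reference.shape
    if not (0 <= y + dy <= h - MB and 0 <= x + dx <= w - MB):
        raise ValueError("motion vector points outside the reference picture")
    block = current[y:y + MB, x:x + MB].astype(np.int32)
    pred = reference[y + dy:y + dy + MB, x + dx:x + dx + MB].astype(np.int32)
    return block - pred


def full_search(current, reference, mb_row, mb_col, search_range):
    """Return the (dy, dx) within +/-search_range that minimizes the sum of absolute differences."""
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            try:
                err = prediction_error(current, reference, mb_row, mb_col, (dy, dx))
            except ValueError:
                continue  # candidate block falls outside the reference picture
            sad = int(np.abs(err).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv


def bidirectional_prediction(pred_fwd, pred_bwd):
    """Interpolate a forward and a backward prediction by averaging with rounding.

    Assumption: samples are non-negative, so (a + b + 1) // 2 rounds halves upward.
    """
    return (pred_fwd.astype(np.int32) + pred_bwd.astype(np.int32) + 1) // 2
```

For example, if `cur` is `ref` shifted down by two rows and left by three columns, `full_search(cur, ref, 1, 1, 4)` should return `(-2, 3)`, and the resulting prediction error macroblock is all zeros.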
Pictures may be classified as intra (I) pictures, predictive (P) pictures, and bidirectionally predictive (B) pictures. An “I” picture contains only intra macroblocks. A “P” picture may contain inter macroblocks, but only forward-directed predictions from a preceding reference picture are permitted. A “P” picture can also contain intra macroblocks for which no adequate prediction was found. In addition, a dual prime prediction may be formed for a P picture macroblock in an interlaced picture. A dual prime prediction is an interpolated prediction from the two immediately preceding reference fields. A “B” picture can contain four kinds of macroblocks:

- intra macroblocks
- inter macroblocks that are forward motion compensated
- inter macroblocks that are backward motion compensated, i.e., predicted from a succeeding reference picture
- inter macroblocks that are bidirectionally motion compensated, i.e., predicted from an interpolation of prediction macroblocks in both a preceding and a succeeding reference picture

If the P or B pictures are interlaced, each component field macroblock can be motion compensated separately. Alternatively, the two fields can be interleaved to form a frame macroblock that is motion compensated as a whole.
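The picture-type rules in the preceding paragraph can be captured as a lookup table. The sketch below is hypothetical: the enum `Pred`, the table `ALLOWED`, and the helper `is_allowed` are illustrative names. The table lists only the prediction modes named in the text and does not model field versus frame motion compensation.

```python
from enum import Enum


class Pred(Enum):
    """Macroblock prediction modes named in the text."""
    INTRA = "intra"                  # spatial encoding only
    FORWARD = "forward"              # from a preceding reference picture
    BACKWARD = "backward"            # from a succeeding reference picture
    BIDIRECTIONAL = "bidirectional"  # interpolated from preceding and succeeding references
    DUAL_PRIME = "dual_prime"        # interpolated from the two preceding reference fields


# Prediction modes each picture coding type may contain.
ALLOWED = {
    "I": {Pred.INTRA},
    "P": {Pred.INTRA, Pred.FORWARD, Pred.DUAL_PRIME},
    "B": {Pred.INTRA, Pred.FORWARD, Pred.BACKWARD, Pred.BIDIRECTIONAL},
}


def is_allowed(picture_type, mode, interlaced=False):
    """True if a macroblock of a picture of this type may use this prediction mode."""
    if mode is Pred.DUAL_PRIME and not interlaced:
        return False  # dual prime is described only for interlaced P pictures
    return mode in ALLOWED[picture_type]
```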
Spatial compression is performed on selected 8×8 luminance pixel blocks and selected 8×8 chrominance pixel blocks of selected prediction error macroblocks or selected intra macroblocks. Spatial compression includes five steps:

1. discrete cosine transforming each block
2. quantizing each block
3. zig-zag (or alternate) scanning each block into a sequence
4. run-level encoding the sequence
5. variable length encoding the run-level encoded sequence

Prior to discrete cosine transformation, a macroblock of a frame picture may optionally be formatted in one of two ways. As a frame macroblock, its blocks contain alternating lines of samples from each of the two component field pictures of the frame picture. As a field macroblock, the samples from different fields are arranged into separate blocks of the macroblock. The quantization parameter may be changed on a macroblock-by-macroblock basis, and the weighting matrix may be changed on a picture-by-picture basis. Macroblocks, or coded blocks thereof, may be skipped if their coded data are zero (or nearly zero). Appropriate codes are inserted into the formatted bitstream of the encoded video signal to indicate skipped macroblocks and blocks. Examples include non-contiguous macroblock address increments and coded block patterns.
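The spatial compression steps can be sketched as follows. This is a simplified illustration with several assumptions. It uses a single uniform quantizer step for all coefficients instead of the MPEG-2 weighting matrix and quantizer scale. It treats the intra DC coefficient like any other coefficient. It also omits the alternate scan and the final variable length coding. The names `dct_matrix`, `zigzag_order`, `run_level`, and `spatial_compress` are hypothetical.

```python
import math

import numpy as np

N = 8  # block size


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that the 2-D DCT of X is C @ X @ C.T."""
    c = np.array([[math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
                  for k in range(n)]) * math.sqrt(2 / n)
    c[0, :] /= math.sqrt(2)
    return c


def zigzag_order(n=N):
    """(row, col) positions in zig-zag order: (0,0), (0,1), (1,0), (2,0), (1,1), ..."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level(sequence):
    """Encode a scanned sequence as (run of preceding zeros, nonzero level) pairs."""
    pairs, run = [], 0
    for coeff in sequence:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, int(coeff)))
            run = 0
    return pairs  # trailing zeros are dropped; an end-of-block code would follow


def spatial_compress(block, step):
    """DCT, uniform quantization with one assumed step size, zig-zag scan, run-level coding."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block.astype(float) @ c.T
    quantized = np.round(coeffs / step).astype(int)
    scanned = [quantized[r, col] for r, col in zigzag_order(block.shape[0])]
    return run_level(scanned)
```

A flat 8×8 block of value 16 has a single DC coefficient of 8 × 16 = 128. With a step of 16, it reduces to the one pair `(0, 8)`.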
Additional formatting is applied to the variable length encoded sequence to aid in identifying the following items within the encoded bitstream:

- individual sequences of pictures
- groups of pictures of the sequence
- pictures of a group of pictures
- slices of pictures (contiguous sequences of macroblocks within a single macroblock row)
- macroblocks of slices
- motion vectors and blocks of macroblocks

Some of these layers, such as the group of pictures layer and the slice layer, are optional and may be omitted from the bitstream if desired. (If slice headers are included in the bitstream, one slice header is provided for each macroblock row.) Various parameters and flags are also inserted into the formatted bitstream. They indicate each of the above-noted choices, as well as others not described above. Some of these parameters and flags are:

- picture coding type (I, P, B)
- macroblock type (i.e., forward predicted, backward predicted, bidirectionally predicted, spatially encoded only)
- macroblock prediction type (field, frame, dual prime, etc.)
- DCT type (i.e., frame or field macroblock format for discrete cosine transformation)
- the quantizer scale code
Generally speaking, the re-encoding step of the transcoding operation should use the same picture coding types and intra/inter macroblock decisions as the original encoding of the video signal fed to the transcoder. This maintains picture quality.
As noted in U.S. Pat. No. 6,587,508, a conventional transcoder receives first bit streams at a predetermined input bit rate through an input terminal. It converts them into second bit streams output at a predetermined output bit rate, i.e., a target bit rate, that is equal to or lower than the input bit rate of the first bit streams. The conventional transcoder may comprise a variable length decoder, a de-quantizer, a quantizer, a variable length encoder, and a rate controller.
The variable length decoder decodes a coded moving picture sequence signal within the first bit streams. From it, the decoder reconstructs the original picture data for each picture, including a matrix of original quantization coefficients, level. The de-quantizer receives the matrix of original quantization coefficients level and the first quantization parameter Q1 from the variable length decoder. It inversely quantizes the matrix level with Q1. For each macroblock, this produces a matrix of de-quantization coefficients, referred to as “dequant”, i.e., DCT coefficients, as follows:

$$\mathrm{dequant} = \frac{\left(2 \times \mathrm{level} + \operatorname{sign}(\mathrm{level})\right) \times Q_1 \times QM}{32} \tag{a}$$

or,

$$\mathrm{dequant} = \frac{\mathrm{level} \times Q_1 \times QM}{16} \tag{b}$$
where equation (a) is used for inter macroblocks and equation (b) for intra macroblocks. QM is a matrix of quantization parameters stored in a predetermined quantization table. The variable length decoder derives the first quantization parameter Q1 and the matrix QM from the first bit streams. The original quantization coefficients level, the de-quantization coefficients dequant, the matrix QM, and the first quantization parameter Q1 are all integers. The values of dequant calculated by equations (a) and (b) should be rounded down to the nearest integer.
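Equations (a) and (b) can be applied element-wise to a block of levels. The sketch below assumes that “rounded down” means dropping the fractional part (truncation toward zero, as in MPEG-2 integer division), so negative coefficients mirror positive ones. It omits the separate MPEG-2 rule for the intra DC coefficient, saturation, and mismatch control. The name `dequantize` is hypothetical.

```python
import numpy as np


def _div_trunc(num, den):
    """Integer division that drops the fractional part (assumed reading of "rounded down")."""
    return np.sign(num) * (np.abs(num) // den)


def dequantize(level, q1, qm, intra):
    """Equations (a) (inter) and (b) (intra), applied element-wise to a block of levels."""
    level = np.asarray(level, dtype=np.int64)
    qm = np.asarray(qm, dtype=np.int64)
    if intra:
        return _div_trunc(level * q1 * qm, 16)                     # equation (b)
    return _div_trunc((2 * level + np.sign(level)) * q1 * qm, 32)  # equation (a)
```

With Q1 = 3 and a weight of 16, an inter level of 1 gives 3 × 3 × 16 / 32 = 4.5, which truncates to 4. A level of -1 gives -4.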
The quantizer receives the matrix of de-quantization coefficients dequant from the de-quantizer and re-quantizes it, for each macroblock, with a second quantization parameter, referred to hereinafter as “Q2”. This produces a matrix of re-quantization coefficients, referred to as “tlevel”, as follows:

$$\mathrm{tlevel} = \frac{\mathrm{dequant} \times 16}{Q_2 \times QM} \tag{c}$$

or,

$$\mathrm{tlevel} = \frac{\mathrm{dequant} \times 16}{Q_2 \times QM} + \operatorname{sign}(\mathrm{dequant}) \times \frac{1}{2} \tag{d}$$
where equation (c) is used for inter macroblocks and equation (d) for intra macroblocks. The rate controller supplies the second quantization parameter Q2. The re-quantization coefficients tlevel and the second quantization parameter Q2 are also integers. The values of tlevel calculated by equations (c) and (d) should be rounded down to the nearest integer.
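Equations (c) and (d) can be sketched the same way. To stay in exact integer arithmetic, the sketch rewrites equation (d) over the common denominator 2 × Q2 × QM. As with the de-quantizer, truncation toward zero is an assumed reading of “rounded down”. The name `requantize` is hypothetical.

```python
import numpy as np


def requantize(dequant, q2, qm, intra):
    """Equations (c) (inter) and (d) (intra), applied element-wise with truncation toward zero."""
    d = np.asarray(dequant, dtype=np.int64)
    den = q2 * np.asarray(qm, dtype=np.int64)  # Q2 x QM, element-wise
    s, mag = np.sign(d), np.abs(d)
    if intra:
        # 16*d/den + sign(d)/2 has magnitude (32*|d| + den) / (2*den), so the
        # truncated result needs only integer floor division on non-negative values.
        return s * ((32 * mag + den) // (2 * den))  # equation (d)
    return s * ((16 * mag) // den)                   # equation (c)
```

Re-quantizing an intra value of 12 with Q2 = 4 and a weight of 16 recovers the level 3 that produced it. Doubling Q2 to 8 instead gives 12 × 16 / 128 + 1/2 = 2, a coarser level that needs fewer bits.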
The variable length encoder receives the re-quantization coefficients tlevel from the quantizer and encodes them to generate output picture data for each picture. It outputs this data sequentially in the form of the second bit streams. The variable length encoder also receives from the variable length decoder the other information in the first bit streams that the second bit streams require.
The rate controller controls the encoding rate in the conventional transcoder according to TM-5 (the MPEG-2 Test Model 5). It bases this control on information obtained from the de-quantizer, as described below.
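For context, the first step of TM-5 rate control allocates a target number of bits to the next picture. The allocation uses three inputs:

- the bits R remaining in the group of pictures
- the numbers N_p and N_b of P and B pictures still to be coded
- the complexity measures X_i, X_p, and X_b (bits used times average quantizer scale) of the most recent picture of each type

The sketch below follows the published TM-5 target-allocation equations, with constants K_p = 1.0 and K_b = 1.4 and the TM-5 initial complexities. It is not the rate controller of any specific transcoder discussed here, and it omits the later TM-5 steps (updating R and deriving the quantizer from virtual buffer fullness). The names `Tm5State`, `initial_complexities`, `complexity`, and `target_bits` are hypothetical. The allocation depends on N_p and N_b, which is the group of pictures structure that a transcoder does not know in advance.

```python
from dataclasses import dataclass

K_P, K_B = 1.0, 1.4  # TM-5 constants relating P and B quantizer scales to I


@dataclass
class Tm5State:
    bit_rate: float      # output bits per second
    picture_rate: float  # pictures per second
    r: float             # bits remaining for the current group of pictures
    n_p: int             # P pictures remaining in the group, in encoding order
    n_b: int             # B pictures remaining in the group, in encoding order
    x_i: float           # complexity of the last I picture
    x_p: float           # complexity of the last P picture
    x_b: float           # complexity of the last B picture


def initial_complexities(bit_rate):
    """TM-5 starting values for X_i, X_p and X_b."""
    return 160 * bit_rate / 115, 60 * bit_rate / 115, 42 * bit_rate / 115


def complexity(bits_used, avg_quant_scale):
    """Complexity of a coded picture: bits generated times average quantizer scale."""
    return bits_used * avg_quant_scale


def target_bits(s, picture_type):
    """TM-5 target bit allocation for the next picture of the given type."""
    floor = s.bit_rate / (8 * s.picture_rate)  # lower bound on any target
    if picture_type == "I":
        t = s.r / (1 + s.n_p * s.x_p / (s.x_i * K_P) + s.n_b * s.x_b / (s.x_i * K_B))
    elif picture_type == "P":
        t = s.r / (s.n_p + s.n_b * K_P * s.x_b / (K_B * s.x_p))
    elif picture_type == "B":
        t = s.r / (s.n_b + s.n_p * K_B * s.x_p / (K_P * s.x_b))
    else:
        raise ValueError(f"unknown picture type {picture_type!r}")
    return max(t, floor)
```

Consider a 12-picture group IBBPBBPBBPBB at 4 Mbit/s and 25 pictures/s. Then R = 4,000,000 × 12 / 25 = 1,920,000 bits. With the starting state (N_p = 3, N_b = 8, initial complexities), the formulas give R/3.625 for an I picture, R/7 for a P picture, and R/14 for a B picture.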
The transcoder, however, has no information on the structure of the group of pictures, such as the frequency of I- or P-pictures within each group of pictures. It must therefore estimate the group of pictures structure of the input moving picture sequence, and it allocates bits to each picture type based on that estimate. Furthermore, the transcoder must decode almost all layers of the first bit streams to derive the data it needs for transcoding: the sequence, group of pictures, picture, slice, and macroblock layers. This decoding takes time and delays the transcoding process.
An improved conventional transcoder performs the rate control without estimating the structure of the group of pictures. This transcoder adds a delay circuit between the variable length decoder and the de-quantizer, which controls the flow of the signal between them. The delay circuit holds off the de-quantization process until the variable length decoder has finished decoding one of the pictures in the coded moving picture sequence signal. However, the de-quantizer must then wait until the entire target transcoding frame has been decoded, so this approach also delays the transcoding process.
Another conventional transcoder includes a target output bit updating unit and a quantization parameter computing unit, in addition to a target ratio computing unit and a bit difference computing unit. This transcoder performs rate control using information on the number of coding bits previously recorded in the input bit streams. Because that information is already in the bit stream, this transcoder avoids the delay problem of the second conventional transcoder (the one with the delay circuit). This third conventional transcoder, however, has another problem. The encoder that feeds it must record the number of coding bits in the bit streams, which delays processing in the encoder.
In the case of a transcoder, the picture coding type and the inter/intra macroblock decisions are preferably constrained to be the same during the subsequent encoding as they were during the previous encoding. As such, the encoder of a transcoder has only two options for varying the encoding. First, the transcoder's decoder can gather information about the picture types as it decodes the bitstream. The transcoder's encoder then extrapolates from this information which picture types to expect and allocates bits accordingly. However, this solution does not work well if the group of pictures structure of the bitstream changes. For example, the group of pictures structure can change from IBBPBBPBBPBBI to IIIIIII. In such a case, the extrapolation of picture coding types will be erroneous. The unanticipated rise in I picture frequency will result in an incorrect allocation of bits and degraded quality for the unanticipated I pictures.
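The first option can be sketched as a hypothetical helper that extrapolates the mix of picture types seen so far. It reproduces the failure described above. After observing IBBPBBPBBPBB, the helper expects about 7/12 of an I picture in the next seven pictures, while IIIIIII actually contains seven. The names `expected_picture_counts` and `misprediction` are illustrative, not taken from any transcoder.

```python
from collections import Counter


def expected_picture_counts(observed_types, horizon):
    """Expected number of I, P and B pictures in the next `horizon` pictures,
    assuming the mix of picture types decoded so far continues unchanged."""
    counts = Counter(observed_types)
    total = len(observed_types)
    return {t: horizon * counts[t] / total for t in "IPB"}


def misprediction(observed_types, actual_next):
    """Actual minus extrapolated count of each picture type for the pictures that follow."""
    expected = expected_picture_counts(observed_types, len(actual_next))
    actual = Counter(actual_next)
    return {t: actual[t] - expected[t] for t in "IPB"}
```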
Second, the transcoder can make no assumption about picture types and simply scale the number of bits used in the original encoding. The scaling follows the ratio of the bit rate of the originally encoded bitstream to the bit rate of the re-encoded bitstream produced by the transcoder. However, this solution does not work well if the input bit rate is far higher than the output bit rate. The reason is that the difference in the number of bits used for different picture coding types is inversely correlated with the bit rate of the signal. Thus, at very high bit rates B pictures carry a number of bits of encoded data similar to that of I pictures. At low bit rates, by contrast, I pictures carry far more bits of encoded data than B pictures.
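The second option, and why it fails at large rate reductions, can be illustrated in a few lines. The helpers `scale_bits` and `size_ratio` are hypothetical, and the picture sizes in the usage note are made-up numbers chosen only to show the arithmetic. Each picture's bit count is divided by the ratio of input to output bit rate. Because every picture is multiplied by the same factor, the I:B size ratio carries over unchanged to the low-rate output, even though I pictures should receive a much larger share at low bit rates.

```python
def scale_bits(original_bits, input_rate, output_rate):
    """Scale every picture's original bit count by output_rate / input_rate."""
    factor = output_rate / input_rate
    return [bits * factor for bits in original_bits]


def size_ratio(bits, types, a="I", b="B"):
    """Average size of type-a pictures divided by the average size of type-b pictures."""
    def average(t):
        sizes = [x for x, p in zip(bits, types) if p == t]
        return sum(sizes) / len(sizes)
    return average(a) / average(b)
```

For example, scaling hypothetical sizes of 700,000 bits (I) and 500,000 bits (B) from 15 Mbit/s to 3 Mbit/s yields 140,000 and 100,000 bits. The I:B ratio stays at 1.4.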
It would be advantageous if the transcoding process could take advantage of the known complexity of the input bitstream, as expressed by the number of bits per frame and the quantization per frame, to determine the quantization factor of the output bitstream.