The invention relates to the encoding of video signals and, more particularly, to video encoding that controls bitrate to meet a target while ensuring that good video quality results when the encoded stream is decoded.
Video compression is a popular topic, since there is a plethora of existing and upcoming digital video applications, products and services. With the trend toward higher resolution/quality digital video, the bandwidth requirements of uncompressed digital video become quite significant, necessitating the use of compression. Thus a number of video compression schemes have been developed, some proprietary and others standardized. The goal in video encoding is to generate a compressed representation of video material that can be decoded for playback by suitable devices or in software. Typically, good quality encoding can be computationally intensive and expensive, so it is preferable to generate coded content just once and decode it for playback often, as needed. This requires interoperability between encoded compressed representations (bitstreams) and the decoders capable of playing them. A guarantee of interoperability also implies that decoders from different manufacturers are able to decode compliant bitstreams, resulting in decoded video of identical quality. Further, since video coding/decoding can be computationally expensive, economies of scale are often exploited to reduce decoder costs. For reasons of both interoperability and economies of scale, considerable effort has been put into standardization of video compression schemes, although many proprietary schemes also co-exist.
Earlier MPEG audio and video coding standards such as MPEG-1 and MPEG-2 have enabled many familiar consumer products. For instance, these standards enabled video CDs and DVDs, allowing video playback on digital VCRs/set-top boxes and computers, and digital broadcast video delivered via terrestrial, cable or satellite networks, allowing digital TV and HDTV. While MPEG-1 mainly addressed coding of non-interlaced video of Common Intermediate Format (CIF) resolution at data rates of 1.2 Mbit/s for CD-ROM, offering VHS-like video quality, MPEG-2 mainly addressed coding of interlaced TV resolution video at 4 to 9 Mbit/s and high definition TV (HDTV) video at 15 to 20 Mbit/s. At the time of their completion, the MPEG-1 (1992) and MPEG-2 (1994) standards represented timely and practical state-of-the-art technical solutions, consistent with the cost/performance tradeoffs of the intended products and within the context of the implementation technology available. MPEG-4 was launched to address a new generation of multimedia applications and services. The core of the MPEG-4 standard was developed over a five-year period; however, MPEG-4 is a living standard, with new parts added continuously as technology becomes available to address evolving applications. The premise behind MPEG-4 was future interactive multimedia applications and services, such as interactive TV and Internet video, where access to coded audio and video objects might be needed. The MPEG-4 video standard is designed as a toolkit standard with the capability to allow coding of, and thus access to, individual objects, scalability of coded objects, transmission of coded video objects on error-prone networks, and efficient coding of video objects. From a coding efficiency standpoint, MPEG-4 video was evolutionary in nature, as it was built on the coding structure of the MPEG-2 and H.263 standards by adding enhanced or new tools within that structure.
Thus, MPEG-4 part 2 offers a modest coding gain but only at the expense of a modest increase in complexity.
The H.264/MPEG-4 AVC standard is a new state-of-the-art video coding standard that addresses the aforementioned applications. The core of this standard was completed in the form of a final draft international standard (FDIS) in June 2003. It promises significantly higher compression than earlier standards. The standard evolved from the original work done by ITU-T VCEG in its H.26L project over the period 1999-2001; with MPEG joining the effort in late 2001, a joint team of ITU-T VCEG and ISO MPEG experts was established to co-develop the standard. The resulting joint standard is called H.264 by VCEG and either MPEG-4 Part 10 or MPEG-4 Advanced Video Coding (AVC) by MPEG. Informally, the standard is also referred to as the Joint Video Team (JVT) standard, since it was the result of a collaborative activity of the VCEG and MPEG standards groups. The H.264/MPEG-4 AVC standard is often quoted as providing up to a factor of 2 improvement over MPEG-2; as one would expect, this significant increase in compression efficiency comes at the expense of a substantial increase in complexity. As with earlier standards, only the bitstream syntax and the decoding semantics are standardized; the encoder is not. However, to obtain good results, encoding needs to be performed in a certain manner, and many aspects of encoding are implemented and demonstrated in collaborative software developed by the JVT, known as the Joint Model (JM).
Rate control is a major encoding issue that can be fairly application-dependent and complex, and it has not been addressed sufficiently in the JVT. Although rate control can have a significant impact on coded video quality, the JM software still does not include a solution for it, despite ongoing effort of over a year. While an important requirement in rate control is to ensure that, on average, the coding bitrate does not exceed the target bitrate, this has to be done while maintaining acceptable video quality. Adaptive quantization is thus closely related to rate control, since adapting the quantizer used in transform coding is a common approach to controlling the rate at which bits are generated in video coding. More successful rate control techniques generally have to be aware of the characteristics of the content, the features of video coders, and the spatial/temporal quality expectations of an application. Being aware of codec features typically involves knowing about individual picture types (I-, P-, B- and others) and their bitrate needs, picture coding structures that can be derived from picture types, tradeoffs in motion coding versus transform coding, the impact of quantizer adjustment versus frame dropping, etc. Among the many rate control solutions available, the rate control of MPEG-2 Test Model 5 (TM5) still offers a reasonable starting point and can be the basis for designing a new, custom rate controller. The TM5 rate controller consists of three main steps: target bit allocation, virtual buffer based bitrate control, and adaptive quantization. However, while a reasonable starting point, the TM5 rate controller was designed for MPEG-2, a codec very different from H.264/MPEG-4 AVC. Even for MPEG-2 it has well-documented shortcomings, and since it was intended only for higher bitrate coding, its performance may not be good at lower bitrates.
Besides, there are several new issues with H.264, as compared to earlier standards, that one needs to be careful about in designing a rate controller. The following issues are relevant to bitrate and quality control when coding according to the H.264/MPEG-4 AVC standard:
Since coding occurs at relatively lower bitrates than with earlier standards, relatively larger bitrate fluctuations can easily occur during coding, causing difficulties in rate control.
The nature of the quantizer in this standard may not allow sufficient precision in quantizer adaptation at normal coding bitrates, while providing more precision than needed at higher bitrates, causing difficulties in rate control.
Since quantizer changes in this standard affect loop filtering, care needs to be taken when changing the quantizer during rate control to avoid introducing spatio-temporal variations that can cause visible artifacts.
The bitrates for B-pictures are generally smaller than in earlier standards but can vary considerably, adding to the difficulties in rate control.
Quantizer changes need to be carefully restricted based on scene complexity, picture types, and coding bitrate to prevent an adverse impact on picture quality.
Low complexity motion estimation, mode decision, and reference selection can result in excessive bits being generated for certain frames, making bitrate control difficult.
Macroblock quantizer or RDopt lambda changes, if not performed carefully, can introduce visible spatio-temporal quality variations in areas of fine texture.
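One way to see the quantizer-precision issue in the second item above is to tabulate H.264/MPEG-4 AVC quantizer step sizes. The sketch below uses the step sizes commonly quoted for QP 0 to 51 (six base values that double every 6 QP steps). It is an approximation for illustration only, since the standard itself specifies integer scaling tables rather than a step-size formula, and the function names are hypothetical. Each unit change of QP scales the step by about 12 percent on average (a factor of 2^(1/6)), so the finest adjustment a rate controller has is a fixed ratio whose absolute size grows with QP.

```python
# Base step sizes for QP 0..5 as commonly tabulated for H.264/AVC
# (illustrative approximation); the step size doubles every 6 QP increments.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Approximate quantizer step size for an H.264 QP in [0, 51]."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


def step_change(qp: int) -> tuple[float, float]:
    """Absolute and relative step-size change when QP is raised by one."""
    lo, hi = qstep(qp), qstep(qp + 1)
    return hi - lo, (hi - lo) / lo


if __name__ == "__main__":
    # Compare a high-bitrate (low QP), a mid and a low-bitrate (high QP) setting.
    for qp in (10, 28, 45):
        absolute, relative = step_change(qp)
        print(f"QP {qp}->{qp + 1}: step {qstep(qp):.4g} -> {qstep(qp + 1):.4g} "
              f"(+{absolute:.4g}, {relative:.1%})")
```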
Thus, at present none of the existing rate control techniques provides a good solution for bitrate- and picture-quality-controlled encoding with the H.264/MPEG-4 AVC standard over a range of bitrates and video content. This is because none of the existing techniques were designed to address the nuances of H.264/MPEG-4 AVC, which is a complex, new standard. What is needed in the art, therefore, is a new rate controller that is effective for bitrate control and produces good picture quality while keeping complexity and delay low when encoding with the H.264/MPEG-4 AVC standard. Before discussing such a rate controller, which is the subject of this invention, we introduce several basic concepts in rate controller design using the example of MPEG-2 TM5, a prior art rate controller.
FIG. 1 illustrates a prior art generalized MPEG encoder with a TM5 rate controller, and FIG. 2 illustrates details of components of a TM5 rate controller. The MPEG encoder with TM5 rate controller 100 shown in FIG. 1 is useful for bitrate-controlled coding of video material to achieve a given bitrate budget for storage on disk or for constant bitrate transmission over a network.
Video frames or fields to be coded, referred to here as pictures, are input via line 102 to an MPEG encoder 150 and to TM5 rate controller 140. Examples of such an encoder are MPEG-1, MPEG-2, and MPEG-4 video encoders known to those of skill in the art. TM5 rate controller 140 takes as input coding parameters on line 104 and coding statistics on line 152, and applies them to picture target bits computer 110. The coding parameters on line 104 consist of bitrate, picture rate, number of I-, P- and B-pictures, universal coding constants for P- and B-pictures, and others. The coding statistics on line 152 consist of the actual coding bits, the quantizer used for the picture of a certain type just coded, and others; these statistics are output by MPEG encoder 150. Based on this information, picture target bits computer 110 outputs target bits for each picture of a known picture type to be coded. Virtual buffer based quantizer computer 120 takes as input the target bits on line 112 for a picture of a certain type being coded, a subset of the coding parameters (bit_rate, picture_rate, and universal coding constants for P- and B-pictures) on line 118, and a subset of the coding statistics (partial bits generated in the current picture up to the current macroblock) on line 116, and outputs on line 122 a new quantizer value for each macroblock. The quantizer value output on line 122 is derived from the fullness of an internal virtual buffer for the type of picture being coded and is updated every macroblock. Line 122 is also an input to activity based quantizer computer 130, whose other input 124 receives the video pictures input to TM5 rate controller 140 via line 102. Activity based quantizer computer 130 modulates the buffer based quantizer available on line 122 with an activity measure for the picture being coded, outputting an activity based quantizer on line 132 for use by MPEG encoder 150 in quantizing the DCT coefficients of picture blocks during encoding.
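The two records that feed TM5 rate controller 140 can be sketched as plain data types. The class and field names below are hypothetical; they simply mirror the coding parameters listed for line 104 and the coding statistics listed for line 152, together with the subset forwarded on line 118 to virtual buffer based quantizer computer 120.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingParameters:
    """Coding parameters on line 104 (hypothetical field names)."""
    bitrate: float          # target bitrate, bits per second
    picturerate: float      # pictures per second
    n_i: int                # number of I-pictures per GOP
    n_p: int                # number of P-pictures per GOP
    n_b: int                # number of B-pictures per GOP
    k_p: float = 1.0        # universal constant for P-pictures
    k_b: float = 1.4        # universal constant for B-pictures

    def buffer_subset(self) -> tuple[float, float, float, float]:
        """Subset forwarded on line 118 to the virtual buffer quantizer computer."""
        return (self.bitrate, self.picturerate, self.k_p, self.k_b)


@dataclass(frozen=True)
class CodingStatistics:
    """Statistics fed back on line 152 after a picture is coded (hypothetical)."""
    picture_type: str       # "I", "P" or "B"
    bits: int               # actual bits spent on the picture
    avg_quantizer: float    # average quantizer used for the picture


params = CodingParameters(bitrate=4_000_000, picturerate=30.0, n_i=1, n_p=4, n_b=10)
stats = CodingStatistics(picture_type="I", bits=450_000, avg_quantizer=12.0)
```

Keeping the records immutable reflects that each picture's statistics are a snapshot reported once by MPEG encoder 150.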
MPEG encoder 150 outputs an encoded video bitstream on line 154; this bitstream can then be stored or transmitted for eventual consumption by a matching decoder to produce decoded video pictures.
FIG. 2A shows details of picture target bits computer 110 introduced in FIG. 1, as is known in the art. To explain it, we first introduce the terminology used by the TM5 rate controller. A video sequence may be divided into groups of pictures (GOPs) of known size. A GOP can be identified by its length N (e.g., N=15, meaning there are 15 frames in a GOP) and the distance M between P-pictures (e.g., M=3, meaning two B-pictures between successive anchor pictures, which yields the coding pattern I B B P B B P . . . for pictures in input order). Let:
SI, SP, SB respectively represent the actual bits generated in coding any I-, P-, B-picture,
QI, QP, QB respectively represent the actual average quantizer values used in coding any I-, P-, B-picture,
XI, XP, XB respectively represent the resulting complexity measures (XI = SI × QI, XP = SP × QP, XB = SB × QB),
NI, NP, NB respectively represent the number of I-, P-, B-pictures remaining in a GOP,
TI, TP, TB respectively represent the target bits for coding any I-, P-, B-picture, and
KP, KB represent the corresponding universal constants used in coding (e.g., KP = 1.0, KB = 1.4).
Further, let bitrate represent the bitrate to be used in coding, picturerate represent the frame rate of the video, G represent the total bits assigned to a GOP (G = bitrate × N/picturerate), and R represent the bits remaining during coding of a GOP (after coding each picture, R = R − SI,P,B). TM5 specifies equations for calculating the corresponding target bits TI, TP, TB of I-, P- and B-pictures, such that each of TI, TP, TB is a function of R, NP, NB, XI, XP, XB, KP, KB, bitrate, and picturerate.
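The definitions above can be tied together in a short sketch. The hypothetical `gop_pattern` lists the picture types of one GOP in input order, assuming the first picture is an I-picture and every M-th picture after it is a P-picture, as in the I B B P B B P example; counting the types gives the initial NP and NB. The hypothetical `tm5_target_bits` implements the target-bit equations of MPEG-2 Test Model 5, including the lower bound bitrate/(8 × picturerate): the remaining budget R is shared among the remaining pictures in proportion to their complexity X divided by the constant K (K = 1 for I-pictures). The complexities 160, 60 and 42 in the usage lines are illustrative; they are in the same ratio as TM5's initial complexity values, not measured data.

```python
from collections import Counter


def gop_pattern(n: int, m: int) -> list[str]:
    """Picture types of one GOP in input order (I first, then every m-th is P)."""
    if n < 1 or m < 1:
        raise ValueError("N and M must be positive")
    return ["I" if k == 0 else "P" if k % m == 0 else "B" for k in range(n)]


def tm5_target_bits(ptype: str, r_remaining: float, n_p: int, n_b: int,
                    x_i: float, x_p: float, x_b: float,
                    bitrate: float, picturerate: float,
                    k_p: float = 1.0, k_b: float = 1.4) -> float:
    """TM5 target bits TI, TP or TB for the next picture of type ptype.

    n_p, n_b count the P- and B-pictures still to be coded in the GOP,
    including the current picture when it is a P- or B-picture.
    """
    if ptype == "I":
        target = r_remaining / (1 + n_p * x_p / (x_i * k_p) + n_b * x_b / (x_i * k_b))
    elif ptype == "P":
        target = r_remaining / (n_p + n_b * k_p * x_b / (k_b * x_p))
    elif ptype == "B":
        target = r_remaining / (n_b + n_p * k_b * x_p / (k_p * x_b))
    else:
        raise ValueError("picture type must be 'I', 'P' or 'B'")
    # Never allocate less than bitrate / (8 * picturerate) bits to a picture.
    return max(target, bitrate / (8 * picturerate))


bitrate, picturerate, n, m = 4_000_000, 30.0, 15, 3
pattern = gop_pattern(n, m)
print(" ".join(pattern))                  # I B B P B B P B B P B B P B B
counts = Counter(pattern)                 # 1 I-, 4 P- and 10 B-pictures
g = bitrate * n / picturerate             # G = 2,000,000 bits for the GOP
t_i = tm5_target_bits("I", g, counts["P"], counts["B"], 160, 60, 42,
                      bitrate, picturerate)
print(round(t_i))                         # 457143
```

Here the I-picture receives G/(1 + 4 × 60/160 + 10 × 42/(160 × 1.4)) = G/4.375 bits, well above the lower bound of about 16,667 bits.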
Coding parameters on line 104 are separated into NI, NP, NB on line 214 and KP, KB on line 216, and are applied to I-, P-, B-picture target bits equations implementer 220, which also receives as input the complexity values XI, XP, XB on line 206. Line 152 provides feedback in the form of coding statistics: QI, QP, QB on line 202, SI, SP, SB on line 204, and R on line 208. The respective QI, QP, QB on line 202 and SI, SP, SB on line 204 are multiplied in 205, resulting in XI, XP, XB on line 206 for input to implementer 220. Implementer 220 also takes as an input the output of differencer 210, which represents the remaining bits R computed as noted above (R = R − SI,P,B).
Dividers 225, 230 and multiplier 235 collectively generate a signal having the value $\frac{bitrate}{8 \times picturerate}$. A selector 240 (labeled "MAX") selects the greater of the two values output respectively from implementer 220 and multiplier 235 as the target bits value TI,P,B.
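The feedback path of FIG. 2A, in which multiplier 205 forms XI, XP, XB and differencer 210 updates R, amounts to a small amount of per-GOP bookkeeping, sketched below with a hypothetical class. Two details come from MPEG-2 TM5 rather than from the description above and are assumptions here: R is increased by G at the start of each GOP, so that any surplus or deficit carries over, and before any picture is coded the complexities start at 160, 60 and 42 times bitrate/115 for I-, P- and B-pictures.

```python
class Tm5GopState:
    """Per-GOP bookkeeping for the TM5 picture target bits computer (hypothetical)."""

    def __init__(self, bitrate: float, picturerate: float, gop_size: int):
        self.bitrate = bitrate
        self.picturerate = picturerate
        self.gop_size = gop_size
        self.remaining = 0.0  # R, carried over from GOP to GOP
        # Initial complexities as given in TM5 (assumption, see lead-in).
        self.complexity = {
            "I": 160 * bitrate / 115,
            "P": 60 * bitrate / 115,
            "B": 42 * bitrate / 115,
        }

    def start_gop(self) -> None:
        """Add the GOP budget G = bitrate * N / picturerate to R."""
        self.remaining += self.bitrate * self.gop_size / self.picturerate

    def picture_coded(self, ptype: str, bits: int, avg_quantizer: float) -> None:
        """Multiplier 205: X = S * Q; differencer 210: R = R - S."""
        self.complexity[ptype] = bits * avg_quantizer
        self.remaining -= bits


state = Tm5GopState(bitrate=4_000_000, picturerate=30.0, gop_size=15)
state.start_gop()                         # R = 2,000,000 bits
state.picture_coded("I", 450_000, 12.0)   # XI = 5,400,000; R = 1,550,000
```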
FIG. 2B is a block diagram of a Virtual Buffer Based Quantizer Computer 120 suitable for use with TM5 applications. The Quantizer Computer 120 may generate a buffer based quantizer qbuf on a macroblock-by-macroblock basis for coding of input pictures. The quantizer parameter may be calculated as:
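$$q_{buf} = \frac{31}{r}\left(d_{X0} + B_{j-1} - \frac{T_X \times (j-1)}{\mathit{MB\_cnt}}\right),$$
where $X$ = I, P or B depending upon the type of picture being coded, $T_X$ is the target bits value computed by picture target bits computer (TBC) 110, $j$ is the index of the current macroblock, $B_{j-1}$ is the number of bits generated in the current picture up to macroblock $j-1$, $\mathit{MB\_cnt}$ is the number of macroblocks in the picture, and $r = 2 \times bitrate/picturerate$ is the TM5 reaction parameter. The Quantizer Computer 120 may include an initial d0I,B,P computer 250 that calculates the $d_{X0}$ ($X$ = I, P or B) values according to: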
$$d_{I0} = 10 \times \frac{r}{31}, \quad d_{P0} = K_P \times d_{I0}, \quad \text{and} \quad d_{B0} = K_B \times d_{I0}.$$
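A minimal sketch of the FIG. 2B computation follows, assuming the TM5 reaction parameter r = 2 × bitrate/picturerate; the function names are hypothetical. With these initial values, the first macroblock of a picture gets q_buf = 10, 10 × KP or 10 × KB. In TM5 the equations for dI0, dP0 and dB0 only initialize the virtual buffers; afterwards each buffer starts a picture at the fullness it reached at the end of the previous picture of the same type. The sketch returns q_buf unclipped, because TM5 clips to the MPEG-2 range 1 to 31 only after activity modulation.

```python
def reaction_parameter(bitrate: float, picturerate: float) -> float:
    """TM5 reaction parameter r = 2 * bitrate / picturerate."""
    return 2 * bitrate / picturerate


def initial_fullness(r: float, k_p: float = 1.0, k_b: float = 1.4) -> dict[str, float]:
    """Initial virtual buffer fullness dI0, dP0, dB0 (computer 250)."""
    d_i0 = 10 * r / 31
    return {"I": d_i0, "P": k_p * d_i0, "B": k_b * d_i0}


def buffer_quantizer(d_x0: float, bits_so_far: float, target: float,
                     j: int, mb_cnt: int, r: float) -> float:
    """q_buf for macroblock j (1-based) of a picture with target bits T_X.

    bits_so_far is B_{j-1}, the bits generated by macroblocks 1..j-1.
    The bracketed term is the virtual buffer fullness before macroblock j.
    """
    fullness = d_x0 + bits_so_far - target * (j - 1) / mb_cnt
    return 31 * fullness / r


r = reaction_parameter(4_000_000, 30.0)
d0 = initial_fullness(r)
q_first = buffer_quantizer(d0["I"], 0, 457_143, 1, 1350, r)  # about 10
```

If a picture spends bits faster than the pro-rata share T_X × (j − 1)/MB_cnt, the fullness and hence q_buf rise, which slows bit generation for the remaining macroblocks.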
FIG. 2C is a block diagram of an Activity Based Quantizer Computer 130 suitable for use in a TM5-based rate controller. Responsive to input video data vidin, quantizer computer 130 calculates the variances of the 8×8 blocks in an input frame, together with the minimum variance and minimum activity for each macroblock (box 280). A picture average minimum activity computer 285 averages the minimum variances over the macroblocks. An MB normalized minimum 8×8 block activity computer 290 generates normalized values of block activity. An MB activity quantizer computer generates a quantizer value qp based on the normalized activity identified by computer 290, an assigned picture type value ptyp, and the buffer based quantizer values qbuf from the preceding stage. A qp value is selected for each macroblock in an input picture.
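The activity modulation of FIG. 2C can be sketched using the TM5 formulas: a macroblock's activity is one plus the smallest variance among its 8×8 luminance blocks, it is normalized against the picture average as (2 × act + avg_act)/(act + 2 × avg_act), and the buffer quantizer is scaled by that factor and clipped to 1 to 31. For brevity, the sketch uses only the four frame-organized 8×8 blocks of a 16×16 macroblock (TM5 also considers field-organized blocks for interlaced material), takes avg_act as an input, and omits the picture-type input ptyp; all names are hypothetical.

```python
from statistics import pvariance


def block_variances(mb: list[list[int]]) -> list[float]:
    """Variances of the four 8x8 luminance blocks of a 16x16 macroblock."""
    variances = []
    for by in (0, 8):
        for bx in (0, 8):
            pixels = [mb[y][x] for y in range(by, by + 8) for x in range(bx, bx + 8)]
            variances.append(float(pvariance(pixels)))
    return variances


def mb_activity(mb: list[list[int]]) -> float:
    """TM5-style activity: 1 + minimum 8x8 block variance."""
    return 1 + min(block_variances(mb))


def normalized_activity(act: float, avg_act: float) -> float:
    """Normalized activity in [0.5, 2]; equals 1 when act == avg_act."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def activity_quantizer(q_buf: float, act: float, avg_act: float) -> int:
    """Modulate the buffer quantizer by activity and clip to 1..31."""
    return max(1, min(31, round(q_buf * normalized_activity(act, avg_act))))
```

Taking the minimum block variance means a macroblock containing any flat 8×8 region is treated as low activity and quantized more finely, since coding errors are most visible in such regions.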
The inventors identified a need in the art for a rate controller that is effective for bitrate control, that produces good picture quality and maintains low complexity and delay when encoding with H.264/MPEG-4 AVC standard.