In a typical hybrid video coding system, a two-dimensional transform converts image data or motion-compensated residual data from the spatial domain to the frequency domain. The data is thus de-correlated and arranged in such a way that most of the information is concentrated in specific two-dimensional regions of the transform.
The discrete cosine transform (DCT) can be defined as
$$Y = A X A^{T} \qquad (1.1)$$
The inverse DCT is defined as
$$X = A^{T} Y A \qquad (1.2)$$
where X is a matrix of samples, Y is a matrix of coefficients (or levels) and A is an N×N transform matrix.
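Equations (1.1) and (1.2) can be illustrated with a minimal Python sketch. It assumes that A is the usual orthonormal DCT-II matrix, which the text does not spell out; the helper names (`dct_matrix`, `matmul`, `transpose`, `forward_dct`, `inverse_dct`) and the sample block are hypothetical and chosen only for illustration.

```python
import math

def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (assumed form of A)."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(p, q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_dct(x):
    """Equation (1.1): Y = A X A^T."""
    a = dct_matrix(len(x))
    return matmul(matmul(a, x), transpose(a))

def inverse_dct(y):
    """Equation (1.2): X = A^T Y A."""
    a = dct_matrix(len(y))
    return matmul(matmul(transpose(a), y), a)

# Illustrative 4x4 block of samples (made-up values).
block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]
coeffs = forward_dct(block)
restored = inverse_dct(coeffs)

# For an orthonormal 4x4 DCT the first coefficient equals sum(block) / 4.
print(round(coeffs[0][0], 6))
```

Because the assumed A is orthonormal, applying (1.2) to the output of (1.1) recovers the samples up to floating-point error.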
After the image samples are converted to the transform domain, the resulting coefficients Y are quantized to integer values for further processing and transmission.
Quantization can be considered as a reduction in the range of values of an input signal. Each sample of the input signal is mapped to one quantized output value. One basic example of this type of scalar quantization is the rounding of a fractional value to the nearest integer. Simple rounding is an example of a linear quantizer.
A formula for determining a rounded integer value is as follows:
$$I = \mathrm{round}(Y / Q_{step}) \qquad (1.3)$$
where I is the quantized output, Y is the fractional input and Qstep is the quantization step size. The reconstructed output levels are spaced uniformly at intervals of Qstep.
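Equation (1.3) can be sketched in Python as follows. Python's built-in `round` rounds halves to the nearest even integer, so this sketch instead adds 0.5 to the magnitude and takes the floor; rounding halves away from zero is an assumption of the sketch, and the function name `quantize` is hypothetical.

```python
import math

def quantize(y, qstep):
    """Equation (1.3): I = round(Y / Qstep), halves rounded away from zero."""
    level = math.floor(abs(y) / qstep + 0.5)
    return level if y >= 0 else -level

# Made-up inputs with a step size of 2.0.
for value in (7.3, -7.3, 0.9, 1.0, 5.0):
    print(value, "->", quantize(value, 2.0))
```

Every input that falls within half a step of the same multiple of Qstep is mapped to the same integer level, which is why the operation is lossy.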
Inverse quantization is the reverse and lossy process where the closest value to the original fractional value is obtained.
$$I' = I \cdot Q_{step} \qquad (1.4)$$
Transform and Quantization in H.264
The H.264 coding standard uses an integer transform instead of the traditional DCT. The original proposal for transform and quantization can be found in “Low Complexity Transform and Quantization—Part I: Basic Implementation”, by Hallapuro, Karczewicz, Malvar, JVT 038 of ISO/IEC MPEG & ITU-T VCEG, Geneva, January 2001. For practical purposes, the three types of transforms used in the standard are close approximations to the DCT. Depending on the type of residual data to be coded, the standard specifies the following:
a) Integer 4×4 transform for all blocks of data, with the following extensions:
b) Hadamard 4×4 transform for Luma DC (zero-frequency) coefficients (obtained in (a)) in Intra macroblocks predicted in Intra 16×16 mode.
c) Hadamard 2×2 transform for Chroma DC coefficients (obtained in (a)) in any macroblock prediction mode.
Integer 4×4 Transform
The advantages of the integer transform over the DCT are primarily practical, since, theoretically, the integer transform is not as efficient as the DCT. However, the integer approach requires only 16-bit operations in most cases. Also, because it is fully defined, the integer approach ensures that there is no mismatch between an encoder and a decoder. Y is defined as follows:
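$$Y = (C X C^{T}) \otimes E \qquad (1.5)$$
where
$$C = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix}, \quad E = \begin{pmatrix} a^{2} & \frac{ab}{2} & a^{2} & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^{2}}{4} & \frac{ab}{2} & \frac{b^{2}}{4} \\ a^{2} & \frac{ab}{2} & a^{2} & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^{2}}{4} & \frac{ab}{2} & \frac{b^{2}}{4} \end{pmatrix} \qquad (1.6)$$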
The operator ⊗ above indicates that every element of the core transform $C X C^{T}$ is multiplied by the corresponding element of the scaling matrix E. Thus, normal matrix multiplication is not used. The elements of E are defined as follows:
$$a = \tfrac{1}{2}, \quad b = \sqrt{\tfrac{2}{5}} \qquad (1.7)$$
Quantization
The H.264 coding standard uses a scalar quantizer of the basic form:
$$L = \mathrm{round}(Y_{ij} / Q_{step}) \qquad (1.8)$$
There are 52 values of Qstep defined by the standard. These values are not actually transmitted, but they are indexed into a table of Qstep values assumed by both the encoder and decoder, and thus incorporated by design. The index transmitted, QP, is such that the quantization step doubles with every QP increment of 6.
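The relationship between QP and Qstep can be sketched as a small lookup. The six base step sizes below are the values commonly tabulated for QP = 0 to 5 in descriptions of H.264; they are not given in this text and are stated here as an assumption. The function name `qstep_from_qp` is hypothetical.

```python
# Base step sizes for QP = 0..5 (assumed; commonly tabulated for H.264).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep_from_qp(qp):
    """Return Qstep for an index QP in 0..51; the step doubles every 6 QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

for qp in (0, 4, 6, 10, 28, 51):
    print(qp, qstep_from_qp(qp))
```

Only the index QP needs to be transmitted, since the encoder and the decoder can both derive Qstep from it in this way.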
Matrix E above is used as a scaling factor (SF) after the core transform is performed on the input sample values. If the unscaled coefficient levels are denoted as $K = C X C^{T}$, then each element of K is quantized and scaled as follows:
$$L = \mathrm{round}\left( K_{ij} \cdot \frac{SF}{Q_{step}} \right) \qquad (1.9)$$
where SF can take the values $a^{2}$, $ab/2$ or $b^{2}/4$, as indicated by matrix E.
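The core transform and the scaled quantization of equation (1.9) can be sketched as follows, using the matrix C and the constants a and b given above. The helper names and the sample block are hypothetical, and rounding halves away from zero is an assumption of the sketch.

```python
import math

C = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]
A = 0.5
B = math.sqrt(2.0 / 5.0)

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def core_transform(x):
    """K = C X C^T, computed with integer arithmetic only."""
    return matmul(matmul(C, x), transpose(C))

def scaling_factor(i, j):
    """Element (i, j) of E: a^2, ab/2 or b^2/4, chosen by row/column parity."""
    row_even, col_even = i % 2 == 0, j % 2 == 0
    if row_even and col_even:
        return A * A
    if not row_even and not col_even:
        return B * B / 4.0
    return A * B / 2.0

def round_half_away(v):
    level = math.floor(abs(v) + 0.5)
    return level if v >= 0 else -level

def quantize_block(x, qstep):
    """Equation (1.9) applied to every element of K."""
    k = core_transform(x)
    return [[round_half_away(k[i][j] * scaling_factor(i, j) / qstep)
             for j in range(4)] for i in range(4)]

# Illustrative 4x4 residual block (made-up values).
block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]
print(core_transform(block))
print(quantize_block(block, 2.5))
```

Folding the scaling factor into the quantizer leaves only additions, subtractions and doublings in the transform itself.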
Practical fixed-point implementations transform the equation for L above into a binary-shift form to avoid costly division and to preserve accuracy:
$$|L_{ij}| = \left( |K_{ij}| \cdot \frac{SF \cdot 2^{qbits}}{Q_{step}} + f \cdot 2^{qbits} \right) / 2^{qbits}, \qquad \mathrm{sign}(L_{ij}) = \mathrm{sign}(K_{ij}) \qquad (1.10)$$
where
$$qbits = 15 + \mathrm{floor}(QP / 6) \qquad (1.11)$$
where f is a fraction typically less than 1, e.g., ⅓, ⅙, etc.; floor(x) is the greatest integer less than or equal to x; and QP is the quantization index parameter, 0 ≤ QP < 52.
This type of non-linear quantizer has a region around L=0 where small values are mapped to zero. This region is known as a Dead Zone. The size of the Dead Zone can be controlled by the parameter f in the above equation for L. Hereinafter, f is referred to as the Dead Zone offset. The value of f should usually be non-negative. A value of f=0.5 corresponds to the conventional rounding operation with no dead zone; the smaller the value of f, the more coefficients K are quantized to 0, and therefore the wider the dead zone.
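The shifted form of equations (1.10) and (1.11) and the effect of the offset f can be illustrated with a short sketch. Two things here are assumptions rather than statements from the text: the base Qstep values for QP = 0 to 5, and the rounding of SF·2^qbits/Qstep to an integer multiplier before the shift. The function names are hypothetical.

```python
# Base step sizes for QP = 0..5 (assumed; commonly tabulated for H.264).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep_from_qp(qp):
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

def deadzone_quantize(k, sf, qp, f):
    """Quantize one unscaled coefficient k using equations (1.10)-(1.11)."""
    qbits = 15 + qp // 6                              # equation (1.11)
    # Integer multiplier approximating SF * 2^qbits / Qstep (an assumption).
    multiplier = round(sf * (1 << qbits) / qstep_from_qp(qp))
    offset = int(f * (1 << qbits))                    # f * 2^qbits
    magnitude = (abs(k) * multiplier + offset) >> qbits
    return magnitude if k >= 0 else -magnitude        # sign(L) = sign(K)

# With SF = a^2 = 0.25 and QP = 4 (Qstep = 1.0), k = 2 scales to 0.5
# and k = 3 scales to 0.75.
for f in (1 / 2, 1 / 3, 1 / 6):
    print(f, deadzone_quantize(2, 0.25, 4, f), deadzone_quantize(3, 0.25, 4, f))
```

In this sketch a scaled magnitude becomes non-zero only once it reaches 1 − f, so f = ½ gives ordinary rounding while f = ⅓ and f = ⅙ push the first non-zero level out to ⅔ and ⅚ respectively.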
There are many methods of performing rounding in the dead zone area. The purpose of rounding is to reduce the amount of data that represents the quantized value, for example, to round an 11-bit quantized result up or down to an 8-bit integer. For example, the offset may be set so that values of 0.5 and over are rounded up to 1, and values less than 0.5 are rounded down to 0. Thus, a value of 0.6 would be rounded up to 1, and 0.2 would be rounded down to 0. This reduces the amount of data that needs to be transmitted, displayed or otherwise processed. The question then becomes at which level the dead zone offset should be set, since the dead zone offset determines which values are quantized to zero and which are quantized to a non-zero value.
The choice of threshold point is a design question. For example, instead of 0.5, the threshold could be 0.3. With a lower threshold, more fractional portions of the transformed blocks would be rounded to a non-zero integer rather than to zero. Thus, 0.5 would be rounded to 1 and 0.2 would be rounded to 0. In this case, statistically, more fractional components would likely be rounded up to an integer. In particular, many coefficients located around the dead zone would be rounded to +/−1. Also, more detail would be represented, and thus more data bits representing the block would result from more numbers being rounded up. Therefore, depending on which point is chosen as the threshold, more or less detail would be represented, and more or fewer bits would be required to represent that detail.
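The effect of the threshold on the number of non-zero levels can be shown with a small worked example. The function names and the list of scaled coefficient values are made up for illustration; the sketch simply rounds a magnitude up when its fractional part reaches the threshold.

```python
import math

def quantize_with_threshold(x, threshold):
    """Round |x| up when its fractional part is at least `threshold`."""
    magnitude = abs(x)
    whole = math.floor(magnitude)
    level = whole + (1 if magnitude - whole >= threshold else 0)
    return level if x >= 0 else -level

def count_nonzero(values, threshold):
    return sum(1 for v in values if quantize_with_threshold(v, threshold) != 0)

# Hypothetical scaled coefficients clustered around the dead zone.
scaled = [0.1, 0.2, 0.35, 0.45, 0.6, 0.8, 1.4]
print(count_nonzero(scaled, 0.5))   # threshold of 0.5
print(count_nonzero(scaled, 0.3))   # lower threshold of 0.3
```

For this list, lowering the threshold from 0.5 to 0.3 raises the number of non-zero levels from three to five, which is the trade-off between detail and bit count described above.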
In practice, rigidly choosing a threshold leaves the designer with the difficult task of balancing the quality of the resulting video picture against the number of bits generated, which is limited by the bit rate of the system. If too many bits are generated for a system to process, then the system must be designed with a higher threshold that will reduce the number of bits. If higher quality is desired, then a lower threshold is used, and more data bits result.
Furthermore, when a threshold value is chosen rigidly, blurry or flat blocks of video occur in areas of high texture or high motion. For example, if a sweater having a particular weave pattern were recorded in a video process that uses a strict threshold, a low threshold would show details of the pattern, but at a high bit rate. Conversely, a high threshold would result in a less detailed video representation, and the weave pattern might not even be apparent to a viewer. If the same pattern or texture is in motion from one video slice to the next, a similar result occurs. A higher threshold gives bland or blurry video representations at a given system bit rate limit, while a lower threshold gives a more detailed representation of the image but produces more bits, which may exceed the bit rate of the system. The result is blurry and/or inconsistent blocks of video within video slices in areas of high texture and/or motion.
Intra 16×16 DC Transform and Quantization
For macroblocks coded as Intra 16×16, in addition to the integer transform described above, there is a further transformation applied to the DC coefficients of all the 4×4 transformed blocks in the macroblock. The DC value of each of the 16 transformed 4×4 matrices is extracted as a 4×4 matrix KDC and transformed as follows.
$$Y_{DC} = (H K_{DC} H^{T}) / 2 \qquad (1.12)$$
where
$$H = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{pmatrix} \qquad (1.13)$$
where H, the Hadamard transform matrix above, is used instead of the integer transform to simplify operations on the DC coefficients. The coefficients YDC are quantized in a similar manner to the integer transform coefficients, as follows:
$$|L_{DC\,ij}| = \left( |K_{DC\,ij}| \cdot \frac{SF_{DC} \cdot 2^{qbits}}{Q_{step}} + f_{DC} \cdot 2^{qbits} \right) / 2^{qbits}, \qquad \mathrm{sign}(L_{DC\,ij}) = \mathrm{sign}(K_{DC\,ij}) \qquad (1.14)$$
where SFDC takes the value $a^{2}$ as defined above, qbits is as defined above, and fDC is the Dead Zone offset.
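The Hadamard step of equation (1.12) can be sketched as follows, using the matrix H of equation (1.13). The helper names and the sample DC matrix are hypothetical, and the division by 2 is done in floating point here, which is a simplification of the sketch rather than something the text specifies.

```python
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def hadamard_dc(k_dc):
    """Equation (1.12): Y_DC = (H K_DC H^T) / 2."""
    t = matmul(matmul(H, k_dc), transpose(H))
    return [[v / 2 for v in row] for row in t]

# A flat 4x4 matrix of DC values (made-up): every 4x4 block has DC = 8.
flat_dc = [[8] * 4 for _ in range(4)]
print(hadamard_dc(flat_dc))
```

When all sixteen DC values are equal, the whole of the energy lands in the first output coefficient and the other fifteen are zero.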
Chroma DC Transform and Quantization
Similarly to Luma sample (residual) processing, all of the chroma samples (residuals) are processed by the 4×4 integer transform. The DC coefficient of each 4×4 block is extracted to form a 2×2 matrix KcDC and further transformed as follows.
$$UV_{DC} = c \cdot K_{cDC} \cdot c^{T} \qquad (1.15)$$
where
$$c = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (1.16)$$
The quantization of the Chroma DC coefficients is as follows:
$$|L_{cDC\,ij}| = \left( |UV_{cDC\,ij}| \cdot \frac{a^{2} \cdot 2^{qbits}}{Q_{cstep}} + f_{cDC} \cdot 2^{qbits+1} \right) / 2^{qbits+1}, \qquad \mathrm{sign}(L_{cDC\,ij}) = \mathrm{sign}(UV_{cDC\,ij}) \qquad (1.17)$$
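Equations (1.15) and (1.17) can be sketched together. The 2×2 transform is expanded by hand from c·K·cᵀ; the quantizer takes Qcstep and qbits as plain arguments and rounds a²·2^qbits/Qcstep to an integer multiplier, which is an assumption of the sketch. The function names and the input values are hypothetical.

```python
def chroma_dc_transform(k):
    """Equation (1.15): UV_DC = c K_cDC c^T with c = [[1, 1], [1, -1]]."""
    (p, q), (r, s) = k
    return [[p + q + r + s, p - q + r - s],
            [p + q - r - s, p - q - r + s]]

def quantize_chroma_dc(uv, qcstep, qbits, f_cdc):
    """Equation (1.17), with a^2 = 0.25 and a shift by qbits + 1."""
    multiplier = round(0.25 * (1 << qbits) / qcstep)   # assumed integer form
    offset = int(f_cdc * (1 << (qbits + 1)))           # f_cDC * 2^(qbits+1)
    levels = []
    for row in uv:
        out_row = []
        for v in row:
            magnitude = (abs(v) * multiplier + offset) >> (qbits + 1)
            out_row.append(magnitude if v >= 0 else -magnitude)
        levels.append(out_row)
    return levels

# DC values of the four 4x4 chroma blocks of one component (made-up).
k_cdc = [[10, 12],
         [8, 6]]
uv = chroma_dc_transform(k_cdc)
print(uv)
print(quantize_chroma_dc(uv, 1.0, 15, 1 / 3))
```

The extra bit in the shift halves the result relative to equation (1.10), matching the 2^(qbits+1) divisor in equation (1.17).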
The H.264 coding standard provides general methods and tools to encode image sequences. At low bit rates, rate-distortion driven decisions in the encoder select coding modes that are less expensive to code rather than modes that provide less distortion. In this situation, larger 16×16 modes are more likely to be selected since they can be coded with fewer bits than the smaller 8×8 and 4×4 modes.
In some cases, an Intra 16×16 mode will produce fewer bits than an Inter 16×16 mode in temporally predicted slices. The Intra 16×16 mode typically selected in this situation has very low frequency information and often has only DC information. Although this implementation choice is efficient in conventional processes, it can produce macroblocks that have very low detail and appear blurred.
When these blurred macroblocks appear in areas of the picture with high detail, especially in a picture with motion detail, the perceived quality is inconsistent, since neighboring macroblocks may be coded in other more efficient modes. Or, if neighboring blocks are coded similarly, they can produce large areas full of blurred macroblocks. One approach to solving this problem is to add noise to the rounding process, known as “film grain”. This produces grainy noise patterns that fill in flat or blurry images with grainy textures, which may make the blurring less apparent to a viewer. This approach still does not adequately solve the problem of blurred images or images that are otherwise not representative of the recorded images.
Therefore, there exists a need for an improved method and system for encoding video pictures that overcomes the shortcomings of the prior art. As will be seen, the invention provides such a method and system in an elegant manner.