Currently, there are multiple video coding technologies, for example, video coding standards such as H.264/advanced video coding (AVC), H.265/high efficiency video coding (HEVC), and Audio Video Coding Standard (AVS). The foregoing video coding standards generally use a hybrid coding framework, which mainly includes phases such as prediction, transform, quantization, and entropy coding.
In the prediction phase, a reconstructed pixel of a coded area is used to generate a predicted pixel for an original pixel corresponding to a current code block. Two main types of prediction manners are included, intra-prediction and inter-prediction. For the intra-prediction, a reconstructed pixel of a spatial neighborhood of the current code block in a current code image is used to generate the predicted pixel of the current code block, for example, horizontal, vertical or other multi-directional prediction in the H.264/AVC, and a prediction manner based on template matching and intra-motion compensation. For the inter-prediction, a reconstructed pixel, in one or more coded images, corresponding to the current code block is used as the predicted pixel of the current code block, for example, prediction based on motion compensation. The inter-prediction includes two forms, unidirectional prediction and bidirectional prediction. For the unidirectional prediction, a reconstructed pixel in one coded image is used to generate the predicted pixel of the current code block, and for the bidirectional prediction, reconstructed pixels in two coded images are used to generate the predicted pixel of the current code block.
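The two forms of inter-prediction can be illustrated with a minimal Python sketch. The helper names `fetch_block` and `bi_predict` are hypothetical, the motion vector is assumed to have integer precision, and combining the two reference blocks by a rounded average is an assumption made for illustration; the text above only states that reconstructed pixels in two coded images are used.

```python
def fetch_block(ref, x, y, mv, w, h):
    """Unidirectional prediction: copy a w-by-h block from one reference
    image, displaced from position (x, y) by an integer motion vector."""
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(w)] for r in range(h)]

def bi_predict(block0, block1):
    """Bidirectional prediction: combine two reference blocks.
    A rounded average is assumed here for illustration."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]

# Two small 4x4 "coded images" used as references.
ref0 = [[r * 4 + c for c in range(4)] for r in range(4)]
ref1 = [[10] * 4 for _ in range(4)]

uni = fetch_block(ref0, x=1, y=1, mv=(1, 0), w=2, h=2)
bi = bi_predict(uni, fetch_block(ref1, x=1, y=1, mv=(0, 0), w=2, h=2))
```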
A pixel value difference between an original pixel and a predicted pixel is referred to as a residual. To improve coding efficiency of the residual, the residual is generally first transformed into transform coefficients. Common transforms include discrete cosine transform (DCT), discrete sine transform (DST), wavelet transform, and the like. Afterwards, quantization processing is performed on the transform coefficient, for example, by means of vector quantization or scalar quantization. Then, a quantized transform coefficient and encoding mode information (for example, a code block size, a prediction mode, and a motion vector) are converted into a bitstream by means of entropy coding processing. Common entropy coding methods include arithmetic coding, variable length coding (VLC), fixed length coding, run-length coding, and the like.
The transform coefficient may be quantized in a scalar quantization manner. If an ith transform coefficient in N transform coefficients of a residual of the current code block is denoted as C(i) (1≤i≤N, N is associated with a transform block size and is usually 16, 64, 1024, or the like), the quantized transform coefficient Q(i) is:
Q(i)=sign{C(i)}·round{|C(i)|/Qs(i)+o1(i)},
where sign{X} represents the sign of X, that is, sign{X}=1 if X≥0 and sign{X}=−1 if X<0; round{X} is a rounding operation and may generally be one of rounding down, rounding off, or rounding up; |X| represents an absolute value or an amplitude of X; Qs(i) represents a quantization step corresponding to the transform coefficient C(i); and o1(i) is a rounding offset.
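The scalar quantization formula above can be sketched in Python as follows. This is a minimal illustration, not an encoder implementation: the function names, the quantization step of 8, and the offset of 0.5 are assumptions, and round{} is taken to be rounding down.

```python
import math

def sign(x):
    """sign{X}: 1 for X >= 0, -1 for X < 0."""
    return 1 if x >= 0 else -1

def quantize(c, qs, offset=0.5):
    """Q = sign{C} * round{|C| / Qs + o1}, with round{} taken as rounding down.
    The default offset of 0.5 is an illustrative assumption."""
    return sign(c) * math.floor(abs(c) / qs + offset)

# Example transform coefficients and a single quantization step (assumed values).
coeffs = [100.0, -37.0, 4.0, -3.0]
levels = [quantize(c, qs=8.0) for c in coeffs]
print(levels)  # [13, -5, 1, 0]
```

A smaller rounding offset pushes more small coefficients to zero, which is one reason the offset is treated as a separate parameter from the quantization step.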
Video decoding is a process of converting a bitstream into a video image, and includes several main phases such as entropy decoding, prediction, dequantization, and inverse transform. First, the bitstream is parsed by means of entropy decoding processing to obtain encoding mode information and a quantized transform coefficient. Then, on one hand, a predicted pixel is obtained using the encoding mode information and a decoded reconstructed pixel. On the other hand, dequantization is performed on the quantized transform coefficient to obtain a reconstructed transform coefficient, and inverse transform is performed on the reconstructed transform coefficient to obtain reconstructed residual information. Afterwards, the reconstructed residual information is added to the predicted pixel to obtain a reconstructed pixel in order to restore the video image.
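The final reconstruction step described above, adding the reconstructed residual to the predicted pixel, can be sketched as follows. The function name `reconstruct` is hypothetical, and clipping the result to the valid sample range for an assumed 8-bit depth is an added assumption that the text does not state.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add reconstructed residuals to predicted pixels.
    Clipping to [0, 2^bit_depth - 1] is an assumption for illustration."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(pred, resid)]

predicted = [120, 250, 3]   # predicted pixels (assumed values)
residual = [5, 10, -7]      # reconstructed residuals after inverse transform
recon = reconstruct(predicted, residual)
print(recon)  # [125, 255, 0]
```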
Dequantization is performed on the quantized transform coefficient Q(i) to obtain the reconstructed transform coefficient R(i), which may be described as:
R(i)=sign{Q(i)}·round{Q(i)·Qs(i)+o2(i)}  (Formula 1),
where Qs(i) may be a floating-point number, and o2(i) is a rounding offset. Generally, to avoid floating-point operations, the floating-point multiplication is approximated by integer operations with addition and shifting. For example, in the H.265/HEVC, the dequantization process described in Formula 1 is approximated by:
R(i)=sign{Q(i)}·(Q(i)·Qs′(i)+(1<<(bdshift−1)))>>bdshift  (Formula 2),
where bdshift is a shifting parameter, Qs′(i) is an integer, and Qs′(i)/2^bdshift approximates the quantization step Qs(i) in Formula 1. In this case, o2(i)=0.5, and the rounding manner is rounding down. Qs′(i) is jointly determined by a level scale l(i) and a scaling factor m(i):
Qs′(i)=m(i)·l(i)  (Formula 3),
where l(i) is a function of a quantization parameter (QP), that is,
l(i)=levelScale[QP(i)%6]<<⌊QP(i)/6⌋  (Formula 4),
where the level scaling list is levelScale[k]={40,45,51,57,64,72} for k=0, 1, . . . , 5, ⌊QP(i)/6⌋ represents rounding down the result of dividing QP(i) by 6, and % is the remainder operation.
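Formulas 2 to 4 can be sketched in Python as follows. This is an illustrative sketch rather than the normative H.265/HEVC scaling process: the scaling factor m=16 and bdshift=10 are assumed example values, and applying the multiplication and shift to the magnitude of Q(i) before restoring the sign is an assumption that mirrors the quantization formula.

```python
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]  # level scaling list from Formula 4

def level_scale(qp):
    """Formula 4: l = levelScale[QP % 6] << floor(QP / 6)."""
    return LEVEL_SCALE[qp % 6] << (qp // 6)

def dequantize(q, qp, m=16, bdshift=10):
    """Formulas 2 and 3 using integer arithmetic only.
    m and bdshift are illustrative values; the shift is applied to |q|
    and the sign is restored afterwards (an assumption)."""
    qs_int = m * level_scale(qp)                       # Formula 3: Qs' = m * l
    mag = (abs(q) * qs_int + (1 << (bdshift - 1))) >> bdshift
    return (1 if q >= 0 else -1) * mag

print(level_scale(22))     # 512
print(dequantize(5, 22))   # 40
print(dequantize(-3, 22))  # -24
```

With these example values, Qs′/2^bdshift = 8192/1024 = 8, so the integer computation reproduces multiplication by a quantization step of 8 without any floating-point operation.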
Generally, the dequantization is directly associated with the quantization step, and the quantization step is affected by the QP, the scaling factor, and the level scaling list. The quantization step may be adjusted in multiple manners. For example, each level of QP corresponds to a quantization step when the scaling factor and the level scaling list are fixed. The H.264/AVC and the H.265/HEVC stipulate 52 levels of QPs. Therefore, the quantization step may be adjusted by changing the QP. For another example, the quantization step may be changed by adjusting the scaling factor. Typically, one of multiple scaling factor matrices, also referred to as quantization matrices, may be selected to determine the scaling factor. Although different data is changed in the foregoing two examples, the quantization step is adjusted in essence.
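The two adjustment manners described above can be checked with a short Python sketch based on Formulas 3 and 4. The function name `integer_step` and the default scaling factor of 16 are assumptions used for illustration.

```python
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]  # level scaling list from Formula 4

def integer_step(qp, m=16):
    """Qs' = m * (levelScale[QP % 6] << floor(QP / 6)); m=16 is an assumed value."""
    return m * (LEVEL_SCALE[qp % 6] << (qp // 6))

# Manner 1: change the QP while the scaling factor and list stay fixed.
steps = [integer_step(qp) for qp in range(52)]  # 52 QP levels, 0 to 51
doubles_every_six = all(steps[qp + 6] == 2 * steps[qp] for qp in range(46))

# Manner 2: change the scaling factor while the QP stays fixed.
scaled = integer_step(22, m=32)

print(steps[0], steps[51], doubles_every_six)  # 640 233472 True
```

In this sketch the step grows monotonically with the QP and doubles every 6 QP levels, while doubling the scaling factor doubles the step at a fixed QP.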
For lossy encoding, a reconstructed pixel and an original pixel may be different, and a value difference between the two is referred to as distortion. Due to multiple visual masking effects such as a luminance masking effect and a contrast masking effect, distortion intensity observed by human eyes is closely associated with a feature of a background in which the distortion is located. That is, sensitivity of the human eyes to distortion is associated with background luminance and background contrast of a location of the distortion. Generally, the distortion sensitivity and the background luminance present a U-shaped curvilinear relationship, and the distortion sensitivity and a variance or a standard deviation of the background present a monotonically decreasing relationship. In the video coding, with reference to the foregoing visual features, the quantization step is increased in an area of visual insensitivity to distortion (that is, an area of relatively small distortion sensitivity), and the quantization step is reduced in an area of visual sensitivity to distortion. Therefore, in comparison with a uniform quantization step, coding distortion allocation can be more suitable for visual perception of the human eyes, and subjective quality is improved at a same bit rate, that is, coding efficiency is improved.
A method for adjusting a quantization step is as follows.
At an encoder, a video sequence is analyzed, a QP corresponding to a transform coefficient of each code block is determined, and the QP or an offset (delta QP) of the QP relative to a slice QP is written into a bitstream. At a decoder, a quantization step of each code block is adjusted according to the QP obtained by means of parsing.
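The delta QP signalling described above can be sketched as follows. This is a simplified illustration: the function names and QP values are hypothetical, and entropy coding of the offsets is omitted.

```python
def encode_delta_qps(block_qps, slice_qp):
    """Encoder side: express each code block's QP as an offset from the slice QP."""
    return [qp - slice_qp for qp in block_qps]

def decode_qps(delta_qps, slice_qp):
    """Decoder side: recover each code block's QP from the parsed offsets."""
    return [slice_qp + d for d in delta_qps]

slice_qp = 30
block_qps = [30, 33, 27, 30]                     # QPs chosen by the encoder (assumed)
deltas = encode_delta_qps(block_qps, slice_qp)   # side information in the bitstream
decoded = decode_qps(deltas, slice_qp)
print(deltas)  # [0, 3, -3, 0]
```

Every entry in `deltas` is side information that must be transmitted for the decoder to recover the per-block QP.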
In the foregoing existing technical solutions, the QP is decided at the encoder and QP information is transmitted in the bitstream such that the decoder learns of a quantization step adjustment value of each code block, to implement adaptive quantization step adjustment. However, side information corresponding to the QP limits coding efficiency improvement to some extent.