Existing video compression systems based on international standards such as ISO MPEG-1, MPEG-2, MPEG-4 Part 2, MPEG-4 Part 10 AVC/ITU-T H.264, H.261, H.263, and VC-1 rely, among other techniques, on intra and inter coding to achieve compression. In intra coding, spatial prediction methods are used. In inter coding, compression is achieved by exploiting the temporal correlation that may exist between pictures. Previously coded pictures are used as prediction references for future pictures, and motion and/or illumination change estimation and compensation is used to predict one picture from another. The prediction residuals are then transformed (e.g., with a discrete cosine transform, wavelets, or some other decorrelating transform), quantized, and entropy coded subject to a certain bit rate constraint. See the following references for additional information regarding video compression for the standards cited above: (1) MPEG-1: ISO/IEC JTC 1, “Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video,” ISO/IEC 11172 (MPEG-1), November 1993; (2) MPEG-2: ITU-T and ISO/IEC JTC 1, “Generic coding of moving pictures and associated audio information - Part 2: Video,” ITU-T Rec. H.262 and ISO/IEC 13818-2 (MPEG-2), November 1994; (3) MPEG-4 Part 2: ISO/IEC JTC 1, “Coding of audio-visual objects - Part 2: Visual,” ISO/IEC 14496-2 (MPEG-4 Part 2), January 1999; (4) MPEG-4 Part 10 AVC/ITU-T H.264: JVT reference software version JM13.2, and “Advanced video coding for generic audiovisual services”; (5) H.261: ITU-T, “Video codec for audiovisual services at p×64 kbit/s,” ITU-T Rec. H.261, v2: March 1993; (6) H.263: ITU-T, “Video coding for low bit rate communication,” ITU-T Rec. H.263, v2: January 1998; and (7) VC-1: SMPTE 421M, “VC-1 Compressed Video Bitstream Format and Decoding Process,” April 2006.
A generic intra predictor for a block or region is depicted in FIG. 1. An arbitrary coding order is depicted. The current block 101 is predicted from reconstructed pixels that belong to blocks 102 that have already been coded. Let p(i,j,t) denote a pixel with coordinates (i,j) in frame t in the current block b, hence $(i,j) \in b$. Let $\hat{p}(i,j,t)$ denote the prediction value for the previously mentioned pixel. Let B denote the set of coordinates of all pixels that have already been coded. The prediction value $\hat{p}(i,j,t)$ can be given as shown in Eq. 1 below:
$$\hat{p}(i,j,t) = \sum_{(m,n) \in B} \big( w(i,j,m,n) \times p(m,n,t) \big) \qquad \text{Eq. 1}$$
A generalized linear predictor as shown above in Eq. 1 is thus a weighted sum of potentially all pixels that have already been decoded, where w(i,j,m,n) represents the weights to be applied. In practice, decoded pixels that are in the proximity of the pixels being predicted will be more correlated and will dominate the prediction by assuming larger weights than other pixels. In H.264/AVC, only blocks in the immediate neighborhood of the current block contribute to its intra prediction.
Prediction samples are drawn from already decoded pixels; hence the coding architecture assumes a closed loop. A closed-loop architecture ensures exact replication of the operation at the decoder, avoiding the drift that would result from predicting the pixel blocks from the original uncompressed pixel values. The latter is known as an open-loop prediction architecture. Closed-loop prediction may be illustrated as in FIG. 2. As depicted in FIG. 2, coding 121 is performed on a prediction residual to provide a coded residual. The coded residual is buffered 123 and then used to generate a prediction 125. The prediction 125 and the input pixel are then combined to provide the prediction residual. An input pixel is thus predicted as a function of coded pixels whose reconstructed values are combined to form the prediction. Past errors can accumulate and propagate to future pixels. Note that both open-loop and closed-loop architectures exist for temporal prediction. Further note that even though the term “block,” as used herein, may refer to the pixel area that is inter- or intra-predicted, the pixel areas may be arbitrary image regions of arbitrary size that are not necessarily rectangular.
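The difference between closed-loop and open-loop prediction can be illustrated with a minimal one-dimensional sketch. The previous-sample predictor, the uniform quantizer, the step size of 4, and the sample values are all assumptions chosen for illustration; none of them is taken from a standard.

```python
def quantize(residual, step):
    """Round a residual to the nearest multiple of step (hypothetical quantizer)."""
    sign = -1 if residual < 0 else 1
    return sign * step * ((abs(residual) + step // 2) // step)


def decode_sequence(samples, step, closed_loop):
    """Return the decoder output for a previous-sample predictor.

    closed_loop=True:  the encoder predicts from reconstructed samples.
    closed_loop=False: the encoder predicts from original samples (open loop),
                       while the decoder can only use reconstructed ones.
    """
    decoded = []
    encoder_ref = 0  # value the encoder predicts from
    decoder_ref = 0  # value the decoder predicts from
    for sample in samples:
        residual = quantize(sample - encoder_ref, step)
        reconstructed = decoder_ref + residual
        decoded.append(reconstructed)
        decoder_ref = reconstructed
        encoder_ref = reconstructed if closed_loop else sample
    return decoded


samples = [10, 13, 16, 19, 22, 25]
closed = decode_sequence(samples, step=4, closed_loop=True)
opened = decode_sequence(samples, step=4, closed_loop=False)
closed_errors = [abs(s - r) for s, r in zip(samples, closed)]
open_errors = [abs(s - r) for s, r in zip(samples, opened)]
print(closed_errors)  # stays within half a quantizer step
print(open_errors)    # grows as the decoder drifts away from the encoder
```

In this toy setting the closed-loop error never exceeds half the step size, whereas the open-loop error grows with every sample because the encoder and decoder no longer share the same prediction reference.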
Inter prediction differs from intra prediction in that the prediction samples are drawn from previously coded pictures. Inter prediction is more beneficial than intra prediction when temporal correlation is greater than spatial correlation. A generic expression for linear inter prediction is shown below in Eq. 2. Let T denote the set of decoded pictures that are available for inter prediction of picture t. Let B(t) denote the set of coordinates of pixels in picture t. The prediction value $\hat{p}(i,j,t)$ for the pixel at coordinates (i,j) of picture t is given as shown in Eq. 2:
$$\hat{p}(i,j,t) = \sum_{\tau \in T} \; \sum_{(m,n) \in B(\tau)} \big( w(i,j,m,n,t,\tau) \times p(m,n,\tau) \big) \qquad \text{Eq. 2}$$
The linear combination weights, $w(i,j,m,n,t,\tau)$, depend on the temporal index of each picture. Eq. 1 and Eq. 2 are very similar, since they are the generic expressions for intra and inter prediction. The similarity arises because both expressions are unconstrained, which also renders the problem of finding the optimal weights extremely complex.
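The generic linear predictor of Eq. 1 can be sketched as follows. The dictionary representation of the decoded set B, the function names, and the two-neighbor weight function are hypothetical choices made only for illustration; any other weight function could be supplied in their place.

```python
def linear_intra_predict(i, j, decoded, weight):
    """Eq. 1: weighted sum over all already-decoded pixels of the same frame.

    decoded: dict mapping coordinates (m, n) to reconstructed values (the set B)
    weight:  callable w(i, j, m, n) returning the weight applied to pixel (m, n)
    """
    return sum(weight(i, j, m, n) * value for (m, n), value in decoded.items())


def two_neighbor_weight(i, j, m, n):
    """Hypothetical weights: average the neighbors at (i-1, j) and (i, j-1)."""
    if (m, n) == (i - 1, j) or (m, n) == (i, j - 1):
        return 0.5
    return 0.0


decoded = {(0, 0): 100, (0, 1): 104, (1, 0): 108}
prediction = linear_intra_predict(1, 1, decoded, two_neighbor_weight)
print(prediction)  # average of the two neighboring decoded pixels
```

Because the weight function is unconstrained, the search space for good weights is very large, which is why practical codecs restrict it to a small set of predefined modes.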
In practical video codec designs, such as H.264/AVC, intra and inter prediction are constrained and defined in detail. A variety of intra prediction modes have been introduced, which can considerably improve coding performance compared to older standards. H.264/AVC employs intra prediction in the pixel domain that may be applied on 4×4, 8×8, or 16×16 pixel luma and chroma blocks. For the luma component, there are in total 9 prediction modes for 4×4 blocks (see, for example, FIG. 4 and FIG. 5A), 9 modes for 8×8 blocks, and 4 modes for 16×16 blocks (see, for example, FIG. 6). The nine prediction mode types for 4×4 and 8×8 intra prediction are Vertical, Horizontal, DC, Diagonal Down Left, Diagonal Down Right, Vertical Right, Horizontal Down, Vertical Left, and Horizontal Up, as depicted in FIG. 4. Apart from the DC prediction mode, which predicts the block as the average value of the neighboring causal predictor pixels, all prediction modes assume some type of directionality in the source signal. However, not all prediction modes are always available; e.g., if the top block belongs to a different slice or if this is the first row in the image, then the vertical prediction modes cannot be used. For 16×16 intra prediction, the four possible modes are Vertical, Horizontal, DC, and Plane prediction (see FIG. 6); the Plane mode predicts the entire macroblock by interpolating processed values from the top and the left of the macroblock. For the baseline, main, and high profiles of H.264/AVC, chroma sample intra prediction uses modes different from the ones for luma. Both chroma components use the same chroma intra prediction mode, which is applied on the entire chroma block, i.e., 8×8 for 4:2:0. The four prediction modes for chroma samples are similar in concept to the 16×16 luma modes but differ in implementation. For the High 4:4:4 profile, however, the chroma samples adopt the luma intra prediction modes.
Note that the intra prediction type (16×16, 8×8, or 4×4) or direction may be used as a signaling mechanism for metadata representation.
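Three of the 4×4 intra prediction modes described above (Vertical, Horizontal, and DC) can be sketched as follows. This is a simplified illustration rather than the normative H.264/AVC process: the function name and argument layout are hypothetical, the directional modes other than Vertical and Horizontal are omitted, and the default DC value of 128 when no neighbors are available is an assumption for 8-bit samples.

```python
def predict_4x4(mode, top=None, left=None):
    """Return a 4x4 prediction block as a list of rows.

    top:  4 reconstructed pixels above the block, or None if unavailable
    left: 4 reconstructed pixels to the left of the block, or None if unavailable
    """
    if mode == "vertical":
        if top is None:
            raise ValueError("vertical prediction needs the row above the block")
        return [list(top) for _ in range(4)]  # copy the top row downward
    if mode == "horizontal":
        if left is None:
            raise ValueError("horizontal prediction needs the column to the left")
        return [[left[row]] * 4 for row in range(4)]  # copy the left column rightward
    if mode == "dc":
        neighbors = list(top or []) + list(left or [])
        if neighbors:
            # rounded average of whichever neighbors are available
            dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
        else:
            dc = 128  # assumed mid-gray default for 8-bit samples
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


vertical = predict_4x4("vertical", top=[1, 2, 3, 4])
horizontal = predict_4x4("horizontal", left=[5, 6, 7, 8])
dc_no_neighbors = predict_4x4("dc")
dc_both = predict_4x4("dc", top=[112] * 4, left=[110] * 4)
```

The availability checks mirror the remark that a mode cannot be used when the neighbors it depends on are missing, for example in the first row of a picture.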
In H.264/AVC, inter prediction is constrained to predict pixels p(i,j,t) of a block b in picture t as $\hat{p}(i,j,t)$, as shown in Eq. 3 below:
$$\hat{p}(i,j,t) = \beta + \sum_{k \in \{\text{ref\_LIST\_0},\, \text{ref\_LIST\_1}\}} \big( \alpha_k \times p(i + v_{b,x}(t-k,t),\; j + v_{b,y}(t-k,t),\; t-k) \big) \qquad \text{Eq. 3}$$
The summation involves two prediction hypotheses: one from reference picture ref_LIST_0 and another from reference picture ref_LIST_1. A reference picture list in H.264/AVC contains previously coded pictures. P-coded slices (P_SLICE) have access to picture list LIST_0, while B-coded slices (B_SLICE) have access to both picture lists LIST_0 and LIST_1, which enables bi-prediction (a weighted average of two prediction hypotheses, one originating from each list). Term β is a weighted prediction offset, and terms $\alpha_k$ are weighted prediction gains that depend on the slice and the prediction reference index. Note that k may take smaller or larger values than t. The location of the samples for each prediction hypothesis is derived through a motion vector that is uniform for pixels belonging to the same block. The motion vector $v_b$ is written as shown in Eq. 4 below:
$$v_b(t-k,t) = [\, v_{b,x}(t-k,t) \;\; v_{b,y}(t-k,t) \,] \qquad \text{Eq. 4}$$
The motion vector yields the translational parameters required to predict a pixel in block b of picture t from pixels of picture t−k.
For inter prediction in H.264/AVC, one can consider for motion compensation a quad-tree block coding structure to predict a 16×16 macroblock (MB), as depicted in FIG. 3. The entire MB could be predicted as a single 16×16 partition with a single motion vector, or it could be partitioned and predicted using smaller partitions down to a 4×4 block size. For each partition a different motion vector could be transmitted, whereas for the bi-predictive case, one may also transmit two sets of motion vectors per block, one for each prediction reference list (lists 0 and 1). Furthermore, the standard allows for the consideration of up to 16 references for motion compensated prediction, which could be assigned down to an 8×8 block size. Motion compensation could also be performed down to quarter-pixel accuracy, and weighted prediction methods could be considered to improve performance, especially during illumination change transitions. Hence there are in general single-list inter prediction modes for 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 blocks, most of which have counterparts for two-list prediction (bi-prediction), which results in multi-hypothesis motion compensated prediction. Additional inter prediction modes include the SKIP and DIRECT modes, which derive prediction block location information from spatial or temporal neighbors.
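The constrained inter prediction of Eq. 3 can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions: motion vectors are restricted to integer displacements (no quarter-pixel interpolation), no boundary handling is performed, and the function name, the `reference[i][j]` indexing, and the sample pictures are hypothetical.

```python
def inter_predict(i, j, hypotheses, beta=0.0):
    """Eq. 3 for one pixel: offset beta plus a gain-weighted sum of hypotheses.

    hypotheses: list of (alpha, reference, (vx, vy)) tuples, one per list used;
                reference is a 2-D list indexed as reference[i][j]
    """
    value = beta
    for alpha, reference, (vx, vy) in hypotheses:
        # displaced sample from the reference picture of this hypothesis
        value += alpha * reference[i + vx][j + vy]
    return value


# Hypothetical 4x4 reference pictures.
ref0 = [[10 * i + j for j in range(4)] for i in range(4)]
ref1 = [[100] * 4 for _ in range(4)]

# Single-list prediction with unit gain and motion vector (1, 0).
single = inter_predict(1, 1, [(1.0, ref0, (1, 0))])

# Bi-prediction: equal gains of 0.5, one hypothesis from each list.
bi = inter_predict(1, 1, [(0.5, ref0, (1, 0)), (0.5, ref1, (0, 0))])
```

Passing one hypothesis corresponds to single-list prediction, and passing two corresponds to bi-prediction, where the gains and offset play the role of the weighted prediction parameters.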
The multitude of the above described prediction modes presents the encoder designer with a difficult problem. The goal is to select a suitable mode such that the overall picture or picture sequence distortion is minimized. Coding mode decision in modern video codecs, such as the JM software, is done through rate-distortion (R-D) optimization. See, for example, A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression,” IEEE Signal Processing Magazine, pp. 23-50, November 1998 and G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression”, IEEE Signal Processing Magazine, vol. 15, no. 6, November 1998, pp. 74-90. Lagrangian minimization selects the coding strategy that minimizes a cost that is a weighted sum of the distortion D (e.g., the sum of squared errors) and the bit usage R, as shown in Eq. 5 below:
$$J = D + \lambda R \qquad \text{Eq. 5}$$
Term λ is the Lagrangian multiplier. This process can be performed on a picture, region, or block basis. In one case, one can encode a block with all possible intra and inter prediction modes and select the one with the minimum cost. One possible, and widely used, distortion metric, the sum of squared errors of the pixels of the predicted block, can be written as shown in Eq. 6 below:
$$\text{SSE} = \sum_{(j,i) \in b} \big( p(j,i,t) - \hat{p}(j,i,t) \big)^2 \qquad \text{Eq. 6}$$
Term b denotes the current block; hence pixels considered in this distortion metric are only those of the current block. However, this can create substantial perceptual problems.
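The Lagrangian mode decision of Eq. 5, with the SSE of Eq. 6 as the distortion, can be sketched as follows. The mode names, distortion values, and bit counts in the example are hypothetical numbers chosen only to show how the choice changes with λ.

```python
def sse(original, predicted):
    """Eq. 6: sum of squared errors over the pixels of one block."""
    return sum(
        (o - p) ** 2
        for row_o, row_p in zip(original, predicted)
        for o, p in zip(row_o, row_p)
    )


def best_mode(candidates, lam):
    """Return the mode minimizing J = D + lam * R (Eq. 5).

    candidates: dict mapping a mode name to a (distortion, rate_in_bits) pair
    """
    return min(candidates, key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])


# Hypothetical (distortion, bits) pairs for three candidate modes.
candidates = {
    "intra_dc": (400, 20),
    "inter_16x16": (250, 60),
    "skip": (900, 1),
}

low_lambda_choice = best_mode(candidates, lam=1)     # distortion dominates
mid_lambda_choice = best_mode(candidates, lam=5)
high_lambda_choice = best_mode(candidates, lam=100)  # rate dominates
```

A small λ favors the mode with the lowest distortion, while a large λ favors the cheapest mode in bits. Note that the cost is evaluated per block, which is exactly the limitation pointed out above.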
The prediction of a block with either a spatial/intra or temporal/inter mode is followed by transform and quantization of the prediction residual. For a codec based on the H.264/AVC video coding standard (see, for example, Advanced video coding for generic audiovisual services, T-REC-H.264, March 2003), the process of transform and quantization for the most basic case of a 4×4 pixel block is now described. Let X denote the original 4×4 pixel block that is to be transformed. The transformed 4×4 pixel block Y is written as $Y = H X H^T$, where H is the transform matrix and $H^T$ denotes the transpose of H. Next, the transformed block Y is quantized. Quantization involves multiplication of Y by a matrix, followed by a bit-wise shift of the multiplied coefficients by some factor. A forward 4×4 quantization matrix A(Q) and a shifting factor E(Q) are indexed by the quantization parameter (QP) value Q. In general, the lower the QP value, the finer the quantization, and hence the lower the induced distortion. Let $\otimes$ denote the operation between two matrices U and V such that for $Z = U \otimes V$, there is z(i,j)=u(i,j)×v(i,j), where small-letter terms denote elements of their respective matrices. The quantized coefficients $\tilde{Y}$ are obtained by the following procedure, as implemented in the Joint Model (JM) reference software (see, for example, JVT reference software version JM13.2) and shown in Eq. 7 below:
$$\tilde{Y} = \operatorname{sign}\{Y\} \left[ \left( |Y| \otimes A(Q) + F(Q) \times 2^{15+E(Q)} \right) \gg (15 + E(Q)) \right] \qquad \text{Eq. 7}$$
where F(Q) is a matrix of quantization rounding offsets. The bit-wise right shift operator “>>” is applied to every element of the 4×4 matrix. The resulting matrix $\tilde{Y}$ is then compressed using entropy coding and transmitted to the decoder, which has to apply the following process to obtain the reconstructed values $\hat{Y}$ of the block's transform coefficients, as shown in Eq. 8 below:
$$\hat{Y} = \left\{ \left[ \left( \tilde{Y} \otimes (B(Q) \ll 4) \right) \ll E(Q) \right] + 8 \right\} \gg 4 \qquad \text{Eq. 8}$$
The inverse 4×4 quantization matrix B(Q) and the shifting factor E(Q) are both indexed by the QP Q. The final reconstructed residual block $\hat{X}$ is given as shown in Eq. 9 below:
$$\hat{X} = \left( H_{inv} \hat{Y} H_{inv}^{T} + 32 \right) \gg 6 \qquad \text{Eq. 9}$$
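The forward transform step Y = H X H^T can be sketched as follows. The text above does not list the entries of H; the matrix used here is the 4×4 integer core transform commonly given for H.264/AVC and should be treated as an assumption of this sketch. Quantization (Eq. 7) is deliberately left out because it depends on the A(Q), E(Q), and F(Q) tables, which are not reproduced here.

```python
# Assumed 4x4 integer core transform matrix (not listed in the text above).
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def forward_transform(x):
    """Compute Y = H X H^T for a 4x4 block X."""
    return matmul(matmul(H, x), transpose(H))


# A flat residual block: every sample equals -17.
flat_residual = [[-17] * 4 for _ in range(4)]
coefficients = forward_transform(flat_residual)
```

For a flat block, all of the energy ends up in the top-left (DC) coefficient and every other coefficient is zero, so the reconstruction of such a block depends entirely on how that single coefficient is quantized.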
If the value of the QP Q used is low enough, then the reconstructed block $\hat{X}$ will be close to X. The reconstructed values also depend on the quantization offsets F(Q), which may bias the reconstruction toward a lower or greater value. Note, however, the prominent use of rounding. Rounding may introduce distortion since it involves sacrificing part of the signal fidelity. Rounding distortion propagation is one of the factors that may result in an increase in distortion even though Q is decreased. Rounding distortion also accumulates, as is evident from Eqs. 7, 8, and 9. Furthermore, rounding in H.264/AVC (see, for example, “Advanced video coding for generic audiovisual services” cited above) is biased in favor of positive values. This is illustrated in FIG. 7 and is an additional source of distortion that is also asymmetric in nature; i.e., the distortion resulting from coding −x is not necessarily equal to the distortion resulting from coding +x. To better understand quantization in a state-of-the-art video coding standard such as H.264/AVC, it can be simplified as shown in Eq. 10 below:
$$V(v,Q) = \left\lfloor \frac{q_n(Q) \times v + f(Q)}{q_d(Q)} \right\rfloor \qquad \text{Eq. 10}$$
Let v denote the value to be quantized and let V(v,Q) denote the quantized value for input value v and QP Q. The numerator $q_n(Q)$ and the denominator $q_d(Q)$ are primarily functions of the quantization parameter Q. The quantization offset f(Q) may also depend on the QP Q, among others. The reconstruction of v to $\hat{v}$ is given by the following inverse quantization process shown in Eq. 11 below:
$$\hat{v} = \left( V(v,Q) \times q_n^{-1}(Q) \times q_d^{-1}(Q) + 2^m \right) \gg (m+1) \qquad \text{Eq. 11}$$
The term m is larger than 1. Do note that the following expressions hold: $q_n^{-1}(Q) \neq (q_n(Q))^{-1}$ and $q_d^{-1}(Q) \neq (q_d(Q))^{-1}$.
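The simplified quantizer of Eq. 10 and the reconstruction of Eq. 11 can be sketched as follows. All parameter values (an effective step of 8, an offset of 2, and m = 3) are hypothetical and are not taken from the H.264/AVC tables; they are chosen only to make the asymmetry between positive and negative inputs visible.

```python
def quantize(v, qn, qd, f):
    """Eq. 10: floor((qn * v + f) / qd). Python's // is floor division."""
    return (qn * v + f) // qd


def reconstruct(level, qn_inv, qd_inv, m):
    """Eq. 11: scale the level, add the rounding term 2**m, shift right by m + 1."""
    return (level * qn_inv * qd_inv + (1 << m)) >> (m + 1)


# Hypothetical parameters: effective quantizer step of 8, offset of 2.
QN, QD, F = 1, 8, 2
QN_INV, QD_INV, M = 16, 8, 3  # 16 * 8 / 2**(M + 1) gives a reconstruction step of 8


def code(v):
    """Quantize and reconstruct one value with the hypothetical parameters."""
    return reconstruct(quantize(v, QN, QD, F), QN_INV, QD_INV, M)


error_pos = abs(5 - code(5))     # +5 maps to level 0
error_neg = abs(-5 - code(-5))   # -5 maps to level -1

# The rounding shift itself treats +x and -x differently:
shift_pos = (8 + (1 << M)) >> (M + 1)
shift_neg = (-8 + (1 << M)) >> (M + 1)
```

With these assumed values, coding +5 and coding −5 produce different distortions, and the rounding shift maps +8 and −8 to levels of different magnitude, which is consistent with the asymmetry described above.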
During quantization an important problem arises: given certain reconstruction values that are indexed by the transmitted quantization indices, what is the best index to transmit given the value of the current coefficient? See, for example, the problem depicted in FIG. 16. Given a set of reconstruction points Q(i) that are derived at the decoder by transmitting the index i, one may seek to find the index i that minimizes some distortion metric. If the optimization metric is the absolute difference or the squared difference for this coefficient, then, obviously, the optimal reconstruction index is index k+4. However, in a two-dimensional signal, such as a video signal, the reconstruction level of a coefficient/pixel affects the neighboring ones. Furthermore, the effect is more complex perceptually. There will be cases, for example, where transmitting index k+3 makes more sense, even if that seems counter-intuitive. There could also be situations where transmitting index k+2 is the optimal choice. A special case of this problem is whether to transmit the lower value (k+3) or the upper value (k+4).
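The index selection problem described above can be sketched as follows. The reconstruction points and the coefficient value are hypothetical, and the sketch covers only the single-coefficient criterion (absolute difference); it does not model the neighborhood or perceptual effects that may make a different index preferable.

```python
import bisect


def nearest_index(value, recon_points):
    """Index whose reconstruction point minimizes the absolute difference."""
    return min(range(len(recon_points)), key=lambda idx: abs(value - recon_points[idx]))


def bracketing_indices(value, recon_points):
    """Return (lower, upper) indices of the sorted points that bracket value.

    The indices are clamped to the valid range when value lies outside it.
    """
    upper = bisect.bisect_right(recon_points, value)
    lower = max(upper - 1, 0)
    upper = min(upper, len(recon_points) - 1)
    return lower, upper


recon_points = [0, 8, 16, 24, 32]  # hypothetical reconstruction points Q(i)
coefficient = 21

best = nearest_index(coefficient, recon_points)
lower, upper = bracketing_indices(coefficient, recon_points)
```

An encoder that considers only this coefficient would transmit `best`. An encoder that also weighs rate or the impact on neighboring samples could instead evaluate both `lower` and `upper` (or even further candidates) under a broader cost.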
Rate-distortion-optimized mode decision along with a naïve quantization strategy at the encoder can contribute to artifacts such as those depicted in FIGS. 8A and 8B. Assume that something as simple as a uniform frame where the luminance Y value is 111 and the U and V components are both set to 128 is to be compressed. The JM reference software is used and the QP is set to the value 32. The input and the reconstructed frames are depicted in FIG. 8A and FIG. 8B, respectively (contrast is stretched for printing purposes; distortion is quite perceivable on a computer display).
The top left block in FIG. 8B is coded with the DC intra prediction mode, and its prediction value is 128. The original value of 111 results in a residual of −17, which, after coding, reconstructs the block as 112. The encoder then codes the second block of the top row, whose original value is 111. Horizontal or DC prediction from the top left block yields a prediction value of 112, resulting in a residual of −1. After the residual is quantized, the block is reconstructed to a value of 110. The description may be similarly extended to the first two blocks of the second row. Consequently, reconstruction values of 112 and 110 alternate in a checkerboard pattern. In terms of SSE, all blocks have an SSE of 1×16×16, so the block-wise distortion is the same. Visually, however, the result is disappointing. The reasons for this failure are multiple:
(a) Certain block average (DC) values and QP combinations fail to exactly reconstruct the DC value.
(b) The residual is always transmitted in full without any regard for its consequences.
(c) Block-based mode decision in the JM software and most commercial implementations of H.264/AVC optimizes block R-D performance without consideration of the perceptual impact and the impact to the neighboring blocks.
While factors (b) and (c) are within the encoder's control and can be addressed, factor (a) largely depends on the transform and quantization design in H.264/AVC. Note that the mode decision could have completely disregarded the residual and still have resulted in the same distortion. Since this would reduce the number of residual bits needed for transmission, it would have resulted in better R-D performance. The example discussed above shows a case of serious perceptual distortion in the spatial domain that is a result of naïve application of mode decision and quantization. The discussion below illustrates the same problem in the temporal domain.
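The arithmetic of the flat-block example above can be checked with a short sketch. The pixel values (111, 110, 112) and the 16×16 block size come from the example; the residual bit count and the value of λ are hypothetical and serve only to show why discarding the residual gives a lower Lagrangian cost.

```python
def flat_block_sse(original, reconstructed, size=16):
    """SSE of a size x size block whose pixels all share one value."""
    return size * size * (original - reconstructed) ** 2


def lagrangian_cost(distortion, bits, lam):
    """J = D + lam * R."""
    return distortion + lam * bits


# Second block of the top row: predicted as 112, original value 111.
sse_with_residual = flat_block_sse(111, 110)     # residual coded, reconstructed as 110
sse_without_residual = flat_block_sse(111, 112)  # residual dropped, prediction kept

RESIDUAL_BITS = 12  # hypothetical cost of transmitting the residual
LAMBDA = 10.0       # hypothetical Lagrangian multiplier

cost_with = lagrangian_cost(sse_with_residual, RESIDUAL_BITS, LAMBDA)
cost_without = lagrangian_cost(sse_without_residual, 0, LAMBDA)
```

Both options have an SSE of 1×16×16 = 256, so any positive residual bit count makes the option without the residual cheaper, and it would also have avoided the alternation between 112 and 110.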
The distortion problem can also manifest in the temporal domain as depicted in FIG. 9. Let the input sequence be composed of pictures that are flat and have the luma Y value 111, so, as depicted in FIG. 9, each input picture has the same uniform luma value and appears to have the same shading. Now assume that there is access to some new advanced mode decision algorithm that can encode a flat picture by avoiding the problems of FIG. 8. Let the quantization parameter value be 32. Picture 0 is reconstructed with a uniform value of 110. The encoder then proceeds to code picture 1. It now has access to both intra and inter prediction modes. For some reason, the mode decision algorithm chooses to code the entire picture with inter 16×16 prediction from the previous picture using all-zero motion vectors. Thus, each block in picture 1 that has the original value 111 is predicted as 110. The prediction residual is 1 and is coded and reconstructed as 2, leading to a reconstructed value for the entire frame of 112. The rest of the pictures are coded in a similar manner, leading to the alternating luma values depicted in the output pictures of FIG. 9. This simple example shows that mode decision and quantization can create flickering in the temporal domain even if the problem is addressed in the spatial domain.
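The temporal flicker described above can be reproduced with a toy simulation. The residual quantizer below is hypothetical: it rounds the residual magnitude up to the next multiple of 2, which matches the statement that a residual of 1 is reconstructed as 2 and additionally assumes that a residual of −1 is reconstructed as −2. It does not model the actual H.264/AVC transform and quantization.

```python
def toy_reconstruct_residual(residual):
    """Hypothetical quantizer: round the magnitude up to the next multiple of 2."""
    sign = -1 if residual < 0 else 1
    return sign * 2 * ((abs(residual) + 1) // 2)


def code_flat_sequence(original, first_reconstruction, num_pictures):
    """Code flat pictures with zero-motion inter prediction from the previous picture.

    Returns the reconstructed luma value of each picture.
    """
    reconstructed = [first_reconstruction]
    for _ in range(1, num_pictures):
        prediction = reconstructed[-1]  # all-zero motion vectors
        residual = original - prediction
        reconstructed.append(prediction + toy_reconstruct_residual(residual))
    return reconstructed


output = code_flat_sequence(original=111, first_reconstruction=110, num_pictures=6)
print(output)
```

Under these assumptions the reconstructed luma alternates between 110 and 112 from picture to picture, even though every input picture has the value 111 and every picture has the same per-pixel squared error.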
Work has been done in the past towards addressing temporal flickering. In a paper by A. Leontaris, Y. Tonomura, and T. Nakachi, “Rate control for flicker artifact suppression in motion JPEG2000,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, vol. 2, pp. 41-44, the temporal flicker artifact was categorized into: (a) flickering due to small temporal variations in the input signal luma component that give rise to large differences in the de-quantized coefficients (see T. Kuge, “Wavelet picture coding and its several problems of the application to the interlace HDTV and the ultra-high definition images,” in Proc. IEEE International Conference on Image Processing, June 2002, vol. 3, pp. 217-220), and (b) flicker artifact due to uneven quantization among similar collocated blocks across subsequent frames. Type (a) artifact is content-dependent and is perceived as ripples mainly across edges. Type (b) artifact is perceivable in static areas (often the background) across consecutive frames; it is also augmented when intra-frame coders (JPEG2000 or H.264/AVC constrained to use only intra-coded pictures) are used. It was first suggested in T. Carney, Y. C. Chang, S. A. Klein, and D. G. Messerschmitt, “Effects of dynamic quantization noise on video quality,” in Proc. SPIE Human Vision and Electronic Imaging IV, May 1999, pp. 141-151 and T. Kato, S. Tanabe, H. Watanabe, and H. Tominaga, “On a relationship between flicker artifacts and quantization error for Motion JPEG2000,” in Proc. FIT2003, J-039, September 2003 that temporal flickering is a result of temporal changes in the quantization. In particular, for the case of JPEG2000, it is a result of the post-compression quantization. In A. Becker, W. Chan, and D. Poulouin, “Flicker reduction in intraframe codecs,” in Proc. IEEE Data Compression Conference, March 2004, pp. 252-261, it was noted that temporal flickering can result if temporally collocated JPEG2000 code-blocks, corresponding to almost identical content in subsequent frames, are inconsistently quantized. Past work has thus identified inconsistent or poorly done quantization across subsequent frames as a source of the temporal flickering artifact.
The approach presented in X. Fan, W. Gao, Y. Lu, and D. Zhao, “Flicking reduction in all intra frame coding,” Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, JVT-E070, October 2002 addresses flickering due to quantization in H.264/AVC-based codecs when only intra-coding is used. It also proposed a simple but intuitive metric for measuring the flickering artifact. The discussion below uses the notation introduced so far. A tilde denotes compressed and reconstructed values. The following simple notation is defined for a sum of squared differences over all pixels (j,i) of a block b in a frame F, as shown in Eq. 12 below:
$$\text{SSD}(\alpha,\beta,b) = \sum_{(j,i) \in b} \big( \alpha(j,i) - \beta(j,i) \big)^2 \qquad \text{Eq. 12}$$
The metric was defined as shown in Eq. 13 below:
$$\mathrm{f\text{-}norm} = \sum_{(t,b),\; \mathrm{SSD}(p(t),\, p(t+1),\, b) < \varepsilon} \mathrm{SSD}\bigl(\tilde{p}(t+1) - \tilde{p}(t),\; p(t+1) - p(t)\bigr) \qquad \text{Eq. 13}$$
Only blocks for which the sum of squared differences between the original pixel values in the current frame and those in the next frame is below the threshold ε are considered in the metric calculation. For those blocks, the metric accumulates the squared difference between the temporal difference of the reconstructed pixel values and the temporal difference of the original pixel values. It was shown in A. Leontaris, Y. Tonomura, T. Nakachi, and P. C. Cosman, “Flicker Suppression in JPEG2000 using Segmentation-based Adjustment of Block Truncation Lengths,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, Apr. 15-20, 2007, that the above metric performs well. The work in X. Fan, et al., “Flicking reduction in all intra frame coding” (cited above), showed using the above metric that temporal flicker increases as a result of the spatial intra prediction in H.264/AVC. The solution proposed in X. Fan, et al., “Flicking reduction in all intra frame coding,” uses a quantized version of the original prediction to perform spatial intra prediction.
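The metric of Eq. 12 and Eq. 13 may be easier to follow as a short computation. The following Python sketch is illustrative only and is not taken from the cited references: the data layout (a sequence as a list of frames, each frame as a list of blocks, each block as a two-dimensional list of pixel values) and the function names `ssd`, `temporal_diff`, and `flicker_metric` are assumptions made for this example.

```python
def ssd(alpha, beta):
    """Eq. 12: sum of squared differences over all pixels of one block."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(alpha, beta)
               for a, b in zip(row_a, row_b))


def temporal_diff(next_block, cur_block):
    """Pixel-wise difference between collocated blocks in frames t+1 and t."""
    return [[n - c for n, c in zip(row_n, row_c)]
            for row_n, row_c in zip(next_block, cur_block)]


def flicker_metric(orig, recon, eps):
    """Eq. 13, under the assumed layout orig[t][b] / recon[t][b].

    orig  -- original frames, each a list of blocks (2D lists)
    recon -- compressed and reconstructed frames, same layout
    eps   -- threshold on the SSD between original collocated blocks
    """
    total = 0
    for t in range(len(orig) - 1):
        for b in range(len(orig[t])):
            # Only blocks whose original content barely changes are counted.
            if ssd(orig[t][b], orig[t + 1][b]) < eps:
                d_recon = temporal_diff(recon[t + 1][b], recon[t][b])
                d_orig = temporal_diff(orig[t + 1][b], orig[t][b])
                total += ssd(d_recon, d_orig)
    return total
```

In this sketch, a block whose original content is unchanged between two frames contributes exactly the energy of the temporal change introduced by compression, while a block that changes substantially in the original is excluded by the threshold test.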
A method proposed in S. Sakaida, K. Iguchi, S. Gohshi, and Y. Fujita, “Adaptive quantization control for reducing flicker of AVC/H.264 intra frames,” in Proc. Picture Coding Symposium, December 2004, optimized coding mode decisions by minimizing a Lagrangian cost that took into account the flickering distortion $D_F$. The cost can be written as shown in Eq. 14:
$$J = D + \mu \times D_F + \lambda \times R \qquad \text{Eq. 14}$$
The flickering distortion was calculated similarly to X. Fan, et al., “Flicking reduction in all intra frame coding”. Apart from optimizing the selection of the coding mode, S. Sakaida, K. Iguchi, S. Gohshi, and Y. Fujita in “Adaptive quantization control for reducing flicker of AVC/H.264 intra frames” also proposed a selection scheme for the quantization parameter. After the reconstruction of the current macroblock, the sum of absolute differences between the previous intra block and the current reconstructed intra block is calculated. If it is less than a threshold value, the coding of this macroblock is complete. Otherwise, the quantization parameter is decreased (the quantization becomes finer) and the above process is repeated.
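The two steps described above, mode selection with the cost of Eq. 14 and iterative refinement of the quantization parameter, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of the cited paper: `encode_block` is a hypothetical callback that returns the reconstructed block for a given quantization parameter, and the step size of 1 and the lower bound `qp_min` are assumptions, since the description above does not specify them.

```python
def flicker_aware_cost(distortion, flicker_distortion, rate, mu, lam):
    """Eq. 14: J = D + mu * D_F + lambda * R."""
    return distortion + mu * flicker_distortion + lam * rate


def choose_mode(candidates, mu, lam):
    """Return the name of the candidate mode with the lowest cost J.

    candidates -- iterable of (mode_name, D, D_F, R) tuples
    """
    best = min(candidates,
               key=lambda c: flicker_aware_cost(c[1], c[2], c[3], mu, lam))
    return best[0]


def sad(a, b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def refine_qp(encode_block, prev_intra_block, qp_start, threshold, qp_min=0):
    """Lower the quantization parameter until the reconstruction is close
    enough to the previous intra block.

    encode_block -- hypothetical callback: qp -> reconstructed block
    qp_min       -- assumed lower bound that guarantees termination
    """
    qp = qp_start
    while True:
        recon = encode_block(qp)
        if sad(prev_intra_block, recon) < threshold or qp <= qp_min:
            return qp, recon
        qp -= 1  # finer quantization; a step of 1 is an assumption
```

With `mu` set to zero, `choose_mode` reduces to a conventional rate-distortion decision; increasing `mu` shifts the choice toward modes with lower flickering distortion, possibly at the expense of distortion or rate.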
An approach that uses post-processing to blend decoded frames with motion-compensated versions of previously decoded frames is discussed in Y. Kuszpet, D. Kletsel, Y. Moshe, and A. Levy, “Post-Processing for Flicker Reduction in H.264/AVC,” Proc. of the 16th Picture Coding Symposium (PCS 2007), Lisbon, Portugal, November 2007. This approach is mainly targeted at reducing flicker due to periodic intra-coded frames. Another approach with the same objective but a different methodology is presented in K. Chono, Y. Senda, and Y. Miyamoto, “Detented Quantization to Suppress Flicker Artifacts in Periodically Inserted Intra-Coded Pictures in H.264 Video Coding,” in Proc. IEEE International Conference on Image Processing, October 2006, pp. 1713-1716. The method is called “detented” quantization and comprises first deriving a motion-compensated (inter-coded) version of the frame that will be coded as intra, obtaining quantization levels for each reconstructed value of the inter-coded frame, and then using those levels to constrain the quantized values of the intra-predicted residuals so that the error patterns stay consistent.
The methods and systems discussed above address spatial and/or temporal flickering, but they provide only limited improvements in the quality of compressed video and/or incur high processing overhead to achieve the desired improvements. Therefore, there exists a need in the art to address the problem of spatial and temporal flickering that results from coding mode and quantization decisions at the encoder, so as to improve the spatial and temporal perceptual quality of compressed video.