The subject of the present invention is a method for evaluating the degradation of a video image introduced by a coding and/or storage and/or digital transmission system, particularly a system employing digital video signal transmission and/or storage and/or coding with a low throughput.
The degradation is generally due to errors introduced by throughput-reduction algorithms and/or by defective transmission links, or even to defects in the coders and decoders.
The digitization of video signals has made it possible to copy, store or transmit video information while maintaining constant quality.
However, the large amount of information transported by the video images in practice requires the use of digital compression methods in order to reduce the binary throughput.
One compression method which is very widespread in the video field is described in the ISO/IEC 13818 (MPEG-2) standard. This algorithm is said to be “lossy”, since the image reproduced after decoding is not identical to the original. In order to maintain a quality acceptable to the final viewer, the throughput-reduction algorithms take account of the perceptual properties of the human visual system. Despite this, the constraints imposed on throughput or on the available transmission bandwidth, together with the content of the video signal, cause characteristic degradations to appear in the signal after decoding. This degradation, introduced by the system (for example MPEG-2), has a direct influence on the perceived quality of the final image.
Automatic evaluation of the quality of audio-visual signals has a wide range of applications in the digital television system: production, distribution and evaluation of the performance of the systems.
Furthermore, the existing devices were designed for laboratory tests and are not suitable for remote monitoring of distribution networks.
Sequences of MPEG-coded images transmitted at a low data rate for digital television broadcasting or for other multimedia applications will exhibit a certain number of defects or deformations with respect to the original sequence. A non-exhaustive list of visible degradations can be drawn up; the most perceptible are granular errors, deformation of contours, losses of information, “exotic” contours, block effects, etc. However, small-scale transmission errors may appear as more or less localized effects in the image. In the event of a significant disturbance, they may appear as difficulties in accessing the information, for example breaks in service or images frozen for a longer or shorter time depending on the disturbance. The extent of the errors depends on the relevance and structure of the data they affect: synchronization words, motion vectors, images coded with or without prediction, or reference images for the predictions. In addition to breaks or frozen images, the degradation observed appears as erroneous or incorrectly positioned blocks or macroblocks. The degradation then propagates through the video sequence up to the next image coded without prediction, that is, coded independently of the others.
One evaluation method has already been proposed by the NTIA (National Telecommunications and Information Administration) in the article by A. A. WEBSTER and Colleagues, entitled “An objective video quality assessment system based on human perception”, published in June 1993 in SPIE, vol. 13, pp. 15-26.
This method analyzes the degraded images and the original images after filtering them with the two SOBEL operators, vertical and horizontal (3×3 matrices). The filtered image is obtained by convolution, sliding the SOBEL matrices horizontally and vertically across the image; the results represent the vertical and horizontal gradients of the image. In other words, the filtered image highlights the vertical and horizontal contours contained in the initial (unfiltered) image.
A measurement based on this information makes it possible to highlight the change in content between the input to the video system and its output.
The method proposed by the NTIA employs two parameters:
on the one hand, the spatial information SI, which represents the standard deviation measured on the pixels of the image filtered by the SOBEL operator. The aim is to determine the standard deviation at the contours of the filtered image, since contours are important for viewing and are affected by the various processing operations of throughput-reduction digital systems;
on the other hand, the time-based information TI which represents the standard deviation of the difference image between two successive images, this standard deviation being calculated on the basis of the differences between the values of the same pixels of two successive frames. The parameter TI may reveal a jerky movement due to a defect in the coder.
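The two NTIA parameters can be sketched in plain Python as follows. This is a minimal illustration, not the NTIA reference procedure: it assumes luminance frames held as lists of rows, takes the Sobel gradient magnitude sqrt(gx² + gy²) as the filtered image, skips border pixels rather than padding them, and uses the population standard deviation. All function names are hypothetical.

```python
import math

# 3x3 SOBEL kernels for the horizontal (gx) and vertical (gy) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def std(values):
    """Population standard deviation of a flat list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sobel_magnitude(img):
    """Gradient magnitude at every interior pixel (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frame):
    """SI: standard deviation of the Sobel-filtered frame."""
    return std(sobel_magnitude(frame))


def temporal_information(prev_frame, frame):
    """TI: standard deviation of the pixel-wise difference of two frames."""
    diffs = [c - p for row_p, row_c in zip(prev_frame, frame)
             for p, c in zip(row_p, row_c)]
    return std(diffs)
```

For a frame containing a single vertical edge, such as three rows of `[0, 0, 0, 10, 10]`, SI is positive, while two identical frames give a TI of zero.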
The method proposed by the NTIA employs a comparative calculation of the parameters SI and TI over the digital video signal, between an input image and an output image of a system.
This method exhibits a certain number of drawbacks.
The SOBEL filter retains only certain frequencies of the image, namely those that reveal the loss of contours; consequently, a loss of definition can be detected only if it lies within the retained frequency range. In other words, the loss of definition is only partially taken into account.
Moreover, the parameter SI is affected by defects whose effects tend to cancel out. The loss of information from the image tends to decrease SI, whereas false contours and block effects tend, in contrast, to increase it, so SI is meaningful only if one or the other phenomenon is dominant.
Finally, calculating the SI and TI parameters as a standard deviation over the entire image drastically reduces the impact of localized degradations on these parameters.
The subject of the present invention is a method which allows the abovementioned defects to be at least partially remedied.
The method according to the present invention makes use of a block transform, for example the discrete cosine transform used particularly in the MPEG standard, in order to highlight the characteristic signatures of the defects identified. This original approach makes it possible not only to make fine measurements on the errors introduced, but also makes it possible to take account of the initial content of the video signal and of the algorithms employed in MPEG.
The block transformations of an image (Fourier transform, discrete cosine transform DCT, etc.) are obtained via the operation:
$$[F_{n,m}] = [T]\,[f_{n,m}]\,[T]^{t}$$
in which f(x,y) designates the image block to be transformed and T(x,y) the matrix of the transformation. Another block transform can be obtained from the wavelet transform of the image, by reorganizing the wavelet coefficients so as to obtain transformed blocks of the desired size, in particular of the same size as the blocks obtained by the abovementioned methods. Such a reorganization method is set out in the article by R. de QUEIROZ and Colleagues, entitled “Wavelet transforms in a JPEG-like Image Coder”, published in April 1997 in IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 2, pp. 419-424.
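The operation [F] = [T][f][T]^t can be sketched in plain Python. The text leaves T generic; this sketch assumes T is the orthonormal 8×8 DCT-II matrix, and the helper names are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix T: row u holds the u-th cosine basis vector."""
    t = []
    for u in range(n):
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        t.append([c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                  for x in range(n)])
    return t


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def block_transform(block, t=None):
    """Separable block transform [F] = [T].[f].[T]^t (DCT by default)."""
    if t is None:
        t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))
```

A uniform block of value 10 yields a single non-zero coefficient F(0,0) = 80, and because T is orthonormal the total energy of the block is preserved by the transform.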
The basic idea of the invention is, in particular, to carry out calculations on the blocks according to which the transmitted image was coded, in such a way as to generate a meaningful parameter exempt from the block effect.
The invention thus relates to a method of evaluating the degradation of a video image coded by blocks of image points or pixels, this degradation being generated by a coding and/or storage and/or transmission system producing an output image on the basis of an input image, characterized in that it includes the following stages:
a) selecting an input image and determining a spatial activity (SA) of the input image in an analysis window representing at least one part of the image which exhibits a set of said blocks of image points or pixels, this determination implementing the following sub-stages:
i) determining, for each block (n,m) of the said set of blocks of pixels, the transformed coefficients Fn,m (i,j) via a block transform according to the said blocks
ii) determining, on the basis of the transformed coefficients Fn,m (i,j), the spatial activity bsa of each block of the said set of blocks of pixels
iii) determining, on the basis of the spatial activity bsa of each block, the overall spatial activity SA1 of the set of blocks of pixels constituting the analysis window
b) selecting the output image corresponding to the input image and determining the said overall spatial activity SA2 of the output image in the said analysis window, by implementing sub-stages a i) to a iii)
c) comparing the overall spatial activity (SA2) of the output image in the analysis window and the overall spatial activity (SA1) of the input image in the analysis window.
The spatial activity bsa of a block (n,m) can be obtained by the following formula:
$$bsa_{n,m} = \left(\sum_{i=0}^{7}\sum_{j=0}^{7} F_{n,m}^{2}(i,j)\right) - F_{n,m}^{2}(0,0)$$
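A minimal sketch of bsa for one 8×8 block follows, assuming an orthonormal DCT-II (the function names are hypothetical). The DCT is computed directly as T f T^t.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, F = T f T^t."""
    n = len(block)
    t = [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
          * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
         for u in range(n)]
    # tf = T f
    tf = [[sum(t[u][x] * block[x][y] for x in range(n)) for y in range(n)]
          for u in range(n)]
    # F = (T f) T^t
    return [[sum(tf[u][y] * t[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def block_spatial_activity(block):
    """bsa: energy of all DCT coefficients minus the DC term F(0,0)^2."""
    f = dct2(block)
    total = sum(c * c for row in f for c in row)
    return total - f[0][0] ** 2
```

With this orthonormal scaling the transform preserves energy, so bsa equals the sum of the squared deviations of the 64 pixels from the block mean. A uniform block therefore has zero activity.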
As indicated above, the spatial activity bsa can be obtained by summing the squares of nearly all the DCT components of the block (all except the DC component F(0,0)). Any other function combining the components of the transform used can be applied to characterize the content of the image. A more general function is:
$$bsa_{n,m} = \left(\sum_{i=0}^{7}\sum_{j=0}^{7} \left(k(i,j)\cdot F_{n,m}(i,j)\right)^{p}\right) - \left(k(0,0)\cdot F_{n,m}(0,0)\right)^{p}$$
in which k(i,j) is a constant coefficient for weighting the component i,j used, and p is a constant.
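The general weighted form can be sketched as a function of already-computed transform coefficients. This is an illustration only: the function names are hypothetical, the formula is applied literally (for odd p, negative coefficients give negative terms), and the flat quantization matrix of 16s used in the test is an illustrative assumption rather than a matrix taken from the text.

```python
def weighted_block_activity(coeffs, k, p):
    """General bsa: sum of (k(i,j).F(i,j))^p over the block minus the DC term.

    coeffs and k are square lists of lists of the same size; p is a constant.
    """
    n = len(coeffs)
    total = sum((k[i][j] * coeffs[i][j]) ** p
                for i in range(n) for j in range(n))
    return total - (k[0][0] * coeffs[0][0]) ** p


def inverse_weights(q):
    """k(i,j) = 1/Q(i,j), one possible perceptual weighting."""
    return [[1.0 / v for v in row] for row in q]
```

With k(i,j) = 1 and p = 2 the function reduces to the basic bsa formula; with k = 1/Q each coefficient is expressed in units of its quantization step before being combined.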
There are several options for choosing the constants k(i,j), depending on the intended application: k(i,j) is chosen according to the characteristic to be emphasized, either taking account of the human visual system or extracting part of the relevant information. The following cases are proposed:
1) The first method is used when good correlation with human perception is the priority.
The values of the constants k(i,j) are initialized as a function of the relative importance of the transform coefficients for the human eye, particularly its frequency sensitivity, so as to supply an activity parameter representative of what is perceived. For example, in the case of the DCT, k(i,j) = 1/Q(i,j) is taken, where the Q(i,j) are the components of the quantization matrix used for the throughput reduction, set out in the following document extracted from the MPEG-2 standard: ISO/IEC CD 13818-2: “Information technology - Generic coding of moving pictures and associated audio information - Part 2: Video”, December 1993, p. 45, § 6.3.7.
2) The second method is used when the coefficients affected by compression (for example the DCT coefficients) can be identified. The constants k(i,j) are chosen so as to eliminate certain coefficients of the transform used: the weighting k(i,j) retains the coefficients that are most affected by, or most sensitive to, a given degradation. This amounts to a binary matrix allocating zero to the coefficients to be eliminated and one to the relevant coefficients. The selection is based either (a) on the position of the coefficient in the matrix, for example the DCT matrix, or (b) on its average amplitude:
a) the coefficients corresponding to the high spatial frequencies are often the most affected by compression. An example of a weighting matrix based on the position of the DCT coefficient is given in the table below:
b) certain low-amplitude coefficients are brought down to zero during the compression stage.
In order to choose these coefficients, a weighted average of each coefficient over the analyzed image region (of size M·N blocks) is formed:
$$avCoef(i,j) = \sum_{m=1}^{M}\sum_{n=1}^{N}\left|\frac{F_{n,m}(i,j)}{Q(i,j)}\right|,$$
where Q(i,j) is defined as above, in 1).
The coefficients whose averages are among the 48 lowest values are retained: for these, k(i,j) = 1, and for the others k(i,j) = 0.
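The selection of the binary weighting by average amplitude can be sketched as follows. Assumptions, not stated in the text: the analyzed blocks are supplied as a flat list of coefficient matrices, the DC position is ranked like any other, and ties are broken in raster order. The function names are hypothetical.

```python
def average_coefficients(blocks, q):
    """avCoef(i,j): sum over the analyzed blocks of |F(i,j) / Q(i,j)|."""
    n = len(q)
    return [[sum(abs(b[i][j] / q[i][j]) for b in blocks) for j in range(n)]
            for i in range(n)]


def selection_mask(blocks, q, keep=48):
    """Binary k(i,j): 1 for the `keep` positions with the lowest avCoef."""
    av = average_coefficients(blocks, q)
    n = len(q)
    # sorted() is stable, so equal averages keep their raster order.
    positions = sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda ij: av[ij[0]][ij[1]])
    k = [[0] * n for _ in range(n)]
    for i, j in positions[:keep]:
        k[i][j] = 1
    return k
```

If the coefficient magnitudes grow in raster order, the mask keeps exactly the first 48 positions, that is, the first six rows of the 8×8 matrix.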
In the case of the “spatial activity” parameter, k(i,j) = 1 and p = 2 are set for the rest of the document relating to the description of the figures.
The overall spatial activity SA of the set of blocks can then be determined by the following formula:
$$SA = \overline{bsa_{n,m}} = \frac{1}{H\times W}\sum_{n=0}^{H-1}\sum_{m=0}^{W-1} bsa_{n,m}$$
H×W representing the number of blocks of pixels in the analysis window.
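The overall spatial activity of an analysis window can be sketched as the mean bsa over its 8×8 blocks. This assumes an orthonormal DCT, k(i,j) = 1 and p = 2, and a window that, by default, covers as many whole blocks as fit in the image; the function names are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, F = T f T^t."""
    n = len(block)
    t = [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
          * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
         for u in range(n)]
    tf = [[sum(t[u][x] * block[x][y] for x in range(n)) for y in range(n)]
          for u in range(n)]
    return [[sum(tf[u][y] * t[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def bsa(block):
    """Block spatial activity: non-DC energy of the block's DCT."""
    f = dct2(block)
    return sum(c * c for row in f for c in row) - f[0][0] ** 2


def overall_spatial_activity(img, top=0, left=0, h_blocks=None,
                             w_blocks=None, size=8):
    """SA: mean bsa over the H x W blocks of a window starting at (top, left)."""
    if h_blocks is None:
        h_blocks = (len(img) - top) // size
    if w_blocks is None:
        w_blocks = (len(img[0]) - left) // size
    total = 0.0
    for n in range(h_blocks):
        for m in range(w_blocks):
            y0, x0 = top + n * size, left + m * size
            block = [row[x0:x0 + size] for row in img[y0:y0 + size]]
            total += bsa(block)
    return total / (h_blocks * w_blocks)
```

For an 8×16 image whose left block is uniform and whose right block is a 0/2 checkerboard, the block activities are 0 and 64, so SA = 32.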
The said comparison (stage c) is advantageously carried out with the aid of the parameter LR defined in the following way:
LR = gj[fi(SA1, SA2)] with i, j ∈ {1, 2}
with f1(x,y) = x − y or f2(x,y) = x/y
and g1(x) = 100·|x| or g2(x) = 100·|log(|x|)|
and, for example: LR = log10(SA1/SA2).
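The family of comparison functions can be sketched as follows. The text does not fix the base of the logarithm in g2; base 10 is assumed here to match the example. The dictionary and function names are hypothetical.

```python
import math

# Comparison functions f_i and scaling functions g_j (log base 10 assumed).
F_FUNCS = {1: lambda x, y: x - y, 2: lambda x, y: x / y}
G_FUNCS = {1: lambda x: 100.0 * abs(x),
           2: lambda x: 100.0 * abs(math.log10(abs(x)))}


def lr(sa_in, sa_out, i=2, j=2):
    """LR = g_j[f_i(SA1, SA2)] for input activity SA1 and output activity SA2."""
    return G_FUNCS[j](F_FUNCS[i](sa_in, sa_out))
```

The default choice (f2, g2) gives 0 when the input and output activities are equal and 100 when they differ by a factor of ten. Note that combining f1 with g2 is undefined when the two activities are equal, since the logarithm of zero diverges.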
The method may then be characterized in that it performs the calculation of the parameter LR over a sequence of M input images and of M corresponding output images, and in that it performs the calculation of an evaluation parameter or quality score MLR defined in the following way:
$$MLR = \max_{m} LR(m)$$
the maximum being taken over the M pairs of images of the sequence.
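The sequence-level score can be sketched as the worst frame-level LR. This sketch assumes the form LR = 100·|log10(SA1/SA2)|, one of the combinations allowed above; the function names are hypothetical.

```python
import math


def lr(sa_in, sa_out):
    """Frame-level LR, assumed here to be 100.|log10(SA1/SA2)|."""
    return 100.0 * abs(math.log10(sa_in / sa_out))


def mlr(sa_inputs, sa_outputs):
    """MLR: largest LR over the M input/output image pairs of a sequence."""
    if len(sa_inputs) != len(sa_outputs):
        raise ValueError("input and output sequences must have the same length M")
    return max(lr(a, b) for a, b in zip(sa_inputs, sa_outputs))
```

Taking the maximum rather than the mean means a single badly degraded image determines the score, which keeps short, localized-in-time degradations visible.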
The method may advantageously be characterized in that it includes the determination of a time-based activity indicator TA for a group of M images, determined in the following way:
$$TA = \sum_{u=1}^{M-1} F_{SA}^{2}(u)$$
F_SA(u), for u varying from 0 to M−1, designating the M coefficients of a block transform, for example a discrete cosine transform, applied to M time-based samples of the spatial activity SA.
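The time-based indicator can be sketched with a one-dimensional DCT over the M successive SA values, assuming an orthonormal DCT-II; the function names are hypothetical.

```python
import math


def dct1(samples):
    """Orthonormal 1-D DCT-II of M samples."""
    m = len(samples)
    out = []
    for u in range(m):
        c = math.sqrt(1.0 / m) if u == 0 else math.sqrt(2.0 / m)
        out.append(c * sum(s * math.cos((2 * t + 1) * u * math.pi / (2 * m))
                           for t, s in enumerate(samples)))
    return out


def temporal_activity(sa_series):
    """TA: energy of the non-DC DCT coefficients of M successive SA values."""
    coeffs = dct1(sa_series)
    return sum(c * c for c in coeffs[1:])
```

Skipping u = 0 discards the mean level of SA, so TA measures only how much SA varies over the group; with this orthonormal scaling it equals the sum of squared deviations of SA from its mean, and a constant series gives TA = 0.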
The method may be characterized in that it includes the following stages, with a view to determining the block effect:
d) determining the spatial activity SAd1 of the input image in an offset analysis window exhibiting blocks of image points or pixels which are offset by at least one pixel in the direction of the lines of the image and/or in the direction perpendicular to the lines of the image with respect to the said blocks of pixels of the said analysis window, this determination implementing the following sub-stages:
i) applying the said block transform to each offset block (n,m) of the said set of blocks of pixels of the offset analysis window in order to determine the transformed coefficients Fdn,m (i,j),
ii) determining, on the basis of the transformed coefficients Fdn,m (i,j) of the offset blocks, the spatial activity bsad of each block of the said set of offset blocks,
iii) determining the overall spatial activity SAd1 of the set of offset blocks constituting the offset analysis window,
e) determining the overall spatial activity SAd2 of the output image in the said offset analysis window by implementing sub-stages d i) to d iii) for the output image;
f) comparing, on the one hand, the overall spatial activity SAd2 of the output image in the offset analysis window and the overall spatial activity SA2 of the output image in the analysis window, in order to evaluate the block effect in the output image and, on the other hand, the spatial activity SAd1 of the input image in the offset analysis window and the spatial activity SA1 of the input image in the analysis window in order to evaluate the block effect in the input image.
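Stages d) to f) can be sketched by computing the activity on the coding grid and on a shifted grid with the same number of blocks. The text only requires an offset of at least one pixel; a diagonal offset of 4 pixels (half a block) is an assumption made here, as are the orthonormal DCT and the function names.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, F = T f T^t."""
    n = len(block)
    t = [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
          * math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
         for u in range(n)]
    tf = [[sum(t[u][x] * block[x][y] for x in range(n)) for y in range(n)]
          for u in range(n)]
    return [[sum(tf[u][y] * t[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def bsa(block):
    """Block spatial activity: non-DC energy of the block's DCT."""
    f = dct2(block)
    return sum(c * c for row in f for c in row) - f[0][0] ** 2


def window_activity(img, top, left, h_blocks, w_blocks, size=8):
    """Mean bsa over h_blocks x w_blocks blocks whose grid starts at (top, left)."""
    total = 0.0
    for n in range(h_blocks):
        for m in range(w_blocks):
            y0, x0 = top + n * size, left + m * size
            block = [row[x0:x0 + size] for row in img[y0:y0 + size]]
            total += bsa(block)
    return total / (h_blocks * w_blocks)


def aligned_and_offset_activity(img, h_blocks, w_blocks, offset=4, size=8):
    """Return (SA, SAd): coding-grid activity and diagonally shifted-grid activity."""
    if (offset + h_blocks * size > len(img)
            or offset + w_blocks * size > len(img[0])):
        raise ValueError("offset window does not fit inside the image")
    sa = window_activity(img, 0, 0, h_blocks, w_blocks, size)
    sad = window_activity(img, offset, offset, h_blocks, w_blocks, size)
    return sa, sad
```

On a purely "blocky" image made of uniform 8×8 blocks, the aligned activity is zero because each coding block is flat, whereas every offset block straddles four block boundaries and has non-zero activity. This contrast is what the comparison in stage f exploits.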
The block effect is characterized in two different ways:
intrinsically, that is, via a function representative of the content of the image, and of that image alone:
BM2 = gj[fi(SAd2, SA2)]
in a differential way, that is, by comparing two values of a function representative of the content of the image: the first calculated on the reference image (input image) and the second on the degraded image (output image):
BM = gk[fj(fi(SAd2, SA2), fi(SAd1, SA1))], with i, j, k ∈ {1, 2}
where f1(x,y) = x − y or f2(x,y) = x/y
and g1(x) = 100·|x| or g2(x) = 100·|log(|x|)|
In particular, the formula
$$BM_2 = 100\cdot\frac{SA_{d2}(t)}{SA_{2}(t)}$$
is used.
Stage f advantageously implements the determination of a block-effect indicator BM according to the following formula:
$$BM = \frac{SA_{d2}/SA_{2}}{SA_{d1}/SA_{1}}\times 100$$
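Both block-effect indicators reduce to simple ratios of the four window activities. A minimal sketch, with hypothetical function names, taking the intrinsic form as 100·SAd2/SA2 on the output image:

```python
def block_effect_intrinsic(sa_d2, sa_2):
    """BM2 = 100 . SAd2 / SA2, computed on the output image alone."""
    return 100.0 * sa_d2 / sa_2


def block_effect_differential(sa_d1, sa_1, sa_d2, sa_2):
    """BM = (SAd2/SA2) / (SAd1/SA1) x 100: output ratio relative to input ratio."""
    return (sa_d2 / sa_2) / (sa_d1 / sa_1) * 100.0
```

A differential value of 100 means the offset-to-aligned activity ratio is the same in the input and output images. A value above 100 suggests that the system has added structure along the coding-block boundaries, since such boundaries fall inside the offset blocks but between the aligned ones.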