The ability to digitize audio and video signals makes it possible to copy, store, or transmit this type of information while maintaining constant quality. However, the large quantity of information conveyed by audiovisual signals makes it necessary in practice to use digital compression methods to reduce the bit rate.
The Moving Picture Experts Group standard MPEG2 describes techniques for reducing bit rate that are said to be lossy, since the signals played back after decoding are no longer identical to the original signals. To maintain acceptable quality for the final viewer, bit-rate reduction algorithms take advantage of the perceptual properties of the human eye and ear. Nevertheless, depending on signal content and on the constraints imposed on the bit rate or bandwidth available for transmission, characteristic degradations appear in the signal after decoding. Such degradations, introduced by the overall MPEG2 encoding and transmission chain, have a direct influence on the quality finally perceived.
Automatic evaluation of the quality of audiovisual signals has a wide range of applications in digital television: production, distribution, and performance evaluation of systems.
Unfortunately, existing apparatuses are designed for laboratory tests and are unsuited for remote surveillance of distribution networks.
There are two different ways of assessing the degradations that affect picture and sound quality during bit-rate-reducing encoding or during transmission. Firstly, subjective tests conducted under precisely defined conditions give reproducible results, but such tests are lengthy and expensive to perform. Secondly, automatic systems that evaluate quality from objective measurements can, for example, assist the development of encoding algorithms and the comparison of different algorithms. Such systems make it possible to test digital systems either on a spot basis or continuously. To obtain objective measurements that correlate significantly with subjective scores, the properties of the human visual system must first be taken into account.
The notion of quality is essentially relative. Even a viewer under ordinary viewing conditions (at home) judges the quality of the signals presented by comparison with a reference, which in this case is formed by the viewer's expectations or habits. Similarly, a method of objectively evaluating quality analyzes the degradations that a system introduces into signals by taking account of the reference signals present at the input to the system. The study of objective metrics therefore requires analysis both of the defects introduced into the signals and of the human perceptual system and its properties. The various approaches are based either on computing an error signal or on identifying signatures specific to the artifacts introduced by the audiovisual system. Applying perception models makes it possible to evaluate how significant the degradations are for the human perceptual system (HPS).
Subjective tests consist of submitting audiovisual signals to a panel of observers representative of the population. A set of satisfaction tests is performed under controlled viewing and listening conditions. The signals are presented to the observers according to a predefined protocol so that they can rate the final quality on a predefined grading scale. Quality scores are obtained after presenting audio, video, or combined audio and video sequences. Statistical processing then refines the individual scores by filtering and homogenizing them. Various subjective test methodologies have been standardized, in particular in International Telecommunication Union Recommendation ITU-R BT.500, entitled “Method for the subjective assessment of the quality of television pictures”. Two such methodologies using a continuous scoring scale are:
DSCQS: “double stimulus continuous quality scale” protocol; and
SSCQE: “single stimulus continuous quality evaluation” protocol.
The first method gives a score for a 10-second video sequence. Two sequences A and A′, corresponding respectively to the original and to the degraded sequence, are presented in succession (cf. FIG. 1).
The second method does without reference signals and evaluates a given sequence intrinsically. FIG. 2 shows a curve of subjective scores obtained over a 30-minute sequence. The abscissa represents time, with a subjective score sampled every N seconds; the ordinate represents the quality grading scale. The curve shows the impact on subjective quality of all the disturbances to which the sequence was subjected.
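The DSCQS scoring described above can be sketched in a few lines. This is a minimal illustration, assuming that each observer rates both A and A′ on a 0–100 continuous scale, that the per-observer difference (reference minus degraded) is the quantity of interest, and that the mean score is reported with an approximate 95% confidence interval of 1.96·S/√N. Observer screening and the other statistical refinements mentioned earlier are omitted. The ratings are invented for illustration and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def dscqs_difference_scores(ref_ratings, test_ratings):
    """Per-observer difference: rating of the reference A minus rating of the degraded A'."""
    if len(ref_ratings) != len(test_ratings):
        raise ValueError("each observer must rate both presentations")
    return [r - t for r, t in zip(ref_ratings, test_ratings)]


def mean_score_with_ci(scores, z=1.96):
    """Mean score and half-width of an approximate 95% confidence interval."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observers")
    half_width = z * stdev(scores) / math.sqrt(n)
    return mean(scores), half_width


# Hypothetical ratings from five observers on a 0-100 scale.
ref = [82, 75, 90, 68, 77]
deg = [60, 58, 71, 55, 62]
diffs = dscqs_difference_scores(ref, deg)
m, ci = mean_score_with_ci(diffs)
print(f"differences={diffs} mean={m:.2f} +/- {ci:.2f}")
```

Working on differences rather than raw ratings cancels part of each observer's personal bias, since the same person rates both presentations.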
Objective measurements can be performed using various approaches.
The principle of the approach based on perception models is to simulate the behavior of the human perceptual system (HPS), in part or in full. Since the aim here is to determine the quality of audiovisual signals, it suffices to evaluate how perceivable the errors are. By modeling certain functions of the HPS, it is possible to quantify the impact of errors on the human sense organs. These models act as weighting functions applied to the error signals, so that the contribution of each degradation is scaled according to its perceptibility. The overall process makes it possible to evaluate objectively the quality of signals passing through an audiovisual system (see FIG. 3).
Reference signals Sref, e.g. representing an audiovisual sequence, and signals S0 of the same sequence after degradation by an audiovisual system SA are compared in a defect identification module MID; a score NT is then assigned to the defects by comparison with a model MOD.
When an error signal is computed, the signal-to-noise ratio can be used as a quality factor. In practice, however, it is found to be poorly representative of subjective quality. This parameter is very global, and therefore cannot detect local degradations of the kind that are typical in digital systems. Furthermore, the signal-to-noise ratio measures very strictly the fidelity of the degraded signals to the original, which is not the same as evaluating the overall perceived quality.
A better evaluation of quality requires a large amount of experimental data on the human perceptual system. Applying these data is greatly facilitated because the system has been studied in terms of its sensitivity to a stimulus (here, the error) within the context of, for example, a picture. In this context, what matters is the response of the human visual system (HVS) to contrast, not to an absolute magnitude such as luminance.
Various test images, such as uniform areas of luminance or patterns varying in space or time at given frequencies, have made it possible to determine experimentally the sensitivity of the visual system and the associated just-perceivable contrast values. The HVS response to light intensity is approximately logarithmic, with maximum sensitivity at spatial frequencies close to 5 cycles/degree. Nevertheless, these results must be applied with caution, since they are visibility thresholds; this is why it is difficult to predict the perceived importance of large-amplitude degradations.
Hearing models work in a similar way: the sensitivity to various stimuli is measured experimentally and then applied to the various signal errors in order to evaluate quality.
However, audiovisual signals carry very rich information. Furthermore, using this type of model to evaluate audiovisual signals raises several practical problems. Not only must the reference and degraded signals be physically available at the same location, but the sequences must also be made to correspond exactly in space and in time. The approach is therefore applicable to evaluating equipment such as an encoder when all of the equipment is located in a single laboratory, or to certain cases of evaluating transmission, such as satellite transmission, where the transmitter and the receiver can both be on the same premises.
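The weighting idea above can be illustrated with a toy model. The sketch below is not the model MOD of the text: it assumes, purely for illustration, that errors are less visible where the reference picture is busy (a crude masking effect), measures local busyness with a 3×3 mean absolute deviation, and pools the weighted error with a Minkowski norm. The weighting formula, the `masking_strength` parameter, and all function names are hypothetical.

```python
import numpy as np


def local_activity(img, radius=1):
    """Mean absolute deviation from the local mean in a (2r+1)x(2r+1) window."""
    h, w = img.shape
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # Stack all shifted copies of the picture: one per position in the window.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(k) for dx in range(k)])
    local_mean = windows.mean(axis=0)
    return np.abs(windows - local_mean).mean(axis=0)


def weighted_error_score(ref, deg, masking_strength=0.05, p=2.0):
    """Error signal weighted by a hypothetical visibility model, then pooled."""
    ref = ref.astype(float)
    deg = deg.astype(float)
    error = deg - ref
    # Hypothetical weight: 1 in flat areas, smaller where the reference is busy.
    weight = 1.0 / (1.0 + masking_strength * local_activity(ref))
    weighted = weight * np.abs(error)
    # Mean-normalized Minkowski pooling over all pixels.
    return float(np.mean(weighted ** p) ** (1.0 / p))


# The same +10 error on a flat picture and on a textured (checkerboard) picture.
flat = np.full((16, 16), 128.0)
yy, xx = np.indices(flat.shape)
textured = np.where((yy + xx) % 2 == 0, 88.0, 168.0)
flat_score = weighted_error_score(flat, flat + 10)
textured_score = weighted_error_score(textured, textured + 10)
print(flat_score, textured_score)
```

Under these assumptions the identical error scores lower on the textured picture, which is the kind of modulation the perception-model approach aims for; a realistic model would replace the weight with measured sensitivity and masking data.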
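The limitation of a global fidelity measure can be shown numerically. The sketch below uses the peak signal-to-noise ratio (PSNR), a common variant of the signal-to-noise ratio for 8-bit pictures (peak value 255 assumed), and builds two degraded versions of a flat test picture with exactly the same mean squared error: one with a small error spread over every pixel, and one with the whole error concentrated in a single 8×8 block, resembling a local blocking defect. The test pictures are synthetic.

```python
import numpy as np


def psnr(ref, deg, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 assumed for 8-bit pictures)."""
    mse = np.mean((ref.astype(float) - deg.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


ref = np.full((64, 64), 128.0)
yy, xx = np.indices(ref.shape)
# (a) Small error of +/-2 spread evenly over every pixel.
spread = ref + np.where((yy + xx) % 2 == 0, 2.0, -2.0)
# (b) Same mean squared error concentrated in one 8x8 block (a local defect).
local = ref.copy()
local[:8, :8] += 16.0
print(psnr(ref, spread), psnr(ref, local))
```

Both versions obtain the same PSNR (about 42.1 dB), although a viewer would notice the concentrated block far more easily than the spread error.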
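The idea that the eye responds to contrast rather than to absolute luminance can be sketched with Weber contrast, ΔL/L. This minimal example assumes a fixed, hypothetical just-perceivable Weber contrast of 0.02 on a uniform background; the text's point is precisely that real thresholds also depend on spatial and temporal frequency, so the constant is only illustrative. Luminance values are in arbitrary units (e.g. cd/m²).

```python
import math

# Hypothetical just-perceivable Weber contrast for a uniform background.
WEBER_THRESHOLD = 0.02


def weber_contrast(target, background):
    """Contrast of a luminance step relative to its background: (L - Lb) / Lb."""
    return (target - background) / background


def is_perceivable(target, background, threshold=WEBER_THRESHOLD):
    """True if the step reaches the (assumed) visibility threshold."""
    return abs(weber_contrast(target, background)) >= threshold


def log_response(luminance):
    """Logarithmic response: equal luminance ratios give equal response steps."""
    return math.log10(luminance)


# The same absolute step of 2 units on a dark and on a bright background.
print(is_perceivable(22, 20))    # contrast 0.10
print(is_perceivable(202, 200))  # contrast 0.01
# Doubling the luminance gives the same response step at any level.
print(log_response(20) - log_response(10), log_response(200) - log_response(100))
```

A logarithmic response and a constant Weber fraction express the same behavior: a step is judged by its size relative to the background, not by its absolute magnitude.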
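The temporal part of this correspondence requirement can be illustrated by estimating the delay between the two sequences. The sketch assumes that one scalar feature per frame (for instance mean luminance) has already been computed for both sequences, that the delay is a whole number of frames within ±max_lag, and that the best delay is the one maximizing the Pearson correlation of the overlapping feature series. Spatial registration is not addressed. The function name and the choice of feature are hypothetical.

```python
import numpy as np


def estimate_delay(ref_feature, deg_feature, max_lag):
    """Lag (in frames) that best aligns deg_feature with ref_feature.

    A positive lag means the degraded sequence is delayed by that many frames.
    """
    ref = np.asarray(ref_feature, dtype=float)
    deg = np.asarray(deg_feature, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[: len(ref) - lag], deg[lag:]    # ref[i] vs deg[i + lag]
        else:
            a, b = ref[-lag:], deg[: len(deg) + lag]   # ref[i - lag] vs deg[i]
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
        if n < 2 or a.std() == 0 or b.std() == 0:
            continue  # correlation undefined for constant or too-short series
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


rng = np.random.default_rng(0)
ref_means = rng.normal(128.0, 10.0, size=200)                           # synthetic per-frame feature
deg_means = np.concatenate([np.full(3, ref_means[0]), ref_means[:-3]])  # 3-frame delay
print(estimate_delay(ref_means, deg_means, max_lag=10))
```

Once the delay is known, frame t of the reference can be paired with frame t + lag of the degraded sequence before any error signal is computed.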
The approach based on parametric models combines a series of parameters, or degradation indicators, chosen to generate an overall objective score.
The objective measurements applied to the audio and/or video signals are indicators of the signal content and of the degradations the signals have undergone. The relevance of these parameters depends on how sensitive they are to defects.
Two categories of approach are then possible when generating parameters:
1) category I: “with a priori knowledge of the reference signal”; and
2) category II: “without a priori knowledge of the reference signal”.
The first approach (category I) relies on applying the same transformation or the same parameter computation to the reference signal and to the degraded signal. The overall quality score is generated by comparing the results of the two computations; the measured difference represents the degradation to which the signal has been subjected.
The second approach (category II) does not require knowledge of the original signal, only knowledge of the characteristics specific to the degradations. One indicator can then be computed for one or more types of degradation. Low-bit-rate encoding and disturbed broadcasting of digital television signals produce identifiable characteristic defects: the blocking effect, picture freezing, etc. Detection factors for these defects can be computed and used as quality indicators.
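As an illustration of category II indicators, the sketch below computes two simple no-reference measures on luminance frames. The blockiness measure is one common heuristic, not a parameter defined in this document: it assumes an 8×8 coding grid aligned with the top-left pixel and compares the mean horizontal step across block boundaries with the mean step inside blocks (values well above 1 suggest a blocking effect). The freeze detector simply checks whether two consecutive frames are identical within a tolerance. Function names and the tolerance are hypothetical.

```python
import numpy as np


def blockiness(frame, block=8, eps=1e-6):
    """Mean |horizontal step| on block boundaries divided by the mean step inside blocks."""
    f = frame.astype(float)
    steps = np.abs(np.diff(f, axis=1))         # steps[:, j] = |f[:, j+1] - f[:, j]|
    cols = np.arange(steps.shape[1])
    on_boundary = (cols % block) == block - 1  # step from column j to j+1 crosses a block edge
    return steps[:, on_boundary].mean() / (steps[:, ~on_boundary].mean() + eps)


def is_frozen(prev_frame, frame, tol=0.0):
    """Picture freezing: no pixel changes by more than tol between consecutive frames."""
    change = np.abs(frame.astype(float) - prev_frame.astype(float))
    return float(change.max()) <= tol


ramp = np.tile(np.arange(64, dtype=float), (16, 1))  # smooth horizontal gradient
blocky = (ramp // 8) * 8                             # same gradient, flat inside 8-pixel blocks
print(blockiness(ramp), blockiness(blocky))
print(is_frozen(ramp, ramp), is_frozen(ramp, ramp + 1))
```

The very large ratio for the staircase picture comes from its near-zero denominator; a practical indicator would need normalization and would also examine vertical boundaries.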
An example of a parametric model:
Numerous parameters have been proposed in the literature for implementing parametric models. The present invention does not seek to define new parameters, but to propose a general model for making use of such measurements.
The approach consists of comparing two images (the reference image and the degraded image) solely on the basis of parameters that are characteristic of their content. The parameters are selected according to their sensitivity to certain degradations produced by the system under evaluation. A quality measurement is then built up by correlation from a series of objective measurements.
As an example, consider a technique developed by the US Institute for Telecommunication Sciences (ITS). It relies on extracting a spatial parameter SI and a temporal parameter TI characteristic of sequence content (see FIG. 4). For further information, reference can be made to an article by A. A. Webster et al. entitled “An objective video quality assessment system based on human perception”, published in SPIE, Vol. 1913, pp. 15–26, June 1993.
The spatial information considered important here is edge information. For an image I at time t, the spatial parameter SI is obtained from the standard deviation of the image filtered by Sobel gradients. This filtering reveals the edges of the image under analysis, which play an important part in vision:

$$SI_t = \sigma_{x,y}\left(\mathrm{Sobel}\left[I_t(x,y)\right]\right)$$

Analogously, the temporal information at a given instant is defined by the standard deviation of the difference between two consecutive images:

$$TI_t = \sigma_{x,y}\left(I_t(x,y) - I_{t-1}(x,y)\right)$$

A measurement based on these two pieces of information makes it possible to evaluate the change in content between the input (Sref) and the output (Ss) of a video system, using various comparisons:
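The two definitions of SI and TI can be sketched directly. This minimal version assumes that frames are 2-D luminance arrays, interprets “filtered by Sobel gradients” as the gradient magnitude from the horizontal and vertical 3×3 Sobel kernels, and ignores the one-pixel border instead of padding; these are choices made for the example, not details given in the text.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def _filter3x3(img, kernel):
    """Correlate img with a 3x3 kernel over the valid region (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def spatial_information(frame):
    """SI_t: standard deviation of the Sobel gradient magnitude."""
    f = frame.astype(float)
    gx = _filter3x3(f, SOBEL_X)
    gy = _filter3x3(f, SOBEL_Y)
    return float(np.hypot(gx, gy).std())


def temporal_information(frame, prev_frame):
    """TI_t: standard deviation of the difference between consecutive frames."""
    return float((frame.astype(float) - prev_frame.astype(float)).std())


flat = np.full((32, 32), 100.0)
edge = flat.copy()
edge[:, 16:] = 200.0  # one vertical edge
print(spatial_information(flat), spatial_information(edge))
# A uniform brightness change gives TI = 0, because TI is a standard deviation.
print(temporal_information(edge, flat), temporal_information(flat + 5, flat))
```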
$$M_1 = \log_{10}\left[\frac{TI_s(t)}{TI_{\mathrm{ref}}(t)}\right]$$

$$M_2 = \frac{SI_{\mathrm{ref}}(t) - SI_s(t)}{SI_{\mathrm{ref}}(t)}$$

$$M_3 = TI_s(t) - TI_{\mathrm{ref}}(t)$$
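Given per-frame SI and TI values for the input and output sequences, the three comparisons can be computed element-wise, one value per instant t. The sketch adds a small constant `eps` to the denominators to avoid division by zero on static or flat frames; that guard and the function name are assumptions, not part of the formulas in the text.

```python
import numpy as np


def comparison_parameters(si_ref, si_s, ti_ref, ti_s, eps=1e-9):
    """Per-instant M1, M2, M3 from SI/TI series of the reference and output sequences."""
    si_ref, si_s = np.asarray(si_ref, float), np.asarray(si_s, float)
    ti_ref, ti_s = np.asarray(ti_ref, float), np.asarray(ti_s, float)
    m1 = np.log10((ti_s + eps) / (ti_ref + eps))  # M1: log ratio of temporal information
    m2 = (si_ref - si_s) / (si_ref + eps)         # M2: relative loss of spatial information
    m3 = ti_s - ti_ref                            # M3: difference in temporal information
    return m1, m2, m3


# Instant 0: blurred output (SI drops).
# Instant 1: output with extra edges (SI rises) and more frame-to-frame change (TI rises).
m1, m2, m3 = comparison_parameters(si_ref=[10, 10], si_s=[5, 12],
                                   ti_ref=[4, 4], ti_s=[4, 8])
print(m1, m2, m3)
```

A positive M2 flags a loss of detail and a negative M2 flags added edges, which matches the interpretation of the SI comparison given in the text.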
The comparator COMP derives the three parameters M1, M2, and M3 from these comparisons. Each of them is sensitive to one or more degradations. Thus, comparing the SI values captures both loss of sharpness (a reduction in SI) and the edges artificially introduced by the blocking effect (an increase in SI). Similarly, differences between the two TI values reveal defects in the encoding of motion.
The next step consists of summing M1, M2, and M3 over time using one of the Minkowski norms Lp (generally p = 1, 2, or ∞). This yields a summing model, which produces a quality score at the output of a summing module SMOD. The chosen model is a linear combination of the terms Mi:

$$Q = \alpha + \beta M_1 + \gamma M_2 + \mu M_3$$

The weighting coefficients (α, β, γ, μ) are computed by an iterative procedure MIN that minimizes the distortion between the objective scores Q and the subjective scores obtained on the same batch of pictures. In other words, the parameters of the combination model are found iteratively so that the estimated objective measurement comes as close as possible to the subjective score. The performance of the model is indicated by the correlation coefficient.
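The temporal summation and the linear model can be sketched as follows. This assumes a mean-normalized Minkowski pooling, ((1/T)·Σ|x_t|^p)^(1/p), which for p = ∞ reduces to the maximum absolute value; taking absolute values discards the sign of M2 and M3, a simplification made here. The coefficient tuple is a parameter; the example uses the published coefficients quoted in the text only to show the call. Function names are hypothetical.

```python
import numpy as np


def minkowski_pool(values, p):
    """Mean-normalized Minkowski (Lp) pooling of a per-instant series."""
    v = np.abs(np.asarray(values, dtype=float))
    if np.isinf(p):
        return float(v.max())  # L-infinity: the worst instant dominates
    return float(np.mean(v ** p) ** (1.0 / p))


def quality_score(m1, m2, m3, coeffs, p=2):
    """Q = alpha + beta*M1 + gamma*M2 + mu*M3 on the pooled parameters."""
    alpha, beta, gamma, mu = coeffs
    pooled1, pooled2, pooled3 = (minkowski_pool(m, p) for m in (m1, m2, m3))
    return alpha + beta * pooled1 + gamma * pooled2 + mu * pooled3


COEFFS = (4.77, -0.992, -0.272, -0.356)  # coefficients quoted in the text
print(quality_score([0.0, 0.3], [0.5, -0.2], [0.0, 4.0], COEFFS, p=2))
```

Small p values average degradations over the sequence, while p = ∞ lets the single worst instant dominate the score.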
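Because the model is linear in its coefficients, the minimization can be sketched with ordinary least squares instead of the iterative procedure MIN named in the text; for a squared-error criterion and a full-rank design matrix, an iterative minimizer would converge to the same coefficients. The sketch also reports the Pearson correlation between objective and subjective scores as the performance index. The synthetic data below are generated from known coefficients only to check that the fit recovers them; they are not experimental results.

```python
import numpy as np


def fit_linear_model(pooled, subjective):
    """Least-squares (alpha, beta, gamma, mu) and Pearson correlation.

    pooled: array of shape (n_sequences, 3) with pooled M1, M2, M3 per sequence.
    subjective: array of shape (n_sequences,) with subjective scores.
    """
    pooled = np.asarray(pooled, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    design = np.column_stack([np.ones(len(pooled)), pooled])  # column of 1s for alpha
    coeffs, *_ = np.linalg.lstsq(design, subjective, rcond=None)
    objective = design @ coeffs
    r = np.corrcoef(objective, subjective)[0, 1]
    return coeffs, r


rng = np.random.default_rng(1)
pooled = rng.uniform(0.0, 2.0, size=(30, 3))
true_coeffs = np.array([4.77, -0.992, -0.272, -0.356])
subjective = true_coeffs[0] + pooled @ true_coeffs[1:]  # noise-free synthetic scores
coeffs, r = fit_linear_model(pooled, subjective)
print(np.round(coeffs, 3), round(r, 3))
```

With real data, a correlation computed on the same batch used for fitting is optimistic, since the coefficients were tuned to that batch.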
One such model has been proposed in the literature; it achieves a good correlation coefficient of 0.92:

$$Q = 4.77 - 0.992\,M_1 - 0.272\,M_2 - 0.356\,M_3$$
Nevertheless, it appears that combination models perform less well when applied to pictures other than those in the batch used to devise them.
This approach is less demanding to implement than the preceding one. Nevertheless, in practice it remains difficult to achieve spatial and temporal correspondence between the measurements obtained on the two signal sequences.