The field of digital broadcasting and digital mobile radio is expanding fast, in particular following the introduction of digital television and mobile telephones. To provide a quality-assured service, new instruments need to be developed for measuring the quality of all the systems necessary for the deployment of this technology.
Subjective tests, in which experts or novices listen to sound signals and rate their quality, are used for this purpose. This method is time-consuming and costly, because many strict conditions must be complied with for such tests (choice of panelists, listening conditions, test sequences, test chronology, etc.). It nevertheless yields databases consisting of reference signals and the scores assigned to them. The resulting Mean Opinion Scores (MOS) are recognized as the benchmark in the area of quality estimation.
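As an illustration of how a Mean Opinion Score is obtained from a listening panel, the following minimal sketch averages the scores given to one test sequence and attaches an approximate 95% confidence interval. The five-grade scale, the sample scores, and the normal-approximation factor 1.96 are assumptions made for this example; they are not taken from the text above.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(scores):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    `scores` holds one grade per panelist for a single test sequence.
    The 1.96 factor assumes a normal approximation (illustrative only).
    """
    n = len(scores)
    mos = mean(scores)
    half_width = 1.96 * stdev(scores) / sqrt(n) if n > 1 else 0.0
    return mos, half_width


# Hypothetical grades from eight panelists on a 1 (bad) to 5 (excellent) scale.
panel_scores = [4, 5, 4, 3, 4, 5, 4, 4]
mos, ci = mean_opinion_score(panel_scores)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

A wide interval suggests that more panelists are needed, which is one reason such tests are costly.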
Many studies of the human hearing system have been carried out with the aim of minimizing the number of subjective tests. Based on this work, models of the ear and of psychoacoustic phenomena have been developed and have been used to analyze sound signals and to estimate their quality using objective methods. The quality measured is the quality as perceived by the human ear, and is therefore referred to as the objective perceived quality.
Three classes of objective test methods can be distinguished. The first is the “complete reference” class, in which the original signal is compared directly with the degraded signal (i.e. the signal after coding, broadcasting, multiplexing, etc.). The second is the “reduced reference” class, in which only parameters extracted from the two signals are compared. In the third class, which uses no reference, defects generated by the broadcasting system are detected using their known main characteristics; this circumvents the constraints associated with the use of a reference signal (in the other two classes, the reference must be transmitted to the place of comparison and then synchronized precisely with the degraded signal, which makes the system complex and more costly).
Degradation by transmission errors significantly reduces the quality of the signal and occurs when broadcasting an MPEG digital stream, for example, or when broadcasting via the Internet, especially in the case of radio broadcasts.
In this context, it is desirable to have a method of objectively measuring the quality of a broadcast audio signal either without using a reference signal at all or using a “reduced” reference signal, because, for example, only these methods are suitable for monitoring a broadcast network, where a plurality of remote measuring points may be necessary. It is also beneficial to exploit the relative simplicity of this kind of method for measuring the quality of a digital audio signal that has been subjected to digital coding, in particular with bit rate reduction, and/or decoding, whether the signal has been transmitted or not.
The number of audio quality measuring methods that have been developed varies widely from one class to another. A large number of complete reference methods have been developed, but only a few reduced reference methods or methods that do not use a reference.
Complete reference methods, which compare the signal to be evaluated with a reference signal, comprise the standard techniques used to estimate the quality of radio coders, for example. Their general principle is to use a perceptual model of human hearing to calculate internal representations of the original signal and the degraded signal and then to compare these two internal representations. One example of a method of this kind is described in the paper by JOHN G. BEERENDS and JAN A. STEMERDINK, “A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation”, published in “Journal of the Audio Engineering Society”, Vol. 40, No. 12, December 1992, pages 963 to 978.
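The general principle described above (compute an internal representation of each signal, then compare the two representations) can be sketched as follows. This is a deliberately crude stand-in and not the method of the cited paper: the "internal representation" here is just the log energy in a few equal-width frequency bands, with no Bark scale, masking, or loudness model. The function names, frame length, and band count are all hypothetical.

```python
import cmath
import math


def band_energies_db(frame, n_bands=4):
    """Toy internal representation: log energy in equal-width frequency bands."""
    n = len(frame)
    half = n // 2
    # Naive DFT power for the positive-frequency bins (fine for short frames).
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    return [
        10 * math.log10(sum(power[b * width:(b + 1) * width]) + 1e-12)
        for b in range(n_bands)
    ]


def full_reference_distance(reference, degraded, frame_len=64):
    """Mean absolute difference (dB) between the two internal representations."""
    total, count = 0.0, 0
    usable = min(len(reference), len(degraded))
    for start in range(0, usable - frame_len + 1, frame_len):
        ref_rep = band_energies_db(reference[start:start + frame_len])
        deg_rep = band_energies_db(degraded[start:start + frame_len])
        total += sum(abs(r - d) for r, d in zip(ref_rep, deg_rep)) / len(ref_rep)
        count += 1
    return total / count if count else 0.0


# Hypothetical test signals: a tone, and the same tone with an added
# high-frequency component standing in for a coding artifact.
reference = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
degraded = [s + 0.1 * math.sin(2 * math.pi * 20 * t / 64) for t, s in enumerate(reference)]

print(full_reference_distance(reference, reference))  # identical signals
print(full_reference_distance(reference, degraded))   # degraded signal
```

Note that the comparison only works because both signals are sample-aligned, which is the synchronization constraint that makes this class of method hard to deploy in a broadcast network.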
In order to obtain a representation that is as faithful as possible, these hearing models are based on masking experiments and must make it possible to predict whether the deterioration will be audible or not, since not all deterioration of a signal is audible or a nuisance. Perceptual models using a reference are based on the FIG. 1 diagram, and many methods of varying sophistication rely on this principle. The Perceptual Evaluation of Audio Quality (PEAQ) algorithm was recently standardized by the ITU-R in Recommendation BS.1387. This algorithm is based on the standard principles and combines them with a quality prediction model using a neural network.
The major benefit of these techniques is the ability to detect very slight deterioration, although it must be remembered that they were designed for evaluating the impact of coding. The measurements obtained are relative, in that only differences between the reference and the degraded signal are taken into account. If the signal fed to a coder of very high quality is already seriously degraded, it will be coded and then decoded almost transparently, and a very high score will therefore be assigned despite the poor quality of the signal. Conversely, the score could be low for a signal that has been modified (equalized, colored, etc.) between the step of calculating the reference and the comparison step, even if the perceived quality of the two signals is very high.
There are as yet few methods that do not use a reference. The Output-Based objective speech Quality (OBQ) method is the most highly developed of the “no reference” methods. It is a method of estimating the quality of a speech signal alone, with no reference signal, and is based on calculating perceptual parameters representing the content of the signal, combined into a vector. Vectors calculated for non-degraded signals constitute a reference database. Quality is estimated by comparing the same parameters obtained from degraded signals with vectors from the reference database. The main method using neural networks is the Objective Scaling of Sound Quality And Reproduction (OSSQAR) method. The general principle of this method is to use a hearing model and a neural network together. To simulate psychoacoustic phenomena, the network predicts the subjective quality of the signal from a perceptual representation of the signal calculated using the hearing model. Note that the results obtained with these methods are much better if the signals are part of the training database, or at least if they have similar characteristics.
Thus these methods are not suitable for evaluating the quality of all signals, for example radio or TV broadcast audio signals.
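The database comparison described for the OBQ method can be illustrated with a minimal sketch. The source does not state how the vectors are compared, so the nearest-neighbour Euclidean distance used here is an assumption, as are the three-component parameter vectors and all of their values.

```python
import math


def nearest_reference_distance(vector, reference_db):
    """Distance from a parameter vector to its closest entry in the database.

    A small distance means the signal resembles known non-degraded material;
    a large distance is read as a sign of degradation (assumed metric).
    """
    return min(math.dist(vector, ref) for ref in reference_db)


# Hypothetical perceptual parameter vectors computed from non-degraded speech.
reference_db = [
    (0.90, 0.10, 0.30),
    (0.80, 0.20, 0.40),
    (1.00, 0.15, 0.35),
]

clean_like = (0.85, 0.15, 0.35)     # resembles the database entries
degraded_like = (0.30, 0.70, 0.90)  # far from every database entry

d_clean = nearest_reference_distance(clean_like, reference_db)
d_degraded = nearest_reference_distance(degraded_like, reference_db)
print(d_clean, d_degraded)
```

The sketch also shows the limitation noted above: a perfectly good signal whose parameters simply fall outside the range covered by the database would be scored as degraded.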
As indicated above, most objective perceptual measurement algorithms using a complete reference operate in accordance with the same principle: they compare the degraded sound signal and the original signal (i.e. the signal before transmission and/or coding and/or decoding, called the reference signal). These algorithms therefore require a reference signal, which must additionally be synchronized very accurately with the signal under test. These conditions can only be satisfied in simulation or during tests on coders and other “compact” systems or systems that are not geographically distributed; in contrast, the situation is very different when receiving a signal broadcast from send antennas A1 to receive antennas A2 (see FIG. 2).
The reference signal must be available at the comparison points. The only option for using a complete reference method is to transmit the reference to the comparison points without errors and then to synchronize it perfectly. These complete reference methods are not applicable in practice, for reasons of spectral congestion, and therefore of cost, as they would necessitate the use of a transparent second transmission channel.
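To make the synchronization requirement concrete, the sketch below estimates the delay between a reference and a received signal by a brute-force cross-correlation search. It is only an illustration under simple assumptions (pure integer delay, no gain change, no distortion); the function name and the test signal are hypothetical.

```python
import random


def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) at which `degraded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(degraded) - lag)
        # Cross-correlation of the reference with the degraded signal at this lag.
        score = sum(reference[i] * degraded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical example: the received signal is the reference delayed by 7 samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(200)]
degraded = [0.0] * 7 + reference

print(estimate_delay(reference, degraded, max_lag=20))
```

Even this simple alignment presupposes that the full reference is available, error-free, at the measuring point.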
The methods with no reference that have been proposed may yield good results, but only with signals having known characteristics modeled during the training phase. They do not work well on arbitrary signals.
Using a “reduced” reference, in which the reference audio signal is characterized by one or more numbers, has been suggested. A method of this kind is described in French Patent Application FR 2 769 777 filed 13 Oct. 1997. However, this method is not able to process all the samples, in particular because the bit rate of the proposed reference signal (which is at least 36 kbit/s for windows comprising 1024 signal samples) is too high to satisfy the practical constraints on installation and implementation in a broadcast network.
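To give a sense of scale for the 36 kbit/s figure, the following short calculation converts it into bits per analysis window and compares it with an uncompressed audio channel. The 48 kHz sampling rate, the non-overlapping windows, and the 16-bit PCM comparison are assumptions made for this example; the source gives only the bit rate and the window length.

```python
def bits_per_window(bit_rate_bps, sample_rate_hz, window_len):
    """Reference bits spent on each non-overlapping analysis window."""
    windows_per_second = sample_rate_hz / window_len
    return bit_rate_bps / windows_per_second


REFERENCE_BIT_RATE = 36_000   # bit/s, from the text above
WINDOW_LEN = 1024             # samples per window, from the text above
SAMPLE_RATE = 48_000          # Hz, assumed for illustration
PCM_BITS_PER_SAMPLE = 16      # assumed for the comparison only

per_window = bits_per_window(REFERENCE_BIT_RATE, SAMPLE_RATE, WINDOW_LEN)
per_sample = per_window / WINDOW_LEN
share_of_pcm = REFERENCE_BIT_RATE / (SAMPLE_RATE * PCM_BITS_PER_SAMPLE)

print(f"{per_window:.0f} bits per window, {per_sample:.2f} bit per sample")
print(f"{share_of_pcm:.1%} of a 16-bit mono PCM channel")
```

Under these assumptions the reference costs several hundred bits per window, a significant extra load to carry to every measuring point of a broadcast network.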