The quality of an audio device can be determined either subjectively or objectively. Subjective tests are time-consuming, expensive, and difficult to reproduce. Therefore, several methods have been developed to measure the quality of an output signal of an audio device, in particular a speech signal, in an objective way. In such methods, the speech quality of an output signal received from a speech signal processing system is determined by comparison with a reference signal.
A current method that is widely used for this purpose is described in ITU-T Recommendation P.862, entitled “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”. ITU-T Recommendation P.862 determines the quality of an output signal from a speech signal processing system, which signal is generally distorted. The output signal and a reference signal, for example the input signal of the speech signal processing system, are mapped onto representation signals according to a psycho-physical perception model of the human auditory system. Based on these signals, a differential signal is determined which is representative of distortion within the output signal as compared to the reference signal. A quality indicator representing the perceived quality of an output signal is commonly defined as an indicator which shows a high correlation with the subjectively perceived speech quality. The subjectively perceived quality is commonly expressed as a Mean Opinion Score (MOS), determined in a subjective test in which human subjects express their opinion on a quality scale. In general, the quality indicator is derived from a comparison of the internal representation of the output signal of a device under test with the internal representation of the input signal to the device under test. The internal representation can be calculated by transforming the signal from the external, physical domain to the internal, psychophysical domain. In ITU-T Recommendation P.862, the core of the algorithm that is used in the calculation of the psychophysical signal representation is composed of the following main operations: scaling towards a fixed level, time alignment, transformation from the amplitude-time domain to the power-time-frequency domain, and warping of the power and frequency scales.
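As a rough illustration of two of the operations listed above (the transformation to the power-frequency domain and the warping of the frequency scale), the following minimal sketch processes a single frame. It is not the P.862 algorithm: the function names, the band count of 18, and the use of Traunmüller's closed-form Bark approximation in place of the tables defined in the Recommendation are all assumptions made for illustration.

```python
import cmath
import math


def hz_to_bark(freq_hz):
    """Traunmueller's Bark approximation (assumption: not the P.862 table)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def frame_power_spectrum(frame):
    """Power per non-negative-frequency bin of one frame, using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def bark_band_powers(frame, sample_rate, n_bands=18):
    """Sum bin powers into bands of equal width on the Bark scale.

    n_bands=18 is a hypothetical choice for this sketch.
    """
    n = len(frame)
    power = frame_power_spectrum(frame)
    max_bark = hz_to_bark(sample_rate / 2.0)
    bands = [0.0] * n_bands
    for k, p in enumerate(power):
        if k == 0:
            continue  # ignore the DC bin
        z = max(hz_to_bark(k * sample_rate / n), 0.0)
        idx = min(int(z / max_bark * n_bands), n_bands - 1)
        bands[idx] += p
    return bands


# Example: one 64-sample frame of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]
bands = bark_band_powers(frame, SAMPLE_RATE)
strongest_band = bands.index(max(bands))
print("strongest Bark band index:", strongest_band)
```

Because the bands are equally wide in Bark rather than in Hz, low-frequency bands cover fewer Hz than high-frequency bands, which is the sense in which the frequency scale is warped.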
These operations lead to an internal representation in terms of loudness, time, and pitch, from which difference functions can be calculated. These difference functions are then used to derive a single quality indicator. For each speech file one can thus obtain both a MOS and a quality indicator score, and the correlation between the two should be as high as possible. As an example, one can determine the quality of a speech codec by comparing the internal representation of the output of the codec with the internal representation of the input of the codec. For each speech file that is coded by the codec, the quality indicator produces a number that should have a high correlation with the subjectively determined MOS for that encoded and decoded speech file. The differential signal is processed in accordance with a cognitive model, in which certain properties of human hearing perception established by testing have been modeled, to obtain a quality signal that is a measure of the quality of the auditory perception of the output signal.
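The correlation requirement described above can be made concrete with a Pearson correlation coefficient computed over a set of speech files. The sketch below is only an illustration: the function name is hypothetical, and the score lists are invented toy values, not measurements from any test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, one entry per speech file (not real test data).
subjective_mos = [4.2, 3.1, 2.4, 3.8, 1.9]
objective_score = [4.0, 3.3, 2.2, 3.6, 2.1]

r = pearson(subjective_mos, objective_score)
print(f"correlation between MOS and objective score: {r:.3f}")
```

A value of r close to 1 would indicate that the objective indicator ranks and spaces the files much as the listeners did.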
As clearly indicated by ITU-T Recommendation P.862, PESQ is known to provide inaccurate predictions when used at varying listening levels. PESQ assumes a standard listening level of 79 dB SPL (Sound Pressure Level) and compensates for non-optimum signal levels in the input signal. The subjective effect of deviation from the optimum listening level is therefore not taken into account. In present-day telecommunication systems, in particular systems using Voice-over-IP (VoIP) and similar technologies, non-optimum listening levels occur very often. Consequently, PESQ frequently does not provide optimum predictions of the perception of speech signals processed in such telecommunication systems, which are becoming increasingly popular.
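The consequence of level compensation can be seen in a toy example: once two signals are scaled to the same level, a difference that consisted only of playback level disappears before any comparison takes place. The sketch below assumes a simple RMS-based scaling with digital levels in dB relative to unit amplitude; the function names and the target of -26 dB are hypothetical and do not reproduce the calibration defined in P.862.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to unit amplitude (not a calibrated dB SPL)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("level of a silent signal is undefined")
    return 10.0 * math.log10(mean_square)


def scale_to_level(samples, target_db):
    """Apply one gain so that the RMS level of the signal equals target_db."""
    gain = 10.0 ** ((target_db - rms_level_db(samples)) / 20.0)
    return [gain * s for s in samples]


TARGET_DB = -26.0  # hypothetical fixed level chosen for this sketch

# The same 1 kHz tone at two playback levels, 20 dB apart.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(160)]
loud = tone
quiet = [0.1 * s for s in tone]

level_gap_db = rms_level_db(loud) - rms_level_db(quiet)
loud_norm = scale_to_level(loud, TARGET_DB)
quiet_norm = scale_to_level(quiet, TARGET_DB)
max_residual = max(abs(a - b) for a, b in zip(loud_norm, quiet_norm))

print(f"level gap before scaling: {level_gap_db:.1f} dB")
print(f"largest sample difference after scaling: {max_residual:.2e}")
```

After scaling, the two signals are identical up to rounding error, so any measure computed from them cannot reflect that one of them was presented 20 dB quieter than the other.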