Draft ITU-T Recommendation P.862, “Telephone transmission quality, telephone installations, local line networks – Methods for objective and subjective assessment of quality – Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs” [see reference 8], ITU-T 02.2001, discloses prior-art PESQ methods and systems.
When the quality of audio signals degraded by audio processing or transmission systems is measured, the results may be poor for very weak or silent portions of the input signal. The methods and systems known from Recommendation P.862 have the disadvantage that they do not correctly compensate for frame-by-frame differences in power level. These differences are caused by gain variations or by noise in the input signal. The incorrect compensation leads to low correlations between subjective and objective scores, especially when the original reference input speech signal contains low levels of noise.
According to a prior-art method and system disclosed in the applicant's EP01200945, improvements are achieved by applying a first scaling step in a pre-processing stage, with a first scaling factor that is a function of the reciprocal of the power of the output signal increased by an adjustment value. A second scaling step is applied with a second scaling factor that is substantially equal to the first scaling factor raised to an exponent whose value, a further adjustment value, lies between zero and one. The second scaling step may be carried out at various locations in the device, and the adjustment values are tuned using test signals with well-defined subjective quality scores.
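The two scaling factors can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the excerpt only says the first factor is "a function of" the reciprocal, so the simplest such function, S1 = k / (P_out + delta), is assumed here; the names `p_out`, `delta`, `k` and `alpha` are hypothetical and do not come from EP01200945.

```python
def first_scaling_factor(p_out: float, delta: float, k: float = 1.0) -> float:
    """First scaling factor as a function of 1 / (output power + adjustment value).

    Assumption: the simplest such function, S1 = k / (p_out + delta);
    `k` and `delta` are hypothetical parameter names for illustration.
    """
    if p_out + delta <= 0.0:
        raise ValueError("p_out + delta must be positive")
    return k / (p_out + delta)


def second_scaling_factor(s1: float, alpha: float) -> float:
    """Second scaling factor S2 = S1 ** alpha, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return s1 ** alpha


# Worked example: S1 = 1 / (3 + 1) = 0.25 and S2 = 0.25 ** 0.5 = 0.5.
s1 = first_scaling_factor(p_out=3.0, delta=1.0)
s2 = second_scaling_factor(s1, alpha=0.5)
```

Because the exponent lies between zero and one, a positive S2 is never further from 1 than S1 is, so the second step applies a milder version of the first correction.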
In both the methods and systems of Recommendation P.862 and those of EP01200945, the degraded output signal is scaled locally to match the reference input signal in the power domain.
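Local power matching of this kind can be illustrated with a simplified frame-by-frame sketch. It is not the P.862 algorithm: it assumes time-aligned reference and degraded signals of equal length, uses plain mean-square frame power, and introduces a hypothetical `eps` guard against division by zero.

```python
import math


def frame_power(frame):
    """Mean-square power of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def hard_scale_frames(reference, degraded, frame_len, eps=1e-12):
    """Scale each degraded frame so its power matches the co-located reference frame.

    Simplified illustration (not P.862): assumes time-aligned, equal-length
    signals; `eps` is a hypothetical guard against division by zero.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    scaled = []
    for start in range(0, len(degraded), frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        # Amplitude gain is the square root of the power ratio.
        gain = math.sqrt((frame_power(ref) + eps) / (frame_power(deg) + eps))
        scaled.extend(x * gain for x in deg)
    return scaled


# Frame 1 is amplified by about 2 and frame 2 attenuated by about 0.5,
# so `out` is approximately [1.0, 1.0, 2.0, 2.0].
out = hard_scale_frames([1.0, 1.0, 2.0, 2.0], [0.5, 0.5, 4.0, 4.0], frame_len=2)
```

A frame in which the degraded signal is nearly silent receives a very large gain in this sketch, which is one way to see why strict frame-by-frame compensation behaves poorly on weak or silent portions.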
It has been found that the results of the (perceptual) quality measurement process can be improved by applying “soft-scaling” in at least one stage of the method or system.
The introduction of “soft-scaling” instead of “hard scaling” (using “hard” scaling thresholds) is based on the observation and understanding that human audio perception mechanisms use “soft thresholds” rather than “hard thresholds”; this matters because the field of the invention is the assessment of audio quality as experienced by human users. Based on that observation and a better understanding of how these human audio scaling mechanisms work, the present invention provides such “soft-scaling” mechanisms, to be added to or inserted into the prior-art method or system.
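This excerpt does not specify the soft-scaling function of the invention. Purely to illustrate the difference between a hard and a soft threshold, the sketch below contrasts an all-or-nothing switch with a gradual logistic transition. The logistic shape, the `width` parameter and the blending rule in `soft_scaled_gain` are hypothetical choices, not the claimed mechanism.

```python
import math


def hard_weight(level: float, threshold: float) -> float:
    """Hard threshold: an all-or-nothing switch at `threshold`."""
    return 1.0 if level >= threshold else 0.0


def soft_weight(level: float, threshold: float, width: float) -> float:
    """Soft threshold (hypothetical): a logistic transition centred on `threshold`.

    `width` is a hypothetical parameter controlling how gradual the transition is.
    """
    if width <= 0.0:
        raise ValueError("width must be positive")
    z = (level - threshold) / width
    # Numerically stable logistic: avoids overflow in math.exp for large |z|.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_scaled_gain(gain: float, level: float, threshold: float, width: float) -> float:
    """Hypothetical blend between no correction (1.0) and the full gain."""
    return 1.0 + soft_weight(level, threshold, width) * (gain - 1.0)


# Just below the threshold the hard switch gives 0.0, the soft one about 0.27.
hard = hard_weight(0.9, 1.0)
soft = soft_weight(0.9, 1.0, 0.1)
```

With a hard threshold a tiny change in level can flip the correction fully on or off; with a soft threshold the applied gain changes smoothly with level.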