A quality assessment of audio or speech signals may be obtained from human listeners, who are typically asked to judge the quality of a processed audio or speech sequence relative to an original, unprocessed version of the same sequence. While such a process can provide a reasonable assessment of audio quality, it is labour-intensive, time-consuming and subject to the listeners' subjective interpretation. The usefulness of human listeners for determining audio quality is therefore limited by these constraints, and, as a result, audio quality measurement has not been applied in many areas where such information would be useful.
For example, a system providing objective audio quality measurement would be useful in a variety of applications, since an objective assessment of audio quality could then be obtained quickly and efficiently without involving human testers each time an assessment is required. Such applications include: the assessment or characterization of implementations of audio processing equipment; the evaluation of equipment or a circuit prior to placing it into service (perceptual quality line-up); on-line monitoring of audio transmissions in service; audio codec development involving comparisons of competing encoding/compression algorithms; network planning to optimize the cost and performance of a transmission network under given constraints; and, as an aid to subjective assessment, for example as a tool for screening critical material to include in a listening test.
Current objective measures of audio or speech quality include THD (Total Harmonic Distortion) and SNR (Signal-to-Noise Ratio). The latter can be measured on either the time-domain signal or a frequency-domain representation of the signal. However, these measures are known to provide only a very crude indication of audio or speech quality and correlate poorly with the subjective quality of a processed sound, relative to a test sound, as judged by a human listener. This lack of correlation worsens when the metrics are used to assess devices such as A/D and D/A converters and perceptual audio (or speech) codecs, which exploit the masking properties of the human auditory system; such devices often produce audio (or speech) signals that are perceived as being of good or excellent quality even though their measured SNR is poor.
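For reference, the two conventional metrics named above can be sketched in a few lines. This is a minimal illustration of their textbook definitions, not a method described in this document. It assumes NumPy, a reference and a processed signal that are time-aligned and of equal length, and (for THD) a test tone whose fundamental and harmonics fall exactly on DFT bins, so no window is needed. The function names `snr_db`, `snr_db_spectral` and `thd_percent` are hypothetical. The spectral SNR uses the full complex DFT, so by Parseval's theorem it matches the time-domain value.

```python
import numpy as np


def snr_db(reference, processed):
    """Time-domain SNR in dB: reference energy over the energy of the error signal."""
    reference = np.asarray(reference, dtype=float)
    error = np.asarray(processed, dtype=float) - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


def snr_db_spectral(reference, processed):
    """SNR in dB computed on the full complex DFT of both signals.

    The DFT is linear, so the error spectrum equals the DFT of the
    time-domain error; Parseval's theorem makes this equal to snr_db().
    """
    ref_spec = np.fft.fft(np.asarray(reference, dtype=float))
    err_spec = np.fft.fft(np.asarray(processed, dtype=float)) - ref_spec
    return 10.0 * np.log10(np.sum(np.abs(ref_spec) ** 2) / np.sum(np.abs(err_spec) ** 2))


def thd_percent(signal, fs, f0, n_harmonics=5):
    """THD in percent: RMS sum of harmonics 2..n_harmonics+1 relative to the fundamental.

    Assumes coherent sampling (f0 and its harmonics land exactly on DFT bins),
    so no window is applied; harmonics at or above Nyquist are skipped.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    bin_width = fs / len(signal)

    def amplitude(freq):
        # Magnitude of the DFT bin nearest to freq (exact under coherent sampling).
        return spectrum[int(round(freq / bin_width))]

    harmonics = [amplitude(k * f0) for k in range(2, n_harmonics + 2) if k * f0 < fs / 2]
    return 100.0 * np.sqrt(np.sum(np.square(harmonics))) / amplitude(f0)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs                                  # one second of samples
    tone = np.sin(2 * np.pi * 100 * t)                      # 100 Hz reference tone
    distorted = tone + 0.1 * np.sin(2 * np.pi * 300 * t)    # 3rd harmonic at 10 % amplitude
    print(snr_db(tone, 1.01 * tone))                        # 1 % gain error
    print(snr_db_spectral(tone, 1.01 * tone))
    print(thd_percent(distorted, fs, 100))
```

Both figures depend only on waveform or spectral differences and contain no model of auditory masking, which is why a perceptual codec can register a poor SNR while sounding transparent to a listener.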
Some methods and systems for the objective measurement of the perceptual quality of wide-band audio have been proposed. However, all of these methods and systems employ algorithms that were shown to perform inadequately in tests conducted by the ITU-R (International Telecommunication Union, Radiocommunication Sector) in 1995–1996. Such methods and systems include: J. G. Beerends and J. A. Stemerdink, “A perceptual audio quality measure based on a psychoacoustic sound representation”, J. Audio Eng. Soc., Vol. 40, pp. 963–978, December 1992; C. Colomes, M. Lever, J. B. Rault, and Y. F. Dehery, “A perceptual model applied to audio bit-rate reduction”, J. Audio Eng. Soc., Vol. 43, pp. 233–240, April 1995; K. Brandenburg and T. Sporer, “‘NMR’ and ‘Masking Flag’: Evaluation of quality using perceptual criteria”, 11th International AES Conference on Audio Test and Measurement, Portland, 1992, pp. 169–179; and T. Thiede and E. Kabot, “A New Perceptual Quality Measure for Bit Rate Reduced Audio”, Proceedings of the Audio Engineering Society, Copenhagen, Denmark, Preprint Number 4280, 1996.
Accordingly, there is a need for an efficient system and methodology for obtaining an estimate of the perceptual quality of an audio or speech sequence, particularly one that has been processed in some manner, that provides acceptable performance and that permits frequent, automated monitoring of audio or speech equipment performance and of the degree of communication network degradation.