The deployment of a multitude of speech coding and synthesis systems on telecommunications networks, as well as in auditory prosthetic systems, has increased the importance of accurately evaluating and monitoring the quality of speech signals and, more generally, of audio signals.
There are a number of known methods of evaluating speech quality based on subjective testing. An Absolute Category Rating (ACR) system, such as Mean Opinion Score (MOS) testing, provides a one-dimensional quality measurement. The Diagnostic Acceptability Measure (DAM) is another method of evaluating speech quality that requires subjective testing. The Diagnostic Acceptability Measure provides a multidimensional quality measurement based on axes such as “interrupted” and “tinny”.
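By way of illustration only, a mean opinion score can be computed as the arithmetic mean of the listeners' category ratings. The following minimal sketch assumes the five-point ACR scale in common use (1 = bad to 5 = excellent); the function name `mean_opinion_score` and the sample ratings are hypothetical and are not taken from any particular test procedure.

```python
from statistics import mean

# Assumed five-point ACR scale; the labels are illustrative.
ACR_SCALE = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(ratings):
    """Return the arithmetic mean of a list of ACR ratings (hypothetical helper)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_SCALE:
            raise ValueError(f"rating {rating!r} is outside the ACR scale 1..5")
    return mean(ratings)


# Hypothetical ratings given by six listeners to one speech sample.
listener_ratings = [4, 3, 5, 4, 4, 3]
mos = mean_opinion_score(listener_ratings)
print(f"MOS = {mos:.2f}")
```

The single number returned is what makes such a measurement one-dimensional: it indicates how good the sample was judged to be overall, but not in what respect it was degraded.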
An alternative to subjective measurement methods is the objective measurement of speech quality. One such objective method is known as the Perceptual Evaluation of Speech Quality (PESQ) algorithm, which has been standardised by the International Telecommunication Union (ITU). The Perceptual Evaluation of Speech Quality algorithm is, however, inappropriate for many synthesis systems, including low bit-rate vocoders (i.e., below 4 kbps), as well as for speech degraded by environmental conditions such as babble and military vehicle noise. In addition, the Perceptual Evaluation of Speech Quality algorithm fails to predict the quality of low-pass filtered speech and of speech degraded by narrow-band noise.
The Perceptual Evaluation of Speech Quality algorithm uses a psychoacoustic masking model (PMM) in order to predict the mean opinion score. The psychoacoustic masking model is an attempt at modelling the linear component of the highly non-linear hydromechanics of the human cochlea. In essence, the psychoacoustic masking model is a very approximate estimate of the linear component of the basilar membrane (BM) response. As such, the psychoacoustic masking model is not able to predict a number of linear and non-linear characteristics of the true physiological response of the cochlea and its corresponding psychophysics.