Speech quality assessment enables optimisation of the design and control of speech coding and transmission algorithms and equipment.
Methods of assessing speech quality that involve human listener rating schemes, such as the Mean Opinion Score (MOS) or the Diagnostic Acceptability Measure (DAM), provide a subjective quality measure.
This type of speech quality assessment is expensive and requires appropriate facilities, test equipment and test conditions.
To avoid the need for human listeners, objective speech quality measures have been proposed that attempt to estimate or predict subjective speech quality using mathematical expressions.
Typically, objective speech quality assessment methods are based on a comparison of the clean, undistorted original input speech signal and the degraded output speech signal. However, in practice, the clean original input signal is usually not available at the output of a system or device under test.
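By way of illustration only, a comparison-based measure of this kind can be sketched as a segmental signal-to-noise ratio computed between the clean and the degraded signal. Segmental SNR is chosen here merely as a simple, well-known example and is not a method named in the documents discussed; the function name, the frame length of 160 samples and the clamping limits of -10 dB and 35 dB are assumptions made for the sketch. The sketch also presupposes that both signals are available and time-aligned, which is exactly the condition that is usually not met in practice.

```python
import math


def segmental_snr_db(clean, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between time-aligned clean and degraded signals.

    frame_len, floor_db and ceil_db are illustrative assumptions, not values
    taken from the source text.
    """
    if len(clean) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")

    frame_snrs = []
    # Walk over complete, non-overlapping frames only.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = degraded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, out))
        if signal_energy == 0.0:
            continue  # skip silent reference frames
        if noise_energy == 0.0:
            snr = ceil_db  # identical frame: report the upper clamp
        else:
            snr = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp so that a few extreme frames do not dominate the mean.
        frame_snrs.append(min(max(snr, floor_db), ceil_db))

    if not frame_snrs:
        raise ValueError("no non-silent frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)


# Example: a 5-cycle-per-frame sine as the clean signal, attenuated by 10 %.
clean_signal = [math.sin(2.0 * math.pi * 5.0 * n / 160.0) for n in range(320)]
degraded_signal = [0.9 * x for x in clean_signal]
score = segmental_snr_db(clean_signal, degraded_signal)
```

In this example the error energy is one hundredth of the signal energy in every frame, so the score is approximately 20 dB. Without access to `clean_signal`, the measure cannot be evaluated at all.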
International patent application WO-A-96/06495 proposes analyzing certain talker-independent statistical characteristics of speech in order to determine how the output signal has been modified or distorted, for example by a telecommunications link, without requiring the clean, undistorted input signal.
For the same purpose, International patent application WO-A-96/06496 discloses using a speech recognizer to analyze the content of a received signal. The result of this analysis is processed by a speech synthesizer to generate a speech signal free of distortion.
International patent application WO-A-97/05730 discloses speech quality measurement using vocal tract analysis and a neural network for producing a reference signal as a replica of the clean input signal.
Speech recognition, speech synthesis and adaptation of the synthesized signal to the voice and other properties of the talker of the degraded signal, performed in order to provide a reference signal for comparison with the degraded speech signal when assessing its quality, are in practice computationally intensive tasks of limited accuracy.
Moreover, it is impossible to reconstruct from the degraded speech signal a reference signal that is equal to the original input speech signal.
Further, the reference signal becomes available only after a delay, which prevents timely feedback for control purposes, that is, for improving speech quality when the assessed quality falls below a set level.