Speech quality is a judgment of a perceived, multidimensional construct that is internal to the listener. It is typically regarded as a mapping between the desired and the observed features of a speech signal. Speech quality assessment may be used to analyze the perceptual effects of various degradations on a speech signal. Such degradations can arise when speech processing systems are deployed in non-ideal operating conditions. The problem is compounded by the increasing complexity and non-linear processing integrated into modern communication systems. In the telecommunications industry, these degradations affect the quality of service of a system. Objective techniques for speech quality assessment may therefore be used, based on customer experience, to optimize network parameters, manage capacity and reduce costs.
The quality of a speech signal (e.g. a voicemail) may be obtained in two ways: in a listening test with a number of human subjects (subjective methods), or algorithmically (objective methods). Because speech quality is a highly subjective measure, a number of techniques for subjective speech quality assessment have been proposed. The International Telecommunication Union (ITU), notably in ITU-T Recommendation P.800, outlines a number of protocols for carrying out subjective quality experiments on various measurement scales. There are broadly two types of subjective test. In absolute rating, subjects rate the absolute quality of a signal. In preference rating, subjects state a preference for one of a pair of signals. A frequently used scale for absolute rating is the 5-point Absolute Category Rating (ACR) listening-quality scale.
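As an illustration of how absolute ratings are commonly summarized, the sketch below averages the ACR scores (5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad) given by several listeners into a mean opinion score (MOS). It also reports an approximate 95% confidence interval. The function name `mean_opinion_score`, the normal-approximation interval and the example ratings are assumptions made for this example. They are not a procedure prescribed by the ITU.

```python
import math
import statistics

# The 5-point ACR listening-quality scale: 5 = Excellent ... 1 = Bad.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper. The interval uses a normal approximation (z = 1.96);
    for small listener panels a Student-t quantile would be more careful.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"invalid ACR rating: {r!r}")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        # The spread cannot be estimated from a single rating.
        return mos, float("nan")
    ci95 = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, ci95


if __name__ == "__main__":
    # Hypothetical ratings from eight listeners for one speech file.
    ratings = [4, 5, 3, 4, 4, 3, 5, 4]
    mos, ci = mean_opinion_score(ratings)
    print(f"MOS = {mos:.2f} ± {ci:.2f}")  # MOS = 4.00 ± 0.52
```

With only eight listeners the interval is wide (about ±0.5 on a 5-point scale). This is one reason subjective tests need many subjects to be reliable.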
Subjective tests can give accurate results for small quantities of data and are generally regarded as giving the true speech quality. However, they are time-consuming and expensive to administer for large amounts of audio. They are therefore unsuitable for real-time (or even near-real-time) applications. Objective methods for speech quality assessment aim to overcome these issues. They model the relationship between the desired and perceived characteristics of the signal algorithmically, without the use of listeners.
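To make the idea of an algorithmic, listener-free measure concrete, the following is a minimal sketch of segmental SNR. This classical intrusive measure compares a degraded signal with its clean reference frame by frame. It is a swapped-in illustration, not a perceptual model: it ignores masking and other properties of hearing. It also assumes that the two signals are already time-aligned and of equal length. The function name, the 256-sample frame length and the -10 dB / 35 dB clamping limits are hypothetical choices for this example.

```python
import math


def segmental_snr(reference, degraded, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR (dB) of `degraded` measured against `reference`.

    Frame length and clamping limits are illustrative defaults (assumptions);
    clamping stops silent or near-perfect frames from dominating the mean.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")
    eps = 1e-12  # avoids division by zero and the log of zero
    frame_scores = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, deg))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        frame_scores.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_scores:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_scores) / len(frame_scores)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate (Hz)
    clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    attenuated = [0.5 * x for x in clean]  # error equals half the signal
    print(round(segmental_snr(clean, clean), 2))       # 35.0 (clamped)
    print(round(segmental_snr(clean, attenuated), 2))  # 6.02
```

Because the measure compares waveforms directly, a pure gain change scores only about 6 dB, even though a listener might judge the two signals to be of nearly equal quality. Perceptual objective models are designed to avoid this kind of mismatch between signal fidelity and perceived quality.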