Over the past decades, objective speech quality measurement methods have been developed and deployed using a perceptual measurement approach. In this approach, a perception-based algorithm simulates the behaviour of a subject who rates the quality of an audio fragment in a listening test. Speech quality is mostly assessed with the so-called absolute category rating (ACR) listening test, in which subjects judge the quality of a degraded speech fragment without having access to the clean reference speech fragment. Listening tests carried out within the International Telecommunication Union (ITU) mostly use an ACR 5-point opinion scale, which is consequently also used in the objective speech quality measurement methods standardized by the ITU: Perceptual Speech Quality Measure (PSQM, ITU-T Rec. P.861, 1996) and its successor, Perceptual Evaluation of Speech Quality (PESQ, ITU-T Rec. P.862, 2000). These measurement standards focus on narrowband speech quality (audio bandwidth 100-3500 Hz), although a wideband extension (50-7000 Hz) was devised in 2005. PESQ correlates very well with subjective listening tests on narrowband speech data and acceptably on wideband data.
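As a minimal illustration of how ACR listening-test results are typically compared with objective scores, the following Python sketch averages 5-point ACR votes into a mean opinion score per fragment and computes the Pearson correlation with the scores of an objective model. The function names, the votes, and the objective scores are all hypothetical and chosen for illustration only; the sketch does not reproduce the evaluation procedure of any ITU-T Recommendation, which may include additional steps such as score mapping.

```python
from statistics import mean

# ACR 5-point opinion scale (labels follow common listening-test practice).
ACR_SCALE = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(votes):
    """Average the ACR votes given by all subjects to one degraded fragment."""
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v not in ACR_SCALE for v in votes):
        raise ValueError("votes must be integers from 1 to 5")
    return mean(votes)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with >= 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: votes of four subjects for three degraded fragments,
# and the scores an assumed objective model produced for the same fragments.
subjective_votes = [[4, 5, 4, 4], [2, 3, 2, 2], [3, 3, 4, 3]]
objective_scores = [4.1, 2.3, 3.2]

mos = [mean_opinion_score(v) for v in subjective_votes]
correlation = pearson(mos, objective_scores)
print(mos, correlation)
```

A correlation close to 1 indicates that the objective model ranks and spaces the fragments much as the listening panel did; it says nothing about whether the two sets of scores lie on the same absolute scale.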
As the telecommunication industry rolls out new wideband voice services, the need has emerged for an advanced measurement standard of verified performance that is capable of handling higher audio bandwidths. ITU-T (the ITU Telecommunication Standardization Sector) Study Group 12 therefore initiated the standardization of a new speech quality assessment algorithm as a technology update of PESQ. The new, third-generation measurement standard, POLQA (Perceptual Objective Listening Quality Assessment), overcomes shortcomings of the PESQ P.862 standard, such as incorrect assessment of the impact of linear frequency response distortions, time stretching/compression as found in Voice-over-IP, certain types of codec distortions, and reverberation.
Although POLQA (P.863) provides a number of improvements over the earlier quality assessment algorithms PSQM (P.861) and PESQ (P.862), the present versions of POLQA, like PSQM and PESQ, fail to address an elementary subjective perceptual quality condition, namely intelligibility. Although it also depends on a number of audio quality parameters, intelligibility is more closely related to information transfer than to the quality of sound. Because intelligibility differs in nature from sound quality, the quality assessment algorithms yield an evaluation score that does not match the score that would have been assigned had the speech signal been evaluated by a person or an audience. With the objective of information sharing in mind, a human being will value an intelligible speech signal above a signal that is less intelligible but similar in terms of sound quality.
Although much progress has been made, the present models still unexpectedly fail, in a number of cases, to correctly predict human intelligibility evaluation scores.