In almost all communication networks carrying speech information there is a possibility that the quality of the speech will be degraded by interference or damage. In the case of digital networks, transmission quality is generally higher than with analogue networks, although network operators are always striving for improvements.
In the case of a Universal Mobile Telecommunications System (UMTS) cellular network, speech data is compressed to conserve bandwidth and is transported end-to-end in a number of different frame structures. The maximum achievable transmission quality is limited by the speech compression algorithm used. However, some further degradation will result from damage to frames as they are transported over the radio leg(s) of the transmission path. Complete frames might also be lost. If the levels of damage and frame loss are significant, end users will perceive reduced speech quality in the received signals.
Traditionally, telephone network operators monitored quality by conducting sample calls and asking participants for their subjective opinion of call quality. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) provides guidelines for performing listening tests in its recommendation P.800. The recommendation specifies the environment and settings in which listening tests should be carried out. By following the guidelines it is possible to obtain comparable results from different test situations. In these tests untrained listeners evaluate the quality of the system under test by opinion rating. Usually, Absolute Category Rating (ACR) is used. ACR requires the listeners to rate the speech quality of the system on a scale of one to five. The average of the ACR ratings (across all listeners) is called the Mean Opinion Score (MOS).
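The MOS calculation described above can be sketched as follows. This is only an illustration: the function name and the panel ratings are hypothetical and are not taken from any real listening test.

```python
from statistics import mean

# ACR listening-quality scale: 1 (bad) to 5 (excellent).
ACR_SCALE = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(ratings):
    """Average the ACR ratings given by all listeners for one test condition."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_SCALE:
            raise ValueError(f"rating {rating!r} is outside the 1..5 ACR scale")
    return mean(ratings)


# Hypothetical ratings from eight listeners for a single speech sample.
panel_ratings = [4, 3, 4, 5, 3, 4, 4, 2]
mos = mean_opinion_score(panel_ratings)
print(f"MOS = {mos:.3f}")  # MOS = 3.625
```

A real test would involve many more listeners and samples, and results are normally reported per test condition.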
Although subjective testing is the most accurate speech quality assessment method, it has its limitations. Performing listening tests is a time-consuming and expensive process and is impractical for widespread use in operational networks. Hence the telecommunications industry has attempted to develop objective and automatic speech quality assessment methods.
Perceptual Evaluation of Speech Quality (PESQ) is an intrusive speech quality assessment algorithm standardised by ITU-T in recommendation P.862. The PESQ algorithm can be used to predict the subjective quality of narrow-band telephony and speech codecs in a variety of test conditions and applications. The PESQ algorithm takes its input samples in linear 16-bit PCM format, sampled at 8 or 16 kHz. The ideal sample length is between 8 and 20 seconds. The algorithm uses a psychoacoustic perceptual model to calculate the difference between a reference speech sample and a degraded sample. The difference between the samples is mapped into a PESQ score, ranging from −0.5 to 4.5. As the MOS scale ranges from 1 to 5, the ITU-T has defined a mapping function which allows PESQ scores to be compared with subjective MOS scores. The PESQ algorithm has demonstrated acceptable accuracy in evaluating speech quality, taking into account the effects of transmission channel errors, transcoding, packet loss and packet loss concealment methods. The correlation between PESQ scores and subjective listening test results has been benchmarked at around 0.935. However, in some circumstances, such as evaluating packet loss with PCM type codecs, the correlation is reduced. Therefore PESQ cannot be used to replace subjective testing completely.
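As an illustration of the mapping step, the sketch below uses the logistic function commonly cited from ITU-T recommendation P.862.1 for converting a raw narrow-band PESQ score into a listening-quality MOS estimate. The function name is hypothetical, and the constants are quoted on the assumption that they match the recommendation; they should be checked against it before being relied upon.

```python
import math


def pesq_to_mos_lqo(raw_score):
    """Map a raw PESQ score (about -0.5 to 4.5) onto a MOS-like scale.

    Assumed form of the P.862.1 mapping:
        y = 0.999 + 4.0 / (1 + exp(-1.4945 * x + 4.6607))
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# Evaluate the mapping over the raw PESQ range in steps of 0.5.
mapped = {x / 2: pesq_to_mos_lqo(x / 2) for x in range(-1, 10)}
for raw, mos_lqo in mapped.items():
    print(f"raw PESQ {raw:+.1f} -> mapped score {mos_lqo:.2f}")
```

Under these constants the mapped score increases monotonically with the raw score and stays inside the 1 to 5 range of the MOS scale, which is what allows the two to be compared.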
The P.563 algorithm is a non-intrusive speech quality assessment method standardised by ITU-T in recommendation P.563. Unlike the PESQ algorithm, the P.563 algorithm does not need a reference sample to evaluate speech quality. Therefore the algorithm can be applied in live networks anywhere in the call chain.
In the case of the recently developed Voice over Internet Protocol (VoIP), objective and automatic speech quality estimation methods have emerged, for example VQMon™ from Telchemy Inc. and PsyVoIP™ from Psytechnics Ltd. Such methods try to estimate how IP network impairments (delay, jitter, packet loss) affect the speech quality of a VoIP call. The impairments are analysed by examining the Real-time Transport Protocol (RTP) frames that carry the speech in VoIP. As the analysis is performed on real traffic, it can run continuously and no separate test calls are necessary.
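To make the kind of impairment analysis described above more concrete, the sketch below derives two simple statistics from observed RTP headers: the interarrival jitter estimate defined in RFC 3550 and a packet loss fraction based on sequence numbers. It does not represent how VQMon or PsyVoIP work internally; the class, the function names and the sample values are hypothetical, and sequence number wrap-around and reordering are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class RtpObservation:
    seq: int            # RTP sequence number
    rtp_timestamp: int  # RTP timestamp from the header, in clock units
    arrival: int        # local arrival time, in the same clock units


def interarrival_jitter(packets):
    """Running jitter estimate as in RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    prev = None
    for pkt in packets:
        if prev is not None:
            # D = change in transit time between consecutive packets.
            d = (pkt.arrival - prev.arrival) - (pkt.rtp_timestamp - prev.rtp_timestamp)
            jitter += (abs(d) - jitter) / 16.0
        prev = pkt
    return jitter


def loss_fraction(packets):
    """Fraction of packets missing between the lowest and highest sequence number."""
    if not packets:
        return 0.0
    seqs = {pkt.seq for pkt in packets}
    expected = max(seqs) - min(seqs) + 1
    return (expected - len(seqs)) / expected


# Hypothetical 8 kHz stream with 20 ms frames (160 clock units per packet).
# Sequence number 3 never arrives and the last packet is 10 units late.
observed = [
    RtpObservation(seq=1, rtp_timestamp=0, arrival=0),
    RtpObservation(seq=2, rtp_timestamp=160, arrival=160),
    RtpObservation(seq=4, rtp_timestamp=480, arrival=480),
    RtpObservation(seq=5, rtp_timestamp=640, arrival=650),
]
jitter = interarrival_jitter(observed)
loss = loss_fraction(observed)
print(f"jitter = {jitter} clock units, loss = {loss:.0%}")
```

Because these statistics are computed from packet headers alone, they can be gathered passively on live calls, which is the property the paragraph above highlights.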