The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Internet Protocol (IP) telephony is a technology that is being widely implemented and is gaining broad acceptance. However, transmitting voice over a network infrastructure that was developed and built for delay-insensitive applications, such as file transfers, poses various challenges. IP telephony systems do, at times, experience voice quality and performance problems; jitter and latency, for example, are common problems encountered with IP calls. Modern network infrastructures provide Quality of Service (QoS) mechanisms that allow priority to be allocated to voice traffic, but the quality of the voice received by a call participant continues to be a primary issue with IP telephony.
Most applications that attempt to analyze the quality of voice over an IP telephony network consider latency and jitter and, from these parameters, compute a prediction of the level of voice quality degradation. However, predictions based on such parameters have limited value because, ultimately, voice quality is a matter of a user's perception of the audible sounds received in a call. Latency and jitter are useful metrics, but they do not capture the voice quality as perceived by a user.
Perceptual algorithms that model the human ear already exist. Such perceptual measures typically compute a number referred to as the Mean Opinion Score (MOS), which is used to categorize how the human ear perceives voice quality. Several techniques are used to measure perceptual voice quality, such as PESQ (Perceptual Evaluation of Speech Quality), PSQM (Perceptual Speech Quality Measurement) and PAMS (Perceptual Analysis Measurement System). However, each of these techniques operates on an end-to-end basis. A reference waveform signal is generated at one end of the network and transmitted to the other end. At the receiving endpoint, the two waveforms, namely the reference waveform and the degraded waveform, are compared to measure the degradation or distortion imparted to the signal by the network infrastructure.
The PESQ technique is described in a document entitled “Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs” (ITU-T Recommendation P.862), which is available from the International Telecommunication Union, and is hereby incorporated by reference in its entirety for all purposes as if fully set forth herein.
Because the foregoing techniques operate on an end-to-end basis, they cannot determine the precise location within the network at which speech degradation occurs, such as a specific hop or a specific component. Such techniques simply measure the perceptual degradation over the entire transmission path of the waveform signal. Hence, locating the true source of the degradation remains a challenge and, consequently, eliminating the degradation remains an even greater challenge. As a result, valuable time and resources are expended in attempting to locate the source of the degradation and to fix the problem that causes it.
Based on the foregoing, there is a clear need for a technique for determining the source of perceptual speech degradation in an IP telephony environment.