Communications traffic is increasingly carried over computer networks, and Voice over Internet Protocol (VoIP) applications have become popular on both the public Internet and enterprise networks. Because IP networks do not guarantee end-to-end delay, packet loss rate, jitter, or available bandwidth, monitoring and estimating VoIP call quality under prevailing network conditions is essential to mitigating problems that can significantly reduce the Quality of Experience (QoE) perceived by end users.
In a VoIP application, voice or video is digitized and packetized at a sender before being transmitted over the IP network to a receiver. At the receiver, the packets are decoded and played out to a listener. The conversion of the analog voice signal to digital form is performed by a codec. Codecs vary in required bandwidth, latency, sample period, frame size, and the maximum perceived quality they can achieve for the end user; thus, different codecs are better suited to different network conditions. The International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) outlines two test methods to assess QoE: subjective testing and objective testing. Subjective testing, which assigns Mean Opinion Scores (MOS), was the earliest approach to evaluating quality.
ITU-T Recommendation P.800 presents the MOS subjective test procedures for audio quality testing. A test usually involves 12 to 24 participants; each participant individually listens to an audio stream for several seconds and rates the audio quality on a scale of 1 (bad) to 5 (excellent). Similarly, Recommendation BT.500 of the International Telecommunication Union's Radiocommunication Sector (ITU-R) presents a methodology for obtaining MOS values for video quality. Subjective testing using MOS is time-consuming and expensive, and it does not allow real-time measurement.
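As a small illustration of the arithmetic behind a subjective score, the sketch below averages a panel's ratings into a MOS. The function name `mean_opinion_score` and the sample ratings are hypothetical, and the sketch assumes whole-number ratings on the five-point scale described above; it is not the full P.800 procedure, which also governs listening conditions and material.

```python
from statistics import mean, stdev

def mean_opinion_score(ratings):
    """Return (MOS, sample standard deviation) for a list of 1..5 ratings.

    Assumes integer ratings on a five-point scale; names are illustrative.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for rating in ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {rating!r}")
    # The MOS is the arithmetic mean of the individual opinion scores.
    return mean(ratings), stdev(ratings)

# Hypothetical ratings from a 12-person panel.
panel = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4, 4, 4]
mos, spread = mean_opinion_score(panel)
print(f"MOS = {mos:.2f}, standard deviation = {spread:.2f}")
```

Reporting the spread alongside the mean gives a rough sense of how much the panel disagreed, which matters with panels this small.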
Several techniques have been developed for estimating MOS objectively, i.e., without human listeners. The first technique is an online method that locally monitors network characteristics at the sender to estimate call quality. Typically, several network impairment factors are monitored and fed into a computational model that produces a single metric indicating the quality of the call in progress. Because it is based mainly on monitoring QoS factors and converting them into a single metric, the first technique can be applied regardless of network conditions. Techniques of this type are currently used in industry and research as live voice quality measurement tools. However, the first technique is considered inaccurate, because its output is regarded only as an estimate for transmission planning purposes and not as a prediction of actual customer opinion.
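The text does not name a specific model for the first technique. One widely known model of this kind is the ITU-T G.107 E-model, which combines impairments into a rating factor R and then maps R to an estimated MOS. The sketch below is a minimal illustration under stated assumptions: the R-to-MOS mapping and the default rating of 93.2 follow G.107, while the delay and loss impairment formulas and their coefficients are an assumed simplification for a G.711-like codec, not the full E-model. All function names are hypothetical.

```python
import math

# G.107 yields R = 93.2 when every input parameter is left at its default.
DEFAULT_R = 93.2

def delay_impairment(one_way_delay_ms):
    """Assumed simplified delay impairment (illustrative coefficients)."""
    # Delay beyond roughly 177 ms is penalised more steeply.
    excess = max(0.0, one_way_delay_ms - 177.3)
    return 0.024 * one_way_delay_ms + 0.11 * excess

def loss_impairment(loss_rate, gamma1=0.0, gamma2=30.0, gamma3=15.0):
    """Assumed logarithmic loss impairment; loss_rate is a fraction in [0, 1]."""
    return gamma1 + gamma2 * math.log(1.0 + gamma3 * loss_rate)

def r_factor(one_way_delay_ms, loss_rate):
    """Subtract the monitored impairments from the default rating."""
    return (DEFAULT_R
            - delay_impairment(one_way_delay_ms)
            - loss_impairment(loss_rate))

def mos_from_r(r):
    """Map the rating factor R to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

def estimate_mos(one_way_delay_ms, loss_rate):
    """Single call-quality metric from two monitored network factors."""
    return mos_from_r(r_factor(one_way_delay_ms, loss_rate))

# Hypothetical measurements: a good path and a congested path.
for delay_ms, loss in [(50, 0.0), (300, 0.05)]:
    print(f"delay={delay_ms} ms, loss={loss:.0%} -> "
          f"estimated MOS {estimate_mos(delay_ms, loss):.2f}")
```

Because the inputs are measured locally and the computation is cheap, an estimate like this can be refreshed continuously during a call, which is what makes the approach suitable for live monitoring despite its limited accuracy.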
The second technique applies an offline method in a non-intrusive way: the audio is recorded at both the sender and the receiver, the receiver sends its recording to the sender, and the sender compares the two recordings using one of the intrusive algorithms, e.g., PSQM (Perceptual Speech Quality Measure), PESQ (Perceptual Evaluation of Speech Quality), or POLQA (Perceptual Objective Listening Quality Assessment). Current research also takes delay into account in the second technique. The second technique offers a high level of accuracy and is considered the approved industry method for measuring VoIP call quality; however, monitoring a call in this way incurs bandwidth overhead, because the receiver must send the recorded audio file to the sender at regular intervals.
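The size of this overhead can be estimated with simple arithmetic. The sketch below assumes the recording is sent as uncompressed PCM and ignores packet headers and any compression of the file. The function name, clip length, reporting interval, and sampling parameters are all hypothetical values chosen for illustration.

```python
def recording_overhead_bps(clip_seconds, interval_seconds,
                           sample_rate_hz=8000, bits_per_sample=16,
                           channels=1):
    """Average extra bit rate from sending one recorded clip per interval.

    Assumes uncompressed PCM and ignores packet and protocol headers.
    """
    if clip_seconds < 0 or interval_seconds <= 0:
        raise ValueError("clip must be non-negative and interval positive")
    clip_bits = clip_seconds * sample_rate_hz * bits_per_sample * channels
    return clip_bits / interval_seconds

# Hypothetical setup: an 8-second narrowband clip returned once a minute,
# compared against a 64 kbit/s voice stream.
overhead = recording_overhead_bps(clip_seconds=8, interval_seconds=60)
voice_stream_bps = 64_000
print(f"overhead: {overhead / 1000:.1f} kbit/s "
      f"({overhead / voice_stream_bps:.0%} of the voice stream)")
```

Shortening the interval improves how quickly quality changes are detected but raises the overhead proportionally, which is the trade-off the paragraph above describes.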