VoIP is a promising technology that is expected to replace traditional telephone networks. Although VoIP is efficient, its speech quality still falls short of what telephone users are accustomed to, because of packet loss, increased perceived echo, excessive delay, and clipping. Network administrators need to maintain a certain level of quality by monitoring the speech quality of live calls, so that corrective action can be taken when needed.
Subjective methods are unsuitable for assessing live calls. Objective intrusive methods require a reference signal and therefore cannot monitor live calls. The only established method suited to this task is the E-model, which estimates speech quality from statistics collected from the network.
The limitations discussed above have led VoIP service providers to implement a variety of techniques to enhance the speech quality of their services. As a result, a large number of providers now offer services at competing prices and with different levels of quality, and the problem has shifted to how the quality of the offered speech can be assessed. Providers need methods to assess the performance of their services and compare it with that of competing providers. Engineers need these methods to evaluate newly developed techniques and compare them with older ones. Network administrators need methods to monitor the quality of speech transmitted through their networks, so that they can take action whenever the quality degrades. Finally, users need these methods to compare the quality offered by different service providers.
Measuring speech quality is currently one of the most important issues in VoIP, and considerable effort has gone into developing measurement methods specifically for VoIP. To measure speech quality correctly, these methods must reflect human perception of speech quality. The most reliable approaches are subjective methods, in which a number of human subjects rate the speech signals. The average of their ratings is calculated and taken as the quality score of the signal.
The best-known subjective measure is MOS (Mean Opinion Score). Developers aim to build objective methods that produce the same quality scores as MOS.
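As a minimal sketch of the averaging step, assuming the five-point Absolute Category Rating scale of ITU-T P.800 (1 = Bad, 5 = Excellent), a MOS for one speech sample can be computed as follows. The function name and the listener panel are hypothetical and only illustrate the arithmetic; a real subjective test also prescribes listening conditions, panel size, and sample preparation.

```python
from statistics import mean

# Five-point ACR listening-quality scale assumed here: 1 = Bad ... 5 = Excellent
ACR_MIN, ACR_MAX = 1, 5


def mean_opinion_score(ratings):
    """Average listener ratings for one speech sample into a MOS.

    `ratings` holds one integer score per listener; values outside the
    1..5 scale are rejected rather than silently clipped.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not ACR_MIN <= r <= ACR_MAX:
            raise ValueError(f"rating {r} is outside {ACR_MIN}..{ACR_MAX}")
    return mean(ratings)


# Hypothetical panel of eight listeners rating one degraded sample
panel = [4, 3, 4, 5, 3, 4, 4, 3]
print(mean_opinion_score(panel))  # 3.75
```

An objective method is judged by how closely its scores track values like this one across many samples.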
Objective methods are those carried out by machines, without human intervention. These methods are necessary for monitoring network performance, since subjective methods cannot be used for this purpose. Most of the available objective methods are intrusive. In these intrusive techniques, a reference signal is injected into the network at one point and received at another. Since the original signal is known, the received, degraded signal can be rated by comparing it with the original. These techniques give relatively good estimates of MOS scores.
The most reliable and widely used of these methods are PAMS (Perceptual Analysis/Measurement System), PSQM (Perceptual Speech Quality Measurement), and PESQ (Perceptual Evaluation of Speech Quality).
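The defining property of the intrusive approach is that the original signal is available for comparison. The sketch below illustrates this with segmental SNR, a classical full-reference measure that is much simpler than PAMS, PSQM, or PESQ and is not equivalent to any of them. The 160-sample frame (20 ms at an assumed 8 kHz sampling rate) and the [-10, 35] dB per-frame clamp are conventional choices assumed here, and the signals are assumed to be already time-aligned.

```python
import math


def segmental_snr(reference, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Full-reference (intrusive) quality measure: mean per-frame SNR in dB.

    Both inputs are time-aligned sample sequences of equal length. Trailing
    samples that do not fill a whole frame are ignored. Each frame's SNR is
    clamped to [floor_db, ceil_db] so silent or error-free frames cannot
    dominate the average.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and equal in length")
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        noise_energy = sum((s - d) ** 2 for s, d in zip(ref, deg))
        if noise_energy == 0:
            snr = ceil_db  # frame received unchanged
        elif signal_energy == 0:
            snr = floor_db  # silent reference frame that picked up noise
        else:
            snr = 10 * math.log10(signal_energy / noise_energy)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


# Hypothetical 440 Hz test tone at 8 kHz and a copy with added noise
ref = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(1600)]
noisy = [s + 0.1 * (-1) ** n for n, s in enumerate(ref)]
print(segmental_snr(ref, ref))  # 35.0
print(segmental_snr(ref, noisy))
```

Without `reference`, the function cannot be evaluated at all, which is exactly why intrusive measures of this kind cannot score a live call.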
Another approach is non-intrusive assessment. In this approach, no reference signal is injected into the network. Instead, the algorithm operates on signals that are present in the network or on statistics collected from the network. The challenge is that the original signal is unknown to the algorithm, so it cannot compare the original and degraded signals to assess the quality of the received signal. Some attempts have been made with this approach, but no robust algorithm has yet been found.
The E-model is currently the leading non-intrusive method. It uses statistics collected from the network during operation, such as the packet loss rate, delay, jitter estimates, and signal-to-noise ratio, to estimate a quality score for the signal. Because the method is statistical in nature, its accuracy is not guaranteed. For example, a high packet loss rate yields a low quality score, yet the actual quality of the signal can remain high if most of the lost packets fall in silence periods. Another non-intrusive method was developed recently. Its idea is to estimate the original speech from the degraded signal and then compare the two signals to assess quality. This method, however, is considered inaccurate and very complex to implement.
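The sketch below shows how the E-model (ITU-T G.107) turns network statistics into a score. It uses a common simplification, R = 93.2 − Id − Ie,eff, where 93.2 is the rating G.107 gives with all parameters at their default values, Ie,eff is the G.107 packet-loss-adjusted equipment impairment, and the delay impairment Id is passed in by the caller rather than computed. The default `bpl=25.1` is an assumed value often cited for G.711 with packet loss concealment; codec-specific values should be taken from ITU-T G.113. This is an approximation for illustration, not a full G.107 implementation.

```python
def effective_equipment_impairment(ie, ppl, bpl, burst_r=1.0):
    """Ie,eff per G.107: codec impairment adjusted for packet loss.

    ie: equipment impairment factor of the codec (0 for G.711)
    ppl: packet loss probability in percent
    bpl: packet-loss robustness factor of the codec
    burst_r: burst ratio (1.0 means random, independent losses)
    """
    return ie + (95 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r):
    """Map the transmission rating factor R to an estimated MOS (G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def e_model_mos(ppl, delay_impairment, ie=0.0, bpl=25.1, r_base=93.2):
    """Simplified E-model estimate: R = r_base - Id - Ie,eff, then R -> MOS.

    Only the loss *rate* enters the calculation; where the lost packets
    fell (speech or silence) is invisible to the model.
    """
    r = r_base - delay_impairment - effective_equipment_impairment(ie, ppl, bpl)
    return r_to_mos(r)


print(round(e_model_mos(ppl=0.0, delay_impairment=0.0), 2))  # 4.41
print(round(e_model_mos(ppl=2.0, delay_impairment=0.0), 2))
```

Because `e_model_mos` receives only `ppl`, two calls with a 2% loss rate receive the same estimate even if, in one of them, every lost packet fell in a silence period. This is the limitation described above.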
There is, therefore, a need for a method that can monitor a live call and determine its quality for the call's whole duration. Such a method is needed to alert network administrators when call quality degrades, so that action can be taken to guarantee acceptable call quality at all times. The method has to assess speech quality by examining only the degraded signal, without any information about the original signal.
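The monitoring requirement can be sketched as a small per-call component that receives one quality score per interval and raises an alert when quality degrades. The class name, the 3.6 MOS threshold, and the window size are hypothetical choices for illustration; the scores are assumed to come from some non-intrusive estimator that is not specified here.

```python
from collections import deque


class CallQualityMonitor:
    """Track per-interval quality scores for one live call (hypothetical).

    Scores are smoothed over a sliding window so a single bad interval does
    not trigger an alert. update() returns True only when the call first
    enters the degraded state, so administrators are not alerted repeatedly
    for the same episode.
    """

    def __init__(self, threshold=3.6, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)
        self.alerting = False

    def update(self, score):
        """Add one interval's score; return True when a new alert starts."""
        self.scores.append(score)
        smoothed = sum(self.scores) / len(self.scores)
        degraded = smoothed < self.threshold
        new_alert = degraded and not self.alerting
        self.alerting = degraded
        return new_alert


monitor = CallQualityMonitor(threshold=3.6, window=3)
for t, s in enumerate([4.2, 4.1, 4.0, 3.2, 3.0, 2.9, 3.9, 4.2, 4.3]):
    if monitor.update(s):
        print(f"interval {t}: quality degraded")
```

With these inputs, a single alert is reported at interval 4, and the monitor re-arms once the smoothed score recovers.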
The reader is assumed to be familiar with the various current VoIP speech quality standards.
There is also a need in the art for an objective, non-intrusive method that assesses speech quality based on characteristics extracted from the degraded signal rather than on statistics collected from the network.