1. Field of the Invention
The present invention relates to assessing voice quality in a telecommunication system.
2. Related Art
Modern telecommunication systems, including VoIP networks, use a multitude of telecommunication technologies, which include packetization, echo cancellation, speech coding, noise reduction, automatic gain control (AGC), voice activity detection (VAD), comfort noise generation (CNG), packet loss concealment (PLC), jitter buffers, etc. All of these technologies contribute significantly to the degradation of the voice signal transmitted over VoIP networks, and consequently, to the degradation of conversational quality.
FIG. 1 illustrates conventional telecommunication system 100 utilizing packet network 130. As shown, telephone 110 is in communication with gateway 120 that is typically located at a central office. Similarly, telephone 150 is in communication with gateway 140 that is typically located at a central office. Gateways 120 and 140 are in turn in communication with each other over packet network 130. Each gateway 120 or 140 receives an analog voice signal from its local telephone 110 or 150, respectively, digitizes the analog voice, encodes the digitized voice and packetizes the encoded data for transmission over packet network 130 to the other gateway 140 or 120, respectively. In turn, the other gateway 140 or 120 performs the tasks of depacketizing and decoding the data received over the packet network for transmission of the analog voice signal to its local telephone 150 or 110, respectively.
For example, in the process of transmitting the speech signal from one side to another, modern telecommunication networks add significant transmission delays that are typically caused by digitization and packetization of the speech signal, and which include signal processing delay, routing delay, packet loss, jitter delay, etc. As these transmission delays increase, they interfere with normal and natural conversational patterns. This degradation goes beyond traditional voice signal quality, which is not affected by delay. Rather, the increased delay significantly impacts conversational effort, ease and satisfaction. The same is true of other voice technology components used in communication systems. As further examples, noise reduction, automatic gain control, comfort noise generation and echo cancellation technologies add their own degradation to the speech signal. These degradations, in turn, impact conversational quality, effort and user satisfaction in these telecommunication systems.
The current practice in assessing voice quality in the telecommunication network is confined to estimating the voice signal quality. These current techniques, however, do not include any metrics or models for quantifying the effects of delay and other communication impairments on the ease and naturalness of conversations.
Conventional voice quality assessment systems predict and monitor one-way voice quality using conventional models, which are typically referred to as Objective Listening Quality (OLQ) models or simply Voice Quality Models, such as E-Model, VQMON, PsyVoIP and PESQ. Presently, a number of parties are also in pursuit of a conversational quality measurement model, which is reflected in the activities of Study Group 12 (SG12) of the International Telecommunication Union (ITU-T).
The E-Model is a 1998 ITU-T standard, referred to as G.107. It is a widely employed opinion model and has been endorsed by ETSI and TIA. E-Model is a network-planning model, which predicts what the voice quality would be by making several assumptions about the network, the terminals used and the usage scenario. E-Model uses several parameters to estimate the voice quality before a call is made. The estimated voice quality helps the network transmission planner determine what equipment and technologies to deploy for the call. This model does not actually monitor calls in progress to determine the voice quality of a given call. Therefore, E-Model is not an in-service non-intrusive monitoring device (INMD); it is merely a planning device. Further, this model is confined to narrow-band telephony (300 Hz-3400 Hz) and includes a limited set of voice technologies, such as narrow-band speech codecs, round-trip delays below 600 ms, bit errors, packet loss, and limited levels of residual echo. However, E-Model fails to include the effects of a number of significant voice technologies, such as wideband telephony (for example, 50 Hz-7000 Hz bandwidth), hands-free communications (such as speaker phones), multi-party conversations (conferencing), round-trip delays of greater than 600 ms, noise reduction systems, the more annoying effects of residual echoes, etc. Moreover, E-Model does not measure the actual conversational patterns in predicting voice quality; it only computes an estimated conversational quality (CQE) due to the effects of the limited set of voice technologies incorporated in that model.
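As a rough illustration of how a planning model of this kind turns assumed network parameters into a predicted opinion score, the following Python sketch applies the commonly cited G.107 mapping from a transmission rating R to an estimated mean opinion score, together with the G.107 expression for the impairment caused by long absolute delay. The function names, the default rating of 93.2, and the simplified relation R = R_default - Id - Ie_eff are assumptions made for this example. They are not taken from the present description, and the sketch covers only a small part of the full E-Model (note also that the delay term takes one-way delay as its input).

```python
import math

# Assumed default rating when all E-Model inputs take their default values.
DEFAULT_R = 93.2


def delay_impairment(one_way_delay_ms: float) -> float:
    """Impairment from long absolute one-way delay (G.107 Idd term)."""
    if one_way_delay_ms <= 100.0:
        return 0.0
    x = math.log2(one_way_delay_ms / 100.0)
    return 25.0 * ((1.0 + x ** 6) ** (1.0 / 6.0)
                   - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
                   + 2.0)


def r_to_mos(r: float) -> float:
    """Map a transmission rating R to an estimated mean opinion score."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def planned_mos(one_way_delay_ms: float, codec_impairment: float = 0.0) -> float:
    """Simplified planning estimate: default rating minus two impairments.

    codec_impairment stands in for the effective equipment impairment
    (codec and packet loss); its value here is a hypothetical input.
    """
    r = DEFAULT_R - delay_impairment(one_way_delay_ms) - codec_impairment
    return r_to_mos(r)


if __name__ == "__main__":
    for delay in (50, 150, 300, 500):
        print(f"one-way delay {delay} ms -> estimated MOS {planned_mos(delay):.2f}")
```

The estimate is computed entirely from assumed parameters, before any call exists, which is why such a model serves planning rather than in-service monitoring.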
VQMON and PsyVoIP are two other models for monitoring voice quality. They are real-time voice quality monitoring models, or in-service non-intrusive monitoring devices (INMDs), and are strictly Objective Listening Quality (OLQ) models, as they measure only the one-way voice quality. PsyVoIP is a proprietary model from PsyTechnics, a U.K. company, and VQMON is a proprietary model from Telchemy, a U.S. company. Both of these models use only packet-layer information and not the true speech signal in the actual payload. Hence, they are referred to as packet-based Voice Transmission Quality (VTQ) models. Using information contained at the packet layer, they compute the one-way voice quality on a real-time basis. These models include the effects of some voice technologies, such as narrow-band speech codecs, packet delay, packet jitter, bit errors, packet loss rate, packet loss pattern, etc. However, both models fail to include the effects of a number of significant voice technologies, such as wideband telephony (for example, 50 Hz-7000 Hz bandwidth), hands-free communications (such as speaker phones), multi-party conversations (conferencing), round-trip delays, noise reduction systems, effects of residual echoes and echo cancellers, etc. Moreover, these models do not predict total conversational voice quality; they merely predict a one-way voice quality. Additionally, these models do not utilize actual conversational parameters and patterns in predicting voice quality.
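To make concrete what "packet-layer information" can mean, the following Python sketch derives two such quantities, a packet loss rate and a smoothed interarrival jitter, from packet headers and arrival times alone, without touching the speech payload. The jitter estimator follows the running-average form defined for RTP in RFC 3550. It is offered only as an assumption-laden illustration. The internals of VQMON and PsyVoIP are proprietary and are not described here, and the function names and input layout are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def packet_loss_rate(sequence_numbers: Sequence[int]) -> float:
    """Fraction of packets missing between the lowest and highest
    sequence numbers seen (sequence wraparound is ignored for brevity)."""
    if not sequence_numbers:
        return 0.0
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    received = len(set(sequence_numbers))
    return 1.0 - received / expected


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Smoothed interarrival jitter in the style of RFC 3550.

    Each packet is a (send_timestamp, arrival_time) pair in the same
    time units. For consecutive packets, D is the change in transit
    time, and the estimate is updated as J += (|D| - J) / 16.
    """
    jitter = 0.0
    previous = None
    for send_ts, arrival in packets:
        if previous is not None:
            prev_send, prev_arrival = previous
            d = (arrival - prev_arrival) - (send_ts - prev_send)
            jitter += (abs(d) - jitter) / 16.0
        previous = (send_ts, arrival)
    return jitter


if __name__ == "__main__":
    seqs = [1, 2, 4, 5]                      # packet 3 never arrived
    trace = [(0, 0), (20, 20), (40, 45)]     # third packet arrives 5 units late
    print("loss rate:", packet_loss_rate(seqs))
    print("jitter:", interarrival_jitter(trace))
```

Both quantities describe one direction of transmission only, consistent with the one-way nature of the packet-based models discussed above.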
The fourth model is the ITU-T P.862 standard, entitled “Perceptual Evaluation of Speech Quality (PESQ).” The PESQ model is not an in-service non-intrusive measurement device, because it does not measure or monitor real-time voice quality on a per-call basis; it is merely a Listening Quality (LQ) model. Moreover, PESQ is an intrusive technique, which requires the injection of a reference test signal and then compares the degraded output speech with the pristine input reference signal. As with the models above, the relevance of this model is confined to narrow-band telephony (300 Hz-3400 Hz), and it includes a limited set of voice technologies, such as narrow-band speech codecs, bit errors, packet loss, VAD, and jitter. The PESQ model fails to include the effects of a number of significant voice technologies, such as extended wideband telephony (for example, 50 Hz-14000 Hz bandwidth), hands-free communications (such as speaker phones), multi-party conversations (conferencing), round-trip delays, noise reduction systems, effects of residual echoes and echo cancellers, etc. Further, the PESQ model does not predict conversational voice quality; it merely predicts one-way voice quality, and it does not utilize actual conversational parameters and patterns in predicting voice quality.
However, conversations, by definition, are multi-way communications in which parties both talk and listen, which is what most users do when using telecommunication systems. The current models in practice merely capture the effects of one party talking and the other party listening passively. Hence, the existing models are referred to as Listening Quality (LQ) models. While this is a very useful first step, it does not capture the true conversational ease or the user's satisfaction or dissatisfaction. Having a model by which one can predict and monitor the effects of delay (and other technological components in a network) on the conversational quality is of paramount benefit to network service providers, operators and technology designers.