Voice-over-IP (VoIP) is a technique for delivering voice information over a network that employs the Internet Protocol (IP). Such a network, called a VoIP network, transmits voice information digitally in the form of voice packets. A VoIP network differs from a public switched telephone network (PSTN), which transmits voice signals as a stream of analog signals. Carriers that operate on the PSTN generally include the IXC (Interexchange Carrier), the LEC (Local Exchange Carrier), and the C-LEC (Competitive Local Exchange Carrier), whose networks transmit analog voice signals in a manner different from IP.
Although a VoIP network differs from a PSTN in many respects, a phone call that originates from a PSTN can be sent over the VoIP network to a computer. Conversely, voice packets originating from the Internet can be sent over the VoIP network to reach a telephone on a PSTN. For example, an Internet Telephony Service Provider (ITSP) network is a VoIP network. The ITSP network is built on the physical infrastructure of the Internet, and further includes gateways that perform the conversions needed to transmit calls between a PSTN and the Internet. Each gateway includes conversion circuits for performing analog-to-digital and digital-to-analog conversions, as well as the appropriate protocol conversions.
For voice packets received from the Internet, the gateway converts them into analog signals, and sends the analog signals to the PSTN. The gateway also converts analog signals coming from the PSTN into voice packets. The gateway performs the conversions in both directions at the same time, allowing a full-duplex (two-way) conversation to take place between users connected to either the Internet or the PSTN.
Compared to transmissions of data packets, voice transmissions are more susceptible to delays and to variations in those delays. The delay variation, also called jitter, can greatly distort voice signals and render them unrecognizable to a user. Therefore, maintaining a Quality of Service (QoS) acceptable to a user is an important issue in voice transmissions. Furthermore, because IP is a “best effort” protocol that generally does not guarantee QoS, there is no assurance of the quality of voice transmissions over a VoIP network. Some VoIP networks therefore carry voice packets using the Real-time Transport Protocol (RTP), which runs on top of IP (typically over UDP) and supports timely delivery of the voice packets. RTP provides end-to-end delivery services for real-time audio and video, including sequence numbers and timestamps that let a receiver detect lost packets and compensate for jitter; RTP does not by itself guarantee QoS.
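As an illustration of how delay variation can be quantified, the following minimal sketch computes a running jitter estimate in the style of the interarrival jitter estimator described in RFC 3550 for RTP. The function name and the use of millisecond timestamps are assumptions made for this example; an actual RTP implementation works in RTP timestamp units.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Return a smoothed jitter estimate for a sequence of packets.

    send_times_ms and recv_times_ms are hypothetical per-packet send and
    receive times in milliseconds, listed in the same packet order.
    """
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("need one receive time per send time")

    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D is the change in transit time between consecutive packets.
        recv_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        d = recv_gap - send_gap
        # Exponential smoothing with gain 1/16, as in RFC 3550.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    # Packets sent every 20 ms; the second one arrives 5 ms late.
    sent = [0, 20, 40, 60]
    received = [50, 75, 90, 110]
    print(interarrival_jitter(sent, received))
```

A constant transit delay yields an estimate of zero, because only changes in the delay contribute; a late packet raises the estimate, which then decays as subsequent packets arrive on schedule.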
Quality of voice transmissions can be determined using conventional measurements for data transmissions, such as distortion, packet loss, and signal-to-noise ratio. However, voice transmissions differ from data transmissions in many respects. One of the most distinctive is the subjectivity of voice quality. Standards have therefore been developed to measure the quality of voice transmissions from the perspective of a listener. The standards include Perceptual Speech Quality Measure (PSQM) and Perceptual Analysis-Measurement System (PAMS).
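Two of the conventional measurements mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, packets are identified by integer sequence numbers that do not wrap around, and the reference and degraded signals are already time-aligned sample lists of equal length.

```python
import math


def packet_loss_ratio(received_seq, first_seq, last_seq):
    """Fraction of packets in [first_seq, last_seq] that never arrived."""
    expected = last_seq - first_seq + 1
    if expected <= 0:
        raise ValueError("last_seq must not be smaller than first_seq")
    # Count each sequence number once, so duplicates are not double-counted.
    arrived = {s for s in received_seq if first_seq <= s <= last_seq}
    return (expected - len(arrived)) / expected


def snr_db(reference, degraded):
    """Signal-to-noise ratio in decibels, treating the difference as noise."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same number of samples")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    print(packet_loss_ratio([1, 2, 4, 5], first_seq=1, last_seq=5))
    print(snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -0.9, 1.1, -0.9]))
```

Both figures are purely physical measures: neither accounts for how a listener perceives the degradation, which is the gap that the perceptual standards are intended to fill.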
PSQM is an ITU standard that defines an algorithm for estimating the subjective quality of voice-band speech codecs (coder-decoders). PSQM serves as an objective counterpart to the Mean Opinion Score (MOS), a subjective rating that has been widely used to evaluate vocoders (voice coders). Scores produced by the PSQM algorithm range on a scale from 1 (ideal) to 5 (poor). PSQM scores can be converted to MOS scores by a standard formula.
The PSQM algorithm measures distortions of a speech signal when transmitted through various codecs and transmission media. It can effectively measure voice quality on IP networks and wireless networks. Unlike measurement of signal-to-noise ratios, the PSQM algorithm measures distortions in an internal psycho-acoustic domain to mimic the sound perception of people (e.g., phone users) in real-life situations, so that the measured distortions can be correlated with human perceptions. The PSQM algorithm converts signals in a physical domain into the perceptually meaningful psycho-acoustic domain through a series of nonlinear processes. The processes generally include time-frequency mapping, frequency warping, intensity warping, loudness scaling, asymmetric masking, cognitive modeling, and so forth.
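To give a sense of what frequency warping involves, the sketch below maps frequencies in hertz onto the Bark critical-band scale and groups spectral power into one-Bark-wide bands. It is an illustration only: it uses Traunmüller's published approximation of the Bark scale, which is an assumption of this example and not the warping function specified by PSQM, and the function names are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate Bark value for a frequency in Hz (Traunmueller's formula)."""
    if freq_hz < 0:
        raise ValueError("frequency must be non-negative")
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_band_energies(freqs_hz, powers):
    """Sum spectral power into bands that are each one Bark wide.

    freqs_hz and powers are parallel lists, for example the bin centre
    frequencies and power values of a short-time spectrum.
    """
    bands = {}
    for freq, power in zip(freqs_hz, powers):
        # Clamp to band 0, since the approximation dips below zero near 0 Hz.
        band = max(0, math.floor(hz_to_bark(freq)))
        bands[band] = bands.get(band, 0.0) + power
    return bands


if __name__ == "__main__":
    # Equal 100 Hz steps cover more of the Bark scale at low frequencies.
    print(hz_to_bark(200) - hz_to_bark(100))
    print(hz_to_bark(3200) - hz_to_bark(3100))
    print(bark_band_energies([100, 150, 1000], [1.0, 2.0, 4.0]))
```

The warped scale gives low frequencies finer resolution than high frequencies, which reflects the frequency resolution of human hearing and is one reason distortions measured in this domain correlate better with perception than distortions measured on a linear frequency axis.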
Another important standard for measuring the clarity of voice signals as perceived by a human is Perceptual Analysis-Measurement System (PAMS). PAMS uses a perceptual model similar to that of PSQM to provide a repeatable, objective means for measuring perceived voice quality. PAMS uses a signal processing model to produce several types of scores, including a “listening-quality” score and a “listening-effort” score, both of which are on the same 1-to-5 scale as MOS scores and can be converted to them.
Voice quality on an IP network can be determined using the above standards together with the conventional measurements for data transmissions. Deterioration of voice quality is often an indicator of a problem in the network. Frequently, the problem in the network is in a hardware module or a software client along a transmission path of the voice signals.