The Public Switched Telephone Network (PSTN) is a collection of interconnected voice-oriented public telephone networks. The PSTN is sometimes referred to as the Plain Old Telephone Service (POTS). Originally, the PSTN was a network of fixed-line analog telephone systems; however, the PSTN is now almost entirely digital and includes mobile as well as fixed (also referred to as land-line) telephones. The basic digital circuit in the PSTN is a 64-kilobit-per-second channel known as Digital Signal 0 (DS0). To carry a typical phone call, the audio is digitized at an 8 kHz sample rate using 8-bit Pulse Code Modulation (PCM), which yields the 64 kbit/s rate of a DS0.
Multiple DS0s are multiplexed together on higher capacity circuits, such that 24 DS0s make a single DS1 signal, which when carried on copper is known as T1. The European equivalent is known as an E1 and contains 32 of the 64 kbit/s channels. In conventional networks, this multiplexing is moved as close to the end user as possible.
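The channel arithmetic described above can be checked with a short sketch. The constant and function names are illustrative only. The 8 kbit/s T1 framing overhead is not stated in the text; it is added here as the commonly cited figure that brings the 24-channel payload up to the 1.544 Mbit/s T1 line rate.

```python
# Illustrative arithmetic only; all names here are hypothetical.
SAMPLE_RATE_HZ = 8_000      # PCM sample rate for a phone call
BITS_PER_SAMPLE = 8         # 8-bit PCM

# One DS0 carries one digitized voice channel.
DS0_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

T1_CHANNELS = 24            # DS0s per DS1/T1
E1_CHANNELS = 32            # 64 kbit/s channels per E1

# Assumption (not in the text): T1 adds one framing bit per frame,
# at 8000 frames per second, on top of the channel payload.
T1_FRAMING_BPS = 8_000


def payload_rate_bps(channels: int) -> int:
    """Aggregate rate of `channels` multiplexed DS0s, excluding framing."""
    return channels * DS0_BPS


t1_line_rate_bps = payload_rate_bps(T1_CHANNELS) + T1_FRAMING_BPS
e1_line_rate_bps = payload_rate_bps(E1_CHANNELS)

print(DS0_BPS, t1_line_rate_bps, e1_line_rate_bps)
```

Under these assumptions, the E1 figure needs no separate framing term because the E1 framing and signaling are carried inside the 32 channels themselves.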
Another network used to carry voice data is known as Voice Over Internet Protocol (VOIP). VOIP allows users to send and receive voice and data information over a combination of a phone network and a digital communications network. A conventional VOIP network includes two gateways with a packet network between the gateways. A gateway is used to convert voice streams carried by conventional equipment into data packets and also to convert data packets into voice. A gateway is equipped with standard interfaces to the PSTN or Private Branch eXchange (PBX) as well as interfaces to a packet network. The necessary encoding/decoding, compression/decompression, voice activity detection/comfort noise generation and packetizing/depacketizing are performed by the gateway. The processing of a voice signal into the format necessary for transport over a packet network is performed by the encoding/decoding subsystem within the gateway also known as a vocoder or alternatively as a codec. In a conventional scenario wherein a first gateway is connected between a PSTN and a packet network and a second gateway is coupled between the packet network and a PBX, the output of the first gateway comprises packetized data, suitable for transmission across the packet network. The second gateway receives the packet data on the packet network. The vocoder within the second gateway depacketizes, decompresses and decodes the packet data into a voice signal.
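The packetizing and depacketizing roles of the two gateways can be sketched as follows. This is a minimal illustration, not an actual gateway implementation: the 160-byte frame size (20 ms of 8 kHz, 8-bit audio), the two-byte sequence-number header, and the function names are all assumptions made for the example, and compression, voice activity detection and comfort noise generation are omitted.

```python
import struct
from typing import List

# Assumed frame size: 20 ms of 8 kHz, 8-bit PCM = 160 bytes.
FRAME_BYTES = 160
# Hypothetical header: one 16-bit sequence number in network byte order.
HEADER = struct.Struct("!H")


def packetize(pcm: bytes) -> List[bytes]:
    """First gateway: split a PCM byte stream into numbered packets."""
    packets = []
    for seq, start in enumerate(range(0, len(pcm), FRAME_BYTES)):
        payload = pcm[start:start + FRAME_BYTES]
        packets.append(HEADER.pack(seq & 0xFFFF) + payload)
    return packets


def depacketize(packets: List[bytes]) -> bytes:
    """Second gateway: reorder by sequence number and rebuild the stream.

    Sequence-number wraparound is ignored in this sketch.
    """
    ordered = sorted(packets, key=lambda p: HEADER.unpack_from(p)[0])
    return b"".join(p[HEADER.size:] for p in ordered)


if __name__ == "__main__":
    voice = bytes(range(256)) * 5          # 1280 bytes of stand-in audio
    sent = packetize(voice)
    received = list(reversed(sent))        # simulate out-of-order arrival
    print(len(sent), depacketize(received) == voice)
```

The sequence number is included because a packet network may deliver packets out of order, so the receiving gateway needs some way to restore the original order before decoding.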
Yet another network type used to carry voice data is a cellular telephony network. In a cellular telephony network a portable telephone, referred to as a cell phone, sends and receives messages through a cell site or cell tower. Radio waves are used to transfer signals to and from the cell phone. Each cell site (or more simply “cell”) typically has a range of approximately 3-15 miles and overlaps other cells. All of the cells are connected to one or more cellular switching exchanges which can detect the strength of the signal received from the cell phone.
As the telephone user moves from one cell area to another, the exchange automatically commands the cell phone, and the cell site that is receiving a stronger signal from the cell phone, to go to a new radio channel. When the cell phone responds through the new cell site, the exchange switches the connection to the new cell site.
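The handoff decision made by the exchange can be sketched as a comparison of the signal strengths reported by each cell site. The function name, the dBm readings and the hysteresis margin are assumptions for illustration; the text says only that the exchange detects signal strength and switches to the site with the stronger signal.

```python
from typing import Dict


def select_cell(current_cell: str,
                signal_dbm: Dict[str, float],
                hysteresis_db: float = 3.0) -> str:
    """Return the cell site that should serve the phone.

    `signal_dbm` maps each cell site to the strength of the phone's
    signal as received at that site. The hysteresis margin is an
    assumption added here to avoid switching back and forth when two
    sites report nearly equal strength.
    """
    best = max(signal_dbm, key=signal_dbm.get)
    current = signal_dbm.get(current_cell, float("-inf"))
    if best != current_cell and signal_dbm[best] >= current + hysteresis_db:
        return best           # hand off to the stronger site
    return current_cell       # stay on the current site


if __name__ == "__main__":
    print(select_cell("A", {"A": -95.0, "B": -80.0}))
    print(select_cell("A", {"A": -80.0, "B": -79.0}))
```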
Another technology used with cellular phones is known as Code-Division Multiple Access (CDMA). CDMA cell phones are not assigned a specific channel but instead cycle through many channels in a pattern specific to each phone. As the user moves from one cell to another, the cell phone actually connects to both sites simultaneously. This is known as a “soft handoff” because, unlike with traditional cellular technology, there is no one defined point where the cell phone switches to the new cell.
Modern mobile phones use cells because radio frequencies are a limited, shared resource. Cell sites and cell phones change frequency under computer control and use low power transmitters so that a limited number of radio frequencies can be reused by many callers with less interference. CDMA handsets, in particular, have strict power controls to avoid interference with each other.
The quality of a voice signal transmitted over a communications network can be evaluated in several ways. One way of evaluating the voice quality of a communication is to use one or more scoring metrics. These metrics may include Perceptual Speech Quality Measurement (PSQM), Perceptual Analysis Measurement System (PAMS), Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS).
Each of the PSQM, PAMS, and PESQ metrics measures perceptual speech quality for narrowband (300-3400 Hz) telephone signals. These metrics require active testing, in which a reference voice signal is transmitted across a network, and the received voice signal is compared with the reference signal. Each metric utilizes a mathematical process that measures the differences between the received signal and the reference signal based on factors of human perception, which results in a speech quality score.
The PSQM metric produces scores that reliably predict the results of subjective tests, and reflect a perceptual distance measure. PSQM scores reflect the amount of divergence from a clean signal that a distorted signal exhibits once it has been processed by some telephony system. PSQM scores range from 0 to infinity, the score representing the perceptual distance between the received signal and the reference signal. For example, a score of “0” indicates a perfect match between the received signal and the reference signal, or perfect quality. Higher PSQM scores indicate increasing levels of distortion, or lower quality. In practice, upper limits of PSQM scores range from 6 to 12. One drawback associated with the PSQM metric is that it does not accurately report the effect of distortion when that distortion is caused by packet loss or other types of time clipping.
The PAMS metric comprises a speech quality metric that uses an auditory model to mathematically describe the way a human ear perceives voice, and performs an analysis of errors upon that model. PAMS scores range from 1 to 5, where 5 is the best quality possible. A PAMS score of 4 or above is widely considered “business quality voice.” PAMS scores are usually expressed to two decimal places (4.84, for example). PAMS also splits its criteria into two different areas known as listening effort and listening quality. Listening Effort (LE) is defined as the amount of effort a person must give to understand the received signal. Listening Quality (LQ) is the overall clarity and fidelity of the received signal. PAMS is used to objectively predict results of subjective speech quality tests for networks on which coding distortions as well as packet loss are potential problems. PAMS has gained wide acceptance worldwide as an effective and robust measurement of speech quality in packet voice networks.
The PESQ metric is a combination of PAMS and PSQM. PESQ builds on both the PAMS and PSQM techniques by adding processing steps to account for signal-level differences and to identify errors associated with packet loss. PESQ provides a score of −0.5 to 4.5, which is equivalent to the PAMS Listening Quality Score of 1 to 5. PESQ is an effective technique for measuring speech quality on networks with variable delay, filtering, packet or cell loss, and channel errors.
Another metric used for measuring voice quality is Mean Opinion Score (MOS). In one example of MOS testing, pre-selected voice samples are played to a mixed group of men and women under controlled conditions. The listeners are asked their opinion of the audio they have just heard. The scores given by the group are averaged to give a single MOS ranging from 1 (bad) to 5 (excellent). Performing this type of subjective testing provides the most comprehensive means for determining and rating the overall voice quality perceived by users.
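As a minimal sketch of how the listener ratings might be combined, the example below assumes a plain arithmetic mean of ratings on the 1 to 5 scale. The function name and the sample ratings are hypothetical, and a real test procedure may apply additional controls that are not modeled here.

```python
from typing import Sequence


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """Combine listener ratings (1 = bad, 5 = excellent) into one score.

    Assumes an unweighted arithmetic mean across all listeners.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Made-up ratings from four listeners for one voice sample.
    print(mean_opinion_score([4, 5, 3, 4]))
```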