Conventional telephone conversations take place over a circuit-switched network. A circuit-switched network involves a dedicated physical path for a single connection between two end-points for the duration of the connection. In the Public Switched Telephone Network (PSTN), a telephone service provider dedicates a physical path between the caller and the called number for the duration of a call.
In contrast to circuit-switched networks, packet-switched networks can be used to transmit telephone calls without requiring a dedicated connection, which reduces costs. Packet-switched networks typically use protocols to divide messages or data into packets. Division into packets allows each packet to be transmitted individually. In most packet-switched networks, packets are allowed to follow different routes to a destination. After the packets arrive at the destination, they can be reassembled into the original message. An example packet-switched network is the global computing network often referred to as the Internet.
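The division and reassembly described above can be illustrated with a minimal Python sketch. It is not any particular protocol: the function names `packetize` and `reassemble`, the `(sequence_number, payload)` packet layout, and the payload size are assumptions made for illustration only.

```python
import random


def packetize(message: bytes, payload_size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) packets.

    The packet layout is a hypothetical one chosen for this sketch.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (seq, message[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original message from packets received in any order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


# Simulate packets that followed different routes and arrived out of order.
message = b"packets may follow different routes to a destination"
packets = packetize(message, payload_size=8)
random.Random(0).shuffle(packets)
print(reassemble(packets) == message)
```

The sequence number is what lets the destination restore the original order regardless of the route each packet took.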
Example packet-switched networks may use the Transmission Control Protocol/Internet Protocol (TCP/IP), X.25, or Frame Relay protocols. In contrast to the circuit-switched networks conventionally used for real-time communications, packet switching tolerates delays in transmission and provides extra control such as retransmission of data, recognition of duplicate messages, and flow control mechanisms. In general, packet-switched networks provide a robust system for information transfer. Additionally, packet-switched networks provide a low-cost solution for information transfer, since they do not require dedicated leased paths between endpoints.
Improvements in communications and computing technologies allow conventional real-time applications to run over a packet-switched network. For example, in a voice over Internet Protocol (VoIP) network, the audio phone information is converted from analog to digital and encapsulated in packets that are sent through packet-switched networks. This allows delivery of audio information at a much lower monetary cost than through a dedicated PSTN circuit; however, it has an associated cost in the quality of the communication.
Low-bit-rate audio codecs (coder/decoders) and digital signal processing (DSP) techniques may be employed to conserve bandwidth in voice communications, but may degrade the quality of a voice signal.
Various means have been employed to measure voice quality in telecommunications networks. For linear systems, objective audio measurements such as frequency response and signal-to-noise ratio are typical. To estimate user experience, subjective test methodologies such as ACR (absolute category rating) are employed. The MOS (mean opinion score) is the typical result of an ACR test, in which users are presented with audio material and judge its listening quality on a five-point scale (from 1, bad, to 5, excellent).
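As a small illustration of how a mean opinion score is obtained from listener judgments, the following Python sketch averages ratings given on the five-point scale. The function name `mean_opinion_score` and the sample ratings are hypothetical and are not measured data.

```python
def mean_opinion_score(ratings: list[int]) -> float:
    """Average listener ratings given on a five-point scale (1 = bad, 5 = excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating {rating!r} is outside the five-point scale")
    return sum(ratings) / len(ratings)


# Hypothetical ratings from five listeners for one audio sample.
print(mean_opinion_score([4, 5, 3, 4, 4]))
```

Because the score depends on a panel of human listeners, it cannot be produced this way during a live call, which is the limitation discussed next.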
Voice-over-Packet systems require new quality metric methodologies. MOS tests are non-real-time experiments involving human listeners, and cannot be run directly on revenue-generating calls, although predictions of MOS scores can be made. The use of non-linear, low-bit-rate audio codecs such as ITU standard G.729 means that some traditional measurements of audio quality such as frequency response cannot be used since linear methods cannot characterize a non-linear system.
For voice-over-packet transmission systems, it is desirable to monitor the voice quality of a particular connection or call for test purposes, and in order to monitor or regulate service. An example of service regulation is an SLA or service level agreement between telecommunication service providers which dictates minimum performance standards and specifies penalties for non-compliance.
It is desirable for a voice-over-packet endpoint to be capable of measuring quantities relevant to its local voice quality situation, and to report these quantities to concerned higher-level entities, for example to billing or logging servers. It is further desired that such measurements be objective or numerical in nature, simple, and unambiguous, that they require very little computation, and that they be based on information sources that are readily available in the endpoint. It is also desired that such measurements be useful, for example by producing an output that concerned entities can employ directly, without additional translation or calculation.
Finally, it is desired that such a metric be both perceptually relevant and effective, meaning strongly correlated with the subjective experience of users.
A wide variety of methods are employed for measuring and reporting voice quality statistics of interest from voice-over-packet processing devices. Many of these techniques follow directly from established practices in both the traditional voice-telecom and data-communications fields, which reflects the hybrid audio/data nature of voice-over-packet systems.
The fundamental transport mechanism for voice-over-packet telecommunication systems is data packets, which are generated at a transmitter and sent at regular, short intervals to a receiver. The primary voice quality impairment of interest is the phenomenon of ‘packet loss’, in which packets from the transmitter do not arrive at the receiver at the required moment, for whatever reason. The receiver is then forced to generate a ‘fake’ or ‘concealment’ audio frame in an attempt to minimize the user annoyance that would result from the audio dropout or silence caused by the missing audio.
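The receiver behaviour described above can be sketched in Python as follows. This is a simplified illustration, not a description of any actual equipment: the function name `play_out`, the dictionary of arrived frames keyed by sequence number, and the choice of repeating the previous frame as the concealment strategy are all assumptions.

```python
def play_out(
    arrived: dict[int, bytes], frame_count: int, frame_size: int
) -> tuple[list[bytes], int]:
    """Produce one audio frame per playout interval, concealing missing frames.

    `arrived` maps sequence numbers to the frames that reached the receiver
    in time. Concealment here simply repeats the previous frame (an
    assumption; real equipment may use more elaborate methods).
    """
    output = []
    concealed = 0
    last_frame = bytes(frame_size)  # silence until the first real frame
    for seq in range(frame_count):
        frame = arrived.get(seq)
        if frame is None:
            # The packet is not available at the required moment.
            frame = last_frame
            concealed += 1
        output.append(frame)
        last_frame = frame
    return output, concealed


# Hypothetical call of four frames in which frame 2 never arrives.
frames, concealed = play_out({0: b"aa", 1: b"bb", 3: b"dd"}, frame_count=4, frame_size=2)
print(frames, concealed)
```

Note that the counter tracks concealment events at the playout point rather than network loss events, so it would also count frames concealed for reasons other than packet loss.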
The phenomenon of packet loss is widely understood to be a primary source of voice quality degradation due to transmission network impairment. Voice-over-packet equipment has typically reported packet counts, such as ‘packets received’, ‘packets lost’, ‘late packets’, and ‘early packets’, as a primary voice quality metric. This metric comes from a data-network viewpoint. However, packet loss counts can be ambiguous, as vendors employ widely different definitions of packet ‘loss’ and ‘discard’ events. Also, since these quantities are typically reported only at the end of a call, it is not possible to determine from a packet loss count or rate whether the loss events occurred in a single burst or were spread out over time.
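The ambiguity of an end-of-call loss count can be made concrete with a short Python sketch. The two traces below are invented for illustration, and the helper names `loss_count` and `longest_loss_burst` are hypothetical; each trace records, per packet, whether it was received.

```python
def loss_count(received: list[bool]) -> int:
    """Total number of lost packets, as reported at the end of a call."""
    return sum(1 for ok in received if not ok)


def longest_loss_burst(received: list[bool]) -> int:
    """Length of the longest run of consecutive lost packets."""
    longest = current = 0
    for ok in received:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# Two invented 100-packet traces with the same number of losses.
bursty = [not (40 <= i < 50) for i in range(100)]  # ten losses in one burst
spread = [i % 10 != 0 for i in range(100)]         # ten isolated losses

print(loss_count(bursty), loss_count(spread))
print(longest_loss_burst(bursty), longest_loss_burst(spread))
```

Both traces yield the same loss count and loss rate, yet one contains a single long dropout and the other ten isolated single-packet losses, a difference the count alone cannot reveal.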
In the transition of the PSTN from analog to digital transmission, telecom engineers defined a formulation known as ‘errored seconds’, applied to digital transmission trunks, for use in billing, troubleshooting, and SLA monitoring. In the digital trunk case, the fundamental voice quality error mechanism is the bit error. A ‘T1/E1 errored second’ and a ‘T1/E1 severely errored second’ were defined as one-second intervals in which more than TE or more than TS bit errors, respectively, were observed, where TE and TS are the thresholds for ‘errored’ and ‘severely errored’. This is a well-known formulation for expressing the impact of an impairment whose intensity is expected to change over time.
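The errored-seconds formulation can be sketched in Python as a per-second threshold classification. The function name `classify_seconds`, the label strings, and the threshold values used in the example are assumptions for illustration and are not taken from any standard; the sketch also assumes each second is labelled with its most severe class only.

```python
def classify_seconds(bit_errors_per_second: list[int], te: int, ts: int) -> list[str]:
    """Label each one-second interval by comparing its bit-error count to TE and TS.

    A second with more than `ts` errors is 'severely_errored'; otherwise a
    second with more than `te` errors is 'errored'; otherwise it is 'ok'.
    """
    if ts < te:
        raise ValueError("ts must not be smaller than te")
    labels = []
    for errors in bit_errors_per_second:
        if errors > ts:
            labels.append("severely_errored")
        elif errors > te:
            labels.append("errored")
        else:
            labels.append("ok")
    return labels


# Hypothetical per-second bit-error counts and thresholds.
labels = classify_seconds([0, 1, 5, 0, 200], te=0, ts=100)
print(labels.count("errored"), labels.count("severely_errored"))
```

Counting the labels over a reporting period gives a time-based summary that preserves how the impairment was distributed, unlike a single cumulative error count.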
It has been suggested to express the data-oriented metric of interest, packet loss, in the telecom-oriented errored-seconds formulation to obtain a time-based voice quality metric relevant to voice-over-packet systems. However, there are fundamental problems with a direct analogy. All packet loss leads to audio concealment, but not all audio concealment is the result of packet loss. Clock skew, clock drift, and internal equipment factors are additional phenomena which also lead to audio concealment.
What is needed is an objective metric for audio transmitted through a packet switched network, and particularly for voice data transmitted over a packet switched network.