1. Field of the Invention
The present invention relates to systems and methods for measuring the quality of service in a communications network, and, more particularly, the perceptual quality of digitally encoded signals in a communications network, without interrupting the flow of the data streams carrying the digitally encoded signals or injecting a known test signal into the communications network.
2. Description of the Related Art
In the measurement and monitoring of voice call quality, one may apply active or passive methods. An active (or intrusive) method is one that transmits test signals through the network to make measurements. A passive (or non-intrusive) method operates on the existing voice calls without adding measurement traffic to the network. Several passive methods, and methods related to them, have already been developed or proposed for measuring or monitoring voice call quality and audio signal quality. In L. Sun and E. C. Ifeachor, “Perceived Speech Quality Prediction For Voice Over IP-Based Networks,” IEEE International Conference on Communications (ICC 2002), Vol. 4, pp. 2573-2577, 2002, for example, an artificial neural network model is developed and proposed for the non-intrusive monitoring of voice-over-IP (VoIP) calls. In A. D. Clark, “Modeling The Effects Of Burst Packet Loss And Recency On Subjective Voice Quality,” 2nd IP-Telephony Workshop, Columbia University, New York, April 2001, a method called “VQmon” is developed for estimating the transmission quality of VoIP calls. The method is intended for the non-intrusive real-time passive monitoring of VoIP calls. It uses an extended version of the ITU G.107 E-model. VQmon is offered as a commercial product by Telchemy, Inc. (see www.telchemy.com).
In P. Gray, M. P. Hollier, and R. E. Massara, “Non-Intrusive Speech-Quality Assessment Using Vocal-Tract Models,” IEE Proceedings—Vision, Image and Signal Processing, vol. 147, No. 6, pp. 493-501, December 2000, an automatic speech recognition (ASR) system is used to evaluate VoIP speech quality for the G.723 and G.729 codecs. Results are presented for the ‘recognition accuracy’ as a function of various packet loss rates and packet sizes. In S. Mohamed, F. Cervantes-Perez, and H. Afifi, “Audio Quality Assessment In Packet Networks: An ‘Inter-Subjective’ Neural Network Model,” 15th International Conference on Information Networking, pp. 579-586, 2001, a neural network model approach is developed for audio signals. In W. Jiang, and H. Schulzrinne, “Speech Recognition Performance As An Effective Perceived Quality Predictor,” 10th IEEE International Workshop on Quality of Service, pp. 269-275, 2002, a mean-opinion-score (MOS) estimation method is developed based on machine speech recognition. In “3SQM™ Advanced Non-Intrusive Voice Quality Testing,” White Paper, Opticom GmbH, Germany, 2003, a new non-intrusive method is proposed.
In recent years, packet-switching networks have been used increasingly for transport of real-time media signals, such as digitally-encoded voice and video signals, that are transmitted either in real-time or in some kind of delayed fashion. A specific example is the increasing use of the Internet for carrying voice-over-Internet-Protocol (VoIP) calls. In a VoIP call, a digitally encoded voice signal is packetized and incorporated into an Internet Protocol (IP) packet stream, which is then transmitted over the Internet to a destination device. At the destination device, the digitally encoded voice signal is extracted from VoIP packet payloads in the packet stream and then decoded into a signal that is played out in real-time to the user at the destination device.
When a packetized real-time media stream is transmitted across a packet-switching network, the packet stream may be corrupted by a number of network impairments. Examples of network impairments include packet-discarding at routers due to packet bit errors, packet-dropping at interface buffers due to traffic congestion, packet-duplication, time delays beyond a predetermined hard or soft real-time deadline, packet-misrouting and loss of packet-sequence. These impairments generally degrade the quality of the media signal that is eventually received at the destination.
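By way of illustration only, the effect of several of these impairments on the receiving side may be sketched as follows. The sketch assumes that each packet carries a sequence number and one encoded frame; the names Packet, packetize and reassemble are hypothetical and are not taken from any particular protocol stack. Duplicated packets are discarded, out-of-sequence packets are restored to their proper order, and lost packets appear as gaps in the play-out sequence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical media packet: a sequence number plus one encoded frame."""
    seq: int
    payload: bytes


def packetize(frames: List[bytes], first_seq: int = 0) -> List[Packet]:
    """Wrap each encoded frame in a packet with a consecutive sequence number."""
    return [Packet(first_seq + i, frame) for i, frame in enumerate(frames)]


def reassemble(received: List[Packet], first_seq: int, count: int) -> List[Optional[bytes]]:
    """Rebuild the play-out order from whatever arrived.

    Duplicates are ignored (the first copy wins), reordered packets are put
    back in sequence, and a lost packet is represented by None.
    """
    by_seq: Dict[int, bytes] = {}
    for packet in received:
        by_seq.setdefault(packet.seq, packet.payload)
    return [by_seq.get(first_seq + i) for i in range(count)]


frames = [b"f0", b"f1", b"f2", b"f3"]
sent = packetize(frames)

# Simulated impairments: packet 3 is lost, packet 2 is duplicated,
# and packet 1 arrives after packet 2 (loss of packet sequence).
received = [sent[0], sent[2], sent[2], sent[1]]

playout = reassemble(received, first_seq=0, count=len(frames))
print(playout)
```

In this sketch the gap left by the lost packet is what the decoder at the destination would have to conceal, which is one way in which a network impairment becomes a degradation of the received media signal.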
Due to network impairments that may be encountered in the transmission of real-time media over a packet-switching network, it is important to be able to measure and monitor the quality-of-service (QoS) that is being provided by the network. Typical network QoS measures include, for example, end-to-end packet delay, end-to-end packet delay jitter, packet corruption, and packet loss. To monitor such network QoS measures, one can deploy commercially available monitoring systems.
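As a minimal sketch of how such network QoS measures might be computed from a record of sent and received packets, consider the following. The Record type and the qos_summary function are hypothetical names introduced for illustration. The delay figure assumes that the sender and receiver clocks are synchronized, and the jitter estimate uses the smoothed interarrival-jitter formula of RFC 3550, which is one common choice rather than the only one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """Hypothetical per-packet measurement record (times in seconds)."""
    seq: int
    sent_s: float
    recv_s: float


def qos_summary(records: List[Record], packets_sent: int) -> Dict[str, float]:
    """Compute packet loss ratio, mean one-way delay and interarrival jitter."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")

    # Keep the first arrival of each sequence number, in arrival order.
    unique: Dict[int, Record] = {}
    for record in sorted(records, key=lambda r: r.recv_s):
        unique.setdefault(record.seq, record)
    arrivals = list(unique.values())

    delays = [r.recv_s - r.sent_s for r in arrivals]

    # RFC 3550 style jitter: J += (|D| - J) / 16, where D is the change in
    # one-way delay between consecutive arrivals.
    jitter = 0.0
    for previous, current in zip(delays, delays[1:]):
        jitter += (abs(current - previous) - jitter) / 16.0

    return {
        "loss_ratio": 1.0 - len(arrivals) / packets_sent,
        "mean_delay_s": sum(delays) / len(delays) if delays else 0.0,
        "jitter_s": jitter,
    }


# Four packets were sent at 20 ms intervals; the fourth never arrived.
records = [
    Record(seq=0, sent_s=0.00, recv_s=0.05),
    Record(seq=1, sent_s=0.02, recv_s=0.07),
    Record(seq=2, sent_s=0.04, recv_s=0.11),
]
summary = qos_summary(records, packets_sent=4)
print(summary)
```

Measures of this kind describe the behavior of the network only; they say nothing by themselves about how the decoded signal sounds to a listener.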
Although monitoring and measuring the QoS in a network provides valuable information regarding the ability of a network to properly support real-time media signal transmissions, such measures do not directly reflect the perceptual (subjective) quality of the media signal as it is actually perceived by an end-user. This is the case because the perceptual quality of a real-time media signal, as perceived by the end-user, is difficult to quantify in terms of the network QoS measures.
To deal with this general problem, objective methods have been developed for estimating perceptual quality of media signals. For example, perceptual speech quality measurement (“PSQM”) is a means for objectively assessing the quality of speech that has been degraded by a telephony network. It has a high correlation to perceptual quality across a range of distortion types, and is used to test networks that are subject to different coding types and transmission errors. PSQM is used primarily to test networks that have speech compression, digital speech interpolation, and packetization. PSQM is specified in International Telecommunication Union-Telecommunications Standardization Sector (“ITU-T”) Recommendation P.861. Another example of an objective method for voice signals is Perceptual Evaluation of Speech Quality (“PESQ”), specified in ITU-T Recommendation P.862.
FIG. 1 depicts the basic approach of PSQM that has been adopted by some organizations to estimate the perceptual quality of a VoIP call that traverses a packet-switching network. In such an objective method, a source of speech 10 generates a known signal 12, which is then transmitted across the network 17. The known signal 12 may be a pre-recorded natural voice signal, or a specialized test signal, such as the Artificial Speech-like Test Stimulus (ASTS™) or the artificial voice of ITU-T Recommendation P.50. The known signal for use in PSQM may be stored in a commonly used file format such as, for example, a wave (.WAV) file. The process of transmitting a known signal to evaluate the degradation in quality after it has traversed the network may be termed an active method because the known test signal is actually injected into and transmitted across the network.
PSQM uses a psychoacoustic model 14, which aims to mimic the perception of speech in real life, and was originally developed to test compressor/decompressors (codecs). A codec, which typically comprises a software-, hardware- or firmware-based algorithm, translates speech, video or audio signals between their uncompressed form and the compressed forms in which they are typically transmitted. The PSQM algorithm functions by comparing the quality of signal 12 before it has been transmitted across the network 17 to the quality of signal 12 after it has been transmitted across the network 17 (i.e., comparing input signal 13 to input signal 15).
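The general idea of comparing a known signal with its degraded counterpart may be illustrated by the following sketch. It is emphatically not the PSQM psychoacoustic model; it substitutes a much simpler frame-by-frame measure, segmental signal-to-noise ratio, as a stand-in. The function name, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate) and the clamping limits of -10 dB and 35 dB are assumptions made for illustration, and the two signals are assumed to be time-aligned and of equal length.

```python
import math
from typing import List, Sequence


def segmental_snr_db(reference: Sequence[float],
                     degraded: Sequence[float],
                     frame_len: int = 160) -> float:
    """Average per-frame SNR (dB) of a degraded signal against its reference.

    A simple full-reference stand-in for a perceptual model: it compares the
    signal before and after transmission, frame by frame.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must be time-aligned and of equal length")

    frame_snrs: List[float] = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if signal_energy == 0.0:
            continue  # skip silent frames, which carry no speech energy
        snr = 10.0 * math.log10(signal_energy / max(noise_energy, 1e-12))
        frame_snrs.append(min(max(snr, -10.0), 35.0))  # clamp outliers

    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Known test signal: 100 ms of a 440 Hz tone sampled at 8 kHz.
reference = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]

# Two hypothetical received versions of the same signal.
attenuated = [0.9 * x for x in reference]
unchanged = list(reference)

score_attenuated = segmental_snr_db(reference, attenuated)
score_unchanged = segmental_snr_db(reference, unchanged)
print(score_attenuated, score_unchanged)
```

A measure of this kind requires access to the original signal at the point of measurement, which is why methods built on such a comparison rely on injecting a known signal into the network.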
PSQM provides an output 20 in the range of 0 to 6.5, where values close to 0 indicate very good speech quality, and values close to 6.5 indicate poor speech quality. At the destination, a quality measure or score (e.g., mean opinion score (MOS)) is computed 22 based on the received signal and the known artificial voice signal that was transmitted, and is output 24. Although PSQM does not have a direct correlation to MOS, the perceptual quality is nevertheless inferred from the objective quality. That is, a person who listens to a speech sample having a PSQM value of 2 would judge its quality to be worse than that of a speech sample having a PSQM value of 1. PSQM values can be roughly translated into MOS values.
Accordingly, there is a need for an accurate perceptual measurement system for determining the mean opinion score and quality of packet media streams in a packet-switched network and framed encoded media (e.g., voice, video and audio) signals in a non-packetized (or circuit switched) communications network. There is a further need for a passive (non-intrusive) objective perceptual measurement system that utilizes objective measurement tools. There is still a further need for a perceptual quality measurement system that does not depend on whether the signal being measured is packetized or not.