This invention relates to measurement of error characteristics of a communication channel. The invention is of particular use for measuring perceived transmission performance of the communication channel.
Signals carried over telecommunications links can undergo considerable transformations, such as digitization, compression, encryption and modulation. They can also be distorted due to the effects of transmission errors. It is highly desirable to be able to determine the combined effect of such transformations and transmission errors on the quality of the received signal as perceived by a human.
The present invention is concerned with channels employing transport of frames of data. Packet based communication systems such as the Internet Protocol (IP) defined in Internet Engineering Task Force (IETF) request for comment (RFC) number 791 are a good example of a frame based communication system. Packet systems typically comprise a number of routing nodes that buffer data prior to forwarding packets towards their final destination. A feature of such networks is that the time taken for packets to transit the network is not constant because the buffering delay at each node depends upon its instantaneous load level. This variation in packet transit times is often called jitter. In a complex packet system, such as a wide area network (WAN), individual packets belonging to a media stream may take different routes, and hence packets may arrive at their destination in a different order.
The most common form of transmission error in packet networks is packet loss, which occurs when a packet is discarded at the final receiver or at an intermediate routing node. In general, a packet is discarded for one of the following reasons: transmission errors have been detected in the contents of the packet; a buffer is full and the packet cannot be stored; or the packet has arrived too late to be of use.
Packet systems that transport real-time data such as speech, audio and video streams often employ a protocol to compensate for the effects of variable transmission delay, packet reordering and packet loss. A good example is the real-time transport protocol (RTP) defined in IETF RFC number 1889. RTP is intended to be operated over an IP network using the user datagram protocol (UDP) defined in IETF RFC number 768. In addition to the encoded media data, RTP packets include a sequence number and a timestamp. The sequence number facilitates the reordering of packets that arrive out of order, the identification of late packets and the detection of missing packets; the timestamp is used to buffer packets at the receiver such that the net transmission delay is constant. The buffer at the receiver is sometimes referred to as a jitter buffer. The additional delay introduced by the jitter buffer is a trade-off between minimising the overall delay of the communication channel and minimising the number of packets discarded due to late arrival. In some schemes, the buffering delay can be changed adaptively to match changes in jitter statistics. In an IP communication system, the detection of transmission errors in packets is often performed by the link layer, for example Ethernet.
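By way of illustration only, the receiver-side behaviour described above might be sketched in Python as follows. The sketch reorders packets by sequence number, treats packets that arrive after their playout time as late, and reports gaps in the sequence as missing. The names `Packet` and `playout_order`, the fixed `buffer_delay`, and the assumption that sender and receiver clocks share an origin are all hypothetical and are not taken from RTP; sequence-number wrap-around is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int          # sequence number assigned by the sender
    timestamp: float  # media time at the sender, in seconds
    arrival: float    # arrival time at the receiver, in seconds


def playout_order(packets: List[Packet], buffer_delay: float) -> List[Optional[Packet]]:
    """Return packets in sequence order; None marks a lost or late packet.

    Assumption: a packet with timestamp t is played out at t + buffer_delay,
    so any packet arriving after that instant is too late to be of use.
    """
    if not packets:
        return []
    # Keep only packets that arrived in time for their playout instant.
    on_time = {p.seq: p for p in packets if p.arrival <= p.timestamp + buffer_delay}
    first = min(p.seq for p in packets)
    last = max(p.seq for p in packets)
    # Walk the full sequence range so that gaps show up as None.
    return [on_time.get(seq) for seq in range(first, last + 1)]
```

A larger `buffer_delay` in this sketch turns fewer packets into late packets at the cost of a longer end-to-end delay, which is the trade-off noted above.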
In a typical implementation, frames of data extracted from packets in the jitter buffer are passed to a signal decoder in the order in which they were originally generated by a signal encoder. If a frame of data is missing due to a lost or late packet, this is indicated to the signal decoder by means of frame classification data. In its most simple form, the frame classification may be a binary flag indicating whether valid data is available or not. If a packet containing multiple frames of data is lost, all of the corresponding frames will be marked as being unavailable.
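The binary form of frame classification described above can be sketched as follows, purely as an example. The function name `classify_frames`, the dictionary of received payloads keyed by sequence number, and the fixed `frames_per_packet` are assumptions made for the illustration.

```python
from typing import Dict, List, Optional, Tuple


def classify_frames(
    payloads: Dict[int, List[bytes]],
    first_seq: int,
    last_seq: int,
    frames_per_packet: int,
) -> List[Tuple[bool, Optional[bytes]]]:
    """Expand packets into (valid, data) frame entries in sequence order."""
    frames: List[Tuple[bool, Optional[bytes]]] = []
    for seq in range(first_seq, last_seq + 1):
        if seq in payloads:
            # Packet available: each frame it carries is flagged as valid.
            frames.extend((True, data) for data in payloads[seq])
        else:
            # Lost or late packet: every frame it carried is unavailable.
            frames.extend([(False, None)] * frames_per_packet)
    return frames
```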
In the event of missing data, the signal decoder will typically produce an output signal with a duration corresponding to the missing input data. This is necessary to keep the delay across the communication channel constant. One solution to lost packets in a speech system is to mute the output of the signal decoder for the period corresponding to the missing data. A more effective solution frequently used in code excited linear predictor (CELP) speech decoders is to repeat the last known value of parameters that are known to change slowly, such as pitch and linear predictor coefficients, and to synthesise random values for the other parameters, such as the stochastic codebook index. A strategy used in video decoders is to simply freeze the output. Such techniques are commonly called error concealment in the art.
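The parameter-repetition strategy described above for CELP decoders might be sketched as follows. This is a simplified illustration, not the concealment procedure of any particular codec: the `CelpFrame` fields, the `conceal` function and the `codebook_size` argument are hypothetical, and a real decoder would typically also attenuate the repeated parameters over successive losses.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CelpFrame:
    pitch_lag: int            # slowly varying parameter
    lpc: Tuple[float, ...]    # slowly varying parameter
    codebook_index: int       # rapidly varying parameter


def conceal(
    frames: List[Optional[CelpFrame]],
    codebook_size: int,
    rng: random.Random,
) -> List[Optional[CelpFrame]]:
    """Replace missing frames (None) using the last good frame."""
    out: List[Optional[CelpFrame]] = []
    last_good: Optional[CelpFrame] = None
    for frame in frames:
        if frame is not None:
            last_good = frame
            out.append(frame)
        elif last_good is not None:
            # Repeat pitch and LPC; draw a random stochastic codebook index.
            out.append(replace(last_good, codebook_index=rng.randrange(codebook_size)))
        else:
            # Nothing to repeat yet; the caller would mute this frame.
            out.append(None)
    return out
```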
In a radio system employing frame based transmission, symbol errors may occur when a transmitted symbol is incorrectly decoded by a receiver. Many transmission schemes include forward error correction (FEC) techniques that allow a limited number of transmission errors to be corrected. The symbol errors that are introduced by the transmission link are commonly called raw errors, whilst errors that remain after the application of FEC decoding are commonly called residual errors. If there is no FEC then the residual errors are equivalent to the raw errors.
Bad frames are frames of data that contain symbol errors that have been detected, but not corrected. The error detection mechanism may be a by-product of an FEC scheme or the result of a specific checksum calculation. In some schemes, a frame of data is classified as bad if an error is detected in any symbol position. In other schemes, a frame is only classified as bad if errors are detected in particular symbol positions within the frame. This latter technique is often used in unequal error protection (UEP) transmission schemes.
UEP is frequently employed in speech or video transmission systems where the contribution of a symbol to the perceived quality of the transmission depends upon its position within a frame. The error protection scheme is said to be unequal if more powerful FEC is applied to the most important symbol positions at the expense of weaker protection of less important symbol positions. Groups of symbols that receive the same level of FEC are said to belong to the same symbol class. UEP schemes typically only provide a checksum for the most important symbols, and hence only those frames received with a residual error in one or more of the most important symbol positions are classified as bad frames. This approach has been found to yield better overall transmission quality in systems where the presence of residual errors in the least important symbols is, on average, less deleterious than the effect of discarding every frame that contains one or more residual errors. A good example of such a UEP scheme is that specified for the global system for mobile communications (GSM) adaptive multi-rate (AMR) speech service in European Telecommunications Standards Institute (ETSI) technical specification GSM 05.03.
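The effect of providing a checksum only for the most important symbols can be illustrated with the following sketch. The frame layout (important bytes, then one check byte, then the remaining bytes), the function names and the simple additive checksum are assumptions chosen for clarity; they do not correspond to the coding specified in GSM 05.03.

```python
def checksum8(data: bytes) -> int:
    """Illustrative 8-bit additive check value (not a real UEP checksum)."""
    return sum(data) % 256


def build_frame(important: bytes, other: bytes) -> bytes:
    """Hypothetical layout: important bytes, check byte, less important bytes."""
    return important + bytes([checksum8(important)]) + other


def is_bad_frame(frame: bytes, n_important: int) -> bool:
    """Classify a frame as bad only if the protected part fails its check."""
    important = frame[:n_important]
    received_check = frame[n_important]
    return checksum8(important) != received_check
```

In this sketch an error confined to the less important bytes leaves the frame classified as good, so the decoder uses it despite the residual error, whereas an error in the protected bytes causes the frame to be classified as bad.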
For any checksum, there is a finite probability that the checksum will be valid for a corrupted frame. For very short checksum lengths, this probability can become significant and undetected bad frames can become a problem. In this situation, it is common to implement additional bad frame detection techniques—many examples being based on the internal variables of a Viterbi FEC decoder. Such additional checks only indicate the probability that a frame is corrupted, and may therefore be classified differently to an invalid checksum. In a variation of bad frame classification, the AMR speech service described in the ETSI GSM specifications provides a class for frames with uncorrupted Class 1 bits (the most important bits) and the possibility of errors in the Class 2 bits (which are not protected by the checksum).
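As a rough guide to the size of this probability, and assuming (as an idealisation not stated above) that the check value computed over a corrupted frame is equally likely to take any of its possible values, an n-bit checksum accepts a corrupted frame with probability approximately:

```latex
P_{\text{undetected}} \approx 2^{-n},
\qquad\text{e.g.}\quad
n = 3:\ 2^{-3} = 0.125,
\qquad
n = 16:\ 2^{-16} \approx 1.5 \times 10^{-5}.
```

Under that assumption a 3-bit checksum would pass roughly one corrupted frame in eight, which indicates why additional bad frame detection is attractive when the checksum is very short.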
Bad frames in a radio system may be handled in much the same way as lost or late packets in the packet based system described above.
Variable rate coding is a known form of signal coding whereby the encoding rate can be changed on a frame-by-frame basis. The rate may be changed according to characteristics of the input signal or due to knowledge of the capacity of the transmission network. In a variable rate communication system, the data passed to the signal decoder may therefore have a multi-level frame classification that includes an indication of the rate at which the data was encoded.
Layered or embedded coding is a known form of signal coding whereby encoded data is divided into so-called ‘core data’ and ‘enhancement data’. The core data is the minimum information required to generate an output frame at the decoder without using error concealment techniques. The enhancement data is used to improve the perceived quality of the decoded signal, if available. The advantage of layered coding is that enhancement information can be sent in packets marked as being lower priority than packets containing core information. During periods of congestion, routing nodes can discard the lower priority enhancement packets to reduce the number of lost core packets. In a layered communication system, the data passed to the signal decoder may therefore have a multi-level frame classification that indicates which layers (or types of data) are available (if any).
Other examples of systems where packets or frames of information may be classified according to detected transmission errors in a received frame include radio communication systems such as that described in the Third Generation Partnership Project (3GPP) series of specifications for a so-called third generation public land mobile network (PLMN).
Objective processes for the purpose of measuring the perceived quality of a signal are currently under development and are of application in equipment development, equipment testing, and evaluation of system performance.
A number of patents and applications relate to this field, for example, European Patent 0647375, granted on 14th Oct. 1998. In that patent, two initially identical copies of a test signal are used. The first copy is transmitted over a communications system under test. The resulting signal, which may have been degraded, is compared with a reference copy to identify audible errors in the degraded signal. These audible errors are assessed to determine their perceived significance; that is, errors that are considered significant by human listeners are given greater weight than those that are not considered so significant. In particular, inaudible errors are irrelevant to perception and need not be assessed.
This system provides an output comparable to subjective quality measures originally devised for use by human subjects. More specifically, it generates two values, YLE and YLQ, equivalent to the “Mean Opinion Scores” (MOS) for “listening effort” and “listening quality”, which would be given by a panel of human listeners when listening to the same signal. The use of an automated system allows for more consistent assessment than human assessors could achieve, and also allows the use of compressed and simplified test sequences, which give spurious results when used with human assessors because such sequences do not convey intelligible content.
In the patent specification referred to above, an auditory transform of each signal is taken, to emulate the response of the human auditory system (ear and brain) to sound. The degraded signal is then compared with the reference signal after each has been transformed such that the subjective quality that would be perceived by a listener using the network is determined from parameters extracted from the transforms.
Such automated systems require a known (reference) signal to be played through a distorting system (the communications network or other system under test) to derive a degraded signal, which is compared with an undistorted version of the reference signal. Such systems are known as “intrusive” measurement systems, because whilst the test is carried out the channel under test cannot, in general, carry live traffic.
Measurement systems that do not require a reference signal are known as “non-intrusive”. A description of such a system is provided in the literature (Non-intrusive speech quality assessment using vocal-tract models, Gray, P.; Hollier, M. P.; and Massara, R. E.; IEE Proceedings - Vision, Image and Signal Processing, 147 (6), 493–501, December 2000). Such systems are not, in general, as accurate as intrusive measurement systems but have the advantage that they can be used on revenue earning traffic.
German patent application DE 4324292 discloses the measurement of a bit error rate (BER) over a period of time, the formation of a statistical representation therefrom, and the use of a transform to map the statistical representation to a measure of the speech quality of a digital mobile radio system. The invention is characterised by the fact that the mapping is derived from the results of subjective experiments. The application discloses the derivation of speech quality based on the analysis of BER and the use of the mean, standard deviation and probability distribution of a plurality of bit error measurements. Patent application DE 4324292 does not describe the use of a frame classification algorithm. The only specific means of generating the required bit error information described in the embodiment and claims of DE 4324292 is the RXQUAL parameter produced by GSM systems. RXQUAL is a coarse estimate of BER prior to channel decoding (in other words, the raw BER) measured over a period of 480 ms. However, it is known that the ability of an FEC decoder to correct errors depends on the bit-by-bit burst characteristics of the raw errors. Such detailed burst information is lost in the averaging over 10,944 bits performed in the RXQUAL calculation, so the embodiment described in DE 4324292 is unlikely to provide a reliable estimate of speech quality across a wide range of radio propagation conditions. This conclusion is confirmed in the literature (Radio link parameter based speech quality index-SQI; Karlsson, A.; Heikkila, G.; Minde, T. B.; Nordlund, M.; Timus, B.; Wiren, N.; Proceedings of ICECS '99. The 6th IEEE International Conference on Electronics, Circuits and Systems, Volume: 3, 1999 Page(s): 1569–1572 vol. 3).
U.S. Pat. No. 6,157,830 discloses an arrangement whereby radio link parameters are converted into a set of temporal parameters that are combined to yield a set of correlated parameters that are in turn mapped into a speech quality measure by means of an estimator. This patent discloses the derivation of temporal parameters from measures of raw BER over 0.5 second intervals, the mean frame erasure rate calculated over a 5 second interval and the calculation of the number of consecutive frame erasures in a 5 second interval. The patent goes on to disclose the statistical analysis of the temporal parameters, providing maximum value, minimum value, mean value, standard deviation, skewness, and kurtosis as examples.
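The kinds of statistic listed above can be computed as in the following sketch, which is offered only as a generic illustration and is not taken from the cited patent. The function names are hypothetical; population (rather than sample) moments and the non-excess definition of kurtosis are assumed, and `longest_erasure_run` is one possible reading of "number of consecutive frame erasures".

```python
import math
from typing import Dict, Sequence


def summarise(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of a non-empty series of temporal parameters."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    # Skewness and kurtosis are undefined for a constant series.
    skew = m3 / std ** 3 if std > 0 else float("nan")
    kurt = m4 / m2 ** 2 if std > 0 else float("nan")
    return {
        "max": max(values), "min": min(values), "mean": mean,
        "std": std, "skewness": skew, "kurtosis": kurt,
    }


def longest_erasure_run(erased: Sequence[bool]) -> int:
    """Length of the longest run of consecutive erased frames."""
    longest = run = 0
    for flag in erased:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest
```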
International patent application WO 01/97414 describes a method of determining the perceived quality of a speech transmission system by using a measure of link quality to retrieve a previously stored perceived quality score calculated for the same link quality. The pre-calculation of the perceived quality score for a given link quality is performed by: 1) using a description of the link quality to degrade a copy of a test signal; 2) deriving the corresponding perceived quality score by using an intrusive objective speech quality measurement algorithm to compare the degraded version of the test signal with an undegraded version. WO 01/97414 discloses that bit error rate, packet delay variation, and packet loss characteristics (number of packets lost and any pattern to them) are suitable measures of the link quality for mapping to a perceived quality score, but does not provide any specific description of statistical representations of these parameters.
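The general store-and-retrieve approach described above might be sketched as follows. This is an illustration of the idea only and not the method of WO 01/97414: packet loss rate is used as the sole link quality measure, nearest-entry matching is an assumption, and `score_for` is a hypothetical stand-in for the step of degrading a test signal and scoring it with an intrusive measurement algorithm.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple


def precompute(
    loss_rates: Sequence[float],
    score_for: Callable[[float], float],
) -> List[Tuple[float, float]]:
    """Build a (loss_rate, score) table sorted by loss rate.

    score_for is a placeholder for: degrade a test signal at this loss
    rate, then score it against the undegraded copy.
    """
    return sorted((rate, score_for(rate)) for rate in loss_rates)


def lookup(table: List[Tuple[float, float]], measured_rate: float) -> float:
    """Return the stored score whose loss rate is closest to the measured one.

    The table must be non-empty and sorted by loss rate.
    """
    rates = [rate for rate, _ in table]
    i = bisect_left(rates, measured_rate)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = table[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - measured_rate))[1]
```

The costly intrusive measurement is performed once per table entry in advance, so that at run time only the link quality needs to be measured.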