1. Field of Invention
The present invention relates generally to telephony and, more particularly, to measuring the level of speech distortion in transmitted voice waveforms.
2. Discussion of the Related Art
When viewed from the perspective of the user of a telephone, the quality of a voice telephone connection depends in very large part on how the speaker's voice on the other end of the call sounds to the listener. In particular, it is well known that users will base their assessment of the quality of each call on what might be called clarity, as determined by at least four independent characteristics:
(1) Volume of the received voice signal, which will determine whether the user will find the speech to be too loud or too soft;
(2) Noise on the line, such as static, popping, and crackle, which will determine whether the listener will have difficulty separating the speech from background noise;
(3) Echo on the line, which will determine whether speakers will be distracted by hearing their own voice echoed back to them as they are talking; and
(4) Speech distortion, caused by conditions on the telephone connection that will make the distant speaker sound "tinny," or "raspy," or otherwise distort the voice in ways that cannot be duplicated in natural, face-to-face conversation.
Of these four characteristics, the first three have been present in telephone networks from the beginning. The fourth, speech distortion, however, has only occurred with the advent of modern digital telephone networks. The reason is that nearly all of the possible causes of perceptible speech distortion over telephone connections stem from malfunctions in the analog-to-digital (A/D) and digital-to-analog (D/A) conversions, or in the transport of digitally encoded voice signals. Speech distortion from these sources is caused, for example, by overdriving of the A/D converter, which produces "clipping" of the waveform that makes speech sound mechanical; by encoding that produces high levels of "quantizing" noise that makes speech sound "raspy"; and by malfunctions or high bit error rates in the digital transport, which result in analog waveforms at the distant end of a connection that could not possibly be produced by the human voice.
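The clipping and quantizing effects named above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function names, the 8 kHz sampling rate, the 197 Hz test tone, the 0.5 clipping limit, and the use of a uniform quantizer are all assumptions chosen for illustration (practical telephone codecs typically use companded rather than uniform quantization).

```python
import math


def sine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave (hypothetical stand-in for a voice signal)."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


def clip(samples, limit):
    """Model an overdriven A/D converter: values beyond +/-limit are flattened."""
    return [max(-limit, min(limit, s)) for s in samples]


def quantize(samples, bits):
    """Uniform quantizer over [-1, 1) with 2**bits levels (an assumed, simplified encoder)."""
    levels = 2 ** bits
    step = 2.0 / levels
    out = []
    for s in samples:
        idx = math.floor(s / step)
        # Keep the index inside the available code range.
        idx = max(-levels // 2, min(levels // 2 - 1, idx))
        # Reconstruct at the midpoint of the quantization interval.
        out.append((idx + 0.5) * step)
    return out


def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB, treating (reference - degraded) as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return 10.0 * math.log10(signal / noise)


tone = sine(197.0, 8000, 8000)
print("SNR after 8-bit quantizing (dB):", round(snr_db(tone, quantize(tone, 8)), 1))
print("SNR after 4-bit quantizing (dB):", round(snr_db(tone, quantize(tone, 4)), 1))
print("SNR after clipping at 0.5 (dB):", round(snr_db(tone, clip(tone, 0.5)), 1))
```

Under these assumptions, coarser quantizing and harder clipping both lower the measured ratio. Note that this comparison needs the undistorted signal as a reference, which is not available when monitoring a live conversation.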
Because of the competition for customers that has emerged with the demise of the single-provider monopolies in global telephony, the quality of telephone services in general, and the question of clarity of calls, in particular, have become major concerns in marketing telephone services. Such concerns have, in turn, created ever-increasing demands for capabilities to monitor, and maintain the clarity of, telephone services to ensure that users will remain satisfied with the service they are purchasing.
Various techniques have been developed for monitoring and evaluating the factors that affect clarity of transmitted voice telephone signals. For example, techniques have been developed for refining test capabilities, establishing standards and providing models for collecting and interpreting samples of objectively measurable characteristics of telephone connections such as loss, noise, slope distortion, signal fidelity, and echo path loss and delay. Further, techniques have been developed for non-intrusive monitoring, which enables the collection of data from live conversations without intruding on, or illegally listening to, those conversations, and thereby yields measurements of speech power, line noise, and echo path loss and delay.
Such telephone measurement techniques and technologies, together with various interpretation models, have enabled the development of practices for timely detection and correction of adverse effects relating to low volume, noise and echo characteristics. Additionally, these measurement techniques have provided standards for the design of new telephone systems, as well as standards for the management of systems, that have increased clarity with regard to three of the clarity factors, i.e., noise, low volume and echo.
However, it would also be desirable to provide a system which is capable of processing data from live telephone conversations to measure speech distortion created in voice signals transmitted by modern digital and/or packet switched voice networks. Various techniques have been used in an attempt to measure speech distortion in digitally mastered waveforms and pseudo speech signals to predict user perception of speech distortion under various conditions. For example, a technique known as PAMS, which was developed in the United Kingdom, uses a recording of digitally mastered phonemes. According to this process, the digitally mastered phonemes are transmitted over a telephone system and recorded at the receiving end. The recorded signal is processed and compared to the originally transmitted signal to provide a measurement of the level of distortion of the transmitted signal.
Other commonly used methods of measuring distortion in audio signals have included the introduction of a sinusoidal waveform at the input of the audio channel and an analysis of the output of the audio channel to detect harmonics and other components that were not part of the original signal. This methodology, however, has certain limitations. Chief among these limitations is that the method provides no basis for assessing the user perception of speech distortion. In other words, there is no way to correlate what happens to individual frequencies with the overall effect of those distortions on user perception.
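The sinusoidal test described above can be sketched as a total harmonic distortion calculation. This is an illustrative sketch only, not a method from the disclosure: the channel is modeled here as a simple hard clipper, and the function names, the 200 Hz test tone, the 8 kHz sampling rate, and the choice of harmonics 2 through 10 are all assumptions.

```python
import math


def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Estimate the amplitude of one frequency component by direct correlation."""
    n = len(samples)
    re = sum(
        s * math.cos(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i, s in enumerate(samples)
    )
    im = sum(
        s * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i, s in enumerate(samples)
    )
    return 2.0 * math.hypot(re, im) / n


def total_harmonic_distortion(samples, fundamental_hz, sample_rate_hz, max_harmonic=10):
    """Ratio of the combined harmonic amplitude (2nd..max_harmonic) to the fundamental."""
    fundamental = tone_amplitude(samples, fundamental_hz, sample_rate_hz)
    harmonics = [
        tone_amplitude(samples, k * fundamental_hz, sample_rate_hz)
        for k in range(2, max_harmonic + 1)
    ]
    return math.sqrt(sum(h * h for h in harmonics)) / fundamental


def hard_clip_channel(samples, limit=0.5):
    """Hypothetical distorting channel: flattens values beyond +/-limit."""
    return [max(-limit, min(limit, s)) for s in samples]


SAMPLE_RATE_HZ = 8000
TEST_TONE_HZ = 200
N_SAMPLES = 800  # exactly 20 cycles of the test tone

test_tone = [
    math.sin(2.0 * math.pi * TEST_TONE_HZ * i / SAMPLE_RATE_HZ)
    for i in range(N_SAMPLES)
]

clean_thd = total_harmonic_distortion(test_tone, TEST_TONE_HZ, SAMPLE_RATE_HZ)
clipped_thd = total_harmonic_distortion(
    hard_clip_channel(test_tone), TEST_TONE_HZ, SAMPLE_RATE_HZ
)
print("THD of undistorted tone:", clean_thd)
print("THD after clipping channel:", clipped_thd)
```

The result is a single ratio for one test frequency. As the text notes, nothing in such a figure indicates how a listener would perceive the corresponding distortion of speech.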
Further, each of these techniques is only effective when known signals are transmitted. The PAMS technique requires the transmission of a special signal containing special phonemes and a comparison of the transmitted signal with the received signal. The second technique requires transmission of sinusoidal waveforms on the audio channel. It would therefore be advantageous to provide a system that would allow measurement and interpretation of speech distortion that uses samples of natural speech from live telephone conversations and does not require the introduction of special signals or comparison with an original signal. It would also be advantageous to be able to sample such signals in a non-intrusive monitoring situation that enables collection of data from live conversations.