The intelligibility of speech, or simply speech intelligibility, generally describes how well speech is understood and is a measure of the effectiveness of speech communication. A person who talks rapidly or slurs words may be very difficult to understand. However, even a well-articulated message in the listener's native language can be misunderstood if it is not fully audible or if it has been distorted on its way to the listener. Conversely, a synthesized voice, for example, may be well understood by a listener yet perceived as harsh, unnatural and generally of low quality, which shows that a message of low quality may still be intelligible. Speech intelligibility is therefore not the same as speech quality; it is often regarded as a more general measure of how effectively speech is understood.
There are standardized methods for subjectively measuring speech intelligibility based on listening tests with a large number of human talkers and listeners. ITU-T Recommendation P.800, "Methods for subjective determination of transmission quality", describes a so-called Mean Opinion Score (MOS) method, in which listeners rate speech on a five-point scale from 1 (bad) to 5 (excellent), for evaluating the speech quality of a telecommunication system through listening tests. However, because such tests require a large and carefully selected group of talkers and listeners, the measurements are very costly and time-consuming. Moreover, the results obviously cannot be fed back into the system under evaluation in real time.
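Computing the MOS itself is simple arithmetic: it is the arithmetic mean of the individual listener ratings. The minimal Python sketch below assumes ratings on the five-point absolute category rating scale (1 = bad to 5 = excellent); the function name, the example ratings and the normal-approximation confidence interval are illustrative assumptions, not procedures quoted from P.800.

```python
import math
from statistics import mean, stdev

# Five-point listening-quality scale: 1 = bad ... 5 = excellent
SCALE_MIN, SCALE_MAX = 1, 5

def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper for illustration: the MOS is the mean rating, and the
    interval uses a normal approximation (a t-quantile would be more accurate
    for small listener panels)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not (SCALE_MIN <= r <= SCALE_MAX) for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Illustrative ratings from eight listeners for one test condition
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.3f} +/- {ci:.2f}")
```

The small size of such a computation contrasts with the effort of collecting the ratings, which is where the cost of subjective testing lies.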
There is therefore a general interest in methods for objectively measuring speech intelligibility, which would eliminate the need for time-consuming subjective evaluation.
In ITU-T Recommendation G.168, "Digital network echo cancellers", Figures I.6 to I.15 illustrate some relations between subjective speech quality and objective measurements, which suggests that speech quality and speech intelligibility could be measured by objective methods.
IEC 60268-16, "Objective rating of speech intelligibility by speech transmission index", part of the IEC 60268 series on sound system equipment, standardizes objective methods for rating the transmission quality of speech with respect to intelligibility. These methods can be used to compare speech transmission quality at various positions and under various conditions within the same listening space, or to assess a speech communication channel, in particular the effect of changes in acoustic properties such as echo, reverberation and noise.
A practical configuration of a measuring system for speech intelligibility as proposed in IEC 60268-16 typically involves a sound source, such as a loudspeaker, and a microphone. A so-called speech transmission index (STI) is derived from the reduction in modulation of a set of test signals when they are reproduced in a room, such as a theater or a concert hall, or transmitted through a communication channel. For measurements in a room such as a theater, each test signal is typically emitted by a sound source towards a microphone system that receives the transmitted sound. More precisely, the STI is an objective measure based on the weighted contributions of a number of octave bands within the frequency range of speech. The signal in each octave band is modulated with a set of different modulation frequencies, which defines a complete matrix of differently modulated test signals across the octave bands. A so-called modulation transfer function, which describes the reduction in modulation, is determined separately for each modulation frequency in each octave band. Finally, the modulation transfer function values for all modulation frequencies and all octave bands are combined into an overall measure of speech intelligibility.
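The combination step can be sketched in Python. This is a simplified, hypothetical illustration of the classic procedure: each modulation transfer value m is converted to an effective signal-to-noise ratio 10·log10(m/(1 - m)), clipped to ±15 dB and mapped linearly to a transmission index between 0 and 1; the indices are averaged per octave band, and the band values are combined with weights. The full IEC 60268-16 procedure additionally applies redundancy corrections between adjacent bands and corrections for auditory masking and the absolute hearing threshold, all omitted here, and the equal band weights below are placeholders, not the standard's weighting factors. Instead of measured data, the hypothetical helper `theoretical_mtf` generates input from the Houtgast-Steeneken model of exponential reverberation (reverberation time T60) and stationary noise. The grid of 7 octave bands (125 Hz to 8 kHz) and 14 modulation frequencies (0.63 Hz to 12.5 Hz) is the one used for the full STI, i.e. 98 test signals.

```python
import math

# Full STI grid: 7 octave bands x 14 modulation frequencies (1/3-octave steps)
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

def theoretical_mtf(f_mod, t60, snr_db):
    """Modelled modulation reduction for exponential reverberation (T60 in s)
    combined with stationary noise (SNR in dB); stands in for measured data."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

def transmission_index(m):
    """Map one modulation transfer value m to a transmission index in [0, 1]."""
    m = min(max(m, 1e-12), 1.0 - 1e-12)        # keep the log argument finite
    snr_eff = 10.0 * math.log10(m / (1.0 - m))  # effective SNR in dB
    snr_eff = min(max(snr_eff, -15.0), 15.0)    # clip to +/-15 dB
    return (snr_eff + 15.0) / 30.0

def sti(mtf_matrix, band_weights):
    """Simplified STI: mean transmission index per octave band (rows), then a
    weighted average over bands. No redundancy or masking corrections."""
    if len(mtf_matrix) != len(band_weights):
        raise ValueError("one weight per octave band is required")
    mti = [sum(transmission_index(m) for m in row) / len(row) for row in mtf_matrix]
    return sum(w * x for w, x in zip(band_weights, mti)) / sum(band_weights)

# Placeholder weights: equal for all bands (NOT the standard's factors)
weights = [1.0] * len(OCTAVE_BANDS_HZ)
# Illustrative condition, same in every band: T60 = 1.2 s, SNR = 10 dB
matrix = [[theoretical_mtf(f, 1.2, 10.0) for f in MOD_FREQS_HZ]
          for _ in OCTAVE_BANDS_HZ]
n_signals = len(OCTAVE_BANDS_HZ) * len(MOD_FREQS_HZ)
print(f"{n_signals} test signals, simplified STI = {sti(matrix, weights):.2f}")
```

Averaging over modulation frequencies before weighting across bands mirrors the order described in the text: the modulation transfer function is evaluated per band and per modulation frequency, and only then reduced to a single number.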
The speech transmission index for telecommunication systems (STITEL) is a simplified version of the STI and may be used instead of it under the typical conditions of a single telecommunication channel.
Although the introduction of the STI methods represents significant progress in the objective measurement of speech intelligibility, these methods are still quite time-consuming because of the relatively large set of test signals they require. They are also tailored to measurements in relatively simple settings, such as confined listening spaces or isolated communication channels.
In modern mobile communication networks, the technical setting is often more complex, with interrelated signal paths affected by echo, reverberation and noise. The digital core network of a public land mobile network (PLMN), for example, generally contains so-called voice enhancement devices, such as echo cancellers (EC), noise reduction (NR), mobile cross-talk control (MCC) and level control (LC) devices, which improve speech quality and speech intelligibility. On the network side, the echo canceller is a particularly important voice enhancement device, responsible for handling the part of the far-end signal that is reflected back into the near-end signal path as a disturbing echo. In such complex settings there is a demand for effective methods for objectively measuring speech intelligibility. The measurement results could then serve as a basis for optimizing and coordinating the operation of voice enhancement devices in networks such as the digital mobile core network.
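The echo canceller's task can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter, a textbook echo-cancellation technique. This is a minimal sketch, not the algorithm of any particular network device: it omits double-talk detection, nonlinear processing and comfort noise, and the echo path, step size `mu`, filter length `taps` and the helper names are hypothetical choices for illustration. The filter learns an estimate of the echo path from the far-end signal and subtracts the estimated echo from the near-end signal; the result is reported as echo return loss enhancement (ERLE), the ratio of echo power before and after cancellation.

```python
import math
import random

def nlms_echo_canceller(far_end, near_end, taps=32, mu=0.5, eps=1e-6):
    """Generic NLMS echo canceller (illustrative sketch).

    Adapts an FIR estimate of the echo path from the far-end signal and
    subtracts the estimated echo from the near-end signal. Returns the
    residual signal after cancellation."""
    w = [0.0] * taps      # adaptive estimate of the echo-path impulse response
    x_buf = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        x_buf = [x] + x_buf[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x_buf))  # echo estimate
        e = d - echo_hat                                     # residual sample
        norm = eps + sum(xi * xi for xi in x_buf)            # input power
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual

def erle_db(echo, residual):
    """Echo return loss enhancement: echo power over residual power, in dB."""
    p_echo = sum(v * v for v in echo)
    p_res = sum(v * v for v in residual) + 1e-20  # avoid division by zero
    return 10.0 * math.log10(p_echo / p_res)

random.seed(0)
echo_path = [0.0, 0.0, 0.5, 0.3, -0.1]  # hypothetical echo-path impulse response
far = [random.gauss(0.0, 1.0) for _ in range(5000)]
# Near-end signal containing only the echo of the far-end signal (no near-end talker)
near = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far))]
res = nlms_echo_canceller(far, near)
print(f"ERLE over the last 1000 samples: {erle_db(near[-1000:], res[-1000:]):.1f} dB")
```

With a noise-free, time-invariant echo path and white far-end input, the filter converges quickly and the ERLE becomes very large; real network conditions (double talk, background noise, nonlinear echo paths) are exactly what limits this, and why objective intelligibility measurements could help tune such devices.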