Mobile communications are subject to adverse noise conditions. A user listening to a signal received over a communication channel perceives the quality of the signal as degraded by three factors: the ambient noise at the transmitting end of the communication channel (far-end), the ambient noise at the user's receiving end of the communication channel (near-end), and the communication channel itself.
The problem of far-end ambient noise has been extensively addressed through the application of noise reduction algorithms to signals prior to their transmission over a communication channel. These algorithms generally lead to far-end ambient noise being well compensated for in signals received at a user apparatus, such that the fact that a far-end user may be located in a noisy environment does not significantly disrupt a near-end user's listening experience.
The problems of near-end ambient noise and the adverse effects caused by the communication link have been less well addressed.
Near-end ambient noise often has the effect of masking a speech signal such that the speech signal is not intelligible to the near-end listener. The conventional method of improving the intelligibility of speech in such a situation is to apply an equal gain across all frequencies of the received speech signal to increase its total power. However, increasing the power across all frequencies can cause discomfort and listening fatigue to the listener. Additionally, the digital dynamic range of the signal processor in the user apparatus limits the amplification that can be applied to the signal, with the result that clipping of the signal may occur if a sufficiently high gain factor is applied.
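The clipping limitation described above can be illustrated with a minimal sketch. It assumes 16-bit signed samples and an 8 kHz sample rate, and it uses a sine tone as a stand-in for a received speech frame; the function names `apply_uniform_gain` and `make_tone` are hypothetical and are not taken from any particular apparatus.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed sample format


def apply_uniform_gain(samples, gain_db):
    """Apply one gain to every sample, saturating at the 16-bit limits.

    Returns the processed samples and the number of samples that clipped.
    """
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude factor
    out, clipped = [], 0
    for s in samples:
        v = int(round(s * gain))
        if v > INT16_MAX or v < INT16_MIN:
            clipped += 1
            v = max(INT16_MIN, min(INT16_MAX, v))  # saturate to the legal range
        out.append(v)
    return out, clipped


def make_tone(freq_hz, amplitude, n_samples, rate_hz=8000):
    """Generate a test sine tone (a stand-in for a received speech frame)."""
    return [int(round(amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate_hz)))
            for i in range(n_samples)]


tone = make_tone(1000, 8000, 80)
_, clipped_6db = apply_uniform_gain(tone, 6.0)         # modest gain: fits in range
louder, clipped_18db = apply_uniform_gain(tone, 18.0)  # large gain: saturates
print(clipped_6db, clipped_18db, max(louder), min(louder))
```

Because the same factor is applied at every frequency, the headroom is used up by the strongest components of the signal, and saturation begins before the weaker components have been raised by much.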
Generally, in speech signals, vowels are the strongest (most powerful) speech components. Voiced consonants are the next strongest components, and unvoiced consonants are the weakest components. The power distribution of vowels as a function of frequency is weighted heavily towards the low frequency end of the spectrum. In other words, vowels are more powerful in low frequency bands. Voiced consonants also generally exhibit a power distribution weighted towards low frequencies; however, the weighting is not as extreme as with most vowels. Some unvoiced consonants (for example ‘s’, ‘f’, ‘sh’) exhibit a power distribution weighted towards higher frequency bands. As the power of the near-end ambient noise increases, a near-end listener first loses the ability to hear the weak consonants, which become masked by the noise. The listener can still hear the strong vowels at this ambient noise level. However, as the ambient noise power increases further, the vowels also become masked by the noise.
Consonants carry more linguistic information than vowels. In other words, the intelligibility of a speech signal to a near-end listener depends more heavily on the listener's ability to determine the consonants in the speech signal than the vowels. Consequently, the masking effect of near-end ambient noise significantly degrades the intelligibility of a speech signal when the ambient noise is powerful enough to mask the consonants in the speech signal, even if the vowels can still be heard by the listener.
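The order in which speech components are lost as the noise rises can be sketched as follows. The relative levels are hypothetical placeholders chosen only to preserve the ordering described above (vowels strongest, unvoiced consonants weakest); they are not measurements. The masking rule, under which a component is audible only if its level exceeds the noise level, is a deliberate simplification that ignores frequency dependence.

```python
# Hypothetical relative levels in dB. Only the ordering matters here:
# vowels > voiced consonants > unvoiced consonants.
ILLUSTRATIVE_LEVELS_DB = {
    "vowel": 0.0,
    "voiced consonant": -10.0,
    "unvoiced consonant": -20.0,
}


def audible_components(noise_db, levels_db=ILLUSTRATIVE_LEVELS_DB):
    """Return the components whose level exceeds the noise level.

    This is a crude, frequency-independent masking rule used for illustration.
    """
    return [name for name, level in levels_db.items() if level > noise_db]


# Sweep the noise level upwards and list what remains audible at each step.
for noise_db in (-30.0, -15.0, -5.0, 5.0):
    print(noise_db, audible_components(noise_db))
```

In this toy model the unvoiced consonants drop out first, then the voiced consonants, and the vowels last, so the components that carry the most linguistic information are the first to be masked.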
The intelligibility of speech is associated with the formant structure of speech. Voiced sounds extend over a frequency range. Within this frequency range, the power of a voiced sound peaks at a number of frequencies due to the manner in which the sound was created in the vocal tract. These peaks are referred to as formants. The first formant (lowest frequency peak) alone is observed to contribute minimally to the intelligibility of speech. However, a strong correlation is observed between the second formant (next lowest frequency peak after the first formant) and speech intelligibility.
Generally, frequencies between 1.5 kHz and 3.5 kHz are considered to contribute more heavily to the intelligibility of speech than other frequencies.
To overcome the problems associated with applying a constant amplification across all frequencies of the speech signal, it has been proposed to amplify the high frequency bands of a speech signal but not the middle frequency bands. Typically, high frequency bands are in the range 2 kHz to 4 kHz, and middle frequency bands are in the range 0.8 kHz to 2 kHz. This approach has the potential to improve the intelligibility of the speech signal by increasing the power of the high frequency consonants and of the second formants, without the additional listener discomfort that would result from unnecessarily amplifying the lower frequency bands.
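A band-selective gain of this kind can be sketched as below. This is an illustrative frequency-domain approach, not the method of any particular prior proposal: it assumes an 8 kHz sample rate and a 64-sample frame, uses a naive discrete Fourier transform for clarity rather than speed, and the names `boost_band` and `rms` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform; returns the real part, since the input frame was real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def boost_band(frame, rate_hz, lo_hz, hi_hz, gain_db):
    """Scale only the bins whose frequency lies in [lo_hz, hi_hz]."""
    n = len(frame)
    gain = 10.0 ** (gain_db / 20.0)
    spectrum = dft(frame)
    for k in range(n):
        # Bins above n/2 mirror the lower bins, so fold them back to a
        # non-negative frequency; this keeps the output signal real.
        freq_hz = min(k, n - k) * rate_hz / n
        if lo_hz <= freq_hz <= hi_hz:
            spectrum[k] *= gain
    return idft(spectrum)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


RATE_HZ, N = 8000, 64
mid_tone = [math.sin(2 * math.pi * 1000 * t / RATE_HZ) for t in range(N)]
high_tone = [math.sin(2 * math.pi * 3000 * t / RATE_HZ) for t in range(N)]

mid_ratio = rms(boost_band(mid_tone, RATE_HZ, 2000, 4000, 6.0)) / rms(mid_tone)
high_ratio = rms(boost_band(high_tone, RATE_HZ, 2000, 4000, 6.0)) / rms(high_tone)
print(mid_ratio, high_ratio)
```

With these assumed parameters, the 1 kHz tone falls in the middle band and passes through unchanged, while the 3 kHz tone falls in the high band and is scaled by the linear equivalent of the 6 dB gain.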
It has also been proposed to use a speech enhancer with a transfer function that approximates the inverse of the Fletcher-Munson curves. The Fletcher-Munson curves are equal-loudness contours that describe how the sensitivity of the human hearing system varies with frequency at different volume levels. The speech enhancer is configured to apply different gain factors to different frequency bands of a speech signal in dependence on the Fletcher-Munson curves so as to increase the intelligibility of the speech signal.
However, these methods tend to over-amplify the speech signal in the high frequency bands, which causes the speech to sound distorted and creates a perceptual imbalance in the overall power distribution (as a function of frequency) of the speech signal.
Additionally, speech signals received over a communication channel suffer from distortions, that is, variations in the spectral shape of the signals, caused by the communication channel.
Furthermore, known methods of increasing the intelligibility of speech signals tend to be computationally complex, and are therefore undesirable for use on low-power platforms.
There is therefore a need for a user apparatus that, when located in a region of significant ambient noise, is capable of improving the quality of a speech signal as perceived by a listener at the user apparatus, using a process that is low in computational complexity.