When talking in a noisy environment, a talker adjusts the level and the spectral content of his/her speech according to the level of ambient noise to make his/her speech more intelligible. This is called the Lombard effect (see e.g., J. C. Junqua, “The Lombard reflex and its role on human listeners and automatic speech recognizer,” J. Acoust. Soc. Amer., vol. 93, pp. 510-524, 1993). When mobile terminals are used in noisy environments, it is desirable for the terminal to behave in a similar manner (i.e., the speech in the received down-link signal should be processed so that the speech emitted from the electro-acoustical transducer (e.g., loudspeaker) of the terminal is as intelligible as possible when it reaches the ear of the user of the terminal).
In several studies, speech intelligibility has been improved by increasing the power of the speech signal (see e.g., “The influence of first and second formants on the intelligibility of clipped speech,” J. Audio Eng. Soc., vol. 16, pp. 182-185, 1968; R. J. Niederjohn and J. H. Grotelueschen, “The enhancement of speech intelligibility in high noise levels by high-pass filtering followed by rapid amplitude compression,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 277-282, August 1976; and J. Lynch, “A methodology for evaluating the performance of dynamic range control algorithms for speech enhancement,” Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '87), vol. 12, pp. 153-156, April 1987). Adaptive level controllers and compressors are examples of applications that do this (see e.g., Lynch, cited above).
For a signal that is already close to its digital overload level, however, it is not possible to increase intelligibility by increasing the power level of the speech, since this would cause digital clipping and hence distort the signal. A method that preserves the level of the speech while optimizing its spectral characteristics is therefore of interest.
Studies have shown that emphasizing the second formant of the speech relative to the first formant may improve the intelligibility of the speech while maintaining the overall signal power (see e.g., Junqua, cited above; I. B. Thomas, “The second formant and speech intelligibility,” in Proc. Nat. Electronics Conf., vol. 23, 1967, pp. 544-548; and “The influence of first and second formants on the intelligibility of clipped speech,” J. Audio Eng. Soc., vol. 16, pp. 182-185, 1968).
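As a minimal illustration of emphasizing one spectral region relative to another without changing overall power, the sketch below assumes the speech spectrum has already been split into bands (for example, one band around the first formant and one around the second). The hypothetical function `power_preserving_gains` applies a requested relative emphasis per band and then rescales every gain by one common factor so that the total output power equals the input power. The band split and the 6 dB emphasis used below are illustrative assumptions, not values taken from the cited studies.

```python
import math


def power_preserving_gains(band_powers, emphasis_db):
    """Hypothetical helper: per-band amplitude gains that apply a relative
    emphasis (in dB) while keeping the summed band power unchanged."""
    if len(band_powers) != len(emphasis_db):
        raise ValueError("band_powers and emphasis_db must have equal length")
    # Convert the relative dB emphasis into power-domain weights.
    weights = [10.0 ** (e / 10.0) for e in emphasis_db]
    total_in = sum(band_powers)
    total_weighted = sum(w * p for w, p in zip(weights, band_powers))
    if total_weighted == 0.0:
        # Silent input: nothing to redistribute, leave the signal untouched.
        return [1.0] * len(band_powers)
    # One common scale factor restores the original total power.
    scale = total_in / total_weighted
    # Amplitude gain is the square root of the power gain.
    return [math.sqrt(scale * w) for w in weights]


# Example: first-formant band carries 4x the power of the second-formant band;
# boost the second band by 6 dB relative to the first.
gains = power_preserving_gains([4.0, 1.0], [0.0, 6.0])
```

Because only the ratio between bands is set by `emphasis_db`, the first band ends up attenuated and the second amplified, which is the trade-off the cited studies describe.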
Methods based on linear filtering for improving intelligibility are discussed in B. Sauert, G. Enzner, and P. Vary, “Near end listening enhancement with strict loudspeaker output power constraining,” International Workshop on Acoustic Echo and Noise Control (IWAENC 2006), September 12-14, 2006, Paris, France. That paper presents a method that produces an equal SNR at all frequencies, as well as a method that does the opposite (i.e., attenuates the signal at inaudible frequencies and amplifies it at audible frequencies).
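One way to read the equal-SNR idea is as a per-band gain computation under a fixed output power budget. The sketch below is an illustrative formulation, not the algorithm of the cited paper: given per-band speech powers S_k, noise powers N_k, and a power budget P, choosing g_k^2 = c * N_k / S_k with c = P / sum(N_k) gives every band the same output SNR c, while the output speech power sum(g_k^2 * S_k) equals P. The function name `equal_snr_gains` and the handling of bands with no speech (gain zero, excluded from the budget) are assumptions.

```python
import math


def equal_snr_gains(speech_powers, noise_powers, total_power):
    """Hypothetical helper: per-band amplitude gains giving every band with
    speech the same output SNR, with output speech power equal to total_power."""
    if len(speech_powers) != len(noise_powers):
        raise ValueError("speech_powers and noise_powers must have equal length")
    # Bands without speech cannot reach any SNR; they get zero gain.
    active = [s > 0.0 for s in speech_powers]
    noise_sum = sum(n for n, a in zip(noise_powers, active) if a)
    if noise_sum <= 0.0:
        raise ValueError("need positive noise power in at least one active band")
    # Common SNR that exactly spends the power budget across active bands.
    snr = total_power / noise_sum
    gains = [
        math.sqrt(snr * n / s) if a else 0.0
        for s, n, a in zip(speech_powers, noise_powers, active)
    ]
    return gains, snr


gains, snr = equal_snr_gains([4.0, 1.0, 0.25], [1.0, 1.0, 0.5], 5.25)
```

Bands where the noise is strong relative to the speech receive the largest gains, so the power budget is pushed toward the frequencies that are most masked.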
It should be noted that methods of altering the spectral characteristics of the signal may be used in conjunction with a method that raises the overall level. Before the spectral characteristics are altered, a frequency-independent gain may be applied to raise the overall signal level, provided the overload point is not reached.
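The overload condition on the frequency-independent gain can be expressed as a simple bound on the gain. The sketch below assumes floating-point samples normalized so that digital overload occurs at magnitude 1.0; the hypothetical function `headroom_limited_gain` returns the requested gain unless applying it would push the signal's peak past full scale, in which case it returns the largest gain that does not clip.

```python
def headroom_limited_gain(samples, desired_gain_db, full_scale=1.0):
    """Hypothetical helper: a frequency-independent linear gain, capped so the
    scaled signal never exceeds full_scale (assumed overload point)."""
    peak = max((abs(x) for x in samples), default=0.0)
    desired = 10.0 ** (desired_gain_db / 20.0)  # dB to linear amplitude
    if peak == 0.0:
        # Silent frame: any gain is safe.
        return desired
    # Largest gain that keeps the peak at or below full scale.
    max_gain = full_scale / peak
    return min(desired, max_gain)


quiet = [0.1, -0.25, 0.2]
loud = [0.8, -0.9]
g_quiet = headroom_limited_gain(quiet, 6.0)  # full 6 dB is available
g_loud = headroom_limited_gain(loud, 6.0)    # capped at 1 / 0.9
```

For the near-full-scale frame the cap leaves less than 1 dB of boost, which is the situation in which only spectral reshaping at constant level remains as an option.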
A problem with methods that alter the spectral characteristics of the emitted speech is that they cannot obtain the maximum desired effect in a controlled manner. What is needed, therefore, are systems and methods that improve the intelligibility of speech in a noisy environment.