The present invention relates to the field of signal processing, and in particular to signal processing of audio signals containing speech.
There are a variety of approaches to improving the speech intelligibility of audio signals. One approach is to enhance a noisy audio signal. Another is to enhance a signal that has been degraded by reverberation, echoes, and the like. Yet another is to modify a good audio signal to make it more intelligible for the hearing-impaired, a method used, for example, in hearing aids. It is also possible to modify a good audio signal so that it is more intelligible in the presence of high background noise.
U.S. Pat. No. 5,459,813 discloses that “unvoiced sounds” (e.g., consonants) are masked by much stronger “voiced sounds” (e.g., vowels). Since unvoiced sounds are critical for the intelligibility of speech, this patent discloses enhancing these sounds, for example, by clipping or amplitude compression.
The publication entitled “Effects of Amplitude Distortion upon Intelligibility of Speech” by J. C. R. Licklider in the Journal of the Acoustical Society of America, October 1946, discloses “peak clipping”. In the absence of ambient noise, peak clipping has little effect on the intelligibility of speech. Peak clipping at −20 dB still yields approximately 96% intelligibility. “Center clipping” is considerably worse, since it removes the consonants, which are especially critical to intelligibility. Peak clipping at −24 dB requires amplification of only approximately 14 dB to obtain the same intelligibility. In the publication Speech Monographs, March 1960, the article by Elwood Kretsinger et al. entitled “The Use of Fast Limiting to Improve the Intelligibility of Speech in Noise” discloses that consonants are approximately 12 dB weaker than vowels. Thus, by amplifying the consonants relative to the vowels, the intelligibility of speech in the audio signal is increased. Replacing the clipper with a fast peak limiter (22 msec.) enables intelligibility to be increased still further. At −10 dB limiting, intelligibility is increased from 56% to 84%.
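The difference between peak clipping and center clipping described above can be illustrated with a minimal sketch. It assumes that a clipping level in dB is measured relative to the peak amplitude of the signal, which the cited publications may define differently; the function names `peak_clip` and `center_clip` are hypothetical and are not taken from any cited reference.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def peak_clip(samples, level_db):
    """Clamp every sample to +/- threshold (assumed relative to the peak)."""
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    thr = peak * db_to_gain(level_db)
    return [max(-thr, min(thr, s)) for s in samples]


def center_clip(samples, level_db):
    """Zero every sample whose magnitude is at or below the threshold."""
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    thr = peak * db_to_gain(level_db)
    return [s if abs(s) > thr else 0.0 for s in samples]


# Toy signal: "vowel" samples at amplitude 1.0 and "consonant" samples
# about 12 dB weaker (amplitude 0.25).
toy = [1.0, -1.0, 0.25, -0.25]
peak_clipped = peak_clip(toy, -20.0)      # all samples limited to the same magnitude
center_clipped = center_clip(toy, -10.0)  # weak samples are removed
```

In this toy signal, peak clipping brings the strong and weak samples to the same magnitude, whereas center clipping discards the weak samples entirely, which is consistent with the observation that center clipping removes the consonants.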
From the article by Ian Thomas et al., entitled “The Intelligibility of Filtered-Clipped Speech in Noise” in the Journal of the Audio Engineering Society, June 1970, it is known that the fundamental wave of an audio signal that contains speech contributes very little to speech intelligibility, while the first resonance frequency is extremely important. For this reason, the signal should be high-pass-filtered before clipping.
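The order of operations suggested above, high-pass filtering followed by clipping, can be sketched as follows. This is only an illustration under stated assumptions: the first-order filter, the 300 Hz cutoff, and the clipping limit of 0.1 are hypothetical values chosen for the example and do not come from the cited article.

```python
import math


def high_pass(samples, fs, cutoff_hz=300.0):
    """First-order high-pass filter (assumed topology; cutoff is hypothetical)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Passes changes in the input and lets constant components decay.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def filter_then_clip(samples, fs, cutoff_hz=300.0, limit=0.1):
    """Attenuate low frequencies first, then clip to +/- limit."""
    filtered = high_pass(samples, fs, cutoff_hz)
    return [max(-limit, min(limit, s)) for s in filtered]


fs = 8000
dc = high_pass([1.0] * 1000, fs)                                   # constant input
fast = high_pass([1.0 if n % 2 == 0 else -1.0 for n in range(1000)], fs)
clipped = filter_then_clip([1.0] * 1000, fs)
```

A constant input decays toward zero at the filter output, while a rapidly alternating input passes with most of its amplitude, so the clipper that follows acts mainly on the higher-frequency content.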
From the article by Ian Thomas et al., entitled “Intelligibility Enhancement through Spectral Weighting,” in the Proceedings of the 1972 IEEE Conference on Speech Communication and Processing, it is known that, while clipping does improve the intelligibility of speech, it also degrades signal quality. Therefore, this publication proposes shifting the signal energy into the significant frequency ranges.
U.S. Pat. No. 5,479,560 discloses an approach in which the audio signals are broken up into multiple frequency bands, and the high-energy frequency bands are amplified relatively strongly while the others are attenuated. This technique is based on the fact that speech is composed of a sequence of phonemes. Phonemes consist of a plurality of frequencies that undergo significant amplification at the resonance frequencies of the mouth and throat cavity. A frequency band with this type of spectral peak is called a formant. Formants are especially important for the recognition of phonemes and thus speech. Therefore, one approach to improving speech intelligibility involves amplifying the peaks (formants) of the frequency spectrum of an audio signal while attenuating the intermediate valleys. For an adult male, the fundamental frequency of speech is in the range of approximately 60-240 Hz. The first four formants are at 500 Hz, 1,500 Hz, 2,500 Hz, and 3,500 Hz as disclosed in U.S. Pat. No. 5,459,813.
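The idea of amplifying high-energy bands while attenuating the others can be sketched in a few lines. The sketch assumes that the band energies have already been measured by some filter bank, which is not shown; the decision rule (comparison against the mean band energy) and the gains of +6 dB and −6 dB are hypothetical and are not taken from the cited patent.

```python
def band_gains_db(band_energies, boost_db=6.0, cut_db=-6.0):
    """Assign a boost to bands above the mean energy and a cut to the rest."""
    if not band_energies:
        return []
    mean = sum(band_energies) / len(band_energies)
    return [boost_db if e > mean else cut_db for e in band_energies]


def apply_band_gains(band_amplitudes, gains_db):
    """Scale each band amplitude by its gain, converted from dB to linear."""
    return [a * 10.0 ** (g / 20.0) for a, g in zip(band_amplitudes, gains_db)]


# Hypothetical energies for four bands; bands 2 and 4 stand in for spectral peaks.
energies = [1.0, 9.0, 2.0, 8.0]
gains = band_gains_db(energies)
scaled = apply_band_gains([1.0, 1.0, 1.0, 1.0], gains)
```

Bands carrying a spectral peak receive the boost and the intermediate valleys receive the cut, which widens the contrast between formants and the regions between them.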
U.S. Pat. No. 4,454,609 discloses amplifying the consonants.
U.S. Pat. No. 5,553,151 discloses “forward masking”, wherein weak consonants are temporally masked by the preceding strong vowels. This patent discloses a relatively fast compressor with an “attack time” of approximately 10 msec. and a “release time” of approximately 75 to 150 msec.
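The role of the attack and release times can be illustrated with a minimal sketch of a feed-forward compressor. Only the attack time of 10 msec. and a release time inside the 75 to 150 msec. range come from the description above; the envelope follower, the threshold of 0.25, and the 4:1 ratio are assumptions made for the example and do not represent the circuit of the cited patent.

```python
import math


def compress(samples, fs, threshold=0.25, ratio=4.0,
             attack_ms=10.0, release_ms=100.0):
    """Feed-forward compressor with separate attack and release smoothing."""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Rising levels are tracked quickly (attack), falling levels slowly (release).
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        if env > threshold:
            # Above the threshold, the envelope is compressed by the given ratio.
            gain = (env / threshold) ** (1.0 / ratio - 1.0)
        else:
            gain = 1.0
        out.append(s * gain)
    return out


fs = 8000
# 100 ms of a strong "vowel" followed by 1 s of a weak "consonant".
signal = [1.0] * 800 + [0.1] * 8000
result = compress(signal, fs)
```

With these assumed settings, the gain applied to the strong segment settles well below unity within the attack time, and the gain recovers to unity during the weak segment only after the release time has elapsed, which is why the release time bears on how soon a weak consonant following a strong vowel is passed at full level.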
A problem inherent in the known systems for improving the intelligibility of speech in audio signals is their relatively high complexity. That is, there is a high level of complexity both in the software required to compute the individual algorithms and in the hardware required. On the other hand, in the simpler systems the audio signal is modified to such an extent that the speech no longer sounds natural. In addition, the simpler systems may introduce certain disturbances into the speech signal that may even work against improved intelligibility.
Therefore, there is a need for an apparatus and method of reduced complexity for improving the speech quality of audio signals. In addition, there is a need for an apparatus and method of improving the speech intelligibility of a relatively good audio signal with the volume unmodified, that is, a system in which intelligibility is maintained at low volume or improved in the presence of ambient noise.