Human speech generally has a relatively large dynamic range. For example, the amplitudes of some consonant sounds (e.g., the unvoiced consonants P, T, S, and F) are often 30 dB lower than the amplitudes of vowel sounds in the same spoken sentence. Therefore, the consonant sounds will sometimes drop below a listener's speech detection threshold, thus compromising the intelligibility of the speech. This problem is exacerbated when the listener is hard of hearing, the listener is located in a noisy environment, or the listener is located in an area that receives a low signal strength.
Traditionally, the potential unintelligibility of certain sounds in a speech signal has been addressed by applying some form of amplitude compression to the signal. For example, in one prior approach, the amplitude peaks of a speech signal were clipped and the resulting signal was amplified so that the difference between the peaks of the new signal and the low portions of the new signal would be reduced while maintaining the signal's original loudness. Amplitude compression, however, often introduces distortion into the resultant signal. In addition, amplitude compression techniques tend to amplify some undesired low-level signal components (e.g., background noise) in an inappropriate manner, thus compromising the quality of the resultant signal.
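The clip-then-amplify approach described above can be illustrated with a minimal sketch. It is not taken from any particular prior system: the function names, the clip level of 0.1, the test tones, and the use of RMS level as a stand-in for "loudness" are all assumptions made for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level, used here as a rough proxy for loudness (assumption)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def clip_and_amplify(samples, clip_level):
    """Clip peaks to +/-clip_level, then apply one gain so the RMS matches the input."""
    clipped = [max(-clip_level, min(clip_level, s)) for s in samples]
    rms_out = rms(clipped)
    gain = rms(samples) / rms_out if rms_out > 0 else 1.0
    # The same gain is applied to every sample, so any low-level
    # background noise is raised along with the quiet consonants.
    return [gain * s for s in clipped]

def peak_ratio_db(loud, quiet):
    """Ratio of the peak amplitudes of two segments, in decibels."""
    return 20.0 * math.log10(max(abs(s) for s in loud) / max(abs(s) for s in quiet))

# Hypothetical test signal: a "vowel" tone followed by a "consonant" tone 30 dB lower.
consonant_amp = 10 ** (-30 / 20)
vowel = [math.sin(2 * math.pi * n / 8) for n in range(200)]
consonant = [consonant_amp * math.sin(2 * math.pi * n / 8) for n in range(200)]
signal = vowel + consonant

processed = clip_and_amplify(signal, clip_level=0.1)

before_db = peak_ratio_db(vowel, consonant)                       # about 30 dB
after_db = peak_ratio_db(processed[:200], processed[200:])        # about 10 dB
```

In this sketch the vowel-to-consonant peak difference shrinks from about 30 dB to about 10 dB while the overall RMS level is preserved. The clipped vowel is no longer a sinusoid, however, and the make-up gain is greater than one for every low-level sample, which illustrates both drawbacks noted above.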
Therefore, there is a need for a method and apparatus that is capable of enhancing the intelligibility of processed speech without the undesirable effects associated with prior techniques.