The intelligibility of speech is an important issue in the design of speech coding algorithms. Reducing the bandwidth of a speech signal for coding applications often makes consonant sounds difficult for the listener to identify. In narrowband speech, the distinction between consonants can be poor even in quiet conditions and before the signal is encoded. This happens most often for consonants that differ by place of articulation. Reduced intelligibility may be partly attributed to the removal of high-frequency information, which causes a loss of cue redundancy. However, the problem is often intensified by the inherently weak acoustic cues available in consonants. It is thus advantageous to strengthen the available identifying cues, making consonant contrasts more distinct and potentially more robust to subsequent coding degradations.
Speakers naturally adapt their speech when talking to impaired listeners or in adverse environments. This type of speech, known as clear speech, is typically produced at roughly half the speaking rate of conversational speech. Other differences include longer formant transitions, more salient consonant contrasts (an increased consonant-vowel ratio, CVR), and pauses that are more frequent and longer. Prior-art attempts to improve intelligibility artificially modify speech so that it exhibits these characteristics. Because consonants have inherently low energy, an increased CVR may improve intelligibility in the presence of noise. In a noise-free environment, however, significantly altering the natural relative amplitudes of consonant and vowel can be counterproductive, because it may cause the listener to perceive a different phoneme.
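To make the CVR notion concrete, the sketch below assumes one common convention: CVR is the difference, in dB, between the RMS level of a consonant segment and that of the adjacent vowel. The sketch also assumes the segments are already delimited. The function names `rms`, `cvr_db` and `boost_consonant` are hypothetical illustrations, not part of any described method. Applying a gain of g dB to the consonant alone raises the CVR by exactly g dB. This is the kind of manipulation that can help in noise but may shift phoneme identity in quiet.

```python
import numpy as np


def rms(x) -> float:
    """Root-mean-square level of a signal segment."""
    x = np.asarray(x, dtype=np.float64)
    return float(np.sqrt(np.mean(x * x)))


def cvr_db(consonant, vowel) -> float:
    """Consonant-vowel ratio in dB (assumed convention: 20*log10 of the RMS ratio)."""
    return 20.0 * np.log10(rms(consonant) / rms(vowel))


def boost_consonant(consonant, gain_db: float) -> np.ndarray:
    """Scale only the consonant segment by gain_db decibels (hypothetical helper)."""
    return np.asarray(consonant, dtype=np.float64) * 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    # Toy segments: a weak consonant-like noise burst and a stronger vowel-like tone.
    rng = np.random.default_rng(0)
    fs = 8000  # narrowband sampling rate in Hz
    consonant = 0.05 * rng.standard_normal(fs // 20)      # 50 ms burst
    t = np.arange(fs // 5) / fs
    vowel = 0.5 * np.sin(2 * np.pi * 200.0 * t)           # 200 ms tone

    print(f"original CVR: {cvr_db(consonant, vowel):.1f} dB")
    print(f"boosted CVR:  {cvr_db(boost_consonant(consonant, 6.0), vowel):.1f} dB")
```

Working in RMS levels keeps the measure independent of segment length. Real consonant and vowel boundaries would have to come from a segmentation step, which this sketch does not attempt.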
Techniques that selectively modify speech duration to improve or maintain intelligibility have also been proposed. These follow two main approaches. The first modifies the speech only during steady-state sections, increasing the speaking rate without a corresponding loss of quality or intelligibility. The second modifies the speech only during non-steady-state, transient regions. Both approaches change the signal duration, and both detect transient regions and treat them differently from the rest of the signal. Real-time applications, however, require the signal duration to remain essentially unchanged.
Thus, there is a need to enhance the intelligibility of narrowband speech without lengthening the overall duration of the signal.