A formant is a concentration of acoustic energy in or around a particular frequency in a speech signal. Intelligibility of speech is heavily dependent on the audibility of higher formants. However, in the presence of background noise at the listener's end, the higher formants may be masked by the noise and, as a result, speech may become less intelligible. If a reasonable estimate of the spectrum of the listener's background noise is available, then the speech spectrum may be appropriately modified to make the formants audible. However, such an estimate is not always available.
Typical speech intelligibility improvement algorithms work on pulse code modulated (“PCM”) streams. The algorithms spectrally rebalance the signals so that higher formants are boosted with respect to the first formant. Typical problems with intelligibility occur when these higher formants are masked by noise.
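The rebalancing idea can be illustrated with a minimal sketch. It assumes a fixed first-order pre-emphasis filter, y[n] = x[n] - alpha * x[n-1], as a stand-in for the adaptive filters that such algorithms actually use. The function names, the 8 kHz sampling rate, the value alpha = 0.9 and the two probe frequencies (500 Hz for the first-formant region, 2500 Hz for the higher formants) are all illustrative assumptions, not values taken from any particular algorithm.

```python
import math

def pre_emphasis(samples, alpha=0.9):
    """Fixed spectral tilt (assumed stand-in): y[n] = x[n] - alpha * x[n-1]."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out

def rms(samples):
    """Root-mean-square level of a block of PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate=8000, n=8000):
    """One second of a unit-amplitude sine, used as a probe signal."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def gain_db(freq_hz, alpha=0.9):
    """Measured gain of the tilt filter at one frequency, in dB."""
    x = tone(freq_hz)
    y = pre_emphasis(x, alpha)
    return 20 * math.log10(rms(y) / rms(x))

low_gain = gain_db(500)    # assumed first-formant region
high_gain = gain_db(2500)  # assumed higher-formant region
print(f"gain at 500 Hz: {low_gain:.1f} dB, gain at 2500 Hz: {high_gain:.1f} dB")
```

The filter attenuates the low-frequency probe and boosts the high-frequency one, which is the sense in which higher formants are boosted with respect to the first formant. A practical algorithm would vary the filter over time rather than apply one fixed tilt.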
An inherent problem with working on PCM streams is that if the input to, and the output from, the algorithm is a compressed bitstream (e.g., adaptive multi-rate (“AMR”) or Global System for Mobile Communications half rate (“GSM-HR”)), then decoding and re-encoding steps have to be performed within the algorithm. The decoding step converts the bitstream to a linear-domain (e.g., sample-by-sample) PCM stream; the spectral rebalancing step applies time-varying filters to the speech and adjusts its spectral tilt; and the encoding step converts the PCM stream back to the expected bitstream. One issue with this approach is that the decoding and encoding steps degrade the speech quality (i.e., the tandem coding effect).
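The decode, rebalance and re-encode chain, and the extra degradation it introduces, can be sketched as follows. This is a toy model under stated assumptions: an 8-bit mu-law quantizer stands in for the speech codec (AMR and GSM-HR are far more complex), a fixed pre-emphasis filter stands in for the time-varying rebalancing filters, and the test signal is a sum of three arbitrary sinusoids. All function names are hypothetical.

```python
import math

MU = 255.0  # companding constant of the toy codec (assumption)

def encode(pcm):
    """Toy codec: mu-law compand each sample, then quantize to 8 bits."""
    codes = []
    for x in pcm:
        x = max(-1.0, min(1.0, x))  # clip to the codec's input range
        c = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        codes.append(round((c + 1.0) / 2.0 * 255))
    return bytes(codes)

def decode(bitstream):
    """Inverse of encode: 8-bit codes back to linear PCM samples."""
    pcm = []
    for code in bitstream:
        c = code / 255 * 2.0 - 1.0
        pcm.append(math.copysign(math.expm1(abs(c) * math.log1p(MU)) / MU, c))
    return pcm

def rebalance(pcm, alpha=0.9):
    """Stand-in for spectral rebalancing: fixed first-order pre-emphasis."""
    out, prev = [], 0.0
    for x in pcm:
        out.append(x - alpha * prev)
        prev = x
    return out

def tandem(bitstream):
    """Bitstream in, bitstream out: decode, rebalance, re-encode."""
    return encode(rebalance(decode(bitstream)))

def snr_db(reference, degraded):
    """Signal-to-noise ratio of degraded relative to reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return 10 * math.log10(signal / noise)

def test_signal(n=8000, rate=8000):
    """Sum of three arbitrary sinusoids, scaled to avoid clipping."""
    freqs = (310.7, 1234.5, 2718.3)
    return [0.1 * sum(math.sin(2 * math.pi * f * i / rate) for f in freqs)
            for i in range(n)]

original = test_signal()
received = encode(original)                # bitstream arriving at the algorithm
ideal = rebalance(original)                # rebalancing with no coding loss
no_reencode = rebalance(decode(received))  # one coding pass only
with_reencode = decode(tandem(received))   # second pass, as the listener hears it

snr_single = snr_db(ideal, no_reencode)
snr_tandem = snr_db(ideal, with_reencode)
print(f"one coding pass: {snr_single:.1f} dB, tandem: {snr_tandem:.1f} dB")
```

In this sketch the second encode and decode pass adds its own quantization error on top of the error already present in the received bitstream, so the tandem path ends up with a lower signal-to-noise ratio than the path that is coded only once. Real speech codecs degrade the signal in more complex ways, but the accumulation across passes is the same in kind.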