In recent years, many speech transmission and speech storage applications have employed digital speech compression techniques to reduce transmission bandwidth or storage capacity requirements. Linear Predictive Coding (LPC) techniques provide good compression performance and have become particularly popular for such applications. Speech coding algorithms based on LPC techniques have been incorporated in wireless transmission standards, including the North American digital cellular standards IS-54 and IS-95 and the European Global System for Mobile Communications (GSM) standard.
LPC-based speech coding algorithms represent speech signals as combinations of excitation waveforms and sequences of all-pole filters that model the effects of the human articulatory system on the excitation waveforms. The excitation waveforms and the filter coefficients can be encoded more efficiently than the input speech signal itself, providing a compressed representation of the speech signal.
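The excitation-plus-filter model described above can be illustrated with a minimal sketch. The function names, the second-order coefficient values, and the sign convention s[n] = e[n] + sum of a[k]·s[n-k] are assumptions chosen for illustration; they are not taken from any particular codec or standard. The sketch shows only that a signal can be split into an excitation (residual) and a set of all-pole filter coefficients, and then rebuilt from them. It does not show quantization or coefficient estimation.

```python
def lpc_residual(signal, coeffs):
    """Analysis (inverse) filter: e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    residual = []
    for n in range(len(signal)):
        # Predict the current sample from previously observed input samples.
        prediction = sum(
            coeffs[k] * signal[n - 1 - k]
            for k in range(len(coeffs))
            if n - 1 - k >= 0
        )
        residual.append(signal[n] - prediction)
    return residual


def all_pole_synthesis(excitation, coeffs):
    """Synthesis (all-pole) filter: s[n] = e[n] + sum_k a[k] * s[n-1-k]."""
    output = []
    for n in range(len(excitation)):
        # Feed back previously synthesized output samples.
        prediction = sum(
            coeffs[k] * output[n - 1 - k]
            for k in range(len(coeffs))
            if n - 1 - k >= 0
        )
        output.append(excitation[n] + prediction)
    return output


# Hypothetical second-order coefficients (stable: pole magnitude is about 0.71).
example_coeffs = [1.2, -0.5]
example_signal = [0.0, 1.0, 0.5, -0.3, -0.8, 0.2, 0.9, 0.1]

example_excitation = lpc_residual(example_signal, example_coeffs)
example_rebuilt = all_pole_synthesis(example_excitation, example_coeffs)
```

In this sketch the rebuilt signal matches the input to within floating-point error, because the synthesis filter is the exact inverse of the analysis filter. A real codec would quantize both the excitation and the coefficients, so its reconstruction is only approximate.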
To accommodate changes in the spectral characteristics of the input speech signal, conventional LPC-based codecs update the filter coefficients once every 10 ms to 30 ms (typically every 20 ms for wireless telephone applications). This update rate has proven to be subjectively acceptable for the transmission of speech sounds, but can result in subjectively unacceptable distortions of background noise or other environmental sounds.
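To put the update intervals above in concrete terms, the following sketch converts an update interval into the number of samples that share one set of filter coefficients and the number of coefficient updates per second. The 8 kHz sampling rate is an assumption (it is the conventional narrowband telephony rate and is not stated in the text above), and the function name is hypothetical.

```python
def frame_parameters(update_interval_ms, sample_rate_hz=8000):
    """Return (samples per coefficient update, coefficient updates per second).

    sample_rate_hz defaults to 8000, an assumed narrowband telephony rate.
    """
    samples_per_frame = sample_rate_hz * update_interval_ms // 1000
    updates_per_second = 1000 / update_interval_ms
    return samples_per_frame, updates_per_second


# The range of update intervals mentioned in the text: 10 ms, 20 ms, 30 ms.
for interval_ms in (10, 20, 30):
    samples, rate = frame_parameters(interval_ms)
    print(f"{interval_ms} ms: {samples} samples per update, {rate:.1f} updates/s")
```

Under these assumptions, a 20 ms interval means that 160 consecutive samples are synthesized with a single set of filter coefficients, and the spectral envelope is refreshed 50 times per second.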
Such background noise is common in digital cellular telephony because mobile telephones are often operated in noisy environments. Users of digital cellular telephones report subjectively annoying "swishing" or "waterfall" sounds during non-speech intervals, or report the presence of background noise which "seems to be coming from under water".
The subjectively annoying distortions of noise and environmental sounds can be reduced by squelching or attenuating non-speech sounds. However, this approach also leads to subjectively annoying results. In particular, the absence of background noise during non-speech intervals often causes the caller to wonder whether the call has been dropped.
Alternatively, the distorted noise can be replaced by synthetic noise which does not have the annoying characteristics of noise processed by LPC-based techniques. While this approach avoids the annoying characteristics of the distorted noise and does not convey the impression that the call may have been dropped, it eliminates transmission of background sounds that may contain information of value to the caller. Moreover, because the real background sounds are transmitted along with the speech sounds during speech intervals, this approach results in distinguishable and annoying discontinuities in the perception of background sounds at noise-to-speech transitions.
Another approach involves enhancing the speech signal relative to the background noise before any encoding of the speech signal is performed. This has been achieved by providing an array of microphones and processing the signals from the individual microphones according to noise cancellation techniques so as to suppress the background noise and enhance the speech sounds. While this approach has been used in some military, police and medical applications, it is currently too expensive for consumer applications. Moreover, it is impractical to build the required array of microphones into a small portable handset.