With respect to speech communication, background noise can include vehicular, street, aircraft, and babble noise (such as restaurant or cafe noise), music, and many other audible noises. How noisy the speech signal is depends on the level of background noise. Because most cellular telephone calls are made at locations that are not within the control of the service provider, a great deal of noise can be introduced into the speech. For example, when a cell phone rings and the user answers it, the call proceeds whether the user is in a quiet park or next to a noisy jackhammer. Thus, the effects of background noise are a major concern for cellular phone users and providers.
In the telecommunication industry, speech is digitized and compressed according to ITU (International Telecommunication Union) standards or other standards, such as those for wireless GSM (Global System for Mobile Communications). Many standards exist, depending on the amount of compression and the needs of the application. It is advantageous to compress the signal heavily before transmission because, as compression increases, the bit rate decreases. This allows more information to be transferred in the same bandwidth, thereby saving bandwidth, power, and memory. However, as the bit rate decreases, recovering good-quality speech becomes increasingly difficult. For example, in telephone applications (speech with a bandwidth of about 3.3 kHz), the digital speech signal is typically 16-bit linear PCM sampled at 8 kHz, or 128 kbits/s. ITU-T standard G.711 operates at 64 kbits/s, half the rate of the linear PCM (pulse code modulation) signal. The standards continue to decrease in bit rate as demand for bandwidth rises (e.g., G.726 at 32 kbits/s, G.728 at 16 kbits/s, and G.729 at 8 kbits/s). A standard currently under development will lower the bit rate even further, to 4 kbits/s.
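The rate arithmetic above can be checked directly. This sketch assumes the usual 8 kHz telephone sampling rate (which is what makes 16-bit samples come to 128 kbit/s) and uses the nominal codec rates quoted in the text; the variable and function names are illustrative only.

```python
# Assumption: telephone-band speech sampled at 8 kHz with 16-bit linear PCM.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16

# 8000 samples/s * 16 bits/sample = 128,000 bit/s = 128 kbit/s
linear_pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

# Nominal bit rates quoted in the text, in kbit/s.
codec_kbps = {
    "G.711": 64,
    "G.726": 32,
    "G.728": 16,
    "G.729": 8,
    "4 kbit/s target": 4,
}

def compression_ratio(kbps: float) -> float:
    """Ratio of the 16-bit linear PCM rate to a given codec bit rate."""
    return linear_pcm_kbps / kbps

for name, kbps in codec_kbps.items():
    print(f"{name}: {kbps} kbit/s, compression {compression_ratio(kbps):.0f}:1")
```

Each halving of the bit rate doubles the compression ratio relative to linear PCM, so the 4 kbit/s target corresponds to 32:1 compression.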
Typically, speech coding is achieved by first deriving a set of parameters from the input speech signal (parameter extraction) using certain estimation techniques, and then coding those parameters (parameter coding) with quantization schemes such as scalar quantization or vector quantization. When background noise is present (i.e., the input signal is the sum of speech and noise), parameter extraction and coding become more difficult, which can lead to larger estimation errors in the extraction and greater degradation in the coding. Therefore, when the signal-to-noise ratio (SNR) is low (i.e., the noise energy is high relative to the speech energy), accurately deriving and coding the parameters is more challenging.
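To make parameter extraction, scalar quantization, and SNR concrete, here is a minimal sketch under stated assumptions: a 200 Hz tone stands in for a voiced speech frame, white Gaussian noise stands in for background noise, the extracted parameter is simply the frame log-energy, and the 2 dB quantizer step is arbitrary. Real coders extract and quantize many more parameters with far more elaborate schemes.

```python
import math
import random

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10*log10(E_speech / E_noise)."""
    return 10.0 * math.log10(energy(speech) / energy(noise))

def log_energy_db(frame):
    """Illustrative parameter to extract: mean frame energy in dB."""
    return 10.0 * math.log10(energy(frame) / len(frame))

def scalar_quantize(value, step=2.0):
    """Uniform scalar quantizer: returns (code index, reconstructed value)."""
    index = round(value / step)
    return index, index * step

fs = 8000  # assumed telephone sampling rate
# 100 ms, 200 Hz tone as a stand-in for a voiced speech frame (assumption).
speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs // 10)]
rng = random.Random(0)

clean_param = log_energy_db(speech)
print(f"clean: param={clean_param:.2f} dB, code={scalar_quantize(clean_param)}")
for sigma in (0.05, 0.5):
    noise = [rng.gauss(0.0, sigma) for _ in speech]
    noisy = [s + n for s, n in zip(speech, noise)]  # the coder only sees speech + noise
    p = log_energy_db(noisy)
    print(f"sigma={sigma}: SNR={snr_db(speech, noise):.1f} dB, "
          f"param={p:.2f} dB, code={scalar_quantize(p)}")
```

Comparing the clean and noisy runs shows how additive noise biases the extracted parameter, so the quantizer may emit a different code index than it would for clean speech; the lower the SNR, the larger the bias tends to be.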
Previous solutions for coding speech in noisy environments attempt to find a single compromise set of techniques for a variety of noise levels and noise types. These techniques use one set of fixed (static) decision mechanisms whose controlling parameters (thresholds) are calculated over a broad range of noises. It is difficult to code speech accurately and precisely using a single set of thresholds that does not, for example, adjust to the background noise. Moreover, these and other prior art techniques are not particularly useful at low bit rates, where accurate speech coding is even more difficult.
Accordingly, there is a need for an improved speech coding method that is useful at low bit rates. In particular, there is a need for an improved method for speech coding at high compression that takes the influence of background noise into account. More particularly, there is a need for an improved method for selecting threshold levels in speech coding at low bit rates, one that uses the background noise to adaptively tune the thresholds or even to choose between different speech coding schemes.
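The idea of tuning thresholds to the background noise can be illustrated with a minimal sketch. The text does not say how such tuning is done, so everything here is a hypothetical assumption made for illustration: the noise-floor and speech-level estimates (quietest and loudest recent frames), the threshold values, the SNR breakpoints, the linear interpolation, and the mode names.

```python
import math

# All constants and names below are hypothetical, chosen only for illustration.
CLEAN_THRESHOLD = 0.7   # decision threshold tuned for high-SNR (clean) input
NOISY_THRESHOLD = 0.5   # decision threshold tuned for low-SNR (noisy) input
LOW_SNR_DB = 5.0        # at or below this SNR, use the noisy-condition threshold
HIGH_SNR_DB = 25.0      # at or above this SNR, use the clean-condition threshold
EPS = 1e-12             # guards against taking the log of zero

def long_term_snr_db(recent_energies):
    """Crude SNR estimate from recent per-frame energies."""
    noise = min(recent_energies)   # assume the quietest frame is noise only
    speech = max(recent_energies)  # assume the loudest frame contains speech
    return 10.0 * math.log10((speech + EPS) / (noise + EPS))

def adaptive_threshold(snr_db):
    """Interpolate linearly between the noisy and clean thresholds."""
    if snr_db <= LOW_SNR_DB:
        return NOISY_THRESHOLD
    if snr_db >= HIGH_SNR_DB:
        return CLEAN_THRESHOLD
    w = (snr_db - LOW_SNR_DB) / (HIGH_SNR_DB - LOW_SNR_DB)
    return NOISY_THRESHOLD + w * (CLEAN_THRESHOLD - NOISY_THRESHOLD)

def choose_mode(snr_db):
    """Hypothetical switch between two coding schemes based on the noise level."""
    return "noise_robust_mode" if snr_db < LOW_SNR_DB else "default_mode"

# Usage with made-up per-frame energies for a quiet and a noisy environment.
for label, energies in [("quiet", [1e-4, 0.3, 0.2, 1.2e-4]),
                        ("noisy", [0.12, 0.3, 0.2, 0.15])]:
    snr = long_term_snr_db(energies)
    print(f"{label}: SNR={snr:.1f} dB, threshold={adaptive_threshold(snr):.3f}, "
          f"mode={choose_mode(snr)}")
```

Unlike a single static threshold, the decision threshold here moves with the estimated noise level, and a low enough SNR can also switch the coder to a different scheme.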