In recent years, many speech transmission and speech storage applications have employed digital speech compression techniques to reduce transmission bandwidth or storage capacity requirements. Linear predictive coding (LPC) techniques, which provide good compression performance, are used in many speech coding algorithm designs, where the spectral characteristics of speech signals are represented by a set of LPC coefficients or their equivalent. More specifically, the most widely used vocoders in telephony today are based on the Code Excited Linear Predictive (CELP) vocoder model. Speech coding algorithms based on LPC techniques have been incorporated in wireless transmission standards, including the North American digital cellular standards IS-54B and IS-96B, as well as the European Global System for Mobile Communications (GSM) standard.
LPC-based speech coding algorithms represent speech signals as combinations of excitation waveforms and a time-varying all-pole filter that models the effects of the human articulatory system on the excitation waveforms. The excitation waveforms and the filter coefficients can be encoded more efficiently than the input speech signal itself, providing a compressed representation of the speech signal.
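To make the excitation-plus-filter model concrete, the following minimal Python sketch shows the two filters involved, assuming the common sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The inverse filter A(z) turns speech into an excitation (residual), and the all-pole filter 1/A(z) turns an excitation back into speech. The function names `lpc_residual` and `lpc_synthesize` are illustrative only; a real codec would also quantize the excitation and the coefficients.

```python
from typing import List, Sequence


def lpc_residual(speech: Sequence[float], a: Sequence[float]) -> List[float]:
    """Inverse (analysis) filter A(z): e[n] = s[n] + sum_k a[k-1] * s[n-k]."""
    out: List[float] = []
    for n, s in enumerate(speech):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * speech[n - k]
        out.append(acc)
    return out


def lpc_synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z): s[n] = e[n] - sum_k a[k-1] * s[n-k]."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * out[n - k]  # feedback from past outputs: the poles
        out.append(acc)
    return out
```

For example, an impulse [1.0, 0.0, 0.0, 0.0] passed through `lpc_synthesize` with a = [-0.5] (a single pole at z = 0.5) yields 1.0, 0.5, 0.25, 0.125, and applying `lpc_residual` with the same coefficients recovers the original impulse.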
To accommodate changes in the spectral characteristics of the input speech signal, conventional LPC-based codecs update the filter coefficients every 10 to 30 milliseconds (typically every 20 milliseconds in wireless telephone applications). This update rate has proven subjectively acceptable for characterizing speech, but it can produce subjectively unacceptable distortions of background noise and other environmental sounds.
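The frame-by-frame update can be sketched in Python as a synthesis filter whose coefficients change at each frame boundary while its memory of past output samples carries over. The 8 kHz sampling rate is an assumption (typical of narrowband telephony), which makes a 20 ms frame 160 samples long; `synthesize_frames` is a hypothetical name, and the interpolation of coefficients within a frame that many CELP codecs perform is omitted.

```python
from typing import List, Optional, Sequence

SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate
FRAME_MS = 20          # update interval typical of wireless telephony


def samples_per_frame(rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    return rate_hz * frame_ms // 1000  # 8000 * 20 // 1000 = 160


def synthesize_frames(excitation: Sequence[float],
                      coeff_sets: Sequence[Sequence[float]],
                      frame_len: Optional[int] = None) -> List[float]:
    """Filter the excitation through 1/A(z), A(z) = 1 + sum_k a_k z^-k,
    using coeff_sets[j] for the j-th frame. The output history in `out` is
    shared across frames, so only the coefficients change at a boundary,
    not the filter memory. There must be a coefficient set for every frame."""
    if frame_len is None:
        frame_len = samples_per_frame()
    out: List[float] = []
    for i, e in enumerate(excitation):
        a = coeff_sets[i // frame_len]  # coefficients in force for this frame
        acc = e
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc -= ak * out[i - k]
        out.append(acc)
    return out
```

Because the filter memory persists, a zero excitation in a new frame still produces output: with coefficient sets [-0.5] and [-1.0] and two-sample frames, the impulse [1.0, 0.0, 0.0, 0.0] yields 1.0, 0.5, 0.5, 0.5.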
Such background noise is common in digital cellular telephony because mobile telephones are often operated in noisy environments. In digital telephony applications, far-end users have reported subjectively annoying "swishing" or "waterfall" sounds during non-speech intervals, or background noise that "seems to be coming from under water".
The subjectively annoying distortions of noise and environmental sounds can be reduced by attenuating non-speech sounds. However, this approach also leads to subjectively annoying results. In particular, the absence of background noise during non-speech intervals often causes the subscriber to wonder whether the call has been dropped.
Alternatively, the distorted noise can be replaced by synthetic noise that lacks the annoying characteristics of noise processed by LPC-based techniques. While this approach avoids those characteristics and does not convey the impression that the call may have been dropped, it eliminates the transmission of background sounds that may contain information of value to the subscriber. Moreover, because the real background sounds are transmitted along with the speech sounds during speech intervals, this approach results in distinguishable and annoying discontinuities in the perceived background sounds at noise-to-speech transitions.
Another approach involves enhancing the speech signal relative to the background noise before any encoding of the speech signal is performed. This has been achieved by providing an array of microphones and processing the signals from the individual microphones according to noise cancellation techniques so as to suppress the background noise and enhance the speech sounds. While this approach has been used in some military, police and medical applications, it is currently too expensive for consumer applications. Moreover, it is impractical to build the required array of microphones into a small portable headset.
One effective solution to the problem of noise distortions that occur when LPC-type codecs are used is presented in international patent application PCT/CA95/00559 dated Oct. 3, 1995. The solution involves detecting background noise (or, equivalently, detecting the absence of speech), at which time the parameters of the speech encoder or decoder are manipulated to emulate the effect of an LPC analysis using a very long analysis window (typically on the order of 400 milliseconds, or 20 times the typical 20-millisecond analysis window). This process is supplemented with a low-pass filter designed to compensate for the slow roll-off of the LPC synthesis filter when the input signal consists of broadband noise.
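One way to emulate a long analysis window on top of a 20 ms frame structure is to average the per-frame autocorrelations over the most recent 20 frames (20 × 20 ms = 400 ms) and then solve for the coefficients with the Levinson-Durbin recursion. The Python sketch below is an illustrative assumption, not necessarily how the cited application manipulates the encoder or decoder parameters, and it omits the compensating low-pass filter. `LongWindowLpc` is a hypothetical name, the default order of 10 is assumed, and the averaged autocorrelation only approximates that of a single 400 ms window because it ignores lag products that straddle frame boundaries.

```python
from collections import deque
from typing import List, Sequence

FRAMES_IN_LONG_WINDOW = 20  # 400 ms / 20 ms, per the text


def autocorrelation(x: Sequence[float], order: int) -> List[float]:
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return a_1..a_p of A(z) = 1 + sum_k a_k z^-k from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent input or perfectly predictable: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]


class LongWindowLpc:
    """Hypothetical helper: averages the autocorrelations of the most recent
    frames so the coefficients behave like one analysis over a long window."""

    def __init__(self, order: int = 10, frames: int = FRAMES_IN_LONG_WINDOW):
        self.order = order
        self.history: deque = deque(maxlen=frames)  # oldest frame drops out

    def update(self, frame: Sequence[float]) -> List[float]:
        self.history.append(autocorrelation(frame, self.order))
        avg = [sum(col) / len(self.history) for col in zip(*self.history)]
        return levinson_durbin(avg, self.order)
```

In the terms of the text, `update` would be called only while background noise is detected; switching back to per-frame coefficients when speech resumes is left out of this sketch.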
While this procedure is very effective in dealing with background noise artifacts, it assumes access to either the speech encoder or the speech decoder. However, there are cases where it would be desirable to apply this background noise conditioning procedure with access limited to the compressed bit stream only. One such example is a point-to-point telephone connection between two digital cellular mobile telephones. Normally, in this type of connection the speech signal undergoes two stages of speech coding in each direction, which degrades the signal. In the interest of improved sound quality, it is desirable to remove the speech decoder/speech encoder pair operating at each of the base stations servicing the two mobile sets.
This can be achieved by using a bypass mechanism that is described in international patent application PCT/CA95/00704 dated Dec. 13, 1995. The contents of this application are incorporated herein by reference. The basic idea behind this approach is the provision of digital signal processors, each including a codec and a bypass mechanism that is invoked when the incoming signal is in a format compatible with the codec. In use, the digital signal processor associated with the first base station, which receives the RF signal from a first mobile terminal, determines through signaling and control that a compatible digital signal processor exists at the second base station, which is associated with the mobile terminal to which the call is directed. Rather than synthesizing the compressed speech signal into PCM samples, the digital signal processor associated with the first base station invokes the bypass mechanism and outputs the compressed speech into the transport network. When the compressed speech signal arrives at the digital signal processor associated with the second base station, it is routed so as to bypass the local codec. Decompression of the signal occurs only at the second mobile terminal.
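The routing decision made at each base station can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the signaling exchange is reduced to a `peer_supports` callback, frames carry a hypothetical `codec` label, and `decode`/`encode` stand in for the local codec. None of these names come from the cited application.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Pcm = List[int]  # linear PCM samples sent into the transport network


@dataclass(frozen=True)
class CompressedFrame:
    codec: str      # hypothetical label identifying the compression format
    payload: bytes  # encoded speech parameters, left opaque here


def first_station_output(frame: CompressedFrame,
                         peer_supports: Callable[[str], bool],
                         decode: Callable[[CompressedFrame], Pcm]) -> Union[CompressedFrame, Pcm]:
    """Choose what the first base station sends into the transport network."""
    if peer_supports(frame.codec):
        # Signaling found a compatible DSP at the far end: invoke the bypass
        return frame
    # No compatible peer: synthesize PCM as in a conventional tandem link
    return decode(frame)


def second_station_output(incoming: Union[CompressedFrame, Pcm],
                          local_codec: str,
                          encode: Callable[[Pcm], CompressedFrame]) -> CompressedFrame:
    """Choose what the second base station sends toward its mobile terminal."""
    if isinstance(incoming, CompressedFrame):
        if incoming.codec != local_codec:
            raise ValueError("bypass requires a codec compatible with the local DSP")
        return incoming  # bypass the local codec; only the mobile decodes
    return encode(incoming)  # PCM arrived: compress it for the radio link
```

In the bypass case the same frame object passes through both base stations unchanged, which is why any conditioning along this path has to operate on the compressed frames themselves.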
In this network configuration, background noise conditioning at a base station, or at any point in the transmission link connecting the two base stations during a given call, is only possible through manipulation of the compressed bit stream transported between the two base stations. An obvious approach would be to decode the compressed bit stream, apply the noise conditioning technique described in U.S. Pat. No. 5,642,464, synthesize a speech signal from the resulting filter coefficients, and compress that signal with another stage of speech encoding. This, however, amounts to a tandem connection of speech codecs, which, as pointed out earlier, is undesirable because it further degrades the signal.
Against this background, there is clearly a need in the industry for novel methods and systems for conditioning signals representative of audio information in digitized and compressed form, in order to remove noise artifacts or other undesirable elements from the signal without accessing the speech encoder or speech decoder stages of the communication link.