Cellular telephones, speaker phones, and various other communication devices utilize background noise suppression to enhance the quality of a received signal. In particular, the presence of acoustic background noise can substantially degrade the performance of a speech communication system. The problem is exacerbated when a digital speech coder is used in the communication link, since such coders are tuned to specific characteristics of clean speech signals and handle noisy speech and background noise rather poorly.
A simplified block diagram of a basic noise suppression system 100 is shown in FIG. 1. Such a system is typically utilized to attenuate the input speech/noise signal when signal-to-noise ratio (SNR) values are low. As shown, system 100 includes fast Fourier transformer (FFT) 101, inverse FFT 102, total channel energy estimator 103, noise energy estimator 105, SNR estimator 106, and channel gain generator 104. During operation, the input signal (comprised of speech plus noise) is transformed into the frequency domain by FFT 101 and grouped into channels that are similar to critical bands of hearing. The channel signal energies are computed via estimator 103, and the background noise channel energies are conditionally updated via estimator 105 as a function of the spectral distance between the signal energy and noise energy estimates. From these energy estimates, estimator 106 computes the channel SNR vector, which is then used to determine the individual channel gains. The channel gains are then applied via a mixer to the original complex spectrum of the input signal, and the result is inverse transformed, using the overlap-and-add method, to produce the noise-suppressed output signal. As discussed above, when SNR values are estimated to be low, attenuation of the FFT signal takes place.
FIG. 2 shows the basic gain as a function of SNR for prior-art systems. From FIG. 2 it can be seen that for low channel SNR (i.e., less than an SNR threshold), the signal is presumed to be noise, and the gain for that channel is set to the minimum (in this case, −13 dB). As the SNR increases past the SNR threshold, the gain function enters a transition region, where the gain follows a constant slope of approximately 1, meaning that for every dB increase in SNR, the gain is increased by 1 dB. As the SNR increases further (generally indicating speech), the gain is clamped at 0 dB so as not to increase the power of the input signal. This gain function applies independently to each channel of the communication system, so that the gain can be 0 dB in one channel while it is −13 dB in another.
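The gain curve described for FIG. 2 can be sketched as a piecewise-linear function of channel SNR. The following is a minimal illustration only: the function name and the 6 dB threshold are hypothetical (the text does not give the threshold value), while the −13 dB floor, the unit slope, and the 0 dB clamp are taken from the description above.

```python
def channel_gain_db(snr_db, snr_threshold_db=6.0, min_gain_db=-13.0, slope=1.0):
    """Piecewise-linear gain curve: floor, unit-slope transition, 0 dB clamp.

    snr_threshold_db is an assumed value chosen for illustration only.
    """
    # At or below the threshold the channel is presumed to be noise.
    if snr_db <= snr_threshold_db:
        return min_gain_db
    # Transition region: gain rises `slope` dB per dB of SNR above the threshold.
    gain_db = min_gain_db + slope * (snr_db - snr_threshold_db)
    # Never amplify the input: clamp at 0 dB.
    return min(gain_db, 0.0)


if __name__ == "__main__":
    # Each channel is evaluated independently, so gains may differ per channel.
    for snr in (0.0, 6.0, 10.0, 19.0, 30.0):
        print(f"SNR {snr:5.1f} dB -> gain {channel_gain_db(snr):6.1f} dB")
```

With the assumed threshold, the transition region spans 13 dB of SNR, after which the gain stays at 0 dB.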
Prior-art noise suppression circuitry 100 additionally includes analysis circuitry 107 and synthesis circuitry 108. These components tend to blend signal discontinuities associated with the dynamics of the noise suppression system. More specifically, as the input speech+noise frames are processed, the filter gain characteristics within channel gain generator 104 change from frame to frame, thus leaving the potential for abrupt changes in output signal content at frame boundaries. Therefore, it is necessary to blend adjacent frames together by adding a decreasing signal envelope from the current frame to an increasing signal envelope for the next frame. Such a technique can be described as “overlap windowing”, and is well known in the prior art. An example of an overlap window is given in equation 4.1.2.1-3 as described in Cellular System Remote unit-Base Station Compatibility Standard of the Electronic Industry Association/Telecommunications Industry Association Interim Standard 127 as:
$$
g(n) =
\begin{cases}
d(n,m)\,\sin^{2}\!\left(\dfrac{\pi(n+0.5)}{2D}\right), & 0 \le n < D,\\[6pt]
d(n,m), & D \le n < L,\\[6pt]
d(n,m)\,\sin^{2}\!\left(\dfrac{\pi(n-L+D+0.5)}{2D}\right), & L \le n < D+L,\\[6pt]
0, & D+L \le n < M,
\end{cases}
$$

where g(n) is the windowed, zero-padded input sequence, d(n,m) is the input signal, n is the sample index, m is the frame index, D is the overlap delay, L is the frame length, and M is the FFT length. Here, we are interested in the increasing signal envelope at the beginning of the frame (samples 0 to D−1), and the decreasing signal envelope near the end of the frame (samples L to D+L−1). The significance of these envelopes is that when the signal is reconstructed at the noise suppression output, the output signal with the increasing signal envelope at the beginning of the current frame will be added to the output signal with the decreasing envelope from the previous frame. As one skilled in the art would appreciate, the sum of the two envelopes (windows) yields the trigonometric identity function:

$$
\sin^{2}\!\left(\dfrac{\pi(n+0.5)}{2D}\right) + \cos^{2}\!\left(\dfrac{\pi(n+0.5)}{2D}\right) = 1
$$

Thus, the signal at the overlap portions of the noise suppression output will be reconstructed properly due to the sum of the overlapping windows having unity weight.
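The unity-weight property of the overlapping envelopes can be checked numerically. The sketch below evaluates only the window weights (the d(n,m) factor is omitted) and adds the decreasing tail of one frame to the increasing head of the next. The function name and the values D=40, L=160, and M=256 are assumptions chosen for illustration and are not taken from the cited standard.

```python
import math


def overlap_window(n, D, L, M):
    """Weight applied to sample n of an M-point windowed, zero-padded frame."""
    if 0 <= n < D:
        # Increasing envelope at the start of the frame.
        return math.sin(math.pi * (n + 0.5) / (2 * D)) ** 2
    if D <= n < L:
        # Flat region: samples pass through unweighted.
        return 1.0
    if L <= n < D + L:
        # Decreasing envelope near the end of the frame.
        return math.sin(math.pi * (n - L + D + 0.5) / (2 * D)) ** 2
    if D + L <= n < M:
        # Zero padding up to the FFT length.
        return 0.0
    raise ValueError("n must lie in [0, M)")


# Illustrative (assumed) sizes: overlap, frame length, FFT length.
D, L, M = 40, 160, 256

# Tail of the previous frame (samples L..L+D-1) overlaps the head of the
# current frame (samples 0..D-1); their weights should sum to one.
overlap_sums = [
    overlap_window(L + k, D, L, M) + overlap_window(k, D, L, M)
    for k in range(D)
]

if __name__ == "__main__":
    print(max(abs(s - 1.0) for s in overlap_sums))
```

The printed value is the largest deviation from unity over the overlap region, which should be at the level of floating-point rounding.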
While this method is effective in smoothing frame discontinuities, it also increases the delay through the noise suppression system. This is because the samples for the next frame are not yet available for the addition process, so the addition of these samples to the overlap section of the current frame must be delayed until the next frame is processed. Thus, there exists a tradeoff between performance and delay: longer smoothing intervals lead to better performance but also to longer delays.
The delay problem is compounded when noise suppression is included as part of a speech coding system, as is the case with many wireless digital communications systems. In such systems, the speech coder also adds delay, typically in the form of what is known as linear predictive coding (LPC) “look-ahead” delay. This delay comprises additional buffering (via buffer 110) that is required to extend speech samples beyond the current frame for the purpose of estimating the short-term spectrum towards the end of the current frame. The reason is that the spectral parameters (or LP parameters) are interpolated over shorter time intervals (called sub-frames), and it is desirable for the current set of LP parameters to be representative of the center of the last sub-frame of the current frame. This, however, requires an LPC analysis buffer that extends beyond the frame currently being coded, which incurs delay. As is the case with noise suppression, there is a tradeoff between performance and delay.
Thus, for typical LPC analysis, analyzer 111 accesses buffer 110. As discussed above, speech samples beyond the current frame are included in the analysis buffer 110. The window that is applied to the current analysis buffer may be symmetric or non-symmetric based on the amount of look-ahead delay that is used and the length of analysis buffer circuitry 111. As is known in the art, autocorrelation analysis is applied, which is followed by a process to solve the autocorrelation “normal equations”, known as the Levinson-Durbin recursion. The result is a set of direct form LP coefficients (A(z)), which are used by the speech coder to represent the short-term spectral envelope.
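The autocorrelation analysis and Levinson-Durbin recursion mentioned above can be sketched as follows. This is a generic textbook form, not the implementation of any particular speech coder: the function names are hypothetical, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption, and the input is assumed to have already been windowed.

```python
def autocorrelation(x, order):
    """Return autocorrelation lags r[0..order] of the windowed sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for direct-form LP coefficients.

    Returns (a, err), where a[0] == 1.0, A(z) = sum(a[j] * z**-j), and err is
    the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input such as an all-zero frame
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        # Each stage reduces the prediction error energy.
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Lags of a first-order process with correlation 0.9 (illustrative data).
    coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], 2)
    print(coeffs, residual)
```

For the illustrative lags shown, the recursion recovers a single significant coefficient near −0.9, with the second-order term close to zero, as expected for a first-order process.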
FIG. 3 illustrates the interactions between the prior-art noise suppressor and LPC analysis processes. In particular, FIG. 3 shows the relationship in time, along the horizontal dimension, between the various buffer elements, and how those elements contribute to system delay. This example assumes that the digital system has a sampling frequency of 8000 Hz and operates on 20 millisecond (ms) frames, as is common in wireless telephony applications, which corresponds to a frame length of 160 samples. As one skilled in the art will appreciate, various sampling frequencies and frame lengths are possible. The relative timing is indicated in FIG. 3 by the sample indices at the top of the diagram. Here it is assumed that the current sample is n=0, which represents the last sample received in input frame m. Upon receiving the last sample in frame m, the noise suppression analysis window 302 is applied to the input frame 301.
As is evident, the analysis window overlaps with the previous frame by 40 samples (or 5 ms). This overlap facilitates the inter-frame smoothing discussed previously, which, after noise suppression is applied, produces a corresponding output from the noise suppression synthesis circuitry 303. Although a 40-sample overlap is used, other values (up to 160 samples) are possible. Here it can be seen how the overlapping of the frames contributes to the delay. In particular, for the given frame m, the corresponding noise suppression output frame represents samples that were received 5 ms earlier. This delay is denoted as Dns on the lower right of the diagram. The noise suppression output is then loaded directly into the LPC analysis buffer 304.
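The sample counts and durations quoted for FIG. 3 follow directly from the 8000 Hz sampling frequency. As a brief worked check, with hypothetical helper names:

```python
SAMPLE_RATE_HZ = 8000  # sampling frequency assumed in the FIG. 3 example


def ms_to_samples(ms, rate=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate / 1000)


def samples_to_ms(samples, rate=SAMPLE_RATE_HZ):
    """Convert a sample count to a duration in milliseconds."""
    return 1000.0 * samples / rate


frame_len = ms_to_samples(20)      # 20 ms frame
d_ns = 40                          # noise suppression overlap, in samples
d_ns_ms = samples_to_ms(d_ns)      # the same overlap expressed in ms

if __name__ == "__main__":
    print(frame_len, d_ns_ms)
```

At other sampling frequencies the same overlap in milliseconds would correspond to a different number of samples.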
From FIG. 3 it can be seen that the coded speech frame 306 is divided into sub-frames, each of length 40 samples (5 ms). As mentioned earlier, in order for the LP parameter interpolation to be effective, the center of the LPC analysis frame should be aligned with the center of the last sub-frame. In order to accomplish this objective, asymmetric LPC analysis circuitry 305 is used to weight the samples towards the front of the LPC analysis buffer with greater magnitude than the samples towards the rear of the LPC analysis buffer. For this example, the LPC analysis look-ahead (given as Dlpc) is 40 samples (5 ms), and the LPC analysis circuitry length is 160 samples (20 ms). The following should be noted:
        Symmetric LPC circuitry typically provides better performance than asymmetric circuitry due to reduced spectral smearing and narrower main lobe responses.
        LPC analysis circuitry can generally be made symmetric by increasing algorithmic delay (look-ahead).
Supporting evidence for the first point can be found in FIG. 4. The top plot shows a Hamming window w1(n), which is well known in the art, and an asymmetric window w2(n), which is commonly used in practice. The asymmetric window consists of the first half of a Hamming window for the first 108 samples, followed by a trailing quarter wavelength sine wave for the last 52 samples. This window has been designed such that the weighted energy of the window is centered about sample number n=100. This value of n is chosen by taking the LPC buffer length (L=160), and subtracting the look-ahead (Dlpc=40) plus half of the subframe length (20). The bottom plot shows the respective frequency responses for each of the windows, which were obtained by taking the log magnitude of the DFT of each of the windows. From this plot it is clear that the asymmetric window exhibits increased spectral leakage in the 100 to 200 Hz range, which could result in noticeable degradation in quality when compared to a similar symmetric window with slightly increased look-ahead delay.
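A window of the kind described for w2(n) can be sketched as below. The exact formulas are not given in the text, so the parameterization here (a half Hamming window of nominal length 2×108 followed by a quarter-period cosine over 52 samples) is an assumption, as are the function names. The energy centroid is computed as one possible reading of "weighted energy ... centered about" a sample, and it is not expected to land exactly on n=100.

```python
import math


def hybrid_window(n_rise=108, n_fall=52):
    """Asymmetric window: rising half-Hamming followed by a quarter-cosine tail.

    The formulas are an assumed parameterization, not taken from the text.
    """
    rise = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * n_rise - 1))
        for n in range(n_rise)
    ]
    fall = [math.cos(2 * math.pi * n / (4 * n_fall - 1)) for n in range(n_fall)]
    return rise + fall


def energy_centroid(w):
    """Sample index about which the squared window weights balance."""
    total = sum(v * v for v in w)
    return sum(n * v * v for n, v in enumerate(w)) / total


# Target center from the text: buffer length minus (look-ahead + half sub-frame).
L_buf, d_lpc, subframe = 160, 40, 40
target_center = L_buf - (d_lpc + subframe // 2)

w2 = hybrid_window()

if __name__ == "__main__":
    print(target_center, energy_centroid(w2))
```

Comparing the printed centroid with the target gives a quick indication of how closely a given parameterization matches the intended alignment with the last sub-frame.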
Because it is desirable in a two-way voice communications system to minimize round-trip delay while maximizing audio quality, there is a need for a method and apparatus for coding a noise-suppressed signal that can consolidate the noise suppression and LPC analysis delays into a lesser net delay while maintaining the same audio quality, or, conversely, maintain a given delay while improving overall audio quality.