1. Field of the Invention
This invention relates generally to the processing of speech signals. More specifically, the present invention is concerned with a method and apparatus for enhancing a speech signal contaminated by additive noise while avoiding complex iterations and reducing the required signal processing computations.
2. Description of the Prior Art
Speech signals used in, e.g., digital communications often need enhancement to improve speech quality and reduce the transmission bandwidth. Speech enhancement is employed when the intelligibility of the speech signal is reduced due to either channel noise or noise present in the environment (additive noise) of the talker. Speech coders and speech recognition systems are especially sensitive to the need for clean speech; the adverse effects of additive noise, such as motorcycle or automobile noise, on speech signals in speech coders and speech recognition systems can be substantial.
Additionally, speech enhancement is particularly important for speech compression applications in, e.g., computerized voice notes, voice prompts, and voice messaging, digital simultaneous voice and data (DSVD), computer networks, Internet telephones and Internet speech players, telephone voice transmissions, video conferencing, digital answering machines, and military security systems. Conventional approaches for enhancing speech signals include spectrum subtraction, spectral amplitude estimation, Wiener filtering, HMM-based speech enhancement, and Kalman filtering.
Various methods of using Kalman filters to enhance noise-corrupted speech signals have been previously disclosed. The following references, incorporated by reference herein, are helpful to an understanding of Kalman filtering: [1] K. K. Paliwal et al., xe2x80x9cA Speech Enhancement Method based on Kalman Filtering,xe2x80x9d Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, April 1987, pp. 177-180; [2] J. D. Gibson, et al., xe2x80x9cFiltering of Colored Noise for Speech Enhancement and Codingxe2x80x9d, IEEE Trans. Signal Processing, vol. 39, no. 8, pp. 1732-1741, August 1991; [3] B. Lee, et al., xe2x80x9cAn EM-based Approach for Parameter Enhancement with an Application to Speech Signals,xe2x80x9d Signal Processing, vol. 46, no. 1 pp. 1-14, September 1995; [4] M. Nied{dot over (z)}wiecki et al., xe2x80x9cAdaptive Scheme for Elimination of Broadband Noise and Impulsive Disturbance from AR and ARMA Signalsxe2x80x9d IEEE Trans. Signal Processing, vol. 44, no. 3, pp. 528-537, March 1996.
Speech signals corrupted by white noise can be enhanced based on a delayed-Kalman filtering method as disclosed in reference [1], and speech signals corrupted by colored noise can be filtered based on scalar and vector Kalman filtering algorithms as disclosed in reference [2]. Reference [3] discloses a non-Gaussian autoregressive (AR) model for speech signals and models the distribution of the driving-noise as a Gaussian mixture, with application of a decision-directed nonlinear Kalman filter. References [1], [2] and [3] use an EM (Expectation-Maximization)-based algorithm to identify unknown parameters. Reference [4] assumes that speech signals are non-stationary AR processes and uses a random-walk model for the AR coefficients and an extended Kalman filter to simultaneously estimate speech and AR coefficients.
One main drawback of the above-referenced conventional Kalman filtering algorithms, in which speech and noise signals are modeled as AR processes and represented in a state-space domain, is that they require complicated computations to identify the AR parameters of the speech signal. In particular, in these conventional techniques, a high order AR model is required to obtain an accurate model of the speech signal; identification of AR coefficients and the application of the high-order Kalman filter all require extensive computations. In the conventional Kalman filtering technique, a Kalman-EM algorithm involving complex iterations is generally employed in the Kalman filter so that the AR parameters can be estimated. As a result, it is difficult and expensive to implement a speech enhancement system based on the conventional Kalman filtering technique. In fact, these drawbacks are so significant that the aforementioned Kalman filtering algorithms are still not suitable for practical implementation.
In view of the foregoing disadvantages of the prior art methods, it is an object of the present invention to provide a simple and practical method and apparatus for enhancing speech signals based on Kalman filtering while avoiding complex iterations and reducing the required computations and while maintaining comparable performance relative to the conventional Kalman-EM technique.
It is still another object of the present invention to model and filter speech signals in the subband domain such that lower-order Kalman filters can be applied, while employing a frame-based method to identify the AR parameters of the enhanced speech signals by first dividing each input observed subband signal into consecutive voice frames and then in each voice frame estimating the autocorrelation (AC) function of the enhanced subband signals by a novel correlation subtraction method of the present invention and applying a Yule-Walker equation to the AC function of the enhanced subband signals to obtain the derived AR parameters of the enhanced subband speech signals and carry out the subband Kalman filtering.
As noted above, the AC functions of the enhanced subband speech can be estimated frame-by-frame by a novel correlation subtraction method of this invention. This method first calculates the AC function of the observed noisy subband signal in each voice frame, and then in each voice frame obtains the AC function of the enhanced subband signal by subtracting the AC function of the subband noise from the AC function of the noisy subband signal. The AC function of the subband noise is calculated in a non-speech interval comprising at least one non-speech frame which is located at the beginning of the data sequence. It is assumed that the subband noise is stationary and, hence, that the AC function of the subband noise will not change. Thus, the same AC function for the subband noise is used in the application of the correlation subtraction method for all of the voice frames for that subband. The subtraction can be performed after the AC function of the subband noise is multiplied by xcex1, where xcex1 is a constant between zero and one. An advantage of this method is that no iteration is needed, and yet the performance is close to that achieved by employing an EM algorithm.
As noted previously, in conventional Kalman filtering techniques, to achieve a good model of the speech signal, a high order AR model is required. Thus, the computational complexity of the conventional Kalman filter is high. To solve this problem, the present invention decomposes the speech signal into subbands and performs the Kalman filtering in the subband domain. In each subband, only a low order AR model for the subband speech signal is used. The subband Kalman filtering scheme greatly reduces the computations and at the same time achieves good performance.
The speech enhancement apparatus of this invention includes a multichannel analysis filter bank for decomposing the observed noise-corrupted speech signal into subband speech signals. A plurality of parameter estimation units respectively estimate autoregressive parameters of each subband speech signal in accordance with a correlation subtraction method and a Yule-Walker equation and apply these parameters to filter each subband speech signal according to a Kalman filtering algorithm. Thereafter, a multichannel synthesis filter bank reconstructs the filtered subband speech signals to yield an enhanced speech signal.
The speech enhancement method of this invention includes decomposing the corrupted speech signal into a plurality of subband speech signals, estimating the autoregressive parameters of the subband speech signals, applying these parameters to filter the subband speech signals according to a subband Kalman filtering algorithm, and reconstructing the filtered subband speech signals into an enhanced speech signal.
Other features and advantages of the invention will become apparent upon reference to the following description of the preferred embodiments when read in light of the attached drawings.