1. Field of the Invention
The present invention relates to systems and methods for audio signal processing, in particular to systems and methods for enhancing speech quality in an acoustic environment.
2. Description of the Related Art
Speech signal processing is important in many areas of everyday communication, particularly in those areas where noises are profuse. Noises in the real world abound from multiple sources, including apparently single source noises, which in the real world break up into multiple sounds with echoes and reverberations. Unless separated and isolated, it is difficult to extract the desired signal from background noise. Background noise may include numerous noise signals generated by the general environment, signals generated by background conversations of other people, as well as the echoes, reflections, and reverberations generated from each of the signals. In communication where users often talk in noisy environments, it is desirable to separate the user's speech signals from background noise. Speech communication media, such as cell phones, speakerphones, headsets, hearing aids, cordless telephones, teleconferences, CB radios, walkie-talkies, computer telephony applications, computer and automobile voice command applications and other hands-free applications, intercoms, microphone systems and so forth, can take advantage of speech signal processing to separate the desired speech signals from background noise.
Many methods have been created to separate desired sound signals from background noise signals. Prior art noise filters identify signals with predetermined characteristics as white noise signals, and subtract such signals from the input signals. These methods, while simple and fast enough for real time processing of sound signals, are not easily adaptable to different sound environments, and can result in substantial degradation of the speech signal sought to be resolved. The predetermined assumptions of noise characteristics can be over-inclusive or under-inclusive. As a result, portions of a person's speech may be considered “noise” by these methods and therefore removed from the output speech signals, while portions of background noise such as music or conversation may be considered non-noise by these methods and therefore included in the output speech signals.
Other more recently developed methods, such as Independent Component Analysis (“ICA”), provide relatively accurate and flexible means for the separation of speech signals from background noise. For example, PCT publication WO 00/41441 discloses using a specific ICA technique to process input audio signals to reduce noise in the output audio signal. ICA is a technique for separating mixed source signals (components) which are presumed to be independent of each other. In its simplified form, independent component analysis applies an “un-mixing” matrix of weights to the mixed signals, for example by multiplying the matrix with the mixed signals, to produce separated signals. The weights are assigned initial values, and then adjusted to maximize the joint entropy of the signals in order to minimize information redundancy. This weight-adjusting and entropy-increasing process is repeated until the information redundancy of the signals is reduced to a minimum. Because this technique does not require information on the source of each signal, it is known as a “blind source separation” (“BSS”) method. Blind separation refers to the problem of separating mixed signals that come from multiple independent sources.
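The un-mixing procedure described above can be illustrated with a minimal sketch. It is not the method of any cited reference; it assumes two sources, an instantaneous (echo-free) 2x2 mixture, a logistic nonlinearity, and the batch natural-gradient form of the entropy-maximizing update. The function names (laplace_sources, mix, infomax_ica), the learning rate, and the iteration count are all hypothetical choices made for illustration.

```python
import math
import random


def laplace_sources(n, seed=0):
    """Draw n pairs of independent, zero-mean Laplacian samples (illustrative sources)."""
    rng = random.Random(seed)

    def draw():
        sign = 1.0 if rng.random() < 0.5 else -1.0
        return sign * rng.expovariate(1.0)

    return [(draw(), draw()) for _ in range(n)]


def mix(sources, A):
    """Apply a 2x2 instantaneous mixing matrix A to a list of (s1, s2) samples."""
    return [(A[0][0] * s1 + A[0][1] * s2,
             A[1][0] * s1 + A[1][1] * s2) for s1, s2 in sources]


def infomax_ica(x, lr=0.1, n_iter=300):
    """Estimate a 2x2 un-mixing matrix W by repeated entropy-increasing updates.

    Each pass computes u = W x, passes u through a logistic nonlinearity y,
    and applies W <- W + lr * (I + E[(1 - 2y) u^T]) W.
    """
    W = [[1.0, 0.0], [0.0, 1.0]]  # initial weight values
    n = len(x)
    for _ in range(n_iter):
        acc = [[0.0, 0.0], [0.0, 0.0]]  # running sum of (1 - 2y) u^T
        for x1, x2 in x:
            u = (W[0][0] * x1 + W[0][1] * x2,
                 W[1][0] * x1 + W[1][1] * x2)
            # 1 - 2*logistic(u) equals -tanh(u/2); tanh avoids overflow in exp.
            g = (-math.tanh(0.5 * u[0]), -math.tanh(0.5 * u[1]))
            for i in range(2):
                for j in range(2):
                    acc[i][j] += g[i] * u[j]
        G = [[(1.0 if i == j else 0.0) + acc[i][j] / n for j in range(2)]
             for i in range(2)]
        W = [[W[i][j] + lr * sum(G[i][k] * W[k][j] for k in range(2))
              for j in range(2)] for i in range(2)]
    return W
```

As a usage example, calling infomax_ica(mix(laplace_sources(2000), [[1.0, 0.6], [0.5, 1.0]])) returns a matrix W whose product with the mixing matrix should be close to a scaled permutation matrix, meaning each output is dominated by a single source. The sketch handles only instantaneous mixtures and does not address echoes or reverberation.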
One of the earliest discussions of ICA is that by Tony Bell in U.S. Pat. No. 5,706,402, which spawned further research. There are now many different ICA techniques or algorithms. A summary of the most widely used algorithms and techniques can be found in books about ICA and the references therein (e.g., Te-Won Lee, Independent Component Analysis: Theory and Applications (Kluwer Academic Publishers, Boston, September 1998); Hyvarinen et al., Independent Component Analysis, 1st edition (Wiley-Interscience, May 18, 2001); Mark Girolami, Self-Organizing Neural Networks: Independent Component Analysis and Blind Source Separation (Perspectives in Neural Computing) (Springer Verlag, September 1999); and Mark Girolami (editor), Advances in Independent Component Analysis (Perspectives in Neural Computing) (Springer Verlag, August 2000)). Singular value decomposition algorithms have been disclosed in Adaptive Filter Theory by Simon Haykin (Third Edition, Prentice-Hall (NJ), 1996).
Many popular ICA algorithms have been refined to optimize their performance, including a number which have evolved through significant modification of algorithms that existed only a decade ago. For example, the work described in A. J. Bell and T. J. Sejnowski, Neural Computation 7:1129-1159 (1995), and Bell, A. J., U.S. Pat. No. 5,706,402, is usually not used in its patented form. Instead, in order to optimize its performance, this algorithm has gone through several recharacterizations by a number of different entities. One such change is the use of the “natural gradient”, described in Amari, Cichocki, Yang (1996). Other popular ICA algorithms include methods that compute higher-order statistics such as cumulants (Cardoso, 1992; Comon, 1994; Hyvarinen and Oja, 1997).
However, many known ICA algorithms are not able to effectively separate signals that have been recorded in a real environment, which inherently includes acoustic echoes such as those due to room reflections. It is emphasized that the methods mentioned so far are restricted to the separation of signals resulting from a linear stationary mixture of source signals. The phenomenon resulting from the summing of direct path signals and their echoic counterparts is termed reverberation and poses a major issue in artificial speech enhancement and recognition systems. Presently, ICA algorithms require long filters to separate those time-delayed and echoed signals, thus precluding effective real time use.
FIG. 1 shows one embodiment of a prior art ICA signal separation system 100. In such a prior art system, a network of filters, acting as a neural network, serves to resolve individual signals from any number of mixed signals input into the filter network. As shown in FIG. 1, the system 100 includes two input channels 110 and 120 that receive input signals X1 and X2. For signal X1, an ICA direct filter W1 and an ICA cross filter C2 are applied. For signal X2, an ICA direct filter W2 and an ICA cross filter C1 are applied. The direct filters W1 and W2 communicate for direct adjustments. The cross filters are feedback filters that merge their respective filtered signals with signals filtered by the direct filters. After convergence of the ICA filters, the produced output signals U1 and U2 represent the separated signals.
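The signal flow of such a two-channel direct/cross filter network can be sketched as follows. This is only an illustration of the filtering step, with no filter adaptation, and it rests on assumptions not stated in the description of FIG. 1: that cross filter C2 feeds past samples of output U2 back into channel 1, that C1 feeds past samples of U1 back into channel 2, and that each cross filter starts at a lag of one sample so the feedback loop is computable. The function names fir and feedback_separate are hypothetical.

```python
def fir(taps, signal, t, start_lag=0):
    """Return sum over k of taps[k] * signal[t - start_lag - k].

    Samples before time 0 are treated as zero.
    """
    acc = 0.0
    for k, tap in enumerate(taps):
        idx = t - start_lag - k
        if idx >= 0:
            acc += tap * signal[idx]
    return acc


def feedback_separate(x1, x2, w1, w2, c1, c2):
    """Run two inputs through direct filters (w1, w2) and feedback cross filters.

    Assumed wiring (an illustration, not taken from FIG. 1):
      u1[t] = (w1 * x1)[t] + (c2 * u2)[t - 1]
      u2[t] = (w2 * x2)[t] + (c1 * u1)[t - 1]
    """
    n = len(x1)
    u1 = [0.0] * n
    u2 = [0.0] * n
    for t in range(n):
        # Cross terms use only outputs from earlier time steps (start_lag=1).
        u1[t] = fir(w1, x1, t) + fir(c2, u2, t, start_lag=1)
        u2[t] = fir(w2, x2, t) + fir(c1, u1, t, start_lag=1)
    return u1, u2
```

Under these assumptions, if each microphone picks up its own source plus a one-sample-delayed copy of the other source scaled by a (into channel 1) and b (into channel 2), then unit direct filters with c2 = [-a] and c1 = [-b] cancel the cross-talk. In an ICA system these filter coefficients would not be known in advance and would have to be adapted until convergence.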
U.S. Pat. No. 5,675,659, Torkkola et al., proposes methods and an apparatus for blind separation of delayed and filtered sources. Torkkola suggests an ICA system that maximizes the entropy of the separated outputs but employs un-mixing filters instead of static coefficients as in Bell's patent. However, the ICA calculations described in Torkkola to calculate the joint entropy and to adjust the cross filter weights are numerically unstable in the presence of input signals with time-varying input energy, such as speech signals, and introduce reverberation artifacts into the separated output signals. The proposed filtering scheme therefore does not achieve stable and perceptually acceptable blind source separation of real-life speech signals.
Typical ICA implementations also face the additional hurdle of requiring substantial computing power to repeatedly calculate the joint entropy of signals and to adjust the filter weights. Many ICA implementations also require multiple rounds of feedback filters and direct correlation of filters. As a result, it is difficult to accomplish ICA filtering of speech in real time or to use a large number of microphones to separate a large number of mixed source signals. In the case of sources originating from spatially localized locations, the un-mixing filter coefficients can be computed with a reasonable number of filter taps and recording microphones. However, if the source signals are distributed in space, like background noise originating from vibrations, wind noise or background conversation, the signals recorded at the microphone locations emanate from many different directions, requiring either very long and complicated filter structures or a very large number of microphones. Since any real-life system is limited in processing power and hardware complexity, an additional processing approach has to complement the discussed ICA filter structure to provide a robust methodology for real-time speech signal enhancement. The computational complexity of such a system should be compatible with the processing power of small consumer devices such as cell phones, Personal Digital Assistants (PDAs), audio surveillance devices, radios, and the like.
What is desired is a simplified speech processing method that can separate speech signals from background noise in real time and does not require substantial computing power, yet still produces relatively accurate results and adapts flexibly to different environments.