The problem of extracting a signal of interest from noisy observations is well known to acoustics engineers. In particular, users of portable speech processing systems often encounter interfering noise that reduces the quality and intelligibility of speech. To reduce these harmful noise contributions, several single-channel speech enhancement algorithms have been developed [1-4]. Nonetheless, even though single-channel algorithms are able to improve signal quality, recent studies have reported that they are still unable to improve speech intelligibility [5]. In contrast, multiple-microphone noise reduction schemes have been shown repeatedly to increase speech intelligibility and quality [6,7].
Multiple-microphone speech enhancement algorithms can be roughly classified into quasi-stationary spatial filtering and time-variant envelope filtering [8]. Quasi-stationary spatial filtering exploits the spatial configuration of the sound sources to reduce noise by means of a spatial filter. The filter characteristics do not change with the dynamics of speech but with the slower changes in the spatial configuration of the sound sources. Such filters achieve almost artefact-free speech enhancement in simple, low-reverberation environments and in computer simulations. Typical examples are adaptive noise cancelling, positive and differential beam-forming [30] and blind source separation [28,29]. The most promising algorithms of this class proposed hitherto are based on blind source separation (BSS). BSS is the sole technique that aims to estimate an exact model of the acoustic environment and possibly to invert it. It includes the model for de-mixing a number of acoustic sources from an equal number of spatially diverse recordings. Additionally, multi-path propagation through reverberation is also included in BSS models. The basic problem of BSS consists in recovering hidden source signals using only their linear mixtures and nothing else. Assume $d_s$ statistically independent sources $s(t) = [s_1(t), \ldots, s_{d_s}(t)]^T$. The sources are convolved and mixed in a linear medium, leading to $d_x$ sensor signals $x(t) = [x_1(t), \ldots, x_{d_x}(t)]^T$ that may include additional noise $n(t)$:
$$x(t) = \sum_{\tau=0}^{P} G(\tau)\, s(t-\tau) + n(t). \tag{1}$$
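As an illustration only, the convolutive mixing model of equation (1) can be sketched numerically as follows. The function name `convolutive_mix`, the array layout (filter taps along the first axis) and the zero initial conditions are assumptions made for this sketch; they are not taken from the cited works.

```python
import numpy as np

def convolutive_mix(G, s, noise=None):
    """Evaluate x(t) = sum_{tau=0}^{P} G(tau) s(t - tau) + n(t).

    G:     array of shape (P + 1, dx, ds), one mixing matrix per lag tau
    s:     array of shape (ds, T), source signals
    noise: optional array of shape (dx, T)
    Source samples before t = 0 are assumed to be zero.
    """
    n_taps, dx, _ = G.shape
    T = s.shape[1]
    x = np.zeros((dx, T))
    for tau in range(min(n_taps, T)):
        # Delay the sources by tau samples, then mix them with G(tau)
        x[:, tau:] += G[tau] @ s[:, :T - tau]
    if noise is not None:
        x = x + noise
    return x

# Hypothetical two-source, two-sensor example with P = 1
G = np.array([np.eye(2), 0.5 * np.eye(2)])
s = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
x = convolutive_mix(G, s)
```

In this toy example each sensor receives its own source plus a half-amplitude echo one sample later, which is the simplest form of the multi-path propagation mentioned above.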
The aim of source separation is to identify the multiple-channel transfer characteristics G(τ), possibly to invert them, and to obtain estimates of the hidden sources given by:
$$u(t) = \sum_{\tau=0}^{Q} W(\tau)\, x(t-\tau) \tag{2}$$
where W(τ) is the estimated inverse of the multiple-channel transfer characteristics G(τ). Numerous algorithms have been proposed for the estimation of the inverse model W(τ). They are mainly based on the assumption that the hidden source signals are statistically independent. This statistical independence can be exploited in different ways, and additional constraints can be introduced, such as intrinsic correlations or non-stationarity of the source signals and/or noise. As a result, a large number of BSS algorithms in various implementation forms (e.g. time domain, frequency domain and time-frequency domain) have been proposed recently for multiple-channel speech enhancement (see for example [28,29]).
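The de-mixing operation of equation (2) has the same structure as the mixing model. The sketch below is a minimal illustration, assuming the noise-free instantaneous case (P = Q = 0) and a mixing matrix that is known in advance, so that W(0) can be taken as the exact inverse of G(0). A real BSS algorithm has to estimate W(τ) blindly from x(t) alone; the helper name `mimo_fir` and the numerical values are hypothetical.

```python
import numpy as np

def mimo_fir(H, v):
    """Apply y(t) = sum_tau H(tau) v(t - tau) with zero initial conditions.

    H: array of shape (n_taps, n_out, n_in); v: array of shape (n_in, T).
    """
    n_taps, n_out, _ = H.shape
    T = v.shape[1]
    y = np.zeros((n_out, T))
    for tau in range(min(n_taps, T)):
        y[:, tau:] += H[tau] @ v[:, :T - tau]
    return y

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))           # hidden sources
G = np.array([[[1.0, 0.6],
               [0.4, 1.0]]])                 # single-tap mixing, P = 0
x = mimo_fir(G, s)                           # sensor signals, no noise

# Assumption for illustration: G is known, so the ideal inverse is used.
W = np.linalg.inv(G[0])[None, :, :]
u = mimo_fir(W, x)                           # source estimates, equation (2)
```

With reverberation (P > 0) the inverse is no longer a single matrix, and the number of coefficients in W(τ) grows with the length of the room response.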
Dogan and Stearns [9] use cumulant-based source separation to enhance the signal of interest in binaural hearing aids. Rosca et al. [10] apply blind source separation for de-mixing delayed and convolved sources from the signals of a microphone array. A post-processing step is proposed to improve the enhancement. Jourjine et al. [11] use the statistical distribution of the signals (estimated using histograms) to separate speech and noise. Balan et al. [2] propose autoregressive (AR) modelling to separate sources from a degenerate mixture. Several approaches use the spatial information given by a plurality of microphones using beamformers. Koroljow and Gibian [12] use first- and second-order beamformers to adapt the directivity of the hearing aids to the noise conditions.
Bhadkamkar and Ngo [3] combine a negative beamformer to extract the speech source and a post-processing step to remove reverberation and echoes. Lindemann [13] uses a beamformer to extract the energy from the speech source and an omni-directional microphone to obtain the whole energy from the speech and noise sources. The ratio between these two energies allows the speech signal to be enhanced by spectral weighting. Feng et al. [14] reconstruct the enhanced signal using delayed versions of the signals of a binaural hearing aid system.
BSS techniques have been shown to achieve almost artefact-free speech enhancement in simple, low-reverberation environments, laboratory studies and computer simulations, but they perform poorly for recordings in reverberant environments and/or with diffuse noise. One could speculate that in reverberant environments the number of model parameters becomes too large to be identified accurately in noisy, non-stationary conditions.
In contrast, envelope filtering approaches (e.g. Wiener, DCT-Bark, coherence and directional filtering) do not yield such failures, since they use a simple statistical description of the acoustical environment or of the binaural interaction in the human auditory system [8]. Such algorithms process the signal in an appropriate dual domain. The envelope of the target signal, or equivalently a short-time weighting index (short-time signal-to-noise ratio (SNR), coherence), is estimated in several frequency bands. The target is assumed to be of frontal incidence, and the enhanced signal is obtained by modulating the spectral envelope of the noisy signal by the estimated short-time weighting index. The adaptation of the weighting index has a temporal resolution of about the syllable rate. Dual-channel approaches based on the statistical description of the sources using the coherence function have been presented [1,15-17]. Further improvements have been obtained by merging spatial coherence of noisy sound fields, masking properties of the human auditory system and subspace approaches [19].
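To make the notion of a coherence-based short-time weighting index more concrete, the following sketch estimates the magnitude-squared coherence between two channels with recursively smoothed spectra and uses it as a weight per frame and frequency bin. It is only an illustration under stated assumptions: the function names, the Hann window, the frame length, the smoothing factor `alpha` and the use of FFT bins instead of grouped frequency bands are all choices made here, not details of the cited methods.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time spectra, shape (frames, bins), using a Hann window."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([np.fft.rfft(window * x[i:i + frame_len]) for i in starts])

def coherence_weights(left, right, alpha=0.9, eps=1e-12):
    """Magnitude-squared coherence per frame and bin, values in [0, 1].

    alpha is a hypothetical forgetting factor that sets the temporal
    resolution of the estimate.
    """
    L, R = stft(left), stft(right)
    n_bins = L.shape[1]
    p_ll = np.zeros(n_bins)
    p_rr = np.zeros(n_bins)
    p_lr = np.zeros(n_bins, dtype=complex)
    weights = np.empty(L.shape)
    for k in range(L.shape[0]):
        # Recursively smoothed auto- and cross-power spectra
        p_ll = alpha * p_ll + (1 - alpha) * np.abs(L[k]) ** 2
        p_rr = alpha * p_rr + (1 - alpha) * np.abs(R[k]) ** 2
        p_lr = alpha * p_lr + (1 - alpha) * L[k] * np.conj(R[k])
        weights[k] = np.abs(p_lr) ** 2 / (p_ll * p_rr + eps)
    return weights

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
# Same signal on both channels: a fully coherent sound field
w_coherent = coherence_weights(target, target)
# Independent noise on each channel: a crude stand-in for a diffuse field
w_diffuse = coherence_weights(rng.standard_normal(8000),
                              rng.standard_normal(8000))
```

Multiplying the short-time spectrum of the noisy signal by such weights attenuates the bins in which the two channels are incoherent, which corresponds to the envelope modulation described above.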
Multi-channel speech enhancement algorithms based on envelope filtering are particularly appropriate for complex acoustic environments, namely those with diffuse noise and strong reverberation. Nevertheless, they are unable to provide loss-less or artefact-free enhancement. Globally, they reduce noise contributions in the time-frequency domains without any speech contributions. In contrast, in time-frequency domains with speech contributions, the noise cannot be reduced and distortions can be introduced. This is the main reason why envelope filtering might help reduce the listening effort in noisy environments while intelligibility improvement is generally lacking [20].
The above considerations point out that the performance of multiple-channel speech enhancement algorithms depends essentially on the complexity of the acoustical context. A given algorithm is appropriate for a specific acoustic environment, and in order to cope with changing properties of the acoustic environment, composite algorithms have been proposed more recently.
The approach proposed by Melanson and Lindemann in [21] consists of manual switching between different algorithms to enhance speech under various conditions. Manual switching between several combinations of filtering and dynamic compression has also been proposed by Lindemann et al. [22].
More advanced techniques using automatic switching according to different noise conditions have been proposed by Killion et al. in [23]. The input of the hearing aid is switched automatically between omnidirectional and directional microphones.
A strategy-selective algorithm has been described by Wittkop [24]. This algorithm uses an envelope filtering based on a generalized Wiener approach and an envelope filtering invoking directional inter-aural level and phase differences. A coherence measure is used to identify the acoustical situations and gradually switch off the directional filtering with increasing complexity. It is pointed out that this algorithm helps reduce the listening effort in noisy environments but that intelligibility improvement is still lacking.
Therefore, it is the aim of the present invention to provide a composite method including source separation and coherence based envelope filtering. Source separation and coherence based envelope filtering are achieved in the time Bark domain, i.e. in specific frequency bands. Source separation is performed in bands where coherent sound fields of the signal of interest or of a predominant noise source are detected. Coherence based envelope filtering acts in bands where the sound fields are diffuse and/or where the complexity of the acoustic environment is too large. Source separation and coherence based envelope filtering may act in parallel and are activated in a smooth way through a coherence measure in the Bark bands.
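The smooth, band-wise activation of the two processing paths can be pictured as a crossfade controlled by the coherence measured in each band. The sketch below is purely illustrative: the logistic mapping, its `threshold` and `slope` parameters and the function name `blend_by_coherence` are assumptions introduced here, and the text does not specify how the coherence measure is mapped to an activation.

```python
import numpy as np

def blend_by_coherence(u_bss, y_env, coherence, threshold=0.6, slope=10.0):
    """Crossfade two per-band outputs according to a coherence measure.

    u_bss:     output of the source separation path, one value per band
    y_env:     output of the envelope filtering path, one value per band
    coherence: coherence measure per band, in [0, 1]
    Returns the blended output and the mixing factor per band.
    """
    # Hypothetical smooth mapping: near 1 in coherent bands, near 0 in diffuse ones
    mix = 1.0 / (1.0 + np.exp(-slope * (coherence - threshold)))
    return mix * u_bss + (1.0 - mix) * y_env, mix

# Three hypothetical bands: diffuse, borderline and coherent
coherence = np.array([0.0, 0.6, 1.0])
blended, mix = blend_by_coherence(np.ones(3), np.zeros(3), coherence)
```

Because the mixing factor varies continuously with the coherence, both paths contribute in borderline bands, which avoids the audible artefacts that a hard switch between algorithms could introduce.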
It is a further object of the present invention to provide a real binaural enhancement of the observed sound field by using the multiple-channel transfer characteristics identified by source separation. Indeed, speech enhancement algorithms commonly achieve only a monaural speech enhancement, which implies that users of such devices lose the ability to localize sources. A promising solution, which could achieve real binaural speech enhancement, consists of a device with one or two microphones in each ear and an RF link in between. The benefit for the user would be enormous. Notably, it has been reported that binaural hearing increases the loudness and signal-to-noise ratio of the perceived sound, improves the intelligibility and quality of speech, and allows the localization of sources, which is of prime importance in situations of danger. Lindemann and Melanson [25] propose a system with wireless transmission between the hearing aids and a processing unit worn on the belt of the user. Brander [7] similarly proposes a direct communication between the two ear devices. Goldberg et al. [26] combine the transmission and the enhancement. Finally, optical transmission via glasses has been proposed by Martin [27]. Nevertheless, none of these approaches proposes a virtual reconstruction of the binaural sound field. The approach proposed herein, namely exploiting the multiple-channel transfer characteristics identified by source separation to reconstruct the real sound field and attenuate noise contributions, considerably improves the safety and comfort of the listener.
[1] J. B. Allen, D. A. Berkley, and J. Blauert. Multimicrophone signal processing technique to remove room reverberation from speech signals. Journal of the Acoustical Society of America, 62(4):912-915, 1977.
[2] Radu Balan, Alexander Jourjine, and Justinian Rosca. Estimator of independent sources from degenerate mixtures. U.S. Pat. No. 6,343,268 B1, January 2002.
[3] Neal Ashok Bhadkamkar and John-Thomas Calderon Ngo. Directional acoustic signal processor and method therefor. U.S. Pat. No. 6,002,776, December 1999.
[4] Y. Bar-Ness, J. Carlin, and M. Steinberg. Bootstrapping adaptive cross-pol canceller for satellite communication. In Proc. IEEE Int. Conf. Communication, pages 4F5.1-4F5.5, 1982.
[5] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. on Acoustics, Speech and Signal Processing, 27:113-120, April 1979.
[6] D. Bradwood. Cross-coupled cancellation systems for improving cross-polarisation discrimination. In Proc. IEEE Int. Conf. Antennas Propagation, volume 1, pages 41-45, 1978.
[7] Richard Brander. Bilateral signal processing prosthesis. U.S. Pat. No. 5,991,419, November 1999.
[9] Mithat Can Dogan and Stephen Deane Stearns. Cochannel signal processing system. U.S. Pat. No. 6,018,317, January 2000.
[10] Justinian Rosca, Christian Darken, Thomas Petsche, and Inga Holube. Blind source separation for hearing aids. European Patent Office Patent 99,310,611.1, December 1999.
[11] Alexander Jourjine, Scott T. Rickard, and Ozgur Yilmaz. Method and apparatus for demixing of degenerate mixtures. U.S. Pat. No. 6,430,528 B1, August 2002.
[12] Walter S. Koroljow and Gary L. Gibian. Hybrid adaptive beamformer. U.S. Pat. No. 6,154,552, November 2000.
[13] Eric Lindemann. Dynamic intensity beamforming system for noise reduction in a binaural hearing aid. U.S. Pat. No. 5,511,128, April 1996.
[14] Albert S. Feng, Charissa R. Lansing, Chen Liu, William O'Brien, and Bruce C. Wheeler. Binaural signal processing system and method. U.S. Pat. No. 6,222,927 B1, April 2001.
[15] Y. Kaneda and T. Tohyama. Noise suppression signal processing using 2-point received signals. Electronics and Communications, 67a(12):19-28, 1984.
[16] B. Le Bourquin and G. Faucon. Using the coherence function for noise reduction. IEE Proceedings, 139(3):484-487, 1997.
[17] G. C. Carter, C. H. Knapp, and A. H. Nuttall. Estimation of the magnitude square coherence function via overlapped fast Fourier transform processing. IEEE Trans. on Audio and Acoustics, 21(4):337-344, 1973.
[18] Y. Ephraim and H. L. Van Trees. A signal subspace approach for speech enhancement. IEEE Trans. on Speech and Audio Proc., 3:251-266, 1995.
[19] R. Vetter. Method and system for enhancing speech in a noisy environment. U.S. Patent US 2003/0014248 A1, January 2003.
[20] V. Hohmann, J. Nix, G. Grimm, and T. Wittkop. Binaural noise reduction for hearing aids. In ICASSP 2002, Orlando, USA, 2002.
[21] John L. Melanson and Eric Lindemann. Digital signal processing hearing aid. U.S. Pat. No. 6,104,822, August 2000.
[22] Eric Lindemann, John Melanson, and Nikolai Bisgaard. Digital hearing aid system. U.S. Pat. No. 5,757,932, May 1998.
[23] Mead Killion, Fred Waldhauer, Johannes Wittkowski, Richard Goode, and John Allen. Hearing aid having plural microphones and a microphone switching system. U.S. Pat. No. 6,327,370 B1, December 2001.
[24] Thomas Wittkop. Two-channel noise reduction algorithms motivated by models of binaural interaction. PhD thesis, Fachbereich Physik der Universität Oldenburg, 2000.
[25] Eric Lindemann and John L. Melanson. Binaural hearing aid. U.S. Pat. No. 5,479,522, December 1995.
[26] Jack Goldberg, Mead C. Killion, and Jame R. Hendershot. System and method for enhancing speech intelligibility utilizing wireless communication. U.S. Pat. No. 5,966,639, October 1999.
[27] Raimund Martin. Hearing aid having two hearing apparatuses with optical signal transmission therebetween. U.S. Pat. No. 6,148,087, November 2000.
[28] J. Anemüller. Across-frequency processing in convolutive blind source separation. PhD thesis, Fachbereich Physik der Universität Oldenburg, 2000.
[29] Lucas Parra and Clay Spence. Convolutive blind separation of non-stationary sources. IEEE Trans. on Speech and Audio Processing, 8(3):320-327, 2000.
[30] S. Haykin. Adaptive filter theory. Prentice Hall, New Jersey, 1996.