In the field of audio or acoustic signal processing, the set of propagation paths between multiple sound sources in a reverberant environment and multiple microphones is typically modeled as a linear Multiple-Input/Multiple-Output (MIMO) filter with finite impulse responses (FIR). In such a MIMO-FIR system, each microphone picks up mixtures of the reverberated, i.e. “filtered”, source signals. Generally, the MIMO mixing system itself is not directly accessible; only the received microphone signals are.
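The MIMO-FIR mixing model described above can be sketched as follows. This is a minimal illustration only: the function name `mimo_fir_mix`, the array layout and the randomly generated, exponentially decaying impulse responses are assumptions made for the example and do not represent measured room acoustics.

```python
import numpy as np

def mimo_fir_mix(sources, H):
    """Mix source signals through a MIMO-FIR system.

    sources: array of shape (n_sources, T)
    H:       array of shape (n_mics, n_sources, L); H[m, s] is the FIR
             impulse response from source s to microphone m
    Returns the microphone signals, shape (n_mics, T + L - 1).
    """
    n_mics, n_sources, L = H.shape
    T = sources.shape[1]
    x = np.zeros((n_mics, T + L - 1))
    for m in range(n_mics):
        for s in range(n_sources):
            # Each microphone sums the filtered versions of all sources.
            x[m] += np.convolve(sources[s], H[m, s])
    return x

# Hypothetical scenario: 2 sources, 2 microphones, 16-tap impulse responses.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))
H = rng.standard_normal((2, 2, 16)) * np.exp(-0.3 * np.arange(16))
x = mimo_fir_mix(s, H)
```

In a blind setting only `x` would be observable; `s` and `H` are available here solely because the example is synthetic.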
In general, the MIMO mixing system contains all analyzable information about an acoustic scenario. Once this MIMO mixing system is known, one can, for instance, extract the original source signals, estimate the source positions within a room, analyze the acoustic structure/reflections of the room and estimate the reverberation time of the room.
Hence, for acoustic scene analysis, a very desirable goal is to estimate the MIMO mixing system using only the available microphone signals, which is also known as blind MIMO system identification. A precise blind MIMO system identification can be regarded as the optimal acoustic scene analysis.
It has been shown that there is a fundamental relation between blind MIMO system identification and broadband blind source separation (BSS) for convolutive mixtures (see H. Buchner, R. Aichner, and W. Kellermann, “Relation between Blind System Identification and Convolutive Blind Source Separation,” Conf. Rec. Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), Piscataway, N.J., USA, March 2005). Thus, broadband BSS for convolutive mixtures can be used to solve the blind MIMO system identification problem and, hence, the acoustic scene analysis problem.
In broadband blind source separation for convolutive mixtures, the original goal is to determine a MIMO-FIR demixing system based only on the available microphone signals, using one of a wide variety of BSS techniques or algorithms. FIG. 1 shows the basic setup for a BSS scenario including a mixing system 10 and a demixing system 100. The mixing system 10 can be described by a mixing matrix H, which represents all acoustic propagation paths from the original acoustic sources in the room, i.e. the source signals 11 s1, . . . , sP, to the P sensors, e.g. microphones, which pick up the mixture signals 101 x1, . . . , xP. The demixing system 100 can be described by a demixing matrix W representing the digital signal processing system which generates the output signals 103 y1, . . . , yP. In the case of an ideal demixing system 100, these output signals 103 are identical to the original source signals 11. Generally, the MIMO-FIR demixing system 100 has to be continuously adapted due to the potential time dependence of the source positions and/or the room acoustics and, thus, of the mixing system 10.
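The interplay of the mixing matrix H and the demixing matrix W can be illustrated with a small non-blind sketch for P = 2. As an assumption made purely for illustration, W is chosen here as the adjugate of a known H, so that the cascade of W and H becomes diagonal. Each output then contains only one source, filtered by the determinant of H, i.e. the separation holds up to a common filter. The helper names `mimo_fir_filter` and `adjugate_demixer` are hypothetical, and a blind algorithm would have to estimate W from the microphone signals alone.

```python
import numpy as np

def mimo_fir_filter(signals, F):
    """Apply a MIMO-FIR system F of shape (n_out, n_in, L) to signals (n_in, T)."""
    n_out, n_in, L = F.shape
    T = signals.shape[1]
    y = np.zeros((n_out, T + L - 1))
    for o in range(n_out):
        for i in range(n_in):
            y[o] += np.convolve(signals[i], F[o, i])
    return y

def adjugate_demixer(H):
    """Return W = adj(H) for a 2x2 MIMO-FIR system H of shape (2, 2, L)."""
    W = np.empty_like(H)
    W[0, 0] = H[1, 1]
    W[0, 1] = -H[0, 1]
    W[1, 0] = -H[1, 0]
    W[1, 1] = H[0, 0]
    return W

rng = np.random.default_rng(1)
s = rng.standard_normal((2, 500))     # source signals (known only in simulation)
H = rng.standard_normal((2, 2, 8))    # hypothetical mixing system
x = mimo_fir_filter(s, H)             # mixture signals
W = adjugate_demixer(H)               # demixing system (non-blind, for illustration)
y = mimo_fir_filter(x, W)             # output signals

# The cascade W * H is diagonal with det(H) on the diagonal.
det_H = np.convolve(H[0, 0], H[1, 1]) - np.convolve(H[0, 1], H[1, 0])
expected = np.stack([np.convolve(s[p], det_H) for p in range(2)])
```

In this sketch `y` and `expected` agree up to numerical precision, showing that each output channel is free of the respective other source.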
From the plurality of available BSS techniques or algorithms the most fundamental BSS technique taking into account multiple microphone channels is called Independent Component Analysis (ICA), originally developed for the simpler case of a memoryless (i.e., instantaneous) mixing matrix (see A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis, Wiley & Sons, New York, 2001). ICA for BSS is based on the practically reasonable assumption of mutual statistical independence between the source signals. The demixing matrix W is then optimized such that its output signals become statistically independent again.
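For the memoryless case, the optimization of W toward independent outputs can be sketched for two channels as follows. This is a deliberately simple variant chosen for the illustration, not one of the algorithms cited in this document: the mixtures are whitened, and a rotation angle is then found by exhaustive search such that the sum of squared excess kurtoses of the outputs is maximized, which serves as a proxy for statistical independence. The function names, the mixing matrix `A` and the source distributions are assumptions.

```python
import numpy as np

def whiten(x):
    """Decorrelate and normalize the channels of x (shape (2, T))."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return V @ x, V

def excess_kurtosis(y):
    """Excess kurtosis per channel (zero for a Gaussian signal)."""
    y = y - y.mean(axis=1, keepdims=True)
    return (y ** 4).mean(axis=1) / (y ** 2).mean(axis=1) ** 2 - 3.0

def ica_2x2(x, n_angles=720):
    """Estimate a 2x2 demixing matrix W from instantaneous mixtures x."""
    z, V = whiten(x)
    best_R, best_score = None, -np.inf
    # The contrast is periodic in pi/2, so searching [0, pi/2) is sufficient.
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s_ = np.cos(theta), np.sin(theta)
        R = np.array([[c, s_], [-s_, c]])
        score = np.sum(excess_kurtosis(R @ z) ** 2)
        if score > best_score:
            best_R, best_score = R, score
    return best_R @ V

rng = np.random.default_rng(2)
s = np.vstack([rng.laplace(size=20000), rng.uniform(-1, 1, size=20000)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical instantaneous mixing matrix
x = A @ s
W = ica_2x2(x)
G = W @ A   # overall system; ideally a scaled permutation matrix
```

As is inherent to ICA, the sources are recovered only up to scaling and permutation, which is why the overall system `G` is compared against a scaled permutation matrix rather than the identity.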
Broadband BSS for convolutive mixtures (using MIMO-FIR demixing systems) requires simultaneous exploitation of multiple fundamental statistical signal properties, namely at least “nonwhiteness” and “nonstationarity” or “nonwhiteness” and “nongaussianity”. These are also the minimal requirements for performing blind MIMO system identification. Exploiting all three fundamental statistical signal properties, namely nonstationarity, nongaussianity and nonwhiteness is desirable and leads to improved versatility and higher convergence speed and accuracy (see H. Buchner, R. Aichner, and W. Kellermann, “Blind Source Separation for Convolutive Mixtures Exploiting Nongaussianity, Nonwhiteness, and Nonstationarity,” Conf. Rec. IEEE Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Japan, pp. 275-278, September 2003).
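The three statistical signal properties can be made tangible with simple indicators. The following sketch is illustrative only and is not the cost function of any of the cited algorithms: nonwhiteness is gauged by the largest normalized autocorrelation at nonzero lag, nongaussianity by the excess kurtosis, and nonstationarity by the ratio of the largest to the smallest block variance. The function names, the block length and the synthetic "speech-like" test signal are assumptions.

```python
import numpy as np

def nonwhiteness(x, max_lag=8):
    """Largest normalized autocorrelation magnitude over lags 1..max_lag."""
    x = x - x.mean()
    r0 = np.dot(x, x)
    return max(abs(np.dot(x[:-k], x[k:]) / r0) for k in range(1, max_lag + 1))

def nongaussianity(x):
    """Excess kurtosis; close to zero for a Gaussian signal."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

def nonstationarity(x, block=1000):
    """Ratio of the largest to the smallest variance over consecutive blocks."""
    n = len(x) // block
    v = x[: n * block].reshape(n, block).var(axis=1)
    return v.max() / v.min()

rng = np.random.default_rng(3)
n = 20000
# Reference signal: white, Gaussian and stationary.
white_gauss = rng.standard_normal(n)
# Synthetic test signal: Laplacian samples (nongaussian), slowly varying
# envelope (nonstationary), short FIR coloring filter (nonwhite).
envelope = 1.0 + 0.9 * np.sin(2 * np.pi * np.arange(n) / 5000)
speech_like = np.convolve(rng.laplace(size=n) * envelope, [1.0, 0.8, 0.5], mode="same")

for name, sig in [("white_gauss", white_gauss), ("speech_like", speech_like)]:
    print(name, nonwhiteness(sig), nongaussianity(sig), nonstationarity(sig))
```

All three indicators are expected to be markedly larger for the synthetic test signal than for the white Gaussian reference, which exhibits none of the three properties.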
A unified algorithmic framework has been developed under the name TRINICON (Triple-N ICA for CONvolutive mixtures) extending ICA to broadband sources, convolutive mixtures, and a systematic incorporation of all three fundamental statistical signal properties (see H. Buchner and W. Kellermann, “TRINICON for dereverberation of speech and audio signals,” In P. A. Naylor and N. D. Gaubitch (eds.), Speech Dereverberation, Springer-Verlag, London, pp. 311-385, Jul. 2010 for a comprehensive review). It has been shown that the TRINICON framework essentially includes all currently known major classes of adaptive MIMO filtering algorithms, including all (convolutive and instantaneous) ICA algorithms, as specializations/approximations of the generic formulation.
All currently known real-time implementations of adaptive acoustic MIMO systems also represent such approximations due to computational complexity restrictions, i.e., they all implement only certain subsets of the TRINICON features, such as linear convolutions or broadband signal processing, and usually consider at most two of the fundamental statistical signal properties (nonstationarity, nonwhiteness, nongaussianity).
In the light of the above, there is still a need for a computationally efficient realization of broadband BSS for convolutive mixtures and, thus, for optimal acoustic scene analysis, which can exploit all fundamental statistical signal properties simultaneously, namely nonstationarity, nongaussianity and nonwhiteness.