The separation of convolutive mixtures aims to estimate the individual sound signals in the presence of other such signals in reverberant environments. As sound mixtures in enclosures are almost always convolutive, their separation is a useful pre-processing stage for speech recognition and speaker identification. Other direct applications include hearing aids, teleconferencing, multichannel audio and acoustical surveillance. Several techniques have been proposed for the separation of convolutive mixtures, and they can be grouped into three categories: stochastic, adaptive and deterministic.
Stochastic methods, such as independent component analysis (ICA), are based on a separation criterion that assumes the statistical independence of the source signals. ICA was originally proposed for instantaneous mixtures. For convolutive mixtures it is applied in the frequency domain, where convolution corresponds to multiplication. Although faster implementations such as FastICA exist, stochastic methods are usually computationally expensive because of the many iterations required to compute the demixing filters. Furthermore, frequency-domain ICA-based techniques suffer from scaling and permutation ambiguities, which result from applying the separation algorithm independently in each frequency bin.
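As an illustration of the two statements above, the mixing model and the resulting ambiguities can be written as follows. The notation is assumed here for illustration only: N sources s_i, M microphone signals x_j, room impulse responses h_ji from source i to microphone j, and short-time Fourier transforms indexed by frequency f and frame τ. The second line is the usual narrowband approximation, which holds when the analysis window is long compared with the impulse responses.

```latex
\begin{align}
x_j(t) &= \sum_{i=1}^{N} (h_{ji} * s_i)(t), \qquad j = 1,\dots,M, \\
\mathbf{X}(f,\tau) &\approx \mathbf{H}(f)\,\mathbf{S}(f,\tau), \\
\mathbf{Y}(f,\tau) &= \mathbf{W}(f)\,\mathbf{X}(f,\tau), \qquad
\mathbf{W}(f)\,\mathbf{H}(f) = \mathbf{P}(f)\,\mathbf{D}(f).
\end{align}
```

Here W(f) is the demixing matrix estimated in bin f, P(f) is a permutation matrix and D(f) is a diagonal scaling matrix. Because statistical independence alone determines W(f) only up to P(f) and D(f), and these may differ from bin to bin, the outputs must be aligned and rescaled across frequency before the time-domain signals can be reconstructed.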
The second group of methods is based on adaptive algorithms that optimize a multichannel filter structure according to the signal properties. Depending on the type of microphone array used, adaptive beamforming (ABF) utilizes spatial selectivity to improve the capture of the target source while suppressing the interference from other sources. These adaptive algorithms are similar to stochastic methods in the sense that both depend on the properties of the signals to reach a solution. It has been shown that frequency-domain adaptive beamforming is equivalent to frequency-domain blind source separation (BSS). These algorithms need to converge adaptively to a solution, which may be suboptimal. They also need to handle all the targets and interferences jointly. Furthermore, the null beamforming applied to the interference signal is not very effective under reverberant conditions because of the reflections, which creates an upper bound on the performance of BSS.
Deterministic methods, on the other hand, do not make any assumptions about the source signals and depend solely on the deterministic aspects of the problem such as the source directions and the multipath characteristics of the reverberant environment. Although there have been efforts to exploit direction-of-arrival (DOA) information and the channel characteristics for solving the permutation problem, these were used in an indirect way, merely to assist the actual separation algorithm, which was usually stochastic or adaptive.
A deterministic approach that leads to a closed-form solution is very desirable from the computational point of view. However, no such method with satisfactory performance has been proposed so far. There are two reasons for this. Firstly, knowledge of the source directions is not sufficient for good separation, because without adaptive algorithms the source directions can be exploited only by simple delay-and-sum beamformers. Due to the limited number of microphones in an array, the spatial selectivity of such beamformers is not sufficient to perform well under reverberant conditions. Secondly, the multipath characteristics of the environment cannot be found with sufficient accuracy when using non-coincident arrays, because the channel characteristics are different at each sensor position, which in turn makes it difficult to determine the room responses from the mixtures.
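For reference, the delay-and-sum beamformer mentioned above can be sketched in a few lines. This is a minimal illustration and not a method from the cited literature; it assumes a linear array, a far-field (plane-wave) source, a known steering direction, and a speed of sound of 343 m/s. The function names arrival_delays and delay_and_sum and the angle convention are hypothetical choices made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_delays(mic_positions, doa_deg, c=SPEED_OF_SOUND):
    """Far-field arrival delay (seconds) at each microphone, relative to the origin.

    Assumed convention: linear array along x, angle measured from broadside,
    positive angles toward +x, so microphones at larger x receive the wave earlier.
    """
    return -np.asarray(mic_positions, dtype=float) * np.sin(np.deg2rad(doa_deg)) / c


def delay_and_sum(signals, mic_positions, steer_deg, fs, c=SPEED_OF_SOUND):
    """Align the channels for direction steer_deg and average them.

    signals has shape (num_mics, num_samples). Fractional delays are applied
    as phase ramps in the FFT domain, so the shifts are circular; pad the
    input with zeros if wrap-around matters.
    """
    signals = np.asarray(signals, dtype=float)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delays = arrival_delays(mic_positions, steer_deg, c)
    spectra = np.fft.rfft(signals, axis=1)
    # Advancing channel m by delays[m] is a multiplication by exp(+j 2 pi f delays[m]).
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Averaging adds the steered direction coherently and other directions incoherently.
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

With only a handful of microphones the averaging attenuates off-axis arrivals modestly, and every reflection arriving from near the steered direction passes through unchanged, which is consistent with the limited performance in reverberant rooms described above.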
Almost all source separation methods employ non-coincident microphone arrays, to the extent that such an array geometry is an inherent default assumption in the formulation of the problem. The use of a coincident microphone array was previously proposed to exploit the directivities of two closely positioned directional microphones (J. M. Sanchis and J. J. Rieta, “Computational Cost Reduction using coincident boundary microphones for convolutive blind signal separation,” Electronics Lett., vol. 41, no. 6, pp. 374-376, March 2005). However, the construction of the solution disregarded the fact that, for two directional microphones pointing at different angles, the reflections are weighted with different directivity factors according to their arrival directions. Therefore, the method was, in fact, not suitable for convolutive mixtures. In the literature, coincident microphone arrays have been investigated mostly for intensity vector calculations and sound source localization (H. E. de Bree, W. F. Druyvesteyn, E. Berenschot, and M. Elwenspoek, “Three dimensional sound intensity measurements using Microflown particle velocity sensors,” in Proc. 12th IEEE Int. Conf. on Micro Electro Mech. Syst., Orlando, Fla., USA, January 1999, pp. 124-129; J. Merimaa and V. Pulkki, “Spatial impulse response rendering I: Analysis and synthesis,” J. Audio Eng. Soc., vol. 53, no. 12, pp. 1115-1127, December 2005; B. Gunel, H. Hacihabiboglu, and A. M. Kondoz, “Wavelet-packet based passive analysis of sound fields using a coincident microphone array,” Appl. Acoust., vol. 68, no. 7, pp. 778-796, July 2007).