In many real-world applications, a convolutive mixing model is used to approximate the relationship between the original source signals and the mixed signals captured by sensors such as microphones. A linear convolution can be approximated by a circular convolution if the frame size of the discrete Fourier transform (DFT) is much larger than the channel length. A convolutive mixing process, and likewise the corresponding unmixing, can then be represented as a multiplication in the frequency domain:
Y(ω,t)=W(ω,t)X(ω,t)  (1)
where Y(ω,t) represents the estimated sources [Y1(ω,t), . . . , YN(ω,t)]^T, X(ω,t) represents the microphone signals [X1(ω,t), . . . , XN(ω,t)]^T, and W(ω,t) (also called the unmixing filter for frequency bin ω) denotes the unmixing matrix containing the unmixing filter coefficients.
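The frame-size condition above can be checked numerically. The following is a minimal sketch, assuming a single isolated frame and a random channel response (the function name `wraparound_mask` and all parameter values are illustrative, not taken from the source). It compares a truncated linear convolution with the circular convolution obtained by multiplying DFTs, and reports the fraction of samples that differ because of wrap-around.

```python
import numpy as np

def wraparound_mask(frame_size, channel_len, seed=0):
    """Flag samples where circular and linear convolution disagree."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(frame_size)      # one frame of a source signal
    h = rng.standard_normal(channel_len)     # hypothetical channel impulse response

    # Linear convolution, truncated to the frame length.
    linear = np.convolve(x, h)[:frame_size]
    # Circular convolution: bin-wise product of frame_size-point DFTs.
    circular = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=frame_size)).real

    return ~np.isclose(linear, circular)

for n in (64, 512, 4096):
    mask = wraparound_mask(frame_size=n, channel_len=32)
    print(n, mask.mean())   # fraction of samples affected by wrap-around
```

Only the first `channel_len - 1` samples of each frame can be affected, so the affected fraction shrinks as the frame grows relative to the channel, which is why the multiplicative model in equation (1) is treated as an approximation rather than an identity.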
For example, in a two-input-two-output (TITO) case, by assuming a unit gain for direct channels, the unmixing matrix can be represented as
$$
W(\omega,t) = \begin{bmatrix} 1 & W_{12}(\omega,t) \\ W_{21}(\omega,t) & 1 \end{bmatrix} \qquad (2)
$$
Then the separated signals can be represented as
Y1(ω,t)=X1(ω,t)+W12(ω,t)X2(ω,t)  (3)
Y2(ω,t)=X2(ω,t)+W21(ω,t)X1(ω,t)  (4)
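As an illustration of equations (3) and (4), the sketch below applies the TITO unmixing bin by bin. The mixing model used in the demonstration (unit direct gains and cross-channel coefficients `A12`, `A21`) is an assumption made for the example, as are the function name `separate_tito` and the array shapes; under that assumed model, choosing W12 = -A12 and W21 = -A21 recovers each source up to the common scale factor 1 - A12·A21.

```python
import numpy as np

def separate_tito(X1, X2, W12, W21):
    """Apply equations (3) and (4) element-wise over frequency bins and frames."""
    Y1 = X1 + W12 * X2
    Y2 = X2 + W21 * X1
    return Y1, Y2

# Demonstration with an assumed mixing model (not from the source):
#   X1 = S1 + A12 * S2,  X2 = S2 + A21 * S1
rng = np.random.default_rng(0)
bins, frames = 8, 100
S1 = rng.standard_normal((bins, frames)) + 1j * rng.standard_normal((bins, frames))
S2 = rng.standard_normal((bins, frames)) + 1j * rng.standard_normal((bins, frames))
A12 = 0.5 * np.exp(1j * rng.uniform(0, 2 * np.pi, (bins, 1)))  # one coefficient per bin
A21 = 0.3 * np.exp(1j * rng.uniform(0, 2 * np.pi, (bins, 1)))

X1 = S1 + A12 * S2
X2 = S2 + A21 * S1

# With the ideal choice W12 = -A12, W21 = -A21 the cross terms cancel.
Y1, Y2 = separate_tito(X1, X2, -A12, -A21)
scale = 1 - A12 * A21
print(np.allclose(Y1, scale * S1), np.allclose(Y2, scale * S2))
```

In practice the coefficients are unknown and must be estimated from the microphone signals alone, which is the problem the approaches described next address.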
Various approaches have been proposed to estimate the unmixing filters. One class of algorithms obtains separation based on second-order statistics (SOS), requiring only that the sources be uncorrelated rather than satisfying the stronger condition of independence. By exploiting additional constraints such as non-stationarity, sufficient conditions for separation can be achieved for SOS systems. One SOS-based approach is the so-called adaptive decorrelation filter (ADF) formulation, which is based on the classical adaptive noise cancellation (ANC) scheme. (See E. Weinstein, M. Feder, and A. V. Oppenheim, “Multi-channel signal separation by decorrelation,” IEEE Trans. Speech Audio Processing, vol. 1, pp. 405-413, April 1993.) Other SOS-based approaches have also been proposed, in which a cost function that evaluates the decorrelation between the separated signals is defined, and the coefficients of the unmixing filters are calculated adaptively by performing gradient descent on the cost function. (See L. Parra and C. Spence, “Convolutive blind source separation of non-stationary sources,” IEEE Trans. on Speech and Audio Processing, pp. 320-327, May 2000.)
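The idea of gradient descent on a decorrelation cost can be sketched for a single frequency bin as follows. This is a simplified illustration, not the ADF formulation or the method of the cited papers: the cost J = |E[Y1 Y2*]|^2, the fixed step size, the function name `decorrelate_bin`, and the synthetic stationary data are all assumptions made for the example. Because one stationary cross-correlation gives only one constraint for two unknown coefficients, driving J to zero here shows decorrelation only and does not by itself guarantee separation, which is consistent with the need for additional constraints such as non-stationarity.

```python
import numpy as np

def decorrelate_bin(X1, X2, step=0.1, iters=200):
    """Gradient descent on J = |mean(Y1 * conj(Y2))|^2 for one frequency bin."""
    W12 = 0j
    W21 = 0j
    costs = []
    for _ in range(iters):
        Y1 = X1 + W12 * X2                 # equation (3)
        Y2 = X2 + W21 * X1                 # equation (4)
        C = np.mean(Y1 * np.conj(Y2))      # sample cross-correlation of the outputs
        costs.append(abs(C) ** 2)
        # Derivatives of J with respect to conj(W12) and conj(W21).
        grad12 = C * np.mean(Y2 * np.conj(X2))
        grad21 = np.conj(C) * np.mean(Y1 * np.conj(X1))
        W12 -= step * grad12
        W21 -= step * grad21
    return W12, W21, np.array(costs)

# Synthetic frames for one bin under an assumed mixing model (illustrative only).
rng = np.random.default_rng(1)
T = 4000
S1 = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
S2 = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) / np.sqrt(2)
X1 = S1 + (0.5 + 0.2j) * S2
X2 = S2 + (0.3 - 0.4j) * S1

W12, W21, costs = decorrelate_bin(X1, X2)
print(costs[0], costs[-1])   # cost before the first update and before the last update
```

A practical system would run such an update for every frequency bin and would typically use several time segments with different statistics, so that the resulting set of decorrelation constraints is sufficient to determine the filters.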