Mixed audio signals presented in a multi-channel format (e.g., stereo, beamforming, 5.1, 7.1, and the like) are created by mixing different audio sources in a studio, or are generated from a plurality of recordings of various audio sources in a real environment. Source separation is useful for a wide range of audio processing applications. For example, when an auditory scene is recorded using one or more microphones, it is preferred that sound source dependent information be separated for use in a variety of subsequent audio processing tasks. Examples of such applications include re-mixing/re-authoring applications, spatial audio coding, 3D sound analysis and synthesis, and rendering the sources (rather than the original mixed audio signals) in an extended play-back environment. Other applications require source parameters to enable source-specific analysis and post-processing, such as pitch correction, time warping, sound effects, boosting, attenuating, or leveling certain sources.
Source separation consists of recovering either the source signals or their spatial images from the mixed signal. Most existing approaches transform the signals into the time-frequency domain via the short-time Fourier transform (STFT) and approximate the mixing process in each frequency bin by a complex-valued mixing matrix or a spatial covariance matrix. Source separation is then achieved by estimating the mixing matrices or spatial covariance matrices in all frequency bins and deriving the source STFT coefficients. One example method of recovering source signals estimates the mixing matrices and thereafter derives the source STFT coefficients, as described in A. Ozerov, C. Fevotte, “Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation,” IEEE Trans ASLP Vol. 18, No. 3, 2010, the contents of which are incorporated in their entirety herein (referred to as “reference 1” hereafter). An example method of recovering spatial images of sources estimates the spatial covariance and derives the source STFT coefficients, as described in Ngoc Q. K. Duong, E. Vincent, R. Gribonval, “Spatial Covariance Models for Under-determined Reverberant Audio Source Separation,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, the contents of which are incorporated in their entirety herein (referred to as “reference 2” hereafter).
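By way of illustration only, the per-frequency-bin mixing approximation described above may be sketched as follows. The sketch assumes that the mixing matrices are already known and that there are at least as many channels as sources, so that the source STFT coefficients can be derived with a pseudo-inverse. It does not implement the estimation methods of reference 1 or reference 2. The array sizes, the random test data, and the names `complex_normal` and `separate` are hypothetical and are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

F, N = 5, 20  # frequency bins and time frames (illustrative sizes)
I, J = 2, 2   # channels and sources (assumed determined case, I >= J)


def complex_normal(shape):
    """Draw complex-valued test data (hypothetical helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Stand-ins for the source STFT coefficients s(f, n) and for one
# complex-valued mixing matrix A(f) per frequency bin.
S = complex_normal((F, N, J))
A = complex_normal((F, I, J))

# Narrowband approximation of the mixing process: x(f, n) = A(f) s(f, n)
X = np.einsum("fij,fnj->fni", A, S)


def separate(X, A):
    """Derive source STFT coefficients from the mixture, given A(f).

    Assumes A is known; in practice it would have to be estimated.
    """
    A_pinv = np.linalg.pinv(A)  # pinv is applied per frequency bin
    return np.einsum("fji,fni->fnj", A_pinv, X)


S_hat = separate(X, A)
max_error = float(np.max(np.abs(S_hat - S)))
```

In this sketch `S_hat` has the same shape as `S`, and `max_error` measures how closely the derived coefficients match the originals. In an under-determined mixture (more sources than channels), a pseudo-inverse alone would not recover the sources, which is one reason the approaches cited above rely on additional source or spatial models.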