A mixture of audio signals, notably a multi-channel audio signal such as a stereo, 5.1 or 7.1 audio signal, is typically created by mixing different audio sources in a studio, or generated by recording acoustic signals simultaneously in a real environment. The different audio channels of a multi-channel audio signal may be described as different sums of a plurality of audio sources. The task of source separation is to identify the mixing parameters which lead to the different audio channels and possibly to invert the mixing parameters to obtain estimates of the underlying audio sources.
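The relationship between audio sources, audio channels and mixing parameters described above can be illustrated with a minimal sketch. It assumes an instantaneous (non-convolutive) mixture in which each channel is a weighted sum of two sources, and it assumes that the mixing parameters are known, which is not the case in blind source separation. The names `mix` and `invert_2x2` and the numeric weights are hypothetical and chosen only for illustration.

```python
import random


def mix(mixing, sources):
    """Return channels where channel i is sum_j mixing[i][j] * sources[j]."""
    num_samples = len(sources[0])
    return [
        [sum(a * s[n] for a, s in zip(row, sources)) for n in range(num_samples)]
        for row in mixing
    ]


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists (illustrative helper)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


random.seed(0)

# Two hypothetical audio sources with eight samples each.
sources = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(2)]

# Assumed mixing parameters: one row per audio channel.
mixing = [[1.0, 0.5],
          [0.3, 1.0]]

# Each audio channel is a different sum of the same sources.
channels = mix(mixing, sources)

# With known mixing parameters, inverting them yields source estimates.
estimates = mix(invert_2x2(mixing), channels)
```

In this idealized setting the estimates match the sources up to rounding error. In practice the mixing parameters have to be identified from the channels themselves before they can be inverted.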
When no prior information on the audio sources that are involved in a multi-channel audio signal is available, the process of source separation may be referred to as blind source separation (BSS). In the case of spatial audio captures, BSS includes the steps of decomposing a multi-channel audio signal into different source signals and of providing information on the mixing parameters, on the spatial position and/or on the acoustic channel response between the originating location of the audio sources and the one or more receiving microphones.
The problem of blind source separation and/or of informed source separation is relevant in various application areas, such as speech enhancement with multiple microphones, crosstalk removal in multi-channel communications, multi-path channel identification and equalization, direction of arrival (DOA) estimation in sensor arrays, improvement over beam-forming microphones for audio and passive sonar, movie audio up-mixing and re-authoring, music re-authoring, transcription and/or object-based coding.
Real-time online processing is typically important for many of the above-mentioned applications, such as communications and re-authoring. Hence, there is a need in the art for a solution for separating audio sources in real-time, which imposes requirements with regard to a low system delay and a low analysis delay of the source separation system. Low system delay requires that the system supports sequential real-time processing (clip-in/clip-out) without requiring substantial look-ahead data. Low analysis delay requires that the complexity of the algorithm is sufficiently low to allow for real-time processing given practical computation resources.
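The clip-in/clip-out requirement may be sketched as a processing loop that emits one output clip for each input clip, using only the current clip and state carried over from earlier clips. This is a hedged illustration rather than the method of the present document: `process_stream`, `passthrough` and the `state` dictionary are hypothetical names, and the per-clip function is a placeholder that performs no actual separation.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Clip = List[List[float]]  # one list of samples per audio channel


def process_stream(
    clips: Iterable[Clip],
    separate_clip: Callable[[Clip, Dict], Clip],
) -> Iterator[Clip]:
    """Yield one output clip per input clip without reading ahead."""
    state: Dict = {}  # carries information from past clips only
    for clip in clips:
        # The output depends on the current clip and on past state,
        # never on clips that have not yet arrived.
        yield separate_clip(clip, state)


def passthrough(clip: Clip, state: Dict) -> Clip:
    """Placeholder per-clip step: counts clips and returns the input."""
    state["clips_seen"] = state.get("clips_seen", 0) + 1
    return clip


# Two hypothetical stereo clips with two samples per channel.
clips = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[0.4, 0.5], [0.6, 0.7]],
]
outputs = list(process_stream(clips, passthrough))
```

Because the loop is a generator, each output clip becomes available as soon as its input clip has been processed, so the system delay is bounded by the clip length plus the time spent in the per-clip step.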
The present document addresses the technical problem of providing a real-time method for source separation. It should be noted that the method described in the present document is applicable to blind source separation, as well as to semi-supervised or supervised source separation, for which information about the sources and/or about the noise is available.