Audio content of multi-channel format (such as stereo, surround 5.1, surround 7.1, and the like) is created by mixing different audio signals in a studio, or generated by recording acoustic signals simultaneously in a real environment. The mixed audio signal or content may include a number of different sources. Source separation is a task to identify information of each of the sources in order to reconstruct the audio content, for example, by a mono signal and metadata including spatial information, spectral information, and the like.
When recording an auditory scene using one or more microphones, it is preferred that audio source dependent information is separated such that it may be suitable for use in a great variety of subsequent audio processing tasks. As used herein, the term “audio source” refers to an individual audio element that exists for a defined duration of time in the audio content. An audio source may be dynamic or static. For example, an audio source may be a human, an animal or any other sound source in a sound field. Some examples of the audio processing tasks may include spatial audio coding, remixing/re-authoring, 3D sound analysis and synthesis, and/or signal enhancement/noise suppression for various purposes (e.g., the automatic speech recognition). Therefore, improved versatility and better performance can be achieved by a successful audio source separation.
When no prior information of the audio sources involved in the capturing process is available (for instance, the properties of the recording devices, the acoustic properties of the room, and the like), the separation process can be called blind source separation (BSS). The blind source separation is relevant to various application areas, for example, speech enhancement with multiple microphones, crosstalk removal in multichannel communications, multi-path channel identification and equalization, direction of arrival (DOA) estimation in sensor arrays, improvement over beam-forming microphones for audio and passive sonar, music re-mastering, transcription, object-based coding, or the like.
There is a need in the art for a solution for audio source separation from audio content without prior information.