Dereverberation and audio source separation are major challenges in a number of applications, such as multi-channel audio acquisition, speech acquisition, or up-mixing of mono-channel audio signals. Applicable techniques can be classified into single-channel techniques and multi-channel techniques.
Single-channel techniques can be based on a minimum statistics principle and can estimate an ambient part and a direct part of the audio signal separately. Single-channel techniques can further be based on a statistical system model. Common single-channel techniques, however, suffer from limited performance in complex acoustic scenarios and may not generalize to multi-channel scenarios.
Multi-channel techniques can aim at inverting a multiple-input/multiple-output (MIMO) finite impulse response (FIR) system between a number of audio signal sources and microphones, wherein each acoustic path between an audio signal source and a microphone can be modelled by an FIR filter. Multi-channel techniques can be based on higher-order statistics and can employ heuristic statistical models using training data. Common multi-channel techniques, however, suffer from high computational complexity and may not be applicable in single-channel scenarios.
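For illustration only, the MIMO FIR model mentioned above can be sketched as follows: each microphone signal is the sum, over all audio signal sources, of the source signal convolved with the FIR filter of the corresponding acoustic path. The function names `fir_filter` and `mimo_fir_mix`, the filter layout `filters[m][p]`, and the toy signals and filter taps are assumptions made for this sketch and are not taken from the cited documents. The sketch shows only the forward (mixing) system; it does not attempt the inversion that multi-channel techniques aim at.

```python
def fir_filter(signal, taps):
    """Full linear convolution of a signal with a list of FIR filter taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += h * x
    return out


def mimo_fir_mix(sources, filters):
    """Forward MIMO FIR model.

    sources[p]    : samples of audio signal source p
    filters[m][p] : FIR taps of the path from source p to microphone m
                    (hypothetical layout chosen for this sketch)
    Returns one list of samples per microphone.
    """
    mics = []
    for mic_filters in filters:
        # Filter every source with the path leading to this microphone.
        paths = [fir_filter(s, h) for s, h in zip(sources, mic_filters)]
        # Sum the filtered sources, padding shorter paths with zeros.
        mic = [0.0] * max(len(p) for p in paths)
        for p in paths:
            for n, v in enumerate(p):
                mic[n] += v
        mics.append(mic)
    return mics


# Hypothetical toy data: two sources, two microphones.
sources = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
filters = [
    [[1.0, 0.5], [0.25]],       # paths from sources 0 and 1 to microphone 0
    [[0.5], [1.0, 0.0, 0.5]],   # paths from sources 0 and 1 to microphone 1
]
mics = mimo_fir_mix(sources, filters)
```

With P sources, M microphones and filters of length L, this forward model alone involves M·P filters of L taps each, which gives a rough sense of why estimating an inverse of the full system can be computationally demanding.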
In the document Herbert Buchner et al., “Trinicon for dereverberation of speech and audio signals”, Speech Dereverberation, Signals and Communication Technology, pages 311-385, Springer London, 2010, an approach to estimate an ideal inverse system is described.
In the document Andreas Walther et al., “Direct-Ambient Decomposition and Upmix of Surround Signals”, Institute of Electrical and Electronics Engineers (IEEE) Workshop on Applications of Signal Processing to Audio and Acoustics, 2011, an approach to estimate diffuse and direct audio components is described.