Applications such as Acoustic Echo Cancellation (AEC) or Listening Room Equalization (LRE) involve the identification of acoustic Multiple-Input/Multiple-Output (MIMO) systems. In practice, multichannel acoustic system identification suffers from the strongly cross-correlated loudspeaker signals that typically occur when virtual acoustic scenes are rendered with more than one loudspeaker. The computational complexity grows at least with the number of acoustic paths through the MIMO system, which is N_L·N_M for N_L loudspeakers and N_M microphones. Robust, fast-converging algorithms for multichannel filter adaptation, such as Generalized Frequency Domain Adaptive Filtering (GFDAF) [BBK05], even have a complexity on the order of N_L^3 when the involved linear systems of equations are solved robustly for cross-correlated loudspeaker signals by a Cholesky decomposition [GVL96]. Moreover, if the number of loudspeakers is larger than the number of virtual sources N_S (i.e., the number of spatially separated sources with independent signals), the acoustic paths from the loudspeakers to the microphones of the LEMS cannot be determined uniquely. In this so-called non-uniqueness problem [BMS98], which is inevitable in practice, an infinitely large set of possible solutions for the LEMS exists, of which only one corresponds to the true LEMS.
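The non-uniqueness problem described above can be illustrated with a deliberately simplified sketch. It assumes one virtual source rendered over two loudspeakers and reduces every acoustic path to a single gain instead of a full impulse response. All function names and numeric values are hypothetical and chosen only for illustration.

```python
# Toy illustration of the non-uniqueness problem. All numeric values and
# function names are hypothetical; each acoustic path is a single gain.

def render(source, gains):
    """Drive every loudspeaker with a scaled copy of the one source signal."""
    return [[g * s for s in source] for g in gains]


def microphone(paths, speaker_signals):
    """Microphone signal: loudspeaker signals weighted by the path gains."""
    return [sum(h * x for h, x in zip(paths, frame))
            for frame in zip(*speaker_signals)]


def max_abs_diff(a, b):
    """Largest sample-wise deviation between two signals."""
    return max(abs(u - v) for u, v in zip(a, b))


source = [1.0, -0.5, 0.25, 2.0]
true_paths = [0.8, 0.3]    # assumed true loudspeaker-to-microphone gains
other_paths = [0.2, 0.6]   # a different system estimate

# Rendering A: both loudspeaker signals are scaled copies of one source,
# so they are fully correlated.
speakers_a = render(source, [0.5, 1.0])
err_a = max_abs_diff(microphone(true_paths, speakers_a),
                     microphone(other_paths, speakers_a))

# Rendering B: the virtual source moves, i.e. the rendering gains change.
speakers_b = render(source, [1.0, 0.5])
err_b = max_abs_diff(microphone(true_paths, speakers_b),
                     microphone(other_paths, speakers_b))

print(f"mismatch under rendering A: {err_a:.3g}")
print(f"mismatch under rendering B: {err_b:.3g}")
```

Under rendering A the two estimates cannot be told apart from the microphone signal alone, yet the incorrect one no longer matches as soon as the rendering changes. This is why an estimate that is merely one of the infinitely many solutions is of limited use.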
In the past decades, nonlinear [MHB01] or time-variant [HBK07, SHK13] pre-processing of the loudspeaker signals has been proposed to address the non-uniqueness problem, although it even slightly increases the computational burden. In contrast, the concept of WDAF alleviates both the computational complexity and the non-uniqueness problem [SK14] and is optimum for uniform, concentric, circular loudspeaker and microphone arrays. To this end, WDAF employs a spatial transform which decomposes sound fields into elementary solutions of the acoustic wave equation and allows approximate models and sophisticated regularization in the spatial transform domain [SK14]. Another approach, known as Source-Domain Adaptive Filtering (SDAF) [HBS10], performs a data-driven spatio-temporal transform on the loudspeaker and microphone signals in order to allow an effective modeling of acoustic echo paths in the resulting highly time-varying transform domain. However, the identified system does not represent the LEMS but is a signal-dependent approximation. A further adaptation scheme, Eigenspace Adaptive Filtering (EAF), is actually approximated by WDAF [SBR06]. In EAF, an N^2-channel acoustic MIMO system with N_L = N_M = N would correspond to exactly N paths after transformation of the signals into the system's eigenspace. [HB13] describes an iterative approach for estimating the involved eigenspaces of the LEMS. None of these approaches employs side information from an object-based rendering system. Even WDAF only exploits prior knowledge about a transform-domain LEMS, while assuming special transducer placements (uniform circular concentric loudspeaker and microphone arrays).
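The reduction from N squared paths to N paths stated above for EAF can be made concrete with a short derivation. This is a sketch only, assuming a narrowband model for a single frequency bin and a singular value decomposition of the system matrix. The symbols H, U, Σ, V, x and d and the tilde notation are introduced here for illustration and are not taken from the cited works.

```latex
\begin{align}
  % Assumed model for one frequency bin with N loudspeakers and N microphones:
  \mathbf{d} &= \mathbf{H}\,\mathbf{x},
    \qquad \mathbf{H} \in \mathbb{C}^{N \times N}
    \quad (N^2 \text{ coupled paths}) \\
  % Singular value decomposition of the system matrix:
  \mathbf{H} &= \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{H},
    \qquad \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1, \dots, \sigma_N) \\
  % Transform loudspeaker and microphone signals into the eigenspace:
  \tilde{\mathbf{x}} &= \mathbf{V}^{H}\mathbf{x},
    \qquad \tilde{\mathbf{d}} = \mathbf{U}^{H}\mathbf{d} \\
  % Result: N decoupled scalar paths
  \tilde{d}_n &= \sigma_n\,\tilde{x}_n, \qquad n = 1, \dots, N
\end{align}
```

Because U and V depend on the unknown system itself, they are not available in advance. They therefore have to be estimated, as in the iterative approach of [HB13], or approximated by a fixed transform, as in WDAF.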