The present invention relates to audio signal processing and, in particular, to an apparatus and method for listening room equalization.
Audio signal processing is becoming increasingly important. Several audio reproduction techniques, e.g. wave field synthesis (WFS) or Ambisonics, make use of a loudspeaker array equipped with a plurality of loudspeakers to provide a highly detailed spatial reproduction of an acoustic scene. In particular, wave field synthesis overcomes the limitations of a sweet spot by using an array of, e.g., several tens to hundreds of loudspeakers. More details on wave field synthesis can, for example, be found in:    [1] A. J. Berkhout, D. De Vries, and P. Vogel, “Acoustic control by wave field synthesis”, J. Acoust. Soc. Am., vol. 93, pp. 2764-2778, May 1993.
For audio reproduction techniques, such as wave field synthesis (WFS) or Ambisonics, the loudspeaker signals are typically determined according to an underlying theory, so that the superposition of sound fields emitted by the loudspeakers at their known positions describes a certain desired sound field. Typically, the loudspeaker signals are determined assuming free-field conditions. Therefore, the listening room should not exhibit significant wall reflections, because the reflected portions of the wave field would distort the reproduced wave field. In many scenarios, the acoustic treatment required to achieve such room properties may be too expensive or impractical.
An alternative to acoustical countermeasures is to compensate for the wall reflections by means of listening room equalization (LRE), often termed listening room compensation. Listening room equalization is particularly suitable to be employed with massive multichannel reproduction systems. To this end, the reproduction signals are filtered to pre-equalize the Multiple-Input-Multiple-Output (MIMO) room system response from the loudspeakers to the positions of multiple microphones, ideally achieving an equalization at any point in the listening area. However, the typically large number of reproduction channels of a WFS system makes the task of listening room equalization challenging for both computational and algorithmic reasons.
Given a loudspeaker configuration which provides enough control over the wave field, as e.g. used for WFS, it is possible to prefilter the loudspeaker signals so that the desired wave field is reproduced even in the presence of wall reflections. To this end, a microphone array is placed in the listening room and the equalizers are determined so that the resulting overall MIMO system response is equal to the desired (free-field) impulse response (see [3], [10], [11]). As the room properties may change, e.g. due to changes in room temperature, opened doors or large moving objects in the room, adaptively determined equalizers are needed, see, for example:    [12] Omura, M.; Yada, M.; Saruwatari, H.; Kajita, S.; Takeda, K.; Itakura, F.: Compensating of room acoustic transfer functions affected by change of room temperature. In: Acoustics, Speech, and Signal Processing, 1999. ICASSP '99. Proceedings., 1999 IEEE International Conference on, vol. 2, IEEE, 1999, pp. 941-944.
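The equalization condition described above can be illustrated with a minimal sketch. It assumes a hypothetical system with two loudspeakers and two microphones at a single frequency bin, and it uses a direct matrix inversion in place of the adaptive, multi-tap equalizer design that a practical system would need. All numerical values and all function and variable names are made up for illustration and are not taken from the cited references.

```python
# Sketch: choose prefilters G so that H_room * G equals the desired free-field
# response H_free at one frequency bin. All values are hypothetical.

def matmul2(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(a):
    """Closed-form inverse of a 2x2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("room response is (nearly) singular at this bin")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Rows index microphones, columns index loudspeakers.
h_free = [[1.0 + 0.0j, 0.5 - 0.2j],
          [0.5 - 0.2j, 1.0 + 0.0j]]   # assumed desired free-field response
h_refl = [[0.0 + 0.3j, 0.1 + 0.0j],
          [-0.1 + 0.0j, 0.2 - 0.1j]]  # assumed contribution of wall reflections
h_room = [[h_free[i][j] + h_refl[i][j] for j in range(2)] for i in range(2)]

# Loudspeaker signals pass through G first and then through the room.
g_eq = matmul2(inv2(h_room), h_free)
h_overall = matmul2(h_room, g_eq)

max_error = max(abs(h_overall[i][j] - h_free[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

The residual `max_error` is at the level of floating-point rounding, which only shows that the condition is satisfied at the microphone positions for this one bin. It says nothing about positions between the microphones.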
A corresponding LRE system comprises a building block for identifying the loudspeaker-enclosure-microphone system (LEMS) based on observations of loudspeaker signals and microphone signals, and another part for determining the equalizer coefficients, see, e.g. [8]. In the single-channel case, it is possible to formulate a direct solution for both identification and equalizer determination. The task of LRE for multichannel systems poses several challenges: listening room equalization should be achieved in a spatial continuum and not only at the microphone positions to achieve spatial robustness, see [11]. The problem is often underdetermined or ill-conditioned, and the computational effort for adaptive filtering may be tremendous, see, for example:    [16] Spors, S.; Buchner, H.; Rabenstein, R.; Herbordt, W.: Active Listening Room Compensation for Massive Multichannel Sound Reproduction Systems Using Wave-Domain Adaptive Filtering. In: J. Acoust. Soc. Am. 122 (2007), July, no. 1, pp. 354-369.
Although a loudspeaker array as typically used for WFS provides sufficient control over the wave field to potentially solve the first problem mentioned, the large number of reproduction channels aggravates the other two problems, making a system for WFS as presented by [8] unrealistic for typical real-world scenarios.
Although the precise spatial control over the synthesized wave field makes a WFS system particularly suitable for LRE, its many reproduction channels constitute a major challenge for the development of such a system. As the MIMO loudspeaker-enclosure-microphone system (LEMS) may be expected to change over time, it has to be continuously identified by adaptive filtering. As known from acoustic echo cancellation (AEC), this problem may be underdetermined or at least ill-conditioned when using multiple reproduction channels, see, for example,    [2] J. Benesty, D. R. Morgan, and M. M. Sondhi, “A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation”, IEEE Trans. Speech Audio Process, vol. 6, no. 2, pp. 156-165, March 1998.
Additionally, the inverse filtering problem underlying LRE may be expected to be ill-conditioned as well. Besides these algorithmic problems, the large number of reproduction channels also leads to a large computational effort for both the system identification and the determination of the equalizing prefilters. As the MIMO system response of the LEMS can only be measured for the microphone positions, and as equalization should be achieved in the entire listening area, the spatial robustness of the solution for the equalizers has to be additionally ensured.
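The underdetermined identification problem mentioned above can be made concrete with a small sketch. It assumes, purely for illustration, that two loudspeaker signals are scaled copies of one source signal and that each loudspeaker-to-microphone path is a single gain. Under these assumptions the loudspeaker correlation matrix is singular, and two different path estimates explain the microphone signal equally well. The signal values, gains and names are hypothetical.

```python
# Sketch: non-unique system identification with fully correlated
# loudspeaker signals. All values are hypothetical.

source = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1]
g1, g2 = 1.0, 0.5                       # assumed panning gains
x1 = [g1 * s for s in source]           # loudspeaker signal 1
x2 = [g2 * s for s in source]           # loudspeaker signal 2

def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))

# 2x2 correlation matrix of the loudspeaker signals and its determinant.
r11, r12, r22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
det_r = r11 * r22 - r12 * r12

def microphone(h, sig1, sig2):
    """Microphone signal for single-gain paths h = (h1, h2)."""
    return [h[0] * a + h[1] * b for a, b in zip(sig1, sig2)]

h_true = (0.8, -0.3)                    # assumed true paths
h_alt = (0.65, 0.0)                     # different paths, same combined response
mic_true = microphone(h_true, x1, x2)
mic_alt = microphone(h_alt, x1, x2)
max_diff = max(abs(a - b) for a, b in zip(mic_true, mic_alt))

print(det_r, max_diff)
```

Because `det_r` vanishes, the normal equations of a least-squares identification have no unique solution, and `h_alt` cannot be distinguished from `h_true` from these observations alone. With many reproduction channels driven by few sources, the same effect appears on a larger scale.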
LRE according to the state of the art aims for an equalization at multiple points in the listening room, see, for example,    [11] P. A. Nelson, F. Orduna-Bustamante, and H. Hamada, “Inverse filter design and equalization zones in multichannel sound reproduction”, IEEE Trans. Speech Audio Process, vol. 3, no. 3, pp. 185-192, May 1995.
However, this approach disregards wave propagation, so the results obtained suffer from low spatial robustness.
Wave-domain adaptive filtering (WDAF) (see [7], [15]) was proposed for various adaptive filtering tasks in audio signal processing to overcome the mentioned problems for LRE. This approach uses fundamental solutions of the wave equation as basis functions for the signal representation for adaptive filtering. As a result, the considered MIMO system may be approximated by multiple decoupled single-input single-output (SISO) systems (e.g. single channels). This reduces the computational demands for adaptive filtering considerably and additionally improves the conditioning of the underlying problem. At the same time, this approach implicitly considers wave propagation, so solutions are obtained which achieve an LRE within a spatial continuum. See the corresponding patent application:    [6] Buchner, H.; Herbordt, W.; Spors, S.; Kellermann, W.: US-Patent Application: Apparatus and Method for Signal Processing. Pub. No.: US 2006 0262939 A1, November 2006.
However, it can be shown that this simplified model of multiple decoupled SISO systems is not able to sufficiently model the LEMS behaviour when a more complex acoustic scene is reproduced, see, for example:    [14] Schneider, M.; Kellermann, W.: A Wave-Domain Model for Acoustic MIMO Systems with Reduced Complexity. In: Proc. Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA). Edinburgh, UK, May 2011.
In    [15] S. Spors, H. Buchner, and R. Rabenstein, “A novel approach to active listening room compensation for wave field synthesis using wave-domain adaptive filtering” in Proc. Int. Conf. Acoust. Speech, Signal Process (ICASSP), May 2004, vol. 4, pp. IV-29-IV-32, it is explained that, according to the state of the art, to realize listening room equalization, a number of M loudspeaker input signals are filtered, such that M filtered loudspeaker signals are obtained. It is furthermore described in [15] that, according to the state of the art, all of the M loudspeaker input signals are taken into account for generating each of the M filtered loudspeaker signals.
Furthermore, in [15] it is proposed, as an alternative to such state-of-the-art concepts, that each one of a number of N filtered loudspeaker signals should be generated based on only a single one of the N loudspeaker input signals in the wave domain. By this, a simplified filter structure is achieved. To this end, [15] proposes that the LEMS may be approximated so that a very simple equalizer structure results. According to the concept proposed in [15], system identification is never an underdetermined problem. However, the model of [15] produces a residual error due to model limitations.
The concept proposed in [15] provides a simplified model that is, due to its simplified structure, realizable in real-world scenarios. However, the simplified structure of this concept also has the disadvantage that the listening room equalization provided is not sufficient in many practically relevant reproduction scenarios.
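The structural difference between the two equalizer concepts can be sketched as follows. The sketch assumes that each filter path is represented by a single gain, as it would be at one frequency bin, and the function names and values are hypothetical. A fully coupled structure needs one path for every pair of input and output channels, whereas the decoupled structure needs one path per channel and corresponds to a fully coupled structure whose off-diagonal paths are zero.

```python
# Sketch: fully coupled versus decoupled equalizer structure at one
# frequency bin. Names and values are hypothetical.

def apply_full(g, x):
    """Each output channel is formed from all input channels."""
    return [sum(g[i][j] * x[j] for j in range(len(x))) for i in range(len(g))]

def apply_decoupled(g_diag, x):
    """Each output channel is formed from exactly one input channel."""
    return [gi * xi for gi, xi in zip(g_diag, x)]

def count_paths(num_channels):
    """Number of filter paths needed by each structure."""
    return {"full": num_channels ** 2, "decoupled": num_channels}

x = [1.0, -2.0, 0.5]                    # assumed input signals at one bin
g_diag = [0.5, 2.0, -1.0]               # assumed decoupled equalizer gains
g_full = [[g_diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

y_decoupled = apply_decoupled(g_diag, x)
y_full = apply_full(g_full, x)
paths_48 = count_paths(48)

print(y_decoupled, y_full, paths_48)
```

For a hypothetical array with 48 channels, `count_paths` gives 2304 paths for the fully coupled structure against 48 for the decoupled one, which illustrates the complexity saving. The zero off-diagonal paths are the model limitation that leads to the residual error.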