An audio reproduction system is affected by imperfect loudspeaker dynamics and room acoustics. The audio system may furthermore have loudspeakers placed in inappropriate positions. For example, sound material intended for a 5.1 surround system is to be reproduced by loudspeakers in standardized positions, but the number and positioning of loudspeakers in a home or in a car may differ from the specified setting. All of these problems are frequently encountered in home-cinema and audio systems and they are particularly hard to solve for car audio systems with their often awkward loudspeaker positions and difficult acoustic environments.
For example, consider the tuning process of car audio systems, which today proceeds in several steps. First, crossover filters are set and each loudspeaker is equalized on a per-channel basis; then the delay and level for each channel are set to achieve a desired sound stage (spatial sound perception); additional adjustments to filter responses are made with respect to the combined acoustic loudspeaker responses; finally, parameters for up-mixing are adjusted. Up-mixing here refers to the process of distributing stereo or discrete 5.1 material to the N loudspeakers in the car.
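As an illustration of the per-channel delay and level step only, the following minimal sketch aligns the direct sound of each loudspeaker at a single listening position. It is not taken from any particular tuning procedure: the function name `align_channels`, the speed-of-sound constant, the default sample rate and the free-field 1/r level model are all assumptions made for this example.

```python
# Illustrative sketch only: time and level alignment of loudspeaker channels
# at ONE listening position, under an assumed free-field (1/r) model.

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at roughly 20 degrees C


def align_channels(distances_m, sample_rate_hz=48000):
    """Return a (delay_samples, gain) pair for each loudspeaker.

    distances_m: distance from each loudspeaker to the listening position.
    Nearer loudspeakers are delayed and attenuated so that all direct
    sounds arrive at the same time and level as the farthest loudspeaker.
    """
    farthest = max(distances_m)
    settings = []
    for distance in distances_m:
        # Extra travel time of the farthest loudspeaker relative to this one.
        delay_s = (farthest - distance) / SPEED_OF_SOUND_M_PER_S
        delay_samples = round(delay_s * sample_rate_hz)
        # Assumed 1/r attenuation: scale nearer loudspeakers down.
        gain = distance / farthest
        settings.append((delay_samples, gain))
    return settings


if __name__ == "__main__":
    for delay, gain in align_channels([1.0, 2.0]):
        print(delay, gain)
```

Such a calibration is valid for one point in space only, which is one reason why a step-by-step tuning of this kind does not by itself address several listening regions.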
The end goal of the tuning process for cars or home hifi/cinema systems can be described in terms of a target sound field in the listening environment. The target sound field is in general continuous in space.
In this context it is generally an objective to design a set of pre-compensating filters for a multichannel audio system with N loudspeaker inputs. It is desirable to jointly optimize the filters to provide a unified solution to all of the above design steps: equalizer design, crossover design, delay and level calibration, sum-response optimization and up-mixing. As a result, listeners positioned in any of P>1 listening regions should ideally be given the illusion of being in another acoustic environment that has L sound sources (virtual loudspeakers) located at prescribed positions in prescribed room acoustics. To make the solution practical, the volume of each listening region should allow for some head movement of the listener. The best possible approximation of this goal should be attained for a given sound reproduction system, with given loudspeaker numbers, positions and properties. In particular, the solution should not require the loudspeakers to be located in particular positions with respect to the listeners, nor require them to consist of arrays with prescribed spatial properties.
In the literature, there are essentially three different theoretical approaches to the problem of reconstructing sound fields, none of which solves the above-described problem in an adequate way.
1. Wave Field Synthesis (WFS), which is based on Huygens' principle, or the Kirchhoff-Helmholtz integral representation of sound fields [1]. This method can re-create the complete sound field in one single continuous region in space. However, it is based on ideal assumptions regarding the transducers and the acoustic environment where the reproduction takes place, assuming a large number of ideal transducers and ideal room acoustics. In practical systems, these assumptions are never fulfilled.
2. High Order Ambisonics (HOA), based on a Fourier-Bessel series expansion of the original and desired sound fields in spherical coordinates [2]. It aims at sound field reconstruction within one single spherical region and is thus not suitable for reproduction over arbitrary spatial regions. The filter design has to be performed for each frequency separately [3]. For multiple frequencies, this would result in filters for which there is no control of the time-domain properties. The paper [4] presents a design that uses a circular array of loudspeakers to produce a target sound field in one sub-region inside the circle, while silence is produced in three other regions, for one single frequency. This solution, and HOA techniques in general, are unsuited for our purposes because of their lack of control of time-domain signal properties.
3. Multipoint Mean Square Error (MSE) based methods, in which the error between the desired and the reconstructed sound field is minimized on a discrete grid of measurement points [5].
Such methods have been proposed for reproducing sources at virtual positions, as perceived at the ear positions of a listener [6],[7],[9]. Typically, two measurement positions are used per listening position, located at the ear positions, and the required number of loudspeakers is twice as large as the number of listening positions. Such solutions are essentially based on so-called cross-talk cancellation or inversion of the acoustic channel matrices. They are known to be extremely sensitive to the position of the listener, and this non-robustness makes them unsuitable for practical applications. Another special application is that of making specialized recordings with microphones placed in particular positions, and then re-creating those sound signals in other positions [8],[10]. That objective differs from ours, where the recordings are arbitrary but should be perceived as being played over a new set of loudspeakers, in a different room. MSE optimization is in general implemented by frequency-domain methods [11], which provide little control of the time-domain properties of the resulting filters, in particular the “pre-response” or “pre-ringing” part of compensated systems. This lack of control of time-domain aspects reduces control of the spatial aspects, such as wave front angles of arrival at different positions.
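To make the frequency-domain multipoint MSE idea concrete, the following is a minimal sketch assuming NumPy is available. It is a generic regularized least-squares formulation and does not reproduce the method of any of the cited references. The function name `multipoint_mse_filters`, the array layout and the regularization term `reg` are assumptions introduced for this example.

```python
# Illustrative sketch only: per-frequency multipoint least squares.
import numpy as np


def multipoint_mse_filters(H, d, reg=1e-3):
    """Solve one least-squares problem per frequency bin.

    H:   complex array, shape (F, M, N); transfer functions from N
         loudspeakers to M measurement points at F frequencies.
    d:   complex array, shape (F, M); desired sound field at the M points.
    reg: assumed Tikhonov regularization weight (not from the source text).

    Returns w with shape (F, N), where w[k] minimizes
    ||H[k] @ w[k] - d[k]||^2 + reg * ||w[k]||^2.
    """
    num_freqs, _, num_speakers = H.shape
    w = np.empty((num_freqs, num_speakers), dtype=complex)
    for k in range(num_freqs):
        Hk = H[k]
        # Regularized normal equations for this single frequency bin.
        A = Hk.conj().T @ Hk + reg * np.eye(num_speakers)
        w[k] = np.linalg.solve(A, Hk.conj().T @ d[k])
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((4, 6, 3)) + 1j * rng.standard_normal((4, 6, 3))
    d = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
    print(multipoint_mse_filters(H, d).shape)
```

Each frequency bin is solved independently of the others, so nothing in the criterion constrains the shape of the corresponding impulse responses. This is consistent with the limited control of time-domain properties, such as pre-ringing, noted above.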
The Linear Quadratic Control method for audio precompensation controller design presented in [12] provides a means for attaining precise control of the time-domain properties as well as the frequency-domain properties of the compensated system. However, the particular solution presented in [12] is based on a filter structure with a nonzero and fixed parallel path between the inputs and the outputs of the precompensator. This would be an inappropriate structural constraint on a solution to the above-stated multichannel design problem: there is no reason here for one virtual source to be assigned to one particular subset of loudspeakers via a fixed part of a precompensation controller.
The design schemes available in prior art are thus not adequate for the stated design goal.