A system for generating or reproducing sound, including amplifiers, cables and loudspeakers, will always affect the spectral properties of the sound, often in unwanted ways. The reverberation of the room where the equipment is placed adds further modifications. Sound reproduction of very high quality can be attained by using matched sets of cables, amplifiers and loudspeakers of the highest quality, but this is cumbersome and very expensive. The increasing computational power of PCs and digital signal processors has introduced new possibilities for modifying the characteristics of a sound generating or sound reproducing system. As is well known from the literature, the dynamic properties of the sound generating system may be measured and modeled by recording its response to known test signals. A precompensation filter, R in FIG. 1, is then placed between the original sound source and the audio equipment. The filter is calculated and implemented to compensate for the measured properties of the sound generating system, symbolized by H in FIG. 1. In particular, it is desirable that the phase and amplitude response of the compensated system be close to a prespecified ideal response, symbolized by D in FIG. 1. In other words, the compensated sound reproduction y(t) is required to match the ideal yref(t) to some given degree of accuracy. The pre-distortion generated by the precompensator R cancels the distortion due to the system H, such that the resulting sound reproduction has the sound characteristic of D. Up to the physical limits of the system, it is thus, at least in theory, possible to attain superior sound quality without the high cost of extremely high-end audio equipment. The aim of the design could, for example, be to cancel acoustic resonances caused by imperfectly built loudspeaker cabinets. Another application could be to minimize low-frequency resonances due to the room acoustics at different places in the listening room. Yet another aim could be to obtain tonal balance and good staging.
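The signal chain of FIG. 1 can be illustrated with a toy numerical example. This is a minimal sketch, not the design method of this document: it assumes a hypothetical two-tap system H with impulse response [1, 0.5], an ideal response D equal to a unit impulse (no delay), and a precompensator R obtained simply by truncating the series expansion of 1/H(z), which works here only because this H is minimum phase. All array names are illustrative.

```python
import numpy as np

# Hypothetical measured system H: h(t) = delta(t) + 0.5 delta(t - 1).
# Its only zero is at z = -0.5, inside the unit circle, so 1/H(z) is causal and stable.
h = np.array([1.0, 0.5])

# Precompensator R: first N terms of 1/H(z) = sum_k (-0.5)^k z^-k.
N = 20
r = (-0.5) ** np.arange(N)

# Ideal response D: a unit impulse (no delay in this toy case).
d_ideal = np.array([1.0])

# Test input w(t): random samples with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
w = rng.standard_normal(200)

# Compensated output y = H(R w) and reference y_ref = D w.
y = np.convolve(h, np.convolve(r, w))
y_ref = np.convolve(d_ideal, w)

# Compare over the length of y_ref; truncating R leaves only a tiny residual.
err = y[: len(y_ref)] - y_ref
print(f"max |y - y_ref| = {np.max(np.abs(err)):.2e}")
```

The residual comes only from truncating R to N taps. Real loudspeaker and room responses are far longer and generally not minimum phase, which is why more elaborate designs are needed.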
The problem of removing undesired distortions introduced by the electro-acoustical signal path of a sound generating system is commonly called equalization, and sometimes also dereverberation. An aim could be that the reproduced sound y(t) at a particular listening position should exactly equal the original sound w(t), but we allow it to be delayed by d samples to improve the attainable result. It is then desired that y(t)=w(t−d). Equalization by means of digital filters has been studied extensively for about two decades, with increasing concern in recent years for the problem of spatial robustness: a behavior close to the desired one should be attained not only at a single measuring point in space, but within an extended spatial volume. Of particular importance for the present work are the time-domain properties of the impulse response of the compensated system. Because the dynamic responses differ between listening positions, a compensator may give an adequate result at some listening positions while the response deviates at others. In particular, significant sound energy may arrive before the intended delay d. Such “pre-ringings” or “pre-echoes” are considered very undesirable if their amplitudes are too large. Parts of the impulse response that arrive later than the target delay d may also be affected differently at different listening positions. Such “post-ringings” may significantly color the perceived spectrum and tonal balance of the sound.
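The distinction between pre-ringing and post-ringing can be made concrete by dividing the energy of a compensated impulse response around the target delay d. This is a sketch under the assumption that the ideal compensated response is a pure delay, i.e. a unit impulse at t = d; the function `ringing_energies` and the example response are hypothetical.

```python
import numpy as np

def ringing_energies(c, d):
    """Split a compensated impulse response c(t) around the target delay d.

    Returns (pre, main, post): the energy before t = d (pre-ringing), the squared
    deviation of the tap at t = d from its ideal value 1, and the energy after
    t = d (post-ringing).
    """
    c = np.asarray(c, dtype=float)
    pre = float(np.sum(c[:d] ** 2))
    main = float((c[d] - 1.0) ** 2)
    post = float(np.sum(c[d + 1:] ** 2))
    return pre, main, post

# Hypothetical compensated response at one listening position, target delay d = 3.
c = [0.0, 0.1, 0.0, 1.0, 0.2, 0.05]
pre, main, post = ringing_energies(c, d=3)
print(pre, main, post)
```

Evaluating such a split at several listening positions shows how differently the same compensator behaves before and after the intended delay.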
In the literature, the work on robustness of equalization essentially falls into three categories.
In the first category, the goal of the filter design is a complete signal dereverberation at a single position in a room. A subsequent robustness analysis then investigates the equalizer performance at other spatial positions, or under slightly modified acoustic circumstances. It is well known that this kind of filter design is highly non-robust and causes severe signal degradation when the receiver position changes [1], and even for a fixed receiver position, due to the “weak nonstationarity” of the acoustical paths in the room [2].
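A toy illustration of why single-point designs lack robustness, using hypothetical values rather than measured room responses: H_a has a deep notch at the Nyquist frequency, the truncated inverse designed for position A boosts that region about twentyfold, and the response H_b at a second position lacks the notch. The helper names are illustrative.

```python
import numpy as np

def unit_impulse(n):
    """Length-n unit impulse at t = 0."""
    e = np.zeros(n)
    e[0] = 1.0
    return e

def deviation(c):
    """Euclidean distance between impulse response c and a unit impulse."""
    return np.linalg.norm(c - unit_impulse(len(c)))

# Hypothetical responses at two positions (toy values, not measured data).
h_a = np.array([1.0, 0.95])        # position A: deep notch at the Nyquist frequency
h_b = np.array([1.0, 0.0, 0.95])   # position B: no notch at Nyquist

# Precompensator designed for A only: truncated series of 1/H_a(z), whose
# gain at Nyquist is 1 / (1 - 0.95) = 20.
N = 400
r = (-0.95) ** np.arange(N)

err_a = deviation(np.convolve(h_a, r))
err_b = deviation(np.convolve(h_b, r))
err_b_uncompensated = deviation(h_b)
print(f"A: {err_a:.1e}, B compensated: {err_b:.2f}, B uncompensated: {err_b_uncompensated:.2f}")
```

At A the compensated response is essentially a unit impulse, while at B the deviation from a unit impulse becomes several times larger than without any compensation.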
In the second category, the design objective is not a complete dereverberation, but rather a reduction of linear distortion under the constraint that audio performance should not be degraded by changes of listening position. The standard approach in this category is to design a filter based on averaging and/or smoothing of one or several transfer functions and then to perform a robustness analysis of the filter [3]. Such methods, and in particular the complex smoothing operation proposed in [3], provide no means to predict and explicitly control the amount of pre-ringing in the compensated system.
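A rough sketch of an averaging-and-smoothing design step, assuming a root-mean-square average of magnitude responses over positions followed by a plain moving-average smoothing across frequency bins. The moving average is a simple stand-in chosen for illustration, not the complex smoothing of [3], and the responses, FFT length, smoothing width and gain floor are all hypothetical.

```python
import numpy as np

def averaged_magnitude(responses, n_fft=512):
    """Root-mean-square average of the magnitude responses over all positions."""
    H = np.array([np.fft.rfft(h, n_fft) for h in responses])
    return np.sqrt(np.mean(np.abs(H) ** 2, axis=0))

def smooth(mag, width=9):
    """Moving-average smoothing across frequency bins (width must be odd)."""
    kernel = np.ones(width) / width
    padded = np.pad(mag, width // 2, mode="edge")  # repeat edge bins to keep the length
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical impulse responses at three listening positions.
responses = [np.array([1.0, 0.5]), np.array([1.0, 0.0, 0.4]), np.array([1.0, -0.3])]
avg = averaged_magnitude(responses)

# Magnitude-equalizer gain; the floor avoids division by near-zero magnitudes.
target_gain = 1.0 / np.maximum(smooth(avg), 1e-3)
print(avg.shape, target_gain[:3])
```

The result is only a magnitude target; nothing in this procedure predicts or bounds the pre-ringing that a filter realizing it would produce at the individual positions.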
The third category imposes robustness directly on the design by employing a multi-point error criterion to optimize the sound reproduction at a number of spatial positions, either by using measured room transfer functions (RTFs) [4] or by direct adaptation of the inverse [5]. The optimization is in general based on minimum mean square error (MSE) criteria, or on the sum of the power spectral densities of the compensation errors at different listening positions. Unfortunately, MSE and power spectral density criteria do not adequately take the time-domain properties of the compensated system into account. Errors due to pre-ringings and post-ringings may result in the same MSE, although their perceptual effects can be very different. There also exists a fundamentally different multi-point scenario, in which signals are filtered on the receiver side by its own equalizer at each receiver point. Spatial robustness in this setting has been studied in [6] and [7]. This approach is, however, not applicable in the present pre-compensation setting, where a single filter operating on the input of a sound generating system is designed to equalize the audio response in an extended volume in space.
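The multi-point criterion and its blind spot can be shown in a few lines. The sketch assumes a pure delay of d samples as the target and averages the squared error over positions; `multipoint_mse` and `echo_response` are hypothetical helpers. The two example responses carry the same echo amplitude, once 2 samples before d and once 2 samples after.

```python
import numpy as np

def multipoint_mse(responses, r, d):
    """Mean over positions of the squared error between h_i * r and a delay of d samples."""
    total = 0.0
    for h in responses:
        c = np.convolve(h, r)
        target = np.zeros(len(c))
        target[d] = 1.0
        total += np.sum((c - target) ** 2)
    return total / len(responses)

def echo_response(n, d, echo_lag, amplitude=0.3):
    """Unit impulse at t = d plus one echo at t = d + echo_lag."""
    c = np.zeros(n)
    c[d] = 1.0
    c[d + echo_lag] = amplitude
    return c

d = 4
pre_echo = echo_response(9, d, -2)   # energy arrives before the target delay
post_echo = echo_response(9, d, +2)  # energy arrives after the target delay
identity = np.array([1.0])           # r = 1: evaluate the responses as they are

mse_pre = multipoint_mse([pre_echo], identity, d)
mse_post = multipoint_mse([post_echo], identity, d)
print(mse_pre, mse_post)  # equal, although the pre-echo is perceptually worse
```

Any criterion that only sums squared errors is invariant to where in time the error energy sits, which is exactly the property criticized above.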
Equalizers can be designed to compensate for distortions of the received energy at different frequencies. Below, this type of filter will be called a minimum phase inverse, and the resulting equalizer a minimum phase equalizer or magnitude equalizer. A minimum phase inverse compensates for magnitude distortions of the received signal, but does not take the phase properties (the delays of individual frequencies) of the signal into account. In the time domain, a minimum phase inverse will never create pre-ringings at any listening position, but it may create severe post-ringings. It may even make phase and delay distortions more severe than in the uncompensated system.
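A minimal sketch of a minimum phase inverse for a short FIR model, assuming the zeros of H can be computed with `np.roots` (practical only for short responses). Zeros outside the unit circle are reflected to their conjugate reciprocals, which preserves the magnitude response, and the resulting minimum phase model is inverted by a causal recursion. The example system [1, -2], with a zero at z = 2, and the helper names are hypothetical.

```python
import numpy as np

def minimum_phase_version(h):
    """FIR with the same magnitude response as h but all zeros inside the unit circle.

    Each zero z0 outside the unit circle is replaced by 1/conj(z0), and the gain is
    multiplied by |z0|, which leaves |H(e^{jw})| unchanged.
    """
    h = np.asarray(h, dtype=float)
    zeros = np.roots(h)  # zeros of H(z) = h[0] + h[1] z^-1 + ...
    outside = np.abs(zeros) > 1.0
    gain = h[0] * np.prod(np.abs(zeros[outside]))
    zeros[outside] = 1.0 / np.conj(zeros[outside])
    return np.real(gain * np.poly(zeros))

def truncated_inverse(h_min, n):
    """First n taps of 1/H_min(z): h0 r[k] = delta[k] - sum_{j>=1} h[j] r[k-j]."""
    r = np.zeros(n)
    for k in range(n):
        acc = 1.0 if k == 0 else 0.0
        for j in range(1, min(k, len(h_min) - 1) + 1):
            acc -= h_min[j] * r[k - j]
        r[k] = acc / h_min[0]
    return r

h = np.array([1.0, -2.0])        # hypothetical system, zero at z = 2 (non-minimum phase)
h_min = minimum_phase_version(h)  # magnitude-equivalent model with its zero at z = 0.5
r = truncated_inverse(h_min, 60)  # minimum phase inverse (magnitude equalizer)
c = np.convolve(h, r)             # compensated impulse response
print(h_min, np.sum(c ** 2), np.sum(c[1:] ** 2))
```

The compensated response has a flat magnitude and no energy before t = 0, but only a quarter of its energy sits in the first tap; the remaining three quarters form post-ringing, in line with the behavior described above.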
Both phase and magnitude distortions can be taken into account by using linear-quadratic Gaussian feedforward filter design or Wiener design, as outlined in, e.g., [8]. This method has been used in [9] for designing a general class of audio precompensators. See [10]-[11] for some other FIR-filter-based methods. Design methods and resulting filters that are intended to compensate for both magnitude and phase distortions of the sound generating system will be called mixed phase methods, resulting in mixed phase equalizers. When a mixed phase equalizer is designed to compensate for non-minimum phase zeros of a transfer function, and these zeros differ between the design model and the true system, pre-ringings will unfortunately occur. The currently known mixed phase designs provide inadequate tools for limiting the resulting pre-ringing effects.
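To illustrate how mismatched non-minimum phase zeros cause pre-ringing, the sketch below uses a least-squares FIR inverse with a modeling delay d. This is a simpler relative of the Wiener and LQG designs cited above, not those designs themselves; the design model, the hypothetical "true" system with its zero moved from z = 2 to z = 1.8, and the filter length are assumptions for illustration.

```python
import numpy as np

def ls_inverse(h, n_taps, d):
    """Least-squares FIR r minimizing ||h * r - delta(t - d)||^2."""
    h = np.asarray(h, dtype=float)
    n_out = len(h) + n_taps - 1
    # Convolution matrix: column k holds h shifted down by k samples.
    H = np.zeros((n_out, n_taps))
    for k in range(n_taps):
        H[k:k + len(h), k] = h
    target = np.zeros(n_out)
    target[d] = 1.0
    r, *_ = np.linalg.lstsq(H, target, rcond=None)
    return r

h_model = np.array([1.0, -2.0])  # design model: zero at z = 2 (non-minimum phase)
h_true = np.array([1.0, -1.8])   # hypothetical true system: zero moved to z = 1.8
n_taps, d = 64, 32

r = ls_inverse(h_model, n_taps, d)
c_model = np.convolve(h_model, r)
c_true = np.convolve(h_true, r)

# Energy arriving before the target delay d, i.e. pre-ringing.
pre_model = np.sum(c_model[:d] ** 2)
pre_true = np.sum(c_true[:d] ** 2)
print(f"pre-ringing energy: model {pre_model:.1e}, true system {pre_true:.1e}")
```

For the design model the energy before t = d is negligible, whereas the mismatched system shows clearly nonzero pre-ringing, because the inverse of a zero outside the unit circle is realized by taps that precede the main tap.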
Many researchers have concluded that mixed phase equalizers seem less robust than minimum phase equalizers from a perceptual standpoint: their inevitable side effects in the form of pre-ringing are perceived as more objectionable than post-ringing. Since minimum phase equalizers create no pre-ringings, a common current strategy for robust and perceptually acceptable equalization is therefore to use minimum phase filters only. This solution is, however, unsatisfying, as it is known to generate large phase distortions in the form of post-ringings, and it cannot handle the non-minimum phase part of the audio response at all. Reference [12] proposes limiting the delay d sufficiently to make the pre-ringing inaudible. This is ineffective, since a small delay limit will, for many audio systems, severely restrict the ability to perform useful phase correction, in particular at low frequencies, where this is most perceptually important.
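The delay trade-off discussed for [12] can be illustrated by computing the best achievable least-squares residual as a function of the allowed delay d. This is a sketch for a hypothetical system with one zero at z = 2, again using a least-squares FIR inverse rather than any particular published method; `ls_inverse_error` and the filter length of 64 taps are assumptions.

```python
import numpy as np

def ls_inverse_error(h, n_taps, d):
    """Residual norm ||h * r - delta(t - d)|| of the least-squares FIR inverse r."""
    h = np.asarray(h, dtype=float)
    n_out = len(h) + n_taps - 1
    # Convolution matrix: column k holds h shifted down by k samples.
    H = np.zeros((n_out, n_taps))
    for k in range(n_taps):
        H[k:k + len(h), k] = h
    target = np.zeros(n_out)
    target[d] = 1.0
    r, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.linalg.norm(H @ r - target)

h = np.array([1.0, -2.0])  # hypothetical system with a zero at z = 2 (non-minimum phase)
delays = [0, 1, 2, 4, 8, 16]
errors = [ls_inverse_error(h, 64, d) for d in delays]
for d, e in zip(delays, errors):
    print(f"d = {d:2d}: residual {e:.1e}")
```

For this zero the residual roughly halves per sample of additional delay. Zeros outside but close to the unit circle make it decay more slowly, so a tight delay limit leaves more of the non-minimum phase part uncorrected.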