1. Field of Technology and Background
Time-varying modification of loudness is commonly referred to as dynamic range control (DRC), and it is typically used to amplify quiet audio signals so that they become clearly audible. Static modification of spectral content is commonly referred to as equalization (EQ), and it is typically used to amplify some parts of the spectrum according to user preferences, or to compensate for the non-ideal response of a transducer such as a loudspeaker. Consequently, the DRC can be used to maximize the loudness of a music track or a ringing-tone, whereas the EQ can be used to implement a ‘bass-boost’.
2. Problem Formulation
When a digital audio source is played back to a listener, the result is often unsatisfactory because the audio source was intended for playback under different conditions. For example, a user will find it difficult to hear a quiet part of a song on a portable music player while walking down a busy city street. Similarly, the user will find it difficult to hear the bass in a music track when using a set of poor headphones.
The invention addresses the problem that, in practice, the original audio source is often not appropriate for the user's acoustic environment or for the hardware used for playback. Better results can be achieved if the audio source is processed according to the requirements of the user. In particular, it is advantageous to be able to produce a consistent loudness for the user of a portable device, and to ensure that the acoustic output is never too quiet and can always be heard clearly by the user.
3. Prior Art
Equalization (EQ)
The purpose of the EQ is to modify a signal's magnitude spectrum. The phase response of the EQ is important only in the sense that it must not vary too quickly as a function of frequency. As a rule of thumb the difference between the maximum and minimum of the group delay function should not be greater than 3 ms. As long as this constraint is satisfied the phase response is not important.
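The rule of thumb above can be checked numerically for a candidate filter. The following is a minimal sketch, assuming the EQ is described by transfer-function coefficients b (numerator) and a (denominator). The function names, the frequency grid, and the example first-order all-pass section are illustrative assumptions and are not taken from the cited art.

```python
import cmath
import math

def group_delay_samples(b, a, omega):
    """Group delay, in samples, of H(z) = B(z)/A(z) at radian frequency omega.

    Uses tau = Re{sum(k*b_k*e^{-j*omega*k}) / B} - Re{sum(k*a_k*e^{-j*omega*k}) / A}.
    Not valid exactly at a zero or pole on the unit circle (division by zero).
    """
    def poly_delay(coeffs):
        value = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))
        ramp = sum(k * c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))
        return (ramp / value).real
    return poly_delay(b) - poly_delay(a)

def group_delay_spread_ms(b, a, sample_rate_hz, n_points=512):
    """Difference between maximum and minimum group delay, in milliseconds."""
    # The grid deliberately excludes omega = 0 and omega = pi.
    omegas = [math.pi * (i + 0.5) / n_points for i in range(n_points)]
    delays = [group_delay_samples(b, a, w) for w in omegas]
    return 1000.0 * (max(delays) - min(delays)) / sample_rate_hz

# Hypothetical example: first-order all-pass (z^-1 - lam) / (1 - lam * z^-1).
lam = 0.5
b = [-lam, 1.0]
a = [1.0, -lam]
spread_ms = group_delay_spread_ms(b, a, sample_rate_hz=48000.0)
meets_rule_of_thumb = spread_ms <= 3.0  # the 3 ms limit stated above
```

For a cascade of sections, the group delays of the individual sections add, so the same check can be applied to the sum.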
The desired magnitude response of the EQ is usually defined by a set of gains, for example five, where each gain specifies a target magnitude response within a certain frequency band. The frequency bands are usually unevenly spaced so that they are relatively narrow at low frequencies and relatively wide at high frequencies. The output from the EQ can then be calculated either by cascading a set of peak and shelving filters or by adding the outputs from a set of low-pass, high-pass, and band-pass filters. Cascading is the most natural choice since the resulting magnitude response is easy to predict (it is the product of the individual magnitude responses), whereas adding can cause unpredictable interference unless the phase responses of the individual filters are the same. Linear-phase FIR (finite impulse response) filters are very expensive to run in the lower frequency bands, so IIR (infinite impulse response) filters, second-order IIR filters in particular, are most commonly used in practice. There are different ways to implement a cascade of peak and shelving filters. One simple method mixes the output from an all-pass filter with the direct signal, as described by P. A. Regalia and S. K. Mitra, “Tunable Digital Frequency Response Equalization Filters”, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-35, pp. 118-120, January 1987. A more sophisticated method based on the same technique can be used to design a so-called “multi-level filter” described by R. Ansari, “Multi-level IIR Digital Filters”, IEEE Trans. Circuits and Systems, Vol. CAS-33, pp. 337-341, March 1986.
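By way of illustration, a cascade of peak filters of the all-pass mixing type can be sketched as follows. This is a minimal sketch only: the coefficient formulas are the commonly quoted second-order form, the bandwidth parameterisation is an assumption, and the five band definitions are hypothetical values chosen for the example. The code evaluates frequency responses rather than filtering a signal.

```python
import cmath
import math

def allpass_coeffs(center_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (a, b) of the second-order all-pass section (assumed form)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    bw = 2.0 * math.pi * bandwidth_hz / sample_rate_hz
    t = math.tan(bw / 2.0)
    return (1.0 - t) / (1.0 + t), -math.cos(w0)

def allpass_response(a, b, omega):
    """A(z) = (a + b(1+a)z^-1 + z^-2) / (1 + b(1+a)z^-1 + a z^-2) on the unit circle."""
    z1 = cmath.exp(-1j * omega)
    z2 = z1 * z1
    return (a + b * (1.0 + a) * z1 + z2) / (1.0 + b * (1.0 + a) * z1 + a * z2)

def peak_response(gain, a, b, omega):
    """Mix of the direct signal and the all-pass output: 0.5*[(1 + A) + gain*(1 - A)]."""
    ap = allpass_response(a, b, omega)
    return 0.5 * ((1.0 + ap) + gain * (1.0 - ap))

def cascade_response(bands, sample_rate_hz, freq_hz):
    """Complex response of the cascade: the product of the section responses."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    h = 1.0 + 0.0j
    for center_hz, bandwidth_hz, gain_db in bands:
        a, b = allpass_coeffs(center_hz, bandwidth_hz, sample_rate_hz)
        h *= peak_response(10.0 ** (gain_db / 20.0), a, b, omega)
    return h

# Hypothetical five-band setting: (center Hz, bandwidth Hz, gain dB).
# Bands are narrow at low frequencies and wide at high frequencies.
BANDS = [(60.0, 40.0, 6.0), (250.0, 150.0, 3.0), (1000.0, 600.0, 0.0),
         (4000.0, 2500.0, -2.0), (12000.0, 6000.0, 4.0)]
FS = 48000.0
magnitude_db_at_60 = 20.0 * math.log10(abs(cascade_response(BANDS, FS, 60.0)))
```

Because each section has unity gain away from its own band and the specified gain at its center frequency, the overall magnitude response of the cascade is straightforward to predict from the individual gains.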
When a number of signals are added together, their sum depends on the phases of the individual signals. It is disclosed in U.S. Pat. No. 5,892,833 “Gain and Equalization System and Method”, by C. Maag, L. Parker and Q. Jensen, that it is possible to achieve a low group delay as well as a good approximation to the target magnitude response by adding together the outputs from a number of IIR filters. It is described in “Multirate Systems and Filter Banks”, Section 4.6.5, by P. P. Vaidyanathan, Prentice Hall, 1993, how to use a polyphase implementation to make an adjustable multi-level filter. The output from the filter is the sum of the outputs from a filterbank, and when the elements of the filterbank are the polyphase components of Mth-band filters (also called Nyquist filters), the overall frequency response is guaranteed to be smooth everywhere (no unpredictable phasing artifacts occur in the transition regions).
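The dependence of a sum on the phases of its components can be illustrated with a simple phasor calculation. The following is a minimal sketch in which each filter output at a given frequency is represented by a hypothetical gain and phase; the function name is an assumption made for the example.

```python
import cmath
import math

def summed_magnitude(gains, phases_rad):
    """Magnitude of the sum of sinusoids of equal frequency, given as gain/phase pairs."""
    return abs(sum(g * cmath.exp(1j * p) for g, p in zip(gains, phases_rad)))

# Two adjacent band outputs of equal level in a transition region (hypothetical values).
in_phase = summed_magnitude([1.0, 1.0], [0.0, 0.0])             # components reinforce
quadrature = summed_magnitude([1.0, 1.0], [0.0, math.pi / 2.0])  # partial reinforcement
opposed = summed_magnitude([1.0, 1.0], [0.0, math.pi])          # components cancel
```

The same two band gains therefore give anything between full reinforcement and complete cancellation in the transition region, depending only on the relative phase of the filters.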
The main problem with the methods mentioned above is that they are not very suitable when the bandwidths of the individual filters are very different. Since the perception of pitch by the human ear is roughly logarithmic, it is desirable to let the EQ modify the spectral content on a logarithmic frequency scale rather than a linear frequency scale. A technique exists, referred to as frequency warping, which allows the characteristics of an FIR filter to be mapped onto an approximately logarithmic frequency scale (frequency warping can also be applied to IIR filters, but they then become extremely sensitive to noise and round-off errors, and they are rarely used in practice).
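Frequency warping is commonly realised by replacing each unit delay of the FIR filter with a first-order all-pass section, so that the frequency axis is remapped by the phase of that section. The following is a minimal sketch of the resulting frequency mapping under that assumption; the warping parameter lam, the sample rate, and the number of bands are hypothetical values, and the code computes the mapping only, not the warped filter itself.

```python
import cmath
import math

def allpass_response(omega, lam):
    """First-order all-pass D(z) = (z^-1 - lam) / (1 - lam * z^-1) on the unit circle."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - lam) / (1.0 - lam * z_inv)

def warped_frequency(omega, lam):
    """Warped radian frequency: the negated phase of D at omega.

    Calling the function with -lam gives the inverse mapping.
    """
    return omega + 2.0 * math.atan2(lam * math.sin(omega),
                                    1.0 - lam * math.cos(omega))

# Hypothetical parameters for illustration.
FS = 44100.0
LAM = 0.75
N_BANDS = 8

# Band edges that are uniformly spaced on the warped axis, mapped back to Hz.
band_edges_hz = [
    warped_frequency(math.pi * k / N_BANDS, -LAM) * FS / (2.0 * math.pi)
    for k in range(N_BANDS + 1)
]
band_widths_hz = [hi - lo for lo, hi in zip(band_edges_hz, band_edges_hz[1:])]
```

With a positive warping parameter the bands come out narrow at low frequencies and progressively wider towards half the sample rate, which approximates the uneven band spacing that is desirable for the EQ.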
C. Asavathiratham, P. E. Beckmann and A. V. Oppenheim, “Frequency Warping in the Design and Implementation of Fixed-Point Audio Equalizers”, pp. 55-58, Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, Oct. 17-20, 1999, describe how frequency warping, implemented with 16-bit fixed-point precision, can be used to equalize the response of a loudspeaker. A. Makur and S. K. Mitra, “Warped Discrete-Fourier Transform: Theory and Applications”, pp. 1086-1093, IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, Vol. 48, No. 9, September 2001, describe how to implement a warped Discrete Fourier Transform (WDFT). The WDFT is a block transform that can be used to implement the EQ if it is followed by an inverse WDFT.
Dynamic Range Control (DRC)
There are two types of DRC: full-band and multi-band. The full-band DRC applies a single time-varying gain to the input signal, whereas the multi-band DRC uses a set of time-varying gains to adjust the signal level within a number of frequency bands. The multi-band DRC essentially runs a time-varying EQ whose gains are calculated from the input signal, which means that the multi-band DRC contains an EQ as one of its components. The performance of the EQ inside the DRC is even more important than when the EQ is used as a stand-alone application, since artifacts such as phasing are more clearly audible when the gains are time-varying than when they are constant.
The gain applied by the DRC is calculated from the level of the input signal. The full-band DRC estimates the total input level, whereas the multi-band DRC estimates the level in each frequency band. The level estimate is converted to a gain by means of a so-called compression curve, which specifies the output level, in dB, as a function of the input level, in dB. The gain is not applied instantly; rather, it converges exponentially to its target value with a time constant that depends on whether the current gain is to be increased or decreased. If the gain is decreased, the time constant is referred to as the attack time. If the gain is increased, the time constant is referred to as the release time. The release time is typically at least an order of magnitude greater than the attack time, and both the attack time and the release time are typically shorter at high frequencies than at low frequencies. A look-ahead delay is inserted in order to compensate for the inherent delay in the processing necessary for the level estimation, and it also allows the DRC to anticipate sudden changes in the input signal level. For example, if a quiet section is followed by a loud transient, the DRC can turn down the gain a few milliseconds in advance of the transient so that the overall loudness remains roughly constant.
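The gain computation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the compression curve breakpoints, the attack and release times, and the choice to smooth the gain in the dB domain are hypothetical, and the level estimator and look-ahead delay are omitted.

```python
import math

# Hypothetical compression curve as (input level dB, output level dB) breakpoints.
# Quiet inputs are raised; a full-scale input (0 dB) is left unchanged.
CURVE = [(-90.0, -90.0), (-60.0, -30.0), (0.0, 0.0)]

def output_level_db(input_db, curve=CURVE):
    """Piecewise-linear compression curve; the input is clamped to the curve's range."""
    x = min(max(input_db, curve[0][0]), curve[-1][0])
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return curve[-1][1]

def target_gain_db(input_db, curve=CURVE):
    """Target gain in dB: desired output level minus estimated input level."""
    return output_level_db(input_db, curve) - input_db

def smoothing_coeff(time_constant_s, sample_rate_hz):
    """One-pole coefficient giving exponential convergence with the given time constant."""
    return math.exp(-1.0 / (time_constant_s * sample_rate_hz))

def smooth_gain(targets_db, sample_rate_hz, attack_s=0.005, release_s=0.1,
                initial_db=0.0):
    """Let the gain converge to each target, using attack when the gain must fall
    and release when it must rise."""
    a_att = smoothing_coeff(attack_s, sample_rate_hz)
    a_rel = smoothing_coeff(release_s, sample_rate_hz)
    gain = initial_db
    out = []
    for target in targets_db:
        coeff = a_att if target < gain else a_rel
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(gain)
    return out

# Usage: level estimates (dB) for a quiet passage followed by a loud one.
levels_db = [-60.0] * 200 + [-10.0] * 200
gains_db = smooth_gain([target_gain_db(l) for l in levels_db], sample_rate_hz=1000.0)
```

In a multi-band DRC the same computation would be run once per frequency band, each with its own level estimate and, typically, its own attack and release times.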
The above description of a DRC is readily available in textbooks (see, for example, Chapter 7 of “Digital Audio Signal Processing”, by U. Zölzer, John Wiley & Sons, 1997).