Dynamics compression (also known as dynamic range compression) may be used during audio mastering, broadcast or playback to adjust the gain as a function of time, in order to achieve a desired distribution of output levels. This type of compression (not to be confused with data compression algorithms such as mp3) helps keep the volume in the “sweet spot” between too soft and too loud, reducing the user's need to manually adjust (or “ride”) the volume. Dynamics compression allows the quiet portions of the program to remain audible, even in a relatively noisy environment, while keeping the louder parts from becoming disturbingly loud.
Especially in the case of real-time broadcast audio, dynamics compression is the art of compromise. The system must cope with input levels that may vary from one source to another, from one program to the next, or even from one moment to the next within the same program. The goal is for the quiet parts to be fully audible during low-level, late-night playback, without the loud parts becoming loud enough to wake sleeping housemates. Furthermore, it is desirable to be able to listen in a noisy environment, such as a car or airplane, without having to choose between cranking up the volume and missing the quiet parts. It is also desirable to be able to watch TV without misunderstanding the dialogue, being blasted off the sofa, or constantly adjusting the volume control. Finally, there is a desire to tame hyper-compressed commercials that not only sound louder than uncompressed signals with the same peak level, but actually sound louder than uncompressed signals normalized to the same RMS value.
FIG. 1 is a block diagram of a typical feed-forward single-band dynamics compressor (C). Single-band (also known as wideband) dynamics compressors apply the same processing to the entire frequency range. The RMS level of an input signal 10 is extracted 12 and converted 14 to a logarithmic expression. Next, a transfer function (or characteristic) 16 maps the input signal level to a desired output level. FIG. 2A shows an exemplary transfer function 16. FIG. 2B illustrates the gain curve of the transfer function of FIG. 2A, which results from the mapping of the input signal level to a desired output level. When the input level is less than the compression threshold 18 (−30 dB in this example), ignoring the noise gate region, each additional dB of input level produces one additional dB of output level. However, when the input level exceeds the compression threshold 18, each additional dB of input level produces only 1/r dB of additional output level, where r is the compression ratio (in FIG. 2A, r=3).
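The level detection and static transfer function described above can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 1: it assumes a hard knee (the slope changes abruptly at the threshold), ignores the noise-gate region, and uses the example values given for FIG. 2A (threshold −30 dB, r = 3). The function names `rms_db` and `compressor_gain_db` are hypothetical.

```python
import math

def rms_db(samples, eps=1e-12):
    """RMS level of a block of samples, expressed in dB (0 dB = full-scale RMS of 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, eps))  # eps avoids log10(0) on silence

def compressor_gain_db(level_db, threshold_db=-30.0, ratio=3.0):
    """Gain (dB) of a hard-knee downward compressor, ignoring any noise gate.

    Below the threshold the output follows the input 1:1, so the gain is 0 dB.
    Above it, each extra dB of input adds only 1/ratio dB of output.
    """
    if level_db <= threshold_db:
        return 0.0
    output_db = threshold_db + (level_db - threshold_db) / ratio
    return output_db - level_db  # gain = desired output level minus input level
```

For example, an input at −15 dB lies 15 dB above the threshold, so the output is −30 + 15/3 = −25 dB and the gain is −10 dB; this gain-versus-level relationship is what a curve such as FIG. 2B plots.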
In FIG. 1, the gain is then converted 20 back from the logarithmic to a linear representation, smoothed 22 and applied 24 to a copy of the input signal 10 which has been delayed (z^−n) 26 to compensate for the delay of the gain computation path 28. Finally, some post-gain 30 may be applied 32 to help compensate for some of the gain loss due to the compression. More information on single-band dynamics compressors may be obtained from the following references (the disclosures of which are hereby incorporated by reference): G. W. McNally, “Dynamic Range Control of Digital Audio Signals,” J. Audio Engineering Society, Vol. 32, No. 5, 1984 May; Udo Zölzer, Digital Audio Signal Processing, John Wiley & Sons Ltd., 1997, pp. 207-219; and Earl Vickers, “Automatic Long-term Loudness and Dynamics Matching,” presented at the AES 111th Convention, New York, 2001.
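The complete signal path of FIG. 1 (level detection, log conversion, transfer function, conversion back to linear, gain smoothing, input delay, gain application and post-gain) can be sketched per sample as below. This is a hedged illustration under several assumptions not stated in the text: the RMS detector is an exponential (one-pole) mean-square estimator, smoothing uses separate attack and release time constants, and every parameter value (`rms_ms`, `attack_ms`, `release_ms`, `delay`, `post_gain_db`) is hypothetical.

```python
import math
from collections import deque

def compress(x, fs, threshold_db=-30.0, ratio=3.0, rms_ms=10.0,
             attack_ms=5.0, release_ms=100.0, delay=32, post_gain_db=0.0):
    """Feed-forward single-band compressor sketch (hypothetical parameters)."""
    def coeff(ms):
        # One-pole smoothing coefficient for a time constant given in milliseconds.
        return math.exp(-1.0 / (fs * ms / 1000.0))

    a_rms, a_att, a_rel = coeff(rms_ms), coeff(attack_ms), coeff(release_ms)
    post = 10.0 ** (post_gain_db / 20.0)   # post-gain as a linear factor
    ms_est = 0.0                           # running mean-square estimate
    g_smooth = 1.0                         # smoothed linear gain
    line = deque([0.0] * delay, maxlen=delay) if delay > 0 else None  # z^-n delay line
    y = []
    for sample in x:
        # 1. Level detection: exponential mean square, expressed in dB (= RMS in dB).
        ms_est = a_rms * ms_est + (1.0 - a_rms) * sample * sample
        level_db = 10.0 * math.log10(max(ms_est, 1e-12))
        # 2. Static transfer function: gain in dB (hard knee assumed).
        if level_db > threshold_db:
            gain_db = threshold_db + (level_db - threshold_db) / ratio - level_db
        else:
            gain_db = 0.0
        # 3. Convert the gain back to a linear factor.
        g = 10.0 ** (gain_db / 20.0)
        # 4. Smooth: attack coefficient while the gain falls, release while it recovers.
        a = a_att if g < g_smooth else a_rel
        g_smooth = a * g_smooth + (1.0 - a) * g
        # 5. Delay the input so it lines up with the gain computation path.
        if line is not None:
            delayed = line[0]     # oldest sample, i.e. x[k - delay]
            line.append(sample)   # maxlen drops that oldest sample
        else:
            delayed = sample
        # 6. Apply the smoothed gain, then the post-gain.
        y.append(delayed * g_smooth * post)
    return y
```

A signal that stays below the threshold passes through unchanged apart from the delay, while a steady tone at about −6 dB RMS settles to roughly 16 dB of attenuation with these settings. The delay line lets the gain react to a transient before that transient reaches the output.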
A multiband dynamics compressor divides the frequency range into a plurality of frequency bands and then processes the input signal through each frequency band independently. In many implementations, a different set of processing parameters (such as compression ratios) is applied to each frequency band. FIG. 3 is a block diagram of a typical feed-forward multiband dynamics compressor. The input signal 10 is applied to a frequency band splitting block 30 which divides the signal 10 into a plurality of signals 32 each having a range of frequencies and being limited to a certain frequency band, there being some degree of frequency overlap between adjacent ones of the frequency bands. In a frequency-domain implementation, frequency band splitting block 30 may consist of a fast Fourier transform (FFT) or short-time Fourier transform (STFT), with an optional analysis window. In a time-domain implementation, frequency band splitting block 30 may consist of one or more crossover filters. Each signal 32 is then applied to one of a plurality of included feed-forward single-band compressors (C1-C5) 34. Each compressor 34 may, for example, have a configuration like that of the compressor (C) shown in FIG. 1. The output signals 36 from the plurality of compressors 34 are then combined 38 together to generate a compressed signal output 40. In a frequency-domain implementation, combining block 38 may consist of an inverse fast Fourier transform (IFFT) or inverse short-time Fourier transform (ISTFT), which may include an overlap-add implementation with optional synthesis window. In a time-domain implementation, combining block 38 may consist of a summer.
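A frequency-domain version of the FIG. 3 structure can be sketched on a single frame with NumPy, as below. This is a simplified illustration under stated assumptions: the bands are non-overlapping "brick-wall" groups of FFT bins (the text describes some overlap between adjacent bands), there is no analysis/synthesis window or overlap-add, and the band edges, threshold and ratio are hypothetical, with the same threshold and ratio used in every band for brevity. The helper names are also hypothetical.

```python
import numpy as np

def compress_gain_db(level_db, threshold_db, ratio):
    # Hard-knee static curve: 0 dB below threshold, 1/ratio output slope above it.
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

def multiband_compress_frame(frame, fs,
                             edges_hz=(0.0, 120.0, 500.0, 2000.0, 6000.0),
                             threshold_db=-20.0, ratio=3.0):
    """Split one frame into bands via rfft, compress each band independently, recombine."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.fft.rfft(frame)                  # band splitting (block 30)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    upper = list(edges_hz[1:]) + [np.inf]          # last band runs up to Nyquist
    out_spectrum = np.zeros_like(spectrum)
    gains_db = []
    for lo, hi in zip(edges_hz, upper):
        mask = (freqs >= lo) & (freqs < hi)
        band_spec = np.where(mask, spectrum, 0.0)  # this band's bins only
        band_sig = np.fft.irfft(band_spec, n=n)    # band signal, to measure its RMS
        rms = np.sqrt(np.mean(band_sig ** 2))
        level_db = 20.0 * np.log10(max(rms, 1e-12))
        g_db = compress_gain_db(level_db, threshold_db, ratio)  # per-band compressor
        gains_db.append(g_db)
        out_spectrum += band_spec * 10.0 ** (g_db / 20.0)
    return np.fft.irfft(out_spectrum, n=n), gains_db  # combining (block 38)
```

With a loud 60 Hz tone and a quiet 3 kHz tone, only the lowest band is attenuated (by roughly 10.7 dB here), while the band containing the 3 kHz tone keeps 0 dB of gain. Because the bands partition the bins exactly, setting every gain to 0 dB would reconstruct the input frame.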
Multiband dynamics compressors are powerful, versatile tools for audio mastering, broadcast and playback. When used properly, multiband dynamics compressors have a number of advantages over single-band dynamics compressors, foremost being the fact that loud sounds in one band will not trigger artifacts such as “pumping” or “breathing” in the other bands. However, multiband dynamics compressors have a known problem relating to frequency response.
It has been noted that multiband compressors have a continually changing frequency response. This is because the included single-band compressors (C1-C5) attenuate each band independently based on that band's current input energy. This compression operation may result in unwanted changes to the long-term average spectral balance. For example, the bass may be attenuated in relation to other frequencies. See, U.S. Pat. No. 4,249,042, the disclosure of which is hereby incorporated by reference.
Unlike single-band compressors, multiband compressors have the advantage that loud sounds in one frequency region will not cause attenuation (“pumping” or “breathing”) at other frequencies. However, the frequency response changes over time; in fact, this is how multiband compressors do their job. The continually changing spectral response is not necessarily a problem, though it can be if the changes are too extreme. Even if the short-term frequency response changes are not objectionable in themselves, they can still result in undesirable changes to the long-term spectral balance.
For example, if the same compression threshold (for example, −20 dBFS) is used for each band, the energy in the low-frequency band(s) may consistently exceed this threshold whenever loud bass notes are played, while the threshold may rarely be exceeded in the high-frequency bands. As a result, the long-term spectral balance will be changed, because the bass will be attenuated (compressed) much more than the mid-range and treble.
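The effect described in this example can be made concrete with a few lines of arithmetic. The sketch below assumes a hard-knee curve with a ratio of 3 (the text gives only the −20 dBFS threshold), and the short-term band levels are hypothetical values chosen for illustration, not measurements.

```python
def gain_db(level_db, threshold_db=-20.0, ratio=3.0):
    # Hard-knee downward compression; ratio=3 is an assumed value.
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

def mean_gain_db(levels_db):
    # Average gain a band receives over a run of short-term level measurements.
    return sum(gain_db(level) for level in levels_db) / len(levels_db)

# Hypothetical short-term levels (dBFS), e.g. one value per analysis block.
bass_levels = [-8.0, -10.0, -12.0, -8.0]      # loud bass notes exceed the threshold
treble_levels = [-30.0, -28.0, -25.0, -32.0]  # treble never reaches it

print(mean_gain_db(bass_levels))    # about -7.0 dB of average attenuation
print(mean_gain_db(treble_levels))  # 0.0 dB: the treble is left untouched
```

Under these assumptions the bass band ends up about 7 dB lower relative to the treble than in the original, which is exactly the shift in long-term spectral balance described above.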
The spectral centroid, or center of gravity of the spectrum, is closely correlated with a sound's perceptual brightness and can be used as a simple measure of the long-term spectral balance. The spectral centroid (sc) can be defined as:
$$sc = \frac{\sum_f f\,\lvert X(f)\rvert}{\sum_f \lvert X(f)\rvert}, \qquad (1)$$
where f is the fast Fourier transform (FFT) frequency bin number and X is the complex frequency response. See, Andrew Horner, James Beauchamp, and Richard So, “A Search for Best Error Metrics to Predict Discrimination of Original and Spectrally Altered Musical Instrument Sounds,” J. Audio Engineering Society, Vol. 54, Issue 3, March 2006, the disclosure of which is hereby incorporated by reference.
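Equation (1) can be computed directly from an FFT, as in the sketch below. It assumes a real-valued input and sums only over the non-negative-frequency bins returned by `numpy.fft.rfft`; the optional conversion from bin number to hertz (multiplying by fs/N) is an addition for convenience, and the function name is hypothetical.

```python
import numpy as np

def spectral_centroid(x, fs=None):
    """Spectral centroid per Eq. (1): magnitude-weighted mean FFT bin number.

    If fs is given, the result is converted from bins to hertz (bin * fs / N).
    """
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x))          # |X(f)| for the non-negative-frequency bins
    bins = np.arange(len(mag))            # f, the bin number
    total = mag.sum()
    if total == 0.0:
        return 0.0                        # silence: centroid undefined, return 0
    sc = float((bins * mag).sum() / total)
    return sc * fs / len(x) if fs is not None else sc
```

Two equal-amplitude sinusoids at bins 5 and 15 give a centroid of bin 10; halving the bin-5 component (as bass compression would) moves the centroid up to about bin 11.67, i.e. the sound becomes "brighter". This is the kind of long-term shift the centroid is meant to expose.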
U.S. Pat. No. 4,249,042 teaches: “In multiband systems, since the bands operate independently, the instantaneous frequency response is seldom flat and, moreover, continually changing. Sometimes this results in pleasing sounds, but generally only in small, poor quality radios. In better audio equipment, the results of this varying frequency response produce unnatural sounds. Thus, to some extent the undesirable quality in the wideband compressor of audible modulation is traded for the problem of a shifting frequency response in the multiband systems.” See, col. 1, lines 38-47.
The solution presented by U.S. Pat. No. 4,249,042 involved a three-band analog compressor in which the frequency band with the highest predictable energy (typically the mid-range band) was used as a master band to control the gain in the other (slave) bands. The compressor operated as a wideband system unless the energy in the low or high frequency bands exceeded a threshold, in which case additional attenuation was applied to the appropriate frequency band.
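The master/slave control logic summarized above can be sketched as follows. This is one hypothetical reading of the summary, not the analog circuit of U.S. Pat. No. 4,249,042: it assumes a hard-knee curve, a threshold of −20 dBFS and a ratio of 3, and it models the "additional attenuation" of a slave band as that band's own compressor gain added on top of the master gain. The function names and band labels are hypothetical.

```python
def static_gain_db(level_db, threshold_db=-20.0, ratio=3.0):
    # Hard-knee downward compression: 0 dB below threshold, 1/ratio slope above it.
    if level_db <= threshold_db:
        return 0.0
    return threshold_db + (level_db - threshold_db) / ratio - level_db

def master_slave_gains_db(levels_db, master="mid", threshold_db=-20.0, ratio=3.0):
    """Per-band gains (dB) for a hypothetical master/slave three-band scheme.

    levels_db maps band name -> current band level (dBFS). The master band's
    gain is applied to every band (wideband behaviour); a slave band receives
    additional attenuation only when its own level exceeds the threshold.
    """
    master_gain = static_gain_db(levels_db[master], threshold_db, ratio)
    gains = {}
    for band, level in levels_db.items():
        if band == master:
            gains[band] = master_gain
        else:
            gains[band] = master_gain + static_gain_db(level, threshold_db, ratio)
    return gains

# Loud bass only: just the low band is attenuated.
print(master_slave_gains_db({"low": -8.0, "mid": -30.0, "high": -35.0}))
# Loud mid-range (e.g. a saxophone): the quiet high band is pulled down with it.
print(master_slave_gains_db({"low": -25.0, "mid": -11.0, "high": -40.0}))
```

In the second call the high band receives 6 dB of attenuation even though its own level is far below the threshold. This illustrates the saxophone and hi-hat interaction discussed next.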
While this technique was intended to preserve the advantages of wideband and multiband compressors without their disadvantages, some of the problems may persist. For example, a loud mid-frequency instrument, such as a saxophone, might trigger wideband compression, causing noticeable attenuation of high-frequency instruments such as hi-hats when the saxophone begins to play. In addition, the low-frequency band might be attenuated separately when loud bass notes are played; this may modify the long-term average spectral balance, causing the bass to seem proportionally weaker compared to the original sound. The solution presented by U.S. Pat. No. 4,249,042 is thus not entirely satisfactory.
A need accordingly exists in the art for a spectral balance compensation technique that may be implemented in either the time domain or the frequency domain. Such a technique would preferably support both real-time and non-real-time processing.