Dynamics compression may be used during audio mastering, broadcast or playback to adjust the gain as a function of time, in order to achieve a desired distribution of output levels. This type of compression (not to be confused with data compression algorithms such as mp3) helps keep the volume in the “sweet spot” between too soft and too loud, reducing the user's need to manually adjust (or “ride”) the volume. Dynamics compression allows the quiet portions of the program to remain audible, even in a relatively noisy environment, while keeping the louder parts from becoming disturbingly loud.
FIG. 1 is a block diagram of a typical feed-forward single-band dynamics compressor (C). Single-band (also known as wideband) dynamics compressors apply the same processing to the entire frequency range. The RMS level of an input signal 10 is extracted 12 and converted 14 to a logarithmic representation. Next, a transfer function (or characteristic) 16 maps the input signal level to a desired output level. FIG. 2A shows an exemplary transfer function 16. FIG. 2B illustrates the gain curve of the transfer function of FIG. 2A which results from the mapping of the input signal level to a desired output level. When the input level is less than the compression threshold 18 (−30 dB in this example), ignoring the noise gate region, each additional dB of input level produces one additional dB of output level. However, when the input level exceeds the compression threshold 18, each additional dB of input level only produces 1/r dB of additional output level, where r is the compression ratio (in FIG. 2A, r=3).
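The static characteristic described above can be sketched in a few lines. This is a minimal illustration, not the implementation of FIG. 1: the function names are hypothetical, the default threshold (−30 dB) and ratio (r=3) are taken from the FIG. 2A example, and the noise gate region is deliberately left out.

```python
def output_level_db(x_db, threshold_db=-30.0, ratio=3.0):
    """Map an input level (dB) to the desired output level (dB).

    Below the threshold the mapping is 1:1; above it, each extra dB of
    input yields only 1/ratio dB of extra output. The noise gate region
    mentioned in the text is not modeled here.
    """
    if x_db <= threshold_db:
        return x_db
    return threshold_db + (x_db - threshold_db) / ratio


def gain_db(x_db, threshold_db=-30.0, ratio=3.0):
    """Gain curve (as in FIG. 2B): desired output level minus input level."""
    return output_level_db(x_db, threshold_db, ratio) - x_db


if __name__ == "__main__":
    for x in (-60.0, -30.0, -15.0, 0.0):
        print(f"in {x:6.1f} dB -> out {output_level_db(x):6.1f} dB, "
              f"gain {gain_db(x):6.1f} dB")
```

With these example settings, an input 30 dB above the threshold comes out only 10 dB above it.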
The gain is then converted 20 back from the logarithmic to a linear representation, smoothed 22 and applied 24 to a copy of the input signal 10 which has been delayed (z−n) 26 to compensate for the delay of the gain computation path 28. Finally, some post-gain 30 may be applied 32 to help compensate for some of the gain loss due to the compression. More information on single-band dynamics compressors may be obtained from the following references (the disclosures of which are hereby incorporated by reference): G. W. McNally, “Dynamic Range Control of Digital Audio Signals,” J. Audio Engineering Society, Vol. 32, No. 5, 1984 May; Udo Zölzer, Digital Audio Signal Processing, John Wiley & Sons Ltd., 1997, pp. 207-219; and Earl Vickers, “Automatic Long-term Loudness and Dynamics Matching,” presented at the AES 111th Convention, New York, 2001.
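The signal path of FIG. 1 might be sketched as follows. This is a rough sample-by-sample illustration under stated assumptions, not the circuit of the figure: the function name `compress`, the one-pole averagers used for level extraction and gain smoothing, and all coefficient values are hypothetical choices, and the noise gate is omitted. The reference numerals in the comments indicate which block of FIG. 1 each step is meant to mirror.

```python
import math


def compress(samples, threshold_db=-30.0, ratio=3.0, rms_coeff=0.99,
             smooth_coeff=0.9, delay=0, post_gain_db=0.0):
    """Feed-forward single-band compressor sketch (illustrative only)."""
    mean_sq = 0.0                      # running mean-square estimate
    smoothed_gain = 1.0                # smoothed linear gain
    post_gain = 10.0 ** (post_gain_db / 20.0)
    delay_line = [0.0] * delay         # z^-n delay for the signal path
    out = []
    for x in samples:
        # Level extraction (12): one-pole average of the squared signal.
        mean_sq = rms_coeff * mean_sq + (1.0 - rms_coeff) * x * x
        # Conversion to a logarithmic representation (14).
        level_db = 10.0 * math.log10(max(mean_sq, 1e-12))
        # Transfer function (16), expressed directly as a gain in dB.
        g_db = max(level_db - threshold_db, 0.0) * (1.0 / ratio - 1.0)
        # Back to a linear representation (20).
        g = 10.0 ** (g_db / 20.0)
        # Gain smoothing (22).
        smoothed_gain = smooth_coeff * smoothed_gain + (1.0 - smooth_coeff) * g
        # Delayed copy of the input (26).
        delay_line.append(x)
        delayed = delay_line.pop(0)
        # Gain application (24) followed by post-gain (30, 32).
        out.append(delayed * smoothed_gain * post_gain)
    return out


if __name__ == "__main__":
    sr = 48000
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / sr) for n in range(5000)]
    y = compress(tone)
    print("steady-state peak:", max(abs(v) for v in y[-1000:]))
```

A practical design would typically use separate attack and release time constants for the smoothing stage; a single coefficient is used here only to keep the sketch short.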
A multiband dynamics compressor divides the frequency range into a plurality of frequency bands and then processes the input signal through each frequency band independently. In many implementations, a different set of processing parameters (such as compression ratios) is applied to each frequency band. FIG. 3 is a block diagram of a typical feed-forward multiband dynamics compressor. The input signal 10 is applied to a band splitting block 30 which divides the signal 10 in the frequency domain into a plurality of signals 32 each being limited to a certain frequency band, there being some degree of frequency overlap between adjacent ones of the frequency bands. In a frequency-domain implementation, frequency band splitting block 30 may consist of a fast Fourier transform (FFT) or short-time Fourier transform (STFT), with an optional analysis window; in a time-domain implementation, frequency band splitting block 30 may consist of one or more crossover filters. Each signal 32 is then applied to one of a plurality of included feed-forward single-band compressors (C1-C5) 34. Each compressor 34 may, for example, have a configuration like that of the compressor (C) shown in FIG. 1. The output signals 36 from the plurality of compressors 34 are then combined 38 together to generate a signal output 40. In a frequency-domain implementation, combining block 38 may consist of an inverse fast Fourier transform (IFFT) or inverse short-time Fourier transform (ISTFT), which may include an overlap-add implementation with optional synthesis window; in a time-domain implementation, combining block 38 may consist of a summer.
Multiband dynamics compressors are powerful, versatile tools for audio mastering, broadcast and playback. When used properly, multiband dynamics compressors have a number of advantages over single-band dynamics compressors, foremost being the fact that loud sounds in one band will not trigger artifacts such as “pumping” or “breathing” in the other bands. However, multiband dynamics compressors have a known problem relating to frequency response.
It has been noted that while multiband compressors may have a generally flat spectral response to broad-band inputs, the narrow-band response exhibits peaks near the edges of the frequency bands. In other words, as shown in FIG. 4, when excited by a (time-varying) narrow-band input, such as a swept sinusoid, the magnitude response of a multiband dynamics compressor like that of FIG. 3 displays unwanted peaks 42 at the boundaries 44 between frequency bands 46. Specifically, FIG. 4 illustrates the magnitude response 48 of a multiband dynamics compressor of the type shown in FIG. 3 to a swept sinusoid 50 with band boundaries 44 located at 500 Hz and 2000 Hz. The unwanted peak 42 phenomenon is caused by the fact that, when a sinusoid's frequency is near the boundary 44 between two compressor frequency bands 46, only a portion of the sinusoid's energy is allocated to each frequency band 46. Therefore, the energy seen in each of the adjacent frequency bands 46 at the boundary 44 is not fully representative of the energy of the sinusoid signal 50. This triggers less compression of the sinusoid 50 at and near the boundary 44 than would occur at other frequencies located in the middle of the frequency bands 46. In other words, less dynamics compression is triggered with respect to the narrowband component at the boundary 44 between frequency bands 46 than if the entire energy of the narrowband component were allocated to a single one of the adjacent frequency bands. As a result, the response is not shift-invariant with respect to frequency. Those skilled in the art refer to this as a “3 dB bump” in the response, but it is recognized that the size of the peak 42 is a function of the compression ratio and can be as much as 6 dB.
It will be noted that absent the application of a non-linear compression, the energy would simply be split into two bands and then recombined with perfect reconstruction. The problem arises from the combination of the splitting of the energy into multiple bands and the applied non-linearity of the compressor. This problem occurs in all conventional multiband compressors, whether analog or digital, time-domain or frequency-domain.
In time-domain compressors, the bands are separated by cross-over filters having gain curves with a finite slope. For ease of calculating the sum of the band outputs, the use of Linkwitz-Riley crossovers, which have a gain of about −6 dB at the crossover frequency and in-phase outputs at all frequencies, is assumed. Similar compressor artifacts occur with other filters, such as odd-order Butterworth crossovers.
For an input sinusoid whose frequency is in the middle of a compressor frequency band, with RMS level X (in dBFS), threshold L (in dBFS), compression ratio r, and slope s=1/r (the slope of the transfer function's compressor line segment), the output RMS level is:
$$Y_c = \begin{cases} L + (X - L)\,s, & X > L \\ X, & X \le L, \end{cases} \tag{1}$$
and the compressor gain in dB is:
$$G_c = Y_c - X = \max(X - L,\, 0)\,(s - 1), \tag{2}$$
where the subscript c signifies the case of a sinusoid near the center of the compressor band. For simplicity of notation, the time index has been omitted.
In the case of an input sinusoid midway between two compressor frequency bands, with the same input RMS level and a filter gain of F (in dB) at the crossover frequency, the output RMS level in each compressor band is given by:
$$Y_b = \begin{cases} L + (F + X - L)\,s, & F + X > L \\ F + X, & F + X \le L, \end{cases} \tag{3}$$
and the compressor gain in each of the bands b is:
$$G_b = Y_b - (F + X) = \max(F + X - L,\, 0)\,(s - 1). \tag{4}$$
Since the Linkwitz-Riley crossover filters add in-phase at all frequencies, the total output RMS level of the sum of the two bands (in dB) is:
$$Y_s = 20\log_{10}\!\left(2 \cdot 10^{Y_b/20}\right) = 20\log_{10} 2 + Y_b, \tag{5}$$
where the subscript s signifies the sum of the two bands.
For sinusoids that exceed the threshold (after cross-over filtering) and are thus compressed, the difference between the within-band and between-band cases is given by
$$d = Y_s - Y_c = 6 + sF, \tag{6}$$
assuming the input level X exceeds the threshold L by more than the filter attenuation |F|, that is, F+X>L.
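The step from equations (1), (3) and (5) to equation (6) can be written out explicitly. The following is a sketch of that algebra, assuming both F+X>L and X>L so that the compressed branch applies in each case, and rounding 20 log10 2 ≈ 6.02 to 6:

```latex
\begin{aligned}
d &= Y_s - Y_c \\
  &= \bigl[\,20\log_{10} 2 + L + (F + X - L)\,s\,\bigr] - \bigl[\,L + (X - L)\,s\,\bigr] \\
  &= 20\log_{10} 2 + sF \\
  &\approx 6 + sF .
\end{aligned}
```

The X and L terms cancel, so under these assumptions the size of the peak depends only on the slope s and the crossover filter gain F, not on the input level or the threshold.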
Reference is now made to FIGS. 5A and 5B. As an example, consider an X=0 dBFS sinusoid and a compressor with threshold L=−20 dB, compression ratio r=2 and slope s=0.5. If the input sinusoid is positioned in the middle of one of the compressor's frequency bands 46, as shown in FIG. 5A, the gain Gc would be 20*(−0.5)=−10 dB, for an output level of Yc=−10 dB.
On the other hand, if the sinusoid's frequency equals the crossover frequency (boundary 44) between two frequency bands 46, as shown in FIG. 5B, the crossover filter gain F will be −6 dB (with a Linkwitz-Riley crossover), so each adjacent frequency band 46 will see an RMS level of X+F=−6 dB. In this case, the gain Gb will equal (−6+20)*−0.5=−7 dB, and the output level Yb in each band will be −6−7=−13 dB.
Since the Linkwitz-Riley crossover filters add in-phase at all frequencies, the amplitude of the sum of the two bands would be:
$$Y_s = 20\log_{10}\!\left(2 \cdot 10^{-13/20}\right) \approx -7\ \text{dB}. \tag{7}$$
The result is that the frequency response at the crossover frequency is d=6+(0.5*−6)=3 dB higher than the response elsewhere. The worst case is with an infinite compression ratio (s=0), in which case the response would have a 6 dB peak at the crossover frequency. Thus, it has been shown how the “3 dB peak” is produced at or near the boundary 44 between two adjacent frequency bands 46.
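The arithmetic of this example can be checked numerically. The sketch below simply evaluates equations (1), (3), (5) and (6). The function names are hypothetical, and the crossover filter gain is set to the rounded value of −6 dB used in the text (an ideal Linkwitz-Riley crossover is closer to −6.02 dB).

```python
import math


def band_output_db(level_db, threshold_db, slope):
    """Equations (1)/(3): output level for a band whose detector sees level_db."""
    if level_db > threshold_db:
        return threshold_db + (level_db - threshold_db) * slope
    return level_db


def crossover_peak_db(x_db, threshold_db, ratio, filter_gain_db=-6.0):
    """Return (Yc, Yb, Ys, d) for a sinusoid of level x_db.

    Yc: output level when the sinusoid is in the middle of a band.
    Yb: output level of each band when the sinusoid is at the crossover.
    Ys: level of the in-phase sum of the two bands, equation (5).
    d:  Ys - Yc, the size of the crossover peak, equation (6).
    """
    slope = 1.0 / ratio                      # ratio=inf gives slope 0.0
    y_c = band_output_db(x_db, threshold_db, slope)
    y_b = band_output_db(x_db + filter_gain_db, threshold_db, slope)
    y_s = 20.0 * math.log10(2.0) + y_b
    return y_c, y_b, y_s, y_s - y_c


if __name__ == "__main__":
    for r in (2.0, 4.0, float("inf")):
        y_c, y_b, y_s, d = crossover_peak_db(0.0, -20.0, r)
        print(f"r={r}: Yc={y_c:.2f}  Yb={y_b:.2f}  Ys={y_s:.2f}  d={d:.2f} dB")
```

Running it for r=2 gives a peak of roughly 3 dB, and letting the ratio go to infinity gives roughly 6 dB, in line with the figures quoted above.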
For compressors implemented in the frequency domain, one might assume that the bands would have a perfect brick-wall response, thereby preventing this problem. However, this assumption does not take into account the effect of the short-time Fourier transform's analysis filter (i.e., input window, in the time domain), which splatters some of the sinusoid's energy into each of the adjacent bands. As a result, a similar crossover peak occurs with frequency-domain compressors (as seen previously in FIG. 4).
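The leakage caused by the analysis window can be illustrated with a direct DFT. This is only a sketch under stated assumptions: a periodic Hann window, a transform length of 256, a complex sinusoid, and a hypothetical band boundary lying between bins 10 and 11. None of these values come from the text, and the function name is likewise hypothetical.

```python
import cmath
import math


def windowed_bin_levels_db(freq_bins, n_fft=256, bins=(10, 11)):
    """Hann-windowed DFT magnitudes (dB) of a complex sinusoid at selected bins.

    freq_bins is the sinusoid frequency in units of bins (may be fractional).
    Levels are normalized so that an on-bin sinusoid reads 0 dB in its own bin.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft)
              for n in range(n_fft)]
    norm = sum(window)
    levels = []
    for k in bins:
        acc = sum(w * cmath.exp(2j * math.pi * (freq_bins - k) * n / n_fft)
                  for n, w in enumerate(window))
        levels.append(20.0 * math.log10(max(abs(acc) / norm, 1e-12)))
    return levels


if __name__ == "__main__":
    print("sinusoid on bin 10:     ", windowed_bin_levels_db(10.0))
    print("sinusoid midway (10.5): ", windowed_bin_levels_db(10.5))
```

When the sinusoid sits midway between the two bins, both bins report the same level, and that level is lower than what a single bin reports for an on-bin sinusoid. If the two bins belong to different compressor bands, each band's level detector therefore sees only part of the sinusoid's energy, as with the time-domain crossover.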
In summary, less compression is applied to a sinusoid at the crossover frequency (boundary 44) between two frequency bands 46 than would have been applied had the sinusoid been in the middle of one of the compressor frequency bands. As a result, the response is not shift-invariant with respect to frequency. Note: this issue is not readily apparent in response to typical wideband audio signals.
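The lack of shift-invariance can be made visible with a steady-state model of a two-band compressor. The sketch below is illustrative only. It assumes a single crossover at 500 Hz with 4th-order Linkwitz-Riley magnitude responses (the text does not specify the filter order), band outputs that sum in phase, and the threshold and ratio of the worked example. All function names are hypothetical.

```python
import math


def lr4_gains_db(freq_hz, crossover_hz):
    """Assumed 4th-order Linkwitz-Riley magnitudes (low band, high band) in dB."""
    q = (freq_hz / crossover_hz) ** 4
    low = 1.0 / (1.0 + q)
    high = q / (1.0 + q)
    return (20.0 * math.log10(max(low, 1e-12)),
            20.0 * math.log10(max(high, 1e-12)))


def two_band_gain_db(freq_hz, x_db=0.0, crossover_hz=500.0,
                     threshold_db=-20.0, ratio=2.0):
    """Steady-state overall gain (dB) for a sinusoid of level x_db at freq_hz."""
    slope = 1.0 / ratio
    amplitude = 0.0
    for f_db in lr4_gains_db(freq_hz, crossover_hz):
        level = x_db + f_db                              # level seen by the band
        out = level + max(level - threshold_db, 0.0) * (slope - 1.0)
        amplitude += 10.0 ** (out / 20.0)                # in-phase sum of bands
    return 20.0 * math.log10(amplitude) - x_db


if __name__ == "__main__":
    for i in range(0, 101, 10):
        f = 50.0 * 10.0 ** (i / 50.0)                    # 50 Hz to 5 kHz
        print(f"{f:8.1f} Hz  gain {two_band_gain_db(f):7.2f} dB")
```

Far from the crossover the gain settles near the mid-band value, while at the crossover frequency it is about 3 dB higher for these settings, which is the peak described above.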
The foregoing problem has been examined previously in terms of frequency-domain sampling of the power spectrum and it has been noted that a power-symmetric or perfect reconstruction filter bank does not present a satisfactory solution. Rather, a proposed solution (prior art) is to decrease the frequency-domain sampling interval (i.e., to increase the number of bands and the overlap between bands). This solution is analogized to what happens in the human ear. Each hair cell in the cochlea is tuned to a particular frequency by the cochlea's mechanical resonance, which functions as a sort of mechanical Fourier transform. While the hair cells are very closely spaced in frequency, each one is sensitive to a bandwidth of roughly ⅓ octave, so they are heavily overlapped. The proposed solution does not totally eliminate the problem, but with sufficiently overlapped bands the problem can be greatly reduced. Unfortunately, this approach requires a heavily oversampled filter bank, which can be relatively expensive to implement. There accordingly exists a need in the art for a more economical and successful solution.