The object of audio dynamics processing is to alter the relative level or dynamics of an audio signal so that it falls within some desired limits. This is generally achieved by creating some type of time-varying measure of an audio signal's level (RMS level or peak level, for example) and then computing and applying a signal modification (a gain change, for example) that is a function of the level measure. Dynamics processors sharing such a mode of operation are set forth in International Patent Application PCT/US 2005/038579 of Alan Jeffrey Seefeldt, published as WO 2006/047600 on May 4, 2006, and include automatic gain controls (AGCs), dynamic range controls (DRCs), expanders, limiters, noise gates, etc. The Seefeldt application designates the United States among other entities. The application is hereby incorporated by reference in its entirety.
FIG. 1 depicts a high-level block diagram of a generic audio dynamics processor that processes an audio signal (a single channel of a multichannel audio signal or an audio signal having only one channel). The processor may be considered to have two paths: an upper “signal” path 2 and a lower “control” path 4. On the lower control path, the level of the audio signal is measured by a measuring device or process (“Level Measure”) 6 and this measurement, a measure of the signal level, is then used by a dynamics control device or process (“Dynamics Control”) 8 to compute one or more signal modification parameters. Such parameters function as signal modification control signals and are used to modify the audio signal according to a dynamics processing function, which function may be a desired dynamics processing profile such as shown in FIG. 3b, described below. As shown, the modification parameters are derived from the input audio signal. Alternatively, the modification parameters may be derived from the processed (output) audio or from a combination of the input and output audio signals. In the audio signal path 2, the modification parameters generated by the Dynamics Control 8 are applied to the audio to control the modification of the audio, thereby generating the processed audio. The application of the modification parameters to an audio signal may be accomplished in many known ways and is shown generically by the multiplier symbol 12. In the audio signal path 2, the audio may be delayed by a delay device or process (“Delay”) 10 to compensate for any delay associated with the level estimation and dynamics control processes.
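By way of illustration only, the two-path arrangement of FIG. 1 may be sketched as follows. The sketch assumes a one-pole smoothed RMS level measure and a simple downward-compression curve; the function names, the smoothing coefficient `alpha`, and the threshold and ratio values are hypothetical choices made for this example and are not taken from the description above.

```python
import math

def level_measure(x, alpha=0.99):
    """Control path, block 6: smoothed mean-square level in dB, one value per sample.
    The one-pole smoother and alpha are assumptions of this sketch."""
    power = 0.0
    levels_db = []
    for s in x:
        power = alpha * power + (1.0 - alpha) * s * s
        levels_db.append(10.0 * math.log10(power + 1e-12))  # floor avoids log(0)
    return levels_db

def dynamics_control(level_db, threshold_db=-20.0, ratio=4.0):
    """Control path, block 8: map a level in dB to a linear gain.
    A plain compressor curve is assumed; other profiles are possible."""
    if level_db <= threshold_db:
        return 1.0
    target_db = threshold_db + (level_db - threshold_db) / ratio
    return 10.0 ** ((target_db - level_db) / 20.0)

def delay(x, n):
    """Signal path, block 10: delay by n samples, keeping the original length."""
    n = min(n, len(x))
    return [0.0] * n + list(x[:len(x) - n])

def process(x, threshold_db=-20.0, ratio=4.0, alpha=0.99, lookahead=0):
    """Apply the gains from the control path to the (optionally delayed) audio."""
    gains = [dynamics_control(l, threshold_db, ratio)
             for l in level_measure(x, alpha)]
    delayed = delay(x, lookahead)
    # Multiplier 12: one gain value per sample of the signal path.
    return [g * s for g, s in zip(gains, delayed)]
```

In this sketch the modification parameters are derived from the input audio only, corresponding to the arrangement shown; deriving them from the output or from a combination of input and output would require a feedback structure that is not illustrated here.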
When dealing with complex multichannel audio material, care must be taken in computing and applying the signal modifications in order to avoid the introduction of perceptible artifacts. A basic dynamics processor receiving a multichannel audio signal input might compute a signal level that is representative of all channels combined in total and then apply the same modification to all channels based on such a total level measure. In modifying all channels in the same way, such an approach has the advantage of maintaining the relative levels among all channels, thereby preserving the spatial image (including, for example, the location of virtual images panned between channels as well as perceived diffuseness). Such an approach may work well if the applied modifications are not overly aggressive.
However, problems may arise when the desired modifications are more severe. Consider a multichannel audio signal (5.1 channels, for example) to which a dynamic range controller with a very high compression ratio is applied. With such a processor, signals above the compression threshold are attenuated significantly to bring the signal level closer to the threshold. Assume that the audio signal contains relatively constant-level background music in all channels for which the total level after combining all channels is below the compression threshold. Assume further that a brief but loud segment of dialog is introduced into the center channel. Due to the dialog, the total level of all channels combined now exceeds the compression threshold and the entire signal is therefore attenuated. Once the dialog is finished, the signal level falls back below the compression threshold and no attenuation is applied. As a result, the background music from the left, right, left surround, and right surround channels is heard to fluctuate in level or “pump” down and back up in accordance with the dialog in the center channel. The effect can be very unnatural sounding and disturbing for a listener. This type of artifact, a type of cross-modulation or intermodulation, is well recognized in the field of audio dynamics processing, and a typical prior art solution involves applying dynamic range control independently to each channel. Although such a solution may correct the aforementioned problem, it may have the disadvantage of altering the spatial image of the audio. In particular, virtual sources panned between two channels may appear to “wander” due to differing amounts of attenuation applied to the two channels. Thus, there is a need for a solution that addresses both the pumping and the unstable image problems.
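The two prior art behaviors described above may be compared numerically with a short sketch. It assumes static per-channel levels in dB, a hard-knee compressor with a hypothetical threshold of -20 dB and ratio of 10:1, and a total level formed by summing channel powers; none of these values comes from the description, and time smoothing is omitted for brevity.

```python
import math

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=10.0):
    """Gain in dB for a hard-knee compressor (assumed curve, illustrative values)."""
    over = max(0.0, level_db - threshold_db)
    return 0.0 - over * (1.0 - 1.0 / ratio)

def total_level_db(channel_levels_db):
    """Level of all channels combined, assuming channel powers simply add."""
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in channel_levels_db))

def linked_gains_db(channel_levels_db):
    """One gain computed from the total level and applied to every channel."""
    g = compressor_gain_db(total_level_db(channel_levels_db))
    return [g] * len(channel_levels_db)

def independent_gains_db(channel_levels_db):
    """A separate gain computed for each channel from its own level."""
    return [compressor_gain_db(l) for l in channel_levels_db]

# Channel order assumed here: L, R, C, Ls, Rs (levels in dB, hypothetical).
music_only = [-35.0, -35.0, -120.0, -35.0, -35.0]
with_dialog = [-35.0, -35.0, -5.0, -35.0, -35.0]

# Linked processing: the music channels are untouched without dialog but are
# attenuated along with the center channel while the dialog is present.
pump_before = linked_gains_db(music_only)
pump_during = linked_gains_db(with_dialog)

# Independent processing: the music is left alone, but a source panned between
# two channels sees its inter-channel level difference change.
panned = [-10.0, -25.0]
gains = independent_gains_db(panned)
diff_before = panned[0] - panned[1]
diff_after = (panned[0] + gains[0]) - (panned[1] + gains[1])
```

Under these assumptions the linked processor lowers the music by more than 13 dB only while the dialog is present, which corresponds to the pumping described above, whereas the independent processor narrows the level difference of the panned source from 15 dB to 6 dB, which corresponds to the wandering image.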
Analogous problems exist when considering the spectrum of a single channel of audio. Consider a single channel that contains a sustained string note at mid to high frequencies for which the signal level is below the compression threshold. Now consider a very loud bass drum hit introduced at the low frequencies, causing the signal level to momentarily increase above the compression threshold. The entire signal is momentarily attenuated, resulting in the strings being perceived to pump unnaturally down and up in level in accordance with the bass drum. A typical prior art solution to this problem is to break the audio signal into multiple frequency bands and then apply dynamic range control independently to each band. This reduces the pumping problem but may alter the perceived spectral balance or timbre. Thus, there is a need for a solution that reduces pumping while reducing changes in the perceived spectral balance.
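The single-channel case may be sketched in the same manner, with frequency bands taking the place of channels. The two-band split, the band names, the levels, and the compressor settings below are all hypothetical and serve only to make the trade-off concrete.

```python
import math

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=10.0):
    """Gain in dB for a hard-knee compressor (assumed curve, illustrative values)."""
    over = max(0.0, level_db - threshold_db)
    return 0.0 - over * (1.0 - 1.0 / ratio)

def wideband_output_db(band_levels_db):
    """One gain from the total level across bands, applied to every band."""
    total_db = 10.0 * math.log10(
        sum(10.0 ** (l / 10.0) for l in band_levels_db.values()))
    g = compressor_gain_db(total_db)
    return {band: l + g for band, l in band_levels_db.items()}

def multiband_output_db(band_levels_db):
    """A separate gain for each band, computed from that band's own level."""
    return {band: l + compressor_gain_db(l)
            for band, l in band_levels_db.items()}

# Sustained strings in the upper band, a loud bass drum hit in the low band.
during_hit = {"low": -3.0, "mid_high": -30.0}

wide = wideband_output_db(during_hit)
multi = multiband_output_db(during_hit)

# Spectral balance expressed as the low-band level minus the upper-band level.
balance_in = during_hit["low"] - during_hit["mid_high"]
balance_wide = wide["low"] - wide["mid_high"]
balance_multi = multi["low"] - multi["mid_high"]
```

With these figures the wideband processor preserves the 27 dB balance between bands but pushes the strings down by roughly 15 dB during the drum hit, while the multiband processor leaves the strings at -30 dB but reduces the balance to about 11.7 dB, which is the change in timbre noted above.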