1. Technical Field
The present invention generally relates to audio signal processing and, more particularly, to enhancing audio streams and recordings by restoring or accentuating their dynamic range.
2. Description of the Related Art
Following the adage that ‘louder is better’, it has become common practice in the recording industry to master and release recordings with higher levels of loudness. With the advent of digital media formats such as CDs, music was encoded with a maximum peak level defined by the number of bits that can be used to represent the encoded signal. Once the maximum amplitude of a CD is reached, the perception of loudness can be increased still further through signal processing techniques such as multiband dynamic range compression, peak limiting and equalization. Using such digital mastering tools, sound engineers can maximize the average signal level by compressing transient peaks (such as drum hits) and increasing the gain of the resulting signal. Extreme uses of dynamic range compression can introduce clipping and other audible distortion to the waveform of the recording. Modern albums that use such extreme dynamic range compression therefore sacrifice quality of musical reproduction for loudness. The practice of increasing the loudness of music releases to match competing releases can have two effects. The first is a loss of dynamic range. Since there is a maximum loudness level available to recording (as opposed to playback, in which the loudness is limited by the playback speakers and amplifiers), boosting the overall loudness of a song or track eventually creates a piece that is maximally and uniformly loud from beginning to end. This creates music with a small dynamic range (i.e., little difference between loud and quiet sections). Such an effect is often viewed as fatiguing and void of the artist's creative expression.
The other possible effect is distortion. In the digital realm, this is usually referred to as clipping. Digital media cannot output signals higher than the digital full scale, so whenever the peak of a signal is pushed past this point, the waveform becomes clipped. When this occurs, it can sometimes produce an audible click. However, certain sounds like drum hits will reach their peak for only a very short time, and if that peak is much louder than the rest of the signal, this click will not be heard. In many cases, the peaks of drum hits are clipped but this is not detected by the casual listener.
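The two effects described above can be illustrated with a minimal sketch. It is not taken from any mastering tool; the normalized full scale of 1.0, the test signal, the gain of 4 and the function names hard_clip and crest_factor_db are all assumptions chosen for illustration. The peak-to-RMS ratio (crest factor) is used here only as a rough proxy for dynamic range.

```python
import math

FULL_SCALE = 1.0  # assumed normalized digital full scale


def hard_clip(samples, limit=FULL_SCALE):
    """Clamp every sample to [-limit, +limit], as a fixed-range format would."""
    return [max(-limit, min(limit, s)) for s in samples]


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; a rough proxy for dynamic range."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


# Hypothetical signal: a quiet 440 Hz tone with one short, loud "drum hit".
sample_rate = 8000
signal = [0.1 * math.sin(2 * math.pi * 440 * n / sample_rate)
          for n in range(sample_rate)]
for n in range(100, 110):
    signal[n] = 0.9

# Roughly +12 dB of gain pushes the hit past full scale, so it is clipped.
louder = hard_clip([4.0 * s for s in signal])

print(round(crest_factor_db(signal), 1), "dB before")
print(round(crest_factor_db(louder), 1), "dB after")
```

In this toy case the boosted version never exceeds full scale, but its crest factor is markedly lower than that of the original: the average level has risen while the peak has been flattened.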
FIGS. 1a and 1b provide a visual representation of deleterious mastering techniques. The audio recording waveforms depicted in FIGS. 1a and 1b represent an originally mastered track and a version of the same track that has been mastered using different techniques. FIG. 1a represents the original recording. The presence of numerous peaks indicates a high dynamic range that is representative of the kinds of dynamics present in the original performance. This recording provides for a vibrant listening experience, as certain percussive notes, such as drum hits, will sound punchy and clear. In contrast, the recording depicted in FIG. 1b is remastered for a louder commercial CD release. Most of the peaks present in the original recording are compressed or even clipped, and the dynamic range of the recording has been compromised as a result. This increasingly aggressive use of dynamic range compression at the mastering stage of commercial music has spawned much backlash from consumers, producers and artists.
Approaches discussed in the audio industry for addressing this issue concentrate on questioning the mastering techniques that are at the origin of the issue. One such example is described in Bob Katz, Mastering Audio, Second Edition: The Art and the Science. Katz describes how recordings can be mastered for loudness without distorting the final result by using calibrated monitoring of the processed signal and more moderate compression parameters. While most mastering engineers would concur with Katz's approach, it is often superseded by the demands of studio management. Even if more conservative mastering techniques do become the new norm, this does not resolve the problem for the body of existing recordings already mastered and distributed to end-users.
Existing processing techniques for modifying the dynamics of an audio recording are known in the art. One such process is loudness leveling, where differences between the perceived loudness of audio materials that have been subjected to varying degrees of dynamic range compression are normalized to some predetermined level. However, these approaches are used to normalize the average loudness of consecutive tracks played from various sources and do not make any attempt to restore the dynamic range of content that has been excessively dynamic range compressed. As a result, compressed media can sound even more devoid of dynamic expression when played at lower prescribed listening levels.
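A minimal sketch of why leveling alone cannot restore dynamics follows. It assumes, purely for illustration, that RMS level stands in for perceived loudness (practical systems use perceptual loudness models), and the names level_to_rms and peak_to_rms are hypothetical. Because leveling applies one constant gain to the whole track, the ratio between peaks and average level is left unchanged.

```python
import math


def rms(samples):
    """Root-mean-square level, used here as a crude stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_to_rms(samples, target_rms):
    """Scale the whole track by one constant gain so its RMS hits the target."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / current
    return [gain * s for s in samples]


def peak_to_rms(samples):
    """Linear peak-to-RMS ratio; unchanged by any constant gain."""
    return max(abs(s) for s in samples) / rms(samples)


# Hypothetical heavily compressed track: a tone at a nearly constant high level.
compressed = [0.9 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
leveled = level_to_rms(compressed, target_rms=0.1)

print(peak_to_rms(compressed), peak_to_rms(leveled))
```

The leveled track is quieter, but its peak-to-RMS ratio matches that of the compressed input, which is consistent with the observation that such material can sound flat at lower listening levels.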
Another known technique is applying an upward expander as described in U.S. Pat. No. 3,978,423 issued to Bench, titled Dynamic Expander. An upward expander applies a time-varying gain to the audio signal according to a fixed ‘expansion curve’ whereby the output signal level is greater than the input level above a selected threshold. As a result, the amplitude of the louder portions of the source signal is increased. However, this can result in originally dynamic soundtracks having overemphasized transients in the output signal.
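The behavior of a fixed expansion curve can be sketched as follows. This is a generic illustration and not the circuit of the cited patent; the threshold of -20 dB, the ratio of 1.5, the release coefficient and the simple peak envelope follower are all assumed values and design choices.

```python
import math


def expander_gain_db(level_db, threshold_db=-20.0, ratio=1.5):
    """Static expansion curve: levels above the threshold are stretched
    upward by `ratio`; levels at or below it receive no gain."""
    if level_db <= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)


def upward_expand(samples, threshold_db=-20.0, ratio=1.5, release=0.999):
    """Apply the curve sample by sample, driven by a simple peak envelope
    follower (instant attack, exponential release)."""
    out, envelope = [], 0.0
    for s in samples:
        envelope = max(abs(s), envelope * release)
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        gain = 10.0 ** (expander_gain_db(level_db, threshold_db, ratio) / 20.0)
        out.append(s * gain)
    return out


# A quiet sample passes unchanged; a loud one is pushed higher still.
print(upward_expand([0.01, 0.5]))
```

Because the curve depends only on the instantaneous level, it boosts loud passages whether or not the material was compressed beforehand, which is how already dynamic content can end up with overemphasized transients.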
Another known technique is dynamic spectral equalization, where lower and higher frequency bands are boosted when transients are detected. As a result, a more dynamic output is yielded. Dynamic spectral equalization is described in X. Rodet and F. Jaillet, Detection and Modeling of Fast Attack Transients (2001), Proceedings of the International Computer Music Conference; U.S. Pat. No. 7,353,169 issued to Goodwin et al., titled Transient Detection and Modification in Audio Signals; and U.S. patent application Ser. No. 11/744,465 to Avendano et al., titled Method for Enhancing Audio Signals. Unlike the previous approaches, these dynamic enhancement techniques exclusively affect signal transients. However, they affect all signal transients, even those that already exhibit high dynamics. Dynamic spectral equalization generally applies processing to all audio signal content, whether or not it is needed. This can result in an overly dynamic processed output for certain types of audio content.
U.S. Pat. No. 6,453,282, issued to Hilpert et al., outlines a method of transient detection in the discrete-time audio domain. Such time-domain methods are less reliable when analyzing heavily dynamic range compressed material, as changes in energy due to transients become less apparent when looking at the signal as a whole. This leads to the misclassification of transient signals and yields false positives.
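The sensitivity of time-domain detection to compression can be sketched with a generic frame-energy detector. This is not the method of the cited patent; the frame length of 64 samples, the energy-ratio threshold of 4 and the synthetic signals are assumptions made only to show how the energy jump at a transient shrinks once the surrounding material has been raised in level.

```python
import math


def frame_energies(samples, frame=64):
    """Sum of squared samples over consecutive non-overlapping frames."""
    return [sum(s * s for s in samples[i:i + frame])
            for i in range(0, len(samples) - frame + 1, frame)]


def detect_transients(samples, frame=64, threshold=4.0):
    """Flag frame k when its energy exceeds `threshold` times the energy
    of frame k - 1. Returns the indices of the flagged frames."""
    e = frame_energies(samples, frame)
    return [k for k in range(1, len(e))
            if e[k] / max(e[k - 1], 1e-12) > threshold]


def make_signal(tone_amp, hit_amp, length=1280, hit_start=640, hit_len=64):
    """Hypothetical test signal: a steady 440 Hz tone at 8 kHz sampling
    with one frame-long 'drum hit' inserted at a known position."""
    x = [tone_amp * math.sin(2 * math.pi * 440 * n / 8000)
         for n in range(length)]
    for n in range(hit_start, hit_start + hit_len):
        x[n] = hit_amp
    return x


dynamic = make_signal(tone_amp=0.1, hit_amp=0.9)   # wide dynamic range
squashed = make_signal(tone_amp=0.8, hit_amp=1.0)  # heavily compressed

print(detect_transients(dynamic))
print(detect_transients(squashed))
```

With these made-up numbers the hit is flagged in the dynamic version but not in the compressed one, because the energy ratio at the onset falls below the fixed threshold. Lowering the threshold to catch it would bring it closer to the ordinary frame-to-frame fluctuation of the signal, which makes misclassification more likely.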
In view of the ever-increasing interest in improving the rendering of audio recordings, there is a need in the art for improved audio processing.