Loudness leveling, or the automatic reduction of the range between loud and soft sections of an audio signal, is desirable in many situations, particularly with respect to broadcast television networks. Poor authoring practices lead to wide variation in loudness levels within a television program, advertisement or both, and programs or advertisements with significantly different loudness levels are frequently concatenated. Television viewers often find themselves adjusting the volume control on their television or sound playback system to compensate for these variations; however, the reaction time of such viewers is often not fast enough to avoid annoyance. In addition, some programs (such as movies) have a very high dynamic range. These ranges are often too wide for home listening, where the volume position required to hear dialogue may result in very loud levels for sound effects and music, which can disturb others in the household.
Existing methods for loudness leveling include compressor and limiter methods. These methods integrate audio signal level or power over time. The shorter the integration time, the faster the algorithm can measure and adjust short term fluctuations in loudness. The longer the integration time, the more the average loudness is affected, but short term fluctuations persist. These methods typically operate by gain adjusting the whole audio signal, i.e., all frequencies at once. This can result in audible artifacts, such as “breathing” or “pumping”. More recently, psychoacoustic methods for loudness measurement and adjustment have been developed, such as those described in U.S. Patent Application Publication No. 2007/0092089 A1 to Seefeldt et al., published Apr. 26, 2007, herein incorporated by reference in its entirety. These algorithms use spectral analysis and models of human hearing to adjust the audio in ways that vary across frequency and vary with measured loudness level. These methods work well in adjusting short term loudness on millisecond to second timescales, with very few audible artifacts. US2010/0046765A1 describes a television device comprising a first section configured to estimate a long-term arithmetic average of loudness, and a second section configured to suppress short bursts of loudness. US2007/0291959A1 describes a method for calculating and adjusting the perceived loudness. US2008/0253586A1 describes a system and method for controlling audio loudness. US2010/0272290A1 describes a method for improving loudness consistency at program boundaries. US2011/0150242A1 describes a method for adaptive loudness leveling. WO2007/127023A1 describes an audio gain control using specific loudness based auditory event detection.
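The integration-time trade-off described above for compressor and limiter methods can be illustrated with a minimal sketch. It assumes a one-pole (exponential) average of instantaneous power as the level detector and a single broadband gain; the function names, the 8 kHz sample rate, the test signal, and the target power are hypothetical choices made for illustration and are not taken from any of the cited publications.

```python
import math

def smoothed_power(samples, sample_rate, integration_time):
    """One-pole (exponential) average of instantaneous power.

    integration_time is the smoothing time constant in seconds (an assumed
    detector design; real compressors and limiters vary).
    """
    alpha = 1.0 - math.exp(-1.0 / (integration_time * sample_rate))
    state = 0.0
    out = []
    for x in samples:
        state += alpha * (x * x - state)
        out.append(state)
    return out

def broadband_gain(power, target_power, floor=1e-12):
    """Single gain, applied to all frequencies at once, that moves the
    measured power toward target_power."""
    return math.sqrt(target_power / max(power, floor))

# Hypothetical test signal: 0.5 s of a quiet tone, then 0.5 s of a loud tone.
SAMPLE_RATE = 8000
HALF = SAMPLE_RATE // 2
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(HALF)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(HALF)]
signal = quiet + loud

fast = smoothed_power(signal, SAMPLE_RATE, 0.010)  # 10 ms integration time
slow = smoothed_power(signal, SAMPLE_RATE, 1.0)    # 1 s integration time

# Inspect both detectors 50 ms after the jump in level.
idx = HALF + SAMPLE_RATE // 20
gain_fast = broadband_gain(fast[idx], target_power=0.01)
gain_slow = broadband_gain(slow[idx], target_power=0.01)
print(f"fast detector: power={fast[idx]:.4f}, gain={gain_fast:.3f}")
print(f"slow detector: power={slow[idx]:.4f}, gain={gain_slow:.3f}")
```

In this sketch the 10 ms detector has nearly reached the new power level 50 ms after the jump, so its gain already attenuates the loud passage, while the 1 s detector has barely moved and still calls for amplification. Because the gain is a single number applied to every frequency, rapid changes in it are one plausible source of the "breathing" or "pumping" artifacts mentioned above.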
Separately from loudness leveling, methods also exist for objectively measuring the perceived loudness of audio signals. Examples include A-, B-, and C-weighted power measures, as well as psychoacoustic models of loudness, such as those described in “Acoustics—Method for calculating loudness level,” ISO 532 (1975), and in U.S. Patent Application Publication No. 2007/0092089 A1. Weighted power measures operate by taking an input audio signal, applying a known filter that emphasizes the frequencies to which hearing is more sensitive while deemphasizing those to which it is less sensitive, and then averaging the power of the filtered signal over a predetermined length of time. The recently developed ITU-R BS.1770-2 objective loudness measurement standard uses a weighting filter similar to B-weighting, and excludes parts of the audio signal that are quiet or silent from the final average power calculation.
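As a rough illustration of a weighted power measure, the sketch below evaluates the widely published analytic form of the A-weighting curve and uses it to weight the mean power of steady sine tones. It works in the frequency domain, which for steady tones at distinct frequencies is equivalent to filtering and then averaging over a long window. The function names and the tone-list input format are hypothetical, and the sketch does not implement the BS.1770-2 weighting filter or its gating of quiet passages.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (freq_hz must be positive).

    Uses the commonly published analytic curve with rounded pole
    frequencies of 20.6, 107.7, 737.9 and 12194 Hz.
    """
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.0 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.0

def weighted_level_db(tones):
    """Weighted mean power in dB for steady tones.

    tones is a list of (freq_hz, peak_amplitude) pairs at distinct
    frequencies, so that their mean powers simply add.
    """
    total = 0.0
    for freq_hz, amplitude in tones:
        gain = 10.0 ** (a_weighting_db(freq_hz) / 20.0)
        total += 0.5 * (amplitude * gain) ** 2  # mean power of a sine is A^2 / 2
    return 10.0 * math.log10(total)

# Two tones with the same physical amplitude but different frequencies.
low_tone = weighted_level_db([(100.0, 1.0)])
mid_tone = weighted_level_db([(1000.0, 1.0)])
print(f"100 Hz tone:  {low_tone:.1f} dB (A-weighted)")
print(f"1000 Hz tone: {mid_tone:.1f} dB (A-weighted)")
```

The two tones have identical unweighted power, yet the 100 Hz tone measures roughly 19 dB lower after weighting, reflecting the reduced sensitivity of hearing at low frequencies that the weighting filter is meant to approximate.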
Psychoacoustic methods are typically more complex and aim to better model the workings of the human ear. Such psychoacoustic methods divide the signal into frequency bands that mimic the frequency response and sensitivity of the ear, then manipulate and integrate such bands while taking into account psychoacoustic phenomena, such as frequency and temporal masking, as well as the non-linear perception of loudness with varying signal intensity. The aim of all such methods is to derive a numerical measurement that closely matches the subjective impression of the audio signal. These methods are typically useful for measuring the longer term perceived loudness of an audio signal, e.g., where the audio signal length is 30 seconds or more, and typically minutes or hours. Over many years, the development and acceptance of these objective measurement algorithms has been accompanied by subjective testing, i.e., comparing the objective algorithm's measurements to human listening.
Recently, there has been a growing need for broadcast television audio signals to maintain a consistent loudness, particularly with respect to commercials. This need has been driven by government regulation, such as by Federal Communications Commission Publication No. FCC 11-84, “Notice of Proposed Rulemaking: Implementation of the Commercial Advertisement Loudness Mitigation (CALM) Act”. Since broadcasters have a mixture of well-authored content with known average loudness and dynamics and unknown content with unknown average loudness and possibly wide dynamics, they frequently use loudness leveling equipment in-line with the real-time audio signal that eventually makes its way to the television viewer. However, loudness levelers are typically optimized for short-term behavior to minimize artifacts when level-adjusting the audio signal, and as a result, the leveled audio signal is not necessarily consistent when measured using longer term measures. That is, the measured loudness of longer sections of the leveled audio, e.g., sections of 30 seconds or more, is not consistent from one section to the next.