Programming such as television programs or theatrical feature films is, in many cases, produced with variable loudness and wide dynamic range to convey emotion or a level of excitement in a given scene. For example, a movie may include a scene with the subtle chirping of a cricket and another scene with the blast of a firing cannon. Interstitial material such as commercial advertisements, on the other hand, is very often intended to convey a coherent message, and is, thus, often produced at a constant loudness, narrow dynamic range, or both. Other types of content, such as news gathering, documentaries, children's programming, modern music, classical music, live talk shows, etc., may have inconsistent loudness levels or unpredictable loudness ranges.
Conventionally, annoying disturbances occurred at the point of transition between different programs, and often between the programming and the interstitial material. This is commonly known as the “loudness inconsistency problem” or the “loud commercial problem.” In some cases, even when the programming and the interstitial material have matched average loudness and dynamic range, the loudness of the programming may decrease for artistic reasons for a period of time, possibly long enough to cause users to increase the volume of the audio. When this quieter-than-average section of the program switches to interstitial material that matches the original average loudness of the programming, the interstitial material may be too loud because of the volume increase made by the user.
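The scenario above can be made concrete with a small arithmetic sketch. The loudness figures and the helper name `perceived_level_db` are illustrative assumptions only; they do not come from the text, and the sketch simply assumes that a volume change adds directly, in decibels, to the content loudness.

```python
def perceived_level_db(content_loudness_lkfs: float, volume_offset_db: float) -> float:
    """Content loudness plus the listener's volume change, in dB (hypothetical helper)."""
    return content_loudness_lkfs + volume_offset_db


# All figures below are illustrative assumptions, not values from the text.
program_average = -24.0  # average program loudness (LKFS)
quiet_scene = -36.0      # quieter-than-average passage (LKFS)
interstitial = -24.0     # interstitial matched to the program average (LKFS)

# The listener raises the volume so the quiet scene sounds like the program average.
volume_boost = program_average - quiet_scene

# The interstitial then plays with that boost still applied.
overshoot = perceived_level_db(interstitial, volume_boost) - program_average

print(f"volume boost: {volume_boost:+.1f} dB")
print(f"interstitial overshoot: {overshoot:+.1f} dB")
```

Under these assumptions the overshoot equals the volume boost: although the interstitial material matches the average loudness of the programming, it is reproduced louder than intended by exactly the amount the listener added during the quiet passage.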
This loudness inconsistency problem is experienced by TV viewers, radio listeners, and users of any other media (such as web media, streaming, mobile, OTT, portable players, in-flight entertainment, etc.) when the reproduced content (or a sequence of different content) generates inconsistent, uncomfortable, or annoying sound pressure levels. Another example is a feature film being transmitted on TV or to a mobile device. Because the film was initially produced for theatrical presentation, the variation in its loudness levels would exceed the hearing comfort zone when reproduced in a home environment via a consumer device such as a TV set or a mobile device. The viewer/listener would have to repeatedly adjust the volume level of the device in order to make soft passages audible (such as dialog) and keep loud passages from being annoying (such as action scenes with loud music and sound effects).
Conventionally, processes addressing the loudness inconsistency problem modified the audio itself at its source, thus making the processes irreversible. However, not all viewers may desire to have the programming audio changed in such a way. Furthermore, the user device could be used to retransmit the live stream to another consumer device, rather than to reproduce the content itself. Consequently, reducing the dynamic range to suit the audio characteristics of the receiver would needlessly degrade audio quality if the final reproduction device were capable of supporting a larger dynamic range or frequency range. Due to the variety of possible distribution platforms, predicting how a program will ultimately be reproduced is no longer possible, and any audio processing applied beforehand could prove inappropriate for the specific listening scenario.
Also conventionally, processes addressing the loudness inconsistency problem introduced sound artifacts or alterations to the spectral balance of the source content. These issues diminish the listener's experience.