The problem of varying mixing and playback levels of audio content is addressed in the movie industry through the SMPTE (Society of Motion Picture and Television Engineers) recommendations, which are intended to ensure a consistent playback level across movie theaters and across different content. Under the SMPTE recommendations, audio content is reproduced at a consistent level that is pleasant to consumers.
The situation in broadcast is more challenging, because the individual playback systems of users are not controlled by technicians and because the distribution channels and networks for broadcast are more complex. With the introduction of digital broadcast, the industry established the concept of time-varying metadata, which makes it possible to control gain values at the receiving end in order to tailor content to a specific listening environment. An example is the metadata included in Dolby Digital, which comprises general loudness normalization information for dialogue (“dialnorm”) as well as gain words (“dynrng” and “compr”) for reducing the dynamic range of a program. It should be noted that throughout this specification and in the claims, references to Dolby Digital shall be understood to encompass both the Dolby Digital and Dolby Digital Plus coding systems. Such systems are particularly powerful in situations where the operating modes at the receiver, relating to the listening environment and the listening preferences, are specified. By way of example, the dialnorm standard allows the specification of a so-called “line mode” and a so-called “RF mode” for Dolby Digital. The “RF mode” is designed for peak-limiting situations where the decoded program is intended for delivery through an RF input on a television, such as through the antenna output of a set-top box. The “line mode” provides less compression of the dynamic range than the “RF mode” and also allows user adjustment of the low-level boost and high-level cut parameters within a home decoder. This adjustment or “scaling” of the boost and cut areas allows users to customize the audio reproduction for their specific listening environment. These technologies are also part of today's audio/video discs such as DVD and Blu-ray.
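By way of illustration only, the receiver-side use of loudness normalization metadata can be sketched as follows. The sketch assumes that a dialnorm value in the range 1 to 31 indicates a dialogue level of -1 to -31 dBFS, and that the decoder attenuates the program so that dialogue lands at a reference level of -31 dBFS. The function names `dialnorm_gain_db` and `apply_gain` and the parameter `target_dbfs` are hypothetical and are not part of any Dolby Digital decoder interface; the dynamic range gain words (“dynrng” and “compr”) are not modeled.

```python
def dialnorm_gain_db(dialnorm: int, target_dbfs: float = -31.0) -> float:
    """Return the gain in dB that moves the dialogue level to target_dbfs.

    Assumption: dialnorm values 1..31 denote dialogue levels of
    -1..-31 dBFS. Other values are rejected in this sketch.
    """
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be in the range 1..31")
    dialogue_level_dbfs = -float(dialnorm)
    return target_dbfs - dialogue_level_dbfs


def apply_gain(samples, gain_db: float):
    """Scale linear PCM samples by a gain expressed in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


# A program whose dialogue sits at -27 dBFS is attenuated by 4 dB,
# while a program already at -31 dBFS is left unchanged.
gain_loud = dialnorm_gain_db(27)
gain_reference = dialnorm_gain_db(31)
normalized = apply_gain([0.5, -0.25], gain_loud)
```

Because the gain is derived from metadata rather than from the signal itself, the coded audio remains untouched and the same bitstream can be normalized differently at different receivers.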
An important distribution channel for audio content is still the CD, which contains 16-bit PCM data without any metadata. The peak normalization typically used for CDs is said to be the main reason for the so-called “loudness war”, which has led to audio content with reduced dynamic range and high average audio levels. However, consumer behavior has changed over recent years, with coded content (e.g. content in data-reduced formats such as mp3) becoming more popular and more important for content distribution and storage. Such formats allow for a virtually unlimited dynamic range, which content owners and audio enthusiasts can take advantage of. In addition, the increasing popularity of mobile phones, smart phones and other portable electronic devices as personal media players has created new challenges in designing high-quality playback devices that meet customer expectations of consistent audio leveling and best audio quality under various listening conditions. The large amount of content in personal music collections (often exceeding thousands of files), as well as the broad range of audio formats such as mp3, HE-AAC, OGG, WMA, and Dolby Digital, further complicate the problem of providing audio playback devices with consistent audio leveling.