Modern distribution of audio signals to consumers necessarily involves data rate reduction, or compression, techniques that lower the amount of data required to deliver the audio while having minimal impact on the original audio quality. AC-3, DTS, MPEG-2 AAC and HE AAC are examples of common audio coding systems that use data reduction techniques. Only the AC-3 system is used as an example herein, but the invention is applicable to any coding system and to television, radio, internet, or any other means of program distribution or transmission.
Audio metadata, or data about the audio data, is also included in these systems to describe the encoded audio. This data is multiplexed with the compressed or coded audio data and delivered to consumers, where it is extracted and applied to the audio in a user-adjustable manner. One such metadata parameter, dialnorm, is intended to indicate the average loudness of a program. Other parameters, such as dynrng and compr, collectively referred to as Dynamic Range Control (DRC), are intended to control the variation between the quietest and loudest parts of an audio signal.
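As an illustration of how a metadata parameter such as dialnorm is applied at the receiver rather than baked into the audio, the following is a minimal sketch. It assumes the ATSC A/52 (AC-3) convention that dialnorm codes 1 through 31 signal a level of -1 to -31 dBFS and that playback is normalized to -31 dBFS; the function names are hypothetical and the sketch ignores DRC (dynrng/compr) entirely.

```python
def dialnorm_gain_db(dialnorm: int) -> float:
    """Attenuation in dB that a decoder would apply for a given dialnorm code.

    Assumption: A/52 convention, codes 1..31 mean -1..-31 dBFS and output
    is normalized to -31 dBFS. Code 0 is reserved and rejected here.
    """
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be in 1..31 (0 is reserved)")
    return -float(31 - dialnorm)


def apply_gain(samples, gain_db):
    """Scale samples by a gain in dB; the input list itself is left unmodified."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


# A program signalled at -24 dBFS is attenuated by 7 dB on playback,
# while the transmitted samples remain exactly as produced.
original = [0.5, -0.25, 0.1]
played = apply_gain(original, dialnorm_gain_db(24))
```

Because the gain is computed from metadata at the receiver, the original samples are still delivered intact, which is the property that distinguishes metadata-based control from processing applied permanently to the audio.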
Programs are in many cases produced with loudness and dynamic range that vary to convey emotion or the level of excitement in a given scene, while interstitial or commercial material is very often produced to convey a message and may be at a constant loudness. In some cases these program and commercial elements differ substantially in average loudness and dynamic range, and many consumer environments are not conducive to such large variations. Artistic intent, while perhaps appropriate in more carefully controlled situations, can cause audibility problems and result in viewer or listener complaints. This is commonly referred to as the "loud commercial problem," but it can be caused as much by excessive dynamic range as by mismatched loudness.
An additional complicating factor is the desire, and sometimes the legal requirement, to maintain the integrity of the original audio, as some viewers and even regulatory bodies may require that the program audio not be changed in any way. Because of this, processes applied to the audio should ideally be reversible.
Prior art has described two general types of systems capable of controlling audio dynamic range: automatic gain control (AGC) type systems, which detect and adjust the level of applied audio signals in a permanent, non-reversible manner; and systems that use side-chain data or metadata, which allow the original audio to be carried to consumers and modified there using the metadata.