The inclusion of metadata along with audio signals has allowed for significant improvements in the user listening experience. For a pleasant user experience, it is generally desirable for the overall sound level, or loudness, of different programs to be consistent. However, the audio signals of different programs usually originate from different sources, are mastered by different producers and may contain diverse content ranging from speech dialog to music to movie soundtracks with low-frequency effects. This potential for variance in sound level makes it a challenge to maintain the same overall sound level across such a variety of programs during playback. In practical terms, it is undesirable for the listener to feel the need to adjust the playback volume when switching from one program to another because the programs differ in perceived sound level. Techniques that alter the audio signals in order to maintain a consistent sound level between programs are generally known as signal leveling. In the context of dialog audio tracks, a measure relating to the perceived sound level is known as the dialog level, which is based on an average weighted level of the audio signal. Dialog level is often specified using a dialnorm parameter, which indicates a level in decibels (dB) with respect to digital full scale.
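As an illustration of signal leveling based on a dialog level, the following minimal sketch computes the gain that would bring a program with a given dialog level to a common target level. The function names and the target level of -31 dB relative to full scale are assumptions chosen for illustration only and are not taken from any particular standard or decoder.

```python
def leveling_gain_db(dialog_level_db, target_level_db=-31.0):
    """Gain in dB that moves a program's dialog level to the target level.

    Both levels are in dB relative to digital full scale. The default
    target of -31 dB is an assumed value for illustration.
    """
    return target_level_db - dialog_level_db


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def level_signal(samples, dialog_level_db, target_level_db=-31.0):
    """Scale every sample so the program's dialog level matches the target."""
    gain = db_to_linear(leveling_gain_db(dialog_level_db, target_level_db))
    return [s * gain for s in samples]
```

Under these assumptions, a program whose dialog level is -24 dB would be attenuated by 7 dB, while a program already at the target level would be left unchanged, so that both play back at the same perceived dialog level.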
In the past, broadcasters had particular problems with audio signals, such as soundtracks, whose audio levels fell above or below those of other programming, particularly audio that may vary substantially with time, such as dialog. With the development of digital audio, multi-channel audio and particularly the ability to include metadata along with the audio signal, producers and audio engineers now have a wide range of settings that can be embedded in the signal as metadata in order to precisely specify playback levels for various playback systems. These settings can even be provided at the postproduction stage, so broadcasters can deliver a very consistent audio signal and ensure that the most important audio elements come through to the end user.
Similarly, when audio input signals are mixed into a single signal, it is desirable for a pleasant user experience to maintain the same perceived sound level. One technique to realize this goal is for the input signals to include mixing metadata that specifies how the signals should be scaled when mixed.
Many current audio standards allow the content producer to provide an associated audio signal coupled with the main audio signal, together with time-varying metadata accompanying the associated audio signal. For example, a content producer could provide a track with director's comments as such an associated audio signal. The metadata accompanying the associated signal specifies exactly how the content producer wishes the audio signal of the main track to be adjusted during mixing for combined playback. E-AC-3 (Dolby Digital Plus) and High-Efficiency Advanced Audio Coding (HE-AAC) are two examples of standards that provide such mixing metadata. For details, see “ETSI TS 102 366 v1.2.1 (2008-08): Digital Audio Compression (AC-3, Enhanced-AC-3) Standard”, which describes E-AC-3 (Dolby Digital Plus); or see “ETSI TS 101 154 V1.9.1 (2009-09): Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream”, which describes High-Efficiency Advanced Audio Coding (HE-AAC). Both are hereby incorporated in their entirety by reference.
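The role of mixing metadata described above can be sketched as follows: a scale factor carried with the associated signal attenuates the main signal before the two signals are summed. This is a simplified illustration only. The function and parameter names are hypothetical and do not correspond to the actual metadata fields of E-AC-3 or HE-AAC, and a single scale factor per block of samples is assumed, whereas real metadata may change over time.

```python
def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_with_metadata(main, associated, main_scale_db, assoc_scale_db=0.0):
    """Mix one block of main and associated samples into a single signal.

    main_scale_db and assoc_scale_db are hypothetical metadata values
    (in dB) stating how each input should be scaled before summation.
    """
    if len(main) != len(associated):
        raise ValueError("main and associated blocks must have equal length")
    g_main = db_to_linear(main_scale_db)
    g_assoc = db_to_linear(assoc_scale_db)
    # Scale each input by its gain, then sum sample by sample.
    return [m * g_main + a * g_assoc for m, a in zip(main, associated)]
```

For instance, a main-signal scale factor of -20 dB would reduce the main track to one tenth of its amplitude while the associated track, such as the director's comments, passes through unchanged.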
However, a user may wish to diverge from the producer-provided settings, which are dictated by the metadata transmitted along with the associated signal. For example, a user who activates the director's comments while watching a movie may at some point during playback decide that they would rather hear the original dialog, which the producer may have marked in the metadata for attenuation during mixing so that it does not override the director's comments.
Thus, there is a need to allow the user to adjust the mixing of the input audio signals while maintaining the perceived sound level of the mixed signal, and thereby a pleasant user experience. Furthermore, there is a need for this adjustment to maintain a consistent perceived sound level for the mixed signal even when the scaling information from the metadata and the external user input vary over time, so that no additional leveling need be performed on the mixed signal.