The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
The dynamic range of an audio signal is generally the ratio between the largest and smallest possible values of the sound embodied in the signal, and is usually measured as a decibel (base-10) value. In many audio processing systems, dynamic range control (or dynamic range compression, DRC) is used to reduce the level of loud sounds and/or amplify the level of quiet sounds to fit wide dynamic range source content into a narrower recorded dynamic range that can be more easily stored and reproduced using electronic equipment. For audio/visual (AV) content, a dialog reference level may be used to define the “null” point for compression through the DRC mechanism. DRC acts to boost content below the dialog reference level and cut content above the reference level.
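As an illustration only, the boost-and-cut behavior around the dialog reference level can be sketched in Python as follows. The function names, the 2:1 compression ratio, the default reference of −31 dB, and the use of 20·log10 for amplitude ratios are assumptions made for this sketch; they are not taken from any particular encoder or decoder.

```python
import math


def dynamic_range_db(largest, smallest):
    """Ratio of the largest to the smallest amplitude, expressed in dB.

    Assumes the inputs are linear amplitudes (hence 20*log10, not 10*log10).
    """
    return 20.0 * math.log10(largest / smallest)


def drc_gain_db(level_db, reference_db=-31.0, ratio=2.0):
    """Static DRC gain for a signal at level_db (hypothetical curve).

    The gain is zero at the reference ("null") level, positive (boost)
    below it, and negative (cut) above it. A ratio of 2.0 means that a
    level 20 dB away from the reference ends up 10 dB away after gain.
    """
    return (reference_db - level_db) * (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    for level in (-51.0, -31.0, -11.0):
        gain = drc_gain_db(level)
        print(f"input {level:6.1f} dB  gain {gain:+5.1f} dB  "
              f"output {level + gain:6.1f} dB")
```

With these assumed settings, content 20 dB below the reference is boosted by 10 dB, content 20 dB above it is cut by 10 dB, and content at the reference level is left unchanged.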
In a known audio encoding system, metadata associated with the audio signal is used to set the DRC level based on the type and intended usage of the content. The DRC mode sets the amount of compression applied to the audio signal and defines the output reference level of the decoder. Such systems may be limited to two DRC level settings that are programmed into the encoder and selected by the user. For example, a dialnorm (dialog normalization) value of −31 dB (Line) is traditionally used for content that is played back on an AVR or other devices capable of full dynamic range, and a dialnorm value of −20 dB (RF) is used for content played back on television sets or similar devices. This type of system allows a single audio bitstream to be used in two common but very different playback scenarios through the use of two different sets of DRC metadata. Such systems, however, are limited to the preset dialnorm values and are not optimized for playback on the wide variety of playback devices and in the wide variety of listening environments that are now possible through the advent of digital media and Internet-based streaming technology.
In current metadata-based audio encoding systems, a stream of audio data may include both audio content (e.g., one or more channels of audio content) and metadata indicative of at least one characteristic of the audio content. For example, in an AC-3 bitstream there are several audio metadata parameters that are specifically intended for use in changing the sound of the program delivered to a listening environment. One of the metadata parameters is the dialnorm parameter, which indicates the mean loudness level of dialog (or average loudness of the content) occurring in an audio program, and is used to determine audio playback signal level.
During playback of a bitstream comprising a sequence of different audio program segments (each having a different dialnorm parameter), an AC-3 decoder uses the dialnorm parameter of each segment to perform a type of loudness processing which modifies the segment's playback level or loudness such that the perceived loudness of the segment's dialog is at a consistent level. Each encoded audio segment (item) in a sequence of encoded audio items would (in general) have a different dialnorm parameter, and the decoder would scale the level of each of the items such that the playback level or loudness of the dialog for each item is the same or very similar, although this might require application of different amounts of gain to different ones of the items during playback.
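The per-segment scaling described above can be sketched as follows. This is a minimal illustration, not a decoder implementation: the target level of −31 dB is assumed here (borrowed from the Line-mode value mentioned earlier), and the segment names and dialnorm values are hypothetical.

```python
def normalization_gain_db(dialnorm_db, target_db=-31.0):
    """Gain (in dB) that moves dialog at dialnorm_db to target_db.

    target_db is an assumed playback reference level for this sketch.
    """
    return target_db - dialnorm_db


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical sequence of program segments, each with its own dialnorm.
segments = [("movie", -27.0), ("advert", -18.0), ("news", -24.0)]

for name, dialnorm in segments:
    gain = normalization_gain_db(dialnorm)
    # After scaling, the dialog of every segment sits at the same level,
    # even though each segment receives a different amount of gain.
    print(f"{name:7s} dialnorm {dialnorm:6.1f} dB  gain {gain:+6.1f} dB  "
          f"scale {db_to_linear(gain):.3f}  "
          f"dialog after scaling {dialnorm + gain:6.1f} dB")
```

Each segment receives a different gain, but the dialog level after scaling is the same for all of them. This holds only if each dialnorm value accurately reflects the segment's actual dialog loudness.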
In some embodiments, the dialnorm parameter is set by a user, and is not generated automatically, although there is a default dialnorm value if no value is set by the user. For example, a content creator may make loudness measurements with a device external to an AC-3 encoder and then transfer the result (indicative of the loudness of the spoken dialog of an audio program) to the encoder to set the dialnorm value. Thus, there is reliance on the content creator to set the dialnorm parameter correctly.
There are several reasons why the dialnorm parameter in an AC-3 bitstream may be incorrect. First, each AC-3 encoder has a default dialnorm value that is used during the generation of the bitstream if a dialnorm value is not set by the content creator. This default value may be substantially different from the actual dialog loudness level of the audio. Second, even if a content creator measures loudness and sets the dialnorm value accordingly, a loudness measurement algorithm or meter may have been used that does not conform to the recommended loudness measurement method, resulting in an incorrect dialnorm value. Third, even if an AC-3 bitstream has been created with the dialnorm value measured and set correctly by the content creator, it may have been changed to an incorrect value by an intermediate module during transmission and/or storage of the bitstream. For example, it is not uncommon in television broadcast applications for AC-3 bitstreams to be decoded, modified, and then re-encoded with incorrect dialnorm metadata. Thus, a dialnorm value included in an AC-3 bitstream may be incorrect or inaccurate and therefore may have a negative impact on the quality of the listening experience.
Further, the dialnorm parameter does not indicate the loudness processing state of the corresponding audio data (e.g., what type(s) of loudness processing have been performed on the audio data). Additionally, presently deployed loudness and DRC systems, such as those in Dolby Digital (DD) and Dolby Digital Plus (DD+) systems, were designed to render AV content in a consumer's living room or a movie theater. To adapt such content for playback in other environments and on other listening equipment (e.g., a mobile device), post-processing must be applied ‘blindly’ in the playback device to adapt the AV content for that listening environment. In other words, a post-processor (or a decoder) assumes that the loudness level of the received content is at a particular level (e.g., −31 or −20 dB) and the post-processor sets the level to a pre-determined fixed target level suitable for a particular device. If the assumed loudness level or the pre-determined target level is incorrect, the post-processing may have the opposite of its intended effect; i.e., the post-processing may make the output audio less desirable for a user.
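The effect of an incorrect assumption in such blind post-processing can be illustrated with a small sketch. The function names and the device target of −16 dB are hypothetical values chosen for the example; only the assumed input levels of −31 and −20 dB come from the discussion above.

```python
def blind_gain_db(assumed_db, device_target_db):
    """Gain a blind post-processor applies, based only on its assumption."""
    return device_target_db - assumed_db


def output_level_db(actual_db, assumed_db, device_target_db):
    """Level actually produced when content at actual_db is post-processed."""
    return actual_db + blind_gain_db(assumed_db, device_target_db)


DEVICE_TARGET_DB = -16.0  # hypothetical target level for a mobile device

# Case 1: the assumption matches the content, so the target is reached.
correct = output_level_db(-31.0, -31.0, DEVICE_TARGET_DB)

# Case 2: the content is really at -20 dB but is assumed to be at -31 dB.
# The output overshoots the target by the size of the assumption error.
wrong = output_level_db(-20.0, -31.0, DEVICE_TARGET_DB)

print(f"correct assumption: output {correct:.1f} dB")
print(f"wrong assumption:   output {wrong:.1f} dB "
      f"(error {wrong - DEVICE_TARGET_DB:+.1f} dB)")
```

In this sketch the output misses the target by exactly the difference between the actual and the assumed loudness, which is why a fixed assumption can leave the content louder or quieter than intended.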
The disclosed embodiments are not limited to use with an AC-3 bitstream, an E-AC-3 bitstream, or a Dolby E bitstream; however, for convenience, such bitstreams will be discussed in conjunction with a system that includes loudness processing state metadata. Dolby, Dolby Digital, Dolby Digital Plus, and Dolby E are trademarks of Dolby Laboratories Licensing Corporation. Dolby Laboratories provides proprietary implementations of AC-3 and E-AC-3 known as Dolby Digital and Dolby Digital Plus, respectively.