1. Field of the Invention
The invention pertains to audio signal processing, and more particularly, to assessment of metadata associated with audio data bitstreams. Some embodiments of the invention are useful for assessing metadata associated with audio data that have been encoded in accordance with one of the formats known as Dolby Digital (AC-3), Dolby Digital Plus, and Dolby E, or another encoding format (e.g., MPEG-4 AAC). Dolby, Dolby Digital, Dolby Digital Plus and Dolby E are trademarks of Dolby Laboratories Licensing Corporation.
2. Background of the Invention
A typical stream of audio data (e.g., an AC-3 bitstream) includes both audio content (e.g., one or more channels of audio content) and metadata indicative of at least one characteristic of the audio content.
US Patent Application Publication No. US 2009/0063159 A1, by Brett G. Crockett, assigned to the assignee of the present invention and published on Mar. 5, 2009 (“Crockett”), describes methods and systems for verifying and correcting metadata associated with AC-3 bitstreams and other audio data streams. Crockett describes methods for determining whether the “DIALNORM” metadata parameter of an AC-3 bitstream is correct, including (in an output AC-3 bitstream) verification information indicative of whether the DIALNORM parameter is correct, and (if the DIALNORM parameter is not correct) including in the output AC-3 bitstream a corrected version of the DIALNORM parameter and optionally also corrected versions of related metadata parameters (corrected versions of the COMPR and DYNRNG parameters). The disclosure of Crockett (US Patent Application Publication No. US 2009/0063159 A1) in its entirety is hereby incorporated by reference into the present disclosure.
The metadata verification and correction methods described in Crockett are intended to be implemented in a processor (e.g., a decoder) with an aim to detect incorrect metadata in an input audio stream and to correct (within the processor) incorrect metadata so that the audio can be played back using the corrected metadata as intended by the content creator. The methods would thus be performed in a manner hidden from the user. The user would not know whether the metadata in the input audio stream was determined to be correct or incorrect. In contrast, the present invention (which would typically be implemented in test or measurement products) assesses metadata associated with an audio bitstream to generate output (e.g., a single number, referred to as a “metadata score”) indicative of metadata quality, in order to inform a user (e.g., a broadcaster) of the quality of the metadata. The output generated in accordance with the invention would typically be used to identify and fix metadata issues in systems (e.g., broadcast systems) employed to generate and/or disseminate the bitstream.
In typical implementations in test or measurement products, embodiments of the invention provide output (e.g., data indicative of a single number) indicative of the quality (e.g., correctness) of multiple metadata parameters included in an audio bitstream (e.g., an encoded audio bitstream that has been or is to be broadcast or otherwise disseminated), and optionally also output indicative of detailed information about the quality of each of two or more metadata parameters of the bitstream. The output is useful to enable or assist a user (e.g., a broadcaster) to diagnose where problems occur within a system which generates and/or disseminates the bitstream (e.g., a broadcast chain).
Although the invention is not limited to use with AC-3 encoded audio, for convenience it will be described in embodiments in which it assesses metadata of an AC-3 encoded audio bitstream. An AC-3 encoded bitstream comprises metadata and one to six channels of audio content. The audio content is audio data that has been compressed using perceptual audio coding. The metadata includes several audio metadata parameters (described below) that are intended for use in changing the sound of a program delivered to a listening environment.
Details of AC-3 (also known as Dolby Digital) coding are well known and are set forth in many published references, including the following:
ATSC Standard A52/A: Digital Audio Compression Standard (AC-3), Revision A, Advanced Television Systems Committee, 20 Aug. 2001;
“Flexible Perceptual Coding for Audio Transmission and Storage,” by Craig C. Todd, et al., 96th Convention of the Audio Engineering Society, Feb. 26, 1994, Preprint 3796;
“Design and Implementation of AC-3 Coders,” by Steve Vernon, IEEE Trans. Consumer Electronics, Vol. 41, No. 3, August 1995;
“The AC-3 Multichannel Coder” by Mark Davis, Audio Engineering Society Preprint 3774, 95th AES Convention, October, 1993;
“High Quality, Low-Rate Audio Transform Coding for Transmission and Multimedia Applications,” by Bosi et al, Audio Engineering Society Preprint 3365, 93rd AES Convention, October, 1992; and
U.S. Pat. Nos. 5,583,962; 5,632,005; 5,633,981; 5,727,119; and 6,021,386.
Details of Dolby Digital Plus coding are set forth in “Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System,” AES Convention Paper 6196, 117th AES Convention, Oct. 28, 2004.
Details of Dolby E coding are set forth in “Efficient Bit Allocation, Quantization, and Coding in an Audio Distribution System”, AES Preprint 5068, 107th AES Conference, August 1999, and “Professional Audio Coder Optimized for Use with Video”, AES Preprint 5033, 107th AES Conference, August 1999.
Details of MPEG-2 AAC coding are also well known and are set forth in ISO/IEC 13818-7:1997(E) “Information technology — Generic coding of moving pictures and associated audio information — Part 7: Advanced Audio Coding (AAC),” International Standards Organization (April 1997); “MP3 and AAC Explained” by Karlheinz Brandenburg, AES 17th International Conference on High Quality Audio Coding, August 1999; and “ISO/IEC MPEG-2 Advanced Audio Coding” by Bosi, et al., AES preprint 4382, 101st AES Convention, October 1996.
Each frame of an AC-3 encoded audio bitstream contains audio content and metadata for 1536 samples of digital audio. For a sampling rate of 48 kHz, this represents 32 milliseconds of digital audio or a rate of 31.25 frames per second of audio.
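By way of illustration only, the frame timing stated above follows directly from the frame size and the sampling rate. The following minimal sketch reproduces the arithmetic; the function names are hypothetical and are not part of any AC-3 specification or library.

```python
from fractions import Fraction

# Samples of digital audio represented by one AC-3 frame (from the text above).
SAMPLES_PER_FRAME = 1536


def frame_duration_ms(sample_rate_hz: int) -> Fraction:
    """Duration of one frame in milliseconds at the given sampling rate."""
    return Fraction(SAMPLES_PER_FRAME * 1000, sample_rate_hz)


def frames_per_second(sample_rate_hz: int) -> Fraction:
    """Number of frames needed to carry one second of audio."""
    return Fraction(sample_rate_hz, SAMPLES_PER_FRAME)


print(float(frame_duration_ms(48000)))  # 32.0
print(float(frames_per_second(48000)))  # 31.25
```

Exact rational arithmetic is used so that the results for other sampling rates are not affected by floating-point rounding.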
Each AC-3 frame is divided into sections, including: a Synchronization Information (SI) section which contains a synchronization word (SW) and the first of two error correction words (CRC1); a Bitstream Information (BSI) section which contains most of the metadata; six Audio Blocks (AB0 to AB5) which contain data compressed audio content (and can contain metadata); waste bits (W) which contain any unused bits left over after the audio content is compressed; an Auxiliary (AUX) information section which contains more metadata; and the second of two error correction words (CRC2). AC-3 frames and the sections of an AC-3 frame are described in more detail below.
In an AC-3 bitstream there are several audio metadata parameters that are specifically intended for use in changing the sound of the program delivered to a listening environment. Three of the metadata parameters relate to playback signal level and dynamic range: DIALNORM, COMPR and DYNRNG.
The DIALNORM parameter is intended to indicate the mean level of dialog occurring in an audio program, and is used to determine audio playback signal level. During playback of a bitstream comprising a sequence of different audio program segments (each having a different DIALNORM parameter), an AC-3 decoder uses the DIALNORM parameter of each segment to modify the playback level or loudness of the segment such that the perceived loudness of the dialog of the sequence of segments is at a consistent level. Each encoded audio segment (item) in a sequence of encoded audio items would (in general) have a different DIALNORM parameter, and the decoder would scale the level of each of the items such that the playback level or loudness of the dialog for each item is the same or very similar, although this might require application of different amounts of gain to different ones of the items during playback.
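By way of illustration only, the level scaling described above can be sketched as follows. The sketch assumes, as is commonly described for AC-3 decoders, that DIALNORM is expressed as a level between −1 dB and −31 dB and that the decoder normalizes dialog to a −31 dB reference level. That reference level, the function names, and the example segment values are assumptions made for this sketch and are not taken from the present disclosure.

```python
# Assumed reference level to which dialog is normalized (an assumption of this sketch).
TARGET_DIALOG_LEVEL_DB = -31


def dialnorm_gain_db(dialnorm_db: int) -> int:
    """Gain in dB applied so that dialog lands at the assumed reference level."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("DIALNORM is assumed to lie between -31 and -1 dB")
    return TARGET_DIALOG_LEVEL_DB - dialnorm_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical sequence of program segments, each with its own DIALNORM value.
segments = {"film": -27, "commercial": -20, "news": -31}

for name, dialnorm in segments.items():
    gain = dialnorm_gain_db(dialnorm)
    print(f"{name}: DIALNORM {dialnorm} dB, gain {gain} dB, "
          f"scale factor {db_to_linear(gain):.3f}")
```

In this sketch a louder item (a DIALNORM value closer to 0 dB) receives more attenuation, so that the dialog of every item is reproduced at about the same level.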
The COMPR and DYNRNG parameters (sometimes referred to hereinafter as “dynamic range compression” or “dynamic range control” parameters) are used to determine dynamic range of the audio playback signal. One or neither, but not both, of the COMPR and DYNRNG parameters is used in decoding, depending on a decoding mode.
DIALNORM typically is set by a user, and is not generated automatically, although there is a default DIALNORM value if no value is set by the user. For example, a content creator may make loudness measurements with a device external to an AC-3 encoder and then transfer the result (indicative of the loudness of the spoken dialog of an audio program) to the encoder to set the DIALNORM value. Thus, there is reliance on the content creator to set the DIALNORM parameter correctly. The COMPR and DYNRNG parameters, although related to the DIALNORM parameter, are typically calculated automatically during encoding in response to a user-set DIALNORM parameter value and one of a number of dynamic range compression profiles (or no profile, which results in application of DIALNORM but allows reproduction of the full dynamic range).
Other metadata parameters of an AC-3 bitstream include “downmixing” parameters (CLEV, CMIXLEV, SLEV, SURMIXLEV, MIXLEVEL and MIXLEVEL2) and parameters indicative of the number of audio channels of the bitstream (e.g., ACMOD and BSMOD). The downmixing metadata provides instructions to a decoder for downmixing an original 5.1 channels of audio content to a smaller number of reproduction channels.
The DIALNORM parameter allows for uniform reproduction of spoken dialog when decoding an AC-3 bitstream, e.g., to maintain a uniform subjective level of spoken dialog in the reproduced sound perceived by a listener. The reproduction system gain becomes a function of both the listener's desired reproduction sound pressure level for dialog, and the DIALNORM value. An AC-3 decoder typically employs the DIALNORM value in the digital domain within the decoder to scale gain, which results in adjustment of the playback gain.
There are several different reasons why the DIALNORM parameter in an AC-3 bitstream may be incorrect. First, each AC-3 encoder has a default DIALNORM value that is used during the generation of the bitstream if a DIALNORM value is not set by the content creator. This default value, commonly chosen as −27 dB, may be substantially different from the actual dialog loudness level of the audio. Second, even if a content creator measures loudness and sets the DIALNORM value accordingly, a loudness measurement algorithm or meter may have been used that does not conform to the recommended AC-3 loudness measurement method, resulting in an incorrect DIALNORM value. Third, even if an AC-3 bitstream has been created with the DIALNORM value measured and set correctly by the content creator, it may have been changed to an incorrect value during transmission and/or storage of the bitstream. For example, it is not uncommon in television broadcast applications for AC-3 bitstreams to be decoded, modified and then re-encoded using incorrect DIALNORM metadata information. Thus, a DIALNORM value included in an AC-3 bitstream may be incorrect or inaccurate and therefore may have a negative impact on the quality of the listening experience.
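By way of illustration only, the failure modes described above suggest a simple comparison between the DIALNORM value carried in a bitstream and an independently measured dialog loudness. The following hypothetical sketch is not the method of the present invention or of Crockett. The measured loudness is assumed to be supplied by an external measurement, and the 2 dB tolerance, the function name, and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass

# Commonly chosen encoder default, per the text above.
DEFAULT_DIALNORM_DB = -27


@dataclass
class DialnormCheck:
    stored_db: int          # DIALNORM value carried in the bitstream
    measured_db: float      # dialog loudness measured independently (assumed given)
    error_db: float         # stored minus measured
    within_tolerance: bool  # True if the error magnitude does not exceed the tolerance
    equals_default: bool    # True if the stored value equals the common default


def check_dialnorm(stored_db: int, measured_db: float,
                   tolerance_db: float = 2.0) -> DialnormCheck:
    """Compare a stored DIALNORM value with a measured dialog loudness.

    The tolerance is an arbitrary value chosen for illustration.
    """
    error = stored_db - measured_db
    return DialnormCheck(
        stored_db=stored_db,
        measured_db=measured_db,
        error_db=error,
        within_tolerance=abs(error) <= tolerance_db,
        equals_default=(stored_db == DEFAULT_DIALNORM_DB),
    )


# Hypothetical item whose DIALNORM was left at the default although the
# dialog was measured as considerably louder.
result = check_dialnorm(stored_db=-27, measured_db=-21.5)
print(result)
```

A stored value that equals the default and also differs substantially from the measured loudness may indicate that DIALNORM was never set by the content creator, although a value of −27 dB can of course also be correct.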
There is a need for a way to assess the quality of multiple parameters (e.g., the DIALNORM value and at least one other metadata parameter) in an AC-3 bitstream (e.g., to assess whether they have been set correctly, and have not changed during distribution and transmission) and provide output indicative of the metadata quality (e.g., output usable by broadcasters or other users to identify and fix metadata issues in their systems). More generally, there is a need for a way to assess whether multiple metadata parameters in an audio bitstream are correct (e.g., have been set correctly by a content creator or generated correctly during encoding, and have not changed during distribution and transmission) and provide output indicative of the quality of the metadata parameters (e.g., output usable by broadcasters or other users to identify and fix metadata issues in systems which generate or disseminate such a bitstream).