Details of Dolby Digital coding are set forth in the following references:
ATSC Standard A/52A: Digital Audio Compression Standard (AC-3), Revision A, Advanced Television Systems Committee, 20 Aug. 2001. The A/52A document is available on the World Wide Web at http://www.atsc.org/standards.html.
“Flexible Perceptual Coding for Audio Transmission and Storage,” by Craig C. Todd, et al., 96th Convention of the Audio Engineering Society, Feb. 26, 1994, Preprint 3796.
“Design and Implementation of AC-3 Coders,” by Steve Vernon, IEEE Trans. Consumer Electronics, Vol. 41, No. 3, August 1995.
“The AC-3 Multichannel Coder,” by Mark Davis, Audio Engineering Society Preprint 3774, 95th AES Convention, October 1993.
“High Quality, Low-Rate Audio Transform Coding for Transmission and Multimedia Applications,” by Bosi et al., Audio Engineering Society Preprint 3365, 93rd AES Convention, October 1992.
U.S. Pat. Nos. 5,583,962; 5,632,005; 5,633,981; 5,727,119; 5,909,664; and 6,021,386.
Details of Dolby Digital Plus coding are set forth in “Introduction to Dolby Digital Plus, an Enhancement to the Dolby Digital Coding System,” AES Convention Paper 6196, 117th AES Convention, Oct. 28, 2004.
Details of Dolby E coding are set forth in “Efficient Bit Allocation, Quantization, and Coding in an Audio Distribution System,” AES Preprint 5068, 107th AES Conference, August 1999, and “Professional Audio Coder Optimized for Use with Video,” AES Preprint 5033, 107th AES Conference, August 1999.
An overview of various perceptual coders, including Dolby encoders, MPEG encoders, and others, is set forth in “Overview of MPEG Audio: Current and Future Standards for Low-Bit-Rate Audio Coding,” by Karlheinz Brandenburg and Marina Bosi, J. Audio Eng. Soc., Vol. 45, No. 1/2, January/February 1997.
All of the above-cited references are hereby incorporated by reference, each in its entirety.
Many methods exist for objectively measuring the perceived loudness of audio signals. Examples include weighted power measures (such as LeqA, LeqB, and LeqC) as well as psychoacoustic-based measures of loudness such as “Acoustics—Method for Calculating Loudness Level,” ISO 532 (1975). Weighted power loudness measures process the input audio signal by applying a predetermined filter that emphasizes more perceptually sensitive frequencies while deemphasizing less perceptually sensitive frequencies, and then averaging the power of the filtered signal over a predetermined length of time. Psychoacoustic methods are typically more complex and aim to model the workings of the human ear more closely. This is achieved by dividing the audio signal into frequency bands that mimic the frequency response and sensitivity of the ear, and then manipulating and integrating these bands while taking into account psychoacoustic phenomena such as frequency and temporal masking, as well as the non-linear perception of loudness with varying signal intensity. The aim of all objective loudness measurement methods is to derive a numerical measurement of loudness that closely matches the subjective perception of loudness of an audio signal.
Perceptual coding, or low-bitrate audio coding, is commonly used to compress audio signals for efficient storage, transmission and delivery in applications such as broadcast digital television and the online sale of music. Perceptual coding achieves its efficiency by transforming the audio signal into an information space where both redundancies and signal components that are psychoacoustically masked can be easily discarded. The remaining information is packed into a stream or file of digital information. Typically, measuring the loudness of the audio represented by low-bitrate coded audio requires decoding the audio back into the time domain (e.g., PCM), which can be computationally intensive. However, some low-bitrate perceptual-coded signals contain information that may be useful to a loudness measurement method, thereby saving the computational cost of fully decoding the audio. Dolby Digital (AC-3), Dolby Digital Plus, and Dolby E are among such audio coding systems.
The Dolby Digital, Dolby Digital Plus, and Dolby E low-bitrate perceptual audio coders divide audio signals into overlapping, windowed time segments (or audio coding blocks) that are transformed into a frequency domain representation. The frequency domain representation of spectral coefficients is expressed by an exponential notation comprising sets of an exponent and associated mantissas. The exponents, which function in the manner of scale factors, are packed into the coded audio stream. The mantissas represent the spectral coefficients after they have been normalized by the exponents. The exponents are then passed through a perceptual model of hearing and used to quantize and pack the mantissas into the coded audio stream. Upon decoding, the exponents are unpacked from the coded audio stream and then passed through the same perceptual model to determine how to unpack the mantissas. The mantissas are then unpacked and combined with the exponents to create a frequency domain representation of the audio, which is then decoded and converted back to a time domain representation.
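The exponential notation described above may be illustrated by the following minimal sketch. It assumes a convention in which each spectral coefficient has a magnitude below one and equals its mantissa multiplied by two raised to the negative exponent; the function names and the exponent limit of 24 are illustrative assumptions rather than details taken from any particular coder, and quantization of the mantissas is omitted.

```python
import math

def split_coefficient(coeff, max_exponent=24):
    """Split a coefficient (assumed |coeff| < 1) into (exponent, mantissa)
    such that coeff == mantissa * 2**-exponent.

    The exponent acts as a coarse scale factor; the mantissa carries the
    fine detail. max_exponent is an illustrative limit.
    """
    if coeff == 0.0:
        return max_exponent, 0.0
    # math.frexp gives coeff == m * 2**e with 0.5 <= |m| < 1.
    _, e = math.frexp(coeff)
    exponent = max(0, min(-e, max_exponent))
    mantissa = coeff * 2.0 ** exponent
    return exponent, mantissa

def join_coefficient(exponent, mantissa):
    """Recombine an exponent and a mantissa into a spectral coefficient."""
    return mantissa * 2.0 ** -exponent

exponent, mantissa = split_coefficient(0.3)
print(exponent, mantissa, join_coefficient(exponent, mantissa))
```

In this sketch the exponent alone locates the coefficient to within a factor of two, which is why it can serve as a coarse description of the spectrum.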
Because many loudness measurements include power and power spectrum calculations, computational savings may be achieved by only partially decoding the low-bitrate coded audio and passing the partially decoded information (such as the power spectrum) to the loudness measurement. The invention is useful whenever there is a need to measure loudness but not to decode the audio. It exploits the fact that a loudness measurement can make use of an approximate version of the audio, such approximation not usually being suitable for listening. An aspect of the present invention is the recognition that a coarse representation of the audio, which is available without fully decoding a bitstream in many audio coding systems, can provide an approximation of the audio spectrum that is usable in measuring the loudness of the audio. In Dolby Digital, Dolby Digital Plus, and Dolby E audio coding, exponents provide an approximation of the power spectrum of the audio. Similarly, in certain other coding systems, scale factors, spectral envelopes, and linear predictive coefficients may provide an approximation of the power spectrum of the audio. These and other aspects and advantages of the invention will be better understood as the following summary and description of the invention are read and understood.
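As a hedged illustration of how exponents alone might yield an approximate power spectrum, the sketch below assumes that each coefficient equals its mantissa times two raised to the negative exponent, and that one exponent is available per frequency bin. Any sharing or grouping of exponents across bins or blocks that a real coder may apply is ignored, and the function names are hypothetical.

```python
import math

def approx_power_spectrum(exponents):
    """Approximate per-bin power from exponents only (mantissas ignored).

    With coefficient = mantissa * 2**-exponent, the power of a bin is
    roughly (2**-exponent)**2 = 2**(-2 * exponent).
    """
    return [2.0 ** (-2 * e) for e in exponents]

def to_db(power):
    """Convert a linear power value to decibels."""
    return 10.0 * math.log10(power)

exponents = [1, 4, 7]  # illustrative values, one per frequency bin
approx = approx_power_spectrum(exponents)
print([round(to_db(p), 1) for p in approx])
```

Each unit step in the exponent corresponds to about 6.02 dB of power. If the mantissas are normalized so that their magnitude lies between one half and one, the true power of a bin lies between one quarter of the estimate and the estimate itself, so under that assumption the approximation is accurate to within about 6 dB per bin.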
The invention provides a computationally economical measurement of the perceived loudness of low-bitrate coded audio. This is achieved by only partially decoding the audio material and by passing the partially decoded information to a loudness measurement. The method takes advantage of specific properties of the partially decoded audio information such as the exponents in Dolby Digital, Dolby Digital Plus, and Dolby E audio coding.
A first aspect of the invention measures the loudness of audio encoded in a bitstream that includes data from which an approximation of the power spectrum of the audio can be derived without fully decoding the audio. The measurement is made by deriving the approximation of the power spectrum of the audio from the bitstream without fully decoding the audio, and determining an approximate loudness of the audio in response to the approximation of the power spectrum of the audio.
In another aspect of the invention, the data may include coarse representations of the audio and associated finer representations of the audio, in which case the approximation of the power spectrum of the audio may be derived from the coarse representations of the audio.
In a further aspect of the invention, the audio encoded in a bitstream may be subband encoded audio having a plurality of frequency subbands, each subband having a scale factor and sample data associated therewith, and in which the coarse representations of the audio comprise scale factors and the associated finer representations of the audio comprise sample data associated with each scale factor.
In yet a further aspect of the invention, the scale factor and sample data of each subband may represent spectral coefficients in the subband by exponential notation in which the scale factor comprises an exponent and the associated sample data comprises mantissas.
In yet a further aspect of the invention, the audio encoded in a bitstream may be linear predictive coded audio in which the coarse representations of the audio comprise linear predictive coefficients and the finer representations of the audio comprise excitation information associated with the linear predictive coefficients.
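For the linear predictive case, one possible way to obtain an approximate power spectrum from the coefficients alone is to evaluate the all-pole model they define. The sketch below assumes the predictor convention in which a sample is predicted as the weighted sum of past samples, so that the model's power spectrum is the squared gain divided by the squared magnitude of A(z) = 1 - sum of a_k z^-k on the unit circle. The function name, the gain parameter, and the sign convention are assumptions; real coders may differ.

```python
import cmath
import math

def lpc_power_spectrum(lpc_coeffs, gain, num_bins):
    """Evaluate gain**2 / |A(e^{j*omega})|**2 at num_bins frequencies
    spaced uniformly from 0 up to (but not including) pi radians/sample.

    Assumes A(z) = 1 - sum_k lpc_coeffs[k-1] * z**-k (a sign convention
    chosen for illustration) and a stable model with no pole on the
    unit circle.
    """
    spectrum = []
    for i in range(num_bins):
        omega = math.pi * i / num_bins
        a = 1.0 - sum(c * cmath.exp(-1j * omega * (k + 1))
                      for k, c in enumerate(lpc_coeffs))
        spectrum.append(gain ** 2 / abs(a) ** 2)
    return spectrum

# A single coefficient of 0.9 models a strongly low-pass envelope.
print(lpc_power_spectrum([0.9], gain=1.0, num_bins=4))
```

The excitation information is not used here, which mirrors the idea that only the coarse representation is needed for an approximate spectrum.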
In still a further aspect of the invention, the coarse representations of the audio may comprise at least one spectral envelope and the finer representations of the audio may comprise spectral components associated with the at least one spectral envelope.
In still yet a further aspect of the invention, determining an approximate loudness of the audio in response to the approximation of the power spectrum of the audio may include applying a weighted power loudness measure. The weighted power loudness measure may employ a filter that deemphasizes less perceptible frequencies and may average the power of the filtered audio over time.
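A weighted power measure of this kind might be applied directly to an approximate power spectrum, as in the following sketch. It assumes A-weighting (as used by LeqA) applied as a per-bin power gain in the frequency domain, followed by averaging over blocks; the function names, the block layout, and the choice to weight in the frequency domain are illustrative assumptions, and the result is a relative level in decibels rather than a calibrated one.

```python
import math

def a_weight_power(freq_hz):
    """A-weighting curve as a linear power gain (about 1.0 at 1 kHz)."""
    f2 = freq_hz ** 2
    r = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return r ** 2 * 10.0 ** 0.2  # the +2.00 dB normalization, as power

def weighted_loudness_db(blocks, bin_freqs_hz):
    """Weight each block's power spectrum, sum over bins, average over
    blocks, and return the result in dB.

    blocks: list of per-block power spectra (lists of equal length).
    bin_freqs_hz: center frequency of each bin, in Hz.
    """
    weights = [a_weight_power(f) for f in bin_freqs_hz]
    block_powers = [sum(w * p for w, p in zip(weights, block))
                    for block in blocks]
    mean_power = sum(block_powers) / len(block_powers)
    return 10.0 * math.log10(max(mean_power, 1e-20))  # floor avoids log(0)

freqs = [100.0, 1000.0, 4000.0]
blocks = [[0.25, 0.0625, 0.0039], [0.25, 0.25, 0.0039]]
print(round(weighted_loudness_db(blocks, freqs), 2))
```

Because the weighting reduces to a multiplication per bin, no time-domain filtering and hence no full decode is required in this sketch.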
In yet another aspect of the invention, determining an approximate loudness of the audio in response to the approximation of the power spectrum of the audio may include applying a psychoacoustic loudness measure. The psychoacoustic loudness measure may employ a model of the human ear to determine specific loudness in each of a plurality of frequency bands similar to the critical bands of the human ear. In a subband coder environment, the subbands may be similar to the critical bands of the human ear and the psychoacoustic loudness measure may employ a model of the human ear to determine specific loudness in each of the subbands.
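A heavily simplified sketch of a psychoacoustic measure follows. It assumes that bins are grouped into critical-band-like bands using a commonly cited approximation of the Bark scale, that the summed power in each band stands in for excitation, and that specific loudness follows a compressive power law with an illustrative exponent of 0.23. Masking, spreading between bands, the transmission characteristics of the ear, and the threshold in quiet are all omitted, so this is not an implementation of ISO 532 or of any particular model; all names and constants are assumptions.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def band_excitations(power_spectrum, bin_freqs_hz, num_bands=25):
    """Sum bin powers into bands one Bark wide."""
    bands = [0.0] * num_bands
    for power, freq in zip(power_spectrum, bin_freqs_hz):
        index = min(int(hz_to_bark(freq)), num_bands - 1)
        bands[index] += power
    return bands

def total_loudness(power_spectrum, bin_freqs_hz, exponent=0.23):
    """Sum a compressive specific loudness over all bands.

    The exponent is an illustrative value, not a calibrated one.
    """
    bands = band_excitations(power_spectrum, bin_freqs_hz)
    return sum(excitation ** exponent for excitation in bands)

# The same total power gives a larger result when spread across two
# bands than when concentrated in one.
print(total_loudness([2.0], [1000.0]))
print(total_loudness([1.0, 1.0], [200.0, 3000.0]))
```

The comparison at the end shows why band-wise processing matters: a plain power sum would rate the two inputs as equal, whereas the compressive per-band measure does not.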
Aspects of the invention include methods practicing the above functions, means practicing the functions, apparatus practicing the methods, and a computer program, stored on a computer-readable medium, for causing a computer to perform the methods practicing the above functions.