Portable handheld devices, e.g. PDAs, smart phones, mobile phones, and portable media players, typically comprise audio and/or video rendering capabilities and have become important entertainment platforms. This development is driven by the growing penetration of wireless or wireline transmission capabilities into such devices. Owing to support for media coding, transmission and/or storage formats, such as the High-Efficiency Advanced Audio Coding (HE-AAC) format, media content can be continuously downloaded and stored onto the portable handheld devices, thereby providing a virtually unlimited amount of media content.
HE-AAC is a lossy data compression scheme for digital audio, defined as an MPEG-4 Audio profile in ISO/IEC 14496-3. It is an extension of Low Complexity AAC (AAC LC) optimized for low-bitrate applications such as audio streaming. The HE-AAC version 1 profile (HE-AAC v1) uses spectral band replication (SBR) to enhance compression efficiency in the frequency domain. The HE-AAC version 2 profile (HE-AAC v2) couples SBR with Parametric Stereo (PS) to enhance the compression efficiency of stereo signals. HE-AAC is a standardized and improved version of the AACplus codec.
With the introduction of digital broadcasting, the concept of time-varying metadata was established, which makes it possible to control gain values at the receiving end in order to tailor content to a specific listening environment. An example is the metadata carried in Dolby Digital, which includes general loudness normalization information for dialogue (“dialnorm”). It should be noted that throughout this specification and in the claims, references to Dolby Digital shall be understood to encompass both the Dolby Digital and Dolby Digital Plus coding systems.
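The kind of receiver-side gain control that dialnorm enables can be sketched as follows. This is a minimal illustration, assuming the commonly documented convention in which dialnorm signals the average dialogue level in the range -1 to -31 dBFS and the decoder attenuates the program so that dialogue lands at a reference level of -31 dBFS. The function names are hypothetical and are not part of any Dolby API.

```python
# Assumed dialogue reference level of the decoder (dBFS).
REFERENCE_DBFS = -31.0


def dialnorm_gain_db(dialnorm_dbfs: float) -> float:
    """Gain in dB that moves dialogue from its signalled level to the reference."""
    if not -31.0 <= dialnorm_dbfs <= -1.0:
        raise ValueError("dialnorm is expected in the range -31 to -1 dBFS")
    return REFERENCE_DBFS - dialnorm_dbfs


def apply_gain(samples, gain_db):
    """Scale linear PCM samples by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


if __name__ == "__main__":
    gain = dialnorm_gain_db(-24.0)  # dialogue signalled at -24 dBFS
    print(gain)                      # -7.0: attenuate by 7 dB
    print(apply_gain([0.5, -0.25], gain))
```

Under this convention, content whose dialogue already sits at -31 dBFS passes unchanged, while louder content is attenuated by the difference.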
One way to ensure consistent loudness levels across different content types and media formats is loudness normalization. A prerequisite for loudness normalization is an estimate of the signal loudness. One approach to loudness estimation is proposed in the ITU-R BS.1770-1 recommendation.
The ITU-R BS.1770-1 recommendation describes a method for measuring the loudness of digital audio that takes a psychoacoustic model of human hearing into account. It proposes preprocessing the audio signal of each channel with a filter that models the acoustic effects of the head, followed by a high-pass filter. The power of each filtered signal is then estimated over the measurement interval. For multichannel audio signals, the loudness is calculated as the logarithm of the weighted sum of the estimated power values of all channels.
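The measurement steps above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: it assumes a 48 kHz sampling rate and uses the two biquad coefficient sets that BS.1770 publishes for that rate (a shelving pre-filter modeling the head, and the "RLB" high-pass filter), the mean square as the power estimate, and the loudness formula L = -0.691 + 10·log10(Σ G_i·z_i), where z_i is the mean-square power of channel i and G_i its weight (1.0 for left, right and center, 1.41 for the surround channels). The function names are hypothetical.

```python
import math

# BS.1770 K-weighting biquad coefficients (b0, b1, b2) and (a0, a1, a2).
# These values apply to a 48 kHz sampling rate only.
PRE_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
PRE_A = (1.0, -1.69065929318241, 0.73248077421585)
RLB_B = (1.0, -2.0, 1.0)
RLB_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x, b, a):
    """Direct-form I second-order IIR filter with a0 normalized to 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def k_weight(x):
    """Head-modeling shelving pre-filter followed by the RLB high-pass filter."""
    return biquad(biquad(x, PRE_B, PRE_A), RLB_B, RLB_A)


def mean_square(x):
    """Power estimate over the whole measurement interval."""
    return sum(v * v for v in x) / len(x)


def loudness_lkfs(channels, weights):
    """Logarithm of the weighted sum of K-weighted channel powers.

    channels: equal-length lists of float samples (full scale = 1.0).
    weights: per-channel weights G_i, e.g. 1.0 front, 1.41 surround.
    """
    total = sum(g * mean_square(k_weight(ch)) for ch, g in zip(channels, weights))
    if total <= 0.0:
        return -math.inf  # pure digital silence
    return -0.691 + 10.0 * math.log10(total)


if __name__ == "__main__":
    fs = 48000
    sine = [math.sin(2 * math.pi * 997 * n / fs) for n in range(fs)]
    print(round(loudness_lkfs([sine], [1.0]), 2))
```

With this convention, a full-scale 997 Hz sine in a single front channel reads approximately -3.01 LKFS, because the -0.691 offset compensates the filter gain at that frequency. Other sampling rates would require recomputed coefficients.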
One drawback of the ITU-R BS.1770-1 recommendation is that all signal types are treated equally. A long period of silence lowers the measured loudness, even though the silence may not affect the subjective loudness impression. An example of such a pause is the silence between two songs.
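The size of this effect follows directly from the power average. As a worked example, assume a program of duration T_s whose channel-weighted mean-square power is z, followed by a pause of digital silence of duration T_p. Averaging over the whole interval scales the power by T_s/(T_s + T_p), so the measured loudness drops as follows; for a pause as long as the program itself, the result is about 3 dB lower although nothing audible has changed.

```latex
\begin{align}
L  &= -0.691 + 10\log_{10} z \\
L' &= -0.691 + 10\log_{10}\!\left(\frac{T_s}{T_s + T_p}\, z\right)
    = L - 10\log_{10}\!\left(1 + \frac{T_p}{T_s}\right) \\
T_p = T_s \;&\Rightarrow\; L' = L - 10\log_{10} 2 \approx L - 3.01\ \text{dB}
\end{align}
```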
A simple yet effective way to work around this problem is to take only the subjectively significant parts of the signal into account. This method is called gating. The significance of a signal part may be determined based on a minimum energy, a loudness level threshold, or other criteria. Examples of different gating methods are silence gating, adaptive threshold gating, and speech gating.
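Silence gating and adaptive threshold gating can be sketched as a two-stage selection over per-block powers. The sketch assumes that the signal has already been K-weighted, channel-weighted and split into short blocks, with one mean-square power value per block. The absolute threshold of -70 and the relative offset of -10 are illustrative parameters, and the function names are hypothetical.

```python
import math


def block_loudness(power):
    """Loudness of one block from its K-weighted, channel-weighted mean-square power."""
    if power <= 0.0:
        return -math.inf  # digital silence has no finite loudness
    return -0.691 + 10.0 * math.log10(power)


def gated_loudness(block_powers, abs_gate=-70.0, rel_gate=-10.0):
    """Two-stage gating: absolute (silence) gate, then a relative (adaptive) gate."""
    # Stage 1, silence gating: drop blocks below a fixed loudness threshold.
    kept = [p for p in block_powers if block_loudness(p) > abs_gate]
    if not kept:
        return -math.inf
    # Stage 2, adaptive threshold gating: the threshold follows the signal itself.
    preliminary = block_loudness(sum(kept) / len(kept))
    threshold = preliminary + rel_gate
    kept = [p for p in kept if block_loudness(p) > threshold]
    if not kept:
        return preliminary
    return block_loudness(sum(kept) / len(kept))


if __name__ == "__main__":
    powers = [0.5] * 10 + [0.0] * 10  # music followed by an equally long pause
    ungated = block_loudness(sum(powers) / len(powers))
    print(ungated, gated_loudness(powers))  # roughly -6.71 and -3.70
```

Because the relative threshold is derived from the signal itself, a quiet passage well below the preliminary loudness is excluded even if it is not digital silence.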
Gating typically involves a Discrete Fourier Transform (DFT) and other operations on the audio signal, which add undesirable processing effort. Furthermore, the classification of audio signals into different classes for gating the loudness calculation is typically imperfect, and the resulting misclassifications degrade the loudness calculation.
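To make the processing cost concrete, a DFT-driven speech-gating decision can be sketched as follows. The feature used here (the fraction of block energy between 300 Hz and 3400 Hz) and the 0.7 decision threshold are hypothetical choices for illustration, not a method taken from this specification. A naive DFT is written out so that its O(N²) cost per block is visible; a practical implementation would use an FFT at O(N log N).

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; returns bins 0..N/2 (non-negative frequencies)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def speech_band_ratio(block, fs, lo=300.0, hi=3400.0):
    """Fraction of one-sided spectral energy whose bin frequency lies in [lo, hi] Hz."""
    n = len(block)
    energy = [abs(c) ** 2 for c in dft(block)]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    band = sum(e for k, e in enumerate(energy) if lo <= k * fs / n <= hi)
    return band / total


def is_speech_like(block, fs, threshold=0.7):
    """Hypothetical speech-gating decision for one block."""
    return speech_band_ratio(block, fs) >= threshold
```

Even with an FFT, every block requires a transform plus per-bin work, which is the overhead referred to above; and a single spectral feature of this kind will misclassify, for example, music with strong midrange content as speech.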
Accordingly, there is a need for improved audio classification to enhance gating and loudness calculation. Furthermore, it is desirable to reduce the computational effort of gating.