Many methods exist for objectively measuring the perceived loudness of audio signals. Examples include A-, B- and C-weighted power measures as well as psychoacoustic models of loudness such as “Acoustics—Method for calculating loudness level,” ISO 532 (1975). Weighted power measures operate by taking the input audio signal, applying a known filter that emphasizes perceptually more sensitive frequencies while deemphasizing perceptually less sensitive frequencies, and then averaging the power of the filtered signal over a predetermined length of time. Psychoacoustic methods are typically more complex and aim to better model the workings of the human ear. They divide the signal into frequency bands that mimic the frequency response and sensitivity of the ear, and then manipulate and integrate these bands, taking into account psychoacoustic phenomena such as frequency and temporal masking, as well as the non-linear perception of loudness with varying signal intensity. The goal of all methods is to derive a numerical measurement that closely matches the subjective impression of the audio signal.
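As an illustration of a weighted power measure, the following minimal sketch applies A-weighting in the frequency domain and averages the weighted power over one block. It is not taken from any particular standard implementation: the function names, the 48 kHz sample rate, and the 480-sample block are assumptions chosen so that the test tones fall exactly on DFT bins. The gain curve is the commonly published analog A-weighting formula, normalized to approximately 0 dB at 1 kHz.

```python
import cmath
import math

def a_weight_gain(f_hz):
    """Linear A-weighting gain at f_hz (commonly published analog formula)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2 dB offset normalizes the gain to roughly 1.0 at 1 kHz.
    return (num / den) * 10.0 ** (2.0 / 20.0)

def a_weighted_level_db(x, fs):
    """A-weighted mean power of block x, in dB relative to a power of 1.0."""
    n_samples = len(x)
    total = 0.0
    for k in range(n_samples):
        # Direct O(N^2) DFT; an FFT would be used in practice.
        xk = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                 for n in range(n_samples))
        # Bins above N/2 mirror the lower bins for a real signal.
        f = min(k, n_samples - k) * fs / n_samples
        total += (a_weight_gain(f) * abs(xk)) ** 2
    # Parseval: mean square of x equals sum(|X[k]|^2) / N^2.
    mean_power = total / (n_samples * n_samples)
    return 10.0 * math.log10(mean_power)

def tone(freq_hz, fs, n_samples):
    """Unit-amplitude sine tone (hypothetical test signal)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

if __name__ == "__main__":
    fs, n = 48000, 480  # assumed values; bin spacing is 100 Hz
    for f in (100.0, 1000.0):
        print(f, round(a_weighted_level_db(tone(f, fs, n), fs), 2))
```

Two tones of equal amplitude produce different weighted levels: the 100 Hz tone is attenuated by roughly 19 dB relative to the 1 kHz tone, reflecting the lower sensitivity the weighting curve assigns to low frequencies.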
Many loudness measurement methods, especially the psychoacoustic methods, perform a spectral analysis of the audio signal. That is, the audio signal is converted from a time domain representation to a frequency domain representation. This is commonly and most efficiently performed using the Discrete Fourier Transform (DFT), usually implemented as a Fast Fourier Transform (FFT), whose properties, uses and limitations are well understood. The reverse of the Discrete Fourier Transform is called the Inverse Discrete Fourier Transform (IDFT), usually implemented as an Inverse Fast Fourier Transform (IFFT).
Another time-to-frequency transform, similar to the Fourier Transform, is the Discrete Cosine Transform (DCT), which in audio applications is usually used in the form of the Modified Discrete Cosine Transform (MDCT). This transform provides a more compact spectral representation of a signal and is widely used in low bit-rate audio coding or compression systems such as Dolby Digital and MPEG2-AAC, as well as image compression systems such as MPEG2 video and JPEG. In audio compression algorithms, the audio signal is separated into overlapping temporal segments, and the MDCT of each segment is quantized and packed into a bitstream during encoding. During decoding, the segments are each unpacked and passed through an inverse MDCT (IMDCT) to recreate the time domain signal. Similarly, in image compression algorithms, an image is separated into spatial segments and, for each segment, the quantized DCT is packed into a bitstream.
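The overlapped encode and decode path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the transform of any specific codec: it uses a common textbook MDCT definition, a sine window applied at both analysis and synthesis, a 2/N inverse scaling, and no quantization. All function names are hypothetical.

```python
import math

def sine_window(n_coef):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef))
            for n in range(2 * n_coef)]

def mdct(block, window):
    """Windowed MDCT: 2N input samples produce N coefficients."""
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2.0
    return [sum(window[n] * block[n]
                * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for n in range(2 * n_coef))
            for k in range(n_coef)]

def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients produce 2N aliased samples."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2.0
    return [window[n] * (2.0 / n_coef)
            * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                  for k in range(n_coef))
            for n in range(2 * n_coef)]

def round_trip(signal, n_coef):
    """Transform 50%-overlapping blocks and overlap-add the inverses."""
    window = sine_window(n_coef)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        coeffs = mdct(signal[start:start + 2 * n_coef], window)
        # A codec would quantize and pack coeffs here, then unpack them.
        segment = imdct(coeffs, window)
        for n in range(2 * n_coef):
            out[start + n] += segment[n]
    return out

if __name__ == "__main__":
    n_coef = 8
    x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(5 * n_coef)]
    y = round_trip(x, n_coef)
    # The first and last N samples are covered by only one block.
    err = max(abs(a - b) for a, b in zip(x[n_coef:-n_coef], y[n_coef:-n_coef]))
    print("max interior error:", err)
```

Each inverse block contains time-domain aliasing on its own; the aliasing cancels only when adjacent blocks are overlap-added with unmodified coefficients, which is why the interior samples are recovered while the edges are not.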
Properties of the MDCT (and similarly the DCT) lead to difficulties when this transform is used for spectral analysis and modification. First, unlike the DFT, which contains both sine and cosine quadrature components, the MDCT contains only the cosine component. When successive, overlapping MDCTs are used to analyze a substantially steady state signal, successive MDCT values fluctuate and thus do not accurately represent the steady state nature of the signal. Second, the MDCT contains temporal aliasing that does not completely cancel if successive MDCT spectral values are substantially modified. More details are provided in the following section.
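The first difficulty can be observed numerically. The sketch below analyzes a steady cosine tone with successive overlapping blocks and compares the power in one MDCT bin against the power in the nearby DFT bin of the same windowed block. The block size, tone frequency, bin index, sine window, and function names are all illustrative assumptions, and the MDCT definition is a common textbook form rather than that of any specific codec.

```python
import cmath
import math

N_COEF = 64                        # MDCT size; blocks hold 2N samples, hop is N
OMEGA = math.pi * 10.25 / N_COEF   # tone frequency in rad/sample (arbitrary)

def sine_window(n_coef):
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef))
            for n in range(2 * n_coef)]

def mdct_bin(block, window, k):
    """Single MDCT coefficient k of a windowed 2N-sample block (real valued)."""
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2.0
    return sum(window[n] * block[n]
               * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
               for n in range(len(block)))

def dft_bin(block, window, m):
    """Single DFT coefficient m of the same windowed block (complex valued)."""
    length = len(block)
    return sum(window[n] * block[n] * cmath.exp(-2j * math.pi * m * n / length)
               for n in range(length))

def bin_powers(num_blocks=16, k=10):
    """Per-block power in MDCT bin k and DFT bin k for a steady tone."""
    window = sine_window(N_COEF)
    x = [math.cos(OMEGA * n) for n in range((num_blocks + 1) * N_COEF)]
    mdct_pow, dft_pow = [], []
    for t in range(num_blocks):
        block = x[t * N_COEF:t * N_COEF + 2 * N_COEF]
        mdct_pow.append(mdct_bin(block, window, k) ** 2)
        dft_pow.append(abs(dft_bin(block, window, k)) ** 2)
    return mdct_pow, dft_pow

def fluctuation(values):
    """Peak-to-peak variation relative to the mean."""
    return (max(values) - min(values)) / (sum(values) / len(values))

if __name__ == "__main__":
    mdct_pow, dft_pow = bin_powers()
    print("DFT fluctuation: ", fluctuation(dft_pow))
    print("MDCT fluctuation:", fluctuation(mdct_pow))
```

Because the DFT coefficient is complex, a change in the phase of the tone from block to block rotates the coefficient without changing its magnitude. The MDCT coefficient is real, so the same phase change alters its value directly, and its squared value varies from block to block even though the signal is steady.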
Because of the difficulties of processing MDCT domain signals directly, the MDCT signal is typically converted back to the time domain, where processing can be performed using FFTs and IFFTs or by direct time domain methods. In the case of frequency domain processing, the additional forward and inverse FFTs impose a significant increase in computational complexity, and it would be beneficial to dispense with these computations and process the MDCT spectrum directly. For example, when decoding an MDCT-based audio signal such as Dolby Digital, it would be beneficial to perform loudness measurement and spectral modification to adjust the loudness directly on the MDCT spectral values, prior to the inverse MDCT and without the need for FFTs and IFFTs.
Many useful objective measurements of loudness may be computed from the power spectrum of a signal, which is easily estimated from the DFT. It will be demonstrated that a suitable estimate of the power spectrum may also be computed from the MDCT. The accuracy of the estimate generated from the MDCT is a function of the smoothing time constant utilized, and it will be shown that the use of smoothing time constants commensurate with the integration time of human loudness perception produces an estimate that is sufficiently accurate for most loudness measurement applications. In addition to measurement, one may wish to modify the loudness of an audio signal by applying a filter in the MDCT domain. In general, such filtering introduces artifacts to the processed audio, but it will be shown that if the filter varies smoothly across frequency, then the artifacts become perceptually negligible. The types of filtering associated with the proposed loudness modification are constrained to be smooth across frequency and may therefore be applied in the MDCT domain.
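The dependence of the MDCT-based estimate on the smoothing time constant can be illustrated with a small experiment. The sketch below smooths the squared value of one MDCT bin across blocks with a one-pole recursive averager and reports how much the result still fluctuates for a steady tone. The one-pole smoother, the coefficients 0.9 and 0.98, the tone, and the block size are assumptions chosen for illustration; they are not values given in the text and are not tied to any specific perceptual integration time.

```python
import math

N_COEF = 64                        # MDCT size; blocks hold 2N samples, hop is N
OMEGA = math.pi * 10.25 / N_COEF   # steady tone frequency in rad/sample (arbitrary)

def mdct_bin_power(t, k, window):
    """Squared MDCT coefficient k for block t of the tone cos(OMEGA * n)."""
    n0 = 0.5 + N_COEF / 2.0
    coef = sum(window[n] * math.cos(OMEGA * (t * N_COEF + n))
               * math.cos(math.pi / N_COEF * (n + n0) * (k + 0.5))
               for n in range(2 * N_COEF))
    return coef * coef

def smooth(values, lam):
    """One-pole smoother: p[t] = lam * p[t-1] + (1 - lam) * values[t]."""
    p, out = 0.0, []
    for v in values:
        p = lam * p + (1.0 - lam) * v
        out.append(p)
    return out

def fluctuation(values):
    """Peak-to-peak variation relative to the mean."""
    return (max(values) - min(values)) / (sum(values) / len(values))

def smoothing_demo(num_blocks=600, k=10, tail=16):
    """Fluctuation of raw and smoothed MDCT power over the last `tail` blocks."""
    window = [math.sin(math.pi * (n + 0.5) / (2 * N_COEF))
              for n in range(2 * N_COEF)]
    raw = [mdct_bin_power(t, k, window) for t in range(num_blocks)]
    # Only the tail is measured so the smoother's start-up transient is excluded.
    return (fluctuation(raw[-tail:]),
            fluctuation(smooth(raw, 0.9)[-tail:]),
            fluctuation(smooth(raw, 0.98)[-tail:]))

if __name__ == "__main__":
    raw_f, f_090, f_098 = smoothing_demo()
    print("raw:", raw_f, "lam=0.9:", f_090, "lam=0.98:", f_098)
```

For this steady tone, a coefficient closer to one (a longer time constant) leaves less residual fluctuation in the estimate, at the cost of slower response to genuine changes in the signal.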