With the introduction of portable digital media players, the compact disc for music storage, and audio delivery over the Internet, it is now common to store, buy, and distribute music and other audio content in digital audio formats. Digital audio formats let people keep hundreds or thousands of songs on their personal computers (PCs) or portable media players.
Perceptual Transform Coding
Audio coding uses techniques that exploit various perceptual models of human hearing. For example, many weaker tones near strong ones are masked, so they do not need to be coded. Traditional perceptual audio coding exploits this through adaptive quantization of different frequency data: perceptually important frequency data are allocated more bits and thus finer quantization, while less important data receive fewer bits and coarser quantization.
Transform coding, for example, is conventionally known as an efficient scheme for compressing audio signals. The input audio is digitally time sampled. In transform coding, a block of input audio samples is transformed (e.g., via the Modified Discrete Cosine Transform or MDCT, the most widely used transform), processed, and quantized. The transformed coefficients are quantized, for example with a scalar quantizer, according to their perceptual importance (e.g., masking effects and the frequency sensitivity of human hearing).
When a scalar quantizer is used, the importance is mapped to a relative weighting, and the quantizer resolution (step size) for each coefficient is derived from its weight and the global resolution. The global resolution can be determined from the target quality, bit rate, etc. For a given step size, each coefficient is quantized into a level, which is either zero or a non-zero integer value.
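The weighting-to-step-size mapping can be sketched as follows. This is a minimal illustration, not any particular codec's quantizer: it assumes the step size is simply the global step multiplied by a per-coefficient weight (a larger weight meaning a less important coefficient and a coarser step), and the helper names are hypothetical.

```python
import numpy as np

def step_sizes(weights, global_step):
    # Hypothetical mapping: per-coefficient step = global resolution x weight.
    return global_step * np.asarray(weights, dtype=float)

def quantize(coeffs, steps):
    # Uniform scalar quantization: round each coefficient/step to the nearest integer level.
    return np.rint(np.asarray(coeffs, dtype=float) / steps).astype(int)

def dequantize(levels, steps):
    # Decoder side: reconstruct each coefficient as level x step.
    return levels * steps

coeffs = np.array([10.2, -3.9, 0.4, 0.1, -0.2, 2.6])
weights = np.array([1.0, 1.0, 2.0, 4.0, 4.0, 1.0])  # made-up perceptual weights
steps = step_sizes(weights, global_step=1.5)
levels = quantize(coeffs, steps)
print(levels.tolist())  # → [7, -3, 0, 0, 0, 2]
```

The small, heavily weighted coefficients fall to level zero, which is what makes the run-length coding described next effective.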
At lower bitrates, there are typically many more zero-level coefficients than non-zero-level coefficients. These can be coded very efficiently using run-length coding, which may be combined with an entropy coding scheme such as Huffman coding.
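A minimal sketch of run-length coding for quantized levels, assuming a simple (zero_run, level) pair format with the trailing zero run kept separately; the format and function names are illustrative, not taken from a specific codec. In practice the pairs would then be fed to an entropy coder such as a Huffman coder.

```python
def run_level_encode(levels):
    """Turn quantized levels into (zero_run, level) pairs plus a trailing zero count."""
    pairs = []
    run = 0
    for v in levels:
        if v == 0:
            run += 1               # extend the current run of zeros
        else:
            pairs.append((run, v)) # emit zeros-before-this-level and the level itself
            run = 0
    return pairs, run              # trailing zeros (often signaled as an end-of-block code)

def run_level_decode(pairs, trailing):
    """Inverse of run_level_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing)
    return out

levels = [7, 0, 0, 0, -3, 0, 0, 1, 0, 0, 0, 0, 0, 0]
pairs, tail = run_level_encode(levels)
print(pairs, tail)  # → [(0, 7), (3, -3), (2, 1)] 6
```

Fourteen levels collapse into three pairs and one count; the sparser the levels, the larger the saving.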
Overlapping Transforms and Variable Window Frame Sizes
Many audio compression systems use the Modulated Lapped Transform (MLT, also known as the Modified Discrete Cosine Transform or MDCT) to perform the time-frequency analysis in audio transform coding. The MLT reduces the blocking artifacts that quantization introduces into the reconstructed audio signal. More specifically, when non-overlapping blocks are transform coded independently, quantization errors produce discontinuities in the signal at the block boundaries when the decoder reconstructs the audio signal. For audio, this is heard as a periodic clicking effect.
The MLT reduces the blocking effect by overlapping blocks. In the MLT, a “window” of 2M samples from two consecutive blocks undergoes a modulated cosine transform, which returns M transform coefficients. The window is then shifted by M samples and the next set of M transform coefficients is computed. Thus, each window overlaps the last M samples of the previous window. The overlap improves the continuity of the reconstructed samples despite the changes that quantization makes to the transform coefficients.
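The windowing and hop described above can be sketched with a direct (matrix-form, not fast) MDCT. This is a minimal sketch under stated assumptions: it uses a sine window, which is one common choice satisfying the Princen-Bradley condition but is not specified by the text, and the helper names are hypothetical. Each 2M-sample window yields M coefficients; the inverse outputs 2M windowed samples that are overlap-added, and the time-domain aliasing from adjacent windows cancels.

```python
import numpy as np

def mdct_basis(M):
    # Rectangular M x 2M cosine basis of the MDCT.
    n = np.arange(2 * M)
    k = np.arange(M)
    return np.cos(np.pi / M * (n[None, :] + 0.5 + M / 2) * (k[:, None] + 0.5))

def sine_window(M):
    # Assumed window: w[n]^2 + w[n+M]^2 = 1 (Princen-Bradley condition).
    n = np.arange(2 * M)
    return np.sin(np.pi * (n + 0.5) / (2 * M))

def mdct(block, M):
    # 2M input samples -> M coefficients.
    return mdct_basis(M) @ (sine_window(M) * block)

def imdct(coeffs, M):
    # M coefficients -> 2M windowed samples, to be overlap-added with neighbors.
    return (2.0 / M) * sine_window(M) * (mdct_basis(M).T @ coeffs)

def analyze_synthesize(x, M):
    """Slide a 2M window by M samples, transform, inverse-transform, overlap-add.

    Assumes len(x) is a multiple of M; zero padding of M samples at each end
    lets the first and last blocks also be covered by two windows.
    """
    padded = np.concatenate([np.zeros(M), np.asarray(x, dtype=float), np.zeros(M)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        X = mdct(padded[start:start + 2 * M], M)   # M coefficients per window
        out[start:start + 2 * M] += imdct(X, M)    # aliasing cancels in the overlap
    return out[M:M + len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(8 * 16)
y = analyze_synthesize(x, M=16)
print(np.allclose(x, y))  # → True
```

Without quantization the reconstruction is exact to floating-point precision; quantizing `X` inside the loop would show errors that are smoothed across window boundaries rather than producing hard block edges.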
Some audio compression systems vary the window size over time to accommodate the changing nature of the audio. Audio coders typically partition the input audio signal into fixed-size “frames,” each of which is a unit of coding (e.g., coding tables and/or parameters may be sent in a header section of each frame). In audio compression systems using a time-varying MLT, each frame may contain one or more “windows” of variable size, where each window is a unit of the MLT. In general, larger windows benefit coding efficiency, whereas smaller windows provide better time resolution. Accordingly, decisions about where to place windows and which sizes to use are critical to the compression performance and auditory quality of the encoded signal.
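The trade-off between long and short windows can be illustrated with a deliberately simple decision rule. This is a hypothetical sketch, not the method of any particular coder: it splits a frame into equal sub-blocks, flags a transient when a sub-block's energy jumps above a made-up threshold ratio relative to the preceding sub-blocks, and then chooses either one long window or several short windows that still tile the frame. Real coders use more window sizes and more careful detectors.

```python
import numpy as np

def choose_window_sizes(frame, n_sub=8, ratio=10.0, eps=1e-9):
    """Return a list of window sizes that exactly tiles one frame.

    Hypothetical rule: if any sub-block's energy exceeds `ratio` times the mean
    energy of the sub-blocks before it, use n_sub short windows (better time
    resolution); otherwise use one long window (better coding efficiency).
    """
    frame = np.asarray(frame, dtype=float)
    assert len(frame) % n_sub == 0, "frame length must be divisible by n_sub"
    sub_len = len(frame) // n_sub
    energy = (frame.reshape(n_sub, sub_len) ** 2).sum(axis=1)
    for i in range(1, n_sub):
        if energy[i] > ratio * (energy[:i].mean() + eps):
            return [sub_len] * n_sub
    return [len(frame)]

t = np.arange(2048)
steady = np.sin(2 * np.pi * 1000 * t / 44100)       # stationary tone
attack = np.where(t >= 1024, 1.0, 0.0)              # silence, then a sudden onset
print(choose_window_sizes(steady))                   # → [2048]
print(choose_window_sizes(attack))                   # → [256, 256, 256, 256, 256, 256, 256, 256]
```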
One problem in audio coding is commonly referred to as “pre-echo.” Pre-echo occurs when the audio undergoes a sudden change (referred to as a “transient”). In transform coding, particular frequency coefficients are commonly quantized (i.e., reduced in resolution). When the transform coefficients are later inverse-transformed to reproduce the audio signal, this quantization introduces noise that is spread over the entire block in the time domain, which inherently smears the noise fairly uniformly within the coding frame. The noise, which is generally tolerable for some parts of the frame, can be audible and disastrous to auditory quality during portions of the frame where the masking level is low. In practice, this effect is most prominent when a signal has a sharp attack immediately following a region of low energy, hence the term “pre-echo.” “Post-echo,” which occurs when the signal transitions from high to low energy, is less of a problem for perceived auditory quality because of a property of the human auditory system: masking that follows a loud sound (post-masking) lasts considerably longer than masking that precedes it.
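The noise-spreading mechanism behind pre-echo can be reproduced numerically. For brevity this sketch uses a non-overlapping orthonormal DCT-II block transform as a stand-in for the MLT (a deliberate simplification), and the block length, onset position, and step size are arbitrary illustrative values. The input is exactly zero before the attack, yet after coarse quantization of the coefficients the reconstruction contains noise throughout that silent region.

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix: C @ C.T is the identity, so C.T inverts it.
    n = np.arange(N)
    k = n[:, None]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (n[None, :] + 0.5) * k / N)
    C[0, :] /= np.sqrt(2.0)
    return C

N, onset, step = 256, 192, 0.5
t = np.arange(N)
x = np.where(t >= onset, np.sin(2 * np.pi * 0.1 * t), 0.0)  # silence, then a sharp attack

C = dct_matrix(N)
levels = np.rint((C @ x) / step)      # coarse uniform quantization of the coefficients
x_hat = C.T @ (levels * step)         # inverse transform of the dequantized coefficients
err = x_hat - x

pre_noise = np.abs(err[:onset]).max()   # noise where the original signal is silent
post_noise = np.abs(err[onset:]).max()  # noise under the loud part
print(f"max noise before attack: {pre_noise:.3f}, after: {post_noise:.3f}")
```

The noise before the attack is not masked by any signal, which is exactly the audible pre-echo; shorter windows confine the spread to a shorter span around the transient.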
Overlapped Transforms and Lossless Coding
Overlapping transforms can also be used for lossless coding. Many lossless coding techniques operate on audio signal data in the time domain. However, lossless coding can also be performed in the frequency domain by simply applying entropy coding or other lossless coding to the transform coefficients produced by the lapped transform, without quantization. Such frequency-domain lossless coding makes it easier to derive lossy and lossless compressed versions of an audio signal together. However, frequency-domain lossless compression requires the transform to be reversible. Furthermore, because reversibility requires computations that are reproduced exactly at the encoder and decoder, the transform should have an integer implementation.
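What "reversible with an integer implementation" means can be shown with the simplest possible case. This sketch uses the S-transform (an integer lifting form of the two-point Haar transform) as a swapped-in illustration; it is not an MLT and not the transform the text proposes. Because both directions use the same integer floor operations, the inverse recovers the input exactly, so the integer outputs could be entropy coded losslessly.

```python
def s_transform(x):
    """Reversible integer two-point transform (lifting form of the Haar transform).

    Assumes len(x) is even. Returns (low, high) integer sequences.
    """
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = a - b          # predict step: integer difference
        s = b + (d >> 1)   # update step: floor((a + b) / 2); >> floors negatives too
        low.append(s)
        high.append(d)
    return low, high

def inverse_s_transform(low, high):
    """Exact inverse: undo the lifting steps in reverse order with identical rounding."""
    x = []
    for s, d in zip(low, high):
        b = s - (d >> 1)
        a = d + b
        x.extend([a, b])
    return x

samples = [5, 3, -2, -7, 10, 10]
low, high = s_transform(samples)
print(low, high)                        # → [4, -5, 10] [2, 5, 0]
print(inverse_s_transform(low, high))   # → [5, 3, -2, -7, 10, 10]
```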
Reversible integer-to-integer transforms pose a difficulty: they require implementations using square transform matrices with a determinant of 1, which is incompatible with overlapping transform designs whose implementations use rectangular sub-blocks in the transform matrix. Previously known reversible overlapping transforms have also typically required the same subframe configuration for all subframes of the audio signal, which is incompatible with audio codecs that employ variable subframe sizes, for example to reduce pre-echo as discussed above.
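The determinant-1 requirement is commonly met by factoring a square matrix into lifting steps, each a unit triangular matrix (determinant 1) whose off-diagonal product can be rounded to an integer and still be undone exactly. As a minimal sketch of that general idea (not the construction the text goes on to describe), a 2×2 rotation factors into three such steps, using the identity R(θ) = [[1, −tan(θ/2)], [0, 1]] · [[1, 0], [sin θ, 1]] · [[1, −tan(θ/2)], [0, 1]]. The function names are hypothetical.

```python
import math

def lift_rotate(x, y, theta):
    """Integer-to-integer approximation of rotating (x, y) by theta.

    Three lifting steps, each adding a rounded multiple of one coordinate to
    the other, so each step is exactly invertible on integers.
    """
    t = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x = x + round(t * y)
    y = y + round(s * x)
    x = x + round(t * y)
    return x, y

def inverse_lift_rotate(x, y, theta):
    """Undo the lifting steps in reverse order with the same rounding."""
    t = -math.tan(theta / 2.0)
    s = math.sin(theta)
    x = x - round(t * y)
    y = y - round(s * x)
    x = x - round(t * y)
    return x, y

print(lift_rotate(1000, 0, math.pi / 6))          # → (866, 500), close to (866.03, 500.0)
print(inverse_lift_rotate(866, 500, math.pi / 6)) # → (1000, 0)
```

The output only approximates the true rotation, but the approximation is perfectly reversible, which is the property lossless coding needs.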
In previous reversible transform implementations, transforms such as the MDCT and MLT are treated as 2N×N transforms, and the 2N×N transform is made reversible. While this procedure works when all subframes are the same size, it does not work well when subframe sizes vary. In addition, overlapped orthogonal transforms have implementations that are hard to understand and modify.