Transform coding is a compression technique used in many audio compression systems. Uncompressed digital audio is typically represented as a stream of amplitude samples of an audio signal taken at regular time intervals. For example, a typical format for audio on compact discs consists of a stream of sixteen-bit samples per channel of the audio (e.g., the original analog audio signal from a microphone) captured at a rate of 44.1 kHz. Each sample is a sixteen-bit number representing the amplitude of the audio signal at the time of capture. Other digital audio systems may use different amplitude and time resolutions for audio sampling.
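To give a sense of scale for the format described above, the following minimal sketch computes the raw data rate of such a sample stream. The function name is illustrative, and the two-channel (stereo) figure is an assumption about the compact disc layout rather than something stated above.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) bit rate of a PCM stream, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed example: 44.1 kHz, sixteen-bit samples, two channels (stereo).
cd_bits_per_second = pcm_bit_rate(44_100, 16, 2)

# Storage needed for one minute of audio, in bytes.
cd_bytes_per_minute = cd_bits_per_second // 8 * 60

print(cd_bits_per_second)   # 1411200
print(cd_bytes_per_minute)  # 10584000
```

At roughly 1.4 megabits per second, a minute of such audio occupies about 10.6 million bytes before any compression.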
Uncompressed digital audio can consume considerable storage and transmission capacity. Transform coding reduces the size of digital audio by transforming the time-domain representation of the audio into a frequency-domain (or similar transform-domain) representation, and then reducing the resolution of certain frequency components that are generally less perceptible. This generally produces much less perceptible degradation of the audio signal than reducing the amplitude or time resolution of the audio in the time domain.
More specifically, a typical transform coding technique divides the uncompressed digital audio's stream of time samples into fixed-size subsets or blocks, each block possibly overlapping with other blocks. A linear transform that performs time-frequency analysis is applied to each block, converting the time-domain audio samples within the block to a set of frequency (or transform) coefficients that generally represent the strength of the audio signal in corresponding frequency bands over the block interval. For compression, the transform coefficients may be selectively quantized (i.e., reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or variable-length coded into a compressed audio data stream. At decoding, the transform coefficients are inverse-transformed to approximately reconstruct the original amplitude/time sampled audio signal.
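The quantization step described above can be illustrated with a uniform scalar quantizer. This is only a sketch under simplifying assumptions: a single step size is applied to every coefficient, whereas a practical coder would typically choose step sizes selectively per frequency band, and the coefficient values are made up for illustration.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level (lower-resolution value)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Decoder side: map integer levels back to approximate coefficients."""
    return [q * step for q in levels]


# Hypothetical transform coefficients for one block.
coeffs = [12.7, -3.2, 0.4, 0.05]
step = 0.5

levels = quantize(coeffs, step)      # small integers, cheap to entropy code
approx = dequantize(levels, step)    # what the decoder recovers

# The reconstruction error per coefficient never exceeds half a step.
max_error = max(abs(c - a) for c, a in zip(coeffs, approx))

print(levels)  # [25, -6, 1, 0]
```

A larger step yields smaller integer levels and therefore fewer bits, at the cost of a larger reconstruction error; this is the trade-off the encoder controls.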
Many audio compression systems, such as MPEG-2 Advanced Audio Coding (AAC) and Windows Media Audio (WMA), utilize the Modulated Lapped Transform (MLT, also known as the Modified Discrete Cosine Transform or MDCT) to perform the time-frequency analysis in audio transform coding. The MLT reduces blocking artifacts introduced into the reconstructed audio signal by quantization. More particularly, when non-overlapping blocks are independently transform coded, quantization errors produce discontinuities in the signal at the block boundaries upon reconstruction of the audio signal at the decoder. In audio, these discontinuities are heard as a periodic clicking effect.
The MLT reduces the blocking effect by overlapping blocks. In the MLT, a “window” of 2M samples from two consecutive blocks undergoes a cosine transform. Only the first M transform coefficients are returned. The window is then shifted by M samples and the next set of M transform coefficients is computed. Thus, each window overlaps the last M samples of the previous window. The overlap enhances the continuity of the reconstructed samples despite the alterations of transform coefficients due to quantization.
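The overlapped analysis described above can be sketched in a few lines. The code below is a direct (slow) evaluation of the commonly used MDCT definition with a sine window, not an implementation taken from any particular codec; the function names, the sine window choice, and the zero padding at the signal edges are all assumptions made for illustration. Without quantization, overlap-adding the inverse-transformed windows recovers the input.

```python
import math


def sine_window(M):
    """Length-2M sine window; satisfies w[n]^2 + w[n+M]^2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def _basis(n, k, M):
    """Cosine basis value shared by the forward and inverse transforms."""
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))


def mdct(block, window):
    """2M windowed input samples -> M transform coefficients."""
    M = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, M) for n in range(2 * M))
            for k in range(M)]


def imdct(coeffs, window):
    """M coefficients -> 2M windowed, time-aliased output samples."""
    M = len(coeffs)
    return [window[n] * (2.0 / M) *
            sum(coeffs[k] * _basis(n, k, M) for k in range(M))
            for n in range(2 * M)]


def round_trip(signal, M):
    """Analyze with windows shifted by M samples, then overlap-add."""
    if len(signal) % M:
        raise ValueError("signal length must be a multiple of M")
    window = sine_window(M)
    # Pad with M zeros on each side so every sample lies in two windows.
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        coeffs = mdct(padded[start:start + 2 * M], window)
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value  # aliasing cancels between neighbors
    return out[M:M + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
rebuilt = round_trip(signal, M=4)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))  # rounding only
```

Although each window of 2M samples yields only M coefficients, the 50% overlap means the total coefficient count matches the sample count, and the aliasing introduced in each window is cancelled by its neighbors during overlap-add.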
Some audio compression systems vary the window size over time to accommodate the changing nature of the audio. Audio coders typically partition the input audio signal into fixed-sized “frames,” each of which is a unit of coding (e.g., coding tables and/or parameters may be sent in a header section of each frame). In audio compression systems using a time-varying MLT, each frame may contain one or more “windows” of variable size, where each window is a unit of the MLT. In general, larger windows are beneficial to coding efficiency, whereas smaller windows provide better time resolution. Accordingly, the decisions of where and which window sizes to employ are critical to the compression performance and auditory quality of the encoded signal. The topic of time-varying MLT is discussed, inter alia, by Seymour Shlien, “The Modulated Lapped Transform, Its Time-Varying Forms, And Its Application To Audio Coding Standards,” IEEE Trans. on Speech and Audio Processing, vol. 5, no. 4, pp. 359-366 (July 1997); Ricardo L. de Queiroz and K. R. Rao, “Time-Varying Lapped Transforms And Wavelet Packets,” IEEE Trans. Signal Processing, vol. 41, pp. 3293-3305, 1993; and Cormac Herley, Jelena Kovacevic and Martin Vetterli, “Tilings Of The Time-Frequency Plane: Construction Of Arbitrary Orthogonal Bases And Fast Tiling Algorithms,” IEEE Trans. Signal Processing, vol. 41, pp. 3341-3359, 1993.
One problem in audio coding is commonly referred to as “pre-echo.” Pre-echo occurs when the audio undergoes a sudden change (referred to as a “transient”). In transform coding, particular frequency coefficients commonly are quantized (i.e., reduced in resolution). When the transform coefficients are later inverse-transformed to reproduce the audio signal, this quantization introduces quantization noise that is spread over the entire block in the time domain. This inherently causes a rather uniform smearing of noise within the coding frame. The noise, which generally is tolerable for some part of the frame, can be audible and disastrous to auditory quality during portions of the frame where the masking level is low. In practice, this effect shows up most prominently when a signal has a sharp attack immediately following a region of low energy, hence the term “pre-echo.” “Post-echo,” which occurs when the signal transitions from high to low energy, is less of a problem for perceived auditory quality due to a property of the human auditory system.
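The spreading of quantization noise ahead of a transient can be reproduced with a toy experiment. The sketch below is a simplification and rests on several assumptions: it uses a non-overlapping orthonormal DCT-IV per block instead of a lapped transform, a single uniform step size, and a made-up test signal (silence followed by a short burst). It compares one long block against several short blocks.

```python
import math


def dct4(x):
    """Orthonormal DCT-IV; with this scaling it is its own inverse."""
    N = len(x)
    scale = math.sqrt(2.0 / N)
    return [scale * sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                        for n in range(N))
            for k in range(N)]


def code_block(x, step):
    """Transform, quantize the coefficients, and inverse transform."""
    quantized = [round(c / step) * step for c in dct4(x)]
    return dct4(quantized)


def code_signal(signal, block_len, step):
    """Code the signal as consecutive non-overlapping blocks."""
    out = []
    for start in range(0, len(signal), block_len):
        out.extend(code_block(signal[start:start + block_len], step))
    return out


attack = 24  # index of the first non-silent sample
signal = [0.0] * attack + [1.0, -0.8, 0.6, -0.4, 0.3, -0.2, 0.1, -0.05]

long_out = code_signal(signal, block_len=32, step=0.25)   # one long block
short_out = code_signal(signal, block_len=8, step=0.25)   # four short blocks

# The original is silent before the attack, so any output there is noise.
pre_noise_long = max(abs(v) for v in long_out[:attack])
pre_noise_short = max(abs(v) for v in short_out[:attack])
```

With the long block, quantization noise appears in the silent region before the attack, which is the pre-echo. With the short blocks, the blocks preceding the attack contain only zeros, so `pre_noise_short` is exactly zero and the noise stays confined to the block that contains the transient.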
One example of an audio compression system that uses a time-varying MLT is MPEG AAC. In MPEG AAC, two window sizes of the MLT transform are allowed, long and short. As shown in FIG. 1, the encoder selects between long window and short window modes for each frame. During the switch between modes, a transition window is used. (In FIG. 1, the boundary filter shapes of these transform windows are simplified for illustration purposes only, and are not accurate.) In other words, for a particular frame, the encoder encodes the transform coefficients of the MLT transform of either one long window or eight short windows of identical size. The mode with small windows can be chosen to increase the time resolution of the MLT during transients in the audio input.
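A per-frame choice between the two modes might be sketched as follows. This is a hypothetical heuristic for illustration only, not the decision procedure of MPEG AAC or of any actual encoder: the function name, the energy-ratio test, and the threshold value are all assumptions, and transition windows are not modeled.

```python
import math


def choose_window_mode(frame, num_short=8, ratio_threshold=8.0):
    """Return "short" if the frame shows a sudden energy rise, else "long".

    The frame is split into num_short equal segments; a jump in energy
    from one segment to the next is treated as a transient.
    """
    seg_len = len(frame) // num_short
    energies = [sum(s * s for s in frame[i * seg_len:(i + 1) * seg_len])
                for i in range(num_short)]
    floor = 1e-12  # keeps the test meaningful after total silence
    for previous, current in zip(energies, energies[1:]):
        if current > ratio_threshold * (previous + floor):
            return "short"  # eight short windows: better time resolution
    return "long"           # one long window: better coding efficiency


# A steady tone: energy is the same in every segment.
steady = [math.sin(2 * math.pi * n / 16) for n in range(1024)]

# Silence followed by a sharp attack partway through the frame.
transient = [0.0] * 600 + [1.0] * 424

print(choose_window_mode(steady))     # long
print(choose_window_mode(transient))  # short
```

Practical encoders typically base this decision on more elaborate criteria, such as a perceptual model, but the sketch shows the basic trade-off: short windows are selected only where a transient would otherwise cause pre-echo.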