1. Technical Field
The invention is related to audio compression, and in particular, to a system and method that provides transform domain compression of audio signals using an integer-reversible modulated lapped transform (MLT) to transform audio signals into the transform domain in combination with a backwards-adaptive entropy coder to compress the resulting transform coefficients of the audio signal to produce a compressed bitstream.
2. Related Art
Personal digital music libraries are becoming larger as the popularity of portable media players continues to grow. However, the audio files in such libraries are often compressed to limit storage requirements. For example, a typical 4-minute stereo music track, when stored in a raw CD format, requires around 42 MBytes of storage space. As such, a 5,000 track library (averaging 4 minutes per song) requires over 200 GBytes to store the uncompressed audio. Consequently, such audio libraries are typically compressed using lossless and/or lossy encoders to limit overall storage requirements. Further, when transferring music files to a portable digital music player or the like, those music files are often transcoded from a lossless mode to a lossy mode due to storage limitations on the portable device.
There are a large number of well known audio compression techniques. Many of these techniques are based on the use of forward-adaptive prediction followed by forward-adaptive entropy coding wherein the prediction and encoding parameters are pre-computed and then applied to an entire block of signal samples. For example, one such technique operates by decomposing the audio into short blocks (typically with 256 samples), then applying linear prediction (LP) or a low-order polynomial predictor to the blocks. The prediction residuals are encoded then using the well known Golomb-Rice (GR) encoder to produce a compressed bitstream. To allow decoding of the compressed bitstream, each block in the compressed bitstream includes a header area that stores an index to the kind of prediction used, the values of the prediction coefficients, and the value of the GR parameter, followed by the encoded residuals. In a related implementation, a “near-lossless” mode is enabled by right-shifting the samples in each block by n bits, where n is adaptively changed from block-to block, to maintain a specified signal-to-noise ratio per block.
Unfortunately, there are significant disadvantages to using predictive coding for audio compression. For example, in many audio segments there are periodic tones which cannot be efficiently predicted by low-order predictors. The use of very high order predictors is not a feasible solution, since in short audio frames there is typically not enough data for reliable convergence of algorithms for finding optimal prediction coefficients. Similarly, the use of pitch predictors (as in speech coders) does not work well with music since there are frequently several simultaneous tones. In addition, with lossy compression, most conventional lossy compression techniques use a transform front-end. Consequently, the only way to transcode an encoded audio signal (encoded using predictive coding) from a lossless into a lossy format requires full decoding of the lossless samples followed by a full re-encoding of the audio signal using transform-based lossy encoding.
Frequency-domain coding using fast transforms has been used to address some of the disadvantages of using predictive coding to compress audio signals. For example, if an audio frame has dominant tones, than most of the energy in the frequency domain is concentrated in a few transform coefficients, allowing for efficient compression. Further, if the same transform that is used for lossy coding is also used for lossless coding, fast transcoding can be achieving by simply decoding the transform coefficients and then re-encoding those coefficients using a lossy coder without ever needing to fully decode into the time domain signal. Consequently, the use of frequency-domain coding (also referred to as “transform coding”) allows codecs to transcode compressed audio signals from lossless to lossy modes entirely in the frequency domain, without requiring any transform computations for the transcoding operations.
A number of conventional lossless transform coding techniques, while working reasonably well for transcoding operations, fail to provide good compression characteristics. Specifically, with lossless compression using transform coding, the transforms must be exactly reversible in integer arithmetic. Some well known direct approaches for integer transforms have applied a lifting-based integer-invertible (or integer-reversible) technique that works well for short-length transforms such as those used in image compression, but for larger transform lengths such as those used for audio compression (e.g., 256 to 4096 samples), the accumulation of rounding errors leads to a significant drop in lossless compression, or excessive noise in lossy compression.
Some of these problems have been addressed using “matrix lifting” techniques which allow the computation of an integer-reversible modulated lapped transform (MLT), also known as a modified discrete cosine transform (MDCT). Even for large block sizes, these matrix lifting-based techniques are capable of computing integer MLTs whose coefficient values are generally within a relatively small error range relative to corresponding real-valued MLT coefficients. As a result, both compression performance for lossless compression and reduction of noise in lossy compression is improved.
Unfortunately, as is known to those skilled in the art, typical matrix lifting-based transform coding techniques require coding parameters to be computed or estimated from the input data and added to the compressed bitstream as side information. As a result, additional computation is required, resulting in increased computational overhead. Further, compression performance is reduced by the necessity to add that side information to the bitstream.