1. Field of the Invention
The invention relates to the field of data compression. More specifically, the invention relates to audio compression.
2. Background of the Invention
To allow typical computing systems to process (e.g., store, transmit, etc.) audio signals, various techniques have been developed to reduce (compress) the amount of data representing an audio signal. In typical audio compression systems, the following steps are generally performed: (1) a segment or frame of an audio signal is transformed into a frequency domain; (2) transform coefficients representing (at least a portion of) the frequency domain are quantized into discrete values; and (3) the quantized values are converted (or coded) into a binary format. The encoded/compressed data can be output, stored, transmitted, and/or decoded/decompressed.
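The three steps above (transform, quantize, code) can be sketched as follows. This is a minimal illustrative toy, not any particular codec: the naive DCT-II, the uniform quantization step, and the fixed 8-bit code are all simplifying assumptions chosen for brevity.

```python
import numpy as np

def dct_ii(frame):
    """Step (1): transform a frame into the frequency domain (naive DCT-II)."""
    N = len(frame)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.cos(np.pi * (2 * n + 1) * k / (2 * N)) @ frame

def encode_frame(frame, step=0.5):
    """Steps (2) and (3): quantize the coefficients, then code them to binary."""
    coeffs = dct_ii(frame)
    quantized = np.round(coeffs / step).astype(int)   # (2) uniform quantization
    # (3) toy binary coding: each value as an 8-bit two's-complement field
    bits = "".join(format(int(v) & 0xFF, "08b") for v in quantized)
    return quantized, bits

frame = np.sin(2 * np.pi * 5 * np.arange(32) / 32)  # one 32-sample frame
quantized, bits = encode_frame(frame)
print(len(bits), "bits for", len(frame), "samples")
```

A real codec would replace the fixed-width code with an entropy coder and choose the quantization step per coefficient, but the pipeline shape is the same.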
To achieve relatively high compression/low bit rates (e.g., 8 to 16 kbps) for various types of audio signals (e.g., speech, music, etc.), some compression techniques (e.g., CELP, ADPCM, etc.) limit the number of components in a segment (or frame) of the audio signal to be compressed. Unfortunately, such techniques typically fail to take into account relatively substantial components of an audio signal. Thus, such techniques result in a relatively poor quality synthesized (decompressed) audio signal due to the loss of information.
One method of audio compression that allows relatively high quality compression/decompression involves transform coding (e.g., discrete cosine transform, Fourier transform, etc.). Transform coding typically involves transforming an input audio signal using a transform method, such as a low-order discrete cosine transform (DCT). Typically, each transform coefficient of a portion (or frame) of an audio signal is quantized and encoded using any number of well-known coding techniques. Transform compression techniques, such as the DCT, generally provide a relatively high quality synthesized signal, since they achieve relatively high energy compaction of the spectral components of an input audio signal.
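The energy-compaction property can be demonstrated numerically. In this sketch (the Gaussian test pulse and the 8-coefficient cutoff are arbitrary choices for illustration), an orthonormal DCT-II concentrates nearly all of a smooth signal's energy into a few low-order coefficients:

```python
import numpy as np

# A smooth, low-frequency test signal: most of its energy should land in a
# handful of low-order DCT coefficients (energy compaction).
N = 64
n = np.arange(N)
signal = np.exp(-((n - N / 2) ** 2) / (2 * 10.0 ** 2))  # Gaussian pulse

# Orthonormal DCT-II matrix (preserves total energy)
k = n.reshape(-1, 1)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0] /= np.sqrt(2)

coeffs = C @ signal
energy = coeffs ** 2
top_frac = np.sort(energy)[-8:].sum() / energy.sum()
print(f"energy captured by the 8 largest coefficients: {top_frac:.4f}")
```

Because the transform is orthonormal, total energy is preserved; compaction means the encoder can spend its bits on the few large coefficients and coarsely quantize (or drop) the rest.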
Most audio signal compression algorithms are based on transform coding. Some examples of transform coders include Dolby AC-2, AC-3, MPEG Layer II and Layer III, ATRAC, Sony MiniDisc, and Ogg Vorbis I. These coders employ the modified discrete cosine transform (MDCT) with different frame lengths and overlap factors.
Increasing the frame length improves frequency resolution. As a result, high compression ratios can be achieved for stationary audio signals by increasing the frame length. However, quantization errors in the transform frequency coefficients are spread over the entire length of a frame. Pursuing higher compression with a larger frame length therefore produces “echo” artifacts, which appear when sound attacks are present in the input audio signal. This means that the frame length, or frequency resolution, should vary depending on the input audio signal. In particular, the transform length should be shorter during sound attacks and longer for stationary signals. However, a sound attack may occupy only part of the entire signal bandwidth.
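The error-spreading effect behind the “echo” artifact can be reproduced directly. In this sketch (frame size, attack position, and quantization step are arbitrary assumptions), a frame is silent except for an attack in its last quarter; after coarse quantization of the DCT coefficients, reconstruction error appears throughout the frame, including the region that was originally silent:

```python
import numpy as np

N = 256
frame = np.zeros(N)
frame[192:] = np.sin(2 * np.pi * 0.3 * np.arange(64))  # attack in last quarter

# Orthonormal DCT-II matrix; its transpose is the inverse transform
n = np.arange(N)
k = n.reshape(-1, 1)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0] /= np.sqrt(2)

step = 0.2
coeffs = C @ frame
recon = C.T @ (np.round(coeffs / step) * step)  # coarse quantization, then inverse

leading_err = np.abs(recon[:192] - frame[:192]).max()
print(f"max reconstruction error in the silent region: {leading_err:.3f}")
```

The error preceding the attack is the pre-echo: quantization noise that is roughly white in the frequency domain is smeared by the inverse transform across the whole frame, which is why shorter transforms are preferred around attacks.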
A large transform length also leads to high computational complexity. Both the number of computations and the dynamic range of the transform coefficients increase with transform length, so higher computational precision is required. Audio data representation and arithmetic operations must be performed with at least 24-bit precision if the frame length is greater than or equal to 1024 samples, so 16-bit digital signal processors cannot be used for the encoding/decoding algorithms.
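The dynamic-range growth can be estimated with a back-of-the-envelope calculation. Assuming 16-bit PCM input (an assumption for illustration), the worst-case DC coefficient of an unnormalized DCT-II sums all samples of the frame, adding log2(N) bits of growth:

```python
import math

SAMPLE_BITS = 16  # 16-bit PCM input (assumption for illustration)

def accumulator_bits(frame_length):
    """Bits needed for the worst-case k = 0 (DC) coefficient of an
    unnormalized DCT-II: summing `frame_length` full-scale samples
    adds log2(frame_length) bits of growth."""
    return SAMPLE_BITS + math.ceil(math.log2(frame_length))

for N in (256, 512, 1024, 2048):
    print(f"N={N:5d}: {accumulator_bits(N)} bits")
```

At N = 1024 the accumulator already needs 26 bits, consistent with the observation that frames of 1024 samples or more exceed 16-bit (and even 24-bit) integer arithmetic.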
In addition, conventional MDCT provides identical frequency resolution over an entire signal, even though different frequency resolutions are appropriate for different frequency ranges. To accommodate the perceptual ability of the human ear, higher frequency resolution is needed for low-frequency ranges and lower frequency resolution is needed for high-frequency ranges.
Furthermore, the amplitude transfer function of the conventional MDCT is not sufficiently “flat”: there are significant irregularities near frequency range boundaries. These irregularities make it difficult to use MDCT coefficients for psycho-acoustic analysis of the audio signal and for computing the bit allocation. Conventional audio codecs therefore compute auxiliary spectra (typically with an FFT, which is computationally expensive) to construct a psycho-acoustic model (PAM).