1. Field of the Invention
The invention relates to the field of signal processing. More specifically, the invention relates to the field of audio data compression and decompression utilizing subband decomposition (audio is used herein to refer to one or more types of sound such as speech, music, etc.).
2. Background Information
To allow typical signal/data processing devices to process (e.g., store, transmit, etc.) audio signals efficiently, various techniques have been developed to reduce or compress the amount of data required to represent an audio signal. In applications wherein real-time processing is desirable (e.g., telephone conferencing over a computer network, digital (wireless) communications, multimedia over a communications medium, etc.), such compression techniques may be an important consideration, given limited processing bandwidth and storage resources.
In typical audio compression systems, the following steps are generally performed: (1) a segment or frame of an audio signal is transformed into a frequency domain; (2) the transform coefficients representing the frequency domain, or a portion thereof, are quantized into discrete values; and (3) the quantized values are converted (or coded) into a binary format. The encoded/compressed data can be output, stored, transmitted, and/or decoded/decompressed.
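The three steps above can be illustrated with a minimal sketch. The choice of an orthonormal DCT-II as the transform, a uniform quantizer, the step size, and the 16-bit packing are all assumptions made for illustration only; the function names are hypothetical and do not correspond to any particular prior system.

```python
import math
import struct

def dct_ii(frame):
    """Step 1: orthonormal DCT-II of one frame (O(N^2), for illustration only)."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def quantize(coeffs, step):
    """Step 2: map each coefficient to an integer level (assumed uniform quantizer)."""
    return [int(round(c / step)) for c in coeffs]

def encode(levels):
    """Step 3: convert levels to a binary format (assumed signed 16-bit, big-endian)."""
    return struct.pack(">%dh" % len(levels), *levels)

def compress_frame(frame, step=0.05):
    """Run the three steps in order on one frame of samples."""
    return encode(quantize(dct_ii(frame), step))
```

Under these assumptions, a 32-sample frame yields 64 bytes, because every coefficient is packed at a fixed 16 bits. A practical coder would typically follow the quantizer with a variable-length code so that the output is actually smaller than the input.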
To achieve relatively high compression/low bit rates (e.g., 8 to 16 kbps) for various types of audio signals, some compression techniques (e.g., CELP, ADPCM, etc.) limit the number of components in a segment (or frame) of an audio signal which is to be compressed. Unfortunately, such techniques typically do not take into account relatively substantial components of an audio signal. Thus, such techniques typically result in a relatively poor quality synthesized audio signal due to the loss of information.
One method of audio compression that allows relatively high quality compression/decompression involves transform coding. Transform coding typically involves transforming a frame of an input audio signal into a set of transform coefficients, using a transform such as a discrete cosine transform (DCT), a modified discrete cosine transform (MDCT), a Fourier transform, a Fast Fourier Transform (FFT), etc. Next, a subset of the set of transform coefficients, which typically represents most of the energy of the input audio signal (e.g., over 90%), is quantized and encoded using any number of well-known coding techniques. Transform compression techniques, such as DCT, generally provide a relatively high quality synthesized signal, since a relatively high number of spectral components of an input audio signal are taken into consideration.
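One way such a subset might be chosen is sketched below. The selection rule (keep the largest-magnitude coefficients until their cumulative energy reaches a target fraction) is an assumption for illustration; the text above states only that the subset typically represents most of the energy. The function name and its `fraction` parameter are hypothetical.

```python
def select_by_energy(coeffs, fraction=0.9):
    """Return the sorted indices of the largest-magnitude coefficients whose
    cumulative energy first reaches `fraction` of the total energy."""
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return []  # a silent frame has no coefficients worth keeping
    # Visit coefficients from highest to lowest energy.
    order = sorted(range(len(coeffs)),
                   key=lambda i: coeffs[i] * coeffs[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:
        kept.append(i)
        acc += coeffs[i] * coeffs[i]
        if acc >= fraction * total:
            break
    return sorted(kept)
```

For example, for the coefficients `[3.0, 0.1, -4.0, 0.2]` the total energy is 25.05, and the two coefficients at indices 0 and 2 together carry 25.0 of it, so only those two would be quantized and encoded under this rule.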
Past transform audio compression techniques may have some limitations. First, transform techniques typically perform a relatively large amount of computation, and may also use relatively high bit rates (e.g., 32 kbps), which may adversely affect compression ratios. Second, while the selected subset of coefficients may cumulatively contain approximately 90% of the energy of an input audio signal, the discarded coefficients may be needed for relatively high quality reproduction. However, a substantial number of bits may be required to transform encode all of the coefficients representing a frame of the input audio signal. Finally, an audible "echo" or other type of distortion may result in an audio signal that is synthesized from transform coding techniques. One cause of echo is the limited ability of transform coding techniques to satisfactorily approximate a fast-varying signal (e.g., a drum "attack"). As a result, quantization error for one or a few transform coefficients may spread over and adversely affect an entire frame, or portion thereof, of a transform encoded audio signal.
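The spreading of quantization error across a frame can be reproduced with a small experiment, sketched below under stated assumptions: a 64-sample frame that is silent until a sinusoidal "attack" at sample 48, an orthonormal DCT-II/DCT-III pair, and a uniform quantizer. The frame contents, sizes, and function names are hypothetical and chosen only to make the effect visible.

```python
import math

def dct_ii(frame):
    """Orthonormal DCT-II (O(N^2), for illustration only)."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        out.append((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * s)
    return out

def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * coeffs[0]
        s += sum(math.sqrt(2.0 / n) * coeffs[k]
                 * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(s)
    return out

def pre_echo_energy(step, n=64, attack=48):
    """Energy of the synthesized signal in the region that was silent in the input."""
    # Silence, followed by a sudden sinusoidal burst (the assumed "attack").
    frame = [0.0] * attack + [math.cos(0.9 * j) for j in range(n - attack)]
    coeffs = dct_ii(frame)
    # Quantize and dequantize every coefficient with a uniform step.
    dequantized = [round(c / step) * step for c in coeffs]
    synthesized = idct_ii(dequantized)
    # Any energy before the attack is distortion, since the input was zero there.
    return sum(y * y for y in synthesized[:attack])
```

Comparing `pre_echo_energy(0.25)` with `pre_echo_energy(1e-6)` shows the effect: each transform coefficient contributes to every sample of the frame, so a coarser step leaves more error energy in the portion of the frame that precedes the attack.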
To illustrate distortion, such as echo, in a transform encoded synthesized signal, reference is made to FIGS. 1A and 1B. FIG. 1A is a graphical representation of a frame of an input (i.e., original/unprocessed) audio signal. FIG. 1B depicts a synthesized signal that is generated by transform encoding and synthesizing the input signal of FIG. 1A. In FIGS. 1A and 1B, the horizontal (x) axis represents time, while the vertical (y) axis represents amplitude. As shown, the synthesized signal contains relatively substantial distortion (e.g., echo) from the time period 0 to 175 (sometimes referred to as pre-echo, since the distortion precedes the signal (or harmonic) "attack" at time ≈ 175) and 375 to 475 (sometimes referred to as post-echo, since the distortion follows the signal "attack" at time ≈ 175), relative to the corresponding input signal of FIG. 1A.
While some past systems, such as ISO/MPEG audio codes, have employed techniques to diminish distortion due to transform coding, such as pre-echo, such techniques typically rely on an increased number of bits to encode the input signal. As such, compression ratios may be diminished as a result of past distortion reduction techniques.
Thus, what is desired is a system that achieves relatively high quality audio data compression, while achieving relatively low bit rates (e.g., high compression ratios). It is further desirable to detect and reduce distortion (e.g., noise, echo, etc.) that may result, for example, from generating a transform encoded synthesized signal, while providing a relatively low bit rate.