In modern audio/speech digital signal communication system, a digital signal is compressed at an encoder, and the compressed information or bitstream can be packetized and sent to a decoder frame by frame through a communication channel. The system of both encoder and decoder together is called codec. Speech/audio compression may be used to reduce the number of bits that represent speech/audio signal thereby reducing the bandwidth and/or bit rate needed for transmission. In general, a higher bit rate will result in higher audio quality, while a lower bit rate will result in lower audio quality.
Audio coding based on filter bank technology is widely used. In signal processing, a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original input signal. The process of decomposition performed by the filter bank is called analysis, and the output of filter bank analysis is referred to as a subband signal having as many subbands as there are filters in the filter bank. The reconstruction process is called filter bank synthesis. In digital signal processing, the term filter bank is also commonly applied to a bank of receivers, which also may down-convert the subbands to a low center frequency that can be re-sampled at a reduced rate. The same synthesized result can sometimes be also achieved by undersampling the bandpass subbands. The output of filter bank analysis may be in a form of complex coefficients; each complex coefficient having a real element and imaginary element respectively representing a cosine term and a sine term for each subband of filter bank.
(Filter-Bank Analysis and Filter-Bank Synthesis) is one kind of transformation pair that transforms a time domain signal into frequency domain coefficients and inverse-transforms frequency domain coefficients back into a time domain signal. Other popular transformation pairs, such as (FFT and iFFT), (DFT and iDFT), and (MDCT and iMDCT), may be also used in speech/audio coding.
In the application of filter banks for signal compression, some frequencies are perceptually more important than others. After decomposition, perceptually significant frequencies can be coded with a fine resolution, as small differences at these frequencies are perceptually noticeable to warrant using a coding scheme that preserves these differences. On the other hand, less perceptually significant frequencies are not replicated as precisely, therefore, a coarser coding scheme can be used, even though some of the finer details will be lost in the coding. A typical coarser coding scheme may be based on the concept of Bandwidth Extension (BWE), also known High Band Extension (HBE). One recently popular specific BWE or HBE approach is known as Sub Band Replica (SBR) or Spectral Band Replication (SBR). These techniques are similar in that they encode and decode some frequency sub-bands (usually high bands) with little or no bit rate budget, thereby yielding a significantly lower bit rate than a normal encoding/decoding approach. With the SBR technology, a spectral fine structure in high frequency band is copied from low frequency band, and random noise may be added. Next, a spectral envelope of the high frequency band is shaped by using side information transmitted from the encoder to the decoder. A specific SBR technology with several post-processing modules has recently been employed in the international standard named as MPEG4 USAC wherein MPEG means Moving Picture Experts Group and USAC indicates Unified Speech Audio Coding.
In some applications, post-processing or controlled post-processing at a decoder side is used to further improve the perceptual quality of signals coded by low bit rate coding or SBR coding. Sometimes, several post-processing or controlled post-processing modules are introduced in a SBR decoder.