In modern audio/speech digital signal communication systems, a digital signal is compressed at an encoder, and the compressed information, or bitstream, can be packetized and sent to a decoder frame by frame through a communication channel. The encoder and decoder together are called a codec. Speech/audio compression may be used to reduce the number of bits that represent a speech/audio signal, thereby reducing the bandwidth and/or bit rate needed for transmission. In general, a higher bit rate results in higher audio quality, while a lower bit rate results in lower audio quality.
Audio coding based on filter bank technology is widely used. In signal processing, a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency subband of the original input signal. The process of decomposition performed by the filter bank is called analysis, and the output of filter bank analysis is referred to as a subband signal having as many subbands as there are filters in the filter bank. The reconstruction process is called filter bank synthesis. In digital signal processing, the term filter bank is also commonly applied to a bank of receivers, which may also down-convert the subbands to a low center frequency so that they can be re-sampled at a reduced rate. The same synthesized result can sometimes also be achieved by undersampling the bandpass subbands. The output of filter bank analysis may be in the form of complex coefficients, each complex coefficient having a real element and an imaginary element that respectively represent a cosine term and a sine term for each subband of the filter bank.
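The analysis and synthesis steps described above can be illustrated with a minimal sketch. It assumes the simplest possible case, a two-band bank built from Haar filters with downsampling by two; the function names `analysis` and `synthesis` are hypothetical and are not taken from any particular codec.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # normalization that makes the Haar pair orthonormal

def analysis(x):
    """Split an even-length signal into a low and a high subband, each at half rate."""
    pairs = range(len(x) // 2)
    low = [SCALE * (x[2 * i] + x[2 * i + 1]) for i in pairs]   # local average
    high = [SCALE * (x[2 * i] - x[2 * i + 1]) for i in pairs]  # local difference
    return low, high

def synthesis(low, high):
    """Recombine the two subbands into a full-rate time-domain signal."""
    x = []
    for l, h in zip(low, high):
        x.append(SCALE * (l + h))
        x.append(SCALE * (l - h))
    return x

signal = [1.0, 2.0, 3.0, 4.0, 2.0, 0.0, -1.0, -3.0]
low, high = analysis(signal)
rebuilt = synthesis(low, high)
```

With no quantization between the two stages, `rebuilt` equals `signal` up to floating-point rounding. A practical audio codec would use many more subbands and longer filters.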
(Filter-Bank Analysis and Filter-Bank Synthesis) is one kind of transformation pair that transforms a time domain signal into frequency domain coefficients and inverse-transforms the frequency domain coefficients back into a time domain signal. Other popular transformation pairs, such as (FFT and iFFT), (DFT and iDFT), and (MDCT and iMDCT), may also be used in speech/audio coding.
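As an illustration of such a transformation pair, the following sketch implements a direct (DFT and iDFT) pair. It is written for clarity rather than speed (a real codec would use an FFT), and the helper names `dft` and `idft` are assumptions made for this example. The real part of each coefficient is the correlation with a cosine, and the imaginary part is the negated correlation with a sine.

```python
import cmath
import math

def dft(x):
    """Forward transform: time-domain samples to complex frequency coefficients."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse transform: complex frequency coefficients back to time-domain samples."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

n = 8
x = [math.cos(2 * math.pi * t / n) for t in range(n)]  # one cosine cycle per frame
X = dft(x)   # energy concentrates in bins 1 and n-1
y = idft(X)  # recovers x up to rounding error
```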
In the application of filter banks for signal compression, some frequencies are perceptually more important than others. After decomposition, perceptually significant frequencies can be coded with a fine resolution, as small differences at these frequencies are noticeable enough to warrant using a coding scheme that preserves these differences. On the other hand, less perceptually significant frequencies do not need to be replicated as precisely; therefore, a coarser coding scheme can be used, even though some of the finer details will be lost in the coding. A typical coarser coding scheme may be based on the concept of Bandwidth Extension (BWE), also known as High Band Extension (HBE). One recently popular specific BWE or HBE approach is known as Sub Band Replica (SBR) or Spectral Band Replication (SBR). These techniques are similar in that they encode and decode some frequency sub-bands (usually high bands) with little or no bit rate budget, thereby yielding a significantly lower bit rate than a normal encoding/decoding approach. With the SBR technology, a spectral fine structure in the high frequency band is copied from the low frequency band, and random noise may be added. Next, a spectral envelope of the high frequency band is shaped by using side information transmitted from the encoder to the decoder. A specific SBR technology with several post-processing modules has recently been employed in the international standard named MPEG-4 USAC, where MPEG stands for Moving Picture Experts Group and USAC stands for Unified Speech and Audio Coding.
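The two SBR steps described above (copying the fine structure, then shaping the envelope) can be sketched as follows. This is a simplified illustration, not the procedure of any standard: it assumes real-valued spectral coefficients, equal-width bands whose count divides the band length evenly, and an envelope expressed as mean energy per band. The names `band_energies` and `sbr_decode` are hypothetical.

```python
import math
import random

def band_energies(spec, num_bands):
    """Mean squared magnitude of each equal-width band (the assumed envelope format)."""
    width = len(spec) // num_bands
    return [sum(v * v for v in spec[b * width:(b + 1) * width]) / width
            for b in range(num_bands)]

def sbr_decode(low_band, envelope, noise_level=0.0, seed=0):
    """Regenerate a high band from the low band plus transmitted envelope energies."""
    rng = random.Random(seed)
    # Step 1: copy the low-band fine structure and optionally add random noise.
    high = [v + noise_level * rng.gauss(0.0, 1.0) for v in low_band]
    # Step 2: scale each band so its energy matches the side information.
    num_bands = len(envelope)
    width = len(high) // num_bands
    current = band_energies(high, num_bands)
    shaped = []
    for b in range(num_bands):
        gain = math.sqrt(envelope[b] / current[b]) if current[b] > 0 else 0.0
        shaped.extend(gain * v for v in high[b * width:(b + 1) * width])
    return shaped

# Encoder side: only the envelope of the original high band is transmitted.
original_low = [4.0, -3.0, 2.5, -2.0, 1.5, 1.0, -0.8, 0.6]
original_high = [0.5, -0.4, 0.45, 0.3, -0.1, 0.12, 0.08, -0.05]
envelope = band_energies(original_high, 2)

# Decoder side: the high band is rebuilt without ever seeing original_high.
decoded_high = sbr_decode(original_low, envelope, noise_level=0.1)
decoded_envelope = band_energies(decoded_high, 2)
```

The decoded high band matches the original in per-band energy but not in fine detail, which is why only a few envelope values need to be transmitted.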
To achieve good sound quality at a low bit rate for speech coding, the speech signal in the low frequency band is often encoded and decoded with a popular technology known as Code-Excited Linear Prediction (CELP) or Algebraic Code-Excited Linear Prediction (ACELP). CELP or ACELP is based on an analysis-by-synthesis approach, which minimizes a weighted error in a closed loop. An analysis-by-synthesis approach is also commonly called a closed-loop approach. In the frequency domain, the closed-loop approach requires a best match between a coded fine spectrum and an original fine spectrum. On the other hand, in the time domain, the closed-loop approach requires a best match between a coded signal waveform and an original signal waveform.
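The time-domain closed-loop search can be sketched as follows. This is a toy illustration under stated assumptions: a one-pole filter stands in for the linear prediction synthesis filter, the codebook has four hand-picked entries, and the perceptual weighting used by real CELP coders is omitted, so the plain squared error is minimized. The names `synthesize` and `closed_loop_search` are hypothetical.

```python
def synthesize(excitation, a=0.9):
    """One-pole filter y[n] = e[n] + a*y[n-1], a stand-in for the synthesis filter."""
    out, prev = [], 0.0
    for e in excitation:
        prev = e + a * prev
        out.append(prev)
    return out

def closed_loop_search(target, codebook):
    """Try every codebook entry through the synthesis filter and keep the best match."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Build a target that entry 2 can reproduce exactly with a gain of 2.
target = [2.0 * v for v in synthesize(codebook[2])]
index, gain, error = closed_loop_search(target, codebook)
```

Because the encoder runs the decoder's synthesis step inside the search loop, the selected parameters are those that actually minimize the waveform error, at the cost of one synthesis pass per candidate.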
The closed-loop approach focuses on coding perceptually more important areas, thereby making the quantization noise less audible and increasing the perceptual quality of a coded speech signal. However, an open-loop approach is often used to code a high band signal. The open-loop approach requires only energy matching between a coded signal and an original signal, which is easier than fine closed-loop matching. Therefore, a lower bit rate than that of the closed-loop approach may be used. If BWE or SBR is used to code a high band signal, the closed-loop approach is not used to determine the best parameters of the BWE or SBR. Rather, the open-loop approach is used to calculate the parameters of the BWE or SBR, since there is no way to perform the closed-loop approach for the BWE or SBR. This is because the high band fine spectrum is generated at the decoder and may not match the original high band fine spectrum in detail. The open-loop approach is, therefore, appropriate for the BWE or SBR, as it requires only an energy match between the original signal and the coded signal.
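The difference between energy matching and waveform matching can be made concrete with a small sketch. The sample values are invented for illustration, and the helper names `energy` and `open_loop_gain` are hypothetical. The gain is computed directly from the two energies, with no search loop.

```python
import math

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def open_loop_gain(original, generated):
    """Gain that makes the generated signal's energy equal to the original's."""
    e_gen = energy(generated)
    return math.sqrt(energy(original) / e_gen) if e_gen > 0 else 0.0

original_high = [0.5, -0.4, 0.45, 0.3]   # high band seen only by the encoder
generated_high = [1.0, 1.0, -1.0, 1.0]   # fine structure produced at the decoder

gain = open_loop_gain(original_high, generated_high)
coded_high = [gain * v for v in generated_high]

# Energies now agree, but the waveforms still differ sample by sample.
waveform_error = energy([o - c for o, c in zip(original_high, coded_high)])
```

After scaling, `energy(coded_high)` equals `energy(original_high)` up to rounding, while `waveform_error` remains large. This is the situation in which a closed-loop waveform match is not meaningful and an energy match is the appropriate criterion.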