In prior art audio compression schemes, such as perceptual audio coding (PAC), audio is typically coded as the output of a filterbank. The filterbank provides a frequency or a time-frequency representation of the signal. The filterbank outputs are then quantized using a quantization function based on a psychoacoustic model, wherein the psychoacoustic model accounts for the non-linear frequency sensitivity of the human ear (the destination) by using a non-linear frequency resolution (the Bark scale) in the quantizer. However, there are often non-linearities involved at the signal production stage (i.e., in the source), which result in interdependencies between the low and high frequency components of a signal. The linear filterbanks employed in PAC or similar codecs (e.g., the modified discrete cosine transform (MDCT) and/or wavelets) are not capable of taking advantage of the redundancies in the signal that arise from such non-linearities at the signal production stage.
Furthermore, although the linear filterbank used in PAC or similar codecs (i.e., wavelet/MDCT) does a good job of de-correlating the signal in the time domain, significant correlation often remains in the frequency domain representation of the signal. This correlation may be both short term (i.e., between samples located in adjacent frequency bins) and long term (i.e., between frequency bins which are far apart in frequency). This is particularly true for musical instruments and voiced speech, which have a clearly defined harmonic structure. Conventional audio coding schemes, however, make little, if any, effort to take advantage of this correlation.
Furthermore, in prior art PAC systems, several features, such as Huffman scale factor quantization or multidimensional peaks, had to be permanently selected or deselected prior to the system being deployed in the field. Additionally, the present invention's enhanced PAC algorithm incorporates techniques for efficient coding of higher frequency components in the signal. These techniques are often suitable for only a segment of higher frequencies. Furthermore, separate systems that incorporated PAC with differing pre-selected feature sets were not functionally interoperable.
High quality speech is produced via various coding techniques, one of which is code-excited linear prediction, or CELP. The CELP coder is a model wherein the vocal tract is modeled via short-term synthesis filters, and the glottal excitation is modeled via long-term synthesis filters. Thus, the CELP encoder synthesizes speech via these short-term and long-term synthesis filters in a feedback loop.
A basic CELP coder is illustrated in FIG. 1. The long-term predictor is referred to as the pitch predictor, as it exploits the pitch periodicity in a speech signal. In prior art systems, a pitch predictor such as a one-tap pitch predictor is used, wherein the predictor transfer function (in the case of a one-tap pitch predictor) is given by:
$$P_1(Z) = \beta Z^{-p}$$
where p is the pitch period, and β is the predictor tap.
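As a minimal sketch of what a one-tap pitch predictor does in the time domain, the following Python reads the transfer function as a pure delay of p samples scaled by β, so that each sample is predicted as β times the sample one pitch period earlier. The function names, the test signal, and the values of p and β are illustrative assumptions and do not come from the system described here.

```python
import math

def pitch_predict(signal, p, beta):
    """One-tap long-term prediction: x_hat[n] = beta * x[n - p].

    Samples with no history one period back are predicted as 0.0.
    """
    return [beta * signal[n - p] if n >= p else 0.0 for n in range(len(signal))]

def prediction_residual(signal, prediction):
    """Difference between the signal and its prediction."""
    return [s - q for s, q in zip(signal, prediction)]

# Hypothetical test signal: one sine cycle repeated every 40 samples,
# with each repetition attenuated by a factor of 0.9.
period = 40
decay = 0.9
base = [math.sin(2 * math.pi * n / period) for n in range(period)]
signal = [base[n % period] * decay ** (n // period) for n in range(400)]

pred = pitch_predict(signal, p=period, beta=decay)
res = prediction_residual(signal, pred)

# Compare energies after the first period, where the predictor has history.
energy_in = sum(s * s for s in signal[period:])
energy_res = sum(r * r for r in res[period:])
print(f"signal energy:   {energy_in:.6f}")
print(f"residual energy: {energy_res:.3e}")
```

For this idealized signal the pitch period and tap match the signal exactly, so the residual is essentially zero. Real speech is only approximately periodic, so the residual would be reduced rather than eliminated.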
On the other hand, the short-term predictor (often referred to as linear prediction coding (LPC) predictor) is an nth order predictor with a transfer function of:
$$P_2(Z) = \sum_{i=1}^{n} a_i Z^{-i}$$
wherein $a_1$ through $a_n$ are the predictor coefficients.
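One common way to obtain the coefficients of such an nth order short-term predictor is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below is illustrative only: the text does not state which estimation method is used, and the function names, the synthetic second-order test signal, and its coefficients (1.3 and -0.4) are assumptions made for the example.

```python
import random

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n - k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_order so that
    x_hat[n] = sum over i of a_i * x[n - i] minimizes the prediction error.

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # index 0 is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err

# Hypothetical test signal: x[n] = 1.3 x[n-1] - 0.4 x[n-2] + noise.
random.seed(0)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(1.3 * x[-1] - 0.4 * x[-2] + random.gauss(0.0, 1.0))

coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
print("estimated a_1, a_2:", coeffs)
```

With a long enough signal, the estimated coefficients should land close to the values used to generate it, which is a quick sanity check on the recursion.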
As illustrated in FIG. 1, the encoder first buffers the input signal 102 via a frame buffer 104. The long-term predictor 106 and the short-term predictor 108 then perform linear predictive analysis, and the resulting predictor parameters are quantized and encoded, resulting in the output signal 112. It should be noted that the pitch predictor parameters are determined in either a closed-loop or an open-loop fashion.
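To illustrate the open-loop case, the following sketch estimates the pitch period and the predictor tap directly from a buffered frame by searching for the lag that maximizes a normalized correlation measure. This is one widely used open-loop approach and is not necessarily the one used by the coder of FIG. 1; the function name, the lag range, and the test frame are assumptions. A closed-loop search, which would instead pick the parameters by comparing synthesized output against the input, is not shown.

```python
import math

def open_loop_pitch(frame, min_lag, max_lag):
    """Return (lag, beta) for the lag that best predicts the frame from its past.

    For each candidate lag, the later part of the frame is compared with the
    part delayed by that lag. The score cross**2 / energy is the reduction in
    squared error achieved by the optimal tap beta = cross / energy.
    """
    best_lag, best_score, best_beta = min_lag, -1.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        current = frame[lag:]
        past = frame[:len(frame) - lag]
        cross = sum(c * q for c, q in zip(current, past))
        energy = sum(q * q for q in past)
        if energy <= 0.0 or cross <= 0.0:
            continue  # skip lags that are anti-correlated or carry no energy
        score = cross * cross / energy
        if score > best_score:
            best_lag, best_score, best_beta = lag, score, cross / energy
    return best_lag, best_beta

# Hypothetical frame: a fundamental with a 50-sample period plus one harmonic.
true_period = 50
frame = [math.sin(2 * math.pi * n / true_period)
         + 0.5 * math.sin(4 * math.pi * n / true_period)
         for n in range(400)]

lag, beta = open_loop_pitch(frame, min_lag=20, max_lag=120)
print("estimated pitch period:", lag)
print("estimated predictor tap:", beta)
```

The lag range is chosen to include both the true period and its double, which shows that the score favors the shortest matching period for this frame.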