In audio signals and in particular in speech signals, there is a high level of correlation between adjacent samples. In order to perform an efficient quantization and encoding of speech signals, such redundancy can be removed prior to encoding.
Speech signals can be efficiently modeled with two slowly time-varying linear prediction filters that model the spectral envelope and the spectral fine structure, respectively. The shape of the vocal tract mainly determines the short-time spectral envelope, while the spectral fine structure is mainly due to the periodic vibrations of the vocal cords.
In prior art, redundancy in audio signals is often modeled using linear models. A well-known technique for removal of redundancy is the use of prediction, in particular linear prediction. An original present audio signal sample is predicted from previous audio signal samples, either original ones or predicted ones. A residual is defined as the difference between the original audio signal sample and the predicted audio signal sample. A quantizer searches for a best representation of the residual, e.g. an index pointing to an internal codebook. The representation of the residual and the parameters of the linear prediction filter are provided as representations of the original present audio signal sample. In the decoder, the representation can then be used for recreating a received version of the present audio signal sample.
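The prediction, residual, and quantization steps described above can be illustrated with a minimal sketch. It is not taken from any particular codec: the function names, the uniform scalar quantizer (standing in for a codebook search), and the choice to predict from previously reconstructed samples so that encoder and decoder stay in step are all assumptions made for illustration.

```python
import math

def predict(history, coeffs):
    """Linear prediction of the next sample from the most recent samples in history."""
    return sum(a * history[-1 - k] for k, a in enumerate(coeffs) if k < len(history))

def encode(signal, coeffs, step):
    """Return one quantizer index per sample (hypothetical uniform quantizer)."""
    history = []  # samples as the decoder will reconstruct them
    indices = []
    for sample in signal:
        prediction = predict(history, coeffs)
        residual = sample - prediction      # original minus predicted sample
        index = round(residual / step)      # stand-in for a codebook search
        indices.append(index)
        history.append(prediction + index * step)
    return indices

def decode(indices, coeffs, step):
    """Recreate the signal from the indices and the same filter parameters."""
    history = []
    for index in indices:
        history.append(predict(history, coeffs) + index * step)
    return history

# Assumed test signal: a sinusoid, which a 2-tap predictor can follow closely.
omega = 0.3
signal = [math.sin(omega * n) for n in range(50)]
coeffs = [2.0 * math.cos(omega), -1.0]
step = 0.01
indices = encode(signal, coeffs, step)
decoded = decode(indices, coeffs, step)
max_error = max(abs(s - d) for s, d in zip(signal, decoded))
```

Because the encoder in this sketch tracks the decoder's reconstruction, the quantization error of each sample stays within half a quantizer step and does not accumulate over time.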
Linear prediction is often used for short-term correlations. In theory, the LP filter could be used at any order. However, use of high-order linear prediction is strongly inadvisable due to numerical stability problems of the Levinson-Durbin algorithm as well as the resulting complexity in terms of memory storage and arithmetic operations. Moreover, the bit-rate required for encoding the LP coefficients prohibits such use. The order of the LP predictors used in practice does not, in general, exceed 20 coefficients. For instance, the AMR-WB standard for wideband speech coding has an LPC filter of order 16.
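For reference, the Levinson-Durbin recursion mentioned above can be sketched as follows. This is a textbook-style floating-point version under the convention that the prediction is the sum of a[k] * s[n-k]; the function name and the example autocorrelation values are assumptions, and real codecs typically add windowing, lag windowing, and fixed-point safeguards that are omitted here.

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r[0..order].

    Returns (coeffs, error), where coeffs[k-1] is the predictor tap a[k]
    and error is the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the taps
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        # The error energy shrinks by (1 - k^2) at every stage.
        error *= 1.0 - k * k
    return a[1:], error

# Assumed example: autocorrelation of a first-order process with correlation 0.5.
coeffs, error = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this example input the second tap comes out as zero, since a first-order predictor already captures all the correlation present. The division by the shrinking error term is one place where the numerical problems at high orders arise.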
In order to further reduce the required bit-rate while maintaining the quality, there is a need to properly exploit the periodicity of speech signals in voiced speech segments. To this end, and because linear prediction in general exploits only correlations contained within less than a pitch cycle, a pitch predictor is often applied to the linear prediction residual. Long-term dependencies in audio signals can thereby be exploited.
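A pitch predictor applied to the linear prediction residual can be sketched as a one-tap long-term predictor. The open-loop lag search, the function names, the lag bounds, and the pulse-train input below are illustrative assumptions rather than the method of any specific standard.

```python
def find_pitch(residual, min_lag, max_lag):
    """Search for the lag and gain of a one-tap long-term predictor.

    All lags are scored on the same window (n >= max_lag) so that the
    scores are comparable. Ties resolve to the shortest lag.
    """
    best_lag, best_gain, best_score = min_lag, 0.0, 0.0
    for lag in range(min_lag, max_lag + 1):
        cross = 0.0
        energy = 0.0
        for n in range(max_lag, len(residual)):
            cross += residual[n] * residual[n - lag]
            energy += residual[n - lag] ** 2
        if energy > 0.0 and cross > 0.0:
            score = cross * cross / energy  # prediction gain criterion
            if score > best_score:
                best_lag, best_gain, best_score = lag, cross / energy, score
    return best_lag, best_gain

def long_term_residual(residual, lag, gain):
    """Subtract the scaled, delayed residual; the first lag samples pass through."""
    return [
        x - gain * residual[n - lag] if n >= lag else x
        for n, x in enumerate(residual)
    ]

# Assumed input: an idealized voiced LP residual, one pulse every 40 samples.
lp_residual = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]
lag, gain = find_pitch(lp_residual, 20, 70)
lt_residual = long_term_residual(lp_residual, lag, gain)
```

On this idealized input the search recovers the 40-sample period, and every pulse after the first is cancelled, leaving far less energy to quantize than in the LP residual itself.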
Although currently standardized speech codecs deliver an acceptable quality at very low bit-rates, it is believed that the quality may be further enhanced at the cost of very few extra bits. One minor problem with prior-art speech and audio coding algorithms is that the prior-art model for speech or audio signals, although very efficient, does not take into account all the possible redundancies that are present in audio signals. In general audio coding, and in particular in speech coding, there is always a need to lower the required bit-rate at a given quality or to obtain a better quality at a given bit-rate.
Furthermore, embedded or layered approaches are today often requested in order to adapt the relation between quality and bit-rate. However, at a given bit-rate, and for a given coding structure, an embedded or layered speech coder will often show a loss in quality when compared to a non-layered coder. In order to experience the same quality with the same coding structure it is often required that the bit-rate is increased.