1. Field of the Invention
The invention relates generally to digital communications, and more particularly, to digital coding and decoding of signals, such as speech and/or audio signals.
2. Related Art
In the field of speech coding, predictive coding is a popular technique. Prediction of the input waveform is used to remove redundancy from the waveform, and instead of quantizing the input waveform directly, the waveform of the residual signal is quantized. The predictor(s) can be either backward adaptive or forward adaptive. Backward adaptive predictors do not require any side information as they are derived from the previously quantized waveform, and therefore can be derived at the decoder. On the other hand, forward adaptive predictor(s) require side information to be transmitted to the decoder as they are derived from the input waveform, which is not available at the decoder. In the field of speech coding, two types of predictors are commonly used. The first is called the short-term predictor. It is aimed at removing redundancy between nearby samples in the input waveform. This is equivalent to removing the spectral envelope of the input waveform. The second is often referred to as the long-term predictor. It removes redundancy between samples further apart, typically spaced by a time difference that is constant for a suitable duration. For speech, this time difference is typically equivalent to the local pitch period of the speech signal, and consequently the long-term predictor is often referred to as the pitch predictor. The long-term predictor removes the harmonic structure of the input waveform. The residual signal after the removal of redundancy by the predictor(s) is quantized along with any information needed to reconstruct the predictor(s) at the decoder.
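The two kinds of prediction described above can be illustrated with a minimal sketch. It is not taken from any particular codec: the function names, the sign convention (the prediction is subtracted from the input), the single-tap long-term predictor, and the zero initial state are all assumptions made for illustration, and the sketch operates on the unquantized input only.

```python
def short_term_residual(x, a):
    """Residual e[n] = x[n] - sum_{k=1..N} a[k-1] * x[n-k].

    x: input samples; a: N short-term prediction coefficients (assumed
    convention). Samples before the start of x are treated as zero.
    """
    residual = []
    for n in range(len(x)):
        # Predict the current sample from up to N nearby past samples.
        pred = sum(a[k - 1] * x[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        residual.append(x[n] - pred)
    return residual


def long_term_residual(x, lag, gain):
    """Residual e[n] = x[n] - gain * x[n - lag] (single-tap pitch predictor).

    lag: assumed pitch period in samples; gain: assumed predictor tap.
    """
    return [x[n] - (gain * x[n - lag] if n >= lag else 0.0)
            for n in range(len(x))]
```

For a perfectly periodic input with period equal to `lag` and a gain of 1, the long-term residual is zero after the first period, which is the sense in which the long-term predictor removes the harmonic structure.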
In predictive coding that applies forward adaptive prediction, the necessity to communicate predictor information to the decoder calls for efficient and accurate methods to compress, or quantize, the predictor information. Furthermore, it is advantageous if the methods are robust to communication errors, i.e., if they minimize the impact on the accuracy of the reconstructed predictor when part of the information is lost or received incorrectly.
The spectral envelope of the speech signal can be efficiently represented with a short-term Auto-Regressive (AR) predictor. Human speech commonly has at most 5 formants in the telephony band (narrowband, 100 Hz to 3400 Hz). Typically the order of the predictor is constant, and in popular predictive coding using forward adaptive short-term AR prediction, a model order of approximately 10 is common for an input signal with a bandwidth of approximately 100 Hz to 3400 Hz. A 10th order AR predictor provides an all-pole model of the spectral envelope with 10 poles and is capable of representing approximately 5 formants. For wideband signals (50 Hz to 7000 Hz), a higher model order is typically used in order to facilitate an accurate representation of the increased number of formants. The Nth order short-term AR predictor is specified by N prediction coefficients, which provide a complete specification of the predictor. Consequently, these N prediction coefficients need to be communicated to the decoder along with other relevant information in order to reconstruct the speech signal. The N prediction coefficients are often referred to as the Linear Predictive Coding (LPC) parameters.
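As an illustration of why the N prediction coefficients fully specify the predictor, the following minimal sketch reconstructs a signal from a residual using only those coefficients, i.e., it runs the all-pole (synthesis) filter. The function name, the sign convention, and the zero initial state are assumptions made for illustration; quantization of the residual and of the coefficients is ignored.

```python
def synthesize(residual, a):
    """All-pole synthesis: y[n] = e[n] + sum_{k=1..N} a[k-1] * y[n-k].

    residual: excitation samples e[n]; a: N prediction coefficients
    (assumed convention). Output samples before n = 0 are treated as zero.
    """
    y = []
    for n, e in enumerate(residual):
        # The prediction uses previously reconstructed output samples only,
        # so a decoder holding the N coefficients can run the same recursion.
        pred = sum(a[k - 1] * y[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(e + pred)
    return y
```

With a single coefficient of 0.5 and a unit impulse as residual, the output is the decaying sequence 1, 0.5, 0.25, 0.125, which is the impulse response of a one-pole filter.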
The Line Spectral Pair (LSP) parameters were introduced by F. Itakura, “Line Spectrum Representation of Linear Predictor Coefficients for Speech Signals”, J. Acoust. Soc. Amer., Vol. 57, S35(A), 1975, and are the subject of U.S. Pat. No. 4,393,272 entitled “Sound Synthesizer”. The LSP parameters are derived as the roots of two polynomials, P(z) and Q(z), that are extensions of the z-transform of the AR prediction error filter. The LSP parameters are also referred to as the Line Spectral Frequency (LSF) parameters, and have been shown to possess advantageous properties for quantization and interpolation of the spectral envelope in LPC. This has been attributed to their frequency domain interpretation and close relation with the locations of the formants of speech. The LSP, or LSF, parameters provide a unique and equivalent representation of the LPC parameters, and efficient algorithms have been developed to convert between the LPC and LSF parameters; see P. Kabal and R. P. Ramachandran, “The Computation of Line Spectral Frequencies Using Chebyshev Polynomials”, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 34, No. 6, December 1986.
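The construction of the two polynomials can be sketched as follows, using the commonly stated definitions P(z) = A(z) + z^-(N+1) A(1/z) and Q(z) = A(z) - z^-(N+1) A(1/z), where A(z) = 1 - sum a_k z^-k is the prediction error filter. The sign convention for A(z) and the function name are assumptions made for illustration. The sketch only builds the coefficient vectors; locating the roots on the unit circle, whose angles are the LSF parameters, is left out.

```python
def lsp_polynomials(a):
    """Build the coefficients of P(z) and Q(z) from LPC coefficients.

    a: prediction coefficients a_1..a_N, with the prediction error filter
    assumed to be A(z) = 1 - sum_k a_k z^-k.
    Returns (P, Q), each a list of N+2 coefficients for z^0 .. z^-(N+1).
    """
    # A(z) padded with one trailing zero so it spans powers z^0 .. z^-(N+1).
    A = [1.0] + [-ak for ak in a] + [0.0]
    # Reversing the padded vector gives the coefficients of z^-(N+1) A(1/z).
    A_rev = A[::-1]
    P = [u + v for u, v in zip(A, A_rev)]  # symmetric (palindromic)
    Q = [u - v for u, v in zip(A, A_rev)]  # antisymmetric
    return P, Q
```

Averaging the two vectors recovers the padded A(z), which reflects the statement above that the LSP representation is equivalent to the LPC parameters.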
Popular predictive coding techniques often quantize the LSF representation of the LPC parameters in order to take advantage of the quantization and interpolation properties of the LSF parameters. One additional advantageous property of the LSF parameters is the inherent ordering property. It is known that for a stable LPC filter (Nth order all-pole filter) the roots of the two polynomials P(z) and Q(z) are interleaved, referred to as “in-order”, or “ordered”. Consequently, stability of the LPC filter can be verified by checking if the ordering property of the LSF parameters is fulfilled, that is, if the LSF parameters are in-order, and representations of unstable filters can be rectified. Commonly, the autocorrelation method, see L. R. Rabiner and R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978, Chapter 8, Sections 8.1.1 and 8.3.2, is used to estimate the LPC parameters. This method provides a stable LPC filter. However, the quantization of the LSF parameters and transmission of the bits representing the LSF parameters may still result in an unstable quantized LPC filter.
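The stability check described above amounts to a simple monotonicity test. The following minimal sketch assumes the LSF parameters are expressed as angular frequencies in radians, strictly between 0 and pi; the function name and that representation are assumptions made for illustration.

```python
import math


def lsf_in_order(lsf):
    """Return True if 0 < lsf[0] < lsf[1] < ... < lsf[N-1] < pi.

    lsf: LSF parameters as angular frequencies in radians (assumed units).
    """
    bounds = [0.0] + list(lsf) + [math.pi]
    # Every neighbouring pair, including the band edges, must be increasing.
    return all(lo < hi for lo, hi in zip(bounds, bounds[1:]))
```

For example, `lsf_in_order([0.3, 0.6, 1.2])` holds, whereas `lsf_in_order([0.3, 1.2, 0.6])` does not, which would indicate an unstable quantized LPC filter.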
A common method to correct unstable LSF parameters due to both quantization and transmission is to simply reorder LSF pairs that are out of order immediately following quantization at the encoder and reconstruction at the decoder (mapping of the received bits to the LSF parameters). This guarantees that the encoder and decoder will observe identical quantized LSF parameters if a mis-ordering is due to the quantization, i.e., that they remain synchronized, and it prevents the decoder from using an unstable LPC filter if a mis-ordering is due to the transmission, i.e., transmission errors. However, such methods are unable to distinguish, at the decoder, mis-ordering due to quantization from mis-ordering due to transmission errors. Therefore, there is a need for quantization techniques that enable the decoder to identify if mis-ordering is due to transmission errors, thereby allowing the decoder to take corrective actions. More generally, there is a need for quantization techniques that facilitate some level of transmission error detection capability while maintaining a high intrinsic quality of the quantization. There is a related need for inverse quantization techniques that exploit the transmission error detection capability to conceal the detected transmission errors. Moreover, there is a need to achieve the above with a low computational complexity.
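The conventional reordering step described above can be sketched as follows. The sketch uses a plain sort as a stand-in for pairwise swapping, which is a simplifying assumption; the function name and the returned flag are hypothetical, and practical coders may additionally enforce a minimum spacing between adjacent LSF parameters, which is not shown.

```python
def reorder_lsf(lsf):
    """Return (ordered_lsf, was_reordered) for a list of LSF parameters.

    Sorting into ascending order is used here as a stand-in for swapping
    out-of-order pairs (an assumption for illustration).
    """
    ordered = sorted(lsf)
    # The flag reports that a mis-ordering was present, not what caused it.
    return ordered, ordered != list(lsf)
```

Applying the same function after quantization at the encoder and after reconstruction at the decoder keeps the two synchronized. The returned flag illustrates the limitation noted above: at the decoder it indicates only that a mis-ordering occurred, not whether quantization or a transmission error caused it.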