1. Field of the Invention
This invention relates generally to digital communications, and more particularly, to digital coding (or compression) of speech and/or audio signals.
2. Related Art
In speech or audio coding, the coder encodes the input speech or audio signal into a digital bit stream for transmission or storage, and the decoder decodes the bit stream into an output speech or audio signal. The combination of the coder and the decoder is called a codec.
In the field of speech coding, predictive coding is a very popular technique. Prediction of the input waveform is used to remove redundancy from the waveform, and instead of quantizing an input speech waveform directly, a residual signal waveform is quantized. The predictor(s) used in predictive coding can be either backward adaptive or forward adaptive predictors. Backward adaptive predictors do not require any side information as they are derived from a previously quantized waveform, and therefore can be derived at a decoder. On the other hand, forward adaptive predictor(s) require side information to be transmitted to the decoder as they are derived from the input waveform, which is not available at the decoder.
In the field of speech coding, two types of predictors are commonly used. A first type of predictor is called a short-term predictor. It is aimed at removing redundancy between nearby samples in the input waveform. This is equivalent to removing a spectral envelope of the input waveform. A second type of predictor is often referred as a long-term predictor. It removes redundancy between samples further apart, typically spaced by a time difference that is constant for a suitable duration. For speech, this time difference is typically equivalent to a local pitch period of the speech signal, and consequently the long-term predictor is often referred as a pitch predictor. The long-term predictor removes a harmonic structure of the input waveform. A residual signal remaining after the removal of redundancy by the predictor(s) is quantized along with any information needed to reconstruct the predictor(s) at the decoder.
This quantization of the residual signal provides a series of bits representing a compressed version of the residual signal. This compressed version of the residual signal is often denoted the excitation signal and is used to reconstruct an approximation of the input waveform at the decoder in combination with the predictor(s). Generating the series of bits representing the excitation signal is commonly denoted excitation quantization and generally requires the search for, and selection of, a best or preferred candidate excitation among a set of candidate excitations with respect to some cost function. The search and selection require a number of mathematical operations to be performed, which translates into a certain computational complexity when the operations are implemented on a signal processing device. It is advantageous to minimize the number of mathematical operations in order to minimize a power consumption, and maximize a processing bandwidth, of the signal processing device.
Excitation quantization in predictive coding can be based on a sample-by-sample quantization of the excitation. This is referred to as Scalar Quantization (SQ). Techniques for performing Scalar Quantization of the excitation are relatively simple, and thus, the computational complexity associated with SQ is relatively manageable.
Alternatively, the excitation can be quantized based on groups of samples. Quantizing groups of samples is often referred to as Vector Quantization (VQ), and when applied to the excitation, simply as excitation VQ. The use of VQ can provide superior performance to SQ, and may be necessary when the number of coding bits per residual signal sample becomes small (typically less than two bits per sample). Also, VQ can provide a greater flexibility in bit-allocation as compared to SQ, since a fractional number of bits per sample can be used. However, excitation VQ can be relatively complex when compared to excitation SQ. Therefore, there is need to reduce the complexity of excitation VQ as used in a predictive coding environment.
One type of predictive coding is Noise Feedback Coding (NFC), wherein noise feedback filtering is used to shape coding noise, in order to improve a perceptual quality of quantized speech. Therefore, it would be advantageous to use excitation VQ with noise feedback coding, and further, to do so in a computationally efficient manner.
Summary
The present invention includes efficient methods related to excitation quantization in noise feedback coding, for example, in NFC systems, where the short-term shaping of the coding noise is generalized. The methods are described primarily in Section IX.D and in connection with FIGS. 21-31. The methods are based in part on separating an NFC quantization error signal into ZERO-STATE and ZERO-INPUT response contributions. The methods accommodate general shaping of the coding noise while providing an efficient excitation quantization. The present invention provides an efficient method of producing a ZERO-STATE response with the generalized noise shaping.
In an embodiment, the method is performed in a Noise Feedback Coding (NFC) system having a corresponding ZERO-STATE filter structure, the ZERO-STATE filter structure including multiple filters. The method includes: (a) transforming the ZERO-STATE filter structure to a second ZERO-STATE filter structure including only an all-zero filter, the all-zero filter having a filter response substantially equivalent to a filter response of the ZERO-STATE filter structure including multiple filters; and (b) filtering a VQ codevector with the all-zero filter to produce the ZERO-STATE response error vector corresponding to the VQ codevector.
Terminology
Predictor:
A predictor P as referred to herein predicts a current signal value (e.g., a current sample) based on previous or past signal values (e.g., past samples). A predictor can be a short-term predictor or a long-term predictor. A short-term signal predictor (e.g., a short tern speech predictor) can predict a current signal sample (e.g., speech sample) based on adjacent signal samples from the immediate past. With respect to speech signals, such xe2x80x9cshort-termxe2x80x9d predicting removes redundancies between, for example, adjacent or close-in signal samples. A long-term signal predictor can predict a current signal sample based on signal samples from the relatively distant past. With respect to a speech signal, such xe2x80x9clong-termxe2x80x9d predicting removes redundancies between relatively distant signal samples. For example, a long-term speech predictor can remove redundancies between distant speech samples due to a pitch periodicity of the speech signal.
The phrases xe2x80x9ca predictor P predicts a signal s(n) to produce a signal ps(n)xe2x80x9d means the same as the phrase xe2x80x9ca predictor P makes a prediction ps(n) of a signal s(n).xe2x80x9d Also, a predictor can be considered equivalent to a predictive filter that predictively filters an input signal to produce a predictively filtered output signal.
Coding Noise and Filtering Thereof:
Often, a speech signal can be characterized in part by spectral characteristics (i.e., the frequency spectrum) of the speech signal. Two known spectral characteristics include 1) what is referred to as a harmonic fine structure or line frequencies of the speech signal, and 2) a spectral envelope of the speech signal. The harmonic fine structure includes, for example, pitch harmonics, and is considered a long-term (spectral) characteristic of the speech signal. On the other hand, the spectral envelope of the speech signal is considered a short-term (spectral) characteristic of the speech signal.
Coding a speech signal can cause audible noise when the encoded speech is decoded by a decoder. The audible noise arises because the coded speech signal includes coding noise introduced by the speech coding process, for example, by quantizing signals in the encoding process. The coding noise can have spectral characteristics (i.e., a spectrum) different from the spectral characteristics (i.e., spectrum) of natural speech (as characterized above). Such audible coding noise can be reduced by spectrally shaping the coding noise (i.e., shaping the coding noise spectrum) such that it corresponds to or follows to some extent the spectral characteristics (i.e., spectrum) of the speech signal. This is referred to as xe2x80x9cspectral noise shapingxe2x80x9d of the coding noise, or xe2x80x9cshaping the coding noise spectrum.xe2x80x9d The coding noise is shaped to follow the speech signal spectrum only xe2x80x9cto some extentxe2x80x9d because it is not necessary for the coding noise spectrum to exactly follow the speech signal spectrum. Rather, the coding noise spectrum is shaped sufficiently to reduce audible noise, thereby improving the perceptual quality of the decoded speech.
Accordingly, shaping the coding noise spectrum (i.e. spectrally shaping the coding noise) to follow the harmonic fine structure (i.e., long-term spectral characteristic) of the speech signal is referred to as xe2x80x9charmonic noise (spectral) shapingxe2x80x9d or xe2x80x9clong-term noise (spectral) shaping.xe2x80x9d Also, shaping the coding noise spectrum to follow the spectral envelope (i.e., short-term spectral characteristic) of the speech signal is referred to a xe2x80x9cshort-term noise (spectral) shapingxe2x80x9d or xe2x80x9cenvelope noise (spectral) shaping.xe2x80x9d
Noise feedback filters can be used to spectrally shape the coding noise to follow the spectral characteristics of the speech signal, so as to reduce the above mentioned audible noise. For example, a short-term noise feedback filter can short-term filter coding noise to spectrally shape the coding noise to follow the short-term spectral characteristic (i.e., the envelope) of the speech signal. On the other hand, a long-term noise feedback filter can long-term filter coding noise to spectrally shape the coding noise to follow the long-term spectral characteristic (i.e., the harmonic fine structure or pitch harmonics) of the speech signal. Therefore, short-term noise feedback filters can effect short-term or envelope noise spectral shaping of the coding noise, while long-term noise feedback filters can effect long-term or harmonic noise spectral shaping of the coding noise, in the present invention.