1. Field of the Invention
This invention relates generally to digital communications, and more particularly, to the coding and decoding of speech or other audio signals in a digital communications system.
2. Related Art
In speech or audio coding, a coder encodes an input speech or audio signal into a digital bit stream for transmission or storage, and a decoder decodes the bit stream into an output speech or audio signal. The combination of the coder and the decoder is called a codec.
In the field of speech coding, a popular encoding method is predictive coding. Rather than directly encoding the speech signal samples into a bit stream, a predictive encoder predicts the current input speech sample from previous speech samples, subtracts the predicted value from the input sample value, and then encodes the difference, or prediction residual, into a bit stream. The decoder decodes the bit stream into a quantized version of the prediction residual, and then adds the predicted value back to the residual to reconstruct the speech signal. This encoding principle is called Differential Pulse Code Modulation, or DPCM.
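The DPCM loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular codec: the first-order predictor coefficient `a`, the uniform quantizer step `step`, and all function names are assumptions chosen for the example. The encoder here predicts from the previously reconstructed sample so that it stays in step with the decoder, which is a common arrangement but also an assumption relative to the description above.

```python
def quantize(x, step):
    """Uniform quantizer: snap x to the nearest multiple of step."""
    return step * round(x / step)


def dpcm_encode(samples, a=0.9, step=0.1):
    """Encode samples as quantized prediction residuals (hypothetical first-order predictor)."""
    residuals = []
    prev_recon = 0.0  # last sample as the decoder will reconstruct it
    for s in samples:
        predicted = a * prev_recon
        r_hat = quantize(s - predicted, step)  # quantized prediction residual
        residuals.append(r_hat)
        prev_recon = predicted + r_hat
    return residuals


def dpcm_decode(residuals, a=0.9):
    """Rebuild the signal by adding each predicted value back to its residual."""
    recon = []
    prev_recon = 0.0
    for r_hat in residuals:
        prev_recon = a * prev_recon + r_hat
        recon.append(prev_recon)
    return recon


signal = [0.0, 0.31, 0.59, 0.81, 0.95, 1.0, 0.95, 0.81, 0.59, 0.31]
decoded = dpcm_decode(dpcm_encode(signal))
coding_noise = [s - r for s, r in zip(signal, decoded)]
```

In this sketch the coding noise on each sample is just the quantization error of the residual, so its magnitude stays within half a quantizer step.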
In conventional DPCM codecs, the coding noise, or the difference between the input signal and the reconstructed signal at the output of the decoder, is white. In other words, the coding noise has a flat spectrum. Since the spectral envelope of voiced speech slopes down with increasing frequency, such a flat noise spectrum means the coding noise power often exceeds the speech power at high frequencies. When this happens, the coding distortion is perceived as a hissing noise, and the decoder output speech sounds noisy. Thus, white coding noise is not optimal in terms of perceptual quality of output speech.
The perceptual quality of coded speech can be improved by adaptive noise spectral shaping, in which the spectrum of the coding noise is adaptively shaped so that it follows the input speech spectrum to some extent. In effect, this makes the coding noise more speech-like. Due to the noise masking effect of human hearing, such shaped noise is less audible to human ears. Therefore, codecs employing adaptive noise spectral shaping provide better output quality than codecs that produce white coding noise.
In recent and popular predictive speech coding techniques such as Multi-Pulse Linear Predictive Coding (MPLPC) or Code-Excited Linear Prediction (CELP), adaptive noise spectral shaping is achieved by using a perceptual weighting filter to filter the coding noise and then calculating the mean-squared error (MSE) of the filter output in a closed-loop codebook search. However, an alternative method for adaptive noise spectral shaping, known as Noise Feedback Coding (NFC), had been proposed more than two decades before MPLPC or CELP came into existence.
The basic ideas of NFC date back to the work of C. C. Cutler as described in U.S. Pat. No. 2,927,962, issued Mar. 8, 1960 and entitled “Transmission Systems Employing Quantization”. Based on Cutler's ideas, E. G. Kimme and F. F. Kuo proposed a noise feedback coding system for television signals in their paper “Synthesis of Optimal Filters for a Feedback Quantization System,” IEEE Transactions on Circuit Theory, pp. 405-413, September 1963. Enhanced versions of NFC, applied to Adaptive Predictive Coding (APC) of speech, were later proposed by J. D. Makhoul and M. Berouti in “Adaptive Noise Spectral Shaping and Entropy Coding in Predictive Coding of Speech,” IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 63-73, February 1979, and by B. S. Atal and M. R. Schroeder in “Predictive Coding of Speech Signals and Subjective Error Criteria,” IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 247-254, June 1979. Such codecs are sometimes referred to as APC-NFC. More recently, NFC has also been used to enhance the output quality of Adaptive Differential Pulse Code Modulation (ADPCM) codecs, as proposed by C. C. Lee in “An enhanced ADPCM Coder for Voice Over Packet Networks,” International Journal of Speech Technology, pp. 343-357, May 1999.
In noise feedback coding, the difference signal between the quantizer input and output is passed through a filter, whose output is then added to the prediction residual to form the quantizer input signal. By carefully choosing the filter in the noise feedback path (called the noise feedback filter), the spectrum of the overall coding noise can be shaped to make the coding noise less audible to human ears. Initially, NFC was used in codecs with only a short-term predictor, which predicts the current input signal sample from adjacent samples in the immediate past. Examples of such codecs include the systems proposed by Makhoul and Berouti in their 1979 paper. The noise feedback filters used in such early systems are short-term filters. As a result, the corresponding adaptive noise shaping affects only the spectral envelope of the coding noise.
In addition to the short-term predictor, Atal and Schroeder added a three-tap long-term predictor in the APC-NFC codecs proposed in their 1979 paper cited above. Such a long-term predictor predicts the current sample from samples that are roughly one pitch period earlier. For this reason, it is sometimes referred to as the pitch predictor in the speech coding literature. While the short-term predictor removes the signal redundancy between adjacent samples, the pitch predictor removes the signal redundancy between distant samples due to the pitch periodicity in voiced speech. Thus, the addition of the pitch predictor further enhances the overall coding efficiency of the APC systems.
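As a rough illustration of long-term prediction, the sketch below subtracts a three-tap prediction formed from samples about one pitch period back. The function name, the tap values, and the choice to center the three taps on the pitch lag are assumptions made for the example, not details taken from the cited paper.

```python
def long_term_residual(samples, pitch_lag, taps=(0.0, 1.0, 0.0)):
    """Subtract a three-tap long-term prediction from each sample.

    The taps weight the samples at lags pitch_lag-1, pitch_lag, pitch_lag+1
    (an assumed layout); samples before the start of the signal count as zero.
    """
    def past(n):
        return samples[n] if n >= 0 else 0.0

    residual = []
    for n, s in enumerate(samples):
        predicted = sum(
            b * past(n - (pitch_lag - 1 + k)) for k, b in enumerate(taps)
        )
        residual.append(s - predicted)
    return residual


# A perfectly periodic toy signal with a period of 4 samples.
period = [0.0, 1.0, 0.5, -0.5]
voiced = period * 5
residual = long_term_residual(voiced, pitch_lag=4)
```

For this idealized periodic input, the residual vanishes after the first period, which is the redundancy between distant samples that the pitch predictor is meant to remove. Real voiced speech is only approximately periodic, so the residual would shrink rather than vanish.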
The basic structure of a conventional NFC codec 100 is illustrated in FIG. 1. As shown in that figure, an encoder portion of codec 100 includes a first predictor 102, a first combiner 104, and a quantizer portion 106. Quantizer portion 106 includes a quantizer 110, a second combiner 108, a third combiner 112, and a noise feedback filter 114. A decoder portion of codec 100 includes a fourth combiner 116 and a second predictor 118.
The encoder portion of codec 100 encodes a sampled input speech signal s(n) to produce a quantizer output signal û(n). In particular, input speech signal s(n) is received by first predictor 102 and first combiner 104. First predictor 102 predicts input speech signal s(n) to produce a predicted speech signal. The predicted speech signal is then subtracted from s(n) at combiner 104 to produce a prediction residual signal d(n).
Within quantizer portion 106, second combiner 108 receives prediction residual signal d(n) and combines it with a noise feedback signal from noise feedback filter 114 to produce a quantizer input signal u(n). Quantizer 110 quantizes input signal u(n) to produce quantizer output signal û(n). Third combiner 112 combines, or differences, signals u(n) and û(n) to produce a quantization error signal q(n). Noise feedback filter 114 filters quantization error signal q(n) to produce the previously-described noise feedback signal.
The decoder portion of codec 100 receives quantizer output signal û(n) and decodes it to produce reconstructed speech signal ŝ(n). In particular, fourth combiner 116 combines quantizer output signal û(n) with a predicted reconstructed speech signal provided by second predictor 118 to produce reconstructed speech signal ŝ(n). Second predictor 118 predicts the reconstructed speech signal based on past samples of ŝ(n).
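The signal flow of codec 100 can be traced sample by sample with the following sketch. It assumes a uniform quantizer, the sign convention q(n) = u(n) − û(n), and a feedback signal that is added to d(n); the text above says only that the signals are combined, so these signs are assumptions. The predictor coefficients `alpha`, the feedback coefficients `f`, the step size, and the test signal are arbitrary placeholders.

```python
def nfc_encode(samples, alpha, f, step=0.05):
    """Encoder portion of FIG. 1: returns quantizer outputs u_hat and errors q."""
    u_hat, q = [], []
    for n, s in enumerate(samples):
        # First predictor 102 and first combiner 104: prediction residual d(n).
        d = s - sum(a * samples[n - i] for i, a in enumerate(alpha, 1) if n - i >= 0)
        # Noise feedback filter 114 acts on past quantization errors only.
        feedback = sum(c * q[n - i] for i, c in enumerate(f, 1) if n - i >= 0)
        u = d + feedback             # second combiner 108
        uq = step * round(u / step)  # quantizer 110 (assumed uniform)
        u_hat.append(uq)
        q.append(u - uq)             # third combiner 112
    return u_hat, q


def nfc_decode(u_hat, alpha):
    """Decoder portion of FIG. 1: fourth combiner 116 and second predictor 118."""
    s_hat = []
    for n, uq in enumerate(u_hat):
        predicted = sum(a * s_hat[n - i] for i, a in enumerate(alpha, 1) if n - i >= 0)
        s_hat.append(uq + predicted)
    return s_hat


signal = [((7 * n) % 11) / 11 - 0.5 for n in range(40)]
alpha = [0.8, -0.2]   # hypothetical predictor coefficients
f = [0.6, -0.1125]    # hypothetical noise feedback coefficients
u_hat, q = nfc_encode(signal, alpha, f)
s_hat = nfc_decode(u_hat, alpha)
error = [s - r for s, r in zip(signal, s_hat)]
```

Under these conventions the reconstruction error obeys e(n) − Σ αᵢ e(n−i) = q(n) − Σ fᵢ q(n−i), which is the time-domain statement that the coding noise is the quantization error shaped by both the predictor and the noise feedback filter.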
Due to the configuration of codec 100, the final shape of the coding noise is determined by predictor 102 and noise feedback filter 114. Predictors 102 and 118 are each designed to optimally predict input speech or audio signal s(n) and have an identical transfer function of
$$\hat{P}(z) = \sum_{i=1}^{M} \hat{\alpha}_i z^{-i}, \tag{1}$$
where $M$ is the predictor order and $\hat{\alpha}_i$ is the i-th predictor coefficient. As used herein, the nomenclature $\hat{P}(z)$ and $\hat{\alpha}_i$ is intended to indicate the use of quantized predictor coefficients, while $P(z)$ and $\alpha_i$ indicate the use of non-quantized predictor coefficients.
The noise feedback filter $F(z)$ can have many possible forms. One popular form of $F(z)$ is functionally related to the predictor $\hat{P}(z)$ as described in equation (1) and is given by
$$F(z) = \sum_{i=1}^{L} f_i z^{-i}, \tag{2}$$
wherein $L$ is the filter order and $f_i$ is the i-th filter coefficient, and wherein $L = M$ and $f_i = \delta^i \hat{\alpha}_i$, or $F(z) = \hat{P}(z/\delta)$. The variable $\delta$ denotes a filter control parameter. Given the NFC codec structure in FIG. 1, and using $F(z)$ as defined above, the final shape of the coding noise may be expressed as
$$W_1(z) = \frac{1 - F(z)}{1 - \hat{P}(z)} = \frac{\hat{A}(z/\delta)}{\hat{A}(z)}, \tag{3}$$
where
$$\hat{A}(z) = 1 - \hat{P}(z) = \sum_{i=0}^{M} \hat{a}_i z^{-i},$$
in which $\hat{a}_0 = 1$ and $\hat{a}_i = -\hat{\alpha}_i$, $i = 1, \ldots, M$. It has been found in some implementations that using an eighth-order predictor and noise feedback filter ($L = M = 8$) and setting $\delta = 0.75$ produces satisfactory results in terms of masking coding noise.
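The relation between the noise feedback coefficients and the predictor coefficients, and the resulting noise shape of equation (3), can be checked numerically with a short sketch. The function names and the second-order predictor used here are hypothetical and chosen only to keep the example small; the text above reports an eighth-order predictor in practice.

```python
import cmath
import math


def bandwidth_expand(alpha, delta):
    """Return f_i = delta**i * alpha_i for i = 1..M, the coefficients of P(z/delta)."""
    return [delta ** i * a for i, a in enumerate(alpha, 1)]


def noise_shape_gain(alpha, delta, omega):
    """Magnitude of W1(z) = (1 - F(z)) / (1 - P(z)) at z = exp(j*omega)."""
    f = bandwidth_expand(alpha, delta)
    z_inv = cmath.exp(-1j * omega)
    numerator = 1 - sum(c * z_inv ** i for i, c in enumerate(f, 1))
    denominator = 1 - sum(a * z_inv ** i for i, a in enumerate(alpha, 1))
    return abs(numerator / denominator)


alpha = [0.8, -0.2]  # hypothetical second-order predictor, for illustration only
f = bandwidth_expand(alpha, 0.75)
gains = [noise_shape_gain(alpha, 0.75, w) for w in (0.0, math.pi / 2, math.pi)]
```

For this toy predictor, whose spectral envelope is strongest at low frequencies, the gain at ω = 0 exceeds the gain at ω = π, so the shaped noise follows the signal envelope to some extent. Setting `delta` to 1 makes the numerator and denominator identical and returns the flat (white) noise shape.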
From the standpoint of cost and complexity, NFC codec 100 is relatively simple to implement due to its structure and also because it utilizes an all-zero noise feedback filter. However, codec 100 provides limited flexibility for controlling final noise shape due to the way in which the all-zero noise feedback filter must be specified. In other words, because the denominator of $W_1(z)$ is fixed and wholly dependent on the design of input predictor $\hat{P}(z)$, the degree to which final noise shaping can be controlled is somewhat limited.
FIG. 2 shows the structure of an alternative NFC codec 200 for conventional noise feedback coding. Makhoul and Berouti proposed this structure in their 1979 paper cited above. As shown in FIG. 2, codec 200 comprises a quantizer portion 202 that encompasses both encoder and decoder functions. Quantizer portion 202 includes a first combiner 204, a second combiner 208, a third combiner 210, a fourth combiner 216, a quantizer 206, a predictor 212, and a noise feedback filter 214.
Codec 200 operates as follows. An input speech signal s(n) is received by first combiner 204, which combines s(n) with a feedback signal to generate a quantizer input signal u(n). Quantizer 206 quantizes input signal u(n) to produce quantizer output signal û(n). Second combiner 208 combines, or differences, signals u(n) and û(n) to produce a quantization error signal q(n). Noise feedback filter 214 filters quantization error signal q(n) to produce a noise feedback signal which is provided to fourth combiner 216.
Quantizer output signal û(n) is received by third combiner 210 which combines û(n) with a predicted reconstructed speech signal output by predictor 212 to produce a reconstructed speech signal ŝ(n). Predictor 212 predicts the reconstructed speech signal based on past samples of ŝ(n). The output of predictor 212 is also received by fourth combiner 216, which combines it with the noise feedback signal output by noise feedback filter 214 to produce the previously-described feedback signal received by first combiner 204.
Due to the configuration of codec 200, the final shape of the coding noise is determined entirely by $N(z)$. Thus, more flexibility is permitted in controlling the coding noise as compared to codec 100, in which noise shaping is dictated in part by the input predictor $\hat{P}(z)$. In practice, it has been observed that a desirable noise shape is achieved with codec 200 by defining $N(z)$ with reference to predictor 212 such that the spectral shape of the coding noise is given by
$$W_2(z) = N(z) = \frac{A(z/\delta_1)}{A(z/\delta_2)}, \tag{4}$$
wherein $A(z/\delta_1) = 1 - P(z/\delta_1)$ and $A(z/\delta_2) = 1 - P(z/\delta_2)$. The variables $\delta_1$ and $\delta_2$ denote filter control parameters. Setting $\delta_1 = 0.5$ and $\delta_2 = 0.85$ has produced good noise masking results in some implementations. Note that because $N(z)$ can be specified freely, non-quantized predictor coefficients can be used to implement noise feedback filter 214, whereas noise feedback filter 114 of codec 100 should be implemented using quantized predictor coefficients.
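The following sketch simulates the loop of FIG. 2 with the pole-zero noise shape of equation (4). It rests on several assumptions that the text above does not spell out: the noise feedback filter is taken to realize N(z) − 1, the quantization error is taken as q(n) = u(n) − û(n), the feedback signal is subtracted at the first combiner, and the quantizer is uniform. The function names, the second-order predictor, and the test signal are hypothetical.

```python
def bandwidth_expand(alpha, delta):
    """Coefficients of P(z/delta): delta**i * alpha_i for i = 1..M."""
    return [delta ** i * a for i, a in enumerate(alpha, 1)]


def nfc200(samples, alpha, delta1=0.5, delta2=0.85, step=0.05):
    """Simulate FIG. 2; returns (s_hat, w), where w is q filtered by N(z)."""
    num = bandwidth_expand(alpha, delta1)  # A(z/delta1) = 1 - sum(num_i z^-i)
    den = bandwidth_expand(alpha, delta2)  # A(z/delta2) = 1 - sum(den_i z^-i)
    s_hat, q, w = [], [], []
    for n, s in enumerate(samples):
        # Predictor 212: prediction from past reconstructed samples.
        predicted = sum(a * s_hat[n - i] for i, a in enumerate(alpha, 1) if n - i >= 0)
        # Noise feedback filter 214, assumed here to realize N(z) - 1: its
        # output depends only on past values of q and of the shaped noise w.
        noise_fb = (sum(c * w[n - i] for i, c in enumerate(den, 1) if n - i >= 0)
                    - sum(c * q[n - i] for i, c in enumerate(num, 1) if n - i >= 0))
        u = s - (predicted + noise_fb)   # fourth combiner 216, then first combiner 204
        u_hat = step * round(u / step)   # quantizer 206 (assumed uniform)
        q.append(u - u_hat)              # second combiner 208
        w.append(q[n] + noise_fb)        # q(n) shaped by N(z)
        s_hat.append(u_hat + predicted)  # third combiner 210
    return s_hat, w


signal = [((7 * n) % 11) / 11 - 0.5 for n in range(40)]
alpha = [0.8, -0.2]  # hypothetical predictor coefficients
s_hat, w = nfc200(signal, alpha)
```

Under these conventions the reconstruction error s(n) − ŝ(n) equals the shaped noise `w` sample for sample, so the noise shape depends only on N(z) and not on the predictor. The recursion on past values of `w` is the pole part of the filter, which is the source of the added cost relative to the all-zero filter of codec 100.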
The alternative NFC codec 200 of FIG. 2 provides much greater flexibility for controlling the shaping of coding noise as compared to structure 100 of FIG. 1 because the designer can control both the numerator and denominator of $W_2(z)$. However, the cost and complexity of this alternative approach are relatively high as compared to structure 100, in part because the noise feedback filter is a pole-zero filter.
What is desired therefore is a technique for combining the benefits of the foregoing NFC implementations. More specifically, what is desired is an NFC implementation that provides the flexibility of codec 200 with respect to controlling the shape of coding noise but nevertheless utilizes the simpler and less costly configuration of codec 100.