In speech coding, it is typically necessary to quantize a signal representing some property of the speech. Quantization is the process of converting a continuous range of values into a set of discrete values; or more realistically in the case of a digital system, converting a larger set of approximately-continuous discrete values into a smaller set of more substantially discrete values. The quantized discrete values are typically selected from predetermined representation levels. Types of quantization include scalar quantization, trellis quantization, lattice quantization, vector quantization, algebraic codebook quantization, and others. The quantization has the effect that the quantized version of the signal requires fewer bits per unit time, and therefore takes less signaling overhead to transmit or less storage space to store.
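As a minimal illustration of the simplest type listed above, scalar quantization, the following Python sketch maps each sample to the nearest of a set of uniformly spaced representation levels. The function name `quantize_uniform`, the step size and the sample values are hypothetical choices made for this illustration only.

```python
def quantize_uniform(x, step):
    """Return (index, level) for the representation level nearest to x.

    The levels are assumed to be the integer multiples of `step`.
    """
    index = round(x / step)
    return index, index * step


samples = [0.12, -0.47, 0.81, 0.33]   # illustrative input values
step = 0.25                           # illustrative spacing between levels

indices = [quantize_uniform(s, step)[0] for s in samples]
levels = [i * step for i in indices]

print(indices)  # [0, -2, 3, 1]
print(levels)   # [0.0, -0.5, 0.75, 0.25]
```

Only the small integer indices need to be stored or transmitted, which is where the saving in bits comes from; the price is an error of up to half a step per sample.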
However, quantization is also a form of distortion of the signal, which may be perceived by an end listener as a kind of noise, sometimes referred to as coding noise. To help alleviate this problem, a noise shaping quantizer may be used to quantize the signal. The idea behind a noise shaping quantizer is to quantize the signal in a manner that weights or biases the noise effect created by the quantization into less noticeable parts of the frequency spectrum, e.g. where the human ear is more tolerant to noise, and/or where the speech energy is high such that the relative effect of the noise is less. That is, noise shaping is a technique to produce a quantized signal with a spectrally shaped coding noise. The coding noise may be defined quantitatively as the difference between input and output signals of the overall quantizing system, i.e. of the whole codec, and this typically has a spectral shape (whereas the quantization error usually refers to the difference between the immediate inputs and outputs of the actual quantization unit, which is typically spectrally flat).
FIG. 1a is a schematic block diagram showing one example of a noise shaping quantizer 11, which receives an input signal x(n) and produces a quantized output signal y(n). The noise shaping quantizer 11 comprises a quantization unit 13, a noise shaping filter 15, an addition stage 17 and a subtraction stage 19. The subtraction stage 19 calculates an error signal in the form of the quantization error q(n) by taking the difference between the quantized output signal y(n) and the input to the quantization unit 13, where n is the sample number. The quantization error q(n) is supplied to the noise shaping filter 15 where it is filtered to produce a filtered output. The addition stage 17 then adds this filtered output to the input signal x(n) and supplies the resulting signal to the input of the quantization unit 13.
The input, output and error signals are represented in FIG. 1a in the time domain as functions of time x(n), y(n) and q(n) respectively (with time being measured in number of samples n). As will be familiar to a person skilled in the art, the same signals can also be represented in the frequency domain as functions of frequency X(z), Y(z) and Q(z) respectively (z representing frequency). In that case, the noise shaping filter can be represented by a function F(z) in the frequency domain, such that the quantized output signal can be described in the frequency domain as:
$$Y(z) = X(z) + (1 + F(z)) \cdot Q(z)$$
The quantization error Q(z) typically has a spectrum that is approximately white (i.e. approximately constant energy across its frequency spectrum). Therefore the coding noise has a spectrum approximately proportional to 1+F(z).
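The feedback loop of FIG. 1a can be sketched in the time domain as follows. This is only an illustrative sketch under stated assumptions: the quantization unit is taken to be a uniform scalar quantizer, and F(z) is assumed to be an FIR filter with taps f1, f2, ... acting only on past errors (no zero-delay term), which is needed for the loop to be computable sample by sample. The names `quantize` and `noise_shaping_quantizer_1a` and all numeric values are hypothetical.

```python
def quantize(u, step):
    """Uniform scalar quantizer standing in for quantization unit 13."""
    return step * round(u / step)


def noise_shaping_quantizer_1a(x, f, step):
    """Sketch of FIG. 1a with assumed F(z) = f[0]*z^-1 + f[1]*z^-2 + ..."""
    q_hist = [0.0] * len(f)              # q(n-1), q(n-2), ...
    y, q = [], []
    for xn in x:
        filtered = sum(fk * qk for fk, qk in zip(f, q_hist))  # filter 15
        u = xn + filtered                # addition stage 17
        yn = quantize(u, step)           # quantization unit 13
        qn = yn - u                      # subtraction stage 19
        q_hist = ([qn] + q_hist)[:len(f)]
        y.append(yn)
        q.append(qn)
    return y, q


x = [0.30, 0.41, 0.52, 0.47, 0.35, 0.20, 0.08, -0.05]  # illustrative input
f = [-0.8]                                             # illustrative tap
y, q = noise_shaping_quantizer_1a(x, f, step=0.25)

# Coding noise y(n) - x(n) equals q(n) plus the filtered past errors,
# which is the time-domain form of Y(z) = X(z) + (1 + F(z)) * Q(z).
coding_noise = [yn - xn for yn, xn in zip(y, x)]
```

With the single tap chosen here, 1 + F(z) is a first-order high-pass response, so in this example the coding noise is pushed towards higher frequencies.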
Another example of a noise shaping quantizer 21 is shown schematically in FIG. 1b. The noise shaping quantizer 21 comprises a quantization unit 23, a noise shaping filter 25, an addition stage 27 and a subtraction stage 29. As in FIG. 1a, an error signal is supplied to the noise shaping filter 25 where it is filtered to produce a filtered output, and the addition stage 27 then adds this filtered output to the input signal x(n) and supplies the resulting signal to the input of the quantization unit 23. However, unlike FIG. 1a, the subtraction stage 29 of FIG. 1b calculates the error signal as the coding noise, defined as the difference between the quantized output signal y(n) and the input signal x(n), i.e. the input signal before the filter output is added rather than the immediate input to the quantization unit 23. In this case, the quantized output signal y(n) can be described in the frequency domain as:
$$Y(z) = X(z) + \frac{Q(z)}{1 - F(z)}.$$
Therefore the coding noise has a spectrum proportional to $(1 - F(z))^{-1}$.
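The arrangement of FIG. 1b differs from FIG. 1a only in which error is fed back, and this can be sketched in the same way. Again this is a hedged illustration: a uniform scalar quantizer is assumed for the quantization unit, F(z) is assumed to be an FIR filter acting only on past values, and the function names and numbers are hypothetical.

```python
def quantize(u, step):
    """Uniform scalar quantizer standing in for quantization unit 23."""
    return step * round(u / step)


def noise_shaping_quantizer_1b(x, f, step):
    """Sketch of FIG. 1b with assumed F(z) = f[0]*z^-1 + f[1]*z^-2 + ..."""
    c_hist = [0.0] * len(f)              # past coding noise y - x
    y, c, q = [], [], []
    for xn in x:
        filtered = sum(fk * ck for fk, ck in zip(f, c_hist))  # filter 25
        u = xn + filtered                # addition stage 27
        yn = quantize(u, step)           # quantization unit 23
        cn = yn - xn                     # subtraction stage 29 (coding noise)
        c_hist = ([cn] + c_hist)[:len(f)]
        y.append(yn)
        c.append(cn)
        q.append(yn - u)                 # quantization error, kept for checking
    return y, c, q


x = [0.30, 0.41, 0.52, 0.47, 0.35, 0.20, 0.08, -0.05]  # illustrative input
f = [0.5]                                              # illustrative tap
y, c, q = noise_shaping_quantizer_1b(x, f, step=0.25)

# c(n) - sum_k f_k * c(n-k) reproduces the quantization error q(n),
# the time-domain form of (Y(z) - X(z)) * (1 - F(z)) = Q(z).
```

Because the coding noise is now filtered recursively, the taps must be chosen so that 1/(1 − F(z)) is stable; the single tap 0.5 used here satisfies this.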
Another example is shown in FIG. 1c, which is a schematic block diagram of an analysis-by-synthesis quantizer 31. Analysis-by-synthesis is a method in speech coding whereby a quantizer codebook is searched to minimize a weighted coding error signal (the codebook defines the possible representation levels for the quantization). This works by trying to represent samples of the input signal according to a plurality of different possible representation levels in the codebook, and selecting the levels which produce the least energy in the weighted coding error signal. The weighting is to bias the coding error towards less noticeable parts of the frequency spectrum.
Referring to FIG. 1c, the analysis-by-synthesis quantizer 31 receives an input signal x(n) and produces a quantized output signal y(n). It comprises a controllable quantization unit 33, a weighting filter 35, an energy minimization block 37, and a subtraction stage 39. The quantization unit 33 generates a plurality of possible versions of a portion of the quantized output signal y(n). For each possible version, the subtraction stage 39 subtracts the quantized output y(n) from the input signal x(n) to produce an error signal, which is supplied to the weighting filter 35. The weighting filter 35 filters the error signal to produce a weighted error signal, and supplies this filtered output to the energy minimization block 37. The energy minimization block 37 determines the energy in the weighted error signal for each possible version of the quantized output signal y(n), and selects the version resulting in the least energy in the weighted error signal.
Thus the weighted coding error signal is computed by filtering the coding error with a weighting filter 35, which can be represented in the frequency domain by a function W(z). For a well-constructed codebook able to approximate the input signal, the weighted coding noise signal with minimum energy is approximately white. That means that the coding noise signal itself has a noise spectrum shaped proportional to the inverse of the weighting filter: $W(z)^{-1}$. By defining W(z)=1−F(z), so that this spectrum has the same $(1 - F(z))^{-1}$ shape as the coding noise of FIG. 1b, and noting that the quantizer in FIG. 1b searches a codebook to minimize the quantization error between quantizer output and input, it is clear that analysis-by-synthesis quantization can be interpreted as noise shaping quantization.
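The search performed by the analysis-by-synthesis quantizer of FIG. 1c can be sketched as an exhaustive loop over candidate outputs. This is a simplified illustration rather than a practical codec: it assumes a tiny scalar codebook, a short block of samples, an FIR weighting filter, and zero filter state at the start of the block. The names `fir_filter` and `abs_quantize` and all values are hypothetical.

```python
from itertools import product


def fir_filter(signal, taps):
    """Apply an FIR filter with zero initial state (stands in for filter 35)."""
    return [
        sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(signal))
    ]


def abs_quantize(x_block, levels, w_taps):
    """Try every combination of levels and keep the one whose weighted
    error has the least energy (blocks 33, 39, 35 and 37 of FIG. 1c)."""
    best, best_energy = None, float("inf")
    for candidate in product(levels, repeat=len(x_block)):
        error = [xn - yn for xn, yn in zip(x_block, candidate)]  # stage 39
        weighted = fir_filter(error, w_taps)                     # filter 35
        energy = sum(e * e for e in weighted)                    # block 37
        if energy < best_energy:
            best, best_energy = list(candidate), energy
    return best, best_energy


levels = [-1.0, -0.5, 0.0, 0.5, 1.0]   # illustrative codebook
x_block = [0.3, -0.6, 0.1]             # illustrative input block
w_taps = [1.0, -0.9]                   # illustrative W(z) = 1 - 0.9*z^-1

y_block, energy = abs_quantize(x_block, levels, w_taps)
```

With `w_taps = [1.0]` the weighting is the identity and the search reduces to choosing the nearest level for each sample independently. The exhaustive search grows exponentially with block length, so it is shown here only to make the selection criterion concrete.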
Once a quantized output signal y(n) is found according to one of the above techniques, indices corresponding to the representation levels selected to represent the samples of the signal are transmitted to the decoder in the encoded signal, such that the quantized signal y(n) can be reconstructed again from those indices in the decoding. In order to efficiently encode these quantization indices, the input to the quantizer is commonly whitened with a prediction filter.
A prediction filter generates predicted values of samples in a signal based on previous samples. In speech coding, it is possible to do this because of correlations present in speech samples (correlation being a statistical measure of a degree of relationship between groups of data). These correlations could be “long-term” correlations between quasi-periodic portions of the speech signal, or “short-term” correlations on a timescale shorter than such periods. The predicted samples are then subtracted from the actual samples to produce a residual signal. This residual signal, i.e. the difference between the predicted and actual samples, typically has a lower energy than the original speech samples and therefore requires fewer bits to quantize. That is, it is only necessary to quantize the difference between the original and predicted signals.
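The effect of short-term prediction on signal energy can be illustrated with a small sketch. It assumes a fixed first-order predictor and a slowly varying sinusoid as a stand-in for a correlated signal; real speech coders typically use higher-order predictors with coefficients fitted to the signal. The function names, the tap value 0.95 and the test signal are hypothetical.

```python
import math


def predict(history, taps):
    """Predicted sample: sum_k taps[k] * history[k], history[0] most recent."""
    return sum(a * h for a, h in zip(taps, history))


def residual_signal(x, taps):
    """Subtract the prediction from each actual sample."""
    history = [0.0] * len(taps)
    r = []
    for xn in x:
        r.append(xn - predict(history, taps))
        history = ([xn] + history)[:len(taps)]
    return r


def reconstruct(r, taps):
    """Invert residual_signal by adding the prediction back."""
    history = [0.0] * len(taps)
    x = []
    for rn in r:
        xn = rn + predict(history, taps)
        x.append(xn)
        history = ([xn] + history)[:len(taps)]
    return x


def energy(s):
    return sum(v * v for v in s)


x = [math.sin(0.1 * n) for n in range(200)]   # strongly correlated samples
taps = [0.95]                                 # illustrative predictor
r = residual_signal(x, taps)

print(energy(r) < energy(x))  # True: the residual carries far less energy
```

Here no quantization is involved, so `reconstruct(r, taps)` recovers the original samples up to rounding error.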
FIG. 1d shows an example of a noise shaping quantizer 41 where the quantizer input is whitened using a linear prediction filter P(z). The predictor operates in closed loop, meaning that a prediction of the input signal is based on the quantized output signal. The output of the prediction filter is subtracted from the quantizer input and added to the quantizer output to form the quantized output signal.
Referring to FIG. 1d, the noise shaping quantizer 41 comprises a quantization unit 42, a prediction filter 44, a noise shaping filter 45, a first addition stage 46, a second addition stage 47, a first subtraction stage 48 and a second subtraction stage 49. The first subtraction stage 48 calculates the coding error (i.e. coding noise) by taking the difference between the quantized output signal y(n) and the input signal x(n), and supplies the coding noise to the noise shaping filter 45 where it is filtered to generate a filtered output. The quantized output signal y(n) is also supplied to the prediction filter 44 where it is filtered to generate another filtered output. The output of the noise shaping filter 45 is added to the input signal x(n) at the first addition stage 46 and the output of the prediction filter 44 is subtracted from the input signal x(n) at the second subtraction stage 49. The resulting signal is input to the quantization unit 42, to generate an output being a quantized version of its input, and also to generate quantization indices i(n) corresponding to the representation levels selected to represent that input in the quantization. The output of the prediction filter 44 is then added back to the output of the quantization unit 42 at the second addition stage 47 to produce the quantized output signal y(n).
Note that, in the encoder, the quantized output signal y(n) is generated only for feedback to the prediction filter 44 and noise shaping filter 45: it is the quantization indices i(n) that are transmitted to the decoder in the encoded signal. The decoder will then reconstruct the quantized signal y(n) using those indices i(n).
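The closed-loop arrangement of FIG. 1d, together with the reconstruction from indices, can be sketched as follows. Several things here are assumptions made for illustration: the quantization unit is a uniform scalar quantizer, both F(z) and P(z) are FIR filters acting only on past samples, and the decoder is assumed to simply mirror the encoder's prediction path, since its structure is not detailed in the text above. All names and values are hypothetical.

```python
def encode_1d(x, f, p, step):
    """Sketch of FIG. 1d; returns the indices i(n) and the local y(n)."""
    c_hist = [0.0] * len(f)   # past coding noise y - x
    y_hist = [0.0] * len(p)   # past quantized output
    indices, y = [], []
    for xn in x:
        shaped = sum(fk * ck for fk, ck in zip(f, c_hist))  # filter 45
        pred = sum(pk * yk for pk, yk in zip(p, y_hist))    # filter 44
        u = xn + shaped - pred       # stages 46 and 49
        i = round(u / step)          # quantization unit 42: index i(n)
        yn = i * step + pred         # addition stage 47
        cn = yn - xn                 # subtraction stage 48
        c_hist = ([cn] + c_hist)[:len(f)]
        y_hist = ([yn] + y_hist)[:len(p)]
        indices.append(i)
        y.append(yn)
    return indices, y


def decode(indices, p, step):
    """Assumed decoder: rebuild y(n) from i(n) using the same predictor."""
    y_hist = [0.0] * len(p)
    y = []
    for i in indices:
        yn = i * step + sum(pk * yk for pk, yk in zip(p, y_hist))
        y_hist = ([yn] + y_hist)[:len(p)]
        y.append(yn)
    return y


x = [0.30, 0.41, 0.52, 0.47, 0.35, 0.20, 0.08, -0.05]  # illustrative input
indices, y_enc = encode_1d(x, f=[0.5], p=[0.9], step=0.25)
y_dec = decode(indices, p=[0.9], step=0.25)
```

Because the encoder predicts from its own quantized output rather than from the unquantized input, the decoder can form exactly the same predictions from the indices alone, and `y_dec` matches `y_enc`.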
FIG. 1e shows another example of a noise shaping quantizer 51 where the quantizer input is whitened using a linear prediction filter P(z). The predictor operates in an open-loop manner, meaning that a prediction of the input signal is based on the input signal and a prediction of the output is based on the quantized output signal. The output of the input prediction filter is subtracted from the quantizer input and the output of the output prediction filter is added to the quantizer output to form the quantized output signal.
Referring to FIG. 1e, the noise shaping quantizer 51 comprises a quantization unit 52, a first instance of a prediction filter 54, a second instance of the same prediction filter 54′, a noise shaping filter 55, a first addition stage 56, a second addition stage 57, a first subtraction stage 58 and a second subtraction stage 59. The quantization unit 52, noise shaping filter 55, and first addition and subtraction stages 56 and 58 are arranged to operate similarly to those of FIG. 1d. However, in contrast to FIG. 1d, the output of the first addition stage 56 is supplied to the first instance of the prediction filter 54 where it is filtered to generate a filtered output, and this output of the first instance of the prediction filter 54 is then subtracted from the output of the first addition stage 56 at the second subtraction stage 59 before the resulting signal is input to the quantization unit 52. The output of the second instance of the prediction filter 54′ is added to the output of the quantization unit 52 at the second addition stage 57 to generate the quantized output signal y(n), and this quantized output signal y(n) is supplied to the second instance of the prediction filter 54′ to generate its filtered output.
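For comparison with the closed-loop case, the open-loop arrangement of FIG. 1e can be sketched with two separate predictor states, one fed from the encoder side and one from the quantized output. As before, this is an illustration under assumptions: a uniform scalar quantizer, FIR filters F(z) and P(z) acting only on past samples, and hypothetical names and values.

```python
def encode_1e(x, f, p, step):
    """Sketch of FIG. 1e; returns indices, y(n) and the stage-56 output s(n)."""
    c_hist = [0.0] * len(f)   # past coding noise y - x
    s_hist = [0.0] * len(p)   # past outputs of addition stage 56
    y_hist = [0.0] * len(p)   # past quantized output
    indices, y, s = [], [], []
    for xn in x:
        sn = xn + sum(fk * ck for fk, ck in zip(f, c_hist))    # stage 56
        pred_in = sum(pk * sk for pk, sk in zip(p, s_hist))    # filter 54
        u = sn - pred_in                                       # stage 59
        i = round(u / step)                                    # unit 52
        pred_out = sum(pk * yk for pk, yk in zip(p, y_hist))   # filter 54'
        yn = i * step + pred_out                               # stage 57
        cn = yn - xn                                           # stage 58
        c_hist = ([cn] + c_hist)[:len(f)]
        s_hist = ([sn] + s_hist)[:len(p)]
        y_hist = ([yn] + y_hist)[:len(p)]
        indices.append(i)
        y.append(yn)
        s.append(sn)
    return indices, y, s


x = [0.30, 0.41, 0.52, 0.47, 0.35, 0.20, 0.08, -0.05]  # illustrative input
p, step = [0.9], 0.25
indices, y, s = encode_1e(x, f=[0.5], p=p, step=step)

# d(n) = y(n) - s(n) satisfies d(n) - sum_k p_k * d(n-k) = q(n), where q(n)
# is the error of quantization unit 52, so the quantization error reaches
# the output filtered by 1 / (1 - P(z)).
d = [yn - sn for yn, sn in zip(y, s)]
```

The design difference from the closed-loop sketch is that the subtracted prediction and the added prediction are no longer the same signal, so they do not cancel exactly.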