The general principle of PCM coding/decoding as specified by ITU-T Recommendation G.711 is described with reference to FIG. 1.
The PCM coder 13 includes a quantizing module QPCM 10 that receives as input the input signal S(z). The quantizing index IPCM at the output of the quantizing module 10 is transmitted via the transmission channel 11 to the decoder 14.
The PCM decoder 14 receives as input the indices I′PCM coming from the transmission channel, which may be a version of IPCM corrupted by binary errors, and effects inverse quantization by the inverse quantizing module Q−1PCM 12 to obtain the coded signal S̃PCM(z).
PCM coding as standardized by ITU-T Recommendation G.711 (hereinafter G.711) compresses the amplitude of the signals, which are defined with a minimum bandwidth of [300-3400 Hz] and sampled at 8 kHz, by a logarithmic curve that produces a signal-to-noise ratio that is practically constant over a wide signal dynamic range. The quantizing step in the domain of the original signal is proportional to the amplitude of the signals.
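The effect of logarithmic compression on the signal-to-noise ratio can be illustrated numerically. The sketch below is not the G.711 algorithm itself: it assumes the continuous μ-law curve with μ = 255 followed by a uniform 256-level quantizer, and it uses a hypothetical 997 Hz test tone. All function names are illustrative.

```python
import math

MU = 255.0            # assumed parameter of the continuous mu-law curve
LEVELS = 256          # 8-bit quantizer
STEP = 2.0 / LEVELS   # uniform step on [-1, 1]

def compress(x):
    """Map x in [-1, 1] to [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_uniform(v):
    """Uniform mid-rise quantizer with 256 levels on [-1, 1]."""
    index = min(LEVELS - 1, int((v + 1.0) / STEP))
    return -1.0 + (index + 0.5) * STEP

def snr_db(amplitude, companded=True, n=8000, freq=997.0, fs=8000.0):
    """SNR in dB of a quantized sine tone of the given peak amplitude."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n):
        s = amplitude * math.sin(2.0 * math.pi * freq * k / fs)
        if companded:
            d = expand(quantize_uniform(compress(s)))
        else:
            d = quantize_uniform(s)
        signal_power += s * s
        noise_power += (s - d) ** 2   # quantizing noise = original - decoded
    return 10.0 * math.log10(signal_power / noise_power)

# Compare companded and plain uniform quantization over 40 dB of level change.
for level in (1.0, 0.1, 0.01):
    print(level, round(snr_db(level), 1), round(snr_db(level, companded=False), 1))
```

Under these assumptions the companded signal-to-noise ratio changes by only a few dB when the tone level drops by 40 dB, whereas the ratio obtained with plain uniform quantization falls roughly in step with the level.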
The compressed signal is quantized on 8 bits (256 levels). In the public switched telephone network (PSTN) these 8 bits are transmitted at a frequency of 8 kHz, yielding a bit rate of 64 kbit/s.
A G.711 quantized signal frame consists of quantizing indices coded on 8 bits. Accordingly, if the inverse quantization is implemented by means of a table, it consists simply in the index pointing to one of 256 possible decoded values.
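Table-based inverse quantization can be sketched as follows. The table is filled here with the commonly used software formulation of μ-law expansion (bias 0x84, output on a 16-bit scale); that formula is an assumption of this sketch and is not taken from the text, the normative decoded values being those tabulated in the Recommendation. The names `mulaw_expand`, `DECODE_TABLE` and `inverse_quantize` are illustrative.

```python
BIAS = 0x84  # bias of the common mu-law expansion formula (assumed)

def mulaw_expand(code):
    """Expand one 8-bit mu-law code to a linear value on a 16-bit scale."""
    u = ~code & 0xFF                 # mu-law octets are stored bit-inverted
    sign = u & 0x80
    segment = (u >> 4) & 0x07
    position = u & 0x0F
    magnitude = (((position << 3) + BIAS) << segment) - BIAS
    return -magnitude if sign else magnitude

# Built once: one decoded value for each of the 256 possible indices.
DECODE_TABLE = [mulaw_expand(code) for code in range(256)]

def inverse_quantize(indices):
    """Inverse quantization reduces to one table lookup per index."""
    return [DECODE_TABLE[i] for i in indices]

print(inverse_quantize([0xFF, 0x80, 0x00]))
```

Once the table exists, decoding involves no arithmetic: each received index simply points to its decoded value.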
In the A-law G.711 standard (Europe) or the μ-law G.711 standard (North America and Japan), the 8 bits are distributed in the following manner as represented at 15 in FIG. 1:
One sign bit S, three bits to indicate the segment, and four bits to indicate the location within the segment.
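The bit distribution described above can be sketched as a pair of helper functions. The sketch assumes that the sign bit is the most significant bit, followed by the three segment bits and then the four location bits, and it ignores any bit inversion applied to the transmitted octets. The names `split_index` and `join_index` are illustrative.

```python
def split_index(index):
    """Split an 8-bit quantizing index into (sign, segment, position)."""
    sign = (index >> 7) & 0x01      # 1 bit
    segment = (index >> 4) & 0x07   # 3 bits: one of 8 segments
    position = index & 0x0F         # 4 bits: one of 16 locations in the segment
    return sign, segment, position

def join_index(sign, segment, position):
    """Rebuild the 8-bit index from its three fields."""
    return ((sign & 0x01) << 7) | ((segment & 0x07) << 4) | (position & 0x0F)

print(split_index(0b10110101))
```

The three fields account for 2 x 8 x 16 = 256 combinations, which matches the 256 quantizing levels.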
The quantizing operation in the coder generates quantizing noise, which is the difference between the original signal and the decoded signal.
With a large number (256) of quantizing levels, the quantizing noise has a relatively flat spectrum as shown at 20 in FIG. 2. The spectrum of the signal (here a voiced signal block) having a wide dynamic range (approximately 40 dB) is represented at 22 in FIG. 2. It can be seen that in areas of low energy the noise is very close to the signal and is therefore not necessarily masked. It can then become audible in these regions (from 2300 to 3500 Hz).
In the case of adaptive predictive speech coders, quantizing noise shaping techniques have been used to mask this noise and as far as possible render it inaudible. Because of the property of the human ear of masking simultaneous frequencies, it is possible to inject more quantizing noise in areas in which the signal has more energy. Noise shaping improves the spectral distribution of the quantizing noise by reducing the quantizing noise level in areas of low energy to redistribute it into areas of higher energy.
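The principle of redistributing quantizing noise can be illustrated with a generic error-feedback loop around a uniform quantizer. This is a minimal sketch and not the structure of any particular coder: the single coefficient 0.5, the step of 1/128 and the Gaussian test signal are hypothetical. Feeding past quantizing errors q(n-k) back to the quantizer input with coefficients c_k makes the noise in the output equal to q(n) + Σ c_k q(n-k), that is, the quantizing error filtered by 1 + Σ c_k z^-k.

```python
import random
from collections import deque

def noise_feedback_quantize(x, coeffs, step):
    """Uniform quantizer with error feedback.

    Returns (y, q): the quantized output and the raw quantizing error.
    The output noise y[n] - x[n] equals q[n] + sum_k coeffs[k-1] * q[n-k].
    """
    history = deque([0.0] * len(coeffs), maxlen=len(coeffs))  # q[n-1], q[n-2], ...
    y, q = [], []
    for sample in x:
        u = sample + sum(c * e for c, e in zip(coeffs, history))
        out = step * round(u / step)   # uniform quantization of the modified input
        err = out - u                  # raw quantizing error
        y.append(out)
        q.append(err)
        history.appendleft(err)        # oldest error is dropped automatically
    return y, q

rng = random.Random(0)
x = [rng.gauss(0.0, 0.3) for _ in range(4000)]   # hypothetical test signal
y, q = noise_feedback_quantize(x, [0.5], 1.0 / 128)
noise = [b - a for a, b in zip(x, y)]
lag1 = sum(noise[n] * noise[n - 1] for n in range(1, len(noise)))
print("lag-1 correlation of the output noise is positive:", lag1 > 0)
```

With a positive coefficient the filter 1 + 0.5 z^-1 has more gain at low frequencies than at high frequencies, so the noise is moved toward the low end of the band, where a speech signal typically has more energy.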
Such a technique is described for example in “Adaptive noise spectral shaping and entropy coding in predictive coding of speech” by J. Makhoul, M. Berouti in IEEE Trans. ASSP, Vol. 27-3, June 1979.
That document describes the use of linear filters taking the reconstructed signal into account. The quantizing noise shaping filter is derived from the linear predictive coding (LPC) synthesis filter. Thus the frame obtained at the output of this type of coder includes indices of linear prediction coefficients of the filters, a gain normalization factor index and the quantizing indices.
Moreover, in the above reference, the noise shaping filter is calculated from the synthesis filter reconstructed from the linear prediction coefficient indices. The noise shaping filter is therefore subject to the coding noise of the linear prediction coefficients. In addition, the transfer function of the shaping filter has coefficients only in the numerator, calculated by two cascaded linear predictions. Because each of the two cascaded linear predictions contributes its share of inaccuracy, noise shaping is effective, as the cited reference clearly indicates, only for a number of coefficients at most equal to 2.
The paper by Makhoul and Berouti shows that quantizing noise shaping is possible in adaptive predictive systems characterized by a synthesis model consisting of an inverse quantizing module and a short-term predictive filter. Synthesis filters are used in the coding structure to obtain the appropriate shaping.
This technique is therefore not suitable for non-predictive coders that have no synthesis filters, like the PCM coder (in particular G.711 coders). Quantization including shaping, as described in the paper by Makhoul and Berouti, is effected in the domain of the linear prediction (or excitation) residue, i.e. after the original signal is filtered by a predictive filter A(z). The coefficients of the filter A(z) must therefore be sent to the decoder to effect synthesis filtering 1/A(z) after inverse quantization. Moreover, the noise shaping is effected by a second-order reduced function B(z) deduced from the function A(z) sent.
The foregoing remark also applies to the paper by J. H. Chen, “Novel codec structures for noise feedback coding of speech”, Proc. of ICASSP, 2006, pp. I-681 to I-684, which builds on the paper by Makhoul and Berouti by incorporating a long-term predictor and quantizing noise shaping by a long-term shaping filter. Moreover, that paper uses vector quantization.