The general principle of embedded-codes ADPCM coding/decoding specified by recommendations ITU-T G.722, ITU-T G.726 or ITU-T G.727 is as described with reference to FIGS. 1 and 2.
FIG. 1 thus represents an embedded-codes coder of ADPCM type (e.g.: G.722 low band, G.727) operating between B and B+K bits per sample; note that the case of nonscalable ADPCM coding (e.g.: G.726, G.722 high band) corresponds to K=0.
It comprises:
a prediction module 110 which gives the prediction of the signal xPB(n) on the basis of the previous samples of the quantized error signal eQB(n′)=yIBB(n′)v(n′), n′=n−1, . . . , n−NZ, where v(n′) is the quantization scale factor, and on the basis of the reconstructed signal rB(n′), n′=n−1, . . . , n−NP, where n is the current instant;
a subtraction module 120 which subtracts the prediction xPB(n) from the input signal x(n) to obtain a prediction error signal denoted e(n);
a quantization module 130 QB+K for the error signal, which receives as input the error signal e(n) and gives quantization indices IB+K(n) consisting of B+K bits. The quantization module QB+K is of embedded-codes type, that is to say it comprises a core quantizer with B bits and quantizers with B+k bits, k=1, . . . , K, which are embedded in the core quantizer.
In the case of the ITU-T G.722 standard (coding of the low band), the decision levels and the reconstruction levels for the quantizers QB, QB+1, QB+2 for B=4 are given by tables IV and VI of the overview article describing the G.722 standard: X. Maitre, “7 kHz audio coding within 64 kbit/s”, IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988.
The quantization index IB+K(n) of B+K bits at the output of the quantization module QB+K is transmitted via the transmission channel 140 to the decoder as described with reference to FIG. 2.
The coder also comprises:
a module 150 for deleting the K low-order bits of the index IB+K(n) to give a low-bitrate index IB(n);
an inverse quantization module 121 (QB)−1 to give as output a quantized error signal eQB(n)=yIBB(n)v(n) on B bits;
a module 170 QAdapt for adaptation of the quantizers and inverse quantizers to give a level control parameter v(n), also called a scale factor, for the following instant;
an addition module 180 for adding the prediction xPB(n) to the quantized error signal to give the low-bitrate reconstructed signal rB(n);
a module 190 PAdapt for adaptation of the prediction module on the basis of the quantized error signal on B bits eQB(n) and of the signal eQB(n) filtered by 1+Pz(z).
It may be noted that in FIG. 1 the hatched part referenced 155 represents the low-bitrate local decoder, which contains the predictors 165 and 175 and the inverse quantizer 121. This local decoder thus makes it possible to adapt the inverse quantizer at 170 on the basis of the low-bitrate index IB(n) and to adapt the predictors 165 and 175 on the basis of the reconstructed low-bitrate data.
This part is also found identically in the embedded-codes ADPCM decoder as described with reference to FIG. 2.
In the absence of frame losses, the embedded-codes ADPCM decoder of FIG. 2 receives as input the indices I′B+k, where 0≦k≦K, arising from the transmission channel 140; these are a version of IB+k possibly disturbed by binary errors. The decoder carries out an inverse quantization with the inverse quantization module 210 (QB)−1 of bitrate B bits per sample to obtain the signal e′QB(n)=yI′BB(n) v′(n). The symbol “′” indicates a value decoded on the basis of the bits received, which may differ from the value used by the coder on account of transmission errors. The output signal r′B(n) for B bits is equal to the sum of the prediction x′PB(n) of the signal and of the output e′QB(n) of the B-bit inverse quantizer. This part 255 of the decoder is identical to the low-bitrate local decoder 155 of FIG. 1.
Using the bitrate indicator mode and the selector 220, the decoder can improve the reconstructed signal.
Indeed if mode indicates that B+1 bits have been transmitted, the output will be equal to the sum of the prediction x′PB(n) and of the output of the inverse quantizer 230 with B+1 bits y′IB+1B+1(n)v′(n).
If mode indicates that B+2 bits have been transmitted then the output will be equal to the sum of the prediction x′PB(n) and of the output of the inverse quantizer 240 with B+2 bits y′IB+2B+2(n)v′(n).
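The embedded-index mechanism of FIGS. 1 and 2 can be illustrated by the following minimal sketch. It assumes a uniform quantizer with a hypothetical step size, which is not the case in G.722 or G.727 (these use tabulated non-uniform levels); the function names and the parameter `step` are illustrative only. The sketch shows that deleting the K low-order bits of the B+K-bit index yields a valid B-bit index, and that the decoder can reconstruct with B, B+1 or B+2 bits from the same index.

```python
import math

def embedded_quantize(e, v, B=4, K=2, step=0.125):
    """Return a (B+K)-bit index for error e with scale factor v.

    Assumption: uniform mid-rise quantizer, finest cell width = step * v.
    """
    nbits = B + K
    half = 1 << (nbits - 1)
    idx = int(math.floor(e / (v * step))) + half
    return max(0, min((1 << nbits) - 1, idx))  # clamp to the index range

def drop_low_bits(index, K):
    """Role of module 150: delete the K low-order bits to get the core index."""
    return index >> K

def inverse_quantize(index_full, v, bits_used, B=4, K=2, step=0.125):
    """Reconstruct using only the `bits_used` high-order bits (B <= bits_used <= B+K)."""
    shift = B + K - bits_used
    idx = index_full >> shift           # bits actually available to the decoder
    cell = step * (1 << shift)          # cell width doubles for each dropped bit
    half = 1 << (bits_used - 1)
    return (idx - half + 0.5) * cell * v  # midpoint of the selected cell

index = embedded_quantize(0.3, v=1.0)
core_index = drop_low_bits(index, K=2)
reconstructions = {b: inverse_quantize(index, 1.0, b) for b in (4, 5, 6)}
```

With these assumed parameters, the reconstruction error is bounded by half the cell width at each bitrate, so the bound tightens as more improvement bits are decoded.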
The embedded-codes ADPCM coding of the ITU-T standard G.722 (hereinafter named G.722) codes wideband signals, which are defined with a minimum bandwidth of [50-7000 Hz] and sampled at 16 kHz. The G.722 coding is an ADPCM coding of each of the two signal sub-bands [0-4000 Hz] and [4000-8000 Hz] obtained by decomposing the signal with quadrature mirror filters. The low band is coded by an embedded-codes ADPCM coding on 6, 5 and 4 bits while the high band is coded by an ADPCM coder of 2 bits per sample. The total bitrate will be 64, 56 or 48 kbit/s depending on the number of bits used for decoding the low band.
This coding was first developed for use in ISDN (Integrated Services Digital Network). It has more recently been deployed in improved-quality telephony applications over IP networks.
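The three G.722 bitrates follow from simple arithmetic: each sub-band is sampled at 8 kHz after the quadrature mirror filter split, and the bits of the two sub-bands add up per sample pair. The short sketch below checks this; the constant and function names are illustrative.

```python
SUBBAND_RATE_HZ = 8000   # each sub-band is decimated by 2 from 16 kHz
HIGH_BAND_BITS = 2       # fixed ADPCM resolution of the high band

def g722_bitrate(low_band_bits):
    """Total bitrate in bit/s for a low band decoded with 4, 5 or 6 bits."""
    return SUBBAND_RATE_HZ * (low_band_bits + HIGH_BAND_BITS)

bitrates_kbit = [g722_bitrate(b) // 1000 for b in (6, 5, 4)]
```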
For a quantizer with a large number of levels, the spectrum of the quantization noise will be relatively flat. However, in the frequency zones where the signal has low energy, the noise may have a greater level than the signal and is therefore no longer necessarily masked. It may then become audible in these regions.
Shaping of the coding noise is therefore necessary. In a coder like G.722, shaping of the coding noise adapted to embedded-codes coding is moreover desirable.
Generally, the aim of coding noise shaping is to obtain quantization noise whose spectral envelope follows the short-term masking threshold; this principle is often simplified so that the spectrum of the noise approximately follows the spectrum of the signal, ensuring a homogeneous signal-to-noise ratio so that the noise remains inaudible even in the lower energy zones of the signal.
A noise shaping technique for a coding of PCM type (for “Pulse Code Modulation”) with embedded codes is described in ITU-T recommendation G.711.1 “Wideband embedded extension for G.711 pulse code modulation” or “G.711.1: A wideband extension to ITU-T G.711”. Y. Hiwasaki, S. Sasaki, H. Ohmuro, T. Mori, J. Seong, M. S. Lee, B. Kövesi, S. Ragot, J.-L. Garcia, C. Marro, L. M., J. Xu, V. Malenovsky, J. Lapierre, R. Lefebvre. EUSIPCO, Lausanne, 2008.
This recommendation describes a coding with shaping of the coding noise by noise feedback such as illustrated in FIG. 3. A perceptual filter F(z) for shaping the coding noise (block 305) is calculated (block 303) on the basis of the decoded signals s′L0(n) with a core bitrate of 64 kbit/s (L0 for Layer 0), arising from an inverse core quantizer (block 301). A core bitrate local decoder (block 301) therefore makes it possible to calculate the noise shaping filter F(z). Thus, at the decoder, it is also possible to calculate this same noise shaping filter on the basis of the core bitrate decoded signals.
A quantizer delivering core bits (block 308) and a quantizer delivering improvement bits (block 309) are used at the G.711.1 coder.
The G.711.1 decoder receiving the core binary stream (L0) and the improvement bits (L1), calculates the filter F(z) for shaping the coding noise in the same manner as at the coder on the basis of the core bitrate (64 kbit/s) decoded signal and applies this filter to the output signal of the inverse quantizer for the improvement bits, the shaped high-bitrate signal being obtained by adding the filtered signal to the decoded core signal.
Noise shaping thus improves the perceptual quality of the core bitrate signal. It offers limited improvement in quality for the improvement bits. Indeed, the coding noise shaping is not performed for the coding of the improvement bits, the input of the quantizer being the same for the core quantization as for the improved quantization.
The decoder must then delete a resulting spurious component through adapted filtering, when the improvement bits are decoded in addition to the core bits.
Noise shaping by noise feedback as implemented in recommendation G.711.1 is generalizable to PCM coders other than G.711 and to coding of ADPCM type.
An exemplary known noise feedback structure in PCM/ADPCM coding is presented in FIG. 4.
Hereinafter the following notation will be used:
s(n): input signal to be coded
s′(n): input signal of the coder (modified signal to be coded)
s̃(n): decoded signal provided by the local decoder
q(n)=s′(n)−s̃(n): quantization noise of the coder
FIG. 4 illustrates an exemplary implementation of the shaping of the PCM/ADPCM coding noise. This coder comprises a PCM/ADPCM coding block 502 and a local decoder 503. The coding noise qG(n)=s(n)−s̃(n) is filtered (block 504) and reinjected (block 505) onto the signal s(n). The prediction coefficients are estimated (block 500) on the basis of the signal s(n), whereas in G.711.1 (FIG. 3) they are estimated on the basis of the past decoded signal at the core bitrate. In a known manner, the filter A(z/γ) is typically obtained (block 500) on the basis of a linear prediction filter A(z) modeling the short-term correlations of the signal s(n), by attenuating the coefficients of the linear prediction filter A(z). The coding noise will be shaped by the filter
\[ H(z) = P_1(z) = \frac{1}{A(z/\gamma)} \]
with γ=0.92 as a typical value.
Indeed, for the scheme of FIG. 4, starting from S′(z)=S̃(z)+Q(z), with q(n)=s′(n)−s̃(n) the PCM/ADPCM quantization noise, it may be shown that in the z-transform domain:
\[ S(z) - \tilde{S}(z) = \frac{Q(z)}{A(z/\gamma)} \]
Stated otherwise, the “global” coding noise qG(n)=s(n)−s̃(n) corresponds to the PCM/ADPCM quantization noise q(n) filtered (shaped) by 1/A(z/γ).
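The shaping relation above can be checked numerically with the following sketch. It makes several assumptions that are not stated in the text: the coder is a toy uniform PCM quantizer, the feedback filter is taken as A(z/γ)−1 applied to the past global noise (the choice that reproduces the stated result), and the coefficients of A(z) are arbitrary illustrative values. All function names are hypothetical.

```python
import math

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma) from A(z) = 1 + sum_k a[k] z^-k (a[0] == 1)."""
    return [c * gamma ** k for k, c in enumerate(a)]

def noise_feedback_pcm(s, a_gamma, step):
    """Toy noise-feedback loop: returns decoded signal and quantization noise q(n)."""
    order = len(a_gamma) - 1
    past_qg = [0.0] * order               # past global noise, most recent first
    decoded, quant_noise = [], []
    for x in s:
        # Reinjected noise d(n): past global noise filtered by A(z/gamma) - 1.
        d = sum(a_gamma[k + 1] * past_qg[k] for k in range(order))
        s_mod = x + d                      # modified signal s'(n)
        s_dec = step * round(s_mod / step) # toy PCM coder plus local decoder
        quant_noise.append(s_mod - s_dec)  # q(n) = s'(n) - decoded
        decoded.append(s_dec)
        past_qg = [x - s_dec] + past_qg[:-1]  # qG(n) = s(n) - decoded
    return decoded, quant_noise

def all_pole_filter(x, a_gamma):
    """Filter x by 1/A(z/gamma): y(n) = x(n) - sum_{k>=1} a_gamma[k] y(n-k)."""
    order = len(a_gamma) - 1
    past_y = [0.0] * order
    out = []
    for sample in x:
        y = sample - sum(a_gamma[k + 1] * past_y[k] for k in range(order))
        out.append(y)
        past_y = [y] + past_y[:-1]
    return out

a_gamma = bandwidth_expand([1.0, -1.6, 0.8], 0.92)   # illustrative A(z)
s = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
decoded, q = noise_feedback_pcm(s, a_gamma, step=1 / 64)
global_noise = [x - y for x, y in zip(s, decoded)]
shaped = all_pole_filter(q, a_gamma)
max_diff = max(abs(u - v) for u, v in zip(global_noise, shaped))
```

Under these assumptions, `max_diff` is at the level of floating-point rounding, which confirms that the global noise is the quantization noise filtered by 1/A(z/γ).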
Noise feedback applied to the ADPCM coding is an effective technique for improving the quality of PCM/ADPCM coders, by masking the coding noise, particularly for “natural” audio signals such as speech or music. The scheme of FIG. 4 makes it possible to shape the coding noise according to a masking filter 1/A(z/γ) so as to obtain a signal-to-noise ratio that is more homogeneous across frequencies.
However, for certain less “natural” signals than speech or music, noise feedback can, as is sometimes the case with looped systems, become unstable and lead to degradation or saturation of the decoded signal. Here, saturation has to be taken in the sense that the amplitude of the decoded signal exceeds the maximum values representable at finite precision (example: 16-bit signed integers) and thus leads to clipping of the signal.
Examples of problematic signals in respect of noise feedback are signals exhibiting fast transitions between stationary sequences of large spectral dynamic range, such as for example a series of pure sinusoids of different frequencies separated by short segments of silence.
In particular, “tonal” signals (pure sinusoids) are considered to be signals at risk that may give rise to a problem of instability or of saturation in coding schemes with noise feedback.
For this type of signal, the estimated masking (or shaping) filter 1/A(z/γ) varies rapidly in the transitions between sinusoids and in the attacks, and the quantization noise which is reinjected is often very high.
The problem of stability and of saturation which is observed with noise feedback is particularly critical in ADPCM coding. Indeed, the ADPCM coding such as implemented in G.722 relies on a progressive adaptation of the coding parameters (quantization interval, prediction coefficients). This adaptation is done sample by sample according to a principle similar to the LMS (for “Least Mean Square”) algorithm in adaptive filtering, thereby implying that the adaptation does not immediately follow the nonstationary characteristics of the signal to be coded. It is known that for certain signals the adaptation in the ADPCM coding alone (without noise feedback) may drop out (“mistracking”), in the sense that the adaptation diverges before re-converging after a certain time.
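The lag of a sample-by-sample adaptation can be illustrated with a generic normalized LMS predictor. This sketch is not the adaptation used in G.722; the predictor order, the step size `mu` and the test frequencies are assumptions chosen for illustration. It shows that a predictor converged on one pure sinusoid produces a large prediction error immediately after a switch to another sinusoid.

```python
import math

def nlms_prediction_errors(x, order=2, mu=0.5, eps=1e-9):
    """Sample-by-sample NLMS linear prediction; returns the prediction errors."""
    w = [0.0] * order          # predictor coefficients
    hist = [0.0] * order       # past samples, most recent first
    errors = []
    for sample in x:
        pred = sum(wk * hk for wk, hk in zip(w, hist))
        err = sample - pred
        norm = eps + sum(h * h for h in hist)
        # Gradient step normalized by the energy of the past samples.
        w = [wk + mu * err * hk / norm for wk, hk in zip(w, hist)]
        hist = [sample] + hist[:-1]
        errors.append(err)
    return errors

def two_tone(n_per_tone, f1, f2, fs):
    """Two pure sinusoids of different frequencies played back to back."""
    first = [math.sin(2 * math.pi * f1 * n / fs) for n in range(n_per_tone)]
    second = [math.sin(2 * math.pi * f2 * n / fs) for n in range(n_per_tone)]
    return first + second

errors = nlms_prediction_errors(two_tone(2000, 500.0, 2000.0, 8000.0))
energy_before = sum(e * e for e in errors[1980:2000])  # converged on tone 1
energy_after = sum(e * e for e in errors[2000:2020])   # just after the switch
```

In this toy setting the error energy just after the transition is many orders of magnitude above its converged value, which corresponds to the drop in prediction gain discussed in this passage.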
For problematic signals, the noise feedback may disturb the adaptation of the ADPCM coding since, returning to FIG. 4, the signal to be coded s(n) is modified by the reinjected noise d(n) to form the signal s′(n).
When the reinjected noise d(n) is of similar level to the signal s(n) (this often being the case in the fast transitions between stationary sequences of large spectral dynamic range), the signal s′(n) at the input of the ADPCM coder may become very “unstable” depending on whether the signals s(n) and d(n) are in phase or out of phase. If moreover the ADPCM coding has an adaptation which drops out (“mistracking”), the noise feedback will amplify the duration and the magnitude of the dropout.
To show the origin of this phenomenon, it is possible to calculate the Perceptual Signal-to-Noise Ratio RSBP (perceptual since it includes the effect of the noise feedback aimed at masking the coding noise):
\[ \mathrm{RSB}_P = \frac{\sum_{n=0}^{N-1} s^2(n)}{\sum_{n=0}^{N-1} \left[ s(n) - \tilde{s}(n) \right]^2} \]
It may be shown that:
\[ \mathrm{RSB}_P = G_{\mathrm{ADPCM}} \left[ \frac{\mathrm{RSB}_Q - 1}{E_D} + 1 \right] \]
where GADPCM is the prediction gain of the ADPCM coder, RSBQ the Signal-to-Noise Ratio of the ADPCM quantizer (around 24 dB for a 5-bit Laplace quantizer) and ED the energy of the impulse response fD(n) of the masking filter.
According to this formula, it is seen that the lower the gain GADPCM, and/or the higher the energy ED, the lower is RSBP. These two conditions (low GADPCM and high ED) both hold in situations of transitions between two sequences of pure sinusoids since the gain GADPCM becomes very low (the ADPCM coding adapted to the first pure sinusoid takes a certain time before readapting to the second pure sinusoid) and ED is high since the sinusoids give very resonant reinjection filters. In this case the ADPCM coder will be unstable or close to instability.
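The dependence of RSBP on GADPCM and ED can be evaluated with the sketch below. It assumes that all quantities in the formula are linear ratios (not dB) and that the masking filter is 1/A(z/γ), whose impulse-response energy is computed over a truncated length; the two example filters are illustrative, and the function names are hypothetical.

```python
import math

def impulse_response_energy(a_gamma, length=4000):
    """Energy E_D of the impulse response of 1/A(z/gamma), truncated to `length`."""
    order = len(a_gamma) - 1
    hist = [0.0] * order
    energy = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = x - sum(a_gamma[k + 1] * hist[k] for k in range(order))
        hist = [y] + hist[:-1]
        energy += y * y
    return energy

def perceptual_snr(g_adpcm, rsb_q, e_d):
    """RSB_P = G_ADPCM * ((RSB_Q - 1) / E_D + 1), all values linear."""
    return g_adpcm * ((rsb_q - 1.0) / e_d + 1.0)

gamma = 0.92
w = math.pi / 4
# Mildly correlated signal: A(z) = 1 - 0.5 z^-1.
mild = [1.0, -0.5 * gamma]
# Pure sinusoid at frequency w: A(z) = 1 - 2cos(w) z^-1 + z^-2 (very resonant).
resonant = [1.0, -2.0 * math.cos(w) * gamma, gamma ** 2]

e_d_mild = impulse_response_energy(mild)
e_d_resonant = impulse_response_energy(resonant)
rsb_q = 10 ** (24 / 10)                       # about 24 dB, as quoted above
rsb_p_mild = perceptual_snr(1.0, rsb_q, e_d_mild)
rsb_p_resonant = perceptual_snr(1.0, rsb_q, e_d_resonant)
```

For equal prediction gain, the resonant filter gives a larger ED and therefore a lower RSBP; a drop in GADPCM during a transition lowers it further by the same factor.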
Such instability and saturation phenomena are not acceptable since they can generate audible artifacts (e.g.: amplitude spikes localized in time), or indeed “acoustic shocks” in the case of complete saturation of the temporal level of the signal.
There therefore exists a need to forestall and control instability and saturation phenomena in coding structures with feedback, in particular for problematic signals such as series of pure sinusoids at various frequencies.