There are various techniques for digitizing and compressing audio-frequency signals such as speech and music. The methods most widely used are: “waveform coding” methods, such as PCM and ADPCM coding; “parametric analysis-synthesis coding” methods, such as code excited linear prediction (CELP) coding; and “sub-band or transform perceptual coding” methods.
These classic techniques for coding audio-frequency signals are described in W. B. Kleijn and K. K. Paliwal, Editors, “Speech Coding and Synthesis”, Elsevier, 1995.
As indicated above, the invention is essentially concerned with transform coding techniques.
ITU-T Recommendation G.722.1, “Coding at 24 kbit/s and 32 kbit/s for hands-free operation in systems with low frame loss”, September 1999, describes a transform coder for compressing speech or music audio signals in a pass-band from 50 hertz (Hz) to 7000 Hz, referred to as the wide band, at a sampling frequency of 16 kilohertz (kHz) and at a bit rate of 24 kilobits per second (kbit/s) or 32 kbit/s. FIG. 1 shows the associated coding scheme, as set out in the aforementioned Recommendation.
As this figure shows, the G.722.1 coder is based on the modulated lapped transform (MLT). The frame length is 20 milliseconds (ms) and the frame contains N=320 samples.
The MLT, a modulated transform with Malvar overlap, is a variant of the modified discrete cosine transform (MDCT).
FIG. 2 shows in outline the principle of MDCT.
The MDCT transform X(m) of a signal x(n) of length L=2N comprising samples of the current frame and the future frame is defined as follows, where m=0, . . . , N−1:
$$X(m) = \sum_{n=0}^{L-1} \sqrt{\frac{2}{N}}\, \sin\left(\frac{\pi}{L}(n+0.5)\right) \cos\left(\frac{\pi}{N}\left(n+\frac{N}{2}+0.5\right)(m+0.5)\right) x(n)$$
In the above formula, the sine term corresponds to the windowing shown in FIG. 2. The calculation of X(m) therefore corresponds to the projection of x(n) onto a local cosine basis with sinusoidal windowing. Fast MDCT calculation algorithms exist (see for example the paper by P. Duhamel, Y. Mahieux, J. P. Petit, “A fast algorithm for the implementation of filter banks based on time domain aliasing cancellation”, ICASSP, vol. 3, pp. 2209-2212, 1991).
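As an illustration only, the MDCT definition above can be evaluated directly from the sum. The sketch below is a slow O(N²) reference, not one of the fast algorithms cited, and it assumes the normalization factor in front of the sine window is √(2/N); the function name `mdct` is hypothetical.

```python
import math

def mdct(x):
    """Directly evaluate X(m), m = 0..N-1, for a block x of L = 2N samples."""
    L = len(x)
    if L == 0 or L % 2:
        raise ValueError("block length must be a positive even number (L = 2N)")
    N = L // 2
    scale = math.sqrt(2.0 / N)  # assumed normalization factor sqrt(2/N)
    # Apply the sine window once per input sample.
    windowed = [math.sin(math.pi / L * (n + 0.5)) * x[n] for n in range(L)]
    X = []
    for m in range(N):
        acc = 0.0
        for n in range(L):
            # Projection onto the cosine term of the definition.
            acc += windowed[n] * math.cos(math.pi / N * (n + N / 2 + 0.5) * (m + 0.5))
        X.append(scale * acc)
    return X
```

For a G.722.1-sized block, `mdct` would be called with 640 samples (the current and future frames) and would return 320 coefficients.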
To calculate the spectral envelope of the transform, the values X(0), . . . , X(N−1) derived by MDCT are grouped into 16 sub-bands of 20 coefficients. Only the first 14 sub-bands (14×20=280 coefficients) are quantized and coded, corresponding to the frequency band 0-7000 Hz, the 7000-8000 Hz band (40 coefficients) being ignored.
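The correspondence between sub-bands and frequencies can be checked with a short calculation. This sketch assumes that the N=320 coefficients are spread uniformly between 0 Hz and half the 16 kHz sampling frequency, so that each coefficient covers 25 Hz; the helper name `sub_band_edges_hz` is hypothetical.

```python
SAMPLE_RATE_HZ = 16000  # sampling frequency given in the text
N = 320                 # MDCT coefficients per frame
BAND_SIZE = 20          # coefficients per sub-band

def sub_band_edges_hz(j):
    """Return the (low, high) frequency edges in Hz of sub-band j (0..15)."""
    hz_per_coeff = (SAMPLE_RATE_HZ / 2) / N  # 8000 Hz spread over 320 coefficients
    return (j * BAND_SIZE * hz_per_coeff, (j + 1) * BAND_SIZE * hz_per_coeff)
```

Under this assumption each sub-band is 500 Hz wide, the last coded sub-band (j=13) ends at 7000 Hz, and the two ignored sub-bands (j=14 and j=15) cover 7000-8000 Hz.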
The value of the spectral envelope for the jth sub-band is defined in the logarithmic domain as follows, where j=0, . . . , 13, the term ε serving to avoid log₂(0):
$$\mathrm{log\_rms}(j) = \frac{1}{2}\log_2\left(\frac{1}{20}\sum_{n=0}^{19} X^2(20j+n) + \varepsilon\right)$$
This envelope therefore corresponds to the root mean square value per sub-band.
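The envelope definition can be sketched in a few lines. The value chosen for ε below is a placeholder (the text only states that ε avoids log2(0)), and the function name `log_rms` simply mirrors the notation used above.

```python
import math

BAND_SIZE = 20        # coefficients per sub-band
NUM_CODED_BANDS = 14  # sub-bands covering 0-7000 Hz
EPSILON = 1e-10       # hypothetical value; the text does not specify it

def log_rms(X):
    """Return [log_rms(0), ..., log_rms(13)] from the MDCT coefficients X."""
    if len(X) < BAND_SIZE * NUM_CODED_BANDS:
        raise ValueError("need at least 280 coefficients")
    envelope = []
    for j in range(NUM_CODED_BANDS):
        band = X[BAND_SIZE * j : BAND_SIZE * (j + 1)]
        mean_square = sum(c * c for c in band) / BAND_SIZE
        # Half the base-2 log of the mean square is the base-2 log of the rms value.
        envelope.append(0.5 * math.log2(mean_square + EPSILON))
    return envelope
```

With constant coefficients of amplitude 4, for example, every sub-band has an rms value of 4 and log_rms(j) is very close to 2.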
The spectral envelope is then quantized in the following manner. The set of values
log_rms = {log_rms(0), log_rms(1), . . . , log_rms(13)}
is first rounded to:
rms_index = {rms_index(0), rms_index(1), . . . , rms_index(13)}
where each index rms_index(j) is the integer closest to log_rms(j)/0.5 for j=0, . . . , 13, that is, log_rms(j) is quantized with a step of 0.5 in the log₂ domain.
The quantization step is therefore 20×log10(2^0.5)=3.0103 . . . dB. The values obtained are bounded: 3 ≤ rms_index(0) ≤ 33 (dynamic range 31×3.01=93.31 dB) for j=0; and −6 ≤ rms_index(j) ≤ 33 (dynamic range 40×3.01=120.4 dB) for j=1, . . . , 13.
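The rounding and bounding steps can be sketched as follows. The sketch assumes that an index counts steps of 0.5 in the log2 domain, which is what the 3.01 dB step implies, and that ties are rounded upward; neither the tie rule nor the function name `quantize_envelope` comes from the Recommendation.

```python
import math

STEP_LOG2 = 0.5                         # assumed step in the log2 domain
STEP_DB = 20 * math.log10(2 ** 0.5)     # about 3.0103 dB per index step

def quantize_envelope(log_rms_values):
    """Round each log_rms(j) to an index and clamp it to the bounds in the text."""
    indices = []
    for j, value in enumerate(log_rms_values):
        idx = math.floor(value / STEP_LOG2 + 0.5)  # nearest integer, ties upward
        lower = 3 if j == 0 else -6                # first sub-band has a narrower range
        indices.append(max(lower, min(33, idx)))
    return indices
```

For instance, `quantize_envelope([2.0, 2.0])` gives `[4, 4]`, while very small values are clamped to 3 for j=0 and to −6 for the other sub-bands.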
The rms_index values for the last 13 sub-bands are then transformed into differential indices by calculating the difference between the rms_index value of one sub-band and that of the preceding sub-band: diff_rms_index(j) = rms_index(j) − rms_index(j−1) for j=1, . . . , 13
These differential indices are also bounded: −12 ≤ diff_rms_index(j) ≤ 11 for j=1, . . . , 13
Below, the expression “range of quantization indices” refers to the range of indices that can be represented by binary coding. In the G.722.1 coder, the range of differential indices is limited to the range [−12, 11]. Thus the range of the G.722.1 coder is said to be “sufficient” for coding the differences between rms_index(j) and rms_index(j−1) if −12 ≤ rms_index(j) − rms_index(j−1) ≤ 11
Otherwise, the range of the G.722.1 coder is said to be “insufficient”. Thus spectral envelope coding reaches saturation as soon as the rms difference between two adjacent sub-bands exceeds 12×3.01=36.12 decibels (dB).
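The “sufficient” and “insufficient” cases can be illustrated with a small check on a list of quantized indices. The function names and the two example envelopes are hypothetical and serve only to show the test on the bounds −12 and 11 stated above.

```python
DIFF_MIN, DIFF_MAX = -12, 11  # bounds on the differential indices

def differential_indices(rms_index):
    """Return diff_rms_index(j) = rms_index(j) - rms_index(j-1) for j >= 1."""
    return [rms_index[j] - rms_index[j - 1] for j in range(1, len(rms_index))]

def range_is_sufficient(rms_index):
    """True if every differential index fits in [DIFF_MIN, DIFF_MAX]."""
    return all(DIFF_MIN <= d <= DIFF_MAX for d in differential_indices(rms_index))

# Hypothetical envelopes: a smooth one and one with a single dominant sub-band.
smooth = [10, 12, 11, 9]
peaky = [3, -6, 33, -6]
```

Here `range_is_sufficient(smooth)` is true, whereas `peaky` produces differences of 39 and −39 index steps, far outside the representable range, so the coding saturates.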
The quantization index rms_index(0) is transmitted in the G.722.1 coder on 5 bits. The differential quantization indices diff_rms_index(j) (j=1, . . . , 13) are coded by Huffman coding, each variable having its own Huffman table. This coding is therefore variable-length entropy coding, the principle of which is to assign a short code to the most probable differential index values and a longer code to the least probable ones. This type of coding is very efficient in terms of mean bit rate, bearing in mind that the total number of bits used to code the spectral envelope in G.722.1 is around 50 bits on average. However, as becomes clear below, the worst case is not controlled.
The FIG. 3 table gives for each sub-band the length of the shortest code (Min), and thus that of the most probable value (best case), and that of the longest code (Max), and thus that of the least probable value (worst case). Note that in this table the first sub-band (j=0) has a fixed length of 5 bits, in contrast to the subsequent sub-bands.
With these code length values, it is seen that in the best case encoding the spectral envelope requires 39 bits (1.95 kbit/s) and that the theoretical worst case is 190 bits (9.5 kbit/s).
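The bit rates quoted above follow from the 20 ms frame length: a number of bits per frame divided by the frame duration in milliseconds gives kbit/s directly. The sketch below reproduces that arithmetic and, as an added illustration, compares the worst case with the 480 bits available per frame at 24 kbit/s; the function names are hypothetical.

```python
FRAME_MS = 20  # frame length in milliseconds

def bits_to_kbit_per_s(bits_per_frame, frame_ms=FRAME_MS):
    """Bits per frame divided by frame length in ms equals kbit/s."""
    return bits_per_frame / frame_ms

def frame_budget_bits(bit_rate_kbit_per_s, frame_ms=FRAME_MS):
    """Total number of bits available in one frame at the given bit rate."""
    return bit_rate_kbit_per_s * frame_ms

best_case = bits_to_kbit_per_s(39)            # about 1.95 kbit/s
worst_case = bits_to_kbit_per_s(190)          # 9.5 kbit/s
worst_share = 190 / frame_budget_bits(24)     # fraction of a 24 kbit/s frame
```

At 24 kbit/s the theoretical worst case would thus consume roughly 40% of the frame on the spectral envelope alone.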
In the G.722.1 coder, the bits remaining after coding the quantization indices of the spectral envelope are then distributed to code the MDCT coefficients normalized by the quantized envelope. Assignment of bits in the sub-bands is effected by a categorization process that is not related to the present invention and is not described in detail here. The remainder of the G.722.1 process is not described in detail for the same reason.
Coding the MDCT spectral envelope in the G.722.1 coder has a number of drawbacks.
As indicated above, variable length coding can lead to using a very large number of bits for coding the spectral envelope in the worst case. The risk of saturation is also pointed out above: for some signals of high spectral disparity, for example isolated sinusoids, differential coding does not work because the range of ±36.12 dB cannot represent the full dynamic range of the differences between the rms values.