In the compression field, the coders use the properties of the signal such as its harmonic structure, used by long-term prediction filters, and its local stationarity, used by short-term prediction filters. Typically, the speech signal can be considered as a signal that is stationary, for example, over time slots of 10 to 20 ms. It is therefore possible to analyze this signal in blocks of samples called frames, after appropriate windowing. The short-term correlations can be modeled by linear filters varying in time whose coefficients are obtained using a linear-predictive analysis on frames of short duration (from 10 to 20 ms in the example cited above).
Linear predictive coding is one of the most commonly used digital coding techniques. It consists in performing an LPC analysis of the signal to be coded to determine an LPC filter, then in quantizing this filter on the one hand, and in modeling and coding the excitation signal on the other hand. This LPC analysis is performed by minimizing the prediction error on the signal to be modeled or a modified version of this signal. The autoregressive linear prediction model of order P consists in determining a signal sample at an instant n by a linear combination of the P past samples (principle of prediction). The short-term prediction filter, denoted A(z), models the spectral envelope of the signal:
$$A(z) = \sum_{l=0}^{P} a_l z^{-l}, \qquad a_0 = 1$$
The difference between the signal at the instant n, denoted S(n), and its predicted value $\tilde{S}(n)$ constitutes the prediction error:
$$e(n) = S(n) - \tilde{S}(n) = S(n) + \sum_{l=1}^{P} a_l S(n-l)$$
The prediction coefficients are calculated by minimizing the energy E of the prediction error given by:
$$E = \sum_{n} e(n)^2 = \sum_{n} \left( S(n) + \sum_{l=1}^{P} a_l S(n-l) \right)^2$$
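Setting the partial derivatives of E with respect to the prediction coefficients to zero yields a system of linear equations. The solution of this system is well known, in particular by the Levinson-Durbin algorithm or the Schur algorithm.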
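As an illustration, the Levinson-Durbin recursion can be sketched as follows. This is a minimal sketch, not the implementation of any particular coder: it assumes the autocorrelation values r[0..P] are already available, it follows the sign convention of the prediction error above (e(n) = S(n) + sum of a_l S(n-l)), and the function name `levinson_durbin` is a hypothetical one chosen for the example. Sign conventions for the reflection coefficients vary between texts.

```python
def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_P from autocorrelations r[0..P].

    Convention assumed here: e(n) = S(n) + sum_{l=1..P} a_l * S(n - l).
    Returns (a, reflection, error) where a[0] == 1.0.
    """
    if len(r) < order + 1:
        raise ValueError("need order + 1 autocorrelation values")
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")

    a = [1.0] + [0.0] * order      # a[0] = 1, higher coefficients filled in below
    error = r[0]                   # prediction error energy at order 0
    reflection = []

    for m in range(1, order + 1):
        if error <= 0.0:           # signal already perfectly predicted
            break
        # Correlation between the order-(m-1) residual and the lag-m sample.
        acc = r[m] + sum(a[l] * r[m - l] for l in range(1, m))
        k = -acc / error           # reflection (PARCOR) coefficient of order m
        reflection.append(k)

        previous = a[:]
        for l in range(1, m):
            a[l] = previous[l] + k * previous[m - l]
        a[m] = k
        error *= 1.0 - k * k       # error energy shrinks at each order

    return a, reflection, error
```

For instance, with the autocorrelation sequence [1.0, 0.5, 0.25] of a first-order process, a second-order analysis leaves the second coefficient at zero, since the first-order predictor already captures all the correlation.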
The coefficients $a_l$ of the filter must be transmitted to the receiver. However, these coefficients do not have good quantization properties, so transformations are preferably used. Among the most common are:
- the PARCOR (“PARtial CORrelation”) coefficients, also called reflection coefficients or partial correlation coefficients,
- the log area ratios (LAR) of the PARCOR coefficients,
- the line spectral pairs (LSP).
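To illustrate one of these transformations, the following sketch converts PARCOR coefficients to log area ratios and back. It assumes the definition LAR = log((1 + k) / (1 - k)); other texts use the inverse ratio, which only flips the sign. The function names are hypothetical.

```python
import math


def parcor_to_lar(parcor):
    """Map reflection coefficients k (|k| < 1) to log area ratios.

    Assumed definition: LAR = log((1 + k) / (1 - k)).
    """
    for k in parcor:
        if not -1.0 < k < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
    return [math.log((1.0 + k) / (1.0 - k)) for k in parcor]


def lar_to_parcor(lar):
    """Inverse mapping: k = tanh(LAR / 2)."""
    return [math.tanh(g / 2.0) for g in lar]
```

The mapping stretches the region near |k| = 1, where the filter response is most sensitive to small changes in k, which is why a uniform quantizer behaves better on the log area ratios than on the reflection coefficients themselves.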
The LSP coefficients are now the ones used most commonly to represent the LPC filter because they are suitable for vector quantization. There are other equivalent representations of the LSP coefficients:
- LSF (Line Spectral Frequency) coefficients,
- ISP (Immittance Spectral Pair) coefficients,
- or even ISF (Immittance Spectral Frequency) coefficients.
Linear prediction uses the local quasi-stationarity of the signal. However, this local stationarity hypothesis is not always borne out. In particular, if the updating of the LPC coefficients is not done often enough, the quality of the LPC analysis is degraded. Increasing the frequency with which the LPC parameters are calculated obviously improves the quality of the LPC analysis by keeping better track of the spectral variations of the signal. However, this situation leads to an increase in the number of filters to be transmitted and therefore an increase in bit rate.
Furthermore, calculating the LPC parameters too frequently also raises a complexity problem, because determining the LPC parameters is computationally costly. Normally, it entails:
- windowing the signal,
- calculating the autocorrelation function of the signal on (P+1) values (P being the prediction order),
- determining the coefficients $a_l$ from the autocorrelations, for example using the Levinson-Durbin algorithm,
- transforming them into a set of parameters having better quantization and interpolation properties,
- quantizing and interpolating these transformed parameters,
- and performing the reverse transformation.
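The first two steps of this chain can be sketched as follows. This is only an illustration: the symmetric Hamming window is an assumption (standardized coders typically specify their own, often asymmetric, analysis windows), and the function names are hypothetical. The remaining steps (solving for the coefficients, transformation, quantization, interpolation) are not shown.

```python
import math


def hamming_window(length):
    """Symmetric Hamming window (an assumed choice for this sketch)."""
    if length < 2:
        return [1.0] * length
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def windowed_autocorrelation(frame, order):
    """Window one frame and return its autocorrelation on (order + 1) lags."""
    window = hamming_window(len(frame))
    x = [s * w for s, w in zip(frame, window)]
    # r[k] = sum over n of x[n] * x[n - k], for k = 0..order
    return [
        sum(x[n] * x[n - k] for n in range(k, len(x)))
        for k in range(order + 1)
    ]
```

Each lag costs roughly one multiply-accumulate per sample of the frame, so the autocorrelation alone needs on the order of (P+1) times the frame length operations per analysis, which gives a sense of why repeating the analysis more often is costly.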
For example, in the 8 kbit/s coder standardized by ITU-T G.729, a 10th order LPC analysis is performed every 10 ms (in blocks of 80 samples) and the module for extracting the LPC parameters constitutes almost 15% of the complexity of the 8 kbit/s G.729 coder. Although only a single analysis is performed for each 10 ms block, the G.729 coder interpolates the transformed LPC parameters to obtain LPC parameters every 5 ms.
In the ITU-T G.723.1 standardized coder, four 10th order LPC analyses are performed for each 30 ms frame, or one LPC analysis every 7.5 ms (in blocks called subframes of 60 samples), which represents 10% of the complexity of the coder. Nevertheless, to reduce the bit rate, only the parameters of the last subframe are quantized. For the first three subframes, an interpolation of the quantized parameters transmitted is used.
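The principle of deriving subframe parameters from the transmitted ones can be sketched as a linear interpolation between the quantized vector of the previous frame and that of the current frame. The evenly spaced weights below are an assumption made for illustration and are not quoted from the standard; the function name is hypothetical.

```python
def interpolate_subframes(prev_quantized, curr_quantized, num_subframes=4):
    """Return one parameter vector per subframe.

    Subframe i uses weight w = (i + 1) / num_subframes on the current
    frame's quantized vector and (1 - w) on the previous frame's, so the
    last subframe reproduces the transmitted vector exactly.
    The evenly spaced weights are an illustrative assumption.
    """
    if len(prev_quantized) != len(curr_quantized):
        raise ValueError("vectors must have the same dimension")
    subframes = []
    for i in range(num_subframes):
        w = (i + 1) / num_subframes
        subframes.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_quantized, curr_quantized)]
        )
    return subframes
```

Only one vector per frame has to be transmitted in such a scheme, at the price of smoothing out any spectral change that occurs inside the frame.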
The complexity of the LPC analysis is critical when several codings need to be performed by one and the same processing unit such as a gateway responsible for managing numerous communications in parallel or a server distributing numerous multimedia contents. The complexity problem is further aggravated by the multiplicity of the compression formats of the signals circulating over the networks.
It will therefore be understood that a first problem arises relating to a bit rate/quality/complexity trade-off for the LPC analysis.
To offer mobility and continuity, modern and innovative multimedia communication services need to be able to operate in a wide variety of conditions. The dynamism of the multimedia communication sector and the multivendor nature of the networks, accesses and terminals have led to a proliferation of compression formats requiring, because of their presence in the communication chains, multiple codings either cascaded (code conversion) or in parallel (multiple-format coding or multimode coding).
Code conversion is necessary when, in a transmission chain, a compressed signal frame transmitted by a coder can no longer continue on its path in this format. The code conversion is used to convert this frame to another format compatible with the continuation of the transmission chain. The most basic solution (and the one most commonly used at the present time) is to place a decoder and a coder end to end. The compressed frame arrives in a first format. It is then decompressed. The decompressed signal is then recompressed in a second format accepted by the continuation of the communication chain. This cascade arrangement of a decoder and a coder is called a tandem. Such a solution is very costly in terms of complexity (mainly because of the recoding) and it degrades the quality because the second coding is done on a decoded signal which is a degraded version of the original signal. Moreover, a frame can encounter several tandems before arriving at its destination, bringing about a calculation cost and a loss of quality that are both significant. Furthermore, the delays introduced by each tandem operation are accumulated and can adversely affect the interactivity of the communications.
The complexity also poses a problem in the context of a multiple-format compression system where one and the same content is compressed in several formats. Such is typically the case with content servers that broadcast one and the same content in several formats suited to the access and network conditions and terminals of the various customers. This multiple-coding operation becomes extremely complex as the number of formats required increases, such that the resources of the system rapidly appear limited.
Another case of parallel multiple coding is multimode compression with a posteriori decision which is described as follows. On each signal segment to be coded, several compression modes are performed and the one that optimizes a given criterion or obtains the best bit rate/distortion trade-off is selected. Once again, the complexity of each of the compression modes limits their number and/or leads to the preselection of a very limited number of modes.
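A minimal sketch of such an a posteriori decision is given below. The "modes" are stand-ins: each is a uniform scalar quantizer with a given step size and a nominal bit cost, and the selection criterion is an assumed Lagrangian cost (distortion plus lambda times rate). None of these names or values comes from an actual coder.

```python
def quantize(segment, step):
    """Toy compression mode: uniform scalar quantization with the given step."""
    return [step * round(x / step) for x in segment]


def select_mode(segment, modes, lam):
    """Run every mode on the segment and keep the one with the lowest cost.

    modes: list of (name, step, bits) tuples (hypothetical mode descriptions).
    Cost assumed here: squared-error distortion + lam * bits.
    """
    best_name, best_cost = None, None
    for name, step, bits in modes:
        decoded = quantize(segment, step)   # every mode is actually executed
        distortion = sum((x - y) ** 2 for x, y in zip(segment, decoded))
        cost = distortion + lam * bits
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

Because every candidate mode is run in full before the decision is taken, the total cost grows with the number of modes, which is the limitation described above.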
Thus, a second problem arises relating to the multiplicity of possible compression formats.
A few attempts from the prior art to resolve these problems are explained below.
Currently, most of these multiple-coding operations take no account of the interactions between the formats on the one hand, and between the format and its content on the other hand. However, some recent so-called “intelligent” code conversion techniques no longer limit themselves to decoding then recoding, but also use the similarities between coding formats and thus make it possible to reduce the complexity and the algorithmic delay while limiting the degradation. Similarly, it has been proposed to exploit the similarities between coding formats to reduce the complexity of the multiple parallel coding operations. For one and the same coding format parameter, the differences between coders lie in the modeling, the method and/or the frequency of calculation or even the quantization. The optimization of the parallel multiple coding of two LPC modelings has received little study.
Typically, if a parameter is calculated and quantized in the same way by two coding formats respectively denoted A and B, the code conversion of the parameter is done at bit level by copying its bit field from the bitstream of the format A into the bitstream of the format B. If the parameter is calculated in the same way but quantized differently, it is normally essential to requantize it with the method used by the coding format B. Similarly, if the formats A and B do not calculate this parameter at the same frequency (for example, if their frame or subframe lengths are different), this parameter must be interpolated. It is possible to perform this step on the above-mentioned parameter only, without having to work back to the complete signal. The code conversion is then performed only at the parameter level. Moreover, the LSP coefficients are normally code-converted at this “parameter” level.
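The simplest of these cases, copying a bit field from one bitstream to the other, can be sketched as follows. Bitstreams are represented here as lists of 0/1 values for readability, and the field positions are hypothetical; real formats define their own frame layouts.

```python
def copy_bit_field(src_bits, src_offset, dst_bits, dst_offset, width):
    """Copy `width` bits from src_bits into a copy of dst_bits.

    Only valid when both formats compute and quantize the parameter
    identically, so that the index bits mean the same thing in both.
    Offsets and width are hypothetical layout values.
    """
    if src_offset < 0 or dst_offset < 0 or width < 0:
        raise ValueError("offsets and width must be non-negative")
    if src_offset + width > len(src_bits):
        raise ValueError("field exceeds source bitstream")
    if dst_offset + width > len(dst_bits):
        raise ValueError("field exceeds destination bitstream")
    result = list(dst_bits)
    result[dst_offset:dst_offset + width] = src_bits[src_offset:src_offset + width]
    return result
```

No decoding or requantization takes place in this case, so the parameter is transferred without any additional loss.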
In the methods of the prior art, to obtain the LPC parameters of a second coding format from the parameters of a first coding format, it is normal to interpolate the LPC parameters of consecutive frames (or subframes) of the first format corresponding to the current frame (or subframe) of the second format. For example, a first method involves calculating the coefficients modeling the LPC filter of the second format for a frame, by interpolating the coefficients of the LPC filters of the first format roughly corresponding to this frame:
$$p_B(m) = \alpha\, p_A(n-1) + \beta\, p_A(n)$$
where $p_B(m)$ is the coefficients vector of the second model for its frame m, $p_A(n)$ is the coefficients vector of the first model for its frame n, and α and β are interpolation factors. Normally, β is equal to (1−α).
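This fixed interpolation can be sketched in a few lines. The function name is hypothetical, and the vectors stand for any interpolable LPC representation (for example LSP coefficients).

```python
def interpolate_lpc(p_a_prev, p_a_curr, alpha, beta=None):
    """Compute p_B(m) = alpha * p_A(n-1) + beta * p_A(n).

    beta defaults to (1 - alpha), the usual choice mentioned in the text.
    """
    if len(p_a_prev) != len(p_a_curr):
        raise ValueError("vectors must have the same dimension")
    if beta is None:
        beta = 1.0 - alpha
    return [alpha * prev + beta * curr for prev, curr in zip(p_a_prev, p_a_curr)]
```

With α = 0.25, for example, the vectors [0.0, 2.0] and [4.0, 6.0] give [3.0, 5.0], that is, a result three quarters of the way toward the more recent frame.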
For example, in the case of the code conversion between the coders TIA-IS127 EVRC and 3GPP NB-AMR, as described in:
“A novel Transcoding Algorithm for AMR and EVRC speech codecs via direct parameter Transformation”, Seongho Seo et al., in Proc. ICASSP 2003, pp. 177-180, vol. II, the LSP coefficients at the frame m of the EVRC coder ($p_{\mathrm{EVRC}}(m)$) are calculated by linearly interpolating the quantized LSP coefficients of the frames m and (m−1) of the AMR coder ($p_{\mathrm{AMR}}(m)$ and $p_{\mathrm{AMR}}(m-1)$), the interpolation factor (α=0.84) being empirically chosen:
$$p_{\mathrm{EVRC}}(m) = 0.84\, p_{\mathrm{AMR}}(m) + 0.16\, p_{\mathrm{AMR}}(m-1)$$
Conversely, the LSP coefficients at the frame m of the AMR coder are calculated by linearly interpolating the quantized LSP coefficients of the frames m and (m−1) of the EVRC coder (with α=0.96):
$$p_{\mathrm{AMR}}(m) = 0.96\, p_{\mathrm{EVRC}}(m) + 0.04\, p_{\mathrm{EVRC}}(m-1)$$
Here it has been proposed to also optimize the determination of the interpolation factors by a statistical study to take account of the differences in the characteristics of the two LPC analyses (analysis type, length and positioning of the analysis window, extension of the bandwidth applied to the autocorrelation coefficients, and so on).
This simple case is often used when the two coding formats perform the LPC analysis at the same frequency. In the above example, the two coders perform one LPC analysis per 20 ms frame. When the two coding formats do not perform the LPC analysis at the same frequency, it is routine to consider larger blocks whose duration is a common multiple of the respective update periods of the LPC parameters of the two formats. The choice of the two frames of the first format used for the interpolation, and the interpolation factors, then depend on the rank of a frame of the second format in this group of frames.
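The size of such a block can be computed as the least common multiple of the two frame durations. The sketch below is illustrative only; it assumes durations expressed as whole milliseconds, and the function name is hypothetical.

```python
import math


def frame_grouping(duration_a_ms, duration_b_ms):
    """Return (block duration, frames of A per block, frames of B per block).

    The block duration is the least common multiple of the two frame
    durations, given as positive integers in milliseconds.
    """
    if duration_a_ms <= 0 or duration_b_ms <= 0:
        raise ValueError("durations must be positive")
    block = duration_a_ms * duration_b_ms // math.gcd(duration_a_ms, duration_b_ms)
    return block, block // duration_a_ms, block // duration_b_ms
```

For a 30 ms format and a 20 ms format, this gives a 60 ms block containing two frames of the first format and three frames of the second.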
Thus, in the case of the code conversion from the ITU-T G.723.1 coder (30 ms frame) to the EVRC coder (20 ms frame), two G.723.1 frames correspond to three EVRC frames. This code conversion is described in particular in:
“An efficient transcoding algorithm for G723.1 and EVRC speech coders”, Kyung Tae Kim et al., in Proc. IEEE VTS 2001, pp. 1561-1564.
The choices of the two G.723.1 frames used for the interpolation, and the interpolation factors, depend on the rank of an EVRC frame in this group of three frames:
$$p_{\mathrm{EVRC}}(3m) = 0.5417\, p_{\mathrm{G.723.1}}(2m-1) + 0.4583\, p_{\mathrm{G.723.1}}(2m+1)$$
$$p_{\mathrm{EVRC}}(3m+1) = 0.8750\, p_{\mathrm{G.723.1}}(2m) + 0.1250\, p_{\mathrm{G.723.1}}(2m+1)$$
$$p_{\mathrm{EVRC}}(3m+2) = 0.2083\, p_{\mathrm{G.723.1}}(2m) + 0.7917\, p_{\mathrm{G.723.1}}(2m+1)$$
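A rank-dependent interpolation of this kind can be sketched with a lookup table indexed by the rank of the target frame in its group. The weights and frame offsets below are transcribed as printed in the equations above; the data layout (a dictionary of source frames keyed by frame index) and the function name are assumptions made for the example.

```python
# rank -> ((weight, source frame offset relative to 2m), ...)
# Values transcribed as printed in the text above.
RANK_TABLE = {
    0: ((0.5417, -1), (0.4583, +1)),
    1: ((0.8750, 0), (0.1250, +1)),
    2: ((0.2083, 0), (0.7917, +1)),
}


def target_frame_params(source_frames, target_index, table=RANK_TABLE):
    """Interpolate the parameter vector of one target (20 ms) frame.

    source_frames: dict mapping source (30 ms) frame index -> vector.
    target_index:  index of the target frame; its rank in the group of
                   three selects the weights and the two source frames.
    """
    m, rank = divmod(target_index, 3)
    result = None
    for weight, offset in table[rank]:
        vector = source_frames[2 * m + offset]
        if result is None:
            result = [0.0] * len(vector)
        result = [acc + weight * value for acc, value in zip(result, vector)]
    return result
```

The table is fixed once and for all: the same weights are applied to every group of frames, whatever the signal content.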
Thus, in these LPC parameter code conversion techniques of the prior art, the set of interpolation factors is set according to the time position of the frame of the second format in its group of frames. Even the more complex code conversion methods, which involve more than two filters of the first format or even past filters of the second format, use a fixed set of interpolation factors.
This “fixed” interpolation leads to a poor estimate of the filter of the second format, in particular in non-stationary areas. To remedy this, the present invention proposes to use an adaptive (or dynamic) interpolation.