Of the coding techniques used, that designated by LPC analysis (Linear Predictive Coding) consists of carrying out a linear prediction of the audio-frequency signal to be encoded, the coding being carried out in the time domain by means of a linear prediction filter applied to the successive blocks of this signal.
Of the aforementioned techniques, that known as CELP coding (Code Excited Linear Prediction) is the most widespread and provides some of the best performance. Other techniques, such as MP-LPC (Multi Pulse Linear Predictive Coding) or VSELP (Vector Sum Excited Linear Prediction), are relatively similar to CELP coding.
The aforementioned coding techniques are known as "analysis by synthesis" techniques. For audio-frequency signals in the telephone frequency band, they have in particular enabled the transmission bit rate to be reduced from 64 kb/s (PCM coding) to 16 kb/s with the CELP coding technique, and even to 8 kb/s for encoders using the most recent developments of this technique, without any perceptible reduction in the quality of the speech reconstituted after transmission and decoding.
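The bit rates quoted above can be checked with simple arithmetic. The sketch below assumes a sampling rate of 8 kHz and 8 bits per sample for the 64 kb/s reference; these two parameters are not stated in the text and are assumptions chosen to be consistent with that figure.

```python
SAMPLE_RATE_HZ = 8000      # assumed sampling rate for the telephone band
REFERENCE_BITS = 8         # assumed word length of the 64 kb/s reference coding

def bits_per_sample(bit_rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of bits available per input sample at a given bit rate."""
    return bit_rate_bps / sample_rate_hz

reference_rate = SAMPLE_RATE_HZ * REFERENCE_BITS   # bits per second
celp_16k = bits_per_sample(16_000)                 # bits per sample at 16 kb/s
celp_8k = bits_per_sample(8_000)                   # bits per sample at 8 kb/s
compression_at_8k = reference_rate / 8_000         # ratio relative to the reference
```

Under these assumptions, a coder at 8 kb/s has on average a single bit per sample to describe both the filter and the excitation.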
A particularly important area of application for these coding techniques is mobile telephony. In this area, the necessarily limited frequency bandwidth granted to each mobile-telephony operator and the extremely rapid increase in the number of subscribers make a corresponding reduction of the coding bit rate necessary, while user demands in terms of speech quality continue to grow. Other areas of application of these coding techniques include, for example, the storage on memory media of the digital data representing these signals, high-quality telephony for video or audio conference applications, multimedia, and digital transmission via satellite.
The linear prediction filters used in the aforementioned techniques are obtained with the help of an analysis module called "LPC analysis" operating on successive blocks of the digital signal. Depending on the order of analysis, that is, on the number of filter coefficients, these filters are capable of modeling more or less faithfully the contours of the frequency spectrum of the signal to be coded. In the case of a speech signal, these contours are called formants.
However, for the good coding quality required by most current applications, the filter thus defined is not sufficient to model the signal perfectly. It is therefore essential to code the residue of the linear prediction. One such way of handling the linear prediction residue is used in particular by the LD-CELP (Low Delay CELP) coding technique, previously mentioned in the description. In this case, the residual signal is modeled by a waveform taken from a stochastic codebook and multiplied by a gain value. The MP-LPC coding technique, for example, models this residue with the help of variable-position pulses scaled by respective gain values, whereas the VSELP coding technique carries out this modeling by means of a linear combination of pulse vectors taken from appropriate codebooks.
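The modeling of the residue by a codebook waveform multiplied by a gain can be sketched as follows. This is a deliberately simplified illustration: it matches the residue directly, whereas an actual analysis-by-synthesis encoder compares signals after the synthesis filter, usually with perceptual weighting. The function name `search_codebook`, the codebook size and the random codebook contents are all hypothetical.

```python
import random

def search_codebook(residual, codebook):
    """Return (index, gain) minimising ||residual - gain * codebook[index]||^2."""
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    target_energy = sum(x * x for x in residual)
    for index, vector in enumerate(codebook):
        energy = sum(c * c for c in vector)
        if energy == 0.0:
            continue  # an all-zero waveform cannot model anything
        correlation = sum(r * c for r, c in zip(residual, vector))
        gain = correlation / energy                      # optimal gain for this waveform
        error = target_energy - correlation * correlation / energy
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain

# Hypothetical stochastic codebook: 16 Gaussian waveforms of 8 samples each.
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

# A residue built from entry 5 scaled by 2.5 should be recovered by the search.
residual = [2.5 * c for c in codebook[5]]
index, gain = search_codebook(residual, codebook)
```

Only the index and a quantized gain would need to be transmitted, which is what makes this representation economical in bit rate.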
An explanatory recap of the operating method of LPC analysis and especially "backward" LPC analysis and "forward" LPC analysis will be given below.
The general envelope of the frequency spectrum is modeled by means of a short-term synthesis filter, constituting the LPC filter, the coefficients of which are obtained by means of a linear prediction of the speech signal to be coded. This LPC filter, an autoregressive filter, has a transfer function of the form given by equation (1): ##EQU1##
where p designates the number of coefficients a.sub.i of the filter, which is also the order of the linear prediction applied, and z designates the variable of the z-transform.
One method of evaluating the coefficients a.sub.i consists of minimizing the energy of the prediction error signal of the speech signal over the analysis length of the latter.
The analysis length for a digital speech signal formed of successive samples is, in practical terms, a number N of these samples, constituting a coding frame. The energy of the prediction error signal thus satisfies equation (2): ##EQU2##
where s(n) designates the sample of rank n in the frame of N samples.
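As a reminder, a transfer function and an error energy of the kind described by equations (1) and (2) can be written as follows under one common sign convention. The convention is an assumption here (the opposite sign for the coefficients is equally widespread), and a_i stands for the coefficients written a.sub.i in the text.

```latex
% Autoregressive synthesis filter of order p (assumed sign convention)
H(z) = \frac{1}{A(z)} = \frac{1}{1 + \sum_{i=1}^{p} a_i \, z^{-i}}

% Energy of the prediction error over an analysis length of N samples
E = \sum_{n=0}^{N-1} \Bigl[ s(n) + \sum_{i=1}^{p} a_i \, s(n-i) \Bigr]^{2}
```

With this convention the prediction error is the output of the inverse filter A(z) applied to s(n), and minimizing E with respect to the a_i yields a set of p linear equations.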
In a block-by-block coding process, the coding frame can advantageously be divided into several adjacent subframes or LPC blocks. The analysis length N then exceeds the length of each block, so that a certain number of past samples and, if applicable, future samples can be taken into account, the latter at the cost of a corresponding coding delay.
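An LPC analysis over a window longer than the block can be sketched as below. The sketch uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard way of minimizing the prediction error energy and is not necessarily the one used by any particular encoder. The sign convention e(n) = s(n) + sum of a_i s(n-i), the block length of 40, the window length of 200 and the absence of a tapering window are all assumptions.

```python
def analysis_window(signal, block_end, length):
    """Last `length` samples before index `block_end` (current block plus past samples)."""
    start = max(0, block_end - length)
    return signal[start:block_end]

def autocorrelation(window, max_lag):
    """Autocorrelation values r[0..max_lag] of the analysis window."""
    n = len(window)
    return [sum(window[k] * window[k - lag] for k in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a_1..a_order; returns (coefficients, residual energy)."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate window, keep the coefficients found so far
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / error                       # reflection coefficient
        previous = a[:]
        for i in range(1, m):
            a[i] = previous[i] + k * previous[m - i]
        a[m] = k
        error *= 1.0 - k * k
    return a[1:], error

BLOCK_LENGTH = 40   # hypothetical LPC block length
N = 200             # hypothetical analysis length, longer than one block
ORDER = 2

# Test signal: a decaying exponential, exactly predictable with one coefficient.
signal = [0.9 ** n for n in range(240)]
window = analysis_window(signal, block_end=240, length=N)
r = autocorrelation(window, ORDER)
coefficients, residual_energy = levinson_durbin(r, ORDER)
```

For this test signal the first coefficient comes out close to -0.9 and the second close to zero, as expected for a first-order process under the assumed sign convention.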
The analysis is called "forward" LPC when the LPC analysis process is carried out on the block of the current frame of the speech signal to be coded, with the coding taking place at encoder level "in real time", that is, during the block of the current frame, the only processing delay being that introduced by the calculation of the filter coefficients. This analysis involves transmitting the calculated values of the filter coefficients to the decoder.
"Backward" LPC analysis, used in the LD-CELP encoder at 16 kb/s, is the subject of ITU-T Recommendation G.728. This analysis technique consists of carrying out the LPC analysis not on the block of the current frame of the speech signal to be coded, but on the synthesis signal. More precisely, this LPC analysis is performed on the synthesis signal of the block preceding the current block, as this signal is available simultaneously at the encoder and the decoder. This simultaneous operation in the encoder and decoder makes it unnecessary to transmit the LPC filter coefficients obtained in the encoder to the decoder. For this reason, "backward" LPC analysis frees up transmission bit rate, and the bit rate thus freed can be used, for example, to enrich the excitation codebooks in the case of CELP coding. "Backward" LPC analysis furthermore allows an increase in the order of analysis; the number of LPC filter coefficients may be as high as 50 in the case of an LD-CELP encoder, compared to 10 coefficients for most encoders using "forward" LPC analysis.
Thus, correct operation of "backward" LPC analysis requires the following conditions:
good quality synthesis signal, very close to the speech signal to be coded, which implies a sufficiently high coding bit rate, higher than 13 kb/s given the quality of current CELP encoders;
reduced frame and block length, owing to the delay of one block between the analyzed signal and the signal to be coded. The length of the frame and of the block should therefore be short in comparison to the mean stationarity time of the speech signal to be coded;
reliability of the transmission and preservation of the integrity of the data transmitted between the encoder and the decoder, with few transmission errors introduced. As soon as the synthesis signals differ significantly from the speech signal to be coded, the encoder and decoder cease to calculate the same filter and large divergences may occur, with no possibility of returning to a noticeable similarity between the filters calculated in the encoder and in the decoder.

"forward" LPC analysis for the coding of the transitions and the non-stationary areas;
"backward" LPC analysis, to a greater extent, for the coding of the stationary areas.

a first criterion based on the prediction gains of the filters;
a second criterion based on a distance parameter between the "forward" LPC filters calculated successively.

for certain signals, the prediction gain values of the "forward" and "backward" LPC filters may oscillate above and below the first threshold value. This phenomenon leads to sudden and frequent changes from "backward" LPC filter to "forward" LPC filter or vice versa. The filtering discontinuity thus introduced is a source of considerable degradation of the synthesis signal and is not, most of the time, linked to real spectral modifications of the speech or audio-frequency signal to be coded;
the optimal value of the first threshold which should be established varies considerably according to whether the signal to be coded is stationary, more so when the coding bit rate is low.
For a coding delay corresponding to an LPC frame of 10 to 30 ms, or when the transmission bit rate falls, there is a clear divergence between the coding mode of music signals and that of speech signals; "forward" LPC analysis is mainly used.
The LPC filter which gives the best subjective quality and which therefore best models the spectrum of the signal to be coded is not always that which has the best prediction gain. Certain switchings from one mode of LPC analysis to another, linked to an instantaneous decision, are therefore unnecessary.

determining the degree of stationarity of the digital audio-frequency signal according to a stationarity parameter whose value is between a maximum stationarity value and a minimum stationarity value;
establishing, from the stationarity parameter, an analysis choice value by means of a decision function;
applying the analysis choice value to the LPC filtering in order to code the digital audio-frequency signal by means of "forward" LPC filtering on the non-stationary areas of the digital audio-frequency signal and by means of "backward" LPC filtering on the stationary areas of the synthesis signal.
Due to the respective advantages and disadvantages of the aforementioned "backward" and "forward" types of LPC analysis, a technique consisting of selectively combining "backward" and "forward" LPC analysis was proposed in the article titled "Dual Rate Low Delay CELP Coding (8 kbits/s/16 kbits/s) using a Mixed Backward/Forward Adaptive LPC Prediction", by S. PROUST, C. LAMBLIN and D. MASSALOUX, Proc. IEEE Workshop Speech Co. Telecomm., September 1995, pp. 37-38.
The conditions mentioned above for the correct functioning of "backward" LPC analysis show that this type of analysis alone reaches its limits when operating at transmission bit rates appreciably below 16 kb/s. Besides the reduction in the quality of the synthesis signal, which reduces the performance of the LPC filter, it is very often necessary, in order to reduce the transmission bit rate, to operate with a greater LPC frame length, of the order of 10 to 30 ms. Under these conditions, the degradation occurs especially during transitions of the frequency spectrum and, more generally, in the less stationary areas, whereas for generally very stationary signals, such as music signals, "backward" LPC analysis holds a considerable advantage over "forward" LPC analysis.
The association of the two aforementioned types of LPC analysis aims to reduce these disadvantages and increase the advantages inherent in each one:
Furthermore, the insertion of LPC frames coded by "forward" LPC analysis among LPC frames coded by "backward" LPC analysis allows the encoder and decoder to re-converge towards the same synthesis signal after a transmission error, and therefore offers far greater error protection than coding by "backward" LPC analysis alone.
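The effect of a transmission error on the two analysis modes can be illustrated with a toy example. The sketch below uses a first-order predictor for brevity, with the assumed convention e(n) = s(n) + a1 s(n-1); the signals, the corrupted sample and the function name are hypothetical, and only the filters are compared, not the full re-convergence of the synthesis signals.

```python
def first_order_coefficient(samples):
    """Autocorrelation estimate of a1 for the predictor e(n) = s(n) + a1 * s(n - 1)."""
    r0 = sum(x * x for x in samples)
    r1 = sum(samples[n] * samples[n - 1] for n in range(1, len(samples)))
    return -r1 / r0 if r0 > 0.0 else 0.0

# "Backward" mode: each side analyses its own copy of the past synthesis signal.
encoder_synthesis = [0.8 ** n for n in range(40)]
decoder_synthesis = list(encoder_synthesis)
decoder_synthesis[10] += 0.5   # hypothetical effect of a transmission error

encoder_backward = first_order_coefficient(encoder_synthesis)
decoder_backward = first_order_coefficient(decoder_synthesis)
filters_diverged = encoder_backward != decoder_backward

# "Forward" mode: the encoder analyses the input and transmits the coefficient,
# so the decoder uses the received value instead of its own (corrupted) history.
current_input = [0.6 ** n for n in range(40)]
transmitted = first_order_coefficient(current_input)
encoder_forward = transmitted
decoder_forward = transmitted
filters_realigned = encoder_forward == decoder_forward
```

In "backward" mode the two filters differ as soon as the decoder history is corrupted, whereas a "forward" frame gives both sides the same filter again, at the cost of transmitting it.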
In general, the above-mentioned mixed "forward"-"backward" LPC analysis consists of carrying out two LPC analyses: a "forward" LPC analysis of the speech or audio-frequency signal to be coded, and a "backward" LPC analysis of the synthesis signal.
Two filters are calculated for each LPC block, designated the "forward" LPC filter and the "backward" LPC filter, respectively. A procedure for choosing the filter applied to the LPC block, depending on whether the signal is stationary, is then applied. This procedure relies on two different criteria:
A threshold value is established for each of these two criteria.
First Criterion:
The "backward" LPC filter is chosen if the distance between the prediction gains of the "backward" and "forward" LPC filters is greater than a first threshold value.
Second Criterion:
For a current analysis in "backward" LPC analysis mode, switching from "backward" LPC analysis mode to "forward" LPC analysis mode is prohibited if the distance calculated on the parameter vectors representing two consecutive "forward" LPC filters is lower than a second threshold value. Too small a distance characterizes a more or less stationary area, in which it is appropriate to avoid changing the LPC analysis mode. The calculated distance is a Euclidean distance between the spectral lines of the speech or audio-frequency signal to be coded.
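The two criteria above can be sketched as a small decision function. Several points are assumptions: the "distance" between prediction gains is taken as the signed difference (backward minus forward) in dB, the threshold values and parameter vectors in the example are arbitrary, and the function and argument names are hypothetical.

```python
import math

def euclidean_distance(params_a, params_b):
    """Euclidean distance between two filter parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(params_a, params_b)))

def choose_mode(current_mode, gain_backward_db, gain_forward_db,
                forward_params, previous_forward_params,
                gain_threshold_db, distance_threshold):
    """Return "backward" or "forward" for the current LPC block."""
    # First criterion: prefer "backward" when its gain advantage exceeds the threshold.
    if gain_backward_db - gain_forward_db > gain_threshold_db:
        proposed = "backward"
    else:
        proposed = "forward"
    # Second criterion: while in "backward" mode, refuse to switch to "forward"
    # if two consecutive "forward" filters are close (more or less stationary area).
    if current_mode == "backward" and proposed == "forward":
        distance = euclidean_distance(forward_params, previous_forward_params)
        if distance < distance_threshold:
            return "backward"
    return proposed

# Arbitrary example values.
previous = [0.30, 0.55, 0.80]
close = [0.31, 0.55, 0.80]    # nearly identical consecutive "forward" filters
far = [0.10, 0.40, 0.95]      # clearly different consecutive "forward" filters

kept = choose_mode("backward", 10.0, 10.5, close, previous, 1.0, 0.1)
switched = choose_mode("backward", 10.0, 10.5, far, previous, 1.0, 0.1)
entered = choose_mode("forward", 12.0, 10.0, far, previous, 1.0, 0.1)
```

Because the first criterion is an instantaneous comparison against a fixed threshold, gain values hovering around that threshold make the proposed mode flip from block to block; the second criterion only blocks the switch in one direction.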
A more detailed description of the aforementioned mixed LPC analysis method can be found in the article published by S. PROUST, C. LAMBLIN and D. MASSALOUX, mentioned above.
In-depth studies on the above-mentioned mixed analysis operating method have shown the following important disadvantages:
Since music signals are quite stationary, "backward" LPC analysis is used even for long LPC frames. In the case of speech signals, however, the highly stationary areas have a very short duration and the time spent in "backward" LPC analysis mode is therefore brief, which leads to unwanted filter transitions that reduce the quality of the coding. The encoder can then no longer correct the phenomena generated by the discontinuity introduced by the switching of the filters.