The present invention concerns the field of digital coding of audio signals. It relates more particularly to a decoding method used to reconstitute an audio signal coded using a method employing a “backward LPC” synthesis filter.
Predictive block coding systems analyse successive frames of samples of the audio signal (generally speech or music) to be coded to extract a number of parameters for each frame. Those parameters are quantised to form a bit stream sent over a transmission channel.
Depending on the quality of the channel and the type of transport, the signal transmitted can be subject to interference causing errors in the bit stream received by the decoder. These errors in the bit stream can be isolated. However, they very frequently occur in bursts, especially in mobile radio channels with a high level of interference and in packet mode transmission networks. In this case, an entire packet of bits corresponding to one or more signal frames is erroneous or is not received.
The transmission system employed can frequently detect erroneous or missing frames at the level of the decoder. So-called “missing frame recovery” procedures are then used. These procedures enable the decoder to extrapolate the missing signal samples from samples recovered in frames preceding and possibly following the areas in which frames are missing.
The present invention aims to improve techniques for recovering missing frames in a manner that strongly limits subjective degradation of the signal perceived at the decoder in the presence of missing frames. It is of more particular benefit in the case of predictive coders using a technique generally known as “backward LPC analysis” continuously or intermittently. The abbreviation “LPC” signifies “linear predictive coding” and “backward” indicates that the analysis is performed on signals preceding the current frame. This technique is particularly sensitive to transmission errors in general and to missing frames in particular.
The most widely used linear prediction coding systems are CELP (Code-Excited Linear Predictive) coders. Backward LPC analysis in a CELP coder was used for the first time in the LD-CELP coder adopted by the ITU-T (see ITU-T Recommendation G.728). This coder can reduce the bit rate from 64 kbit/s to 16 kbit/s without degrading the perceived subjective quality.
Backward LPC analysis consists in performing the LPC analysis on the synthesised signal instead of on the current frame of the original audio signal. In reality, the analysis is performed on samples of the synthesised signal from frames preceding the current frame because that signal is available both at the coder (by virtue of local decoding that is generally useful in analysis-by-synthesis coders) and at the remote decoder. Because the analysis is performed at the coder and at the decoder, the LPC coefficients obtained do not have to be transmitted.
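By way of illustration only, backward LPC analysis can be sketched as follows. The sketch assumes the autocorrelation method with a Levinson-Durbin recursion and no analysis window, neither of which is specified in the text above, and adopts the convention A(z) = 1 + a1·z⁻¹ + … + aM·z⁻ᴹ. The function names are hypothetical.

```python
def autocorrelation(signal, order):
    """Return the autocorrelation values r[0..order] of the given samples."""
    n = len(signal)
    return [sum(signal[t] * signal[t - lag] for t in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; return [1, a1, ..., a_order] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. silence): stop the recursion early
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a


def backward_lpc(past_synth, order):
    """Estimate A(z) for the current frame from past synthesised samples only."""
    return levinson_durbin(autocorrelation(past_synth, order), order)
```

Because the only input is the past synthesised signal, a coder and a decoder holding identical past samples obtain identical coefficients, which is why the coefficients need not be transmitted.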
Compared to the more conventional “forward” LPC analysis, in which the linear prediction is applied to the signal at the input of the coder, backward LPC analysis frees the bit rate that would otherwise carry the LPC coefficients, which can be used to enrich the excitation dictionaries in the case of CELP, for example. Also, and without increasing the bit rate, it allows a significantly higher order of analysis, the LPC synthesis filter typically having 50 coefficients for the LD-CELP coder as compared to 10 coefficients for most coders using forward LPC analysis.
Because of the higher order of the LPC filter, backward LPC analysis provides better modelling of musical signals, the spectrum of which is significantly richer than that of speech signals. Another reason why this technique is well suited to coding music signals is that music signals generally have a more stationary spectrum than speech signals, which improves the performance of backward LPC analysis. On the other hand, correct functioning of backward LPC analysis requires:
(i) A good quality synthesised signal, which must be very close to the original signal. This imposes a relatively high coding bit rate. Given the quality of current CELP coders, 13 kbit/s would seem to be the lower limit.
(ii) A short frame or a sufficiently stationary signal. There is a delay of one frame between the analysed signal and the signal to be coded. The frame length must therefore be short compared to the average time for which the signal is stationary.
(iii) Few transmission errors between the coder and the decoder. As soon as the synthesised signals are different, the coder and the decoder no longer calculate the same filter. Large divergences can then arise and be amplified, even in the absence of any new interference.
The sensitivity of backward LPC analysis coders/decoders to transmission errors is due mainly to the following recursive phenomenon: the difference between the synthesised signal generated at the coder (local decoder) and the synthesised signal reconstructed at the decoder by a missing frame recovery device causes a difference between the backward LPC filter calculated at the decoder for the next frame and that calculated at the coder, because these filters are calculated on the basis of the different signals. Those filters are used in turn to generate the synthesised signals of the next frame, which will therefore be different at the coder and at the decoder. The phenomenon can therefore propagate, increase in magnitude and cause the coder and decoder to diverge greatly and irreversibly. As backward LPC filters are generally of a high order (30 to 50 coefficients), they make a large contribution to the spectrum of the synthesised signal (high prediction gains).
Many coding algorithms use missing frame recovery techniques. The decoder is informed of a missing frame by one means or another (in mobile radio systems, for example, by receiving frame loss information from the channel decoder which detects transmission errors and can correct some of them). The objective of missing frame recovery devices is to extrapolate the samples of the missing frame from one or more of the most recent preceding frames which are deemed to be valid. Some systems extrapolate these samples using waveform substitution techniques which take samples directly from past decoded signals (see D. J. Goodman et al.: “Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications”, IEEE Trans. on ASSP, Vol. ASSP-34, No. 6, December 1986). In the case of predictive coders, of the CELP type, for example, the samples of missing frames are replaced using the synthesis model used to synthesise the valid frames. The missing frame recovery procedure must then supply the parameters needed for the synthesis which are not available for the missing frames (see, for example, ITU-T Recommendations G.723.1 and G.729). Some parameters manipulated or coded by predictive coders exhibit high correlation between frames. This applies in particular to LPC parameters and to long-term prediction parameters (LTP delay and associated gain) for voiced sounds. Because of this correlation, it is more advantageous to use the parameters of the last valid frame again to synthesise the missing frame rather than to use erroneous or random parameters.
For the CELP coding algorithm, the parameters of the missing frame are conventionally obtained in the following manner:
the LPC filter is obtained from the LPC parameters of the last valid frame, either by merely copying the parameters or introducing some damping;
voiced/non-voiced detection determines the harmonic content of the signal at the level of the missing frame (cf. ITU-T Recommendation G.723.1);
in the non-voiced situation, an excitation signal is generated in a partly random manner, for example by drawing a code word at random and using the past excitation gain slightly damped (cf. ITU-T Recommendation G.729), or random selection in the past excitation (cf. ITU-T Recommendation G.728);
in the case of a voiced signal, the LTP delay is generally that calculated in the preceding frame, possibly with slight “jitter” to prevent an excessively prolonged resonant sound, and the LTP gain is made equal to 1 or very close to 1. The excitation signal is generally limited to the long-term prediction based on the past excitation.
In the case of a coding system using forward LPC analysis, the parameters of the LPC filter are extrapolated in a simple manner from parameters of the preceding frame: the LPC filter used for the first missing frame is generally the filter of the preceding frame, possibly damped (i.e. with the contours of the spectrum slightly flattened and the prediction gain reduced). This damping can be obtained by applying a spectral expansion coefficient to the coefficients of the filter or, if those coefficients are represented by LSP (line spectrum pairs), by imposing a minimum separation of the line spectrum pairs (cf. ITU-T Recommendation G.723.1).
The spectral expansion technique is proposed in the case of the coder of ITU-T Recommendation G.728, which uses backward LPC analysis: for the first missing frame, a set of LPC parameters is first calculated on the basis of the past (valid) synthesised signal. An expansion factor of 0.97 is applied to this filter, and this factor is iteratively multiplied by 0.97 for each new missing frame. Note that this technique is employed only if the frame is missing. On the first following frame that is not missing, the LPC parameters used by the decoder are those calculated normally, i.e. on the basis of the synthesised signal.
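The spectral expansion described above can be sketched as follows. The sketch assumes that expansion by a factor γ means replacing A(z) by A(z/γ), i.e. scaling the kth coefficient by γᵏ, and that the factor for the mth consecutive missing frame is 0.97ᵐ. The function names are hypothetical, and the sketch is not taken from the text of Recommendation G.728.

```python
def spectral_expansion(a, gamma):
    """Replace A(z) by A(z/gamma): coefficient a_k is scaled by gamma**k."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def damped_filter_for_missing_frame(last_valid_a, missing_count, base=0.97):
    """Filter for the missing_count-th consecutive missing frame (1-based).

    last_valid_a is the filter calculated on the past valid synthesised
    signal; the expansion factor is base**missing_count (an assumed reading
    of the iterative multiplication described in the text).
    """
    return spectral_expansion(last_valid_a, base ** missing_count)
```

With γ below 1 the poles of 1/A(z) move towards the origin, which flattens the spectral peaks and reduces the prediction gain.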
In the case of forward LPC analysis, there is no error memory phenomenon where the LPC filters are concerned, except on quantising the LPC filters used in a prediction (in which case mechanisms are provided for re-synchronising a predictor at the end of a particular number of valid frames, using leakage factors in the prediction, or an MA type prediction).
In the case of backward analysis, the error is propagated by way of the erroneous synthesised signal which is used at the decoder to generate the LPC filters of valid frames following the missing section. Improving the synthesised signal produced for the missing frame (extrapolation of the excitation signal and the gains) is therefore one way to guarantee that the subsequent LPC filters (calculated on the basis of the preceding synthesised signal) will be closer to those calculated at the coder.
The conditions (i) through (iii) mentioned above show that the limitations of pure backward analysis quickly become apparent for bit rates significantly less than 16 kbit/s. Apart from the reduced quality of the synthesised signal, which degrades the performance of the LPC filter, it is often necessary to accept a greater frame length (from 10 to 30 ms) in order to reduce the bit rate. Note that degradation then occurs primarily at spectrum transitions and more generally in areas which are not particularly stationary. In stationary areas, and for signals that are very stationary overall, such as music, backward LPC analysis has a very clear advantage over forward LPC analysis.
To retain the advantages of backward analysis, in particular good performance in coding musical signals, combined with reducing the bit rate, hybrid “forward/backward” LPC analysis coding systems have been developed (see S. Proust et al.: “Dual Rate Low Delay CELP Coding (8 kbits/s 16 kbits/s) using a Mixed Backward/Forward Adaptive LPC Prediction”, Proc. of the IEEE Workshop on Speech Coding for Telecommunications, September 1995, pages 37-38; and French Patent Application No. 97 04684, corresponding to co-pending U.S. patent application Ser. No. 09/202,753).
Combining both types of LPC analysis obtains the benefit of the advantages of both techniques: forward LPC analysis is used to code transitions and non-stationary areas and backward LPC analysis, of a higher order, is used to code stationary areas.
Introducing forward coded frames into the backward coded frames also enables the coder and the decoder to converge in the event of transmission errors, and therefore offers much greater robustness to such errors than pure backward coding. However, by far the greatest proportion of stationary signals are coded in the backward mode, for which the problem of transmission errors remains critical.
These hybrid forward/backward systems are intended for multimedia applications on networks with limited or shared resources, for example, or for enhanced quality mobile radio communications. In this type of application, the loss of packets of bits is highly probable, which represents an a priori penalty on techniques sensitive to missing frames, such as backward LPC analysis. By strongly reducing the effect of missing frames in systems using backward LPC analysis or hybrid forward/backward LPC analysis, the present invention is particularly suited to this type of application.
There are also other types of audio coding system using both forward LPC analysis and backward LPC analysis. The synthesis filter can in particular be a combination (convolution of the impulse responses) of a forward LPC filter and a backward LPC filter (see EP-A-0 782 128). The coefficients of the forward LPC filter are then calculated by the coder and transmitted in quantised form. The coefficients of the backward LPC filter are determined conjointly at the coder and at the decoder, using a backward LPC analysis process performed as explained above after submitting the synthesised signal to a filter that is the inverse of the forward LPC filter.
The aim of the present invention is to improve the subjective quality of the speech signal produced by the decoder, in predictive block coding systems using backward LPC analysis or hybrid forward/backward LPC analysis, when one or more frames are missing because of poor quality of the transmission channel or because a packet is lost or not received in a packet transmission system.
The invention therefore proposes, in the case of a system continuously using backward LPC analysis, a method of decoding a bit stream representative of an audio signal coded by successive frames, the bit stream being received with a flag indicating any missing frames,
wherein, for each frame, an excitation signal is formed from excitation parameters which are recovered in the bit stream if the frame is valid and estimated some other way if the frame is missing, and the excitation signal is filtered by means of a synthesis filter to obtain a decoded audio signal,
wherein a linear prediction analysis is performed on the basis of the decoded audio signal obtained up to the preceding frame to estimate at least in part a synthesis filter relating to the current frame, the successive synthesis filters used to filter the excitation signal as long as there is no missing frame conforming to the estimated synthesis filters,
and wherein, if a frame n0 is missing, at least one synthesis filter used to filter the excitation signal relative to a subsequent frame n0+i is determined by a weighted combination of the synthesis filter estimated in relation to frame n0+i and at least one synthesis filter that has been used since frame n0.
For a number of frames after one or more missing frames, the backward LPC filters estimated by the decoder on the basis of the past synthesised signal are not those it actually uses to reconstruct the synthesised signal. To synthesise the latter, the decoder uses an LPC filter depending on the backward filter as estimated by this method, and also filters used to synthesise one or more preceding frames, since the last filter calculated on the basis of a valid synthesised signal. This is obtained by means of the weighted combination applied to the LPC filters following the missing frame, which performs a smoothing operation and forces a stationary spectrum, to some degree. This combination can vary with the distance to the last valid frame transmitted. The effect of smoothing the trajectory of the LPC filters used for synthesis after a missing frame is to limit strongly phenomena of divergence and thereby improve significantly the subjective quality of the decoded signal.
The sensitivity of backward LPC analysis to transmission errors is mainly due to the phenomenon of divergence previously explained. The main source of degradation is the progressive divergence of the filters calculated at the remote decoder and the filters calculated at the local decoder, which divergence can cause catastrophic distortion in the synthesised signal. It is therefore important to minimise the difference (in terms of spectral distance) between the two calculated filters and to have the difference tend towards zero as the number of error-free frames following the missing frame(s) increases (re-convergence property of the coding system). Backward filters, which are generally of a high order, have a capital influence on the spectrum of the synthesised signal. The convergence of the filters, which the invention encourages, assures the convergence of the synthesised signals. This improves the subjective quality of the synthesised signal in the presence of missing frames.
If frame n0+1 following a missing frame n0 is also missing, the synthesis filter used to filter the excitation signal relating to frame n0+1 is preferably determined from the synthesis filter used to filter the excitation signal relating to frame n0. These two filters can be identical. The second could equally be determined by applying a spectral expansion coefficient, as previously explained.
In a preferred embodiment of the invention, weighting coefficients used in said weighted combination depend on the number i of frames between frame n0+i and the last missing frame n0 so that the synthesis filter used progressively approaches the estimated synthesis filter.
In particular, each synthesis filter used to filter the excitation signal relating to a frame n is represented by K parameters Pk(n) (1 ≤ k ≤ K), and the parameters Pk(n0+i) of the synthesis filter used to filter the excitation signal relating to a frame n0+i, following i−1 valid frames (i ≥ 1) preceded by a missing frame n0, are calculated from the equation:
$$P_k(n_0+i) = [1-\alpha(i)]\cdot\hat{P}_k(n_0+i) + \alpha(i)\cdot P_k(n_0) \qquad (1)$$
where $\hat{P}_k(n_0+i)$ is the kth parameter of the synthesis filter estimated in relation to frame n0+i and α(i) is a positive or zero weighting coefficient decreasing with i from a value α(1) = αmax at most equal to 1.
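The weighted combination of the equation above can be sketched as follows, assuming that the filter parameters are held as plain lists of K numbers. The function and argument names are hypothetical.

```python
def smoothed_filter(estimated, used_at_n0, alpha):
    """Weighted combination of two parameter sets of equal length K.

    estimated  : parameters of the filter estimated for frame n0+i
    used_at_n0 : parameters of the filter used for the missing frame n0
    alpha      : weighting coefficient alpha(i), between 0 and 1
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(estimated) != len(used_at_n0):
        raise ValueError("both filters must have the same number of parameters")
    return [(1.0 - alpha) * est + alpha * ref
            for est, ref in zip(estimated, used_at_n0)]
```

With alpha equal to 1 the decoder keeps the filter used for frame n0, and with alpha equal to 0 it uses the estimated filter unchanged.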
The decrease in the coefficient α(i) provides, in the first valid frames following a missing frame, a synthesis filter which is relatively close to that used for frame n0, which has generally been determined under good conditions, and enables the memory of the filter used for frame n0 to be progressively lost so as to move towards the filter estimated for frame n0+i.
The parameters Pk(n) can be the coefficients of the synthesis filter, i.e. its impulse response. The parameters Pk(n) can equally be other representations of those coefficients, such as those conventionally used in linear prediction coders: reflection coefficients, LAR (log-area-ratio), PARCOR (partial correlation), LSP (line spectrum pairs), etc.
The coefficient α(i) for i greater than 1 can be calculated from the equation:
$$\alpha(i) = \max\{0,\ \alpha(i-1) - \beta\} \qquad (2)$$
where β is a coefficient in the range from 0 to 1.
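The recursion for the weighting coefficient can be illustrated by the following sketch, which lists the coefficient for the first few valid frames after a missing frame. The function name and the numerical values in the usage note are illustrative assumptions.

```python
def alpha_schedule(alpha_max, beta, count):
    """Return [alpha(1), ..., alpha(count)] with alpha(1) = alpha_max and
    alpha(i) = max(0, alpha(i-1) - beta) for i greater than 1."""
    alphas = []
    alpha = alpha_max
    for _ in range(count):
        alphas.append(alpha)
        alpha = max(0.0, alpha - beta)
    return alphas
```

For example, alpha_schedule(1.0, 0.25, 6) yields 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, so that from the fifth frame onwards the decoder uses the estimated filter without modification.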
In a preferred embodiment of the invention, the weighting coefficients employed in the weighted combination depend on an estimate (Istat(n)) of the degree to which the spectrum of the audio signal is stationary so that, in the case of a weakly stationary signal, the synthesis filter used to filter the excitation signal relating to a frame n0+i following a missing frame n0 (i ≥ 1) is closer to the estimated synthesis filter than in the case of a highly stationary signal.
The slaving of the backward LPC filter, and the resulting stationary spectrum, are therefore adapted as a function of a measured real average stationary signal spectrum. The smoothing (and therefore the stationary spectrum) is greater if the signal is really very stationary and reduced in the contrary case. The successive backward filters vary very little in the event of a very stationary spectrum. The successive filters can therefore be highly slaved. This limits the risk of divergence and assures the required stationary spectrum.
The degree to which the spectrum of the audio signal is stationary can be estimated from information included in each valid frame of the bit stream. In some systems, there is the option to set aside bit rate for transmitting this type of information, enabling the decoder to determine how stationary the spectrum of the coded signal is.
As an alternative to this, the degree to which the spectrum of the audio signal is stationary can be estimated from a comparative analysis of the successive synthesis filters used by the decoder to filter the excitation signal. It can be measured by various methods of measuring the spectral distances between the successive backward LPC filters used by the decoder (for example the Itakura distance).
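A comparison of successive synthesis filters can be sketched as follows. For simplicity the sketch uses a root-mean-square log-spectral distance rather than the Itakura distance mentioned above, and evaluates the filters on an arbitrary grid of 64 frequencies. The function names and the grid size are assumptions, and the mapping from distance to a stationarity estimate is left open.

```python
import cmath
import math


def log_spectrum_db(a, n_points=64):
    """Log-magnitude response (dB) of 1/A(z) at n_points frequencies in [0, pi)."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        response = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        magnitude = max(abs(response), 1e-12)  # guard against log10(0)
        spectrum.append(-20.0 * math.log10(magnitude))
    return spectrum


def spectral_distance_db(a_prev, a_curr, n_points=64):
    """RMS difference (dB) between the log spectra of two synthesis filters."""
    s_prev = log_spectrum_db(a_prev, n_points)
    s_curr = log_spectrum_db(a_curr, n_points)
    mean_sq = sum((p - c) ** 2 for p, c in zip(s_prev, s_curr)) / n_points
    return math.sqrt(mean_sq)
```

A small distance between the filters used for successive frames suggests a stationary spectrum, and a large distance suggests a transition.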
The degree to which the spectrum of the signal is stationary can be allowed for in calculating the parameters of the synthesis filter using equation (1) above. The weighting coefficient α(i) for i greater than 1 is then an increasing function of the estimated degree to which the spectrum of the audio signal is stationary. The filter used by the decoder therefore approaches the estimated filter more slowly when the spectrum is highly stationary than when it is not very stationary.
In particular, when α(i) is calculated from equation (2), the coefficient β can be a decreasing function of the estimated degree to which the spectrum of the audio signal is stationary.
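One possible decreasing function is sketched below purely as an illustration. It assumes that the stationarity estimate has been normalised to the range from 0 (not stationary) to 1 (highly stationary) and uses a linear mapping; the normalisation, the linear form and the bounds 0.1 and 0.5 are all hypothetical choices that do not come from the text.

```python
def beta_from_stationarity(i_stat, beta_min=0.1, beta_max=0.5):
    """Map a stationarity estimate in [0, 1] to the decrement beta.

    The mapping is linear and decreasing: a highly stationary spectrum gives
    a small beta, so the weighting coefficient decays slowly and the filter
    used stays close to the one used for the missing frame for longer.
    """
    clipped = min(1.0, max(0.0, i_stat))
    return beta_max - (beta_max - beta_min) * clipped
```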
As stated above, the method of the invention can be applied to systems using only backward LPC analysis, for which the synthesis filter has a transfer function of the form 1/AB(z), where AB(z) is a polynomial in z⁻¹ whose coefficients are obtained by the decoder from the linear predictive analysis of the decoded audio signal.
It can also be applied to systems in which backward LPC analysis is combined with forward LPC analysis, with convolution of the impulse responses of the forward and backward LPC filters, in the manner described in EP-A-0 782 128. In this case, the synthesis filter has a transfer function of the form 1/[AF(z)·AB(z)], where AF(z) and AB(z) are polynomials in z⁻¹, the coefficients of the polynomial AF(z) being obtained from parameters included in valid frames of the bit stream and the coefficients of the polynomial AB(z) being obtained by the decoder from the linear prediction analysis applied to a signal obtained by filtering the decoded audio signal using a filter with the transfer function AF(z).
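The combined synthesis filter can be illustrated by the following sketch, which assumes that each polynomial is held as a list of coefficients [1, a1, …, aM] of increasing powers of z⁻¹. The function names are hypothetical.

```python
def polynomial_product(a_f, a_b):
    """Coefficients of AF(z)*AB(z), i.e. the convolution of the two lists."""
    out = [0.0] * (len(a_f) + len(a_b) - 1)
    for i, x in enumerate(a_f):
        for j, y in enumerate(a_b):
            out[i + j] += x * y
    return out


def synthesis_filter(excitation, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum over k of a[k] * y[n-k].

    The filter memory is assumed to be zero at the start of the signal.
    """
    order = len(a) - 1
    past = [0.0] * order  # past[0] holds y[n-1], past[1] holds y[n-2], ...
    out = []
    for e in excitation:
        y = e - sum(a[k] * past[k - 1] for k in range(1, order + 1))
        out.append(y)
        past = ([y] + past[:-1]) if order else []
    return out
```

Filtering the excitation with 1/[AF(z)·AB(z)] gives the same output as filtering it with 1/AF(z) and then with 1/AB(z), which is the sense in which the impulse responses of the two filters are convolved.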
In the context of a hybrid forward/backward LPC analysis coding system, the present invention proposes a method of decoding a bit stream representative of an audio signal coded by successive frames, the bit stream being received with a flag indicating any missing frames, each valid frame of the bit stream including an indication of which coding mode was applied to code the audio signal relating to the frame, which is either a first coding mode in which the frame contains spectral parameters or a second coding mode,
wherein, for each frame, an excitation signal is formed from excitation parameters which are recovered in the bit stream if the frame is valid and estimated some other way if the frame is missing, and the excitation signal is filtered by means of a synthesis filter to obtain a decoded audio signal,
the synthesis filter used to filter the excitation signal being constructed from said spectral parameters if the bit stream indicates the first coding mode,
wherein a linear prediction analysis is performed on the basis of the decoded audio signal obtained as far as the preceding frame to estimate at least in part a synthesis filter relating to the current frame and wherein, so long as no frame is missing and the bit stream indicates the second coding mode, the successive synthesis filters used to filter the excitation signal conform to the estimated synthesis filters,
and wherein, if a frame n0 is missing, the bit stream having indicated the second coding mode for the preceding valid frame and frame n0 being followed by a plurality of valid frames for which the bit stream indicates the second coding mode, at least one synthesis filter used to filter the excitation signal relative to a subsequent frame n0+i is determined by a weighted combination of the synthesis filter estimated in relation to frame n0+i and at least one synthesis filter that has been used since frame n0.
The above features cover the situation of missing frames in periods in which the coder is operating in the backward mode, in essentially the same manner as in systems using only backward coding.
The preferred embodiments described above for systems using only backward coding can be transposed directly to the situation of hybrid forward/backward systems.
It is interesting to note that the degree to which the spectrum of the audio signal is stationary, when used, can be estimated from information present in the bit stream to indicate the mode of coding the audio signal frame by frame.
The estimated degree to which the spectrum of the signal is stationary can in particular be deduced by counting the frames processed by the second coding mode and the frames processed by the first coding mode that belong to a time window preceding the current frame and having a duration of the order of N frames, where N is a predefined integer.
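The counting described above can be sketched as follows. The sketch assumes a sliding window of exactly N frames and, as one possible estimate, takes the proportion of frames coded in the second (backward) mode; the class name, the method names and the use of a proportion are hypothetical.

```python
from collections import deque


class ModeWindow:
    """Sliding window over the coding modes of the last N decoded frames."""

    def __init__(self, n_frames):
        self._modes = deque(maxlen=n_frames)  # True = second (backward) mode

    def push(self, is_second_mode):
        """Record the coding mode indicated by the bit stream for one frame."""
        self._modes.append(bool(is_second_mode))

    def counts(self):
        """Return (frames in second mode, frames in first mode) in the window."""
        second = sum(self._modes)
        return second, len(self._modes) - second

    def stationarity(self):
        """Assumed estimate: proportion of second-mode frames in the window."""
        if not self._modes:
            return 0.0
        return sum(self._modes) / len(self._modes)
```

A window filled mostly with second-mode frames indicates that the coder judged the signal stationary over that period, so the estimate is close to 1.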
In the event of missing frames when the coder is changing from the forward mode to the backward mode, if a frame n0 is missing, the bit stream having indicated the first coding mode (or the second coding mode) for the preceding valid frame, the frame n0 being followed by at least one valid frame for which the bit stream indicates the second coding mode, then the synthesis filter used to filter the excitation signal relating to the next frame n0+1 can be determined from the estimated synthesis filter relating to frame n0. The filter used to filter the excitation signal relating to the next frame n0+1 can in particular be taken as identical to the estimated synthesis filter relating to frame n0.