1. Field of the Invention
The present invention relates to techniques for concealing consecutive transmission errors in transmission systems that apply digital coding of any type to a speech and/or sound signal.
It is conventional to distinguish between two major categories of coder:
- “time” coders, which compress digitized signal samples on a sample-by-sample basis (as is the case, for example, for pulse code modulation (PCM) and adaptive differential PCM (ADPCM) [DAUMER] [MAITRE]); and
- parametric coders, which analyze successive frames of signal samples in order to extract from each frame a certain number of parameters that are then coded and transmitted (as is the case for vocoders [TREMAIN], IMBE coders [HARWICK], and transform coders [BRANDENBURG]).
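To make the “time” category concrete, the following is a minimal sketch of first-order differential PCM. It is a simplified relative of ADPCM, assumed here for illustration only: the predictor is just the previous reconstructed sample and the quantizer step is fixed, whereas ADPCM adapts both. The function names are hypothetical.

```python
def dpcm_encode(samples, step):
    """First-order DPCM: quantize the difference between each sample and the
    previous reconstructed sample, one sample at a time (no frames)."""
    codes = []
    prediction = 0.0  # decoder's reconstruction of the previous sample
    for x in samples:
        q = round((x - prediction) / step)  # integer index sent on the channel
        codes.append(q)
        prediction += q * step  # track exactly what the decoder will rebuild
    return codes


def dpcm_decode(codes, step):
    """Rebuild samples by accumulating the dequantized differences."""
    out = []
    prediction = 0.0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out


codes = dpcm_encode([0, 3, 7, 6, 2, -4], step=2)
print(codes)                       # [0, 2, 2, -1, -2, -3]
print(dpcm_decode(codes, step=2))  # [0.0, 4.0, 8.0, 6.0, 2.0, -4.0]
```

In this toy scheme the decoder state is the running sum of every code received so far, so a burst of lost codes leaves the reconstruction offset from the encoder's for all subsequent samples, and there are no per-frame parameters from which to extrapolate.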
There also exist intermediate categories that combine the coding of representative parameters, as performed by parametric coders, with the coding of a residual time waveform. For simplicity, such coders can be included in the category of parametric coders.
This category includes predictive coders, and in particular the family of analysis-by-synthesis coders such as RPE-LTP ([HELLWIG]) or code-excited linear prediction (CELP) ([ATAL]).
For all such coders, the coded values are subsequently converted into a binary string that is transmitted over a transmission channel. Depending on the quality of the channel and on the type of transport, disturbances may affect the transmitted signal and produce errors in the binary string received by the decoder. These errors may occur in isolation in the binary string, but very frequently they occur in bursts. An entire packet of bits corresponding to a whole portion of the signal is then erroneous or is not received. This type of problem is encountered, for example, in transmission over mobile telephone networks. It is also encountered in transmission over packet-switched networks, and in particular networks of the Internet type.
When the transmission system or the receiving modules make it possible to detect that the data being received is highly erroneous (for example in mobile networks), or when a block of data is not received (as occurs, for example, in packet transmission systems), error concealment procedures are applied. Such procedures enable the decoder to extrapolate the missing signal samples from the available signals and from data coming from earlier frames, and possibly also from frames that follow the lost zones.
Such techniques have already been implemented, mainly for parametric coders (techniques for recovering erased frames). They greatly limit the subjective degradation of the signal perceived at the decoder in the presence of erased frames. Most of the algorithms that have been developed rely on the techniques used by the coder and the decoder, and thus constitute an extension of the decoder.
A general object of the invention is to improve the subjective quality of a speech signal as played back by a decoder in any speech or sound compression system, in the event that a set of consecutive coded data items has been lost due to the poor quality of a transmission channel or following the loss or non-reception of a packet in a packet transmission system.
To this end, the invention proposes a technique enabling bursts of successive transmission errors (error packets) to be concealed regardless of the coding technique used; the proposed technique is suitable for use, for example, in time coders, whose structure a priori lends itself less well to concealing packets of errors.
2. Description of the Related Art
Most coding algorithms of the predictive type provide techniques for recovering erased frames ([GSM-FR], [REC G.723.1A], [SALAMI], [HONKANEN], [COX-2], [CHEN-2], [CHEN-3], [CHEN-4], [CHEN-5], [CHEN-6], [CHEN-7], [KROON], [WATKINS]). The decoder is informed in one way or another that an erased frame has occurred, for example, in radio mobile systems, by a frame-erasure flag forwarded from the channel decoder. Devices for recovering erased frames seek to extrapolate the parameters of an erased frame from the most recent frame(s) considered valid. Some of the parameters manipulated or coded by predictive coders are highly correlated from one frame to the next (this applies, for example, both to the short-term prediction parameters, also referred to as “linear predictive coding” (LPC) parameters (see [RABINER]), which represent the spectral envelope, and to the long-term prediction parameters for voiced sounds). Because of this correlation, it is much more advantageous to reuse the parameters of the most recent valid frame to synthesize the erased frame than to use parameters that are erroneous or random.
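The basic substitution policy described above can be sketched as follows. This is an illustrative assumption rather than any cited coder's procedure: an erased frame is represented by `None` (standing in for the frame-erasure flag), each valid frame by a dictionary of decoded parameters, and the function name and the `pitch_lag` key are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, float]  # hypothetical per-frame decoded parameters


def hold_last_valid(frames: Sequence[Optional[Params]]) -> List[Optional[Params]]:
    """Replace each erased frame (None) by a copy of the most recent valid frame."""
    out: List[Optional[Params]] = []
    last_valid: Optional[Params] = None
    for params in frames:
        if params is not None:          # valid frame: use it and remember it
            last_valid = params
            out.append(dict(params))
        elif last_valid is not None:    # erased frame: reuse correlated parameters
            out.append(dict(last_valid))
        else:                           # erasure before any valid frame: nothing to reuse
            out.append(None)
    return out


received = [{"pitch_lag": 40.0}, None, None, {"pitch_lag": 42.0}, None]
print(hold_last_valid(received))
```

Plain copying is only the simplest policy; the damping variants mentioned for CELP decoders modify the reused values rather than copying them unchanged.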
For CELP coding (refer to [RABINER]), the parameters of the erased frame are conventionally obtained as follows:
- the LPC filter is obtained from the LPC parameters of the most recent valid frame, either by copying the parameters or after applying a certain amount of damping (cf. the G.723.1 coder [REC G.723.1A]);
- voicing detection determines the degree of harmonicity of the signal in the erased frame ([SALAMI]), and the excitation is then generated as follows:
  - for a non-voiced signal: an excitation signal is generated randomly (by randomly drawing a code word and applying a lightly damped past excitation gain [SALAMI], by randomly selecting from within the past excitation [CHEN], by using the transmitted codes even though they may be completely erroneous [HONKANEN], . . . );
  - for a voiced signal: the LTP delay is generally the delay calculated for the preceding frame, possibly with a small amount of “jitter” added ([SALAMI]), and the LTP gain is taken to be equal or very close to 1. The excitation signal is limited to long-term prediction performed on the basis of the past excitation.
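The steps listed above can be sketched in simplified form. This is an illustrative assumption, not the procedure of G.723.1 or [SALAMI]: LPC damping is modeled as bandwidth expansion (a_i multiplied by gamma to the power i), the voiced excitation is a pure long-term-prediction extension of the past excitation at the previous lag (no jitter), and the non-voiced excitation is uniform noise scaled by a damped past gain. All function names and default values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def damp_lpc(a: Sequence[float], gamma: float = 0.98) -> List[float]:
    """Damp the coefficients of A(z) = 1 + sum_i a_i z^-i by bandwidth expansion:
    a_i -> a_i * gamma**i, which widens the formant peaks of 1/A(z)."""
    return [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def conceal_excitation(past_exc: Sequence[float], frame_len: int, voiced: bool,
                       pitch_lag: int = 0, ltp_gain: float = 1.0,
                       prev_gain: float = 1.0, gain_damping: float = 0.9,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Excitation for one erased frame, built only from decoder history."""
    exc = list(past_exc)
    start = len(exc)
    if voiced:
        if not 1 <= pitch_lag <= start:
            raise ValueError("pitch lag must lie within the stored past excitation")
        # Long-term prediction only: periodic extension of the past excitation at
        # the previous frame's lag (lags shorter than the frame reuse new samples).
        for n in range(frame_len):
            exc.append(ltp_gain * exc[start + n - pitch_lag])
    else:
        # Non-voiced: random excitation scaled by a slightly damped past gain.
        if rng is None:
            rng = random.Random(0)
        gain = prev_gain * gain_damping
        exc.extend(gain * rng.uniform(-1.0, 1.0) for _ in range(frame_len))
    return exc[start:]


def lpc_synthesis(exc: Sequence[float], a: Sequence[float],
                  memory: Optional[Sequence[float]] = None) -> List[float]:
    """All-pole synthesis 1/A(z): s[n] = e[n] - sum_i a_i * s[n - i].
    `memory` holds at least len(a) past output samples, newest last."""
    hist = list(memory) if memory is not None else [0.0] * len(a)
    out = []
    for e in exc:
        s = e - sum(ai * hist[-1 - i] for i, ai in enumerate(a))
        out.append(s)
        hist.append(s)
    return out


past = [1.0, 0.0, 0.0, 0.0] * 2
exc = conceal_excitation(past, frame_len=6, voiced=True, pitch_lag=4)
print(exc)  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
speech = lpc_synthesis(exc, damp_lpc([-0.9]))
```

The decoder history (`past_exc`, the previous lag and gain, the synthesis filter memory) is exactly the kind of intermediate decoder state on which such procedures depend.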
In all of the examples mentioned above, the procedures for concealing erased frames are strongly tied to the decoder and make use of decoder modules such as the signal synthesis module. They also use intermediate signals available within the decoder, such as the past excitation signal stored while processing the valid frames that precede the erased frames.
Most of the methods used for concealing errors caused by packets lost while transporting data coded by time coders rely on waveform substitution techniques such as those described in [GOODMAN], [ERDÖL], and [AT&T]. Methods of that type reconstitute the signal by selecting portions of the signal decoded before the lost period, and they make no use of synthesis models. Smoothing techniques are also applied to avoid the artifacts that concatenating different signals would otherwise produce.
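A generic pitch-repetition form of waveform substitution can be sketched as follows. The pitch search by normalized correlation, the lag range, and the linear crossfade used for smoothing are assumptions chosen for illustration; they do not reproduce the specific methods of [GOODMAN], [ERDÖL] or [AT&T], and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def estimate_period(history: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Lag in [min_lag, max_lag] maximizing the normalized correlation between the
    last max_lag samples and the same-length segment `lag` samples earlier."""
    if len(history) < 2 * max_lag:
        raise ValueError("need at least 2 * max_lag samples of valid history")
    recent = history[-max_lag:]
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        earlier = history[-max_lag - lag:-lag]
        energy = sum(v * v for v in recent) * sum(v * v for v in earlier)
        if energy == 0.0:
            continue  # silent segment: correlation undefined
        score = sum(r * e for r, e in zip(recent, earlier)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def substitute_gap(history: Sequence[float], gap_len: int, period: int) -> List[float]:
    """Fill the gap by repeating the last pitch period of the valid signal."""
    last_period = history[-period:]
    return [last_period[n % period] for n in range(gap_len)]


def crossfade(tail: Sequence[float], head: Sequence[float]) -> List[float]:
    """Linear overlap-add of two equal-length segments to smooth the junction."""
    n = len(tail)
    return [t * (1 - (i + 1) / (n + 1)) + h * (i + 1) / (n + 1)
            for i, (t, h) in enumerate(zip(tail, head))]


history = [0.0, 1.0, 2.0, 1.0, 0.0] * 6  # valid signal decoded before the loss
period = estimate_period(history, min_lag=3, max_lag=8)
filler = substitute_gap(history, gap_len=7, period=period)
```

In use, `crossfade` would be applied, for example, between the last few substituted samples and the first samples of the next valid frame, so that no waveform discontinuity is heard at the splice.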
For transform coders, the techniques for reconstructing erased frames also rely on the structure of the coding used: algorithms such as [PICTEL] and [MAHIEUX-2] regenerate the lost transform coefficients from the values those coefficients took before the erasure.
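A minimal sketch of regenerating lost transform coefficients from those of the last valid frame is shown below. It assumes a simple policy (repetition with a fixed attenuation per erased frame, plus optional random sign flips intended to reduce the buzzy periodicity of exact repetition); it is not the algorithm of [PICTEL] or [MAHIEUX-2], and the function name and defaults are hypothetical.

```python
import random
from typing import List, Sequence


def regenerate_coefficients(prev_coeffs: Sequence[float], n_erased: int,
                            attenuation: float = 0.7, randomize_signs: bool = False,
                            seed: int = 0) -> List[List[float]]:
    """Transform coefficients for n_erased consecutive lost frames, derived from
    the last correctly received frame by repeated attenuation."""
    rng = random.Random(seed)
    frames = []
    current = list(prev_coeffs)
    for _ in range(n_erased):
        current = [c * attenuation for c in current]  # progressive fade-out
        if randomize_signs:
            # Assumed option: flip signs at random, keeping the magnitude spectrum.
            frames.append([c if rng.random() < 0.5 else -c for c in current])
        else:
            frames.append(list(current))
    return frames


print(regenerate_coefficients([1.0, -2.0], n_erased=2, attenuation=0.5))
# [[0.5, -1.0], [0.25, -0.5]]
```

The regenerated coefficients would then go through the decoder's usual inverse transform and overlap-add, which is why such methods are tied to the structure of the transform coder.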
The method described in [PARIKH] can be applied to any type of signal; it relies on constructing a sinusoidal model from the valid signal decoded before the erasure, in order to generate the missing portion of the signal.
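The idea of building a sinusoidal model from the valid signal and extending it into the gap can be sketched as follows. This is a simplified assumption, not the method of [PARIKH]: sinusoids are picked as the strongest bins of a plain DFT (no analysis window, no frequency refinement), so the model is exact only for components lying on DFT bins. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Sinusoid = Tuple[float, float, float]  # (amplitude, frequency in rad/sample, phase)


def fit_sinusoids(x: Sequence[float], num_peaks: int) -> List[Sinusoid]:
    """Estimate the strongest sinusoids in x from the peaks of its DFT."""
    n = len(x)
    bins = []
    for k in range(1, n // 2):  # skip DC and Nyquist for simplicity
        X = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        bins.append((abs(X), k, X))
    bins.sort(key=lambda b: b[0], reverse=True)
    # For an on-bin cosine A*cos(w*t + phi), |X[k]| = A*n/2 and arg X[k] = phi.
    return [(2.0 * mag / n, 2.0 * math.pi * k / n, cmath.phase(X))
            for mag, k, X in bins[:num_peaks]]


def extrapolate(model: Sequence[Sinusoid], start: int, length: int) -> List[float]:
    """Evaluate the model at sample indices start .. start + length - 1,
    where index 0 is the first sample of the analysis window."""
    return [sum(a * math.cos(w * t + p) for a, w, p in model)
            for t in range(start, start + length)]


n = 32
valid = [0.8 * math.cos(2 * math.pi * 3 * t / n + 0.4) for t in range(n)]
model = fit_sinusoids(valid, num_peaks=1)
missing = extrapolate(model, start=n, length=16)  # samples for the erased portion
```

With real speech, windowing and frequency interpolation between bins would be needed; the sketch only shows how a model fitted on past samples yields samples for the lost interval.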
Finally, there exists a family of techniques for concealing erased frames that have been developed jointly with channel coding. Those methods, such as the one described in [FINGSCHEIDT], make use of information provided by the channel decoder, e.g. information concerning the degree of reliability of the received parameters. They are fundamentally different from the present invention, which does not presuppose the existence of a channel coder.
The prior art that can be considered closest to the present invention is that described in [COMBESCURE], which proposes, for a transform coder, a method of concealing erased frames equivalent to the one used in CELP coders. The drawbacks of that method lie in the audible spectral distortion it introduces (a “synthetic” voice, parasitic resonances, . . . ), due specifically to the use of poorly-controlled long-term synthesis filters (a single harmonic component in voiced sounds, excitation signal generation restricted to portions of the past residual signal). In addition, energy control in [COMBESCURE] is performed at the level of the excitation signal, with the energy target for that signal kept constant throughout the erasure, which also gives rise to troublesome artifacts.