The present invention relates to signal correction, particularly in a decoder when there is frame loss in the signal received by the decoder.
The signal takes the form of a succession of samples divided into successive frames, where the term “frame” means a signal segment composed of at least one sample (a frame containing a single sample simply corresponds to a signal in the form of a succession of samples).
The invention lies in the field of digital signal processing, particularly but not exclusively in the field of encoding/decoding an audio signal. Frame loss occurs when a communication (either transmitted in real time or stored for later transmission) using a coder and decoder is disrupted by channel conditions (due to radio issues, network congestion, etc.).
In this case, the decoder uses packet loss correction mechanisms (or “masking”) in an attempt to substitute a reconstructed signal for the missing signal, using information available in the decoder (such as the already decoded signal or the parameters received in previous frames). This technique makes it possible to maintain a good quality of service despite degraded channel performance.
Frame loss correction techniques are often highly dependent on the type of coding used.
In the case of coding a speech signal based on CELP technology (for “Code Excited Linear Prediction”), the frame loss correction applies the CELP model. For example, when coding according to Recommendation G.722.2, the solution for replacing a lost frame (or “packet”) is to prolong the use of a long-term prediction (LTP) gain by attenuating it, as well as to prolong the use of each ISF parameter (for “Immittance Spectral Frequency”) by bringing it towards its respective average. The pitch period of the speech signal (designated “LTP-Lag”) is also repeated. In addition, the decoder is supplied with random values for the parameters characterizing the “innovation” (excitation in CELP coding).
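By way of illustration only, the parameter extrapolation just described can be sketched as follows in Python. The function name, the dictionary layout, the attenuation factor of 0.9, the pull factor of 0.1 and the innovation length of 64 are assumptions chosen for the example; they are not values taken from Recommendation G.722.2.

```python
import random


def conceal_celp_parameters(prev, isf_means, gain_attenuation=0.9,
                            isf_pull=0.1, innovation_length=64, rng=None):
    """Extrapolate CELP parameters for one lost frame (illustrative sketch).

    prev      : parameters of the last valid frame, as a dict with the
                hypothetical keys "ltp_gain", "ltp_lag" and "isf".
    isf_means : long-term average of each ISF parameter.
    All default numeric values are assumptions, not standardized constants.
    """
    rng = rng or random.Random()
    return {
        # The LTP gain is reused but attenuated.
        "ltp_gain": prev["ltp_gain"] * gain_attenuation,
        # The pitch period (LTP lag) is simply repeated.
        "ltp_lag": prev["ltp_lag"],
        # Each ISF parameter is moved a fraction of the way towards its average.
        "isf": [f + isf_pull * (m - f) for f, m in zip(prev["isf"], isf_means)],
        # The innovation (excitation) is replaced by random values.
        "innovation": [rng.uniform(-1.0, 1.0) for _ in range(innovation_length)],
    }
```

Calling such a function once per lost frame, each time on the output of the previous call, would progressively attenuate the gain and draw the ISF parameters closer to their averages over a run of consecutive losses.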
It should be noted that applying this type of method for transform coding or for PCM (“Pulse Code Modulation”) coding requires CELP coding in the decoder, which introduces additional complexity.
In ITU-T Recommendation G.711 for a waveform coder, the processing for frame loss correction (exemplified in Appendix I of that recommendation) finds a pitch period in the speech signal already decoded and repeats the last pitch period with overlap-add between the already decoded signal and the repeated signal. This treatment “erases” audio artifacts but requires additional time in the decoder (time corresponding to the duration of the overlap).
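The principle of pitch repetition with overlap-add can be sketched as follows in Python. This is a simplified illustration and not the reference algorithm of Appendix I: the pitch search by normalized autocorrelation, the linear cross-fade weights and all function and parameter names are assumptions made for the example.

```python
import math


def estimate_pitch_period(history, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized
    autocorrelation over the already decoded signal (assumed search method)."""
    n = len(history)
    if not 0 < min_lag <= max_lag < n:
        raise ValueError("lags must satisfy 0 < min_lag <= max_lag < len(history)")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        num = sum(history[i] * history[i - lag] for i in range(lag, n))
        e_cur = sum(history[i] ** 2 for i in range(lag, n))
        e_old = sum(history[i - lag] ** 2 for i in range(lag, n))
        den = math.sqrt(e_cur * e_old)
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_by_pitch_repetition(history, frame_length, min_lag, max_lag, overlap):
    """Fill one lost frame by repeating the last pitch period.

    Returns (period, smoothed_tail, concealed_frame). smoothed_tail replaces
    the last `overlap` decoded samples: they are cross-faded with the samples
    located one pitch period earlier, so that the junction with the repeated
    signal is continuous.
    """
    period = estimate_pitch_period(history, min_lag, max_lag)
    if overlap <= 0 or period + overlap > len(history):
        raise ValueError("overlap does not fit in the decoded history")
    pitch_buffer = history[-period:]
    # Periodic repetition of the last pitch period over the lost frame.
    concealed = [pitch_buffer[i % period] for i in range(frame_length)]
    tail = history[-overlap:]
    earlier = history[-period - overlap:-period]
    smoothed_tail = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # linear fade towards the earlier period
        smoothed_tail.append((1.0 - w) * tail[i] + w * earlier[i])
    return period, smoothed_tail, concealed
```

Because the smoothed tail modifies samples that have already been decoded, those samples have to be held back until it is known whether the next frame is lost, which corresponds to the additional time mentioned above.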
The technique most often used to correct frame loss in transform coding consists of repeating the spectrum decoded in the last frame received. For example, in the case of coding according to Recommendation G.722.1, the MLT (“modulated lapped transform”), equivalent to a modified discrete cosine transform (MDCT) with 50% overlap and sinusoidal windows, ensures a transition (between the last frame lost and the repeated frame) which is sufficiently slow to erase artifacts due to simple repetition of the frame.
Advantageously, this technology does not require any additional time because it exploits the temporal aliasing of the MLT transform to create an overlap-add with the reconstructed signal. This is a very inexpensive technique in terms of resources.
However, it has a flaw related to the temporal inconsistency between the signal just before the frame loss and the repeated signal. This results in an audible phase discontinuity that can produce significant audio artifacts if the overlap between the two frames is small (as is the case when “low-delay” MDCT windows are used). This situation with a short overlap is illustrated in FIG. 1B for the case of a low-delay MLT transform, for comparison with the usual situation of FIG. 1A where long sine windows are used according to Recommendation G.722.1 (then offering a long overlap period ZRA, with very gradual modulation). It appears that modulation by a low-delay window produces an audible phase shift due to the short overlap area ZRB, as represented in FIG. 1B.
In this case, even if a solution were implemented that combines pitch detection (as in coding according to Recommendation G.711, Appendix I) with an overlap-add produced by the window of an MDCT transform, this would not be sufficient to eliminate the audio artifacts related to the phase shift.
Another frame loss correction technique is to generate a synthesis signal from a signal structure extracted from a pitch period. Pitch period is understood to mean a fundamental period, particularly in the case of a voiced speech signal (the inverse of the fundamental frequency of the signal). However, the signal may also be a music signal, for example, having an overall tone associated with a fundamental frequency and a fundamental period that can serve as the repetition period.
However, the physical properties of the synthesized signal do not match those of the original signal (since some frames have been lost), which causes unpleasant auditory defects and introduces additional errors relative to the original signal. In addition, the energy of the correctly received signal and that of the signal reconstructed from the structure described above may be substantially different. These differences can cause an auditory sensation of “noise jump”, where the noise level changes sporadically. For example, for a signal whose noise component is background noise, the listener would hear jumps in this background noise.
More generally, we note that in the current state of the art, the generation of the synthesis signal to fill the frames replacing lost frames introduces a periodicity which, in complex signals such as music, does not fit with the range of all signal components to be replaced.
For example, with reference to FIG. 1C, a signal S0 is repeated 7 times in windows F1 to F7. As the time characteristics (window start times v1 to v7 and window duration L0 to L7) of the windows are identical, periodization is introduced.
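The periodization can be made concrete with a short numerical sketch in Python. The segment length of 50 samples, the use of random noise as the segment S0 and the function names are assumptions made for the illustration; only the number of repetitions (seven) follows the example of FIG. 1C.

```python
import math
import random


def repeat_segment(segment, repetitions):
    """Concatenate identical copies of `segment` (same start spacing, same duration)."""
    return [sample for _ in range(repetitions) for sample in segment]


def normalized_autocorrelation(signal, lag):
    """Normalized correlation between the signal and itself delayed by `lag` samples."""
    current = signal[lag:]
    delayed = signal[:len(signal) - lag]
    num = sum(x * y for x, y in zip(current, delayed))
    den = math.sqrt(sum(x * x for x in current) * sum(y * y for y in delayed))
    return num / den if den > 0 else 0.0


# Hypothetical noise-like segment S0, which has no periodicity of its own.
rng = random.Random(0)
s0 = [rng.uniform(-1.0, 1.0) for _ in range(50)]

# Seven identical windows, as in the example of FIG. 1C.
filled = repeat_segment(s0, 7)

# Correlation at a lag equal to the window duration, and at an unrelated lag.
at_window_duration = normalized_autocorrelation(filled, len(s0))
at_other_lag = normalized_autocorrelation(filled, 23)
```

With identical windows, the correlation at a lag equal to the window duration is at its maximum even though the segment itself is noise-like, which is the artificial periodicity referred to above.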
This systematic and inadequate periodization results in a “metallic” and artificial sound (therefore unpleasant to the listener) with each frame loss. It is therefore necessary to improve existing replication methods, including but not limited to contexts of decoding with overlap-add.