A conventional audio communication system transmits speech and audio signals in frames, meaning that the sending side first arranges the audio signal in short segments, i.e. audio signal frames, of e.g. 20-40 ms, which subsequently are encoded and transmitted as a logical unit in e.g. a transmission packet. A decoder at the receiving side decodes each of these units and reconstructs the corresponding audio signal frames, which in turn are finally output as a continuous sequence of reconstructed audio signal samples.
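As an illustration of the framing described above, the following minimal sketch splits a sample sequence into fixed-length frames and joins decoded frames back into a continuous sequence. The function names, the 20 ms default, and the zero-padding of the final frame are assumptions made for this example only; they are not taken from any particular codec.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Arrange a sample sequence into frames of frame_ms milliseconds each.

    The last frame is zero-padded to full length (an assumption of this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))  # pad a short final frame
        frames.append(frame)
    return frames


def join_frames(frames):
    """Concatenate reconstructed frames into one continuous sample sequence."""
    return [sample for frame in frames for sample in frame]
```

At a sampling rate of 8 kHz, for example, a 20 ms frame holds 160 samples, so each transmitted unit would carry the encoded form of 160 samples.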
Prior to the encoding, an analog-to-digital (A/D) conversion may convert the analog speech or audio signal from a microphone into a sequence of digital audio signal samples. Conversely, at the receiving end, a final digital-to-analog (D/A) conversion step typically converts the sequence of reconstructed digital audio signal samples into a time-continuous analog signal for loudspeaker playback.
However, a conventional transmission system for speech and audio signals may suffer from transmission errors, which could lead to a situation in which one or several of the transmitted frames are not available at the receiving side for reconstruction. In that case, the decoder has to generate a substitution signal for each unavailable frame. This may be performed by a so-called audio frame loss concealment unit in the decoder at the receiving side. The purpose of the frame loss concealment is to make the frame loss as inaudible as possible, and hence to mitigate the impact of the frame loss on the quality of the reconstructed signal.
Conventional frame loss concealment methods may depend on the structure or the architecture of the codec, e.g. by repeating previously received codec parameters. Such parameter repetition techniques clearly depend on the specific parameters of the codec in use, and may not be easily applicable to other codecs with a different structure. Current frame loss concealment methods may e.g. freeze or extrapolate the parameters of a previously received frame in order to generate a substitution frame for the lost frame.
The standardized linear predictive codecs AMR and AMR-WB are parametric speech codecs which freeze the earlier received parameters or use some extrapolation thereof for the decoding. In essence, the principle is to have a given model for coding/decoding and to apply the same model with frozen or extrapolated parameters.
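The freeze-or-extrapolate principle can be sketched as follows. This is a toy illustration under stated assumptions, not the AMR or AMR-WB procedure: the parameter names `gain` and `lpc`, the function `conceal_parameters`, and the decay factor of 0.9 per consecutive lost frame are all hypothetical. A lost frame is represented by `None`; its substitute reuses the last received parameters unchanged, except that the gain is extrapolated downward.

```python
def conceal_parameters(received, gain_decay=0.9):
    """Return one parameter set per frame, substituting for lost frames.

    received: list of parameter dicts, with None marking a lost frame.
    Lost frames reuse (freeze) the last good parameters; the gain is
    attenuated by gain_decay for each consecutive loss (hypothetical rule).
    """
    last_good = None
    lost_in_a_row = 0
    output = []
    for params in received:
        if params is not None:
            last_good = params
            lost_in_a_row = 0
            output.append(dict(params))
        elif last_good is None:
            output.append(None)  # nothing received yet, nothing to repeat
        else:
            lost_in_a_row += 1
            substitute = dict(last_good)  # freeze all parameters
            substitute["gain"] = last_good["gain"] * gain_decay ** lost_in_a_row
            output.append(substitute)
    return output
```

The substitute parameter sets would then be passed to the same decoder model as regularly received ones, which is what ties this kind of concealment to the parameter set of one specific codec.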
Many audio codecs use a frequency-domain coding technique, in which a coding model is applied to spectral parameters obtained by a frequency-domain transform. The decoder reconstructs the signal spectrum from the received parameters and transforms the spectrum back to a time signal. Typically, the time signal is reconstructed frame by frame, and the frames are combined by overlap-add techniques and potential further processing to form the final reconstructed signal. The corresponding audio frame loss concealment applies the same, or at least a similar, decoding model for lost frames, wherein the frequency domain parameters from a previously received frame are frozen or suitably extrapolated and then used in the frequency-to-time domain conversion.
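The frame-by-frame reconstruction with overlap-add, and the freezing of a previous spectrum for a lost frame, may be illustrated by the following sketch. It is a simplified stand-in for a real codec: it assumes a plain DFT with 50% overlapping sine windows instead of the transforms and quantization used in practice, and the functions `encode` and `decode` are hypothetical. A lost frame is marked by `None` and is replaced by the spectrum of the previous frame.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def sine_window(n):
    # Squared sine windows at 50% overlap sum to one (n must be even).
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def encode(signal, frame_len):
    """Window overlapping blocks and transform each one to a spectrum."""
    hop = frame_len // 2
    win = sine_window(frame_len)
    return [dft([signal[start + i] * win[i] for i in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def decode(spectra, frame_len):
    """Inverse-transform each spectrum and combine the blocks by overlap-add.

    A spectrum of None marks a lost frame; the previous spectrum is frozen
    and reused for it.
    """
    hop = frame_len // 2
    win = sine_window(frame_len)
    out = [0.0] * (hop * (len(spectra) - 1) + frame_len)
    previous = None
    for idx, spec in enumerate(spectra):
        if spec is None:
            spec = previous  # freeze the last available spectrum
        if spec is None:
            continue  # no earlier frame to reuse: leave silence
        previous = spec
        block = idft(spec)
        for i in range(frame_len):
            out[idx * hop + i] += block[i] * win[i]
    return out
```

Without losses, the output equals the input wherever two frames overlap. With a frozen spectrum, the substituted block no longer matches the signal it overlaps with, which is one way in which reusing the same decoder model for a lost frame can introduce a discontinuity.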
However, conventional audio frame loss concealment methods may suffer from quality impairments, e.g. because freezing or extrapolating parameters and re-applying the same decoder model for lost frames does not always guarantee a smooth and faithful signal evolution from the previously decoded signal frames to the lost frame. This may lead to audible signal discontinuities with a corresponding quality impact. Thus, audio frame loss concealment with reduced quality impairment is desirable.