The present invention relates generally to the concealment of errors in decoded acoustic signals caused by encoded data representing the acoustic signals being partially lost or damaged. More particularly the invention relates to a method of receiving data in the form of encoded information from a transmission medium and an error concealment unit according to the preambles of claims 1 and 39 respectively. The invention also relates to decoders for generating an acoustic signal from received data in the form of encoded information according to the preambles of claims 41 and 42 respectively, a computer program according to claim 37 and a computer readable medium according to claim 38.
There are many different applications for audio and speech codecs (codec=coder and decoder). Encoding and decoding schemes are, for instance, used for bit-rate efficient transmission of acoustic signals in fixed and mobile communications systems and in videoconferencing systems. Speech codecs can also be utilised in secure telephony and for voice storage.
Particularly in mobile applications, the codecs occasionally operate under adverse channel conditions. One consequence of such non-optimal transmission conditions is that encoded bits representing the speech signal are corrupted or lost somewhere between the transmitter and the receiver. Most of the speech codecs of today's mobile communication systems and Internet applications operate block-wise, where GSM (Global System for Mobile communication), WCDMA (Wideband Code Division Multiple Access), TDMA (Time Division Multiple Access) and IS95 (Interim Standard 95) constitute a few examples. The block-wise operation means that an acoustic source signal is divided into speech codec frames of a particular duration, e.g. 20 ms. The information in a speech codec frame is thus encoded as a unit. However, usually the speech codec frames are further divided into sub-frames, e.g. having a duration of 5 ms. The sub-frames are then the coding units for particular parameters, such as the encoding of a synthesis filter excitation in the GSM FR-codec (FR=Full Rate), GSM EFR-codec (EFR=Enhanced Full Rate), GSM AMR-codec (AMR=Adaptive Multi Rate), ITU G.729-codec (ITU=International Telecommunication Union) and EVRC (Enhanced Variable Rate Codec).
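The block-wise division into frames and sub-frames can be illustrated with a short sketch. It uses the 20 ms and 5 ms durations given above as examples; the 8 kHz sampling rate and the function name `split_into_frames` are assumptions made only for this illustration and are not taken from any of the codecs mentioned.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20, subframe_ms=5):
    """Split a sample sequence into codec frames, each a list of sub-frames.

    The sampling rate is an assumed value; trailing samples that do not
    fill a complete frame are dropped in this simplified sketch.
    """
    frame_len = sample_rate_hz * frame_ms // 1000      # 160 samples at 8 kHz
    sub_len = sample_rate_hz * subframe_ms // 1000     # 40 samples at 8 kHz
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Each frame is cut into equally sized sub-frames.
        subframes = [frame[i:i + sub_len] for i in range(0, frame_len, sub_len)]
        frames.append(subframes)
    return frames


if __name__ == "__main__":
    signal = list(range(480))          # 60 ms of dummy samples at 8 kHz
    frames = split_into_frames(signal)
    print(len(frames), "frames,", len(frames[0]), "sub-frames per frame")
```

With these assumed values each frame holds 160 samples and is split into four sub-frames of 40 samples.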
Besides the excitation parameters, the above codecs also model acoustic signals by means of other parameters like, for instance, LPC-parameters (LPC=Linear Predictive Coding), LTP-lag (LTP=Long Term Prediction) and various gain parameters. Certain bits of these parameters represent information that is highly important with respect to the perceived sound quality of the decoded acoustic signal. If such bits are corrupted during transmission, a human listener will, at least temporarily, perceive the decoded acoustic signal as having a relatively low quality. It is therefore often advantageous to disregard the parameters for the corresponding speech codec frame if they arrive with errors and instead make use of previously received correct parameters. This error concealment technique is applied, in one form or another, in most systems through which acoustic signals are transmitted by means of non-ideal channels.
The error concealment method normally aims at alleviating the effects of a lost/damaged speech codec frame by freezing any speech codec parameters that vary comparatively slowly. Such error concealment is performed, for instance, by the error concealment unit in the GSM EFR-codec and GSM AMR-codec, which repeats the LTP-gain and the LTP-lag parameters in case of a lost or damaged speech codec frame. If, however, several consecutive speech codec frames are lost or damaged, various muting techniques are applied, which may involve repetition of gain parameters with decaying factors and repetition of LPC-parameters moved towards their long-term averages. Furthermore, the power level of the first correctly received frame after reception of one or more damaged frames may be limited to the power level of the latest correctly received frame before reception of the damaged frame(s). This mitigates undesirable artefacts in the decoded speech signal, which may occur due to the speech synthesis filter and adaptive codebook being set in erroneous states during reception of the damaged frame(s).
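The repetition and muting behaviour described above can be sketched as follows. This is a minimal illustration and not the concealment procedure of any particular codec: the `FrameParams` fields, the `conceal` function, the decay factor of 0.7 and the choice to start attenuating from the second consecutive lost frame are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class FrameParams:
    """Hypothetical per-frame parameters: one slowly varying lag and one gain."""
    lag: int
    gain: float


def conceal(received: List[Optional[FrameParams]], decay: float = 0.7) -> List[FrameParams]:
    """Replace lost frames (None) with the last good parameters.

    A single lost frame repeats the last good parameters unchanged; each
    further consecutive loss multiplies the repeated gain by `decay`.
    """
    out: List[FrameParams] = []
    last_good: Optional[FrameParams] = None
    lost_in_row = 0
    for params in received:
        if params is not None:
            last_good = params
            lost_in_row = 0
            out.append(params)
        elif last_good is None:
            # Nothing received yet, so nothing to repeat: output a muted frame.
            out.append(FrameParams(lag=0, gain=0.0))
        else:
            lost_in_row += 1
            factor = decay ** (lost_in_row - 1)
            out.append(replace(last_good, gain=last_good.gain * factor))
    return out


if __name__ == "__main__":
    stream = [FrameParams(40, 1.0), None, None, None, FrameParams(50, 0.8)]
    for frame in conceal(stream, decay=0.5):
        print(frame)
```

The slowly varying lag is frozen throughout the loss period, while the gain is attenuated progressively so that a long burst of lost frames fades out instead of repeating at full level.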
A few examples of alternative means of ameliorating the adverse effects of speech codec frames being lost or damaged during transmission between a transmitter and a receiver are referred to below.
The U.S. Pat. No. 5,907,822 discloses a loss tolerant speech decoder, which utilises past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. A multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression parameters extracts the necessary parameters in case of a lost frame and produces a replacement frame.
The European patent EP 0 665 161 B1 describes an apparatus and a method for concealing the effects of lost frames in a speech decoder. The document suggests the use of a voice activity detector to restrict updating of a threshold value for determining background sounds in case of a lost frame. A post filter normally tilts the spectrum of a decoded signal. However, in case of a lost frame the filtering coefficients of the post filter are not updated.
The U.S. Pat. No. 5,909,663 describes a speech coder in which the perceived sound quality of a decoded speech signal is enhanced by avoiding a repeated use of the same parameter at reception of several consecutive damaged speech frames. This is accomplished by adding noise components to an excitation signal, by substituting noise components for the excitation signal, or by reading an excitation signal at random from a noise codebook containing plural excitation signals.
The known error concealment solutions for narrow-band codecs generally provide a satisfying result in most environments by simply repeating certain spectral parameters from a latest received undamaged speech codec frame during the corrupted speech codec frame(s). In practice, this procedure implicitly retains the magnitude and the shape of the spectrum of the decoded speech signal until a new undamaged speech codec frame is received. By such preservation of the speech signal's spectral magnitude and shape, it is also implicitly assumed that an excitation signal in the decoder is spectrally flat (or white).
However, this is not always the case. An Algebraic Code Excited Linear Predictive-codec (ACELP) may, for instance, produce non-white excitation signals. Furthermore, the spectral shape of the excitation signal may vary considerably from one speech codec frame to another. A mere repetition of spectral parameters from a latest received undamaged speech codec frame could thus result in abrupt changes in the spectrum of the decoded acoustic signal, which, of course, means that a low sound quality is experienced.
Particularly, wide-band speech codecs operating according to the CELP coding paradigm have proven to suffer from the above problems, because in these codecs the spectral shape of the synthesis filter excitation may vary even more dramatically from one speech codec frame to another.
The object of the present invention is therefore to provide a speech coding solution that alleviates the problem above.
According to one aspect of the invention the object is achieved by a method of receiving data in the form of encoded information and decoding the data into an acoustic signal as initially described, which is characterised by, in case of received damaged data, producing a secondary reconstructed signal on the basis of a primary reconstructed signal. The secondary reconstructed signal has a spectrum that is a spectrally adjusted version of the spectrum of the primary reconstructed signal, such that its deviation in spectral shape from the spectrum of a previously reconstructed signal is less than the corresponding deviation between the spectrum of the primary reconstructed signal and the spectrum of the previously reconstructed signal.
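The property stated above, namely that the secondary signal deviates less in spectral shape from the previous signal than the primary signal does, can be illustrated with a small sketch. It is not the correction procedure of the invention. As assumptions made only for this example, "spectral shape" is taken to be the DFT magnitudes normalised to unit sum, the adjustment is a linear interpolation of that shape controlled by a hypothetical weight `alpha`, and the phase of the primary signal is kept unchanged.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, keeping the real part (inputs here are real signals)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def magnitudes(x):
    return [abs(c) for c in dft(x)]


def shape(mags):
    """Assumed definition of spectral shape: magnitudes normalised to sum 1."""
    total = sum(mags) or 1.0
    return [m / total for m in mags]


def shape_deviation(mags_a, mags_b):
    return sum(abs(p - q) for p, q in zip(shape(mags_a), shape(mags_b)))


def spectral_correction(primary, previous, alpha=0.5):
    """Return a secondary signal whose spectral shape lies between the two inputs."""
    spectrum = dft(primary)
    prim_mag = [abs(c) for c in spectrum]
    prev_mag = magnitudes(previous)
    total = sum(prim_mag)              # preserve the primary overall magnitude
    target = [total * (alpha * q + (1.0 - alpha) * p)
              for p, q in zip(shape(prim_mag), shape(prev_mag))]
    # New magnitude per bin, phase taken from the primary signal.
    adjusted = [cmath.rect(m, cmath.phase(c)) for m, c in zip(target, spectrum)]
    return idft_real(adjusted)


if __name__ == "__main__":
    previous = [0.5 ** t for t in range(8)]       # low-pass-like frame
    primary = [(-0.5) ** t for t in range(8)]     # high-pass-like frame
    secondary = spectral_correction(primary, previous, alpha=0.5)
    before = shape_deviation(magnitudes(primary), magnitudes(previous))
    after = shape_deviation(magnitudes(secondary), magnitudes(previous))
    print("deviation before:", before, "after:", after)
```

With `alpha` strictly between 0 and 1 and differing input shapes, the deviation of the secondary signal is smaller than that of the primary signal by the factor `1 - alpha`.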
According to another aspect of the invention the object is achieved by a computer program directly loadable into the internal memory of a computer, comprising software for performing the method described in the above paragraph when said program is run on the computer.
According to a further aspect of the invention the object is achieved by a computer readable medium, having a program recorded thereon, where the program is to make the computer perform the method described in the penultimate paragraph above.
According to still a further aspect of the invention the object is achieved by an error concealment unit as initially described, which is characterised in that, in case of received damaged data, a spectral correction unit produces a secondary reconstructed spectrum based on a primary reconstructed signal such that the spectral shape of the secondary reconstructed spectrum deviates less with respect to spectral shape from a spectrum of a previously reconstructed signal than a spectrum based on the primary reconstructed signal.
According to yet another aspect of the invention the object is achieved by a decoder for generating an acoustic signal from received data in the form of encoded information. The decoder includes a primary error concealment unit to produce at least one parameter. It also includes a speech decoder to receive speech codec frames, the at least one parameter from the primary error concealment and to provide in response thereto an acoustic signal. Furthermore, the decoder includes the proposed error concealment unit wherein the primary reconstructed signal constitutes the decoded speech signal produced by the speech decoder and the secondary reconstructed signal constitutes an enhanced acoustic signal.
According to still another aspect of the invention the object is achieved by a decoder for generating an acoustic signal from received data in the form of encoded information. The decoder includes a primary error concealment unit to produce at least one parameter. It also includes an excitation generator to receive speech codec parameters and the at least one parameter and to produce an excitation signal in response to the at least one parameter from the primary error concealment unit. Finally, the decoder includes the proposed error concealment unit wherein the primary reconstructed signal constitutes the excitation signal produced by the excitation generator and the secondary reconstructed signal constitutes an enhanced excitation signal.
The proposed explicit generation of a reconstructed spectrum as a result of lost or received damaged data ensures spectrally smooth transitions between periods of received undamaged data and periods of received damaged data. This, in turn, provides an enhanced perceived sound quality of the decoded signal, particularly for advanced broadband codecs, for instance, involving ACELP-coding schemes.