As a result of the increasingly widespread use of modern audio encoders and the corresponding audio decoders, which operate according to one of the MPEG standards, the transmission of encoded audio signals over radio networks or line-based networks such as the internet has become very important. The transmission channel involved in the transmission of encoded audio signals by means of digital radio or over line-based networks is not ideal, so encoded audio signals can be corrupted during transmission. The decoder is therefore confronted with the question of how to deal with transmission errors, i.e. how these transmission errors are to be “concealed”. The objective of error concealment is to treat transmission errors in such a way as to improve the subjective auditory impression made by the error-afflicted decoded audio signal.
Many error concealment methods are already known. The simplest type of error concealment is “muting”. When a decoder recognizes that data are missing or erroneous, it interrupts the reproduction. The missing data are thus replaced by a zero signal. In this way the decoder is prevented from issuing sounds which, due to a transmission error, would be found too loud or disconcerting. Because of psychoacoustic effects, however, the resulting sudden fall in the signal energy, and its sudden rise when the decoder issues error-free data again, are themselves found disconcerting.
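As a minimal sketch of muting, assume the decoded signal is held as a list of equally long sample frames and that a hypothetical per-frame flag marks missing or erroneous frames. Neither representation is prescribed by the text. The flagged frames are simply replaced by a zero signal:

```python
def conceal_by_muting(frames, is_bad):
    """Replace every frame flagged as bad by a zero signal of the same length.

    frames : list of frames, each a list of PCM samples (assumed layout)
    is_bad : list of booleans, True where a frame is missing or erroneous
    """
    if len(frames) != len(is_bad):
        raise ValueError("frames and is_bad must have the same length")
    return [[0.0] * len(f) if bad else list(f) for f, bad in zip(frames, is_bad)]


if __name__ == "__main__":
    decoded = [[0.5, 0.4], [9.9, -9.9], [0.3, 0.2]]  # middle frame is corrupted
    print(conceal_by_muting(decoded, [False, True, False]))
```

The abrupt step to zero at the start of a muted frame, and back to full level at its end, is the sudden fall and rise in signal energy described above.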
Another known method, which avoids the sudden fall and subsequent rise in the signal energy, is data repetition. If, for example, one or more blocks of audio data are missing, part of the data last transmitted is repeated in a loop until error-free, i.e. intact, audio data are available again. This method produces disturbing artefacts, however. If only short parts of the audio signal are repeated, the repeated signal sounds mechanical whatever the original signal may have been like, having a fundamental frequency equal to the repetition frequency. If longer parts are repeated, echo effects arise which are also found disturbing.
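Data repetition can be sketched as follows, again assuming a list of sample frames and a hypothetical per-frame error flag. The parameter `loop_len` (number of samples repeated) is an illustrative tuning knob, not a value taken from any standard. For a sampling rate fs, a loop of `loop_len` samples repeats at fs / `loop_len`, which is the fundamental frequency of the resulting mechanical tone.

```python
def conceal_by_repetition(frames, is_bad, loop_len):
    """Fill bad frames by looping the last loop_len samples of intact data.

    loop_len is a hypothetical tuning parameter: a short loop gives a
    mechanical tone at the repetition frequency, a long loop gives echoes.
    """
    if loop_len <= 0:
        raise ValueError("loop_len must be positive")
    out = []
    history = []  # most recent intact samples, at most loop_len of them
    loop = []     # segment currently being repeated
    pos = 0       # read position inside the loop
    for frame, bad in zip(frames, is_bad):
        if not bad:
            history = (history + list(frame))[-loop_len:]
            loop, pos = [], 0  # intact data again: stop looping
            out.append(list(frame))
            continue
        if not loop:
            # Nothing received yet: fall back to silence.
            loop = list(history) or [0.0]
        filled = []
        for _ in range(len(frame)):
            filled.append(loop[pos])
            pos = (pos + 1) % len(loop)
        out.append(filled)
    return out


if __name__ == "__main__":
    frames = [[1, 2, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0], [5, 6, 7, 8]]
    print(conceal_by_repetition(frames, [False, True, True, False], loop_len=2))
```

The read position is carried across consecutive lost frames so that the loop continues without a phase jump at frame boundaries.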
In block-oriented transform encoders/decoders that employ a spectral representation of a temporal audio signal, it would also be possible to perform a spectral value prediction in the case of erroneous audio data. If it is established that spectral values in a block are erroneous, these spectral values can be predicted, i.e. estimated, on the basis of the spectral values of a preceding frame or a number of preceding frames. The predicted spectral values correspond within certain limits to the erroneous spectral values if the audio signal is relatively steady, i.e. if the audio signal is not subject to any very fast changes in the signal envelope. If, for example, a method employing the MPEG AAC standard (ISO/IEC 13818-7 MPEG-2 Advanced Audio Coding) is considered, a normal block or frame of encoded audio data has 1024 spectral values. The method of spectral value prediction would therefore need 1024 predictors operating in parallel in the decoder so that, if a complete frame is lost, all the spectral values can be predicted.
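The idea of one predictor per spectral value can be illustrated with a deliberately simple sketch. The first-order linear extrapolation used here is an assumption chosen for brevity. It is not the predictor of the AAC standard or of any particular decoder. The function name and the history layout are likewise hypothetical.

```python
FRAME_SIZE = 1024  # spectral values per normal AAC frame, as stated in the text


def predict_lost_frame(history):
    """Estimate a lost frame from preceding frames, one predictor per bin.

    history : list of preceding frames (lists of spectral values), oldest first.
    With one frame available the frame is repeated; with two or more, each
    bin is linearly extrapolated from its last two values (assumed model).
    """
    if not history:
        raise ValueError("at least one preceding frame is required")
    last = history[-1]
    if len(history) == 1:
        return list(last)
    prev = history[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]


if __name__ == "__main__":
    # Two slowly evolving frames of FRAME_SIZE spectral values each.
    f0 = [1.0] * FRAME_SIZE
    f1 = [1.5] * FRAME_SIZE
    estimate = predict_lost_frame([f0, f1])
    print(len(estimate), estimate[0])
```

Even this trivial predictor performs one operation per spectral value for every lost frame; a more accurate adaptive predictor multiplies that cost accordingly.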
A disadvantage of this method is the relatively high computational effort, which makes a real-time decoding of a received multimedia or audio data signal impossible at present.
A further important disadvantage of this method results from the transform algorithm that is used, namely the modified discrete cosine transform (MDCT). It is generally known that the MDCT algorithm does not provide an ideal Fourier spectrum but a “spectrum” which deviates from an ideal Fourier spectrum. Investigations have shown that, for example, a sine function in the time domain, which has a Fourier spectrum with a single spectral line at the frequency of the sine function, has an MDCT “spectrum” which, while it has a dominant spectral coefficient at the frequency of the sine function, also has further spectral coefficients at other frequency values. Furthermore, the amplitude of the MDCT “spectrum” of a sine function does not remain the same from one frame to another but varies from frame to frame. In addition, the MDCT is not strictly energy conserving. What can be stated, therefore, is that, while the MDCT works exactly in conjunction with an inverse MDCT, the MDCT spectrum differs considerably from a Fourier spectrum. A spectral value prediction of MDCT spectral coefficients has thus shown itself to be inadequate when high precision is required.
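The frame-to-frame variation can be reproduced with a small numerical experiment. The sketch below uses a common textbook form of the MDCT with a sine window and 50% overlap. The block size (16 coefficients), the test frequency (centre of bin 4) and all function names are arbitrary choices for illustration, not values from the text.

```python
import math


def mdct(block):
    """MDCT of one block of 2N samples using a sine window; returns N coefficients."""
    two_n = len(block)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n, x in enumerate(block):
            window = math.sin(math.pi * (n + 0.5) / two_n)
            acc += window * x * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
        coeffs.append(acc)
    return coeffs


def mdct_frames_of_sine(n_half=16, k0=4, num_frames=4):
    """MDCT spectra of consecutive, 50%-overlapping frames of a stationary sine.

    The sine is placed at the centre frequency of bin k0.
    """
    omega = math.pi * (k0 + 0.5) / n_half
    signal = [math.sin(omega * n) for n in range((num_frames + 1) * n_half)]
    return [
        mdct(signal[m * n_half : m * n_half + 2 * n_half])
        for m in range(num_frames)
    ]


if __name__ == "__main__":
    for m, spectrum in enumerate(mdct_frames_of_sine()):
        print(m, [round(c, 2) for c in spectrum])
```

Although the input is a perfectly stationary sine, the coefficient in bin 4 changes in magnitude and sign from frame to frame because it depends on the phase of the sine relative to the frame start, and the neighbouring bins are non-zero. A predictor operating on these coefficients therefore has to track a fluctuating sequence even for the simplest possible signal.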
A further disadvantage of spectral value prediction, particularly in connection with modern audio coding methods, is that these methods use different window lengths and window shapes. When there are rapid changes (transients or “attacks”) in the audio signal to be encoded, the quantization noise arising from the quantization of the MDCT spectral coefficients would be “smeared” over a long block, i.e. pre-echoes would occur. To prevent this, modern transform encoders use short windows for transient audio signals, i.e. audio signals with “attacks”, to increase the temporal resolution at the expense of the frequency resolution. This means, however, that for a spectral value prediction both the window length and the window shape (in addition there are transition windows for switching from short to long blocks and vice versa) must be constantly taken into account, which further increases the complexity of the spectral value prediction and would greatly reduce its computational efficiency.
DE 40 34 017 A1 relates to a method for detecting errors in the transmission of frequency-coded digital signals. From the frequency coefficients of previous and, in some cases, future frames, an error function is formed on the basis of which the occurrence of an error can be detected. An erroneous frequency coefficient is no longer included in the evaluation of subsequent frames.
DE 197 35 675 A1 discloses a method for concealing errors in an audio data stream. The spectral energy of a subgroup of intact audio data is calculated. After producing a pattern for substitute data using the spectral energy calculated for the subgroup of intact audio data, substitute data for erroneous or missing audio data corresponding to the subgroup are generated according to the pattern.
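The general idea of energy-based substitution can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the method of the cited publication: the subgroup boundaries (`band_edges`), the use of the mean energy per spectral value as the “pattern”, and the random signs are all hypothetical choices.

```python
import math
import random


def band_energies(spectrum, band_edges):
    """Spectral energy (sum of squares) of each subgroup [lo, hi) of a frame."""
    return [
        sum(v * v for v in spectrum[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


def substitute_frame(intact_spectrum, band_edges, rng=None):
    """Generate substitute spectral values whose per-subgroup energy matches
    that of an intact frame.

    Assumed pattern: every value in a subgroup gets the same magnitude,
    sqrt(energy / width), with a random sign.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    out = [0.0] * len(intact_spectrum)
    energies = band_energies(intact_spectrum, band_edges)
    for (lo, hi), energy in zip(zip(band_edges, band_edges[1:]), energies):
        if hi <= lo:
            continue  # empty subgroup
        magnitude = math.sqrt(energy / (hi - lo))
        for i in range(lo, hi):
            out[i] = magnitude if rng.random() < 0.5 else -magnitude
    return out


if __name__ == "__main__":
    intact = [3.0, 4.0, 1.0, 1.0, 2.0, 0.0]
    edges = [0, 2, 6]  # two subgroups: bins 0-1 and bins 2-5
    substitute = substitute_frame(intact, edges)
    print(band_energies(intact, edges), band_energies(substitute, edges))
```

Because only the energy per subgroup is carried over, the substitute data follow the coarse spectral envelope of the intact data without requiring a predictor for each individual spectral value.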