In traditional waveform-based audio coding schemes such as MPEG-LII, mp3 and AAC (MPEG-2 Advanced Audio Coding), stereo signals are encoded by encoding two monaural audio signals into one bit-stream. However, bit-rate savings can be made by exploiting inter-channel correlation and irrelevancy with techniques such as mid/side stereo coding and intensity coding.
In the case of mid/side stereo coding, stereo signals with a high amount of mono content can be split into a sum signal M=(L+R)/2 and a difference signal S=(L−R)/2. This decomposition is sometimes combined with principal component analysis or time-varying scale-factors. The signals are then coded independently, either by a parametric coder or a waveform coder (e.g. transform or subband coder). For some frequency regions this technique can result in a slightly higher energy for either the M or S signal. For other frequency regions, however, a significant reduction of energy can be obtained for either the M or S signal. The amount of information reduction achieved by this technique strongly depends on the spatial properties of the source signal. For example, if the source signal is monaural, the difference signal is zero and can be discarded. However, if the correlation of the left and right audio signals is low (which is often the case for the higher frequency regions), this scheme offers little advantage.
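The sum/difference decomposition described above can be sketched as follows. This is a minimal illustration only, assuming time-domain sample lists of equal length; the function names `ms_encode`, `ms_decode` and `energy` are hypothetical and do not come from any of the coders named here, and no quantization or frequency-dependent processing is modelled.

```python
def ms_encode(left, right):
    """Split a stereo pair into M=(L+R)/2 and S=(L-R)/2 (illustrative sketch)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the decomposition: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(signal):
    """Sum of squared samples, used here to compare M and S."""
    return sum(v * v for v in signal)


# A monaural source (identical channels) yields a zero difference signal.
mono = [0.5, -0.25, 1.0, 0.0]
mid, side = ms_encode(mono, mono)
print(energy(mid), energy(side))
```

For identical channels the difference signal carries no energy and need not be transmitted, whereas for weakly correlated channels both M and S retain substantial energy, which is the case where the scheme offers little advantage.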
In the case of intensity stereo coding, for a certain frequency region, only one signal I=(L+R)/2 is encoded along with intensity information for the L and R signals. At the decoder side this signal I is used for both the L and R signals after scaling it with the corresponding intensity information. In this technique, high frequencies (typically above 5 kHz) are thus represented by a single audio signal (i.e., mono), combined with time-varying and frequency-dependent scale-factors.
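A sketch of the intensity approach for a single frequency region and a single time block is given below. The text does not specify how the intensity information is defined; this example assumes, purely for illustration, that it is a pair of energy-matching gains, one per channel. The names `intensity_encode` and `intensity_decode` are hypothetical, and the input lists are assumed to hold the samples of one band-limited block.

```python
import math


def intensity_encode(left, right):
    """Return the shared signal I=(L+R)/2 and one gain per channel.

    Assumption: the 'intensity information' is modelled as the gain that
    restores each channel's energy when applied to I.
    """
    carrier = [(l + r) / 2 for l, r in zip(left, right)]
    e_carrier = sum(v * v for v in carrier)
    if e_carrier == 0.0:
        # Silent or exactly out-of-phase block: nothing can be restored from I.
        return carrier, 0.0, 0.0
    gain_l = math.sqrt(sum(v * v for v in left) / e_carrier)
    gain_r = math.sqrt(sum(v * v for v in right) / e_carrier)
    return carrier, gain_l, gain_r


def intensity_decode(carrier, gain_l, gain_r):
    """Rebuild both channels from the single signal by scaling."""
    return [gain_l * v for v in carrier], [gain_r * v for v in carrier]
```

Under this assumption both decoded channels share the waveform of I and differ only in level, so any phase or time difference between the original channels in that region is lost.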
Parametric descriptions of audio signals have gained interest in recent years, especially in the field of audio coding. It has been shown that transmitting (quantized) parameters that describe audio signals requires little transmission capacity to re-synthesize a perceptually equivalent signal at the receiving end. However, current parametric audio coders focus on coding monaural signals, and stereo signals are often processed as dual mono.
EP-A-1107232 discloses a parametric coding scheme to generate a representation of a stereo audio signal which is composed of a left channel signal and a right channel signal. To efficiently utilize transmission bandwidth, such a representation contains information concerning only a monaural signal which is either the left channel signal or the right channel signal, and parametric information. The other stereo signal can be recovered based on the monaural signal together with the parametric information. The parametric information comprises localization cues of the stereo audio signal, including intensity and phase characteristics of the left and the right channel.
In binaural stereo coding, similar to intensity stereo coding, only one monaural channel is encoded. Additional side information holds the parameters to retrieve the left and right signals. European Patent Application No. 02076588.9 filed April, 2002 discloses a parametric description of multi-channel audio related to a binaural processing model presented by Breebaart et al. in “Binaural processing model based on contralateral inhibition. I. Model setup”, J. Acoust. Soc. Am., 110, 1074-1088, August 2001, “Binaural processing model based on contralateral inhibition. II. Dependence on spectral parameters”, J. Acoust. Soc. Am., 110, 1089-1104, August 2001, and “Binaural processing model based on contralateral inhibition. III. Dependence on temporal parameters”, J. Acoust. Soc. Am., 110, 1105-1117, August 2001. This comprises splitting an input audio signal into several band-limited signals, which are spaced linearly on an Equivalent Rectangular Bandwidth (ERB) rate scale. The bandwidth of these signals depends on the center frequency, following the ERB rate. Subsequently, for every frequency band, the following properties of the incoming signals are analyzed:
the interaural level difference (ILD) defined by the relative levels of the band-limited signal stemming from the left and right ears,
the interaural time (or phase) difference (ITD or IPD), defined by the interaural delay (or phase shift) corresponding to the peak in the interaural cross-correlation function, and
the (dis)similarity of the waveforms that cannot be accounted for by ITDs or ILDs, which can be parameterized by the maximum interaural cross-correlation (i.e., the value of the cross-correlation at the position of the maximum peak).
It is therefore known from the above disclosures that spatial attributes of any multi-channel audio signal may be described by specifying the ILD, ITD (or IPD) and maximum correlation as a function of time and frequency.
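The three properties listed above can be estimated for one band-limited signal pair along the following lines. This is a simplified sketch and not the analysis used in the cited disclosures: it assumes the band splitting has already been done, expresses the ILD in decibels as an energy ratio, searches integer sample lags only, and normalizes the cross-correlation by the full-block energies. The function name `spatial_cues` and its parameters are hypothetical.

```python
import math


def spatial_cues(left, right, max_lag, sample_rate):
    """Estimate (ILD in dB, ITD in seconds, maximum correlation) for one band.

    A positive ITD means the right signal is delayed relative to the left.
    """
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)
    if e_left == 0.0 or e_right == 0.0:
        raise ValueError("cues are undefined for a silent channel")

    # Level difference: ratio of band energies, in decibels.
    ild_db = 10.0 * math.log10(e_left / e_right)

    # Normalized cross-correlation, evaluated at each candidate integer lag.
    norm = math.sqrt(e_left * e_right)
    n = len(left)
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        corr = acc / norm
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    # Time difference is the lag of the correlation peak; the peak value
    # itself is the similarity measure.
    return ild_db, best_lag / sample_rate, best_corr
```

In a complete analysis these values would be computed per frequency band and per time segment, giving the description of ILD, ITD (or IPD) and maximum correlation as a function of time and frequency.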
This parametric coding technique provides reasonably good quality for general audio signals. However, particularly for signals with highly non-stationary behaviour, e.g. castanets, harpsichord, glockenspiel, etc., the technique suffers from pre-echo artifacts.
It is an object of this invention to provide an audio coder and decoder and corresponding methods that mitigate the artifacts related to parametric multi-channel coding.