There is a high market need to transmit and store audio signals at low bit rates while maintaining high audio quality. Particularly, in cases where transmission resources or storage is limited low bit rate operation is an essential cost factor. This is typically the case, for example, in streaming and messaging applications in mobile communication systems such as GSM, UMTS, or CDMA.
A general example of an audio transmission system using multi-channel coding and decoding is schematically illustrated in FIG. 1. The overall system basically comprises a multi-channel audio encoder 100 and a transmission module 10 on the transmitting side, and a receiving module 20 and a multi-channel audio decoder 200 on the receiving side.
The simplest way of stereophonic or multi-channel coding of audio signals is to encode the signals of the different channels separately as individual and independent signals, as illustrated in FIG. 2. However, this means that the redundancy among the plurality of channels is not removed, and that the bit-rate requirement will be proportional to the number of channels.
Another basic way used in stereo FM radio transmission and which ensures compatibility with legacy mono radio receivers is to transmit a sum and a difference signal of the two involved channels.
State-of-the art audio codecs such as MPEG-1/2 Layer III and MPEG-2/4 AAC make use of so-called joint stereo coding. According to this technique, the signals of the different channels are processed jointly rather than separately and individually. The two most commonly used joint stereo coding techniques are known as ‘Mid/Side’ (M/S) Stereo and intensity stereo coding which usually are applied on sub-bands of the stereo or multi-channel signals to be encoded.
M/S stereo coding is similar to the described procedure in stereo FM radio, in a sense that it encodes and transmits the sum and difference signals of the channel sub-bands and thereby exploits redundancy between the channel sub-bands. The structure and operation of a coder based on M/S stereo coding is described, e.g. in reference [1].
Intensity stereo on the other hand is able to make use of stereo irrelevancy. It transmits the joint intensity of the channels (of the different sub-bands) along with some location information indicating how the intensity is distributed among the channels. Intensity stereo does only provide spectral magnitude information of the channels, while phase information is not conveyed. For this reason and since temporal inter-channel information (more specifically the inter-channel time difference) is of major psycho-acoustical relevancy particularly at lower frequencies, intensity stereo can only be used at high frequencies above e.g. 2 kHz. An intensity stereo coding method is described, e.g. in reference [2].
A recently developed stereo coding method called Binaural Cue Coding (BCC) is described in reference [3]. This method is a parametric multi-channel audio coding method. The basic principle of this kind of parametric coding technique is that at the encoding side the input signals from N channels are combined to one mono signal. The mono signal is audio encoded using any conventional monophonic audio codec. In parallel, parameters are derived from the channel signals, which describe the multi-channel image. The parameters are encoded and transmitted to the decoder, along with the audio bit stream. The decoder first decodes the mono signal and then regenerates the channel signals based on the parametric description of the multi-channel image.
The principle of the Binaural Cue Coding (BCC) method is that it transmits the encoded mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel level differences and inter-channel time differences for sub-bands of the original multi-channel input signal. The decoder regenerates the different channel signals by applying sub-band-wise level and phase and/or delay adjustments of the mono signal based on the BCC parameters. The advantage over e.g. M/S or intensity stereo is that stereo information comprising temporal inter-channel information is transmitted at much lower bit rates. However, BCC is computationally demanding and generally not perceptually optimized.
Another technique, described in reference [4] uses the same principle of encoding of the mono signal and so-called side information. In this case, the side information consists of predictor filters and optionally a residual signal. The predictor filters, estimated by an LMS algorithm, when applied to the mono signal allow the prediction of the multi-channel audio signals. With this technique one is able to reach very low bit rate encoding of multi-channel audio sources, however at the expense of a quality drop.
The basic principles of such parametric stereo coding are illustrated in FIG. 3, which displays a layout of a stereo codec, comprising a down-mixing module 120, a core mono codec 130, 230 and a parametric stereo side information encoder/decoder 140, 240. The down-mixing transforms the multi-channel (in this case stereo) signal into a mono signal. The objective of the parametric stereo codec is to reproduce a stereo signal at the decoder given the reconstructed mono signal and additional stereo parameters.
For completeness, a technique is to be mentioned that is used in 3D audio. This technique synthesizes the right and left channel signals by filtering sound source signals with so-called head-related filters. However, this technique requires the different sound source signals to be separated and can thus not generally be applied for stereo or multi-channel coding.
Rapid changes in the filter characteristics between consecutive frames create disturbing aliasing artifacts and instability in the reconstructed stereo image. To overcome this problem, filter smoothing has been introduced. However, conventional filter smoothing generally leads to a rather large performance reduction since the filter coefficients no longer are optimal for the present frame. In particular, traditional filter smoothing generally leads to an overall reduction of the stereo image width.
Thus there is a general need for improved filter smoothing in multi-channel encoding and/or decoding processes.