There is a high market need to transmit and store audio signals at low bit rate while maintaining high audio quality. Particularly, in cases where transmission resources or storage is limited low bit rate operation is an essential cost factor. This is typically the case, e.g. in streaming and messaging applications in mobile communication systems such as GSM, UMTS, or CDMA.
Today, there are no standardized codecs available providing high stereophonic audio quality at bit rates that are economically interesting for use in mobile communication systems. What is possible with available codecs is monophonic transmission of the audio signals. To some extent also stereophonic transmission is available. However, bit rate limitations usually require limiting the stereo representation quite drastically.
The simplest way of stereophonic or multi-channel coding of audio signals is to encode the signals of the different channels separately as individual and independent signals. Another basic way used in stereo FM radio transmission and which ensures compatibility with legacy mono radio receivers is to transmit a sum and a difference signal of the two involved channels.
State-of-the-art audio codecs, such as MPEG-1/2 Layer III and MPEG-2/4 AAC make use of so-called joint stereo coding. According to this technique, the signals of the different channels are processed jointly, rather than separately and individually. The two most commonly used joint stereo coding techniques are known as “Mid/Side” (M/S) stereo coding and intensity stereo coding, which usually are applied on sub-bands of the stereo or multi-channel signals to be encoded.
M/S stereo coding is similar to the described procedure in stereo FM radio, in a sense that it encodes and transmits the sum and difference signals of the channel sub-bands and thereby exploits redundancy between the channel sub-bands. The structure and operation of an encoder based on M/S stereo coding is described, e.g. in U.S. Pat. No. 5,285,498 by J. D. Johnston.
Intensity stereo on the other hand is able to make use of stereo irrelevancy. It transmits the joint intensity of the channels (of the different sub-bands) along with some location information indicating how the intensity is distributed among the channels. Intensity stereo does only provide spectral magnitude information of the channels. Phase information is not conveyed. For this reason and since the temporal inter-channel information (more specifically the inter-channel time difference) is of major psycho-acoustical relevancy particularly at lower frequencies, intensity stereo can only be used at high frequencies above e.g. 2 kHz. An intensity stereo coding method is described, e.g. in the European patent 0497413 by R. Veldhuis et al.
A recently developed stereo coding method is described, e.g. in a conference paper with the title “Binaural cue coding applied to stereo and multi-channel audio compression”, 112th AES convention, May 2002, Munich, Germany by C. Faller et al. This method is a parametric multi-channel audio coding method. The basic principle is that at the encoding side, the input signals from N channels c1, c2, . . . cN are combined to one mono signal m. The mono signal is audio encoded using any conventional monophonic audio codec. In parallel, parameters are derived from the channel signals, which describe the multi-channel image. The parameters are encoded and transmitted to the decoder, along with the audio bit stream. The decoder first decodes the mono signal m′ and then regenerates the channel signals c1′, c2′, . . . , cN′, based on the parametric description of the multi-channel image.
The principle of the Binaural Cue Coding (BCC) method is that it transmits the encoded mono signal and so-called BCC parameters. The BCC parameters comprise coded inter-channel level differences and inter-channel time differences for sub-bands of the original multi-channel input signal. The decoder regenerates the different channel signals by applying sub-band-wise level and phase adjustments of the mono signal based on the BCC parameters. The advantage over e.g. M/S or intensity stereo is that stereo information comprising temporal inter-channel information is transmitted at much lower bit rates.
A problem with the state-of-the-art multi-channel coding techniques described above is that they require high bit rates in order to provide good quality. Intensity stereo, if applied at low bit rates as low as e.g. only a few kbps suffers from the fact that it does not provide any temporal inter-channel information. As this information is perceptually important for low frequencies below e.g. 2 kHz, it is unable to provide a stereo impression at such low frequencies.
BCC is able to reproduce the multi-channel image even at low frequencies at low bit rates of e.g. 3 kbps since it also transmits temporal inter-channel information. However, this technique requires computational demanding time-frequency transforms on each of the channels, both at the encoder and the decoder. Moreover, BCC optimizes the mapping in a pure mathematical manner. Characteristic artifacts immanent in the coding method will, however, not disappear.
Another technique, described in U.S. Pat. No. 5,434,948 by C. E. Holt et al. uses a similar approach of encoding the mono signal and side information. In this case, side information consists of predictor filters and optionally a residual signal. The predictor filters, estimated by a least-mean-square algorithm, when applied to the mono signal allow the prediction of the multi-channel audio signals. With this technique one is able to reach very low bit rate encoding of multi-channel audio sources, however, at the expense of a quality drop.
An approach similar to the above filtering approach is described in WO 03/090206 by Breebaart and Groenendaal. However, this approach uses a fixed filter applied to the mono signal and combined together with the non filtered mono signal via a matrixing operation. The matrixing operation is dependent upon a received correlation parameter and a received level parameter. The objective of such signal synthesis is to restore the correlation and the level difference of the original two channels. Because of the inherently fixed filtering operation, the signal synthesis has a very limited potential for signal reproduction and does not adapt to the signal characteristics. The approach can be regarded as an extension of the intensity stereo coding method discussed above, in which now a temporal component is conveyed to the decoder. Still, only the level and the correlation parameters allow a certain degree of adaptivity through a matrixing operation. This operation consists of a mere rotation and scaling of statically filtered signals, thus limiting the polyphonic reproduction ability. Another drawback of the approach is the fact that it is not based on a fidelity criterion, e.g. signal-to-noise ratio, which limits its scalability to transparent quality.
Finally, for completeness, a technique is to be mentioned that is used in 3D audio. This technique synthesizes the right and left channel signals by filtering sound source signals with so-called head-related filters. However, this technique requires the different sound source signals to be separated and can thus not generally be applied for stereo or multi-channel coding.