In recent years, the distribution and storage of content signals in digital form have increased substantially. Accordingly, a large number of encoding standards and protocols have been developed.
One of the most widespread standards for digital encoding of audio signals is the Moving Picture Experts Group (MPEG) Audio Layer 3 standard, generally referred to as MP3. As an example, MP3 allows a 30 or 40 megabyte digital PCM (Pulse Code Modulation) audio recording of a song to be compressed into, e.g., a 3 or 4 megabyte MP3 file. The exact compression ratio depends on the desired quality of the MP3 encoded audio.
Audio encoding and compression techniques such as MP3 provide for very efficient audio encoding which allows audio files of relatively low data size and high quality to be conveniently distributed through data networks such as the Internet.
Many encoding protocols provide for efficient encoding of stereo channels. Stereo coding aims at removing redundancy and irrelevancy from the stereo signal to attain lower bit rates than the sum of the bit rates of the separate channels for a given quality level.
A number of different stereo encoding algorithms and techniques are known. One technique is known as intensity stereo coding. Intensity stereo coding allows a great reduction in bit rate compared to independent coding of audio channels. In intensity stereo, a mono audio signal is generated for the higher frequency range of the signal. In addition, separate intensity parameters are generated for the different channels. Typically, the intensity parameters are in the form of left and right scale factors which are used in the decoder to generate the left and right output signals from the mono audio signal. A variation is the use of a single scale factor and a directional parameter.
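The intensity stereo principle described above can be illustrated with a minimal Python sketch for a single high-frequency band. The function names, the averaging downmix and the energy-matching rule for the scale factors are assumptions made for illustration only; actual codecs define their own downmix and parameter formats.

```python
import math

def intensity_encode(left, right):
    """Downmix one band to mono and derive left/right scale factors."""
    # Assumed downmix: plain average of the two channels.
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_mono = sum(m * m for m in mono)
    if e_mono == 0.0:
        return mono, 0.0, 0.0
    # Assumed rule: each decoded channel keeps the energy of its original.
    g_left = math.sqrt(sum(l * l for l in left) / e_mono)
    g_right = math.sqrt(sum(r * r for r in right) / e_mono)
    return mono, g_left, g_right

def intensity_decode(mono, g_left, g_right):
    """Rebuild both channels as scaled copies of the mono signal."""
    return [g_left * m for m in mono], [g_right * m for m in mono]
```

Because both decoded channels are scaled copies of the same mono signal, only level differences between the channels survive the round trip.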
The intensity stereo coding technique has several disadvantages, however. First of all, the encoder discards time and phase information for the higher frequencies. The decoder therefore cannot reproduce the time or phase differences between the channels that are present in the original audio material. Furthermore, in general, the encoding cannot preserve the correlation between the audio channels. Accordingly, a quality degradation of the stereo signal generated by the decoder cannot be avoided.
Another technique is known as Mid/Side (MS) coding, wherein a Mid signal component may be generated by adding the left and right channel signals and a Side signal component may be generated by subtracting the right channel signal from the left channel signal. As the correlation between the left and right signals is typically high, this usually results in a high signal energy of the Mid signal and a low signal energy of the Side signal. The Mid and Side signals are then encoded using different encoding parameters, where the encoding of the Side signal is typically chosen such that the data rate for the Side signal is reduced.
A disadvantage of MS coding is that its bit rate efficiency is generally significantly lower than that of, for example, intensity stereo coding, thereby resulting in increased data rates. In a worst case situation, MS coding does not provide any gain in bit rate compared to independent coding of the left and right channels.
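As an illustration of the Mid/Side transform, the following Python sketch forms the sum and difference signals and inverts them. The factor of one half and the function names are assumptions chosen so that the round trip is exact; the quantisation and encoding of the two signals are not shown.

```python
def ms_encode(left, right):
    """Form Mid (scaled sum) and Side (scaled difference) signals."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left and right from Mid and Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)
```

For strongly correlated channels, `energy(side)` is small relative to `energy(mid)`, which is why the Side signal can typically be encoded at a lower data rate.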
Another stereo encoding technique is based on linear prediction, wherein the left and right channels are linearly combined into a complex signal. A complex linear prediction filter is then used to predict the complex signal, and the resulting residual signal is encoded. An example of such an encoder is given in “An experimental audio codec based on warped linear prediction of complex valued signals” by Härmä, Laine and Karjalainen, Proceedings of ICASSP-97, pages 323-326, Munich, Germany, April 1997.
A problem associated with the current linear prediction proposals is that combining the left and right channels into a complex signal imposes a temporal association of the left and right channels, which limits the available degrees of freedom for the prediction. Accordingly, the prediction is not able to attain maximum removal of redundant information. Furthermore, the techniques do not identify or construct a main and a side signal for which encoding can be individually optimized. Additionally, the prediction criteria used are based on simple prediction filtering, which does not result in optimal prediction. Accordingly, the achievable data rate for a given signal quality is not optimal.
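The general idea of complex-valued prediction can be sketched in Python as follows. This is not the warped predictor of the cited paper: it assumes the simplest possible case, a single-tap predictor fitted by least squares, with the left channel as the real part and the right channel as the imaginary part. All function names are hypothetical.

```python
import cmath
import math

def to_complex(left, right):
    """Combine the channels as z[n] = left[n] + j * right[n] (assumed mapping)."""
    return [complex(l, r) for l, r in zip(left, right)]

def fit_one_tap(z):
    """Least-squares coefficient a minimising sum |z[n] - a * z[n-1]|^2."""
    num = sum(z[n] * z[n - 1].conjugate() for n in range(1, len(z)))
    den = sum(abs(z[n - 1]) ** 2 for n in range(1, len(z)))
    return num / den if den else 0j

def residual(z, a):
    """Prediction error; the first sample is passed through unchanged."""
    if not z:
        return []
    return [z[0]] + [z[n] - a * z[n - 1] for n in range(1, len(z))]

def reconstruct(res, a):
    """Invert residual() given the same coefficient."""
    if not res:
        return []
    z = [res[0]]
    for e in res[1:]:
        z.append(e + a * z[-1])
    return z
```

Note that each complex sample pairs a left sample with the right sample at the same time instant, so a single coefficient acts on both channels jointly.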
A different encoding technique utilizes a rotation of frequency bands or subbands. In such a technique, band-pass filters may be used to generate a plurality of subband signals for the left and right channels. Each subband of one channel is paired with the corresponding subband of the other channel, and a principal component analysis is performed on each pair. The resulting parameters per subband are applied in the encoder to generate a main and a side signal per subband by rotation. The parameters are also stored in the data stream such that the decoder can apply the inverse process.
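For a single pair of subband signals, the rotation step can be sketched in Python as below. The sketch assumes zero-mean subband signals, derives the principal axis from raw second-order moments, and uses a single rotation angle as the transmitted parameter; the filter bank itself is omitted and the function names are hypothetical.

```python
import math

def rotation_angle(left, right):
    """Angle of the principal axis of the (left, right) sample cloud."""
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))
    # Maximises the energy of the rotated main signal.
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)

def rotate(left, right, angle):
    """Encoder side: rotate (left, right) into (main, side)."""
    c, s = math.cos(angle), math.sin(angle)
    main = [c * l + s * r for l, r in zip(left, right)]
    side = [-s * l + c * r for l, r in zip(left, right)]
    return main, side

def rotate_back(main, side, angle):
    """Decoder side: apply the inverse rotation."""
    c, s = math.cos(angle), math.sin(angle)
    left = [c * m - s * d for m, d in zip(main, side)]
    right = [s * m + c * d for m, d in zip(main, side)]
    return left, right
```

When one channel is a scaled copy of the other, the side signal produced by `rotate` is essentially zero. The rotation acts on samples at the same time instant only, so a time offset between the channels is not compensated.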
A problem with such a rotator technique is that it does not take into account possible time differences between the left and right signals and accordingly does not achieve optimum performance. Secondly, due to the overlap-add analysis and synthesis, perfect reconstruction of the subband signals is not possible even in the absence of signal quantisation.
Currently, the most promising technique for low data rate stereo encoding appears to be perceptual stereo coding, in which perceptual models and information are used to reduce the encoded data rate. Thus, rather than attempting to represent the waveform of the original stereo signal as closely as possible, perceptual stereo encoding aims at generating a signal that the decoder can use to generate an output signal that results in the same audio perception for a user.
A problem which is inherent in this approach is that, even in the absence of signal quantisation, the original signal cannot be reconstructed perfectly. This may in particular be due to the overlap-add procedures which are used in the analysis and synthesis systems. Accordingly, for high data rate applications, perceptual stereo encoding tends to provide a lower quality of the reconstructed signal.
Accordingly, an improved system for multi-channel encoding and/or decoding would be advantageous, and in particular a system allowing increased flexibility, reduced data rate, increased quality and/or reduced complexity. Specifically, a system allowing high signal quality at high data rates and efficient encoding at low data rates would be advantageous.