Today, two techniques dominate for exploiting the stereo redundancy and irrelevancy contained in stereophonic audio signals. Mid-Side (M/S) stereo coding primarily aims at redundancy removal and is based on the fact that, since the two channels are often fairly correlated, it is more efficient to encode their sum and their difference. Relatively more bits can then be spent on the high-power sum signal than on the low-power side (or difference) signal. Intensity stereo coding, on the other hand, achieves irrelevancy removal by replacing, in each subband, the two signals by a sum signal and an azimuth angle. At the decoder, the azimuth parameter is used to control the spatial location of the auditory event represented by the subband sum signal. Mid-Side and Intensity stereo are both used extensively in existing audio coding standards.
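The sum/difference idea behind M/S coding can be illustrated with a minimal sketch. The scaling by 0.5, the function names and the test signal are assumptions made for illustration only; actual coders apply the transform per frame or per subband and may use a different normalization.

```python
import math

def mid_side(left, right):
    """Forward M/S transform, assuming the 0.5*(L+R), 0.5*(L-R) convention."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def inverse_mid_side(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

# Hypothetical test signal: two highly correlated channels.
n = 1024
left = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
right = [0.9 * v for v in left]

mid, side = mid_side(left, right)
rec_left, rec_right = inverse_mid_side(mid, side)

# For correlated channels the side signal carries very little power,
# so most of the bit budget can go to the mid signal.
print("mid power: ", power(mid))
print("side power:", power(side))
```

With the two channels differing only by a gain of 0.9, the side signal in this sketch carries a small fraction of the mid signal power, which is the situation in which M/S coding pays off.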
A problem with the M/S approach towards redundancy exploitation is that if the two components are out of phase (one is delayed relative to the other), the M/S coding gain vanishes. This is a conceptual problem, since time delays are frequent in real audio signals. For example, spatial hearing relies to a large extent on time differences between signals (especially at low frequencies). In audio recordings, time delays may stem both from stereophonic microphone setups and from artificial post processing (sound effects). In Mid-Side coding, an ad-hoc solution is often used for the time delay issue: M/S coding is only employed when the power of the difference signal is less than a constant factor of that of the sum signal. The alignment problem is better addressed in an article by H. Fuchs, entitled “Improving Joint Stereo Audio Coding by Adaptive Inter-Channel Prediction”, Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1993, pp. 39-42, where one of the signal components is predicted from the other. The prediction filters are derived on a frame-by-frame basis in the encoder and are transmitted as side information. In another article by H. Fuchs, entitled “Improving MPEG Audio Coding by Backward Adaptive Linear Stereo Prediction”, Preprint 4086, 99th AES Convention, 1995, a backward adaptive alternative is considered. It is noted that the performance gain is heavily dependent on the signal type, but for certain types of signals, a dramatic gain compared to M/S stereo coding is obtained.
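The effect of an inter-channel delay, the ad-hoc power test and the idea of predicting one channel from the other can be sketched as follows. This is a simplified illustration and not the method of the cited articles: the predictor here is a single delayed tap found by exhaustive search, and the threshold value, function names and test signals are assumptions.

```python
import random

def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def delayed(x, lag):
    """Delay x by `lag` samples, zero-padding the start."""
    return [0.0] * lag + x[:len(x) - lag]

def side_to_mid_ratio(left, right):
    """Power of the difference signal relative to the sum signal."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return power(side) / power(mid)

def use_mid_side(left, right, threshold=0.5):
    """Ad-hoc switch: use M/S only if the side power is below a constant
    factor of the mid power. The value 0.5 is a hypothetical choice."""
    return side_to_mid_ratio(left, right) < threshold

def best_lag_and_gain(left, right, max_lag):
    """One-tap predictor: right[n] ~ gain * left[n - lag].
    Returns (lag, gain, residual power) minimizing the residual power."""
    best = (0, 0.0, float("inf"))
    for lag in range(max_lag + 1):
        ref = delayed(left, lag)
        energy = sum(v * v for v in ref)
        if energy == 0.0:
            continue
        gain = sum(a * b for a, b in zip(ref, right)) / energy
        err = power([b - gain * a for a, b in zip(ref, right)])
        if err < best[2]:
            best = (lag, gain, err)
    return best

# Hypothetical test signals: noise in the left channel, scaled copies on the right.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(2048)]
right_aligned = [0.8 * v for v in left]
right_delayed = [0.8 * v for v in delayed(left, 7)]

print("aligned, use M/S:", use_mid_side(left, right_aligned))
print("delayed, use M/S:", use_mid_side(left, right_delayed))
lag, gain, err = best_lag_and_gain(left, right_delayed, max_lag=16)
print("predicted lag, gain, residual power:", lag, gain, err)
```

In this sketch the delayed channel fails the power test even though it is a scaled copy of the other channel, whereas the one-tap predictor recovers the delay and gain and leaves almost no residual.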
Parametric stereo coding has received much attention lately. Based on a core mono (single channel) coder, such parametric schemes extract the stereo (multi channel) component and encode it separately at a relatively low bitrate. This can be seen as a generalization of Intensity stereo coding. Parametric stereo coding methods are particularly useful in the low bitrate range of audio coding, where spending only a small part of the total bit budget on the stereo component results in a significant increase in quality. Parametric methods are also attractive since they are extendible to the multi channel (more than two channels) case and have the ability to offer backward compatibility: MP3 surround is one such example, where the multi channel data is encoded and transmitted in the auxiliary field of the data stream. This allows receivers without multi channel capabilities to decode a normal stereo signal, whereas surround enabled receivers can enjoy multi channel audio. Parametric methods often rely on extraction and encoding of different psychoacoustical cues, primarily Inter-Channel Level Differences (ICLDs) and Inter-Channel Time Differences (ICTDs). In an article by J. Breebaart et al., entitled “High-Quality Parametric Spatial Audio Coding at Low Bitrates”, Preprint 6072, 116th AES Convention, 2004, it is reported that a coherence parameter is important for a natural sounding result. However, parametric methods are limited in the sense that, at higher bit rates, the coders are not able to reach transparent quality due to the inherent modeling constraint.
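The cues mentioned above can be illustrated with a minimal full-band estimator. This is a sketch under simplifying assumptions: practical parametric coders estimate the cues per subband and per frame, and the function names, the lag search range and the test signal are hypothetical.

```python
import math
import random

def icld_db(left, right):
    """Inter-channel level difference in dB (left relative to right)."""
    p_left = sum(v * v for v in left)
    p_right = sum(v * v for v in right)
    return 10.0 * math.log10(p_left / p_right)

def cross_corr(left, right, lag):
    """Normalized cross-correlation between left[n] and right[n + lag]."""
    if lag >= 0:
        a, b = left[:len(left) - lag], right[lag:]
    else:
        a, b = left[-lag:], right[:len(right) + lag]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0

def ictd_and_coherence(left, right, max_lag):
    """Return the lag (in samples) maximizing the normalized cross-correlation
    and the correlation value at that lag, used here as a coherence measure."""
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda d: cross_corr(left, right, d))
    return best_lag, cross_corr(left, right, best_lag)

# Hypothetical test signal: right is the left channel delayed by 3 samples
# and attenuated by a factor of 0.5 (about 6 dB).
rng = random.Random(1)
left = [rng.uniform(-1.0, 1.0) for _ in range(2048)]
right = [0.0] * 3 + [0.5 * v for v in left[:-3]]

level_diff = icld_db(left, right)
time_diff, coherence = ictd_and_coherence(left, right, max_lag=10)
print("ICLD [dB]:", level_diff)
print("ICTD [samples]:", time_diff)
print("coherence:", coherence)
```

A few such parameters per subband can be transmitted at a low bitrate, but they describe the relation between the channels only approximately, which is consistent with the quality limitation noted above.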
A problem related to parametric multi channel encoders is that their maximum obtainable quality is limited to a threshold that is significantly below transparent quality. The parametric quality threshold is shown at 1100 in FIG. 11. As can be seen from a schematic curve representing the quality/bitrate dependence of a BCC enhanced mono coder (1102), the quality cannot cross the parametric quality threshold 1100 irrespective of the bitrate. This means that even with an increased bitrate, the quality of such a parametric multi channel encoder cannot increase any further.
The BCC enhanced mono coder is an example of the currently existing stereo coders or multi channel coders, in which a stereo downmix or a multi channel downmix is performed. Additionally, parameters are derived describing inter channel level relations, inter channel time relations, inter channel coherence relations, etc.
The parameters are different from a waveform signal such as the side signal of a Mid/Side encoder. The side signal describes the difference between two channels in a waveform-style format, whereas the parametric representation describes similarities or dissimilarities between two channels by giving a certain parameter rather than a sample-wise waveform representation. While parameters require a low number of bits for being transmitted from an encoder to a decoder, waveform descriptions, i.e., residual signals derived in a waveform style, require more bits but allow, in principle, a transparent reconstruction.
FIG. 11 shows a typical quality/bitrate dependence of such a waveform-based conventional stereo coder (1104). It becomes clear from FIG. 11 that, as the bitrate is increased, the quality of a conventional stereo coder such as a Mid/Side stereo coder increases steadily until it reaches the transparent quality. There is a kind of “cross-over bitrate”, at which the characteristic curve 1102 for the parametric multi channel coder and the curve 1104 for the conventional waveform-based stereo coder cross each other.
Below this cross-over bitrate, the parametric multi channel encoder is much better than the conventional stereo coder. When the same bitrate is considered for both encoders, the parametric multi channel coder provides a quality that is higher than the quality of the conventional waveform-based stereo coder by the quality difference 1108. Stated in other words, when one wishes to have a certain quality 1110, this quality can be achieved using the parametric coder at a bitrate which is reduced by a difference bitrate 1112 compared to a conventional waveform-based stereo coder.
Above the cross-over bitrate, however, the situation is completely different. Since the parametric coder is already at its maximum quality threshold 1100, a better quality at the same number of bits can only be obtained by using a conventional waveform-based stereo coder.