Audio signals, like music or speech, are encoded, for example, to enable an efficient transmission or storage of the audio signals. The audio signals may be mono signals using a single channel or stereophonic signals using two or more channels. The latter are also referred to as stereo audio signals or multichannel audio signals.
Stereophonic signals have largely replaced mono audio signals in television, radio, internet audio, video streaming, video clips and the like. The same transition may be expected in speech communication.
A stereophonic signal may be encoded by encoding each channel separately or by using a combined encoding. In both cases, the encoding typically includes a quantization.
An exemplary separate encoding is an L/R coding, which includes a separate coding of a left (L) channel signal and of a right (R) channel signal of a two-channel stereo signal.
An exemplary combined coding is a mid channel and side channel (M/S) coding. For M/S coding, a mono downmix mid (M) channel signal is created as a mixture of a left channel signal and a right channel signal of a stereo input signal. In addition, a side (S) channel signal is created as a different mixture of the left and right channel signals. A receiver may then reconstruct the left and right channel signals from the mid and side channel signals.
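By way of illustration only, the M/S conversion described above may be sketched as follows. The text does not fix the mixing weights, so the sketch assumes the common choice M = (L + R)/2 and S = (L - R)/2; the function names ms_encode and ms_decode are hypothetical.

```python
def ms_encode(left, right):
    """Create mid and side signals from equal-length left/right sample lists.

    Assumption: equal weights, M = (L + R) / 2 and S = (L - R) / 2.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Reconstruct left and right signals: L = M + S and R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, -0.25, 1.0, 0.0]
right = [0.25, -0.25, 0.5, 1.0]
mid, side = ms_encode(left, right)
rec_left, rec_right = ms_decode(mid, side)
```

Without quantization, the receiver side recovers the left and right channel signals exactly; any quantization of the mid and side channel signals carries over into the reconstructed channels.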
An encoder may also be designed to choose between L/R and M/S coding depending on the signal characteristics of a respective stereophonic signal. Firstly, the signal may be divided into short blocks in the time domain. The blocks may have a length of 5 to 50 ms and they may overlap. Secondly, the blocks may be transformed into the frequency domain using a short-time Fourier transform (STFT) or any other kind of transform. In the frequency domain, the switch between L/R and M/S coding may then be performed independently for different frequency bands. There may be, for instance, approximately 50 frequency bands.
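The block division and the transform into frequency bands may, purely as a sketch, look as follows. The 20 ms block length, the 50 % overlap, the unwindowed naive DFT and the uniform band layout are all assumptions made for brevity; a practical encoder would typically apply a window and a fast transform and may use non-uniform bands. All function names are hypothetical.

```python
import cmath


def split_into_blocks(samples, sample_rate_hz, block_ms=20.0, overlap=0.5):
    """Divide a signal into overlapping time-domain blocks of block_ms each."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    block_len = int(round(sample_rate_hz * block_ms / 1000.0))
    hop = max(1, int(round(block_len * (1.0 - overlap))))
    return [samples[start:start + block_len]
            for start in range(0, len(samples) - block_len + 1, hop)]


def dft(block):
    """Naive discrete Fourier transform of one block (no window applied)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def group_into_bands(num_bins, num_bands):
    """Return (start, stop) bin index pairs for uniformly sized bands."""
    edges = [(i * num_bins) // num_bands for i in range(num_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


samples = [float(i) for i in range(100)]
blocks = split_into_blocks(samples, sample_rate_hz=1000)
spectrum = dft([1.0] * 8)
bands = group_into_bands(10, 3)
```

A coding mode could then be chosen separately for the bins of each (start, stop) pair in every block.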
Typically, M/S coding is only selected when the left and right channel signals are strongly correlated, that is, when the left and right channel signals are very similar. In this case, M/S coding concentrates most of the total energy in the mid channel signal, leaving little energy in the side channel signal. Source coding such mid and side channel signals requires fewer bits than source coding the corresponding left and right channel signals.
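The energy concentration and a correlation-based mode choice may be illustrated with the following sketch. The normalized correlation measure, the threshold of 0.8 and the function names are assumptions chosen for illustration; the text does not prescribe a particular selection criterion.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


def normalized_correlation(left, right):
    """Normalized inner product of two signals, in the range [-1, 1]."""
    denom = math.sqrt(energy(left) * energy(right))
    if denom == 0.0:
        return 0.0
    return sum(l * r for l, r in zip(left, right)) / denom


def choose_coding_mode(left, right, threshold=0.8):
    """Pick M/S for strongly (positively) correlated channels, else L/R.

    The threshold is a hypothetical value, not taken from the text.
    """
    if normalized_correlation(left, right) >= threshold:
        return "M/S"
    return "L/R"


# Strongly correlated pair: right is a scaled copy of left.
left = [1.0, -2.0, 3.0, -4.0]
right = [0.9 * v for v in left]
mid = [(l + r) / 2.0 for l, r in zip(left, right)]
side = [(l - r) / 2.0 for l, r in zip(left, right)]
mode_correlated = choose_coding_mode(left, right)

# Pair with zero correlation.
mode_uncorrelated = choose_coding_mode([1.0, 0.0, -1.0, 0.0],
                                       [0.0, 1.0, 0.0, -1.0])
```

For the correlated pair, the side channel signal carries only a small fraction of the energy of the mid channel signal, which is what makes it cheap to encode.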
Moreover, if the left and right channel signals are strongly correlated, the audio signal is perceived as coming from a direction between the left and right channels. Because the left and right channel signals are correlated, the mid channel signal has more energy than the side channel signal, and the quantization error of the mid channel signal usually dominates over the quantization error of the side channel signal. After conversion back to left and right channel signals, the quantization error from the mid channel signal is distributed to the reconstructed left and right channels such that the quantization error is approximately the same in both channels. The quantization error will not be exactly the same, because the side channel signal usually has a small nonzero quantization error, and because the contributions of the left and right channels to the mid and side channel signals might have been selected not to be exactly equal. Still, the quantization errors in the reconstructed left and right channel signals will be correlated after M/S coding. Thus, the quantization error will be perceived as coming from the same direction as the audio signal. Therefore, the audio signal masks the quantization error better with M/S coding than with a separate coding of left and right channel signals.
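The correlation of the quantization error in the reconstructed channels may be checked numerically with a sketch such as the following. It assumes equal-weight M/S mixing and a uniform quantizer whose step size is a fixed fraction of each channel's peak amplitude, so that the error scales with the channel's level; the test signals and all names are hypothetical.

```python
import math


def quantize(signal, step):
    """Uniform quantizer; a non-positive step leaves the signal unchanged."""
    if step <= 0.0:
        return list(signal)
    return [step * round(v / step) for v in signal]


def normalized_correlation(a, b):
    """Normalized inner product of two signals."""
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def ms_error_correlation(left, right, levels=8):
    """Correlation between left and right quantization errors after M/S coding."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    # Assumption: step size proportional to each channel's peak amplitude.
    mid_q = quantize(mid, max(abs(v) for v in mid) / levels)
    side_q = quantize(side, max(abs(v) for v in side) / levels)
    left_q = [m + s for m, s in zip(mid_q, side_q)]
    right_q = [m - s for m, s in zip(mid_q, side_q)]
    err_left = [q - v for q, v in zip(left_q, left)]
    err_right = [q - v for q, v in zip(right_q, right)]
    return normalized_correlation(err_left, err_right)


# Strongly correlated channels: right is mostly a scaled copy of left.
left = [math.sin(0.1 * i) for i in range(200)]
right = [0.9 * math.sin(0.1 * i) + 0.05 * math.sin(0.37 * i + 1.0)
         for i in range(200)]
corr = ms_error_correlation(left, right)
```

Under these assumptions the returned correlation is close to one, because the error in the left channel is the mid error plus the side error, the error in the right channel is the mid error minus the side error, and the mid error dominates.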
L/R coding may be selected when the left and right channel signals are uncorrelated. L/R encoding of uncorrelated left and right channel signals may require fewer bits than M/S coding. Furthermore, using M/S encoding with uncorrelated left and right channel signals may lead to situations in which the quantization error is perceived as coming from a different direction than the audio signal in the stereo image. This may make the resulting quantization noise more audible than quantization noise that is perceived as coming from the same direction as the audio signal, as is the case with L/R coding.
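A simple extreme case illustrates the directional effect: a signal present only in the left channel, with a silent right channel. The sketch below assumes equal-weight M/S mixing and uniform quantizers; the differing step sizes for the mid and side channel signals (0.1 and 0.13) are hypothetical values standing in for independently chosen quantizers.

```python
import math


def quantize(signal, step):
    """Uniform quantizer with the given step size."""
    return [step * round(v / step) for v in signal]


def code_lr(left, right, step):
    """Quantize the left and right channel signals separately."""
    return quantize(left, step), quantize(right, step)


def code_ms(left, right, step_mid, step_side):
    """Quantize mid and side signals, then reconstruct left and right."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    mid_q = quantize(mid, step_mid)
    side_q = quantize(side, step_side)
    left_q = [m + s for m, s in zip(mid_q, side_q)]
    right_q = [m - s for m, s in zip(mid_q, side_q)]
    return left_q, right_q


# Signal only in the left channel; the right channel is silent.
left = [math.sin(0.1 * i) for i in range(100)]
right = [0.0] * 100

_, right_lr = code_lr(left, right, step=0.1)
_, right_ms = code_ms(left, right, step_mid=0.1, step_side=0.13)

# Peak quantization noise in the (silent) right channel for each mode.
noise_lr = max(abs(q - r) for q, r in zip(right_lr, right))
noise_ms = max(abs(q - r) for q, r in zip(right_ms, right))
```

With L/R coding the silent right channel stays silent, whereas with M/S coding the mid and side quantization errors do not cancel, so noise appears in a channel that carries no signal.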