A stereophonic audio signal is made up of a plurality of audio signals (or audio “channels”). For example, a stereophonic audio signal may be recorded by using a plurality of microphones at different locations, whereby each microphone provides a separate audio signal which is captured at its respective location. The individual audio signals can be combined to provide a more complete-sounding, stereophonic audio signal. Humans often perceive stereophonic audio signals to be of higher audio quality than each of the individual audio signals which make up the stereophonic audio signal. Stereophonic audio signals can be output from a plurality of speakers to provide a stereophonic listening experience to a user.
In one example, a stereophonic audio signal comprises a “left” signal (L) and a “right” signal (R). The terms “left” and “right” used herein do not necessarily indicate relative positions of the signals. Such a stereophonic audio signal may be output from two speakers which are located at different positions in order to provide a stereophonic experience to a user listening to the outputted stereophonic audio signal. It may be desired to transmit or store the stereophonic audio signal, and in order to do this the stereophonic audio signal may be encoded (e.g. in the digital domain). The two signals, L and R, may be encoded separately using respective mono encoders. This provides a simple, efficient method for encoding the audio signals. Separately encoding the left and right channels with two mono codecs in this way is known as “dual-mono coding”.
When encoding the stereophonic audio signal, a first aim is to keep the audio quality of the stereophonic audio signal as high as possible. That is, when the encoded stereophonic audio signal is subsequently decoded, it should be as close as possible to the original stereophonic audio signal. However, a second aim is for the encoded stereophonic audio signal to be represented using a small amount of data (i.e. it is desirable to have high coding efficiency). High coding efficiency is desirable for storing and transmitting the encoded stereophonic audio signal. The first and second aims may conflict.
A drawback of the dual-mono coding technique described above is that when the left and right channels are correlated, as is often the case, the encoded stereophonic audio signal is not efficiently coded. In other words, the dual-mono coding technique does not exploit the redundancy between the L and R channels and thus has suboptimal coding efficiency. Moreover, the two mono codecs may introduce quantization error components with a correlation that differs from the correlation between the L and R audio signal components. As a result, those error components will appear separately from the signal in the spatial stereo image and thereby become more noticeable to a human listener. This effect is known as binaural unmasking. As described in “Sum-Difference Stereo Transform Coding” J. D. Johnston, A. J. Ferreira, IEEE International Conference on Acoustics, Speech and Signal Processing, March 1992, binaural unmasking relates to the perceptual system in human listeners being able to isolate noise spatially, and thereby unmask a noise component that is uncorrelated from a signal component that is correlated in two channels of a stereophonic audio signal (or unmask a noise component that is correlated from a signal component that is uncorrelated in two channels of a stereophonic audio signal). In other words, if the correlation of the error components between the L and R signals does not match the correlation of the actual L and R audio signals, then the errors are perceptually greater to human listeners.
An alternative coding technique to the dual-mono coding technique described above is a Mid/Side coding technique (described in “Sum-Difference Stereo Transform Coding” J. D. Johnston, A. J. Ferreira, IEEE International Conference on Acoustics, Speech and Signal Processing, March 1992), in which the left and right channels are converted to mid (M) and side (S) channels according to the formulas:
M = ½(L + R) and
S = ½(L − R).
The signals on the mid and side channels are coded separately by mono codecs. It will be appreciated that the mid signal, M, represents the average of the left and right signals and the side signal, S, represents half of the difference between the left and right signals. The M and S signals can be encoded separately, e.g. for storage or transmission. In order to recover the stereophonic audio signal, a decoder can transform the signals on the M and S channels back to the left and right channel representations. For example, if a decoder receives a signal M′ on the mid channel and a signal S′ on the side channel, the signals on the left and right channels (L′ and R′) can be determined using the formulas:
L′ = M′ + S′ and
R′ = M′ − S′.
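By way of illustration only, the forward and inverse M/S transforms given above may be sketched as follows. The function names ms_encode and ms_decode, and the representation of each channel as a list of floating-point samples, are assumptions made for this sketch and are not taken from the cited reference; the mono codecs that would operate on the M and S channels are omitted.

```python
def ms_encode(left, right):
    """Convert left/right sample sequences to mid/side sequences.

    Implements M = 1/2 (L + R) and S = 1/2 (L - R) sample by sample.
    (Hypothetical helper for illustration.)
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right sample sequences from mid/side sequences.

    Implements L' = M' + S' and R' = M' - S' sample by sample.
    (Hypothetical helper for illustration.)
    """
    if len(mid) != len(side):
        raise ValueError("mid and side channels must have the same length")
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

In the absence of any coding error on the M and S channels (i.e. M′ = M and S′ = S), substituting the first pair of formulas into the second gives L′ = L and R′ = R, so the transform itself is lossless.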
When compared with the dual-mono coding technique described above, the M/S coding technique improves coding efficiency and audio quality when the left and right signals are very similar to each other. This is because, in this case, the side signal, S, takes small values which can be represented using a small amount of data (e.g. a small number of bits) as compared to the amount of data required to represent either the left or right signal.
However, the M/S coding technique may not provide improved coding efficiency and audio quality when the L and R signals are not very similar.
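The dependence of the M/S coding technique on the similarity of the L and R signals can be illustrated with a minimal sketch which compares the energy of the side signal with the energy of the left signal. The test signals, the helper names and the use of signal energy as a rough stand-in for the amount of data needed to represent a signal are all assumptions made for this illustration; an actual codec's bit allocation is more involved.

```python
import math


def mid_side(left, right):
    """Return (mid, side) with M = 1/2 (L + R) and S = 1/2 (L - R)."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def energy(signal):
    """Sum of squared samples, used here as a crude proxy for coding cost."""
    return sum(x * x for x in signal)


def side_to_left_energy_ratio(left, right):
    """Energy of the side signal relative to the energy of the left signal."""
    _, side = mid_side(left, right)
    return energy(side) / energy(left)


# Hypothetical test signal: five full periods of a sinusoid over 100 samples.
left = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# Case 1: the right channel is nearly identical to the left channel.
similar_right = [0.98 * x for x in left]

# Case 2: the right channel is a quadrature (cosine) version of the left
# channel, so the two channels are uncorrelated over the analysis window.
dissimilar_right = [math.cos(2 * math.pi * 5 * n / 100) for n in range(100)]

ratio_similar = side_to_left_energy_ratio(left, similar_right)
ratio_dissimilar = side_to_left_energy_ratio(left, dissimilar_right)

print(f"similar channels:    side/left energy = {ratio_similar:.6f}")
print(f"dissimilar channels: side/left energy = {ratio_dissimilar:.6f}")
```

In the first case the side signal is 0.01 times the left signal, so its energy is roughly four orders of magnitude below that of the left signal. In the second case the side signal carries about half the energy of the left signal, as does the mid signal, so the transform yields little or no saving over coding L and R directly.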