In most existing stereo encoding methods, the left and right sound channel signals are downmixed to obtain a mono signal, and sound field information of the left and right sound channels is transmitted as a sideband signal. The sound field information generally includes an energy ratio of the left sound channel to the right sound channel, a phase difference between the left and right sound channels, a cross-correlation parameter of the left and right sound channels, and a parameter of a phase difference between the first sound channel or the second sound channel and the downmixed signal. In the existing methods, these parameters are coded as side information and sent to a decoding end, where they are used to restore the stereo signal.
In such methods, the downmixing method and the extraction and synthesis of the sound field information of the left and right sound channels are core technologies, and the industry has produced many research results on them. Existing stereo downmixing methods fall into two kinds: passive downmixing and active downmixing.
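As a rough illustration of the parameters listed above, the following sketch computes a full-band energy ratio, phase difference, and cross-correlation for one frame. The text does not define these parameters, so the formulas, the function name `sound_field_params`, and the use of a single full-band value (rather than per-sub-band values) are assumptions made only for illustration.

```python
import numpy as np

def sound_field_params(x1, x2):
    """Illustrative full-band sound field parameters for one frame.

    All definitions below are assumptions; the text names the
    parameters but does not specify how they are computed.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    e1 = float(np.sum(x1 * x1))  # energy of the left channel
    e2 = float(np.sum(x2 * x2))  # energy of the right channel

    # Energy ratio of the left channel to the right channel.
    energy_ratio = e1 / e2 if e2 > 0.0 else float("inf")

    # Phase difference taken from the summed cross-spectrum.
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    phase_difference = float(np.angle(np.sum(X1 * np.conj(X2))))

    # Normalized cross-correlation at zero lag, in [-1, 1].
    denom = np.sqrt(e1 * e2)
    cross_correlation = float(np.sum(x1 * x2) / denom) if denom > 0.0 else 0.0

    return {
        "energy_ratio": energy_ratio,
        "phase_difference": phase_difference,
        "cross_correlation": cross_correlation,
    }

# Example frame: the right channel is an inverted copy of the left channel.
n = np.arange(480)
left = np.sin(2 * np.pi * 5 * n / 480)
params = sound_field_params(left, -left)
```

For the inverted copy, the energy ratio is 1, the phase difference has magnitude π, and the cross-correlation is -1, which is the case the downmixing discussion that follows is concerned with.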
A passive downmixing algorithm is simple and has a short time delay, and calculation is generally performed by using 0.5 as a downmixing factor:
m(n)=0.5·(x1(n)+x2(n))
where x1(n) and x2(n) represent a left sound channel signal and a right sound channel signal respectively, and m(n) represents a downmixed signal.
When the left and right sound channels have completely opposite phases and the same amplitude, the downmixed signal is 0, and the decoding end cannot restore the left and right sound channels. Even if the phases are not completely opposite, the downmixed signal may still lose energy.
To resolve the energy loss of the downmixed signal caused by the passive algorithm, an active downmixing algorithm first performs a time-frequency transform on the left and right signals and then adjusts an amplitude and/or a phase of the signals in the frequency domain, so as to preserve as much energy of the downmixed signal as possible. The following is an example of phase adjustment.
First, a time-frequency transform is performed on the left signal and the right signal to obtain X1(k) and X2(k), and a phase difference in each sub-band is calculated in the frequency domain. Then, phase rotation is performed on the right signal according to the phase difference to obtain a phase-rotated signal X2r(k), so that the phase of the right sound channel signal is consistent with the phase of the left signal. Next, X2r(k) and X1(k) are added and multiplied by 0.5 to obtain a frequency-domain downmixed signal according to the following formula: M(k)=0.5·(X2r(k)+X1(k)). Finally, a time-domain downmixed signal is obtained through an inverse time-frequency transform. This kind of method can resolve the energy loss caused by opposite phases of the left and right sound channel signals.
However, the existing downmixing methods have the following problem: the downmixing performance of a stereo signal is degraded when the phases of the left and right sound channels are opposite and transition frequently and when the phase difference between the left and right sound channels changes quickly, which lowers the subjective quality of stereo encoding and decoding.
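The cancellation described above can be reproduced with a minimal sketch of the passive formula m(n)=0.5·(x1(n)+x2(n)). The test signals, the frame length of 480 samples, and the helper names `passive_downmix` and `energy` are assumptions chosen only for illustration.

```python
import numpy as np

def passive_downmix(x1, x2):
    """Passive downmix: m(n) = 0.5 * (x1(n) + x2(n))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return 0.5 * (x1 + x2)

def energy(x):
    """Sum of squared samples of a frame."""
    return float(np.sum(np.square(x)))

# One frame of a sinusoid with exactly 5 periods (illustrative values).
n = np.arange(480)
left = np.sin(2 * np.pi * 5 * n / 480)
shifted = np.cos(2 * np.pi * 5 * n / 480)  # same tone, 90 degrees apart

m_in_phase = passive_downmix(left, left)     # identical channels
m_opposite = passive_downmix(left, -left)    # opposite phase, same amplitude
m_quarter = passive_downmix(left, shifted)   # partially out of phase
```

With identical channels the downmix keeps the energy of one channel. With opposite phases it is identically zero, and with a 90 degree offset it keeps about half of that energy, which illustrates the energy loss for phases that are not completely opposite.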
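The phase-adjustment steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular codec: the transform is a real FFT over one whole frame without windowing or overlap, the sub-bands are equal-width groups of bins, and the sub-band phase difference is taken as the angle of the summed cross-spectrum. The function name `active_downmix` and the parameter `num_bands` are hypothetical.

```python
import numpy as np

def active_downmix(x1, x2, num_bands=8):
    """Phase-aligned downmix: M(k) = 0.5 * (X2r(k) + X1(k))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    # Time-frequency transform of the left and right signals.
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)

    X2r = np.empty_like(X2)
    # Assumed sub-band layout: equal-width groups of frequency bins.
    for band in np.array_split(np.arange(X1.size), num_bands):
        # Phase difference of the sub-band (assumed definition).
        cross = np.sum(X1[band] * np.conj(X2[band]))
        phase_diff = np.angle(cross)
        # Rotate the right channel toward the phase of the left channel.
        # Note: a rotated DC or Nyquist bin is no longer purely real;
        # this sketch does not treat those bins specially.
        X2r[band] = X2[band] * np.exp(1j * phase_diff)

    # Frequency-domain downmix, then inverse transform to the time domain.
    M = 0.5 * (X2r + X1)
    return np.fft.irfft(M, n=x1.size)

# Opposite-phase channels: the passive downmix of this pair would be zero.
n = np.arange(480)
left = np.sin(2 * np.pi * 5 * n / 480)
m = active_downmix(left, -left)
```

For this opposite-phase input the rotation is π in the sub-band that contains the tone, so the downmix reproduces the left channel up to rounding error instead of cancelling.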